BABY STEPS
One of the most dramatic ways the Internet has changed since the beginning of its commercialization in the early 1990s is on the language front. Over a dozen years ago, English was the primary language for most internet content.
Today, the figure is less than thirty percent, and it's going lower.
This is an increasingly important trend to keep in mind, as an internet user, geek, or investor.
Of course, the major web companies are all over this trend, as one might expect.
Google, for example, had an interesting announcement on this front today, as highlighted by this post from Google Operating System:
"Google switched the translation system from Systran to its own machine translation system for all the 25 language pairs available on the site. Until now, Google used its own system only for Arabic, Chinese, and Russian."
I've been tracking Google's system for some time, using its English-Arabic translation system to gauge the progress. (I grew up in the Middle East, so Arabic websites are of personal interest*).
This translation tour is a good example: one can search for a phrase like "Dubai Tours" in English, which then goes on to query Arabic-language websites and returns the results in English.
You can then go to the various websites that show up in the results page, and see the various Arabic pages translated back into English.
While this stuff won't work on poetry or novels that well, it's OK for news and information sites as this page illustrates. There's still a lot of "noise" in the results, but one can get the gist.
Most importantly, one can get the "Contact Us" pages translated, where an email and/or phone call in English will more than likely get a human reply back in English.
And that's a good start, despite the very long road ahead for these systems.
*P.S. As an aside, it's interesting that Hindi, which is an official language of India, is still not available in Google's language pairs.
I bring it up in the context of the billion-person market available in India over time. Other Asian languages like Chinese, Japanese, Korean, etc. are well represented.
It probably has to do with the fact that English is the de facto primary language in India for most of the middle class, while that can't be said for other countries like China or Japan.
This trend is not Google-specific, as most U.S. technology companies, including Microsoft, Dell, Apple, Yahoo!, etc., tend not to have Hindi versions of their products and services as a high priority, especially as compared to Chinese.
Possible subject for a separate post.
On the upside, once URLs go multilingual, many, many English-based URLs will open up.
Posted by: jon burg | Wednesday, October 24, 2007 at 10:11 AM
What a great driver for improved language translation this trend is. One can see Google using statistical approaches plus some "mechanical turk" stuff to slowly improve the quality of translations, all running in their massive compute farms. Maybe they should christen one of those farms, "Babel".
Posted by: Alex Tolley | Wednesday, October 24, 2007 at 06:20 PM