Last September, just prior to Google switching to its own statistical machine translation system for all the language pairs it offers, I set up a crude comparison between the rules-based and statistical methods used at that time by Google for different language pairs. The crudity stemmed in part from my use of the "round trip" comparison method (defects outlined below), from the use of only one sample text, and from the inherent drawback of comparing translation methods across different language pairs, each of which presents different translation challenges.
At the time, Google was offering its statistical method for six pairs: English to Arabic, Arabic to English, English to Russian, Russian to English, English to Chinese (Traditional and Simplified) and Chinese to English. I was curious about whether the statistical method (described here in this 11/5/2008 Linux Insider interview with Google's Peter Norvig) was better than the rules-based approach.
The comparison method was to use Google to do a "round trip" translation of a piece of English text to a test language (Arabic, French, German, Russian, Spanish, Traditional Chinese) and back, and then to ask respondents to judge the resulting texts according to
- how much meaning had been lost
- the extent of obvious howlers
and then to categorise each from best to worst.
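For readers who want to repeat the exercise, the round trip itself can be sketched in a few lines of Python. This is only an illustration of the procedure described above, not the way the comparison was actually run (that was done by hand through Google's web page). The `translate` callable and its `(text, source, target)` signature are assumptions: supply whatever translation service you have access to.

```python
from typing import Callable, Dict, Iterable

# Assumed (hypothetical) signature: translate(text, source_lang, target_lang) -> str.
Translator = Callable[[str, str, str], str]


def round_trip(text: str, test_lang: str, translate: Translator,
               home_lang: str = "en") -> str:
    """Translate text into test_lang, then translate the result back."""
    outbound = translate(text, home_lang, test_lang)
    return translate(outbound, test_lang, home_lang)


def round_trip_all(text: str, test_langs: Iterable[str],
                   translate: Translator) -> Dict[str, str]:
    """Return one round-tripped version of text per test language."""
    return {lang: round_trip(text, lang, translate) for lang in test_langs}
```

The resulting texts would then be shuffled and shown to respondents without any indication of the language involved.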
Some of the deficiencies of "round trip" translation as a comparison method are summarised here.
The round trip translations were presented in random order, and the only way that respondents could have found out which language pair was involved would have been to use Google Translator on the test text, which I doubt any of them bothered to do. All respondents provided a name and email address, and all but two asked to be sent a summary of the results of the comparison, which makes me confident that they were taking the exercise at least reasonably seriously. The number of respondents was (only) 13, but the Russian and Arabic (statistical) translations seemed remarkably superior, being judged:
- best or second best by 12 and 10 respondents respectively;
- to have suffered only a little loss of meaning by 12 and 11 respondents;
- to have few if any howlers by 10 and 8 respondents.
The French (rules-based) and Chinese (statistical) translations were judged to be rather poorer, with the Spanish and German (rules-based) translations judged to be the least good.
This table summarises.
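| n=13 | Only a little loss of meaning | Few if any obvious howlers | Judged best | Judged next best | Judged next to worst | Judged worst |
|---|---|---|---|---|---|---|
| Spanish | 1 | 0 | 0 | 1 | 5 | 1 |
| German | 2 | 0 | 0 | 1 | 3 | 8 |
| French | 3 | 2 | 0 | 4 | 1 | 0 |
| Arabic | 11 | 8 | 4 | 6 | 1 | 0 |
| Russian | 12 | 10 | 6 | 6 | 1 | 0 |
| Chinese | 2 | 1 | 0 | 2 | 3 | 1 |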
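The "best or second best" figures quoted earlier are simply the sum of the "Judged best" and "Judged next best" columns. A minimal Python sketch of that arithmetic follows; the counts and the rules-based or statistical labels are transcribed from this piece, while the variable names and the choice to sort on that single sum are mine and purely illustrative.

```python
# Column order follows the table above (n = 13 respondents).
COLUMNS = ("little_loss", "few_howlers", "best", "next_best",
           "next_worst", "worst")

# language -> (method at the time of the test, counts in COLUMNS order)
TABLE = {
    "Spanish": ("rules-based", (1, 0, 0, 1, 5, 1)),
    "German":  ("rules-based", (2, 0, 0, 1, 3, 8)),
    "French":  ("rules-based", (3, 2, 0, 4, 1, 0)),
    "Arabic":  ("statistical", (11, 8, 4, 6, 1, 0)),
    "Russian": ("statistical", (12, 10, 6, 6, 1, 0)),
    "Chinese": ("statistical", (2, 1, 0, 2, 3, 1)),
}


def top_two(language: str) -> int:
    """Number of respondents who judged this language best or next best."""
    row = dict(zip(COLUMNS, TABLE[language][1]))
    return row["best"] + row["next_best"]


# Sort languages by how often they were placed in the top two.
ranked = sorted(TABLE, key=top_two, reverse=True)

for language in ranked:
    method = TABLE[language][0]
    print(f"{language:8} {method:12} best or next best: {top_two(language)}")
```

With only 13 respondents and one sample text, a tally like this is a convenience for reading the table, not evidence of statistical significance.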
Since Autumn 2007, Google seems to have dispensed entirely with the rules-based system, which I think had been provided to it by Systran, and is offering translations using its statistical method for the 29 language pairs shown in note C below. To give you a feel for the output of the current system, here are links to this piece, translated into the 6 languages used for the comparison: Spanish; German; French; Arabic; Russian; Traditional Chinese.
Notes
A. The text used was a paragraph from an article in the Economist:
"The passing of a United Nations resolution on July 31st to deploy up to 26,000 troops and police in Darfur is a welcome diplomatic breakthrough in trying to end the conflict there. At least 200,000 people have been killed and about 2.5m displaced since hostilities broke out in 2003 in Sudan's western region. The UN, led by America, Britain and France on the Security Council, had been pushing an extremely reluctant Sudanese government into accepting such a force for over a year, so it is a victory for relentless diplomatic pressure. Bouquets, then, to the Western trio for keeping at it."
B. Previous Fortnightly Mailing pieces on Machine Translation
- 12/6/2005 - Combining human with machine translation;
- 15/3/2006 - Machine translation;
- 24/11/2006 - Machine translation - the 2006 NIST Comparisons.
C. Google's currently available language pairs
- Arabic to English
- Chinese to English
- Chinese (Simplified to Traditional)
- Chinese (Traditional to Simplified)
- Dutch to English
- English to Arabic
- English to Chinese (Simplified)
- English to Chinese (Traditional)
- English to Dutch
- English to French
- English to German
- English to Greek
- English to Italian
- English to Japanese
- English to Korean
- English to Portuguese
- English to Russian
- English to Spanish
- French to English
- French to German
- German to English
- German to French
- Greek to English
- Italian to English
- Japanese to English
- Korean to English
- Portuguese to English
- Russian to English
- Spanish to English