NIST data set (BLEU-4)
Language | Overall | Newswire | Newsgroup | Broadcast News
Arabic | 0.4569 | 0.5060 | 0.3727 | 0.4076
Chinese | 0.3615 | 0.3725 | 0.2926 | 0.3859

GALE data set (BLEU-4)
Language | Overall | Newswire | Newsgroup | Broadcast News | Broadcast Conversation
Arabic | 0.2024 | 0.2820 | 0.1359 | 0.1932 | 0.1925
Chinese | 0.1576 | 0.2086 | 0.1454 | 0.1532 | 0.1300
Summary score table from NIST "Unlimited Plus Data" track
I included something about machine translation in the (pre-web-log) Fortnightly Mailing Number 54. This June 2005 article by Gregory Lamb in the Christian Science Monitor is jargon-free, and explains the difference between the two main approaches to machine translation: rules-based, as developed by Systran and still used by Google; and statistically based, as being developed by Google. (4/12/2006 - see also Not Lost in Translation, from the MIT Technology Review, by Stephen Ornes.)
As to the effectiveness of statistically based methods, Google's system continues to score considerably better overall than the competition (who may or may not be using statistically based methods) in the US Government's NIST Translation Evaluations, which test machine translation of different genres of text (Newswire, Newsgroup, Broadcast News, Broadcast Conversation). This holds for both Arabic to English and Chinese to English, with the margin rather bigger for Arabic to English than for Chinese to English. Bear in mind, however, that a score of 0.5060 out of a maximum of 1 (the best score achieved by Google, for the translation from Arabic to English of the Newswire genre, with scores for Chinese to English consistently worse) does not mean that the absolute quality of the translation was particularly high.
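To give a feel for what a BLEU-4 score between 0 and 1 measures, here is a minimal sentence-level sketch in Python. It is an illustration under stated assumptions, not the scoring script NIST used: the function names `ngrams` and `bleu4` are made up for this example, tokenisation is a plain whitespace split, and there is no smoothing, so any candidate with no matching 4-gram scores 0. Official evaluations compute the score over a whole test set rather than sentence by sentence.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4 (hypothetical helper, no smoothing).

    candidate: machine translation as a string
    references: list of human reference translations as strings
    """
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())

        # Clip each candidate n-gram count by the largest count
        # seen for that n-gram in any single reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram])
                      for gram, count in cand_counts.items())

        if total == 0 or clipped == 0:
            return 0.0  # unsmoothed: one empty precision zeroes the score
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty, using the reference length closest to the candidate's.
    ref_len = min((len(r) for r in refs),
                  key=lambda length: (abs(length - len(cand)), length))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))

    # Geometric mean of the four modified n-gram precisions.
    return bp * math.exp(sum(log_precisions) / 4)


reference = ["the cat sat on the mat"]
exact = bleu4("the cat sat on the mat", reference)
close = bleu4("the cat sat on the mat today", reference)
loose = bleu4("the cat is on a mat", reference)
```

Only a candidate that reproduces a reference word for word reaches 1, and one extra word is already enough to pull the score down noticeably. The score rewards literal overlap with the reference translations, which is one reason a figure around 0.5 says little on its own about how readable a translation is.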
It is also worth noting that Google's results do not seem to have improved much on what was achieved in the equivalent 2005 NIST tests, although the test regime was different, so this observation needs taking with a pinch of salt. (Here, for reference, are the 2008 NIST results, which I have not had time to analyse.)
Updated 4/12/2006 and 1/1/2009
Systran is not the only company that develops the rules-based machine translation approach, and at present this method, although it does not yet provide 100% perfect translation quality, is the most reliable. The most popular translation services, such as freetranslation.com and translate.ru, are based on the rules-based approach.
As for the statistically based method, it seems to be universal and applicable to any language pair (whereas the rules-based approach is strictly language-bound), and it's strange that, using the same software, Google obtained such different results for the Arabic and Chinese languages (perhaps it depends on the input and output text bases?).
Posted by: Elena Temnova | 27/11/2006 at 10:09