Google Translate is Finished

Google Translate is all over. Sorry to have to tell you. Well, not really. TranslationGuy loves to lower his voice, put his arm around your shoulder, and act all compassionate and stuff, when he has the inside scoop. But I’d better explain.

I don’t mean that Google Translate is all over in that no one is ever going to use it again. I mean it is all over, but I mean that in the sense that it’s everywhere and everybody’s using it. In fact, since we have been assimilated by Google as a human translation vendor, let me take this opportunity to inform you that resistance is futile: you will use Google Translate for all your machine translation needs.

What I mean is that Google Translate is finished in the sense that it is completed. Done. The dramatic improvements that we have seen over the last few years have brought this wonderful tool to fruition. And it’s not going to get any better unless Google radically rethinks its approach.

I figured this out from reading Tim Adams of the Guardian. Now, before you click away, a warning. Tim doesn’t know shit about machine translation. But that ol’ news hound sure knows how to do an interview, because he got Google to spill the beans. This piece is like Gibbon on the Decline and Fall of the Statistical MT Empire.

As he reports, the recent rapid improvement in MT is the result of the use of statistical engines rather than the older rules-based systems.

In the early 1990s, IBM produced a model that abandoned any effort to have the computer ‘understand’ what was being fed into it, and instead loaded the engine with as much translation as they could shovel in and then did a statistical analysis. This was the preferred approach of Frederick Jelinek at IBM, who didn’t think much of the rules-based systems pioneered in the 1970s. Jelinek once said, “Whenever I fire a linguist, the performance of our system improves.” (Speaking as someone who has fired as many linguists as I have, it’s usually the bad ones that get fired, ergo…)

Anyway, Google’s ability to bring MT along so far and so fast is less the result of any great breakthroughs in the statistical algorithms than the result of the ability of their spiders to lift millions of man-hours’ worth of human translation from the Web, dump it into the vast digital hoppers of their translation pigs (aka MT servers), and spit it out on demand, for free, sort of.

“This technology can make the language barrier go away,” says Franz Och, who leads Google’s machine translation team. “It will allow anyone to communicate with anyone else.”

This is true, to a point. But what point? I’ve been selling MT and giving it away for the last 10 years, and while it’s much better than it used to be, the utility of it comes from ease of use. It’s all in the interface. If it isn’t fast, free and easy, better doesn’t matter.

But here’s the shocker: better isn’t even in the MT cards, statistically speaking.

Google Translate guy Andreas Zollmann admits that more is not enough. “We are now at this limit where there isn’t that much more data in the world that we can use,” he says. The MT databases are already so large that adding more doesn’t make the engines much better: to improve output quality by even 0.5%, you’ve got to double the size of the database.

And there aren’t that many doublings left, if any. I can’t say how much text Google has assimilated into their machine translation databases, but it’s been reported that they have scanned about 11% of all printed content ever published. So double that, and double it again, and once more, shoveling all that into the translation hopper, and pretty soon you get the sum of all human knowledge, which means a whopping 1.5% improvement in the quality of the engines when everything has been analyzed. That’s what we’ve got to look forward to, at best, since Google spiders regularly surf the Web, which in its vastness dwarfs all previously published content. So to all intents and purposes, the statistical machine translation tools of Google are done. Outstanding job, Googlers. Thanks.
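
For anyone who wants to check the back-of-the-envelope math, here is a minimal sketch. It assumes the 11% figure quoted above and a gain of about 0.5 percentage points per doubling, which is the rate that produces the 1.5% total. The function names are made up for illustration; nothing here comes from Google.

```python
import math


def doublings_left(coverage: float) -> int:
    """Whole doublings available before coverage would pass 100%.

    coverage is the fraction already collected (0.11 means 11%).
    """
    return math.floor(math.log2(1.0 / coverage))


def best_case_gain(coverage: float, gain_per_doubling: float) -> float:
    """Total quality gain (in percentage points) if every remaining doubling
    delivers the same assumed gain."""
    return doublings_left(coverage) * gain_per_doubling


# 11% -> 22% -> 44% -> 88%: three doublings, then you run out of text.
print(doublings_left(0.11))
print(best_case_gain(0.11, 0.5))
```

Change the assumed gain per doubling and the ceiling moves with it, but the number of doublings left stays at three.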

I’ve also got to thank Z, the translator formerly known as Jost Zetzsche, who pointed this out to me in a recent discussion. Well, actually it wasn’t Z personally, but Jeromebot, during the hand-puppet part of Z’s presentation. (This guy really knows how to get a point across to a CEO.)

PS I highly recommend his Punch and Judy version of the Common Sense Advisory Translation Industry Competitive Analysis. But if you can’t find that on YouTube, at least subscribe to his invaluable newsletter, Translator’s Toolkit.

6 Responses to “Google Translate is Finished”

  1. Alon Lavie says:

    Ken,

    The story is more complex and nuanced than that, for several different reasons. Here are just a couple of points to consider:
    (1) For many language-pairs, Google has not yet tapped into sufficiently large amounts of data. For these languages, you can expect performance to keep improving for quite some time.
    (2) In the language-pairs for which Google already has lots of data, improvements will indeed likely slow down significantly. However, Google develops “one-size-fits-all” generic MT systems for each of its language-pairs. That’s good for casual users, but is far from optimal for professional and business users. For commercial enterprises, it is well-established that customized MT systems can deliver dramatically higher translation quality. That will likely remain true for the foreseeable future. The key to high-quality in commercial settings is client-specific data. In these scenarios, Google Translate is just the generic baseline to beat.

    - Alon

    • Ken says:

      As you so wisely observe, Alon, the narrower the subject area, the better you can do with machine translation. Look at the complex implementation of machine translation systems by the EU as an example of how it’s done in the real world. Is Google really in the business of constructing data sets as domain-specific as that? I suppose if you can do it all statistically, then Google can do it. And then the question becomes, “How well can they do it?” Did I just hear a starting gun?

  2. Johnny says:

    There’s a number of different approaches to machine translation.

    One is processing a high volume of existing translations to train machine translation engines – one of the major problems with this approach is retraining the engines to ‘forget’ incorrect translations. It requires a massive volume of correct translations.

    asiaonline.net has developed a system that requires a low volume input to train the translation engine which allows for an ‘easier’ way to improve the quality. If it works – I haven’t used it.

    In addition, new approaches are still being developed. Stanford University introduced a new toolkit based on Phrasal, which should improve the way text is processed and, as a result, improve the output.

    So I think that machine translation will be able to generate a better output in the future.

    On the other hand, I am not sure that improved quality will be the number one factor for companies to use machine translation. As rates are constantly going down, so is the quality delivered by human translators, as translation agencies are relying on less experienced freelancers. As an end customer of translation services, you have to compare an ever lower quality from human translation to a slightly improving quality in machine translation. So it is not necessarily important for machine translation to improve to the level of a skilled human translator. The quality of professional human translation is in fast decline, so the quality gap is closing in the ‘wrong’ direction.

  3. Translation says:

    i think this is very great job here , i find it very informative and i hope that every one will find this very nice full of information
    thanks to share such a great information with us

  4. Rich German says:

    Nice play on words in the title. You had me steaming for a moment.

  5. Machine Gun says:

    GT sucks. Hands down.
