Artificial Intelligence for Localization: Cutting Through the Hype to Develop Our Product

Here’s the story of how we’re developing a product that uses machine learning and neural networks to boost translation and localization

Blog of Alconost Inc.
Product Coalition

Artificial intelligence and its applications are among the most sensational topics in the IT field. Many people try to imagine AI at work in their own lives, picturing it almost as a panacea, a red pill that fixes everything. The topic is rife with rumors, hype, predictions, and even fantasies about what the future may hold. There are also a lot of misconceptions surrounding the term “artificial intelligence” itself. People often confuse it with automation.

We decided to bypass the fads and the hype, and sat down for an insightful conversation about AI with the localization team leader from Alconost, Stas Kharevich. For over 10 years Stas has been helping IT companies enter foreign markets with new products. He is also launching the pilot for a new Alconost service using artificial intelligence: domain-adaptive machine translation with subsequent proofreading. Domain-adaptive machine translation, in layman’s terms, is a translation executed by a “smart” machine trained using texts from a selected sphere, or domain. Let’s talk with Stas about how this kind of solution differs from the traditional approach to localization, how to train a machine and achieve high quality, and what disadvantages there are to artificial intelligence.

Source: designed by alconost.com

Since 2004 Alconost has been providing professional audio, video, and text content localization services for games, apps, other software, and websites. The company has two products of its own: Nitro and GitLocalize. Nitro is an online professional human translation platform for translating small texts within 2–24 hours. GitLocalize is a platform for translating open source projects that lets you sync a translation with a GitHub repository. Currently, Alconost is working on a localization solution that uses adaptive machine translation.

— Stas, you’re involved in implementing AI in localization projects, correct? Please give us the “for dummies” version: just what exactly is artificial intelligence?

— I can’t tell you about every sphere of application for AI, and no one can. Even if someone tried, it would be meaningless. Data analysts say with good reason that the narrower the data pool and the software’s sphere of application, the more precise the result. But I’ll be happy to talk about AI’s application in the field of localization. I would start with PEMT (post-edited machine translation), that is, machine translation with subsequent editing. There are “old” machine translation engines with algorithms based on rules and statistics, where the machine executes the translation without regard for previous and current translations or the subject matter of the text. This kind of translation requires substantial editing by professional linguists. And then there is the “new” solution: NMT (neural machine translation). The use of neural networks has significantly improved translation quality. The main advantage of neural networks is their ability to learn. That is why the solution we’re currently working on is domain-adaptive neural machine translation, which takes into account the subject matter, a glossary, and memories from previous translations.
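To make the roles of a translation memory and a glossary more concrete, here is a minimal Python sketch. It is not Alconost’s engine and does not involve a neural network. It only illustrates how previously translated segments can be reused and how glossary terms can be checked. The sample data, the function names, and the 0.8 similarity threshold are all hypothetical, and the standard-library `difflib` module stands in for whatever fuzzy matching a real system would use.

```python
from difflib import SequenceMatcher

# Hypothetical sample data, invented purely for illustration.
TRANSLATION_MEMORY = {
    "Start a new game": "Neues Spiel starten",
    "Save your progress": "Fortschritt speichern",
}
GLOSSARY = {"level": "Level", "coins": "Münzen"}


def best_tm_match(source, memory, threshold=0.8):
    """Return (stored_source, stored_target, score) for the closest
    stored segment at or above the threshold, or None if there is none."""
    best = None
    for stored_source, stored_target in memory.items():
        score = SequenceMatcher(None, source.lower(), stored_source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (stored_source, stored_target, score)
    return best


def missing_glossary_terms(source, translation, glossary):
    """List glossary target terms that the source calls for
    but that do not appear in the translation."""
    missing = []
    for term, target in glossary.items():
        if term.lower() in source.lower() and target.lower() not in translation.lower():
            missing.append(target)
    return missing


match = best_tm_match("Start a new game!", TRANSLATION_MEMORY)
if match is not None:
    print("Reuse from memory:", match[1])

print(missing_glossary_terms(
    "Collect coins to finish the level",
    "Sammle Coins, um das Level zu beenden",
    GLOSSARY,
))
```

In a setup like this, a close match from the memory could be reused or offered to the editor, while segments with missing glossary terms could be flagged for review. How a trained engine actually weighs such data is outside the scope of this sketch.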

— Interesting. So does that mean there’s already localization software on the market that uses machine translation? If so, what is the value of the solution you’re developing? Is there really any point in reinventing the wheel?

— That’s right, software that uses machine translation does exist. There are machine translation engines available to companies for a one-time fee or by subscription. There are many of them, actually. For example, I have worked quite closely with at least two engines from Google: Google AutoML and GNMT (Google Neural Machine Translation). And there are many alternatives on the market, such as Watson Language Translator from IBM and a neural network from Yandex. The thing is, some companies offer a stock solution: you pay for a subscription and just use the engine. Stock solution providers claim that their engines are trained using vast amounts of data, and deliver correspondingly high quality. But the truth is that in practice some engines work just fine with certain language pairs, while with others quality suffers, especially when it comes to niche topics such as games. Unlike stock solutions, custom software allows you to retrain the machine on your own specialized data set. And this is precisely the kind of software that we’re currently working on at Alconost.

— So what makes you think you can do it better than other companies? For example, why hasn’t Google, with all its abilities, turned Google Translate into something like that? I mean, they have lots of programmers, data scientists, the data itself, and other resources. Yet professional companies come to you for localization, not to Google Translate.

— To be fair, for a number of fairly general topics and certain language pairs, Google Translate actually works pretty well. But our first strong point is our experience in niche translation of IT-related content into 100+ languages. Since Alconost’s inception we have localized several thousand projects. And we have our own “big data” on which to retrain the machine. Additionally, we have enough data to adapt translations even to a specific genre of games, such as logic games, simulation games, or fighting games.

And then there are open data cloud localization platforms. For example, with Crowdin and GitLocalize, translators and localization managers work on projects and communicate with clients in real time. The glossaries and translation memories used on these platforms for one project can technically be used by other projects as well. And we evaluate how to structure that data and what exactly the machine translation engine needs to master for each project. Essentially, this is a component of data science: the structuring and categorizing of a relevant data pool for retraining a machine. This is the chief advantage of our localization solution. It is very much a niche solution, and that helps to ensure accuracy and quality.

Secondly, we have our own human translation platform called Nitro, where clients themselves send short texts for translation and receive the finished result within 2–24 hours. Nitro’s interface and user experience have stood the test of time, and we are constantly improving the product. Recently, for example, the Nitro API was released. Now, instead of taking the time to submit orders via the Nitro interface, companies with high workloads can receive their translations via the Nitro API directly in their content management system. In other words, Nitro is a potential client application for processing machine translation orders. All it needs is to be synchronized with a custom machine translation engine.

— Sounds interesting. Tell us, if you would, how do you cope with the technical side of the issue? After all, you’re translators, not tech whizzes. The task of “retraining the machine” sounds pretty technical and fairly ambitious.

— Actually, that’s exactly what we are: tech whizzes. Alconost was founded by developers, for developers. We have many programmers both on the team (including our CEO and founder) and in outsourcing. As it happens, the rest of the team also has a fairly strong technical background and an entrepreneurial spirit. We love coming up with custom integrations and solutions that simplify our work on different projects. And then we have two of our own products that we’re developing, which I mentioned above: Nitro and GitLocalize.

Getting back to the topic of machine translation, we’re even considering a scenario where we acquire a machine translation engine, host it, and retrain it in house, on our own hardware. Of course, to do that our team would need at least a dedicated programmer and a data specialist, as well as a localization engineer for projects. But we’re well aware of the technical side.

— Got it, sounds like a solid approach. But here’s another question: why would companies request machine translation instead of good old-fashioned human localization? People are usually wary of anything new. Do you have any plans for allaying mistrust of machine translation? I mean (at the risk of repeating myself) that there is still a risk of being associated with Google Translate itself.

— Most customers are interested in speed and quality. And few care how we actually accomplish that. In other words, first and foremost we’re concerned with optimizing our own internal processes. I’ve already described our approach to retraining a machine translation engine: we have a huge data pool, and we use dictionaries, translation memory, and style guides. And so we’re optimistic about the anticipated quality. Otherwise we would never have tackled the problem — if it’s not broken, why fix it? In any case, we will offer our clients on-demand editing of our machine translations, so the quality certainly won’t suffer. The real difference lies in the speed. Imagine if you could get texts localized into 100 languages, even in raw form, as early as overnight. Wouldn’t that be awesome? Of course, the editing will take additional time if the client needs it. But machine translation speeds up the entire project several times over. And who wouldn’t want to receive their order several times faster without sacrificing quality?

— When it comes to editing machine translations, couldn’t “quality” turn out to be a weak point?

— In general, we already have experience editing machine translations. We’ve had several large projects of this kind, and there are separate processes for editing, localization testing, and quality assessment. But the process for editing machine translation is different from the standard localization process. Here it’s more a question of the quality of the source material. Different machines translate differently, so customers come to us with machine translations of varying quality. We pre-test them and evaluate the quality. If we want to end up with high-quality localized texts, we need a glossary, a localization brief, some samples of previous translations, and a style guide. In general, these are attributes of any professional localization project. When editing the texts on the cloud platform, we automatically add all this initial data to the project and then use it in our work. So basically, yes, we are confident in the quality.

— Are there any unknown variables in this machine translation project?

— One unknown issue is pricing. Different topics require different levels of effort to retrain the machine. And we don’t have the same amount and quality of data for all topics. For example, we have thousands of projects for localizing games and apps, but when it comes to fiction we haven’t translated that many books. Consequently, we would expect the quality of machine-generated literary translation to be inferior to translations of something like games. But we haven’t evaluated that in detail yet. We may be able to offer some sort of flexible pricing, such as rates for our machine translation both with and without editing. We’re still thinking about it, and we will test different options depending on the needs of the client. I can’t name an exact rate yet, but from our timeline and volume of translation work we can already see that machine translation is well worth our while.

Another unknown variable is the exact timing. It takes time to retrain the machine, and it’s difficult to say how long this will take for each project. I think we will take more time implementing the initial projects in order to debug the data categorization algorithm.

— Regarding the timing for releasing your solution, can you give us at least a rough estimate?

— I think in 2022 we will have a complete solution for customized machine translation. This will be a solution for English plus another language: that’s how we currently work, and we’re going to continue in the same vein. In other words, when we translate from one language (English) into all other languages, this ensures translation consistency and, ultimately, quality.

— Do you see any new niches that a machine translation solution will open up for you? New services, perhaps, or optimization?

— Primarily it will help to optimize our work time and labor expenditures for translation, but that concerns our internal operations. As for new niches, consider that since translation will be done much faster and cost less than exclusively “human” work, companies will be able to translate more. For example, it may make sense to translate content that previously was not translated by everyone, or all the time, or in full. Consider that certain companies like Booking.com or Airbnb translate customer reviews into different languages. Why not do the same for other apps?

Today, text, image, and video mining technologies are very popular for building analytics and audience predictions. Here, too, machine translation can be helpful. For example, computational linguists used to analyze user reviews to determine their tone, emotional color, and mood. Today, machines can handle the job just fine. In other words, machine translation can be used to obtain data for building other models that use artificial intelligence. I truly believe that we are not even aware yet of all the possibilities.

— Very interesting, Stas. One last, futuristic question: as you gaze into your crystal ball, do you believe that someday machines will translate better than humans?

— Honestly, I think there really are certain areas of translation where machines will prove superior to humans. For example, take texts for search engine optimization in app stores. Clients often give us keywords that we need to use as a glossary when creating a description for a game or app. The same goes for localizing advertising texts for Google Ads: often you need to include certain phrases in the title that contain grammatical errors or misspellings. But they are frequently encountered, so the client needs exactly those words in the text. Experienced translators, as a rule, react negatively to tasks containing “mistakes.” Here, as I see it, a machine is ideal for the job.

Machine translation is also great for translating help documentation or material for corporate wiki systems. These are generally tasks that are identical in their structure and linguistic constructions, where the priority is to produce an accurate, adequate translation, and not a colorful, imaginative text. But the main question, as I said before, is how the machine is trained — on what data, glossaries, and rules. And compiling those is a task for a human being. So whatever the approach, you’ve still got to have a person involved.

About the author

This article was written by Alconost, a professional translation and localization company.

We localize apps, games, websites, & software and provide video production, multilingual marketing & instant translation services. Visit us at alconost.com