What machine translation teaches us about the challenges ahead for AI

João Graça, co-founder and CTO of Unbabel, on what machine translation can teach us about the challenges still lying ahead for artificial intelligence.

Can you understand this sentence? Now try understanding the long and convoluted and unexpectedly – maybe never-ending, or maybe ending-sooner-than-you-think, but let’s hope it ends soon – nature of this alternative sentence.

The complexities of language can be an inconvenience to a human reader. But even for today’s smartest machine learning algorithms, more translation challenges remain than advances in other fields would have you believe.

These challenges are a good demonstration of how much ground machines must still cover before they catch up with human performance.

You say tomato

When it comes to translation, there are two categories of content. On one hand, you have “commodity” translation. Perhaps you want to point your phone at a menu and get a rough idea of what it says. Or you want to impress a colleague with a phrase from their local language.

Here, phrases are short, the content is often formal, and errors aren’t a matter of life or death.

But on the other hand, you have interactions where context is key: understanding the intent of the writer or speaker, and the expectations of the reader or listener. Take any example where a business speaks to its customers – you had better hope you are speaking their language respectfully when they have a complaint or problem.

It’s not enough to solve the problem at a superficial level, and achieving comparable “human quality” communication still requires an enormous amount of research. This need for perfection is why most research is focused on this second area.

In the examples below, I discuss the challenges still ahead for the translation industry, and touch on what they mean for how we use machine learning tech more broadly.

Challenge 1: Long-distance lookups

Many of the biggest challenges are structural.

A good example is long-distance lookups. If you are translating a sentence word by word and the word order stays the same, the problem is just solving “what is the correct equivalent of this for that?”

But once you start having to think about reordering the sentence, the problem space that has to be explored is exponentially larger. And in languages like Chinese and Japanese, you find verbs at the end of the sentence, potentially producing the longest distances possible.

The system needs to assess several competing reorderings at once. This is why these languages are so hard: you have to cater to very different grammatical patterns, very different vocabularies, and even different numbers of characters per word.
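
To make the scale concrete, here is a back-of-the-envelope Python sketch (my own illustration, not from the interview) of how quickly the space of possible word orders grows:

    from math import factorial

    # If the target language may reorder all n words of a source sentence,
    # the number of candidate word orders grows factorially with n.
    for n in (5, 10, 15):
        print(f"{n:>2} words -> {factorial(n):,} possible orderings")

Even a 15-word sentence admits over a trillion orderings, which is why reordering-heavy language pairs blow up the search space.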

Here, you can see how expanding problem spaces create difficulties in an area the human brain handles with ease.

Challenge 2: Taxonomy

The second major area of complexity involves different formats of data.

For example, conversational language has a completely different structure, and calls for different models, than formal documents. In areas like customer service translation, this makes a big difference. Nobody likes to feel like the representative of a company is being overly officious when handling their problem.

Therefore, any model that can learn from a volume of real human queries will have an advantage, and doubly so if it can draw that data from a particular industry sector. Meanwhile, models that rely on news stories or generic online text will produce completely different results.

As with other machine learning challenges, the ability to learn from the most valuable and representative data can give a big advantage; relying on the wrong data risks limiting taxonomical flexibility.

This brings us to context.

Challenge 3: Context

Most translation models still translate sentence by sentence, so they don’t take context into account.

If they are translating a pronoun, they have no clue what it refers to, so they cannot choose the right form. They will randomly generate sentences that are formal or informal. They don’t guarantee consistency of terminology – for instance, translating a legal term in the same way throughout. There’s no way you can guarantee the whole document is correct.

The other problem is that the content is not always in the same language. Sometimes it’s one sentence in Chinese, one sentence in English. The sentences are much shorter, so you have to look much further back in the conversation for context. This reaches its extreme in “chat” interactions.

And the context problem is different from translating an email. For example, if you are translating a legal document that is ten pages long, you would need to use the entire document for an accurate contextual translation.

This is next to impossible with current models – you have to find some way to summarise the document. Otherwise, consistency is out of reach.

On the other hand, if you are translating for something like SEO, what you are actually translating is not sentences at all, just keywords by themselves. This means turning to more dictionary-like translation, disambiguating each keyword using the other keywords or the image associated with it.

People think “Oh, we are in the age of unlimited data,” but actually we are still enormously lacking in many ways.

Yes, we have a lot of data but often not enough relevant data.

Looking to the future

There will be many translation engines, but what makes them different is their models.

The model is going to look at the data, predict patterns and assign them to different customers, and from there it will decide which voice, language, tone and so on to choose.

Current public translation tools aren’t aware of this yet. They don’t even have knowledge of the document the translation came from, let alone the speaker or their translation preferences.

This will bring the next level of sophistication to this area. Machine learning, exercised against a use-specific corpus of language, will give fast and accurate translations, while being able to forward them to humans to finalise and learn from further.

Languages might still drive machines crazy – but with careful human thinking, we can teach them to persevere.

Reference: https://bit.ly/2PYYHAB

The Augmented Translator

The idea that robots are taking over human jobs is by no means a new one. Over the last century, the automation of tasks has done everything from making a farmer’s job easier with tractors to replacing the need for cashiers with self-serve kiosks. More recently, as machines are getting smarter, discussion has shifted to the topic of robots taking over more skilled positions, namely that of a translator.

A simple search on the question-and-answer site Quora reveals dozens of inquiries on this very issue, and a recent survey shows that AI experts predict robots will take over the task of translating languages by 2024. Everyone wants to know if they’ll be replaced by a machine and, more importantly, when that will happen.

“I’m not worried about it happening in my lifetime,” translator Lizajoy Morales told me when I asked if she was afraid of losing her job to a machine. The same sentiment is echoed by most of Lilt’s users. Of course, this demographic is already using artificial intelligence to their advantage and tends to see the benefits outweighing the drawbacks.

Many translators, however, are quick to argue that certain types of content are impossible for a machine to translate accurately: literature, for example, relies on a human’s understanding of nuance to capture the author’s intention, and fields like law and medicine depend on the accuracy of a human translator.

But even in these highly-specialized fields, machines can find their place in the translation workflow. Not as a replacement, but rather as an assistant. As translators, we can use machines to our advantage, to work better and faster.

But I’m not talking about post-editing of machine translation. In a recent article, my colleague Greg Rosner compared post-editing to the job of a janitor: just cleaning up a mess. True machine assistance augments the translator’s existing abilities and knowledge, giving them the freedom to do what they do best (translate) and keeping interference to a minimum.

So how do machines help translators, exactly? With interactive, adaptive machine translation, such as that found in Lilt, the system learns in real time from human feedback and/or existing translation memory data. This means that as a translator is working, the machine is getting to know their content, style and preferences, adapting to this unique translator/content combination. This adaptation allows the system to progressively provide better suggestions to human translators, and higher quality for fully automatic translation. In basic terms, it makes translators faster and better.
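
As a rough illustration of that feedback loop, here is a minimal Python sketch of the general idea (not Lilt’s actual system; the segments and function names are invented) using a translation memory with fuzzy matching:

    import difflib

    # Minimal sketch: confirmed translations are stored, and new source
    # segments are fuzzy-matched against them, so suggestions improve as
    # the translator works. Real adaptive MT updates a neural model
    # rather than a lookup table.
    memory = {}

    def confirm(source, target):
        """Store a human-approved translation (the feedback step)."""
        memory[source] = target

    def suggest(source):
        """Offer the translation of the closest confirmed segment, if any."""
        match = difflib.get_close_matches(source, list(memory), n=1, cutoff=0.6)
        return memory[match[0]] if match else None

    confirm("Click the Save button.", "Haga clic en el botón Guardar.")
    print(suggest("Click the Save button now."))  # reuses the confirmed segment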

Morales also pointed out another little-known benefit of machine translation suggestions: an increase in creativity. “This is an unexpected and much-appreciated benefit. I do all kinds of translations, from tourism, wine, gastronomy, history, social sciences, financial, legal, technical, marketing, gray literature, even poetry on occasion. And Lilt gives me fantastic and creative suggestions. They don’t always work, of course, but every so often the suggestion is absolutely better than anything I could have come up with on my own without spending precious minutes searching through the thesaurus…once again, saving me time and effort.”

Many are also finding that with increased productivity comes increased free time. Ever wish there were more hours in the day? If you’re a translator, machine assistance may be the solution.

David Creuze, a freelance translator, told us how he spends his extra time: “I have two young children, and to be able to compress my work time from 6 or 7 hours (a normal day before their birth) to 4 hours a day, without sacrificing quality, is awesome.”

With these types of benefits at our fingertips, we should stop worrying about machines taking the jobs of translators and focus on using the machine to our advantage, to work better and ultimately focus on what we do best: being human.

 

Reference: https://bit.ly/2MgDaAj

Is This The Beginning of UNMT?

Research at Facebook has just made it easier to translate between languages that lack many translation examples – for example, from Urdu to English.

Neural Machine Translation

Neural Machine Translation (NMT) is the field concerned with using AI to translate between languages, such as English and French. In 2015, researchers at the Montreal Institute for Learning Algorithms developed new AI techniques [1] which finally made machine-generated translations work. Almost overnight, systems like Google Translate became orders of magnitude better.

While that leap was significant, it still required sentence pairs in both languages, for example, “I like to eat” (English) and “me gusta comer” (Spanish). For translations between languages like Urdu and English, which lack many of these pairs, translation systems failed miserably. Since then, researchers have been building systems that can translate without sentence pairings, i.e. Unsupervised Neural Machine Translation (UNMT).

In the past year, researchers at Facebook, NYU, the University of the Basque Country and Sorbonne Universités have made dramatic advancements which are finally enabling systems to translate without knowing that “house” means “casa” in Spanish.

Just a few days ago, Facebook AI Research (FAIR) published a paper [2] showing a dramatic improvement that allows translation from languages like Urdu into English. “To give some idea of the level of advancement, an improvement of 1 BLEU point (a common metric for judging the accuracy of MT) is considered a remarkable achievement in this field; our methods showed an improvement of more than 10 BLEU points.”
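
For readers unfamiliar with BLEU, here is a toy sentence-level illustration using NLTK (the example sentences are invented; published results use corpus-level BLEU over thousands of sentences):

    from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

    # Score a hypothesis translation against a reference, both tokenised
    # into words. Smoothing avoids zero scores on short sentences.
    reference = [["the", "cat", "is", "on", "the", "mat"]]
    hypothesis = ["the", "cat", "sat", "on", "the", "mat"]

    smooth = SmoothingFunction().method1
    print(sentence_bleu(reference, hypothesis, smoothing_function=smooth))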

Check out more info at Forbes.

Let us know what you think about this new leap!

Here’s Why Neural Machine Translation is a Huge Leap Forward

Though machine translation has been around for decades, the most you’ll read about it is its perceived proximity to the mythical “Babel Fish” (an instantaneous personal translation device) ready to replace each and every human translator. The part that gets left out is machine translation’s relationship with human translators. For a long time, this relationship was no more complex than post-editing badly translated text, a process most translators find to be a tiresome chore. With the advent of neural machine translation, however, machine translation is no longer just something that creates more tedious work for translators. It is now a partner to them, making them faster and their output more accurate.

So What’s the Big Deal?

Before we jump into the brave new translating world of tomorrow, let’s put the technology in context. Prior to neural machine translation, there were two main paradigms in the history of the field. The first was rules-based machine translation (RBMT) and the second, dominant until very recently, was phrase-based statistical machine translation (SMT).

When building rules-based machine translation systems, linguists and computer scientists joined forces to write thousands of rules for translating text from one language to another. This was good enough for monolingual reviewers to get the general idea of important documents in an otherwise unmanageable body of content in a language they couldn’t read. But for the purposes of actually creating good translations, this approach has obvious flaws: it’s time-consuming and, naturally, results in low-quality translations.

Phrase-based SMT, on the other hand, looks at a large body of bilingual text and creates a statistical model of probable translations. The trouble with SMT is its reliance on auxiliary systems. For instance, it is unable to associate synonyms or derivatives of a single word, requiring a supplemental system responsible for morphology. It also requires a language model to ensure fluency, but this is limited to a given word’s immediate surroundings. SMT is therefore prone to grammatical errors, and relatively inflexible when it encounters phrases that differ from those in its training data.

Finally, here we are at the advent of neural machine translation. Virtually all NMT systems use what is known as “attentional encoder-decoder” architecture. The system has two main neural networks, one that receives a sentence (the encoder) and transforms it into a series of coordinates, or “vectors”. A decoder neural network then gets to work transforming those vectors back into text in another language, with an attention mechanism sitting in between, helping the decoder network focus on the important parts of the encoder output.

The effect of this encoding is that an NMT system learns the similarity between words and phrases, grouping them together in space, whereas an SMT system just sees a bunch of unrelated words that are more or less likely to be present in a translation.
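
A minimal sketch of that attention step might look like this in NumPy (toy dimensions and random vectors standing in for a trained encoder’s output):

    import numpy as np

    def attention(query, keys, values):
        """Weight encoder outputs (values) by how well their keys match
        the decoder's current query, then return the weighted sum."""
        scores = keys @ query / np.sqrt(query.size)  # one score per source position
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                     # softmax -> attention weights
        return weights @ values                      # context vector for the decoder

    rng = np.random.default_rng(0)
    enc = rng.normal(size=(6, 4))  # 6 source positions, 4-dim encoder vectors
    q = rng.normal(size=4)         # decoder state at the current target position
    print(attention(q, enc, enc))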

Interestingly, this architecture is what makes Google’s “zero-shot translation” possible. A well-trained multilingual NMT can decode the same encoded vector into different languages it knows, regardless of whether that particular source/target language combination was used in training.

As the decoder makes its way through the translation, it predicts words based on the entire sentence up to that point, which means it produces entire coherent sentences, unlike SMT. Unfortunately, this also means that any flaws appearing early in the sentence tend to snowball, dragging down the quality of the result. Some NMT models also struggle with words they don’t know, which tend to be rare words or proper nouns.

Despite its flaws, NMT represents a huge improvement in MT quality, and the flaws it does have happen to present opportunities.

Translators and Machine Translation: Together at Last

While improvements to MT typically mean increases in its usual applications (i.e. post-editing, automatic translation), the real winner with NMT is translators. This is particularly true when a translator is able to use it in real time as they translate, as opposed to post-editing MT output. When the translator actively works with an NMT engine to create a translation, the two build on and learn from each other, the engine offering up a translation the human may not have considered, and the human serving as a moderator and, in so doing, a teacher of the engine.

For example, during the translation process, when the translator corrects the beginning of a sentence, it improves the system’s chances of getting the rest of the translation right. Often all it takes is a nudge at the beginning of a sentence to fix the rest, and the snowball of mistakes unravels.

Meanwhile, NMT’s characteristic improvements in grammar and coherence mean that when it reaches a correct translation, the translator spends less time fixing grammar, beating raw MT output and skipping post-editing altogether. When they have the opportunity to work together, translators and their NMT engines quite literally finish each other’s sentences. Besides speeding up the process, and here I’m speaking as a translator, it’s honestly a rewarding experience.

Where Do We Go Now?

Predicting the future is always a risky business, but provided the quality and accessibility of NMT continues to improve, it will gradually come to be an indispensable part of a translator’s toolbox, just as CAT tools and translation memory already have.

A lot of current research has to do with getting better data, and with building systems that need less data. Both of these areas will continue to improve MT quality and accelerate its usefulness to translators. Hopefully this usefulness will also reach more languages, especially ones with less data available for training. Once that happens, translators in those languages could get through more and more text, gradually improving the availability of quality text both for the public and for further MT training, in turn allowing those translators, having already built the groundwork, to move on to bigger challenges.

When done right, NMT has the potential to not just improve translators’ jobs, but to move the entire translation industry closer to its goal of being humanity’s Babel Fish. Not found in an app, or in an earbud, but in networks of people.

 

Reference: https://bit.ly/2CewZNs

Four Ways the Translator Role Will Change

Translators have been the staple of the translation business for decades. Linguistics, multilingual communication, and quality of language—this is their domain. They are grammarians and often self-admitted language nerds.

Translators are usually bilingual linguists who live in the country where their native language is spoken—it’s the best way for them to stay connected with linguistic and cultural changes. Their responsibility traditionally has been to faithfully render a source text into a specific target language.

But now, with the advent of more complicated content types such as highly branded content (like mobile apps or PPC ads), and with much higher volumes of content like User Generated Content (think customer reviews), all bets are off. The role of the translator has to evolve: translators now have to offer the right solution to new globalization problems…or risk being left behind.

This reality isn’t just relevant for translators: localization project managers need to know what new qualifications to look for as they try to match resources to their content. Four new specialist roles have evolved out of changing global content needs, allowing translators and linguists to expand their offerings and learn new skills.

Transcreators

In transcreation, a highly specialized linguist recreates the source content so that it’s appropriate for the target locale. The key term here is ‘recreates’, which means to re-invent or build again. The goal is to create content that inspires the same emotions in the target language as the source content does in the home market.

Typically, the process of transcreation applies to taglines, product names, slogans, and advertisement copy: anything highly branded.

The linguists performing this service are highly creative translators: senior, experienced professionals with lots of marketing content translation experience. They also might have agency expertise.

Many transcreators begin their professional career as translators, and as they gain proficiency in marketing content, they become adept at the re-creation process. If they are creative types, this expertise can lead them right into the specialization of transcreation.

Content creators or copywriters

In-country copywriters create materials from scratch for a target market—a highly creative process. There’s no actual translation here. While the resource is often an ad agency professional with copywriting experience, they may also be a translator—or have been one in the past. (It’s not uncommon for a translator with creative content experience to move into copywriting.) Like translators, these professionals must be in-country in order to represent the latest trends in that market.

Cultural consultants

These folks, who also reside in the target country, provide guidance to a client on the motivations and behaviors of target buyers. They are researchers and representatives of their culture. They also may be experts in local paid media, search marketing, social media, influencer marketing, CRO, and UX.

Whatever their areas of expertise, these in-country experts could, for example, plan and manage an international digital campaign, conduct focus groups to determine user preferences, or do demographic research to help an enterprise understand or identify their target client. Bilingual, in-country translators already have—or can learn—the skills required to become a cultural consultant.

Post-editors

It’s not uncommon for an enterprise with a maturing localization program to deploy Machine Translation (MT). And most MT programs involve some level of post-editing: the process by which a linguist edits the machine’s output to a level of quality agreed upon between the client and vendor.

Post-editing needs a different skillset than translation: instead of converting source text to target text faithfully, a post-editor has to understand how an MT engine operates and what errors might be typical, and then fix all issues required to meet the requested quality bar. It’s one part translator, one part linguistic reviewer, and one part Machine Translation specialist. Translators with good critical thinking skills can train to do this work.

The needs of global businesses give translators an opportunity to stretch and grow into a variety of other industry positions that make use of their unique skillset and cultural expertise. Will translators of the future do any actual translation? Only time will tell. In the meantime, these newer linguistic services are growing in demand, and thus, so will the need for talent.

Reference: https://bit.ly/2P1hIRJ

A Beginner’s Guide to Machine Translation

What is Machine Translation?

Machine translation (MT) is automated translation by computer software. MT can be used to translate entire texts without any human input, or alongside human translators. The concept of MT started gaining traction in the early 1950s, and it has come a long way since. Many used to consider MT an inadequate alternative to human translators, but as the technology has advanced, more and more companies are turning to MT to aid human translators and optimize the localization process.

How Does Machine Translation Work?

Well, that depends on the type of machine translation engine. There are several different kinds of MT software which work in different ways. We will introduce rule-based, statistical, and neural MT.

Rule-based machine translation (RBMT) is the forefather of MT software. It is based on sets of grammatical and syntactical rules and the phraseology of a language. RBMT links the structure of the source segment to the target segment, producing a result based on analysis of the rules of the source and target languages. The rules are developed by linguists, and users can add terminology to override the MT and improve the translation quality.
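
To give a toy flavour of the approach (my sketch; real RBMT systems encode thousands of such rules), here is a hand-written lexicon plus a single reordering rule:

    # A hand-written lexicon plus one explicit reordering rule: in Spanish,
    # the adjective typically follows the noun.
    LEXICON = {"the": "el", "red": "rojo", "car": "coche"}

    def translate_np(words):
        """Translate a 'determiner adjective noun' phrase with reordering."""
        det, adj, noun = (LEXICON[w] for w in words)
        return f"{det} {noun} {adj}"

    print(translate_np(["the", "red", "car"]))  # -> "el coche rojo"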

Statistical MT (SMT) started in the age of big data and uses large amounts of existing translated texts, along with statistical models and algorithms, to generate translations. This system relies heavily on available multilingual corpora, and an average of two million words is needed to train the engine for a specific domain – which can be time- and resource-intensive. When using domain-specific data, SMT can produce good quality translations, especially in the technical, medical, and financial fields.
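
In spirit, SMT boils down to choosing the highest-probability target phrase from a phrase table, as in this deliberately simplified sketch with invented probabilities:

    # Choose the most probable target phrase from a 'phrase table' that a
    # real SMT system would learn from bilingual text. Numbers are invented.
    phrase_table = {
        "bank": [("banco", 0.7), ("orilla", 0.3)],
        "interest rate": [("tasa de interés", 0.9), ("interés", 0.1)],
    }

    def best_translation(phrase):
        return max(phrase_table[phrase], key=lambda t: t[1])[0]

    print(best_translation("interest rate"))  # -> "tasa de interés"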

Neural MT (NMT) is a new approach built on deep neural networks. There are a variety of network architectures used in NMT, but typically the network can be divided into two components: an encoder which reads the input sentence and generates a representation suitable for translation, and a decoder which generates the actual translation. Words and even whole sentences are represented as vectors of real numbers in NMT. Compared to the previous generation of MT, NMT generates outputs which tend to be more fluent and grammatically accurate. Overall, NMT is a major step forward in MT quality. However, NMT may lag slightly behind previous approaches when it comes to translating rare words and terminology. Long and/or complex sentences are still an issue even for state-of-the-art NMT systems.
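
To illustrate the idea of words as vectors (with made-up numbers; real NMT embeddings have hundreds of learned dimensions), related words end up close together in the vector space:

    import numpy as np

    # Invented 3-dimensional "embeddings"; real systems learn hundreds of
    # dimensions. Related words score high on cosine similarity.
    vec = {
        "house": np.array([0.9, 0.1, 0.3]),
        "home":  np.array([0.8, 0.2, 0.3]),
        "train": np.array([0.1, 0.9, 0.7]),
    }

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine(vec["house"], vec["home"]))   # high: related words
    print(cosine(vec["house"], vec["train"]))  # lower: unrelated words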

The Pros and Cons of Machine Translation

So now you have a brief understanding of MT – but what does it mean for your translation workflow? How does it benefit you?

  • MT is incredibly fast and can translate thousands of words per minute.
  • It can translate into multiple languages at once which drastically reduces the amount of manpower needed.
  • Implementing MT into your localization process can do the heavy lifting for translators and free up their valuable time, allowing them to focus on the more intricate aspects of translation.
  • MT technology is developing rapidly, and is constantly advancing towards producing higher quality translations and reducing the need for post-editing.

There are many advantages to using MT, but we can’t ignore the disadvantages. MT does not always produce perfect translations. Unlike human translators, computers can’t understand context and culture, so MT can’t be used to translate anything and everything. Sometimes MT alone is suitable, at other times a combination of MT and human translation is best, and sometimes it is not suitable at all. MT is not a one-size-fits-all translation solution.

When Should You Use Machine Translation?

When translating creative or literary content, MT is not a suitable choice. This can also be the case when translating culturally specific texts. A good rule of thumb is: the more complex your content is, the less suitable it is for MT.

For large volumes of content, especially if it has a short turnaround time, MT is very effective. If accuracy is not vital, MT can produce suitable translations at a fraction of the cost. Customer reviews, news monitoring, internal documents, and product descriptions are all good candidates.

Using a combination of MT along with a human translator post-editor opens the doors to a wider variety of suitable content.

Which MT Engine Should You Use?

Not all MT engines are created equal, but there is no single MT engine for a specific kind of content. Publicly available MT engines are designed to translate most types of content; with custom MT engines, however, the training data can be tailored to a specific domain or content type.

Ultimately, choosing an MT engine is a process. You need to decide what kind of content you wish to translate, review security and privacy policies, run tests on text samples, choose post-editors, and weigh several other considerations. The key is to do your research before making a decision. And if you are using a translation management system (TMS), be sure it can support your chosen MT engine.

Using Machine Translation and a Translation Management System

You can use MT on its own, but to get the maximum benefits we suggest integrating it with a TMS. With these technologies integrated, you will be able to leverage additional tools such as translation memories, term bases, and project management features to help streamline and optimize your localization strategy. You will have greater control over your translations, and be able to analyze the effectiveness of your MT engine.

Reference: http://bit.ly/2P85d7P

Court Rules That Free MT Isn’t Enough for Legal Scenarios

In recent months, we have increasingly heard from enterprise localization groups that their executives are pushing for the adoption of neural machine translation (NMT), driven largely by a very successful public relations campaign from Google that has touted the very real improvements in NMT over the past two years. Unfortunately, some business leaders have seen media coverage and concluded that they no longer need language professionals and can simply replace translators with the “magic” of AI.

Given the way many people have come to treat Google Translate and its competitors as authorities on all matters linguistic, it was really only a matter of time before free, online MT played a role in a court case. Recently, an English-speaking police officer in Kansas City used Google Translate to converse with a Spanish-speaking individual and obtain consent to search his car. In the course of the officer’s search he discovered a large quantity of illegal narcotics. It seemed an open-and-shut case: he had permission to search the vehicle and found the drugs.

But a judge threw out the case: Google Translate rendered the officer’s “Can I search the car?” in Spanish as “¿Puedo buscar el auto?,” which is more along the lines of “Can I look for the car?” The defendant successfully argued that he gave permission only for the officer to look for the car, not look in it. The court ruled that the Google Translate output was not sufficient for consent and tossed the case.

Although legal experts argue that this particular case is unlikely to change things much – police can take additional steps to clarify consent – it points to the danger of relying on MT uncritically and should serve as a caution against MT boosterism. It won’t slow down the adoption of MT – the economic requirements it fulfills are too compelling – but cases like these should provide a wake-up call against naïve adoption in cases where accuracy matters. NMT may be great when you are willing to ask questions and clarify responses, but you cannot rely upon it in cases where the results can affect life, liberty, or liability… or your bottom line.

The lesson here is not that MT is bad. After all, humans can make similar mistakes. Consider the case of Willie Ramirez, which resulted in a US$71 million judgment against a hospital, centered around a misunderstanding of the Spanish word “intoxicado” – which means “poisoned” rather than “intoxicated” – that left a young baseball star with permanent disability.

The difference is that humans respond to context and can take steps to clarify, while MT by itself does not. It provides a best machine guess at a translation, but takes no responsibility when things go wrong. Google specifically states that it does not provide any sort of warranty that its services will be accurate or usable, and indeed the company could not do so given the way its technology functions. By contrast, a human interpreter who would be liable for getting something wrong will have a strong incentive to make sure that the details are correct. An expert linguist will know what matters in a given context and ensure that the communication reflects it. MT doesn’t care.

Contrary to fears that MT will replace human translators, CSA Research’s examination of the issue shows that MT can augment human translators, making them more efficient and better able to focus on the important details.

Our research shows that MT accelerates the growth of LSPs that adopt it. LSPs and enterprises alike need to understand the technology, how to work with it, where it applies, and how best to deploy it. Translation buyers need a realistic assessment of what it can and cannot do for them and should work closely with providers to achieve their goals. Like any technology, MT is a tool, and tools used incorrectly can harm their users and those around them, but when applied properly, technology tools deliver real benefits. Just don’t expect NMT to provide you with legal or medical advice, and always involve professional linguists when accuracy and message matter.

Reference: http://bit.ly/2mK8ywQ

Creative Destruction in the Localization Industry

Excerpts from an article with the same title, written by Ameesh Randeri in Multilingual Magazine. Ameesh Randeri is part of the localization solutions department at Autodesk and manages the vendor and linguistic quality management functions. He has over 12 years of experience in the localization industry, having worked on both the buyer and seller sides.

The concept of creative destruction was derived from the works of Karl Marx by economist Joseph Schumpeter. Schumpeter elaborated on the concept in his 1942 book Capitalism, Socialism, and Democracy, where he described creative destruction as the “process of industrial mutation that incessantly revolutionizes the economic structure from within, incessantly destroying the old one, incessantly creating a new one.”

What began as a concept of economics started being used broadly across the spectrum to describe breakthrough innovation that requires invention and ingenuity — as well as breaking apart or destroying the previous order. To look for examples of creative destruction, just look around you. Artificial intelligence, machine learning and automation are creating massive efficiency gains and productivity increases, but they are also causing millions to lose jobs. Uber and other ride hailing apps worldwide are revolutionizing transport, but many traditional taxi companies are suffering.

The process of creative destruction and innovation is accelerating over time. To understand this, we can look at the Schumpeterian (Kondratieff) waves of technological innovation. We are currently in the fifth wave of innovation, ushered in by digital networks, the software industry and new media.

The effects of the digital revolution can be felt across the spectrum. The localization industry is no exception and is undergoing fast-paced digital disruption. There is a confluence of technology in localization tools and processes that is ushering in major changes.

The localization industry: Drawing parallels from the Industrial Revolution

All of us are familiar with the Industrial Revolution. It commenced in the second half of the 18th century and went on until the mid-19th century. As a result of the Industrial Revolution, we witnessed a transition from hand production methods to machine-based methods and factories that facilitated mass production. It ushered in innovation and urbanization. It was creative destruction at its best. Looking back at the Industrial Revolution, we see that there were inflection points, following which there were massive surges and changes in the industry.

Translation has historically been a human and manual task. A translator looks at the source text and translates it while keeping in mind grammar, style, terminology and several other factors. The translation throughput is limited by a human’s productivity, which severely limits the volume of translation and the time required. In 1764, James Hargreaves invented the spinning jenny, a machine that enabled an individual to produce multiple spools of thread simultaneously. Inventor Samuel Crompton innovated further and came up with the spinning mule, further improving the process. Next was the mechanization of cloth weaving through the power loom, invented by Edmund Cartwright. These innovators and their inventions completely transformed the textile industry.

For the localization industry, a similar innovation is machine translation (MT). Though research into MT had been going on for many years, it went mainstream post-2005. Rule-based and statistical MT engines were created, which resulted in drastic productivity increases. However, the quality was nowhere near what a human could produce, and hence MT engines became a supplemental technology, aiding humans and helping them increase productivity.

There was a 30%-60% productivity gain depending on the language and engine used. There was fear that translators’ roles would diminish. But rather than diminish, their role evolved into post-editing.

The real breakthrough came in 2016, when Google and Microsoft went public with their neural machine translation (NMT) engines. The quality produced by NMT is not yet flawless, but it seems to be very close to human translation. It can also reproduce some of the finer nuances of writing style and creativity that were lacking in the rule-based and statistical machine translation engines. NMT is a big step forward in reducing the human footprint in the translation process. It is without a doubt an inflection point, and while not perfect yet, it has the same disruptive potential as the spinning jenny and the power loom: sharp productivity increases, lower prices and, since a machine is behind it, endless volumes that can be managed. And hence it renews concerns about whether translators will be needed. It is to the translation industry what the spinning jenny was to textiles, where several manual workers were replaced by machines.

What history teaches us though is that although there is a loss of jobs based on the existing task or technology, there are newer ones created to support the newer task or technology.

In the steel industry, two inventors charted a new course: Abraham Darby, who created a cheaper, easier method to produce cast iron using a coke-fueled furnace, and Henry Bessemer, who invented the Bessemer process, the first inexpensive process for mass-producing steel. The Bessemer process revolutionized steel manufacturing by decreasing its cost from £40 per long ton to £6–7 per long ton. Besides the reduction in cost, there were major increases in speed, and the need for labor decreased sharply.

The localization industry is seeing the creation of its own Bessemer process, called continuous localization. Simply explained, it is a fully-connected and automated process where the content creators and developers create source material that is passed for translation in continuous, small chunks. The translated content is continually merged back, facilitating continuous deployment and release. It is an extension of the agile approach and it can be demonstrated with the example of mobile applications where latest updates are continually pushed through to our phones in multiple languages. To facilitate continuous localization, vendor platforms or computer-assisted translation (CAT) tools need to be able to connect to client systems or clients need to provide CAT tool-like interfaces for vendors and their resources to use. The process would flow seamlessly from the developer or content creator creating content to the post-editor doing edits to the machine translated content. The Bessemer process in the steel industry paved the way for large-scale continuous and efficient steel production. Similarly, continuous localization has the potential to pave the way for large-scale continuous and efficient localization enabling companies to localize more, into more languages at lower prices.
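
A hedged sketch of that chunked flow, with all identifiers invented and the real connectors to CAT/TMS systems left out, might look like this:

    # Detect new or changed source strings and send only those chunks out
    # for translation, then merge them back: the continuous-localization
    # loop in miniature. Catalog contents and identifiers are invented.
    def changed_segments(old_catalog, new_catalog):
        """Yield ids and text of segments that are new or have changed."""
        for seg_id, text in new_catalog.items():
            if old_catalog.get(seg_id) != text:
                yield seg_id, text

    old = {"app.save": "Save", "app.quit": "Quit"}
    new = {"app.save": "Save changes", "app.quit": "Quit", "app.open": "Open"}

    for seg_id, text in changed_segments(old, new):
        print(f"send '{text}' ({seg_id}) for MT and post-editing, then merge back")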

There were many other disruptive technologies and processes that led to the Industrial Revolution. For the localization industry as well, there are several other tools and process improvements in play.

Audiovisual localization and interpretation: This is a theme that began evolving in recent years. Players like Microsoft (Skype) and Google have made improvements in the text-to-speech and speech-to-text arena. Text-to-speech has become more human-like, though it isn’t there yet. Speech-to-text has improved significantly as well, with output quality going up and errors decreasing. Interpretation is the other area where we see automated solutions springing up. Google’s new headphones are one example of automated interpretation solutions.

Automated terminology extraction: This is one that hasn’t garnered as much attention and focus. While there is consensus that terminology is an important aspect of localization quality, it always seems to be relegated to a lower tier from a technological advancement standpoint. There are several interesting commercial as well as open source solutions that have greatly improved terminology extraction and reduced the false positives. This area could potentially be served by artificial intelligence and machine learning solutions in the future.

Automated quality assurance (QA) checks: QA checks can be categorized into two main areas – functional and linguistic. In terms of functional QA, automations have been around for several years and have vastly improved over time. There is already exploration into applying machine learning and artificial intelligence to functional automations to predict bugs, to create scripts that are self-healing and so on. Linguistic QA, on the other hand, has seen some automation, primarily in the areas of spelling and terminology checks. However, the automation is limited in what it can achieve and does not replace the need for human checks or audits. This is an area that could benefit hugely from artificial intelligence and machine learning.
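
As a small example of what an automated linguistic check can and cannot do, here is a terminology-consistency sketch with invented term pairs:

    # Flag source terms whose approved translation is missing from the
    # target segment. This catches terminology drift, but it cannot judge
    # whether the sentence reads well -- that still needs a human audit.
    TERM_BASE = {"dashboard": "panel de control", "file": "archivo"}

    def check_terminology(source, target):
        return [s for s, t in TERM_BASE.items()
                if s in source.lower() and t not in target.lower()]

    print(check_terminology("Open the Dashboard", "Abra el tablero"))
    # -> ['dashboard']: the approved term 'panel de control' was not used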

Local language support using chatbots: Chatbots are fast becoming the first level of customer support for most companies. Most chatbots are still in English. However, we are starting to see chatbots in local languages, powered by machine translation engines in the background, thus enabling local-language support for international customers.

Data (big or small): While data is not a tool, technology or process by itself, it is important to call it out. Data is central to a lot of the technologies and processes mentioned above. Without a good corpus, there is no machine translation. For automated terminology extraction and automated QA checks, the challenge is to have a big enough corpus of data making it possible to train the machine. In addition, metadata becomes critical. Today metadata is important to provide translators with additional contextual information, to ensure higher quality output. In future, metadata will provide the same information to machines – to a machine translation system, to an automated QA check and so on. This highlights the importance of data!

The evolution in localization is nothing but the forces of creative destruction at work. Each new process or technology destroys an old way of operating and creates a new way forward. It also means that old jobs are being made redundant while new ones are being created.

How far is this future? Well, the entire process is extremely resource and technology intensive. Many companies will require a lot of time to adopt these practices. This provides the perfect opportunity for sellers to spruce up their offering and provide an automated digital localization solution. Companies with access to abundant resources or funding should be able to achieve this sooner. This is also why a pan-industry open source platform may accelerate this transformation.

Nimdzi Language Technology Atlas

For this first version, Nimdzi has mapped over 400 different tools, and the list is growing quickly. The Atlas consists of an infographic accompanied by a curated spreadsheet with software listings for various translation and interpreting needs.

As the language industry becomes more technical and complex, there is a growing need for easy-to-understand materials explaining available tech options. The Nimdzi Language Technology Atlas provides a useful view into the relevant technologies available today.

Software users can quickly find alternatives for their current tools and evaluate market saturation in each segment at a glance. Software developers can identify competition and find opportunities in the market with underserved areas.

Reference: https://bit.ly/2ticEyT