
Nimdzi Language Technology Atlas

For this first version, Nimdzi has mapped over 400 different tools, and the list is growing quickly. The Atlas consists of an infographic accompanied by a curated spreadsheet with software listings for various translation and interpreting needs.

As the language industry becomes more technical and complex, there is a growing need for easy-to-understand materials explaining available tech options. The Nimdzi Language Technology Atlas provides a useful view into the relevant technologies available today.

Software users can quickly find alternatives for their current tools and evaluate market saturation in each segment at a glance. Software developers can identify competition and find opportunities in the market with underserved areas.

Reference: https://bit.ly/2ticEyT

Six takeaways from LocWorld 37 in Warsaw

Over the past weekend, Warsaw welcomed Localization World 37, which gathered over 380 language industry professionals. Here is what Nimdzi learned from conversations at this premier industry conference.

1. A boom in data processing services

A new market has formed around preparing data to train machine learning algorithms. Between Lionbridge, Pactera, appen, and Welocalize – the leading LSPs that have staked a claim in this sector – revenue from these services already exceeds USD 100 million.

Pactera calls it “AI Enablement Services”, Lionbridge and Welocalize have labelled it “Global services for Machine Intelligence”, and appen prefers the title “data for machine learning enhanced by human touch.” What these companies really do is a variety of human tasks at scale:

  • Audio transcription
  • Proofreading
  • Annotation
  • Dialogue management

Humans help to train voice assistants and chatbots, image-recognition programs, and whatever else the Silicon Valley disruptors decide to unleash upon the world. One prime example comes from the beginning of this year, when Lionbridge recorded thousands of children pronouncing scripted phrases for a child-voice recognition engine.

Machine learning and AI form the second biggest area for venture investment, according to dealroom.co. According to the International Data Corporation’s (IDC) forecast, spending in this area is likely to grow from USD 12 billion in 2017 to USD 57.6 billion within the next five years. Companies will need lots of accurate data to train their AI, hence there is a significant business opportunity in data cleaning. Compared to crowdsourcing platforms like Clickworker and Figure Eight, LSPs have broader human resource management competence and can compete for a large slice of the market.

2. LSP AI: Separating fact from fantasy

Artificial intelligence was high on the agenda at #LocWorld37, but apart from the advances in machine translation, nothing radically new was presented. If any LSPs use machine learning for linguist selection, ad-hoc workflow building, or predictive quality analytics, they didn’t show it.

On the other hand, everyone is chiming in on the new buzzword. In a virtual show of hands at the AI panel discussion, an overwhelming proportion of LSP representatives voted that they already use some AI in their business. That is pure exaggeration, to put it mildly.

3. Introducing Game Global

LocWorld’s Game Localization Roundtable expanded this year into a fully fledged sister conference – the Game Global Forum. The two-day event gathered just over 100 people, including teams from King.com, Electronic Arts, Square Enix, Ubisoft, Wooga, Zenimax / Bethesda, Sony, SEGA, Bluehole and other gaming companies.

We spoke to participants on the buying side who believe the content to be very relevant, and vendors were happy with pricing – for roughly EUR 500, they were able to network with the world’s leading game localization buyers. This is much more affordable than the EUR 3,300+ price tag for the rival IQPC Game QA and Localization Conference.

Given the success of Game Global and the continued operation of the Brand2Global event, it’s fair to assume there is room for more industry-specific localization conferences.

4. TMS-buying rampage

Virtually every client company we spoke to at LocWorld was looking for a new translation management system. Some were looking for their first solution, while others were migrating from heavy systems to more lightweight cloud-based solutions. Language technology companies have picked up on this trend, bringing a record number of salespeople and unveiling new offerings.

Every buyer talked about the need for integration as well as end-to-end automation, and shared the “unless there is an integration, I won’t buy” sentiment. Both TMS providers and custom development companies such as Spartan Software are fully booked and churning out new connectors through the end of 2018.

5. Translation tech and LSPs gear up for media localization

Entrepreneurs following the news have noticed that all four of the year’s fastest organically growing companies are in the business of media localization. Their success made ripples that reached the general language services crowd. LSP voiceover and subtitling studios are overloaded, and conventional CAT tools will roll out media localization capabilities this year. memoQ will add a subtitle editor with video preview, and GlobalLink plans to release an even bigger set of features.

These features will make it easier for traditional LSPs to hop on the media localization train, which has already left the station. However, LSP systems won’t compare to specialized software packages tailored to dubbing workflows, detecting and labeling the individual characters who speak in videos, tagging images with metadata, and the like.

Reference: https://bit.ly/2JZpkSM

A Gentle Introduction to Neural Machine Translation

One of the earliest goals for computers was the automatic translation of text from one language to another.

Automatic or machine translation is perhaps one of the most challenging artificial intelligence tasks given the fluidity of human language. Classically, rule-based systems were used for this task, which were replaced in the 1990s with statistical methods. More recently, deep neural network models achieve state-of-the-art results in a field that is aptly named neural machine translation.

In this post, you will discover the challenge of machine translation and the effectiveness of neural machine translation models.

After reading this post, you will know:

  • Machine translation is challenging given the inherent ambiguity and flexibility of human language.
  • Statistical machine translation replaces classical rule-based systems with models that learn to translate from examples.
  • Neural machine translation models fit a single model rather than a pipeline of fine-tuned models and currently achieve state-of-the-art results.

Let’s get started.

What is Machine Translation?

Machine translation is the task of automatically converting source text in one language to text in another language.

In a machine translation task, the input already consists of a sequence of symbols in some language, and the computer program must convert this into a sequence of symbols in another language.

— Page 98, Deep Learning, 2016.

Given a sequence of text in a source language, there is no one single best translation of that text to another language. This is because of the natural ambiguity and flexibility of human language. This makes the challenge of automatic machine translation difficult, perhaps one of the most difficult in artificial intelligence:

The fact is that accurate translation requires background knowledge in order to resolve ambiguity and establish the content of the sentence.

— Page 21, Artificial Intelligence, A Modern Approach, 3rd Edition, 2009.

Classical machine translation methods often involve rules for converting text in the source language to the target language. The rules are often developed by linguists and may operate at the lexical, syntactic, or semantic level. This focus on rules gives the name to this area of study: Rule-based Machine Translation, or RBMT.

RBMT is characterized with the explicit use and manual creation of linguistically informed rules and representations.

— Page 133, Handbook of Natural Language Processing and Machine Translation, 2011.

The key limitations of the classical machine translation approaches are both the expertise required to develop the rules, and the vast number of rules and exceptions required.
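To make the rule-based idea concrete, here is a deliberately tiny, hypothetical sketch of lexical transfer plus one hand-written reordering rule. The lexicon and the single adjective-noun rule are invented for illustration; a real RBMT system relies on far richer linguistic analysis.

```python
# Toy rule-based (lexical transfer) translation: English -> Spanish.
# The lexicon and the single reordering rule are invented for illustration only.

LEXICON = {
    "the": "el",
    "red": "rojo",
    "car": "coche",
    "drives": "conduce",
}

def translate_rbmt(sentence: str) -> str:
    tokens = sentence.lower().split()

    # Syntactic rule: English adjective-noun order becomes noun-adjective in Spanish.
    reordered, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == "red" and tokens[i + 1] == "car":
            reordered.extend([tokens[i + 1], tokens[i]])
            i += 2
        else:
            reordered.append(tokens[i])
            i += 1

    # Lexical rule: word-for-word dictionary substitution (unknown words pass through).
    return " ".join(LEXICON.get(tok, tok) for tok in reordered)

print(translate_rbmt("The red car drives"))  # el coche rojo conduce
```

Even this toy example hints at why the rule inventory explodes: every word sense and every construction needs its own hand-written entry.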

What is Statistical Machine Translation?

Statistical machine translation, or SMT for short, is the use of statistical models that learn to translate text from a source language to a target language given a large corpus of examples.

This task of using a statistical model can be stated formally as follows:

Given a sentence T in the target language, we seek the sentence S from which the translator produced T. We know that our chance of error is minimized by choosing that sentence S that is most probable given T. Thus, we wish to choose S so as to maximize Pr(S|T).

— A Statistical Approach to Machine Translation, 1990.

This formal specification makes the maximizing of the probability of the output sequence given the input sequence of text explicit. It also makes the notion of there being a suite of candidate translations explicit and the need for a search process or decoder to select the one most likely translation from the model’s output probability distribution.
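In symbols, applying Bayes’ rule to the quoted objective gives the classic noisy-channel decomposition (a standard rearrangement of Pr(S|T), nothing beyond what the 1990 formulation states):

```latex
\hat{S} \;=\; \arg\max_{S} \Pr(S \mid T)
        \;=\; \arg\max_{S} \frac{\Pr(T \mid S)\,\Pr(S)}{\Pr(T)}
        \;=\; \arg\max_{S} \Pr(T \mid S)\,\Pr(S)
```

Here Pr(S) plays the role of a language model over candidate sentences and Pr(T | S) the role of a translation model; the decoder is the search procedure that hunts for the maximizing candidate.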

Given a text in the source language, what is the most probable translation in the target language? […] how should one construct a statistical model that assigns high probabilities to “good” translations and low probabilities to “bad” translations?

— Page xiii, Syntax-based Statistical Machine Translation, 2017.

The approach is data-driven, requiring only a corpus of examples with both source and target language text. This means linguists are no longer required to specify the rules of translation.

This approach does not need a complex ontology of interlingua concepts, nor does it need handcrafted grammars of the source and target languages, nor a hand-labeled treebank. All it needs is data—sample translations from which a translation model can be learned.

— Page 909, Artificial Intelligence, A Modern Approach, 3rd Edition, 2009.

Quickly, the statistical approach to machine translation outperformed the classical rule-based methods to become the de-facto standard set of techniques.

Since the inception of the field at the end of the 1980s, the most popular models for statistical machine translation […] have been sequence-based. In these models, the basic units of translation are words or sequences of words […] These kinds of models are simple and effective, and they work well for many language pairs

— Syntax-based Statistical Machine Translation, 2017.

The most widely used techniques were phrase-based and focused on translating sub-sequences of the source text piecewise.

Statistical Machine Translation (SMT) has been the dominant translation paradigm for decades. Practical implementations of SMT are generally phrase-based systems (PBMT) which translate sequences of words or phrases where the lengths may differ

— Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016.

Although effective, statistical machine translation methods suffered from a narrow focus on the phrases being translated, losing the broader nature of the target text. The hard focus on data-driven approaches also meant that methods may have ignored important syntax distinctions known by linguists. Finally, the statistical approaches required careful tuning of each module in the translation pipeline.
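As a purely illustrative sketch of that piecewise, phrase-by-phrase flavor, the snippet below greedily matches the longest known source phrase. The phrase table is invented, and a real PBMT system searches over many competing segmentations, reorderings, and language-model scores rather than taking a single greedy pass.

```python
# Toy phrase-based translation: greedily consume the longest matching source phrase.
# The phrase table is invented; real PBMT systems weigh many segmentations,
# reorderings, and a language model instead of this single greedy pass.

PHRASE_TABLE = {
    ("machine", "translation"): ["traduction", "automatique"],
    ("is",): ["est"],
    ("difficult",): ["difficile"],
}

def translate_pbmt(sentence: str) -> str:
    tokens = sentence.lower().split()
    output, i = [], 0
    while i < len(tokens):
        for span in range(len(tokens) - i, 0, -1):  # longest phrase first
            phrase = tuple(tokens[i:i + span])
            if phrase in PHRASE_TABLE:
                output.extend(PHRASE_TABLE[phrase])
                i += span
                break
        else:
            output.append(tokens[i])  # unknown word passes through untranslated
            i += 1
    return " ".join(output)

print(translate_pbmt("Machine translation is difficult"))
# traduction automatique est difficile
```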

What is Neural Machine Translation?

Neural machine translation, or NMT for short, is the use of neural network models to learn a statistical model for machine translation.

The key benefit of the approach is that a single system can be trained directly on source and target text, no longer requiring the pipeline of specialized systems used in statistical machine translation.

Unlike the traditional phrase-based translation system which consists of many small sub-components that are tuned separately, neural machine translation attempts to build and train a single, large neural network that reads a sentence and outputs a correct translation.

— Neural Machine Translation by Jointly Learning to Align and Translate, 2014.

As such, neural machine translation systems are said to be end-to-end systems as only one model is required for the translation.

The strength of NMT lies in its ability to learn directly, in an end-to-end fashion, the mapping from input text to associated output text.

— Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016.

Encoder-Decoder Model

Multilayer Perceptron neural network models can be used for machine translation, although the models are limited by a fixed-length input sequence where the output must be the same length.

These early models have been greatly improved upon recently through the use of recurrent neural networks organized into an encoder-decoder architecture that allow for variable length input and output sequences.

An encoder neural network reads and encodes a source sentence into a fixed-length vector. A decoder then outputs a translation from the encoded vector. The whole encoder–decoder system, which consists of the encoder and the decoder for a language pair, is jointly trained to maximize the probability of a correct translation given a source sentence.

— Neural Machine Translation by Jointly Learning to Align and Translate, 2014.

Key to the encoder-decoder architecture is the ability of the model to encode the source text into an internal fixed-length representation called the context vector. Interestingly, once encoded, different decoding systems could be used, in principle, to translate the context into different languages.

… one model first reads the input sequence and emits a data structure that summarizes the input sequence. We call this summary the “context” C. […] A second model, usually an RNN, then reads the context C and generates a sentence in the target language.

— Page 461, Deep Learning, 2016.
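A minimal sketch of this encoder-decoder idea in PyTorch is shown below. The vocabulary sizes, dimensions, and random token IDs are placeholders; a real NMT system adds batching with padding masks, teacher forcing during training, attention, and beam-search decoding.

```python
# Minimal encoder-decoder (seq2seq) sketch in PyTorch. All sizes are placeholders.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src_ids):
        outputs, hidden = self.rnn(self.embed(src_ids))
        return outputs, hidden  # `hidden` is the fixed-length context vector

class Decoder(nn.Module):
    def __init__(self, tgt_vocab, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, tgt_ids, context):
        outputs, _ = self.rnn(self.embed(tgt_ids), context)
        return self.out(outputs)  # scores over the target vocabulary per position

# Toy usage: one source sentence of 5 token IDs, 4 target tokens generated so far.
enc, dec = Encoder(src_vocab=1000), Decoder(tgt_vocab=1000)
src = torch.randint(0, 1000, (1, 5))
tgt = torch.randint(0, 1000, (1, 4))
_, context = enc(src)
logits = dec(tgt, context)
print(logits.shape)  # torch.Size([1, 4, 1000])
```

Note that everything the decoder knows about the source sentence has to squeeze through `context`, which is exactly the bottleneck the next section addresses.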

Encoder-Decoders with Attention

Although effective, the Encoder-Decoder architecture has problems with long sequences of text to be translated.

The problem stems from the fixed-length internal representation that must be used to decode each word in the output sequence.

The solution is the use of an attention mechanism that allows the model to learn where to place attention on the input sequence as each word of the output sequence is decoded.

Using a fixed-sized representation to capture all the semantic details of a very long sentence […] is very difficult. […] A more efficient approach, however, is to read the whole sentence or paragraph […], then to produce the translated words one at a time, each time focusing on a different part of the input sentence to gather the semantic details required to produce the next output word.

— Page 462, Deep Learning, 2016.
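A minimal sketch of that idea using simple dot-product attention is below. The tensor shapes are arbitrary placeholders, and published systems such as the one in the 2014 paper quoted above learn a small scoring network rather than using a raw dot product, but the weighted-sum mechanics are the same.

```python
# Dot-product attention sketch (illustrative shapes only): instead of one fixed
# context vector, a fresh context is computed for every decoder step as a
# weighted sum of all encoder states.
import torch
import torch.nn.functional as F

encoder_states = torch.randn(1, 7, 256)  # (batch, source length, hidden size)
decoder_state = torch.randn(1, 1, 256)   # current decoder hidden state

# Score every source position against the decoder state, normalize to weights.
scores = torch.bmm(decoder_state, encoder_states.transpose(1, 2))  # (1, 1, 7)
weights = F.softmax(scores, dim=-1)

# Context for this output word: attention-weighted sum of the encoder states.
context = torch.bmm(weights, encoder_states)  # (1, 1, 256)
print(weights.squeeze(), context.shape)
```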

The encoder-decoder recurrent neural network architecture with attention is currently the state-of-the-art on some benchmark problems for machine translation. This architecture is used at the heart of the Google Neural Machine Translation system, or GNMT, which powers the Google Translate service.

… current state-of-the-art machine translation systems are powered by models that employ attention.

— Page 209, Neural Network Methods in Natural Language Processing, 2017.

Although effective, neural machine translation systems still suffer from some issues, such as scaling to larger vocabularies of words and the slow speed of training the models. These are the current areas of focus for large production neural translation systems, such as the Google system.

Three inherent weaknesses of Neural Machine Translation […]: its slower training and inference speed, ineffectiveness in dealing with rare words, and sometimes failure to translate all words in the source sentence.

— Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016.

Reference: https://bit.ly/2Cx8zxI

How machine learning can be used to break down language barriers

Machine learning has transformed major aspects of the modern world with great success. Self-driving cars, intelligent virtual assistants on smartphones, and cybersecurity automation are all examples of how far the technology has come.

But of all the applications of machine learning, few have the potential to shape our economy as radically as language translation. Language translation is a natural problem for machine learning to tackle: language operates on a set of predictable rules, but with a degree of variation that makes it difficult for humans to interpret. Machine learning, on the other hand, can leverage repetition, pattern recognition, and vast databases to translate faster than humans can.

There are other compelling reasons that indicate language will be one of the most important applications of machine learning. To begin with, there are over 6,500 spoken languages in the world, and many of the more obscure ones are spoken by poorer demographics who are frequently isolated from the global economy. Removing language barriers through technology connects more communities to global marketplaces. More people speak Mandarin Chinese than any other language in the world, making China’s growing middle class a prime market for U.S. companies if they can overcome the language barrier.

Let’s take a look at how machine learning is currently being applied to the language barrier problem, and how it might develop in the future.

Neural machine translation

Recently, language translation took an enormous leap forward with the emergence of a new machine translation technology called Neural Machine Translation (NMT). The emphasis should be on the “neural” component because the inner workings of the technology really do mimic the human mind. The architects behind NMT will tell you that they frequently struggle to understand how it comes to certain translations because of how quickly and accurately it delivers them.

“NMT can do what other machine translation methods have not done before – it achieves translation of entire sentences without losing meaning,” says Denis A. Gachot, CEO of SYSTRAN, a language translation technologies company. “This technology is of a caliber that deserves the attention of everyone in the field. It can translate at near-human levels of accuracy and can translate massive volumes of information exponentially faster than we can operate.”

The comparison to human translators is not a stretch anymore. Unlike the days of garbled Google Translate results, which continue to feed late night comedy sketches, NMT is producing results that rival those of humans. In fact, Systran’s Pure Neural Machine Translation product was preferred over human translators 41% of the time in one test.

Martin Volk, a professor at the Institute of Computational Linguistics at the University of Zurich, had this to say about neural machine translation in a 2017 Slator article:

“I think that as computing power inevitably increases, and neural learning mechanisms improve, machine translation quality will gradually approach the quality of a professional human translator over the coming two decades. There will be a point where in commercial translation there will no longer be a need for a professional human translator.”

Gisting to fluency

One telling metric to watch is gisting vs. fluency. Are the translations being produced communicating the gist of an idea, or fluently communicating details?

Previous iterations of language translation technology only achieved the level of gisting. These translations required extensive human support to be usable. NMT successfully pushes beyond gisting and communicates fluently. Now, with little to no human support, usable translations can be processed at the same level of quality as those produced by humans. Sometimes, the NMT translations are even superior.

Quality and accuracy are the main priorities of any translation effort. Any basic translation software can quickly spit out its best rendition of a body of text. To parse information correctly and deliver a fluent translation requires a whole different set of competencies. Volk also said, “Speed is not the key. We want to drill down on how information from sentences preceding and following the one being translated can be used to improve the translation.”

This opens up enormous possibilities for global commerce. Massive volumes of information traverse the globe every second, and quite a bit of that data needs to be translated into two or more languages. That is why successfully automating translation is so critical. Tasks like e-discovery, compliance, or any other business processes that rely on document accuracy can be accelerated exponentially with NMT.

Education, e-commerce, travel, diplomacy, and even international security work can be radically changed by the ability to communicate in your native language with people from around the globe.

Post-language economy

Everywhere you look, language barriers are a speed check on global commerce. Whether that commerce involves government agencies approving business applications, customs checkpoints, massive document sharing, or e-commerce, fast and effective translation is essential.

If we look at language strictly as a means of sharing ideas and coordinating, it is somewhat inefficient. It is linear and has a lot of rules that make it difficult to use. Meaning can be obfuscated easily, and not everyone is equally proficient at using it. But the biggest drawback to language is simply that not everyone speaks the same one.

NMT has the potential to reduce and eventually eradicate that problem.

“You can think of NMT as part of your international go-to-market strategy,” writes Gachot. “In theory, the Internet erased geographical barriers and allowed players of all sizes from all places to compete in what we often call a ‘global economy.’ But we’re not all global competitors, because not all of us can communicate in the 26 languages that have 50 million or more speakers. NMT removes language barriers, enabling new and existing players to be global communicators, and thus real global competitors. We’re living in the post-internet economy, and we’re stepping into the post-language economy.”

Machine learning has made substantial progress but has not yet cracked the code on language. It does have its shortcomings, namely when it faces slang, idioms, obscure dialects of prominent languages and creative or colorful writing. It shines, however, in the world of business, where jargon is defined and intentional. That in itself is a significant leap forward.

Reference: https://bit.ly/2Fwhuku

Adaptive MT – Trados 2017 New Feature


SDL Trados Studio 2017 includes a new generation of machine translation: AdaptiveMT.

How does it work?

AdaptiveMT allows users to adapt SDL Language Cloud machine translation to their own preferred style. There is a free plan, which offers these features:

  • 400,000 machine-translated characters per month.
  • Access to the baseline engines only, which means no industry-specific or vertically trained engines.
  • 5 termbases (dictionaries), which can be used to “force” the engine to use the translation you want for certain words or phrases.
  • 1 adaptive engine.
  • Translator – basically a feature similar to FreeTranslation.com, except it’s personalized with your engine(s) and your termbases.

How does it help?

  • Faster translation with smarter MT suggestions.
  • Easy to use and get started.
  • Completely secure – no data is collected or shared.
  • Unique MT output, personal to you.
  • Access directly within Studio 2017.
  • No translation memory needed to train the MT.
  • Automatic, real time learning – no pre-training required.

What are the available language pairs?

So far, AdaptiveMT is available in these language pairs:

English <-> French
English <-> German
English <-> Italian
English <-> Spanish
English <-> Dutch
English <-> Portuguese
English <-> Japanese
English <-> Chinese

For reference: https://www.sdltrados.com/products/trados-studio/adaptivemt/

Machine Translation Post-Editing Types

Post-editing is the next step after completing the machine translation (MT) process and evaluating its output. A human translator processes the document to verify that the source and target texts convey the same information and that the tone of the translation is consistent with the original document. The quality of machine translation varies and affects the subsequent effort required for post-editing. Several factors contribute to MT quality, such as the clarity and quality of the source text; it is important to make sure that the source text is well written and well suited for machine translation beforehand. Other significant factors that affect MT output quality include the type of MT system used and the compatibility of the source and target languages.

There are two types, or levels, of post-editing: light post-editing and full post-editing.


Machine Translation History & Approaches

Machine Translation (MT) refers to automated language translation. The concept has been around since the 1600s but came into its own in the twentieth century. Along with the invention of electronic calculators came the development of ways to adapt computer technology to the language translation of documents. Research became prevalent at universities in the mid-1950s to develop and test machines that could perform tasks previously only possible for human translators.
