Tag: Machine Translation

Top 5 Reasons Why Enterprises Rely on Machine Translation for Global Expansion


SDL published a whitepaper on why enterprises rely on machine translation (MT) for global expansion. The introduction states the case in point: language barriers between companies and their global customers stifle economic growth. In fact, forty-nine percent of executives say a language barrier has stood in the way of a major international business deal, and nearly two-thirds (64 percent) of those same executives say language barriers make it difficult to gain a foothold in international markets. Whether inside or outside your company, your global audiences prefer to read in their native languages. It speeds efficiency, increases receptivity and allows for easier processing of concepts.

SDL stated this point as a solution to the aforementioned challenge:

To break the language barrier and expand your global and multilingual footprint, there are opportunities to leverage both human translation and machine translation.

Then, the paper compared human translation and MT from the perspective of usage. Human translation is best for legally binding content, as well as high-value, branded content. However, it can be costly, can take weeks (or even months) to complete and can’t address all of the real-time needs of your business to serve multilingual prospects, partners and customers.

MT, on the other hand, is fast becoming an essential complement to human translation efforts. It is well suited for use as part of a human translation process, but it also solves high-volume and real-time content challenges that human translation cannot address on its own, including the five that are the focus of this white paper.

First reason:  Online user activity and multilingual engagement

Whether it’s a web forum, blog, community content, customer review or a Wiki page, your online user-generated content (UGC) is a powerful tool for customer experience and can be a great opportunity to connect customers around your brand and products. These are rarely translated because the ever-fluctuating content requires real-time translation that is not possible with traditional translation options. However, this content is a valuable resource for resolving problems, providing information, building a brand and delivering a positive customer experience.

Machine translation provides a way for companies to quickly and affordably translate user reviews on e-commerce sites, comments on blogs or within online communities or forums, Wiki content and just about any other online UGC that helps provide support or information to your customers and prospects. While the translation isn’t perfect, its quality is sufficient for its primary purpose: information.

Second reason:  Global customer service and customer relationship management

The goal of any customer service department is to help customers find the right answer – and to stay off the phone. Phone support is typically expensive and inefficient for the company and can be frustrating for the customer. Today, customer service departments are working to enhance relationships with customers by offering support over as many self-service channels as possible, including knowledge base articles, email support and real-time chat.

However, due to its dynamic nature, this content often isn’t translated into different languages, making multilingual customer service agents required instead. Because of its real-time capabilities, capacity to handle large volumes of content and ability to lower costs, machine translation is an extremely attractive option for businesses with global customer support organizations.

There are two key online customer support areas that are strong candidates for machine translation:
• Real-time communication
• Knowledge base articles

Third reason:  International employee collaboration

Your employees are sharing information every day: proposals, product specifications, designs, documents. In a multinational company, they’re likely native speakers of languages other than the one spoken at headquarters. While these employees may speak your language very well, they most likely prefer to review complex concepts in their native languages. Reading in their native languages increases their mental processing speed and allows them to work better and faster.

Human translation isn’t possible in this scenario because of the time-sensitivity inherent to internal collaboration. But internal knowledge sharing doesn’t need the kind of letter perfect translation that public-facing documents often do. For internal content sharing, machine translation can provide an understandable translation that will help employees transcend language barriers. In addition, by granting all employees access to a machine translation solution, they are able to access and quickly translate external information as well without sending it through a lengthy translation process or exposing it outside of your walls.

This level of multilingual information sharing and information access can dramatically improve internal communications and knowledge sharing, increase employee satisfaction and retention and drive innovation among your teams.

Fourth reason:  Online security and protection of intellectual property

In an effort to be resourceful, your employees will likely seek out free translation methods like Google Translate or Microsoft Bing Translator. These public, web-based machine translation tools are effective, but they allow your intellectual property to be mined to improve search results or for other needs. There is a simple test to determine if your company’s information is being submitted through public channels for translation: Simply have your IT department audit your firewalls to determine how much traffic is going to the IP addresses of online translation services. Many companies have been surprised by the volume of information going out of their organization this way.

This security hole can be plugged with a secure, enterprise-grade machine translation hosted on-premises or in a private cloud. With this type of solution, you can give employees a secure translation option for translation of documents, websites and more. And, of course, you’ll protect your valuable intellectual property by keeping it in-house, where it belongs.

Fifth reason:  Translation capacity and turnaround time for internal teams or agencies

Machine translation can improve the capacity and productivity of internal translation departments or language service providers (LSPs) by 30 percent or more and greatly reduces the cost of content translation. Large enterprises that translate massive volumes have seen increases up to 300 percent in translation productivity when machine translation is used to generate the initial translation, which is then edited by skilled translators.

Here’s how it works: instead of starting with a raw document, translators start with a machine translation, which they review in a post-editing process. Translators edit and fine-tune the content for readability, accuracy and cultural sensitivity. By front-loading the process with a high-quality machine translation, translators are still able to provide high-quality content, but in a fraction of the time. 

Reference: https://bit.ly/2wXRQSt

A Gentle Introduction to Neural Machine Translation


One of the earliest goals for computers was the automatic translation of text from one language to another.

Automatic or machine translation is perhaps one of the most challenging artificial intelligence tasks given the fluidity of human language. Classically, rule-based systems were used for this task, which were replaced in the 1990s with statistical methods. More recently, deep neural network models achieve state-of-the-art results in a field that is aptly named neural machine translation.

In this post, you will discover the challenge of machine translation and the effectiveness of neural machine translation models.

After reading this post, you will know:

  • Machine translation is challenging given the inherent ambiguity and flexibility of human language.
  • Statistical machine translation replaces classical rule-based systems with models that learn to translate from examples.
  • Neural machine translation models fit a single model rather than a pipeline of fine-tuned models and currently achieve state-of-the-art results.

Let’s get started.

What is Machine Translation?

Machine translation is the task of automatically converting source text in one language to text in another language.

In a machine translation task, the input already consists of a sequence of symbols in some language, and the computer program must convert this into a sequence of symbols in another language.

— Page 98, Deep Learning, 2016.

Given a sequence of text in a source language, there is no one single best translation of that text to another language. This is because of the natural ambiguity and flexibility of human language. This makes the challenge of automatic machine translation difficult, perhaps one of the most difficult in artificial intelligence:

The fact is that accurate translation requires background knowledge in order to resolve ambiguity and establish the content of the sentence.

— Page 21, Artificial Intelligence, A Modern Approach, 3rd Edition, 2009.

Classical machine translation methods often involve rules for converting text in the source language to the target language. The rules are often developed by linguists and may operate at the lexical, syntactic, or semantic level. This focus on rules gives the name to this area of study: Rule-based Machine Translation, or RBMT.

RBMT is characterized with the explicit use and manual creation of linguistically informed rules and representations.

— Page 133, Handbook of Natural Language Processing and Machine Translation, 2011.

The key limitations of the classical machine translation approaches are both the expertise required to develop the rules, and the vast number of rules and exceptions required.

What is Statistical Machine Translation?

Statistical machine translation, or SMT for short, is the use of statistical models that learn to translate text from a source language to a target language given a large corpus of examples.

This task of using a statistical model can be stated formally as follows:

Given a sentence T in the target language, we seek the sentence S from which the translator produced T. We know that our chance of error is minimized by choosing that sentence S that is most probable given T. Thus, we wish to choose S so as to maximize Pr(S|T).

— A Statistical Approach to Machine Translation, 1990.

This formal specification makes the maximizing of the probability of the output sequence given the input sequence of text explicit. It also makes the notion of there being a suite of candidate translations explicit and the need for a search process or decoder to select the one most likely translation from the model’s output probability distribution.

Given a text in the source language, what is the most probable translation in the target language? […] how should one construct a statistical model that assigns high probabilities to “good” translations and low probabilities to “bad” translations?

— Page xiii, Syntax-based Statistical Machine Translation, 2017.

The approach is data-driven, requiring only a corpus of examples with both source and target language text. This means linguists are no longer required to specify the rules of translation.

This approach does not need a complex ontology of interlingua concepts, nor does it need handcrafted grammars of the source and target languages, nor a hand-labeled treebank. All it needs is data—sample translations from which a translation model can be learned.

— Page 909, Artificial Intelligence, A Modern Approach, 3rd Edition, 2009.

Quickly, the statistical approach to machine translation outperformed the classical rule-based methods to become the de-facto standard set of techniques.

Since the inception of the field at the end of the 1980s, the most popular models for statistical machine translation […] have been sequence-based. In these models, the basic units of translation are words or sequences of words […] These kinds of models are simple and effective, and they work well for many language pairs

— Syntax-based Statistical Machine Translation, 2017.

The most widely used techniques were phrase-based and focused on translating sub-sequences of the source text piecewise.

Statistical Machine Translation (SMT) has been the dominant translation paradigm for decades. Practical implementations of SMT are generally phrase-based systems (PBMT) which translate sequences of words or phrases where the lengths may differ

— Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016.
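The piecewise nature of phrase-based translation can be sketched in a few lines of Python: the source sentence is segmented into phrases found in a phrase table, and each phrase is translated independently. The phrase table and sentences below are invented toy data; real systems score many competing segmentations and reorderings rather than matching greedily.

```python
# A minimal sketch of phrase-based piecewise translation.
# The phrase table and example sentence are invented toy data.

phrase_table = {
    "das haus": "the house",
    "ist": "is",
    "klein": "small",
}

def translate(source, table):
    words = source.split()
    output, i = [], 0
    while i < len(words):
        # Greedily match the longest known phrase starting at position i.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in table:
                output.append(table[phrase])
                i = j
                break
        else:
            output.append(words[i])  # pass unknown words through unchanged
            i += 1
    return " ".join(output)

print(translate("das haus ist klein", phrase_table))  # the house is small
```

Because each phrase is handled in isolation, nothing in this process sees the sentence as a whole, which is exactly the narrowness criticized below.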

Although effective, statistical machine translation methods suffered from a narrow focus on the phrases being translated, losing the broader nature of the target text. The hard focus on data-driven approaches also meant that methods may have ignored important syntax distinctions known by linguists. Finally, the statistical approaches required careful tuning of each module in the translation pipeline.

What is Neural Machine Translation?

Neural machine translation, or NMT for short, is the use of neural network models to learn a statistical model for machine translation.

The key benefit of the approach is that a single system can be trained directly on source and target text, no longer requiring the pipeline of specialized systems used in statistical machine translation.

Unlike the traditional phrase-based translation system which consists of many small sub-components that are tuned separately, neural machine translation attempts to build and train a single, large neural network that reads a sentence and outputs a correct translation.

— Neural Machine Translation by Jointly Learning to Align and Translate, 2014.

As such, neural machine translation systems are said to be end-to-end systems as only one model is required for the translation.

The strength of NMT lies in its ability to learn directly, in an end-to-end fashion, the mapping from input text to associated output text.

— Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016.

Encoder-Decoder Model

Multilayer Perceptron neural network models can be used for machine translation, although the models are limited by a fixed-length input sequence where the output must be the same length.

These early models have been greatly improved upon recently through the use of recurrent neural networks organized into an encoder-decoder architecture that allow for variable length input and output sequences.

An encoder neural network reads and encodes a source sentence into a fixed-length vector. A decoder then outputs a translation from the encoded vector. The whole encoder–decoder system, which consists of the encoder and the decoder for a language pair, is jointly trained to maximize the probability of a correct translation given a source sentence.

— Neural Machine Translation by Jointly Learning to Align and Translate, 2014.

Key to the encoder-decoder architecture is the ability of the model to encode the source text into an internal fixed-length representation called the context vector. Interestingly, once encoded, different decoding systems could be used, in principle, to translate the context into different languages.

… one model first reads the input sequence and emits a data structure that summarizes the input sequence. We call this summary the “context” C. […] A second model, usually an RNN, then reads the context C and generates a sentence in the target language.

— Page 461, Deep Learning, 2016.
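The shapes involved can be made concrete with a schematic NumPy sketch: a simple RNN encoder folds a variable-length sequence of word embeddings into a single fixed-length context vector. The weights are random, so this only illustrates the data flow and the fixed size of C, not a trained translator; the dimensions are arbitrary choices.

```python
import numpy as np

# Schematic encoder: an RNN compresses a variable-length input into one
# fixed-length context vector C. Random weights - shapes only, not a model.

rng = np.random.default_rng(0)
emb_dim, hid_dim = 8, 16
W_in  = rng.normal(size=(hid_dim, emb_dim))   # input-to-hidden weights
W_rec = rng.normal(size=(hid_dim, hid_dim))   # recurrent weights

def encode(embeddings):
    h = np.zeros(hid_dim)
    for x in embeddings:                  # one recurrent step per source token
        h = np.tanh(W_in @ x + W_rec @ h)
    return h                              # the fixed-length context vector C

short = [rng.normal(size=emb_dim) for _ in range(3)]   # 3-word "sentence"
long  = [rng.normal(size=emb_dim) for _ in range(12)]  # 12-word "sentence"
print(encode(short).shape, encode(long).shape)  # (16,) (16,) - same size either way
```

A decoder RNN would then be initialised from (or conditioned on) this vector to emit target words one at a time; the fixed size of C regardless of input length is the property the attention section below addresses.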

Encoder-Decoders with Attention

Although effective, the Encoder-Decoder architecture has problems with long sequences of text to be translated.

The problem stems from the fixed-length internal representation that must be used to decode each word in the output sequence.

The solution is the use of an attention mechanism that allows the model to learn where to place attention on the input sequence as each word of the output sequence is decoded.

Using a fixed-sized representation to capture all the semantic details of a very long sentence […] is very difficult. […] A more efficient approach, however, is to read the whole sentence or paragraph […], then to produce the translated words one at a time, each time focusing on a different part of the input sentence to gather the semantic details required to produce the next output word.

— Page 462, Deep Learning, 2016.
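One attention step can be sketched in a few lines of NumPy: the current decoder state is scored against every encoder state, the scores are normalized with a softmax into an attention distribution, and a focus-weighted context vector is built. The vectors are random placeholders; this shows simple dot-product attention, one of several scoring variants used in practice.

```python
import numpy as np

# One dot-product attention step between a decoder state and the encoder's
# per-word states. Random placeholder vectors - mechanics only.

rng = np.random.default_rng(1)
hid = 16
encoder_states = rng.normal(size=(10, hid))   # one state per source word
decoder_state  = rng.normal(size=hid)         # where the decoder is right now

scores  = encoder_states @ decoder_state      # relevance of each source word
weights = np.exp(scores - scores.max())       # (shift max for stability)
weights /= weights.sum()                      # softmax: attention distribution
context = weights @ encoder_states            # focus-weighted summary vector

print(round(float(weights.sum()), 6), context.shape)  # 1.0 (16,)
```

Because a fresh context vector is computed at every output step, the model is no longer forced to squeeze the whole sentence into one fixed-length representation.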

The encoder-decoder recurrent neural network architecture with attention is currently the state-of-the-art on some benchmark problems for machine translation. And this architecture is used at the heart of the Google Neural Machine Translation system, or GNMT, used in their Google Translate service.

… current state-of-the-art machine translation systems are powered by models that employ attention.

— Page 209, Neural Network Methods in Natural Language Processing, 2017.

Although effective, neural machine translation systems still suffer from some issues, such as scaling to larger vocabularies of words and the slow speed of training the models. These are the current areas of focus for large production neural translation systems, such as the Google system.

Three inherent weaknesses of Neural Machine Translation […]: its slower training and inference speed, ineffectiveness in dealing with rare words, and sometimes failure to translate all words in the source sentence.

— Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016.

Reference: https://bit.ly/2Cx8zxI

NEURAL MACHINE TRANSLATION: THE RISING STAR


These days, language industry professionals simply can’t escape hearing about neural machine translation (NMT). However, there still isn’t enough information about the practical facts of NMT for translation buyers, language service providers, and translators. People often ask: is NMT intended for me? How will it change my life?

A Short History and Comparison

At the beginning of time – around the 1970s – the story began with rule-based machine translation (RBMT) solutions. The idea was to create grammatical rule sets for source and target languages, where machine translation is a kind of conversion process between the languages based on these rule sets. This concept works well with generic content, but adding new content, new language pairs, and maintaining the rule set is very time-consuming and expensive.

This problem was solved with statistical machine translation (SMT) around the late ‘80s and early ‘90s. SMT systems create statistical models by analyzing aligned source-target language data (training set) and use them to generate the translation. The advantage of SMT is the automatic learning process and the relatively easy adaptation by simply changing or extending the training set. The limitation of SMT is the training set itself: to create a usable engine, a large database of source-target segments is required. Additionally, SMT is not language independent in the sense that it is highly sensitive to the language combination and has a very hard time dealing with grammatically rich languages.

This is where neural machine translation (NMT) begins to shine: it can look at the sentence as a whole and can create associations between the phrases over an even longer distance within the sentence. The result is a convincing fluency and an improved grammatical correctness compared to SMT.

Statistical MT vs Neural MT

Both SMT and NMT work on a statistical basis and use source-target language segment pairs as their foundation. What’s the difference? What we typically call SMT is actually Phrase Based Statistical Machine Translation (PBSMT), meaning SMT splits the source segments into phrases. During the training process, SMT creates a translation model and a language model. The translation model stores the different translations of the phrases, and the language model stores the probability of the sequence of phrases on the target side. During the translation phase, the decoder chooses the translation that gives the best result based on these two models. On a phrase or expression level, SMT (or PBSMT) performs well, but language fluency and grammar are not good.

‘Buch’ is aligned with ‘book’ twice and only once with ‘the’ and ‘a’ – the winner is the ‘Buch’-’book’ combination
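The alignment counting the caption describes can be reproduced with a toy Python sketch: co-occurrence counts over a tiny invented parallel corpus decide which target word ‘Buch’ pairs with. Real SMT training uses iterative probabilistic alignment (e.g. expectation-maximization) over millions of segments, but the intuition is the same.

```python
from collections import Counter

# Toy word-alignment counting over an invented two-segment parallel corpus:
# the target word that co-occurs most often with 'Buch' wins.

corpus = [
    ("das Buch", "the book"),
    ("ein Buch", "a book"),
]

counts = Counter()
for src, tgt in corpus:
    for s in src.split():
        for t in tgt.split():
            counts[(s, t)] += 1   # count every source/target co-occurrence

best = max((t for (s, t) in counts if s == "Buch"),
           key=lambda t: counts[("Buch", t)])
print(best)  # 'book' co-occurs with 'Buch' twice; 'the' and 'a' only once
```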

Neural Machine Translation, on the other hand, uses deep, neural network-based machine learning technology. Words or even word chunks are transformed into “word vectors”. This means that ‘dog’ does not only represent the characters d, o and g; it can also contain contextual information from the training data. During the training phase, the NMT system tries to set the parameter weights of the neural network based on the reference values (source-target translations). Words appearing in similar contexts get similar word vectors. The result is a neural network which can process source segments and transfer them into target segments. During translation, NMT looks at the complete sentence, not just chunks (phrases). Thanks to the neural approach, it is not translating words, it is transferring information and context. This is why fluency is much better than in SMT, but terminology accuracy is sometimes not perfect.

Similar words are closer to each other in a vector space
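The claim that similar words end up with similar vectors can be illustrated with cosine similarity. The 3-dimensional vectors below are hand-picked toy values, not trained embeddings; real word vectors typically have hundreds of dimensions.

```python
import numpy as np

# Cosine similarity over hand-picked toy "word vectors": words used in
# similar contexts ('dog', 'cat') point in similar directions.

vectors = {
    "dog": np.array([0.9, 0.8, 0.1]),
    "cat": np.array([0.8, 0.9, 0.2]),
    "car": np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["dog"], vectors["cat"]) >
      cosine(vectors["dog"], vectors["car"]))  # True: dog is closer to cat
```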

The Hardware

A popular GPU: NVIDIA Tesla

One big difference between SMT and NMT systems is that NMT requires Graphics Processing Units (GPUs), which were originally designed to help computers process graphics. These GPUs can calculate astonishingly fast – the latest cards have about 3,500 cores which can process data simultaneously. In fact, there is a small ongoing hardware revolution and GPU-based computers are the foundation for almost all deep learning and machine learning solutions. One of the great perks of this revolution is that nowadays, NMT is not only available for large enterprises, but also for small and medium-sized companies as well.

The Software

The main element, or ‘kernel’, of any NMT solution is the so-called NMT toolkit. There are a couple of NMT toolkits available, such as Nematus or openNMT, but the landscape is changing fast and more companies and universities are now developing their own toolkits. Since many of these toolkits are open-source solutions and hardware resources have become more affordable, the industry is experiencing an accelerating speed in toolkit R&D and NMT-related solutions.

On the other hand, as important as toolkits are, they are only one small part of a complex system, which contains frontend, backend, pre-processing and post-processing elements, parsers, filters, converters, and so on. These are all factors for anyone to consider before jumping into the development of an individual system. However, it is worth noting that the success of MT is highly community-driven and would not be where it is today without the open source community.

Corpora

A famous bilingual corpus: the Rosetta Stone

And here comes one of the most curious questions: what are the requirements of creating a well-performing NMT engine? Are there different rules compared to SMT systems? There are so many misunderstandings floating around on this topic that I think it’s a perfect opportunity to go into the details a little bit.

The main rules are nearly the same both for SMT and NMT systems. The differences are mainly that an NMT system is less sensitive and performs better in the same circumstances. As I have explained in an earlier blog post about SMT engine quality, the quality of an engine should always be measured in relation to the particular translation project for which you would like to use it.

These are the factors which will eventually influence the performance of an NMT engine:

Volume

Regardless of what you may have heard, volume is still very important for NMT engines, just like in the SMT world. There is no explicit rule on entry volumes, but what we can safely say is that the bare minimum is about 100,000 segment pairs. There are Globalese users who are successfully using engines created from 150,000 segments, but to be honest, this is more of an exception and requires special circumstances (like the right language combination, see below). The optimum volume starts around 500,000 segment pairs (2 million words).

Quality

The quality of the training set plays an important role (garbage in, garbage out). Don’t add unqualified content to your engine just to increase the overall size of the training set.

Relevance

Applying the right engine to the right project is the first key to success. An engine trained on automotive content will perform well on car manual translation but will give back disappointing results when you try to use it for web content for the food industry.

This raises the question of whether the content (TMs) should be mixed. If you have enough domain-specific content you shouldn’t necessarily add more out-of-domain data to your engine, but if you have an insufficient volume of domain-specific data then adding generic content (e.g. from public sources) may help improve the quality. We always encourage our Globalese users to try different engine combinations with different training sets.

Content type

Content generated by possible non-native speaking users on a chat forum or marketing material requiring transcreation is always a challenge to any MT system. On the other hand, technical documentation with controlled language is a very good candidate for NMT.

Language combination

Unfortunately, language combination still has an impact on quality. The good news is that NMT has now opened up the option of using machine translation for languages like Japanese, Turkish, or Hungarian –  languages which had nearly been excluded from the machine translation club because of poor results provided by SMT. NMT has also helped solve the problem of long distance dependencies for German and the translation output is much smoother for almost all languages. But English combined with Latin languages still provides better results than, for example, English combined with Russian when using similar volumes and training set quality.

Expectations for the future

Neural Machine Translation is a big step ahead in quality, but it still isn’t magic. Nobody should expect that NMT will replace human translators anytime soon. What you CAN expect is that NMT can be a powerful productivity tool in the translation process and open new service options both for translation buyers and language service providers (see post-editing experience).

Training and Translation Time

When we started developing Globalese NMT, one of the most surprising experiences for us was that the training time was far shorter than we had previously anticipated. This is due to the amazingly fast evolution of hardware and software. With Globalese, we currently have an average training time of 50,000 segments per hour – this means that an average engine with 1 million segments can be trained within one day. The situation is even better when looking at translation times: with Globalese, we currently have an average translation time between 100 and 400 segments per minute, depending on the corpus size, segment length in the translation and training content.
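The throughput figures quoted above are easy to sanity-check with a few lines of arithmetic (the rates are the Globalese averages stated in the paragraph, not independently verified numbers):

```python
# Sanity-checking the quoted Globalese throughput figures.

segments   = 1_000_000   # an "average engine" from the text
train_rate = 50_000      # training throughput: segments per hour

print(segments / train_rate)   # 20.0 hours of training, i.e. within one day

# Translation throughput of 100-400 segments per minute, expressed per hour:
print(100 * 60, 400 * 60)      # 6000 24000 segments per hour
```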

Neural MT Post-editing Experience

One of the great changes neural machine translation brings along is that the overall language quality is much better compared to the SMT world. This does not mean that the translation is always perfect. As stated by one of our testers: if it is right, then it is astonishingly good quality. The ratio of good and poor translations naturally varies depending on the engine, but good engines can deliver really good translations for about 50% (or even more) of the target text.

Here are some examples showcasing what NMT post-editors can expect:

DE original:

Der Rechnungsführer sorgt für die gebotenen technischen Vorkehrungen zur wirksamen Anwendung des FWS und für dessen Überwachung.

Reference human translation:

The accounting officer shall ensure appropriate technical arrangements for an effective functioning of the EWS and its monitoring.

Globalese NMT:

The accounting officer shall ensure the necessary technical arrangements for the effective use of the EWS and for its monitoring.

As you can see, the output is fluent, and the differences are more or less just preferential. This highlights another issue: automated quality metrics like the BLEU score are not really sufficient to measure quality. The example above is only a 50% match in the BLEU score, but if we look at the quality, the rating should be much higher.
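A rough sketch of why n-gram metrics undervalue output like this: the snippet below computes simple clipped unigram precision between the NMT output and the reference translation from the example above. Real BLEU also combines higher-order n-grams and a brevity penalty, so this is an illustration of the principle rather than the actual metric.

```python
from collections import Counter

# Clipped unigram precision between the example NMT output and reference:
# each candidate word counts only up to its frequency in the reference.

reference = ("the accounting officer shall ensure appropriate technical "
             "arrangements for an effective functioning of the ews and its "
             "monitoring").split()
candidate = ("the accounting officer shall ensure the necessary technical "
             "arrangements for the effective use of the ews and for its "
             "monitoring").split()

ref_counts, cand_counts = Counter(reference), Counter(candidate)
overlap = sum(min(cand_counts[w], ref_counts[w]) for w in cand_counts)
precision = overlap / len(candidate)
print(f"{precision:.2f}")  # 0.75 - well below 1.0 despite equivalent meaning
```

Preferential word choices ("necessary" vs. "appropriate", "use" vs. "functioning") are penalized exactly like real errors, which is why a human rating of this output would be far higher than its n-gram score suggests.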

Let’s look at another example:

EN original

The concept of production costs must be understood as being net of any aid but inclusive of a normal level of profit.

Reference human translation:

Die Produktionskosten verstehen sich ohne Beihilfe, aber einschließlich eines normalen Gewinns.

Globalese NMT:

Der Begriff der Produktionskosten bezieht sich auf die Höhe der Beihilfe, aber einschließlich eines normalen Gewinns.

What is interesting here is that the first part of the sentence sounds good, but if you look at the content, the translation is not good. This is an example of fluent output with a bad translation. This is a typical case in the NMT world, and it emphasizes the point that post-editors must examine NMT output differently than they did for SMT – in SMT, bad grammar was a clear indicator that the translation had to be post-edited.

Post-editors who used to proof and correct SMT output have to change the way they are working and have to be more careful with proofreading, even if the NMT output looks alright at first glance. Also, services related to light post-editing will change – instead of correcting serious grammatical errors without checking the correctness of translation in order to create some readable content, the focus will shift to sorting out serious mistranslations. The funny thing is that one of the main problems in the SMT world was weak fluency and grammar, and now we have good fluency and grammar as an issue in the NMT world…

And finally:

DE original:

Aufgrund des rechtlichen Status der Beteiligten ist ein solcher Vorgang mit einer Beauftragung des liefernden Standorts und einer Berechnung der erbrachten Leistung verbunden.

Reference human translation:

The legal status of the companies involved in these activities means that this process is closely connected with placing orders at the location that is to supply the goods/services and calculating which goods/services they supply.

Globalese NMT:

Due to the legal status of the person, it may lead to this process at the site of the plant, and also a calculation of the completed technician.

This example shows that unfortunately, NMT can produce bad translations too. As I mentioned before, the ratio of good and bad NMT output you will face in a project always depends on the circumstances. Another weak point of NMT is that it currently cannot handle terminology directly and acts as a kind of “black box” with no option to directly influence the results.

Reference: https://bit.ly/2hBGsVh

How machine learning can be used to break down language barriers


Machine learning has transformed major aspects of the modern world with great success. Self-driving cars, intelligent virtual assistants on smartphones, and cybersecurity automation are all examples of how far the technology has come.

But of all the applications of machine learning, few have the potential to shape our economy as radically as language translation. Translation is a natural fit for machine learning: language operates on a set of predictable rules, but with a degree of variation that makes it difficult for humans to interpret at scale. Machine learning, on the other hand, can leverage repetition, pattern recognition, and vast databases to translate faster than humans can.

There are other compelling reasons that indicate language will be one of the most important applications of machine learning. To begin with, there are over 6,500 spoken languages in the world, and many of the more obscure ones are spoken by poorer demographics who are frequently isolated from the global economy. Removing language barriers through technology connects more communities to global marketplaces. More people speak Mandarin Chinese than any other language in the world, making China’s growing middle class a prime market for U.S. companies if they can overcome the language barrier.

Let’s take a look at how machine learning is currently being applied to the language barrier problem, and how it might develop in the future.

Neural machine translation

Recently, language translation took an enormous leap forward with the emergence of a new machine translation technology called Neural Machine Translation (NMT). The emphasis should be on the “neural” component because the inner workings of the technology really do mimic the human mind. The architects behind NMT will tell you that they frequently struggle to understand how it comes to certain translations because of how quickly and accurately it delivers them.

“NMT can do what other machine translation methods have not done before – it achieves translation of entire sentences without losing meaning,” says Denis A. Gachot, CEO of SYSTRAN, a language translation technologies company. “This technology is of a caliber that deserves the attention of everyone in the field. It can translate at near-human levels of accuracy and can translate massive volumes of information exponentially faster than we can operate.”

The comparison to human translators is not a stretch anymore. Unlike the days of garbled Google Translate results, which continue to feed late night comedy sketches, NMT is producing results that rival those of humans. In fact, Systran’s Pure Neural Machine Translation product was preferred over human translators 41% of the time in one test.

Martin Volk, a professor at the Institute of Computational Linguistics at the University of Zurich, had this to say about neural machine translation in a 2017 Slator article:

“I think that as computing power inevitably increases, and neural learning mechanisms improve, machine translation quality will gradually approach the quality of a professional human translator over the coming two decades. There will be a point where in commercial translation there will no longer be a need for a professional human translator.”

Gisting to fluency

One telling metric to watch is gisting vs. fluency. Are the translations being produced communicating the gist of an idea, or fluently communicating details?

Previous iterations of language translation technology only achieved the level of gisting. These translations required extensive human support to be usable. NMT successfully pushes beyond gisting and communicates fluently. Now, with little to no human support, usable translations can be processed at the same level of quality as those produced by humans. Sometimes, the NMT translations are even superior.

Quality and accuracy are the main priorities of any translation effort. Any basic translation software can quickly spit out its best rendition of a body of text. To parse information correctly and deliver a fluent translation requires a whole different set of competencies. Volk also said, “Speed is not the key. We want to drill down on how information from sentences preceding and following the one being translated can be used to improve the translation.”

This opens up enormous possibilities for global commerce. Massive volumes of information traverse the globe every second, and quite a bit of that data needs to be translated into two or more languages. That is why successfully automating translation is so critical. Tasks like e-discovery, compliance, or any other business processes that rely on document accuracy can be accelerated exponentially with NMT.

Education, e-commerce, travel, diplomacy, and even international security work can be radically changed by the ability to communicate in your native language with people from around the globe.

Post language economy

Everywhere you look, language barriers are a speed check on global commerce. Whether that commerce involves government agencies approving business applications, customs checkpoints, massive document sharing, or e-commerce, fast and effective translation is essential.

If we look at language strictly as a means of sharing ideas and coordinating, it is somewhat inefficient. It is linear and has a lot of rules that make it difficult to use. Meaning can be obfuscated easily, and not everyone is equally proficient at using it. But the biggest drawback to language is simply that not everyone speaks the same one.

NMT has the potential to reduce and eventually eradicate that problem.

“You can think of NMT as part of your international go-to-market strategy,” writes Gachot. “In theory, the Internet erased geographical barriers and allowed players of all sizes from all places to compete in what we often call a ‘global economy.’ But we’re not all global competitors because not all of us can communicate in the 26 languages that have 50 million or more speakers. NMT removes language barriers, enabling new and existing players to be global communicators, and thus real global competitors. We’re living in the post-internet economy, and we’re stepping into the post-language economy.”

Machine learning has made substantial progress but has not yet cracked the code on language. It does have its shortcomings, namely when it faces slang, idioms, obscure dialects of prominent languages and creative or colorful writing. It shines, however, in the world of business, where jargon is defined and intentional. That in itself is a significant leap forward.

Reference: https://bit.ly/2Fwhuku

A New Way to Measure NMT Quality

Neural Machine Translation (NMT) systems produce very high quality translations, and are poised to radically change the professional translation industry. These systems require quality feedback / scores on an ongoing basis. Today, the prevalent method is via Bilingual Evaluation Understudy (BLEU), but methods like this are no longer fit for purpose.

A better approach is to have a number of native speakers assess NMT output and rate the quality of each translation. One Hour Translation (OHT) is doing just that: its new NMT index was released in late April 2018 and is fully available for the translation community to use.

A new age of MT

NMT marks a new age in automatic machine translation. Unlike the technologies developed over the past 60 years, the well-trained and tested NMT systems available today have the potential to replace human translators.

Aside from processing power, the main factors that impact NMT performance are:

  •      the amount and quality of initial training materials, and
  •      an ongoing quality-feedback process

For an NMT system to work well, it needs to be properly trained, i.e. “fed” with hundreds of thousands (and in some cases millions) of correct translations. It also requires feedback on the quality of the translations it produces.

NMT is the future of translation. It is already much better than previous MT technologies, but issues with training and quality assurance are impeding progress.

NMT is a “disruptive technology” that will change the way most translations are performed. It has taken over 50 years, but machine translation can now be used to replace human translators in many cases.

So what is the problem?

While NMT systems could potentially revolutionize the translation market, their development and adoption are hampered by the lack of quality input, insufficient means of testing the quality of the translations and the challenge of providing translation feedback.

These systems also require a lot of processing power, an issue which should be solved in the next few years, thanks to two main factors. Firstly, Moore’s law, which predicts that processing power doubles every 18 months, also applies to NMT, meaning that processing power will continue to increase exponentially. Secondly, as more companies become aware of the cost benefit of using NMT, more and more resources will be allocated for NMT systems.

Measuring quality is a different and more problematic challenge. Today, algorithms such as BLEU, METEOR, and TER try to predict automatically what a human being would say about the quality of a given machine translation. While these tests are fast, easy, and inexpensive to run (because they are simply software applications), their value is very limited. They do not provide an accurate quality score for the translation, and they fail to estimate what a human reviewer would say about the translation quality (a quick scan of the text in question by a human would reveal the issues with the existing quality tests).
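To make concrete why metrics like BLEU are cheap to run but limited: at their core, they count overlapping n-grams between the machine translation and a reference. The sketch below is a minimal plain-Python illustration, not a production implementation (real metrics such as sacreBLEU use up to 4-grams, smoothing, and corpus-level statistics); it shows the clipped n-gram precision and brevity penalty BLEU combines, and why a fluent-but-wrong translation sharing many surface words with the reference can still score well.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: the core quantity behind BLEU."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    if not cand_counts:
        return 0.0
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return clipped / sum(cand_counts.values())

def simple_bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)

ref = "the cat sat on the mat".split()
hyp = "the cat sat on a mat".split()
print(round(simple_bleu(hyp, ref), 3))  # → 0.707
```

Note what the score never looks at: meaning. A human reviewer scanning the same pair judges adequacy and fluency directly, which is exactly what such surface-overlap metrics can only approximate.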

Simply put, translation quality scores generated by computer programs that predict what a human would say about the translation are just not good enough.

With more major corporations including Google, Amazon, Facebook, Microsoft (Bing), Systran, Baidu, and Yandex joining the game, producing an accurate quality score for NMT translations becomes a major problem that has a direct negative impact on the adoption of NMT systems.

There must be a better way!

We need a better way to evaluate NMT systems, i.e. something that replicates the original intention more closely and can mirror what a human would say about the translation.

The solution seems simple: instead of having some software try to predict what a human would say about the translation, why not just ask enough people to rate the quality of each translation? While this solution is simple, direct, and intuitive, doing it right and in a way that is statistically significant means running numerous evaluation projects at one time.

NMT systems are highly specialized, meaning that if a system has been trained using travel and tourism content, testing it with technical material will not produce the best results. Thus, each type of material has to be tested and scored separately. In addition, the rating must be done for every major language pair, since some NMT engines perform better in particular languages. Furthermore, to be statistically significant, at least 40 people need to rate each project per language, per type of material, per engine. Besides that, each project should have at least 30 strings.

Checking one language pair with one type of material translated with one engine is relatively straightforward: 40 reviewers each check and rate the same neural machine translation consisting of about 30 strings. This approach produces relatively solid (statistically significant) results, and repeating it over time also produces a trend, i.e. making it possible to find out whether or not the NMT system is getting better.

The key to doing this one isolated evaluation is selecting the right reviewers and making sure they do their job correctly. As one might expect, using freelancers for the task requires some solid quality control procedures to make sure the answers are not “fake” or “random.”

At that magnitude (one language, one type of material, one NMT engine, etc), the task is manageable, even when run manually. It becomes more difficult when an NMT vendor, user, or LSP wants to test 10 languages and 10 different types of material with 40 reviewers each. In this case, each test requires between 400 reviewers (1 NMT engine x 1 type of material x 10 language pairs x 40 reviewers) and 4,000 reviewers (1 NMT engine x 10 types of material x 10 language pairs x 40 reviewers).

Running a human based quality score is a major task, even for just one NMT vendor. It requires up to 4,000 reviewers working on thousands of projects.

This procedure is relevant for every NMT vendor who wants to know the real value of their system and obtain real human feedback for the translations it produces.

The main challenge is of course finding, testing, screening, training, and monitoring thousands of reviewers in various countries and languages — monitoring their work while they handle tens of thousands of projects in parallel.

The greater good – industry level quality score

Looking at the greater good, what is really needed is a standardised NMT quality score for the industry to employ, measuring all of the various systems using the same benchmark, strings, and reviewers, in order to compare like-for-like performance. Since the performance of NMT systems can vary dramatically between different types of materials and languages, a real human-based comparison using the same group of linguists and the same source material is the only way to produce real comparative results. Such scores will be useful both for the individual NMT vendor or user and for the end customer or LSP trying to decide which engine to use.

To produce the same tests on an industry-relevant level is a larger undertaking. Using 10 NMT engines, 10 types of material, 10 language pairs and 40 reviewers, the parameters of the project can be outlined as follows:

  •      Assuming the top 10 language pairs are evaluated, i.e. EN > ES, FR, DE, PT-BR, AR, RU, CN, JP, IT and KR;
  •      10 types of material – general, legal, marketing, finance, gaming, software, medical, technical, scientific, and tourism;
  •      10 leading (web-based) engines – Google, Microsoft (Bing), Amazon, DeepL, Systran, Baidu, Promt, IBM Watson, Globalese and Yandex;
  •      40 reviewers rating each project;
  •      30 strings per test; and
  •      12 words on average per string

This comes to a total of 40,000 separate tests (10 language pairs x 10 types of material x 10 NMT engines x 40 reviewers), each with at least 30 strings, i.e. 1,200,000 strings of 12 words each, resulting in an evaluation of approximately 14.4 million words. This evaluation is needed to create just one instance (!) of a real, comparative, human-based NMT quality index.
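The arithmetic behind these figures, including the 400-to-4,000 reviewer range from the single-vendor example above, is easy to verify:

```python
engines, materials, lang_pairs, reviewers = 10, 10, 10, 40
strings_per_test, words_per_string = 30, 12

tests = engines * materials * lang_pairs * reviewers   # separate evaluation tests
strings = tests * strings_per_test                     # strings to be rated
words = strings * words_per_string                     # words to be read

# Reviewer head-count for a single vendor (earlier example):
min_reviewers = 1 * 1 * 10 * 40    # one engine, one material, ten pairs
max_reviewers = 1 * 10 * 10 * 40   # one engine, ten materials, ten pairs

print(tests, strings, words)                 # 40000 1200000 14400000
print(min_reviewers, max_reviewers)          # 400 4000
```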

The challenge is clear: to produce just one instance of a real viable and useful NMT score, 4,000 linguists need to evaluate 1,200,000 strings equating to well over 14 million words!

The magnitude of the project, the number of people involved and the requirement to recruit, train, and monitor all the reviewers, as well as making sure, in real time, that they are doing the job correctly, are obviously daunting tasks, even for large NMT players, and certainly for traditional translation agencies.

Completing the entire process within a reasonable time (e.g. less than one day), so that the results are “fresh” and relevant, makes it even harder.

There are not many translation agencies with the capacity, technology, and operational capability to run a project of that magnitude on a regular basis.

This is where One Hour Translation (OHT) excels. It has recruited, trained, and tested thousands of linguists in over 50 languages, and has already run well over 1,000,000 NMT rating and testing projects for its customers. By the end of April 2018, it published the first human-based NMT quality index (initially covering several engines and domains, later expanding), with the goal of promoting the use of NMT across the industry.

A word about the future

In the future, a better NMT quality index can be built using the same technology NMT is built on, i.e. deep-learning neural networks. Building a neural quality system is just like building an NMT system. The required ingredients are high-quality translations, high volume, and quality ratings / feedback.

With these ingredients, it is possible to build a deep-learning, neural-network-based quality control system that reads a translation and scores it the way a human would. Once the NMT systems are working smoothly and a reliable, human-based quality score/feedback process is in place, the next step will be to create a neural quality score.
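As a toy illustration of the idea only: the sketch below replaces the deep network with a single hand-crafted feature and a one-dimensional least-squares fit, and the (source, translation, human rating) triples are entirely invented. A real neural quality-estimation system would learn its own features from large volumes of rated translations, but the shape of the problem is the same: map a (source, translation) pair to the score a human would give.

```python
def length_ratio(source, translation):
    # Crude proxy feature: heavily padded or truncated output tends to rate low.
    return len(translation.split()) / max(1, len(source.split()))

# Hypothetical (source, translation, human rating 1-5) training triples.
data = [
    ("Das Haus ist rot", "The house is red", 5.0),
    ("Ich bin müde", "I am tired", 5.0),
    ("Er liest ein Buch", "He is reading reading a a book book", 2.0),
    ("Sie singt", "She she she sings sings loudly", 1.5),
]

# One-dimensional least-squares fit standing in for "training the network".
xs = [length_ratio(s, t) for s, t, _ in data]
ys = [y for _, _, y in data]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - w * mx

def score(source, translation):
    """Predict a 1-5 human rating from the single feature."""
    return w * length_ratio(source, translation) + b

print(round(score("Das Haus ist rot", "The house is red"), 2))
```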

Once a neural quality score is available, it will be possible to have engines improve each other and to create a self-learning, self-improving translation system by linking the neural quality score to the NMT engine (obviously a closed-loop system on its own makes no sense, as it cannot improve without additional external data).

With additional external translation data, this system will “teach itself” and learn to improve without the need for human feedback.

Google has done it already. Its AI subsidiary, DeepMind, developed AlphaGo, a neural network computer program that beat the world’s (human) Go champion. AlphaGo is now improving, becoming better and better, by playing against itself again and again – no people involved.

Reference: https://bit.ly/2HDXbTf

AI Interpreter Fail at China Summit Sparks Debate about Future of Profession

Tencent’s AI-powered translation engine, which was supposed to perform simultaneous transcription and interpreting at China’s Boao Forum for Asia last week, faltered badly and became the butt of jokes on social media. It even made headlines in the South China Morning Post, Hong Kong’s main English newspaper – which, incidentally, is owned by Tencent’s key rival Alibaba.

The Boao Forum, held in Hainan Province on April 8-11, 2018, is an annual nonprofit event that was started in 2001. Supported by the region’s governments, its purpose is to further progress and economic integration in Asia by bringing together leaders in politics, business and academia for high-end dialogs and networking.

Tencent is one of the tech giants of China, part of the trio often dubbed “B.A.T.” (for Baidu, Alibaba, Tencent; sometimes BATX if one includes Xiaomi). Its best-known products include the instant messaging apps WeChat and QQ, everyday tools used by just about all Chinese citizens as well as other ethnic Chinese around the world.

In its local Chinese version, WeChat is pretty much an all-round, full-service lifestyle mobile app. You can do just about anything in it these days – from buying train and movie tickets to making mutual fund investments to ordering groceries or an hourly maid from the neighbourhood.

In 2017, Tencent rolled out an AI powered translation engine called “Fanyijun”, which literally translates to “Mr. Translate”, since the Chinese character for “jun” is a polite, literary term for a male person.

What went Wrong?

Fanyijun is already in use powering the in-app translator in WeChat and is also available as a free online service. However, it was supposed to make a high-profile debut at the Boao Forum together with Tencent’s “Zhiling” (literally, “Smart Listening”) speech recognition engine, showcasing the company’s ability to do real-time transcription and interpreting. In retrospect, the publicity effort seems to have backfired on Tencent.

To be sure, human interpreters were still on hand to do the bulk of the interpreting work during the forum. However, Tencent used its AI engine to power the live translation and broadcast of some of the side conferences to screens next to the stage and for followers of the event within WeChat.

As a result, many users posted screenshots of the embarrassing errors that appeared when the engine went haywire, generating certain words needlessly and repeatedly, or got confused when speakers spoke in an unstructured manner or used terminology incorrectly.

Chinese media cited a Tencent spokesperson who admitted that their system “did make errors” and “answered a few questions wrongly”. But he also said in its defense that the Boao Forum was a high-level, multi-faceted, multi-speaker, multilingual, discussion-based event. That, and the fact that the environment was sometimes filled with echo and noise, added to the challenges the system faced.

“They still need humans…”

The gloating hit a crescendo when someone circulated this screenshot from a WeChat group composed of freelance interpreters. It was an urgent request for English simultaneous interpreters to do a live webcast later that day for the Boao Forum.

One group member replied, “They still need humans…” Another said, “Don’t they have an interpreter device?” A third sarcastically added, “Where’s the AI?”

Tencent later clarified that this request was meant for engaging interpreters for their professional news team doing live reporting in Beijing, and not for the simultaneous interpreting team located onsite at the Boao Forum.

Tencent reportedly beat other heavyweight contenders such as Sogou and iFlytek to secure this prestigious demo opportunity at the Boao Forum after a 3-month long process. Sogou is the 2nd largest search engine in China, which also provides a free online translator, built in part through leveraging its investment in China startup UTH International, which provides translation data and NMT engines. iFlytek is a listed natural language processing (NLP) company worth about USD 13 billion in market capitalization. Its speech recognition software is reportedly used daily by half a billion Chinese users and it also sells a popular pocket translation device targeted at Chinese tourists going abroad.

But given what went down at the Boao Forum for “Mr. Translator”, Tencent’s competitors are probably seeing their ‘loss’ as a gain now. The social media gloating aside, this incident has sparked off an active online debate on the ‘what and when’ of AI replacing human jobs.

One netizen said on Sina Weibo, “A lot of people who casually say that AI can replace this or that job, are those who do not really understand or know what those jobs entail; translation included.”

However, Sogou news quoted a veteran interpreter who often accompanied government leaders on overseas visits. She said, “As an interpreter for 20 years, I believe AI will replace human translators sooner or later, at least in most day to day translation and the majority of conference interpreting. The former probably in 3-5 years, the latter in 10 years.”

She added that her opinions were informed by the fact that she frequently did translation work for IT companies. As such, she was well aware of the speed at which AI and processor chips were advancing, and hence did not encourage young people to view translation and interpreting as a lifelong career, since she considers it a sunset industry.

Reference: https://bit.ly/2qGLhxu

SDL and TAUS Integration Offers Brands Industry Benchmarking Framework

SDL, a leader in global content management, translation and digital experience, today announced an integration between SDL Translation Management System (TMS), and the TAUS Dynamic Quality Framework (DQF), a comprehensive set of tools that help brands benchmark the quality, productivity and efficiency of translation projects against industry standards.

The SDL TMS integration with TAUS DQF enables everyone involved in the translation supply chain – from translators and reviewers to managers – to improve the performance of their translation projects by learning from peers and implementing industry best practice. Teams can also use TAUS’ dynamic tools and models to assess and compare the quality of their translation output – both human and machine – with the industry’s averages for errors, fluency and post-editing productivity.

This enables brands to maintain quality – at extreme scale – and eliminate inefficiencies in the way content is created, managed, translated, and delivered to global audiences.

“One marketing campaign alone could involve translating over 50 pieces of content – and that’s just in one language. Imagine the complexity involved in translating content into over a dozen languages?” said Jim Saunders, Chief Product Officer, SDL. “Brands need a robust way to ensure quality when dealing with such high volumes of content. Our ongoing integrations with TAUS DQF tackle this challenge by fostering a knowledge-filled environment that creates improved ways to deliver and translate content.”

“Translating large volumes of content quickly can present enormous quality issues, and businesses are increasingly looking to learn from peers – and implement best-practices that challenge traditional approaches,” said TAUS Director, Jaap van der Meer. “Our development teams have worked closely with SDL to develop an integration that encourages companies not just to maintain high standards, but innovate and grow their business.”

The TAUS DQF offers a comprehensive set of tools, best practices, metrics, reports and data to help the industry set benchmarking standards. Its Quality Dashboard is available as an industry-shared platform, where evaluation and productivity data is presented in a flexible reporting environment. SDL TMS, now integrated within the TAUS DQF, is used by many Fortune 500 companies across most industries.

SDL already provides TAUS-ready packages for enterprises with its other language solutions. Customers of SDL WorldServer benefit from a connector to the TAUS DQF platform, enabling project managers to add and track a project’s productivity on the TAUS Quality Dashboard. Users can access both SDL WorldServer and SDL TMS through their SDL Trados Studio desktop, making it easy to share projects with the TAUS platform.

All SDL’s integrations with TAUS are designed to help centralize and manage a brand’s translation operations, resulting in lower translation costs, higher-quality translations and more efficient translation processes.

Reference: https://bit.ly/2EslqhA

What happened at the TAUS Asia Conference 2018?

On 22-23 March, 2018, part of the TAUS team was in Beijing for the TAUS Asia Conference. It was the sixth time that TAUS came to China, but we quickly realized that it should actually be an annual event on our calendar. This was the first TAUS conference ever hosted by a university, namely the Beijing Language and Culture University (BLCU).

BLCU was established in 1962 and is located in the Haidian District in Beijing. It offers bachelor’s and master’s programs in 8 languages, and also teaches computer science and technology and digital media, as well as a translation and interpretation major. In 2011, the university set up the first localization department in China. A tour through the classrooms impressed us all: the students practice their interpreting skills on high-tech equipment identical to that found in the European Parliament.

Being at this prestigious university, in the “Hall of Future Global Translation Talents”, was a perfect fit for TAUS and our plan to have, for the very first time, live automatic interpretation technology (using Microsoft Translator) running throughout the program, with the help of Mark Seligman from Spoken Translation. We find ourselves at a crossroads amid the rapid revolution of neural MT and artificial intelligence, and realize what a huge impact technology will have on the future of the translator profession. In addition to the live automatic interpretation, four BLCU students provided live interpretation from the professional interpretation booths and via devices handed out to the attendees at the university. They confessed to being a bit nervous when they realized they were ‘competing’ with the live automatic interpretation from Microsoft.

At the end of the conference we invited the four interpreters and automatic-interpretation lead Mark Seligman on stage to evaluate how the different interpretation methods competed or interacted with each other. Before the conference, one of the students, Zhu Qiankun, had noted Microsoft’s announcement that its translation quality is at human parity with professional human translations and significantly exceeds the quality of crowd-sourced non-professional translations. He found this declaration somewhat unreasonable and wrote an essay about it, before he even knew he would be working as a human interpreter at the TAUS Asia Conference. After teaming up with the machine to interpret the presentations at the conference, he wrote another essay with his findings. Although his overall view on machines taking over human jobs did not change, he found there were advantages to human and machine working together: seeing the live translation transcription projected on the large screen helped the interpreters work faster and more accurately, and numbers, names and dates were translated better by the machine than by the human interpreters. The live automatic interpretation experiment will be repeated at the upcoming TAUS Executive Forum in Tokyo (on 16 and 17 May).

The conference kicked off with a keynote address from Francis Tsang, President of China at LinkedIn. Francis provided deep insights into the Chinese market, with plenty of facts and figures about the workforce and trends. It was a perfect start to two days of brainstorming and knowledge sharing at this prestigious venue. This was followed by a CEO conversation starring Marcus Casal from Lionbridge, Henry He from TransN and Adolfo Hernandez from SDL. TransN is the largest translation service provider in China, and Henry He provided some great insights of his own into the Chinese market. We quickly figured out that some of the recent trends in the western part of the world, such as blockchain technology, are also very much on the minds of people in China. Francis’s and Henry’s speeches confirmed many of the things TAUS had predicted in the Nunc Est Tempus eBook that came out last December (see chapter: China’s Turn).

Over the next 48 hours we saw innovative presentations from many Chinese companies as well as western companies. For example: Alibaba presented their work with AI and cross border e-commerce, Niutrans showed their latest developments in MT technology, TalkingChina showcased their advances in boutique translation, and Johnson and Johnson gave a crash course on the challenges of pharmaceutical translation with a focus on China. In the Game Changers Innovation Contest we saw nine innovative technologies, ideas or perspectives. The most original idea came from Tianqi Zhang at the Universitat Autonoma de Barcelona, who showed how machine translation can advance the way football is reported all over the world. It’s no surprise that her innovative and unique research was voted the winner of the Game Changers Innovation Contest Beijing 2018.

With over 160 attendees, the TAUS Asia Conference Beijing 2018 was our biggest conference to date. As always, the participants comprised a good balance between buyers and providers. And since we were at the university, we also had some great representatives of the academic world.

The last session of the conference focused on talent – bringing together the academic and business worlds. Alex Han (professor at BLCU and TAUS representative) gave a presentation on what he thinks ‘the future translator’ will look like. The skills required of translators are changing, and Alex is taking the lead in adapting study programs to meet these changing needs. Frans de Laet, a guest professor at BLCU, presented his ideas about the humanization of machines in relation to translator jobs. You’ll find a thorough report on this session, as well as the others, in the Keynotes eBook coming out in April.

I think it’s safe to say that the TAUS Asia Conference 2018 was a great success. Lots of new perspectives and ideas were shared and brainstormed among the speakers and attendees, and new connections were made and social networks enriched. We are looking forward to coming back to China again soon!

Reference: https://bit.ly/2qdWZzq

Machine Translation and Compliance

Compliance management is no simple task in today’s world. The sheer volume of data involved is intimidating enough. But when that data is in multiple languages, you have an additional layer of complexity to manage as well as another significant expense to budget for.

Machine translation is no replacement for expert human translators. But it can help solve some of the compliance problems multicultural organizations face.

Internal Compliance Monitoring

Ideally, organizations should aspire to catch (and end) compliance issues as early as possible. Firing employees is an expense in and of itself, and if you address these issues quickly you can often solve the problem with education rather than termination. Meanwhile, whether the behavior in question is illegal, unethical or just plain risky, the sooner you put a stop to it, the less likely you are to get stuck with expensive fines.

Is your organization monitoring employee communication to identify concerning behavior? Machine translation makes it possible to understand, analyze, and review large amounts of archived data in foreign languages, so you can stop problems before they start.
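A monitoring workflow of this kind can be sketched as a translate-then-flag pipeline. Everything below is illustrative: the `machine_translate` stub stands in for a call to a secure MT engine, and the risk terms are invented examples, not a real watchlist.

```python
# Sketch of a translate-then-flag compliance monitoring pipeline.
# The MT step is stubbed with a tiny word-for-word dictionary; in practice
# it would call a secure (never free/public) machine translation service.

RISK_TERMS = {"kickback", "off the books", "destroy the records"}

def machine_translate(text, source_lang):
    """Placeholder for a secure MT engine call (illustrative only)."""
    translations = {
        "es": {"soborno": "kickback"},
    }
    words = text.lower().split()
    return " ".join(translations.get(source_lang, {}).get(w, w) for w in words)

def flag_messages(archive):
    """Translate archived messages to English, then flag risky phrases."""
    flagged = []
    for msg in archive:
        english = machine_translate(msg["text"], msg["lang"])
        if any(term in english for term in RISK_TERMS):
            flagged.append(msg["id"])
    return flagged

archive = [
    {"id": 1, "lang": "es", "text": "el soborno fue aprobado"},
    {"id": 2, "lang": "es", "text": "la reunión es mañana"},
]
print(flag_messages(archive))  # → [1]
```

The point of the sketch is the division of labor: MT makes the multilingual archive searchable at scale, and only the flagged messages need to go to human reviewers.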

eDiscovery Compliance

Businesses today generate vast amounts of electronic documents and communications. That makes eDiscovery like looking for needles in a haystack, sifting through tons of irrelevant information to find materials that are relevant to the case. And of course, there are penalties for not identifying and producing all of the necessary documents in a timely manner.

The most workable solution is appropriately deployed machine translation, followed by review and post-editing by human experts when required. Machine translation is not a substitute for human translators. That said, in large cross-border cases, machine translation can be used to produce documents for opposing counsel, and then human translators can translate only those documents that seem relevant. Machine translation can also help your team identify and classify large numbers of documents for review.

Data security

Using machine translation when applicable can also improve data security, as long as the platform used is secure. (Note: That means free platforms are strictly off limits!) No matter how careful your employees are, each person who accesses a document creates a new security risk. Machine translation can reduce the number of people who need that access, thereby reducing security vulnerabilities.

Machine Translation and Compliance Budgets

As the cost of compliance goes up, so does the pressure for businesses to make their compliance procedures more efficient. Machine translation can help optimize your compliance budget by only using human translators when necessary.

When Machine Translation is a Compliance Nightmare

When wielded wisely, machine translation can be a powerful weapon in your compliance arsenal. But it can also be risky. For instance, if individuals in your organization rely on free online translation services, your data security could be at risk.

Last year, employees at Norway’s Statoil discovered that sensitive data translated using Translate.com’s free MT tool was available to the public via a simple Google search.

Though the quality of machine translation has improved by leaps and bounds during the past few years, it’s still not a substitute for human translators when clear and accurate translations are required. If inaccuracies make your translations misleading or incomprehensible, that’s a compliance risk, too.

Reference: https://goo.gl/krFhns

How to Cut Localization Costs with Translation Technology

What is translation technology?

Translation technologies are sets of software tools designed to process translation materials and help linguists in their everyday tasks. They are divided into three main subcategories:

Machine Translation (MT)

Translation tasks are performed by machines (computers), either on the basis of statistical models (MT engines execute translation tasks on the basis of accumulated translated materials) or neural models (MT engines based on artificial intelligence). The computer-translated output is then edited by professional human linguists through a process called post-editing, which may be more or less demanding depending on the language combination, the complexity of the materials and the volume of content.

Computer-Aided Translation (CAT)

Computer-aided or computer-assisted translation is performed by professional human translators who use specific CAT or productivity software tools to optimize their process and increase their output.

Providing a perfect combination of technological advantages and human expertise, CAT software packages are the staple tools of the language industry. CAT tools are essentially advanced text editors that break the source content into segments, and split the screen into source and target fields which in and of itself makes the translator’s job easier. However, they also include an array of advanced features that enable the optimization of the translation/localization process, enhance the quality of output and save time and resources. For this reason, they are also called productivity tools.
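The segmentation step at the heart of every CAT editor can be sketched in a few lines. This is a simplified rule (split on sentence-ending punctuation); real tools use configurable segmentation rules that handle abbreviations, lists and markup.

```python
import re

def segment(text):
    """Split source content into sentence-level segments, as a CAT tool's
    editor grid would (simplified: split after ., ! or ? followed by space)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

source = "The printer is offline. Check the cable. Restart if needed."
for i, seg in enumerate(segment(source), 1):
    print(i, seg)
# → 1 The printer is offline.
#   2 Check the cable.
#   3 Restart if needed.
```

Each numbered segment becomes one row in the source/target grid, which is what makes per-segment matching against a translation memory possible in the first place.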

Figure 1 – CAT software in use

The most important features of productivity tools include:

  • Translation Asset Management
  • Advanced grammar and spell checkers
  • Advanced source and target text search
  • Concordance search

Standard CAT tools include Across Language Server, SDL Trados Studio, SDL GroupShare, SDL Passolo, memoQ, Memsource Cloud, Wordfast, Translation Workspace and others, and they come in the form of both installed software and cloud solutions.

Quality Assurance (QA)

Quality assurance tools are used for various quality control checks during and after the translation/localization process. These tools use sophisticated algorithms to check spelling, consistency, general and project-specific style, code and layout integrity and more.

All productivity tools have built-in QA features, but there are also dedicated quality assurance tools such as Xbench and Verifika QA.
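A typical automated QA pass can be illustrated with a consistency check: do the numbers and placeholders in the source segment survive into the target? This is a minimal sketch of the idea, not the behavior of any particular QA tool.

```python
import re

def qa_check(source, target):
    """Flag numbers or {placeholders} present in the source segment but
    missing from the target -- a typical automated QA consistency check."""
    issues = []
    for pattern, label in [(r"\d+(?:\.\d+)?", "number"), (r"\{[^}]+\}", "placeholder")]:
        missing = set(re.findall(pattern, source)) - set(re.findall(pattern, target))
        for item in sorted(missing):
            issues.append(f"{label} '{item}' missing from target")
    return issues

# A clean translation passes; spelling out "30" and dropping the placeholder fails.
print(qa_check("Press {OK} within 30 seconds.",
               "Drücken Sie {OK} innerhalb von 30 Sekunden."))  # → []
print(qa_check("Press {OK} within 30 seconds.",
               "Drücken Sie innerhalb von dreißig Sekunden."))
```

Checks like this are cheap to run on every segment, which is why QA tools can verify whole projects in seconds while a human reviewer focuses on style and meaning.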

What is a translation asset?

We all know that information has value and the same holds true for translated information. This is why previously translated/localized and edited textual elements in a specific language pair are regarded as translation assets in the language industry – once translated/localized and approved, textual elements do not need to be translated again and no additional resources are spent. These elements that are created, managed and used with productivity tools include:

Translation Memories (TM)

Translation memories are segmented databases containing previously translated elements in a specific language pair that can be reused and recycled in further projects. Productivity software calculates the percentage of similarity between the new content for translation/localization and the existing segments that were previously translated, edited and proofread, and the linguist team is able to access this information, use it and adapt it where necessary. This percentage has a direct impact on costs associated with a translation/localization project and the time required for project completion, as the matching segments cost less and require less time for processing.
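The "percentage of similarity" can be approximated with a standard string-similarity measure. The sketch below uses Python's `difflib`; commercial CAT tools use their own proprietary fuzzy-match algorithms, so treat this as an illustration of the concept rather than any tool's actual scoring.

```python
from difflib import SequenceMatcher

def match_percentage(new_segment, tm_segment):
    """Approximate a CAT tool's fuzzy-match score: how similar a new source
    segment is to an existing TM entry, as a percentage."""
    ratio = SequenceMatcher(None, new_segment.lower(), tm_segment.lower()).ratio()
    return round(ratio * 100)

tm_entry = "Remove the cover before cleaning the device."
# A near-identical segment scores high (a "fuzzy match")...
print(match_percentage("Remove the cover before cleaning the unit.", tm_entry))
# ...while an unrelated segment scores low ("new words", full rate).
print(match_percentage("Unrelated sentence about something else.", tm_entry))
```

Segments scoring above a tool's match threshold are offered to the linguist for adaptation instead of being translated from scratch, which is where the cost and time savings come from.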

Figure 2 – Translation memory in use (aligned sample from English to German)

Translation memories are usually developed during the initial stages of a translation/localization project, and they grow over time, progressively cutting localization costs and reducing the time required for project completion. For this very reason, however, translation memories require regular maintenance (i.e. cleaning), as the original content may change and new terminology may be adopted.

When an approved translation of a document exists but was produced without productivity tools, a translation memory can be created through the process of alignment:

Figure 3 – Document alignment example

Source and target documents are broken into segments that are subsequently matched to produce a TM file that can be used for a project.
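The matching step can be sketched as follows. Real alignment tools also handle sentences that were merged or split in translation; this naive version assumes a clean one-to-one mapping, which is why aligned output normally still gets a human review pass.

```python
def align(source_segments, target_segments):
    """Naive 1:1 alignment: pair each source segment with the target segment
    at the same position to produce TM entries. Assumes no merged or split
    sentences; mismatched counts are sent back for manual review."""
    if len(source_segments) != len(target_segments):
        raise ValueError("segment counts differ; manual review needed")
    return list(zip(source_segments, target_segments))

tm = align(
    ["Turn off the device.", "Unplug the cable."],
    ["Schalten Sie das Gerät aus.", "Ziehen Sie das Kabel ab."],
)
print(tm)
# → [('Turn off the device.', 'Schalten Sie das Gerät aus.'),
#    ('Unplug the cable.', 'Ziehen Sie das Kabel ab.')]
```

Each resulting pair is one TM entry; exported (typically as a TMX file), the pairs become reusable assets for every future project in that language pair.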

Termbases (TB)

Termbases or terminology bases (TB) are databases containing translations of specific terms in a specific language pair that provide assistance to the linguist team and assure lexical consistency throughout projects.

Termbases can be developed before the project, when specific terminology translations have been confirmed by all stakeholders (client, content producer, linguist), or during the project, as the terms are defined. They are particularly useful in the localization of medical devices, technical materials and software.
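In practice, a termbase behaves like a lookup table the CAT tool consults per segment. The sketch below shows the idea with two invented English–German medical terms; real termbases also carry metadata such as definitions, approval status and forbidden variants.

```python
# Minimal termbase lookup. The term pairs are illustrative examples only.
termbase = {
    "catheter": "Katheter",
    "infusion pump": "Infusionspumpe",
}

def terms_for_segment(segment):
    """Return the approved target-language terms that apply to a source
    segment, so the linguist (or a QA check) can enforce them."""
    lower = segment.lower()
    return {src: tgt for src, tgt in termbase.items() if src in lower}

print(terms_for_segment("Connect the infusion pump to the catheter."))
# → {'catheter': 'Katheter', 'infusion pump': 'Infusionspumpe'}
```

Because the lookup runs automatically for every segment, the same approved term is surfaced to every linguist on the project, which is how termbases guarantee lexical consistency even across large teams.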

Glossaries

Unlike termbases, glossaries are monolingual documents explaining specific terminology in either source or target language. They provide further context to linguists and can be used for the development of terminology bases.

Benefits of Translation Technology

The primary purpose of all translation technology is the optimization and unification of the translation/localization process, as well as providing the technological infrastructure that facilitates work and full utilization of the expertise of professional human translators.

As we have already seen, translation memories, once developed, provide an immediate price reduction that varies with the source materials and the number of matching segments, but may reach 20% in the initial stages and grow over time. However, the long-term, more subtle benefits of the smart integration of translation technology are the ones that really make a difference. They include:
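How TM matches translate into money can be made concrete with a small cost model. The match bands and discounts below are hypothetical round numbers; actual rate grids vary by vendor and are usually finer-grained.

```python
# Sketch of TM-driven cost savings. Bands and discounts are hypothetical.
BAND_DISCOUNTS = [
    (100, 0.70),  # exact (100%) match: 70% off the full per-word rate
    (85, 0.40),   # high fuzzy match (85-99%): 40% off
    (0, 0.00),    # new words: full rate
]

def project_cost(word_counts_by_match, full_rate):
    """word_counts_by_match maps a match percentage to a word count;
    each band is billed at the full rate minus its discount."""
    total = 0.0
    for match, words in word_counts_by_match.items():
        discount = next(d for threshold, d in BAND_DISCOUNTS if match >= threshold)
        total += words * full_rate * (1 - discount)
    return round(total, 2)

# 10,000 words at $0.10/word: 3,000 exact matches, 2,000 fuzzies, 5,000 new.
print(project_cost({100: 3000, 90: 2000, 0: 5000}, 0.10))  # → 710.0
print(project_cost({0: 10000}, 0.10))                      # → 1000.0 (no TM)
```

In this invented scenario the TM cuts the bill from $1,000 to $710, about 29%, and the share of matched words only grows as the TM matures.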

Human Knowledge with Digital Infrastructure

While machine translation has its applications, it still does not reliably yield results suitable for commercial purposes. All machine-translated output needs to be post-edited by professional linguists, and this process can end up taking more time and resources rather than less.

On the other hand, translation performed in productivity tools is performed by people, translation assets are checked and approved by people, specific terminology is developed in collaboration with the client, content producers, marketing managers, subject-field experts and all other stakeholders, eventually providing a perfect combination of human expertise, feel and creativity, and technological solutions.

Time Saving

Professional human linguists are able to produce more in less time. Productivity software, TMs, TBs and glossaries all reduce the valuable hours of research and translation, and enable linguists to perform their tasks in a timely manner, with technological infrastructure acting as a stylistic and lexical guide.

This eventually enables the timely release of a localized product/service, with all the necessary quality checks performed.

Consistent Quality Control

The use of translation technology itself represents real-time quality control, as linguists rely on previously proofread and quality-checked elements, and maintain the established style, terminology and quality used in previous translations.

Brand Message Consistency

Translation assets enable the consistent use of a particular tone, style and intent of the brand in all translation/localization projects. This means that the specific features of a corporate message for a particular market/target group will remain intact even if the linguist team changes on future projects.

Code / Layout Integrity Preservation

Translation technology enables the preservation of features of the original content across translated/localized versions, regardless of whether the materials are intended for printing or online publishing.

Different solutions are developed for different purposes. For example, advanced cloud-based solutions for the localization of WordPress-powered websites enable full preservation of codes and other technical elements, save a lot of time and effort in advance and optimize complex multilingual localization projects.

Wrap-up

In the larger scheme of things, all these benefits eventually spell long-term cost/time savings and a leaner translation/localization process, thanks to preventive functions that, in addition to direct price reduction, provide consistency, quality control and preservation of the integrity of source materials.

Reference: https://goo.gl/r5kmCJ