NEURAL MACHINE TRANSLATION: THE RISING STAR

These days, language industry professionals simply can’t escape hearing about neural machine translation (NMT). However, there still isn’t enough information about the practical facts of NMT for translation buyers, language service providers, and translators. People often ask: is NMT intended for me? How will it change my life?

A Short History and Comparison

At the beginning of time – around the 1970s – the story began with rule-based machine translation (RBMT) solutions. The idea was to create grammatical rule sets for source and target languages, where machine translation is a kind of conversion process between the languages based on these rule sets. This concept works well with generic content, but adding new content, new language pairs, and maintaining the rule set is very time-consuming and expensive.

This problem was solved with statistical machine translation (SMT) around the late ‘80s and early ‘90s. SMT systems create statistical models by analyzing aligned source-target language data (training set) and use them to generate the translation. The advantage of SMT is the automatic learning process and the relatively easy adaptation by simply changing or extending the training set. The limitation of SMT is the training set itself: to create a usable engine, a large database of source-target segments is required. Additionally, SMT is not language independent in the sense that it is highly sensitive to the language combination and has a very hard time dealing with grammatically rich languages.

This is where neural machine translation (NMT) begins to shine: it can look at the sentence as a whole and can create associations between phrases even across long distances within the sentence. The result is convincing fluency and improved grammatical correctness compared to SMT.

Statistical MT vs Neural MT

Both SMT and NMT work on a statistical basis and use source-target language segment pairs as their raw material. So what’s the difference? What we typically call SMT is actually Phrase-Based Statistical Machine Translation (PBSMT), meaning that SMT splits the source segments into phrases. During the training process, SMT creates a translation model and a language model. The translation model stores the different translations of the phrases, and the language model stores the probability of the sequence of phrases on the target side. During the translation phase, the decoder chooses the translation that gives the best result based on these two models. On a phrase or expression level, SMT (or PBSMT) performs well, but language fluency and grammar are weak.

‘Buch’ is aligned with ‘book’ twice and only once with ‘the’ and ‘a’ – the winner is the ‘Buch’-’book’ combination
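
To make the counting idea concrete, here is a minimal sketch in Python of how a phrase-based system turns alignment counts into translation-model probabilities. The toy data mirrors the ‘Buch’/‘book’ example above; the code is an illustration, not taken from any real SMT toolkit.

  from collections import Counter

  # Aligned word pairs observed in a (tiny) training corpus:
  # 'Buch' aligned with 'book' twice, once each with 'the' and 'a'.
  aligned_pairs = [
      ("Buch", "book"), ("Buch", "book"),
      ("Buch", "the"), ("Buch", "a"),
  ]

  counts = Counter(aligned_pairs)
  total = sum(n for (src, _), n in counts.items() if src == "Buch")

  # Relative-frequency estimate of the translation model P(target | source).
  for (src, tgt), n in sorted(counts.items(), key=lambda kv: -kv[1]):
      print(f"P({tgt} | {src}) = {n / total:.2f}")

  # 'book' wins with 0.50 versus 0.25 each for 'the' and 'a'. During decoding,
  # these scores are combined with a language model to pick the best target.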

Neural Machine Translation, on the other hand, uses deep, neural network-based machine learning technology. Words or even word chunks are transformed into “word vectors”. This means that ‘dog’ does not only represent the characters d, o and g, but can also carry contextual information from the training data. During the training phase, the NMT system tries to set the parameter weights of the neural network based on the reference values (source-target translations). Words appearing in similar contexts get similar word vectors. The result is a neural network which can process source segments and transfer them into target segments. During translation, NMT looks at the complete sentence, not just chunks (phrases). Thanks to the neural approach, it does not translate words; it transfers information and context. This is why fluency is much better than in SMT, but terminology accuracy is sometimes not perfect.

Similar words are closer to each other in a vector space
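
As a rough illustration of what “closer in a vector space” means, the snippet below compares made-up three-dimensional word vectors using cosine similarity. Real NMT systems learn embeddings with hundreds of dimensions; the vectors and values here are invented for the example.

  import math

  # Toy word vectors: words seen in similar contexts get similar vectors.
  vectors = {
      "dog":  [0.8, 0.1, 0.3],
      "cat":  [0.7, 0.2, 0.3],   # similar context to 'dog'
      "book": [0.1, 0.9, 0.5],   # very different context
  }

  def cosine(u, v):
      dot = sum(a * b for a, b in zip(u, v))
      norm_u = math.sqrt(sum(a * a for a in u))
      norm_v = math.sqrt(sum(b * b for b in v))
      return dot / (norm_u * norm_v)

  print(cosine(vectors["dog"], vectors["cat"]))   # ~0.99: very close
  print(cosine(vectors["dog"], vectors["book"]))  # ~0.36: far apart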

The Hardware

A popular GPU: NVIDIA Tesla

One big difference between SMT and NMT systems is that NMT requires Graphics Processing Units (GPUs), which were originally designed to help computers process graphics. These GPUs can calculate astonishingly fast – the latest cards have about 3,500 cores which can process data simultaneously. In fact, there is a small ongoing hardware revolution, and GPU-based computers are the foundation for almost all deep learning and machine learning solutions. One of the great perks of this revolution is that nowadays, NMT is available not only to large enterprises but to small and medium-sized companies as well.

The Software

The main element, or ‘kernel’, of any NMT solution is the so-called NMT toolkit. There are a couple of NMT toolkits available, such as Nematus or OpenNMT, but the landscape is changing fast and more companies and universities are now developing their own toolkits. Since many of these toolkits are open-source solutions and hardware resources have become more affordable, the industry is experiencing an accelerating pace of toolkit R&D and NMT-related solutions.

On the other hand, as important as toolkits are, they are only one small part of a complex system, which contains frontend, backend, pre-processing and post-processing elements, parsers, filters, converters, and so on. These are all factors for anyone to consider before jumping into the development of an individual system. However, it is worth noting that the success of MT is highly community-driven and would not be where it is today without the open source community.

Corpora

A famous bilingual corpus: the Rosetta Stone

And here comes one of the most curious questions: what are the requirements of creating a well-performing NMT engine? Are there different rules compared to SMT systems? There are so many misunderstandings floating around on this topic that I think it’s a perfect opportunity to go into the details a little bit.

The main rules are nearly the same both for SMT and NMT systems. The differences are mainly that an NMT system is less sensitive and performs better in the same circumstances. As I have explained in an earlier blog post about SMT engine quality, the quality of an engine should always be measured in relation to the particular translation project for which you would like to use it.

These are the factors which will eventually influence the performance of an NMT engine:

Volume

Regardless of what you may have heard, volume is still very important for NMT engines, just as in the SMT world. There is no explicit rule on entry volumes, but what we can safely say is that the bare minimum is about 100,000 segment pairs. There are Globalese users who are successfully using engines created from 150,000 segments, but to be honest, this is more of an exception and requires special circumstances (like the right language combination, see below). The optimum volume starts around 500,000 segment pairs (2 million words).

Quality

The quality of the training set plays an important role (garbage in, garbage out). Don’t add unqualified content to your engine just to increase the overall size of the training set.

Relevance

Applying the right engine to the right project is the first key to success. An engine trained on automotive content will perform well on car manual translation but will give back disappointing results when you try to use it for web content for the food industry.

This raises the question of whether the content (TMs) should be mixed. If you have enough domain-specific content, you don’t necessarily need to add out-of-domain data to your engine, but if you have an insufficient volume of domain-specific data, then adding generic content (e.g. from public sources) may help improve the quality. We always encourage our Globalese users to try different engine combinations with different training sets.

Content type

Content generated by possibly non-native speakers on a chat forum, or marketing material requiring transcreation, is always a challenge for any MT system. On the other hand, technical documentation with controlled language is a very good candidate for NMT.

Language combination

Unfortunately, language combination still has an impact on quality. The good news is that NMT has now opened up the option of using machine translation for languages like Japanese, Turkish, or Hungarian – languages which had nearly been excluded from the machine translation club because of the poor results provided by SMT. NMT has also helped solve the problem of long-distance dependencies for German, and the translation output is much smoother for almost all languages. But English combined with Romance languages still provides better results than, for example, English combined with Russian when using similar volumes and training set quality.

Expectations for the future

Neural Machine Translation is a big step ahead in quality, but it still isn’t magic. Nobody should expect that NMT will replace human translators anytime soon. What you CAN expect is that NMT can be a powerful productivity tool in the translation process and open new service options both for translation buyers and language service providers (see post-editing experience).

Training and Translation Time

When we started developing Globalese NMT, one of the most surprising experiences for us was that the training time was far shorter than we had previously anticipated. This is due to the amazingly fast evolution of hardware and software. With Globalese, we currently have an average training speed of 50,000 segments per hour – this means that an average engine with 1 million segments can be trained within one day. The situation is even better when looking at translation times: with Globalese, we currently have an average translation speed of between 100 and 400 segments per minute, depending on the corpus size, segment length in the translation, and training content.
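
As a quick back-of-the-envelope check of those figures (the throughput numbers are the ones quoted above; the 10,000-segment project is a made-up example):

  training_speed = 50_000              # segments per hour, as quoted above
  engine_size = 1_000_000              # segments in an "average" engine
  print(engine_size / training_speed)  # 20.0 hours -> trainable within a day

  # Translation speed ranges from 100 to 400 segments per minute.
  project = 10_000                     # hypothetical project size in segments
  for speed in (100, 400):
      print(f"{project / speed:.0f} minutes at {speed} segments/minute")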

Neural MT Post-editing Experience

One of the great changes neural machine translation brings along is that the overall language quality is much better compared to the SMT world. This does not mean that the translation is always perfect. As stated by one of our testers: when it is right, it is astonishingly good quality. The ratio of good to poor translation naturally varies depending on the engine, but good engines can produce really good target text about 50% of the time, or even more often.

Here are some examples showcasing what NMT post-editors can expect:

DE original:

Der Rechnungsführer sorgt für die gebotenen technischen Vorkehrungen zur wirksamen Anwendung des FWS und für dessen Überwachung.

Reference human translation:

The accounting officer shall ensure appropriate technical arrangements for an effective functioning of the EWS and its monitoring.

Globalese NMT:

The accounting officer shall ensure the necessary technical arrangements for the effective use of the EWS and for its monitoring.

As you can see, the output is fluent, and the differences are more or less just preferential. This highlights another issue: automated quality metrics like the BLEU score are not really sufficient to measure quality. The example above is only about a 50% match in terms of BLEU, but judged on quality, the rating should be much higher.
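
As an illustration, here is a minimal sketch of scoring the sentence pair above with NLTK’s sentence-level BLEU (assuming the nltk package is installed; the exact number depends on tokenization and smoothing choices):

  from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

  reference = ("the accounting officer shall ensure appropriate technical "
               "arrangements for an effective functioning of the EWS "
               "and its monitoring .").split()
  hypothesis = ("the accounting officer shall ensure the necessary technical "
                "arrangements for the effective use of the EWS "
                "and for its monitoring .").split()

  # Smoothing avoids zero scores when a higher-order n-gram has no match.
  smoothie = SmoothingFunction().method1
  score = sentence_bleu([reference], hypothesis, smoothing_function=smoothie)
  print(f"BLEU: {score:.2f}")  # well below 1.0 despite near-identical meaning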

Let’s look at another example:

EN original:

The concept of production costs must be understood as being net of any aid but inclusive of a normal level of profit.

Reference human translation:

Die Produktionskosten verstehen sich ohne Beihilfe, aber einschließlich eines normalen Gewinns.

Globalese NMT:

Der Begriff der Produktionskosten bezieht sich auf die Höhe der Beihilfe, aber einschließlich eines normalen Gewinns.

What is interesting here is that the first part of the sentence sounds good, but if you look at the content, the translation is not good. This is an example of fluent output with a bad translation. It is a typical case in the NMT world, and it emphasizes the point that post-editors must examine NMT output differently than they did with SMT – in SMT, bad grammar was a clear indicator that the translation had to be post-edited.

Post-editors who used to proof and correct SMT output have to change the way they work and be more careful with proofreading, even if the NMT output looks alright at first glance. Services related to light post-editing will also change – instead of correcting serious grammatical errors without checking the correctness of the translation in order to create readable content, the focus will shift to sorting out serious mistranslations. The funny thing is that one of the main problems in the SMT world was weak fluency and grammar, and now, in the NMT world, good fluency and grammar have become the issue…

And finally:

DE original:

Aufgrund des rechtlichen Status der Beteiligten ist ein solcher Vorgang mit einer Beauftragung des liefernden Standorts und einer Berechnung der erbrachten Leistung verbunden.

Reference human translation:

The legal status of the companies involved in these activities means that this process is closely connected with placing orders at the location that is to supply the goods/services and calculating which goods/services they supply.

Globalese NMT:

Due to the legal status of the person, it may lead to this process at the site of the plant, and also a calculation of the completed technician.

This example shows that, unfortunately, NMT can produce bad translations too. As I mentioned before, the ratio of good to bad NMT output you will face in a project always depends on the circumstances. Another weak point of NMT is that it currently cannot handle terminology directly and acts as a kind of “black box”, with no option to directly influence the results.

Reference: https://bit.ly/2hBGsVh

How machine learning can be used to break down language barriers

Machine learning has transformed major aspects of the modern world with great success. Self-driving cars, intelligent virtual assistants on smartphones, and cybersecurity automation are all examples of how far the technology has come.

But of all the applications of machine learning, few have the potential to so radically shape our economy as language translation. Language translation is an ideal problem for machine learning to tackle. Language operates on a set of predictable rules, but with a degree of variation that makes it difficult for humans to interpret. Machine learning, on the other hand, can leverage repetition, pattern recognition, and vast databases to translate faster than humans can.

There are other compelling reasons to believe language will be one of the most important applications of machine learning. To begin with, there are over 6,500 spoken languages in the world, and many of the more obscure ones are spoken by poorer demographics who are frequently isolated from the global economy. Removing language barriers through technology connects more communities to global marketplaces. More people speak Mandarin Chinese than any other language in the world, making China’s growing middle class a prime market for U.S. companies if they can overcome the language barrier.

Let’s take a look at how machine learning is currently being applied to the language barrier problem, and how it might develop in the future.

Neural machine translation

Recently, language translation took an enormous leap forward with the emergence of a new machine translation technology called Neural Machine Translation (NMT). The emphasis should be on the “neural” component because the inner workings of the technology really do mimic the human mind. The architects behind NMT will tell you that they frequently struggle to understand how it comes to certain translations because of how quickly and accurately it delivers them.

“NMT can do what other machine translation methods have not done before – it achieves translation of entire sentences without losing meaning,” says Denis A. Gachot, CEO of SYSTRAN, a language translation technologies company. “This technology is of a caliber that deserves the attention of everyone in the field. It can translate at near-human levels of accuracy and can translate massive volumes of information exponentially faster than we can operate.”

The comparison to human translators is not a stretch anymore. Unlike the days of garbled Google Translate results, which continue to feed late night comedy sketches, NMT is producing results that rival those of humans. In fact, Systran’s Pure Neural Machine Translation product was preferred over human translators 41% of the time in one test.

Martin Volk, a professor at the Institute of Computational Linguistics at the University of Zurich, had this to say about neural machine translation in a 2017 Slator article:

“I think that as computing power inevitably increases, and neural learning mechanisms improve, machine translation quality will gradually approach the quality of a professional human translator over the coming two decades. There will be a point where in commercial translation there will no longer be a need for a professional human translator.”

Gisting to fluency

One telling metric to watch is gisting vs. fluency. Are the translations being produced communicating the gist of an idea, or fluently communicating details?

Previous iterations of language translation technology only achieved the level of gisting. Those translations required extensive human support to be usable. NMT successfully pushes beyond gisting and communicates fluently. Now, with little to no human support, usable translations can be produced at the same level of quality as those produced by humans. Sometimes, the NMT translations are even superior.

Quality and accuracy are the main priorities of any translation effort. Any basic translation software can quickly spit out its best rendition of a body of text. To parse information correctly and deliver a fluent translation requires a whole different set of competencies. Volk also said, “Speed is not the key. We want to drill down on how information from sentences preceding and following the one being translated can be used to improve the translation.”

This opens up enormous possibilities for global commerce. Massive volumes of information traverse the globe every second, and quite a bit of that data needs to be translated into two or more languages. That is why successfully automating translation is so critical. Tasks like e-discovery, compliance, or any other business processes that rely on document accuracy can be accelerated exponentially with NMT.

Education, e-commerce, travel, diplomacy, and even international security work can be radically changed by the ability to communicate in your native language with people from around the globe.

Post language economy

Everywhere you look, language barriers are a speed check on global commerce. Whether that commerce involves government agencies approving business applications, customs checkpoints, massive document sharing, or e-commerce, fast and effective translation is essential.

If we look at language strictly as a means of sharing ideas and coordinating, it is somewhat inefficient. It is linear and has a lot of rules that make it difficult to use. Meaning can be obfuscated easily, and not everyone is equally proficient at using it. But the biggest drawback to language is simply that not everyone speaks the same one.

NMT has the potential to reduce and eventually eradicate that problem.

“You can think of NMT as part of your international go-to-market strategy,” writes Gachot. “In theory, the Internet erased geographical barriers and allowed players of all sizes from all places to compete in what we often call a ‘global economy.’ But we’re not all global competitors, because not all of us can communicate in the 26 languages that have 50 million or more speakers. NMT removes language barriers, enabling new and existing players to be global communicators, and thus real global competitors. We’re living in the post-internet economy, and we’re stepping into the post-language economy.”

Machine learning has made substantial progress but has not yet cracked the code on language. It does have its shortcomings, namely when it faces slang, idioms, obscure dialects of prominent languages and creative or colorful writing. It shines, however, in the world of business, where jargon is defined and intentional. That in itself is a significant leap forward.

Reference: https://bit.ly/2Fwhuku

GDPR. Understanding the Translation Journey

“We only translate content into the languages of the EU, so we are covered with regards GDPR clauses relating to international transfers.”

Right? Wrong.

The GDPR imposes restrictions on the transfer of personal data outside the European Union (EU), to third-party countries or international organizations. While there are provisions that refer to your ability to do this with the appropriate safeguards in place, how confident are you that you’re not jeopardising GDPR-compliance with outdated translation processes?

Let’s consider the following:

  1. 85% of companies cannot identify whether they send personal information externally as part of their translation process.
  2. The translation process is complex – it isn’t a simple case of sending content from you to your translator. Translating one document alone into 10 languages involves 150 data exchanges (or ‘file handoffs’; the arithmetic is sketched after this list). Multiply this by dozens of documents and you have a complex task of co-ordinating thousands of highly sensitive documents – some of which may contain personal data.
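
The scale implied by those numbers can be sketched in a few lines; the per-language handoff count is derived from the 150 figure quoted above, and the document counts are hypothetical.

  languages = 10
  handoffs_per_language = 150 // languages   # 15, as implied by the figure
  for documents in (1, 24, 60):
      exchanges = documents * languages * handoffs_per_language
      print(documents, "documents ->", exchanges, "data exchanges")
  # 1 -> 150; 24 -> 3,600; 60 -> 9,000 file handoffs to keep track of.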

With different file versions, translators, editors, complex graphics, subject matter experts, and in-country reviewers, the truth is that content is flying back and forth around the world faster than we can imagine. Designed with speed of delivery and time to market in mind, these workflows overlook the fact that partners might not share the same compliance credentials.

Where exactly is my data?

Given that we know email is not secure, let’s think about what happens when you use a translation portal or an enterprise translation management system.

Once you’ve transferred the content for translation, the translation agency or provider downloads and processes that data on its premises before allocating the work to linguists and other teams (let’s hope these are in the EU and they are GDPR compliant).

Alternatively, the software you have used to share your content may process the data to come up with your Translation Memory leverage and spend – in which case better check your End User Licence Agreement to ensure you know where that processing (and backup) takes place.

After that has happened the content is distributed to the translators to work on. Even if all the languages you translate into are in the EU – are you SURE that your translators are physically located there too?

And what about your translation agency’s project management team? How exactly do they handle files that require desktop publishing or file engineering? Are these teams located onshore in the EU, or offshore to control costs? If the latter, what systems are they using, and how can you ensure no copies of your files are sitting on servers outside of your control?

These are just some of the questions you should be asking now to fully understand where your translation data is located.

What can I do?

If you haven’t already – now is the time to open a conversation with your partner about your data protection needs and what they are doing as a business to ensure compliance. They should be able to tell you exactly which borders your data crosses during the translation process, where it’s stored and what they’re doing to help with Translation Memory management. They should also provide you with a controlled environment that you can use across the entire translation supply chain, so that no data ever leaves the system.

Of course, there are many considerations to take into account when it comes to GDPR. But looking at the complexity of translating large volumes of content – are you still confident that your translation processes are secure?

Reference: https://bit.ly/2vmKKX5

Europe’s New Privacy Regulation GDPR Is Changing How LSPs Handle Content

GDPR, the General Data Protection Regulation, is soon to be introduced across Europe, and is prompting language service providers (LSPs) to update policies and practices relating to their handling of all types of personal data.

The GDPR comes into effect on 25 May 2018 and supersedes the existing Data Protection Directive of 1995. It introduces some more stringent requirements on how the personal data of EU citizens are treated.

Specifically, LSPs must demonstrate that they are compliant in the way that they handle any type of personal data that at some point flows through their business. Personal data means any information by which a person can be identified, such as a name, location, photo, email address, bank details…the list goes on.

Therefore, LSPs need to ensure that all data, from employee records and supplier agreements to client contact information and content for translation, are handled appropriately.

What personal data do LSPs handle?

Aside from the actual content for translation, an LSP is likely to possess a vast array of personal data including Sales and Marketing data (prospective client details, mailing lists, etc.), existing client data (customer names, emails, POs, etc.), HR and Recruitment data (candidate and employee data including CVs, appraisals, addresses, etc.) and Supplier (freelance) data (bank details, contact details, performance data, CVs, etc.).

In this respect, the challenges that LSPs will face are not significantly different from most other service businesses, and there are lots of resources that outline the requirements and responsibilities for complying with GDPR. For example, the Europa website details some key points, and ICO (for the UK) has a self-assessment readiness toolkit for businesses.

What about content for translation?

Content that a client sends you for translation may also contain personal information. Some of these documents are easy enough to identify by their nature (such as birth, death, and marriage certificates, HR records, and medical records), but personal data might also be considered to extend to the case where you receive an internal communication from a customer that includes a quote from the company CEO, for example.

Short-term challenges

It is important to be able to interpret what the GDPR means for LSPs generally, and for your business specifically. The impact of the regulation will become clearer over time, but it throws up some potentially crucial questions in the immediate term, such as:

  • What the risks are for LSPs who continue to store personal data within translation memories and machine translation engines;
  • What the implications are for sharing personal data with suppliers outside of the EU / EEA, and specifically in countries deemed to be inadequate with respect to GDPR obligations (even a mid-sized LSP would work with hundreds of freelancers outside the EU);
  • How binding corporate rules can be applied to LSPs with a global presence;
  • Whether obliging suppliers to work in an online environment could help LSPs to comply with certain GDPR obligations

Longer-term considerations

While the GDPR presents a challenge to LSPs in the short-term, it may also impact on the longer-term purchasing habits within the industry.

For example, if LSPs are penalized for sharing personal data with freelancers located within inadequate countries (of which there is a long list), LSPs could be forced to outsource translation work within the EU / EEA / adequate countries only or even insource certain language combinations entirely, potentially driving up the cost of translation spend for some languages.

Or, if a client company is penalized for sharing personal data with a subcontractor (i.e. an LSP or freelancer) without the full knowledge and consent of the person the information relates to (known as the data subject), will they be more inclined to employ alternative buying models for their language needs: e.g. to source freelancers directly or via digital marketplaces, or implement in-house translation models of their own?

Get informed

Although most LSPs are well-acquainted with data privacy, there are a lot of unknowns around the impact of GDPR, and LSPs would be wise to tread especially carefully when it comes to handling personal data, in particular post-25 May.

Perhaps the noise around GDPR will turn out to be hot air, but with companies in breach of the regulation facing possible penalties that the GDPR recommends should be “effective, proportionate and dissuasive”, it is essential to get informed, and quickly.

Reference: https://bit.ly/2Jwh9g6

How Lingotek Uses AI to Optimize Vendor Management

Language services vendor management is a complex task. It requires vetting multiple language services providers (LSPs), requesting multiple bids, and comparing different rate structures. It can include literally hundreds of projects to monitor and manage to ensure on-time delivery. Adding to the complexity, LSPs typically use several different computer-assisted translation (CAT) tools and maintain multiple linguistic assets in various offline locations. How well translation is managed has a direct effect on the company’s globalization goals and its ability to execute an agile go-to-market strategy.

No one makes vendor management easier than Lingotek. Our groundbreaking artificial intelligence (AI)-driven app inside our industry-leading translation management system (TMS) is a cost-efficient localization platform that simplifies vendor management, enhances efficiency, accelerates delivery, and optimizes budgets and costs to reduce your translation spend.

What is Artificial Intelligence?

Artificial Intelligence (AI) is simply technology that learns. AI uses data and experience to perform tasks that would otherwise require human intelligence and effort. When applied to Vendor Management, it creates a foundation for trigger-based automation, rule-driven systems, and data collection.

How does Lingotek use AI to optimize vendor management?

Lingotek continues to spearhead innovation in the translation industry with a Vendor Management app that brings AI-driven automation and multilingual business intelligence to translation management. The entire process for managing vendors – vendor selection, tracking costs and spending, and monitoring vendor performance – is now easier and more automated. With this data, organizations can easily and repeatedly select vendors who provide the highest translation quality and who consistently deliver jobs on time.

Integrated & automated vendor selection

The Vendor Management app simplifies and consolidates the process for requesting quotes, setting rates and pricing, choosing vendors, managing deadlines, tracking spending, and measuring translator quality and performance. The dashboard displays all of the information needed for tracking and evaluating which vendors are providing the highest quality translation and meeting deadlines. This gives project managers insights to better manage workloads and resources for maximum throughput.

  • Automatic vendor assignment based on language, industry, timeline, and more (a rule-based sketch follows this list).
  • Automated bid requests, rate charts & invoicing.
  • Monitor costs and billing information within the TMS.
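
Lingotek does not publish the internals of its assignment logic, but rule-driven vendor selection of this kind can be pictured as a simple scoring function over tracked performance data. Everything below – the fields, weights, and sample vendors – is a hypothetical sketch, not Lingotek’s actual API or data model.

  from dataclasses import dataclass

  @dataclass
  class Vendor:
      name: str
      languages: set        # language pairs the vendor covers
      industries: set       # domains the vendor specializes in
      on_time_rate: float   # share of jobs delivered on time (0..1)
      quality: float        # average translation quality rating (0..1)

  def score(v: Vendor, language: str, industry: str) -> float:
      if language not in v.languages:
          return 0.0        # hard requirement: must cover the language
      domain_fit = 1.0 if industry in v.industries else 0.5
      # Weight quality slightly above punctuality; both come from tracked data.
      return domain_fit * (0.6 * v.quality + 0.4 * v.on_time_rate)

  vendors = [
      Vendor("LSP A", {"en-de"}, {"automotive"}, 0.98, 0.92),
      Vendor("LSP B", {"en-de", "en-ja"}, {"legal"}, 0.90, 0.95),
  ]
  best = max(vendors, key=lambda v: score(v, "en-de", "automotive"))
  print(best.name)  # LSP A: covers the language and matches the industry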

Centralized tracking of rates, costs & spending

The Vendor Management app automates many of the steps required to create a language services purchase order and to closely track translation spending. The app also tracks the leveraging of translation memories (TM) to gauge the efficient reuse of linguistic assets across the enterprise, and provides at-a-glance rate charts for quick reference, including:

  • Integrated cost reporting inside the TMS.
  • Total translation expenses by date, job, or vendor.
  • Aggregation of data to simplify invoice creation.

Automatic cost calculation

Lingotek’s vendor management includes auto-calculation of costs, even when specific jobs have been skipped or cancelled. A project manager can manually skip or cancel a phase, target, or entire document.

With the active monitoring offered by our Intelligent Workflows, jobs can also be auto-skipped or auto-cancelled in order to ensure on-time delivery. When this happens, our AI-driven Vendor Management system can proactively alert vendors of the skipped and/or cancelled job, ensure that additional work cannot be performed on those jobs, and then automatically calculate the costs for the work that was completed before the job was cancelled.

This makes invoicing a breeze, as project managers and vendor managers no longer have to worry about notifying vendors of changes made to the project mid-stream, or figure out how much work was done after the fact in order to manually calculate their costs.
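
The billing logic for a cancelled job can be sketched as follows; the rate, the job size, and the function itself are invented for illustration and are not Lingotek’s actual implementation.

  def job_cost(total_words: int, completed_fraction: float,
               rate_per_word: float, cancelled: bool = False) -> float:
      """Bill only the work completed before a cancellation."""
      billable = total_words * (completed_fraction if cancelled else 1.0)
      return billable * rate_per_word

  # A 10,000-word job at $0.12/word, cancelled when 40% of it was translated:
  print(job_cost(10_000, 0.40, 0.12, cancelled=True))  # 480.0
  print(job_cost(10_000, 1.00, 0.12))                  # 1200.0 if completed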

Intelligence & insight to optimize your supply chain

Get more data-driven insight and control over your localization supply chain. The dashboard displays tracking and evaluation information on vendors, so you can easily select vendors who provide the highest translation quality and consistently deliver jobs on time. This gives you much-needed insight to better manage workloads and resources for maximum throughput.

  • Vendor-specific intelligence.
  • Evaluate vendor performance & quality through SLA compliance metrics.
  • Monitor project delivery & efficiency by vendor.
  • Get key metrics on costs, turnaround time, word counts, missed deadlines.

As the technology improves, we recommend that all providers review their operations to learn where they could take best advantage of AI.

–Common Sense Advisory, “The Journey to Project Management Automation”

Discover the Benefits of Lingotek’s AI-Driven Vendor Management

The new Vendor Management app gives enterprise localization managers, vendor managers, and project managers revolutionary new tools for managing multiple language services providers (LSPs) and projects. Automating vendor management provides critical operational efficiency to enable more scalable globalization strategies and to optimize your localization supply chain to create a more cost-efficient localization network.

Lingotek’s AI-driven Vendor Management can reduce the need for project managers to perform routine, automatable tasks. Instead, they can use that time to solve problems that AI can’t. When you implement better process automation, project managers are left with time to perform tasks that are more valuable to the organization. They can focus their time on exception management: problem solving and responding to urgent issues.

Reference: https://bit.ly/2wONm0C

A New Way to Measure NMT Quality

Neural Machine Translation (NMT) systems produce very high quality translations, and are poised to radically change the professional translation industry. These systems require quality feedback / scores on an ongoing basis. Today, the prevalent method is via Bilingual Evaluation Understudy (BLEU), but methods like this are no longer fit for purpose.

A better approach is to have a number of native speakers assess NMT output and rate the quality of each translation. One Hour Translation (OHT) is doing just that: our new NMT index was released in late April 2018 and is fully available for the translation community to use.

A new age of MT

NMT marks a new age in automatic machine translation. Unlike the technologies developed over the past 60 years, the well-trained and tested NMT systems available today have the potential to replace human translators.

Aside from processing power, the main factors that impact NMT performance are:

  • the amount and quality of the initial training materials, and
  • an ongoing quality-feedback process

For an NMT system to work well, it needs to be properly trained, i.e. “fed” with hundreds of thousands (and in some cases millions) of correct translations. It also requires feedback on the quality of the translations it produces.

NMT is the future of translation. It is already much better than previous MT technologies, but issues with training and quality assurance are impeding progress.

NMT is a “disruptive technology” that will change the way most translations are performed. It has taken over 50 years, but machine translation can now be used to replace human translators in many cases.

So what is the problem?

While NMT systems could potentially revolutionize the translation market, their development and adoption are hampered by the lack of quality input, insufficient means of testing the quality of the translations and the challenge of providing translation feedback.

These systems also require a lot of processing power, an issue which should be solved in the next few years, thanks to two main factors. Firstly, Moore’s law, which predicts that processing power doubles every 18 months, also applies to NMT, meaning that processing power will continue to increase exponentially. Secondly, as more companies become aware of the cost benefit of using NMT, more and more resources will be allocated for NMT systems.
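
Taking the quoted 18-month doubling period at face value, the compounding works out as follows (a back-of-the-envelope sketch, not a forecast):

  # Processing power multiplier after n years, doubling every 1.5 years.
  for years in (3, 6, 9):
      print(years, "years ->", 2 ** (years / 1.5), "x the processing power")
  # 3 years -> 4x, 6 years -> 16x, 9 years -> 64x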

Measuring quality is a different and more problematic challenge. Today, algorithms such as BLEU, METEOR, and TER try to predict automatically what a human being would say about the quality of a given machine translation. While these tests are fast, easy, and inexpensive to run (because they are simply software applications), their value is very limited. They do not provide an accurate quality score for the translation, and they fail to estimate what a human reviewer would say about the translation quality (a quick scan of the text in question by a human would reveal the issues with the existing quality tests).

Simply put, translation quality scores generated by computer programs that predict what a human would say about the translation are just not good enough.

With more major corporations – including Google, Amazon, Facebook, Microsoft (Bing), Systran, Baidu, and Yandex – joining the game, producing an accurate quality score for NMT translations becomes a major problem that has a direct negative impact on the adoption of NMT systems.

There must be a better way!

We need a better way to evaluate NMT systems, i.e. something that replicates the original intention more closely and can mirror what a human would say about the translation.

The solution seems simple: instead of having some software try to predict what a human would say about the translation, why not just ask enough people to rate the quality of each translation? While this solution is simple, direct, and intuitive, doing it right and in a way that is statistically significant means running numerous evaluation projects at one time.

NMT systems are highly specialized, meaning that if a system has been trained using travel and tourism content, testing it with technical material will not produce the best results. Thus, each type of material has to be tested and scored separately. In addition, the rating must be done for every major language pair, since some NMT engines perform better in particular languages. Furthermore, to be statistically significant, at least 40 people need to rate each project per language, per type of material, per engine. Besides that, each project should have at least 30 strings.

Checking one language pair with one type of material translated with one engine is relatively straightforward: 40 reviewers each check and rate the same neural machine translation consisting of about 30 strings. This approach produces relatively solid (statistically significant) results, and repeating it over time also produces a trend, i.e. making it possible to find out whether or not the NMT system is getting better.
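
A minimal sketch of aggregating one such evaluation project: 40 reviewers rate the same translation, and the ratings are summarized as a mean with a 95% confidence interval. The 1-5 rating scale and the randomly generated ratings are assumptions for illustration.

  import random
  import statistics

  random.seed(0)
  ratings = [random.randint(3, 5) for _ in range(40)]  # stand-in for real data

  mean = statistics.mean(ratings)
  stdev = statistics.stdev(ratings)
  margin = 1.96 * stdev / len(ratings) ** 0.5  # normal-approximation 95% CI

  print(f"engine score: {mean:.2f} +/- {margin:.2f}")
  # Repeating this over time produces the trend mentioned above: whether
  # the engine is getting better.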

The key to doing this one isolated evaluation is selecting the right reviewers and making sure they do their job correctly. As one might expect, using freelancers for the task requires some solid quality control procedures to make sure the answers are not “fake” or “random.”

At that magnitude (one language, one type of material, one NMT engine, etc.), the task is manageable, even when run manually. It becomes more difficult when an NMT vendor, user, or LSP wants to test 10 languages and 10 different types of material with 40 reviewers each. In this case, each test requires between 400 reviewers (1 NMT engine x 1 type of material x 10 language pairs x 40 reviewers) and 4,000 reviewers (1 NMT engine x 10 types of material x 10 language pairs x 40 reviewers).

Running a human-based quality score is a major task, even for just one NMT vendor. It requires up to 4,000 reviewers working on thousands of projects.

This procedure is relevant for every NMT vendor who wants to know the real value of their system and obtain real human feedback for the translations it produces.

The main challenge is of course finding, testing, screening, training, and monitoring thousands of reviewers in various countries and languages — monitoring their work while they handle tens of thousands of projects in parallel.

The greater good – industry level quality score

Looking at the greater good, what is really needed is a standardised NMT quality score for the industry to employ, measuring all of the various systems using the same benchmark, strings, and reviewers, in order to compare like-for-like performance. Since the performance of NMT systems can vary dramatically between different types of materials and languages, a real human-based comparison using the same group of linguists and the same source material is the only way to produce real comparative results. Such scores will be useful both for the individual NMT vendor or user and for the end customer or LSP trying to decide which engine to use.

To produce the same tests on an industry-relevant level is a larger undertaking. Using 10 NMT engines, 10 types of material, 10 language pairs and 40 reviewers, the parameters of the project can be outlined as follows:

  • Assuming the top 10 language pairs are evaluated, i.e. EN > ES, FR, DE, PT-BR, AR, RU, CN, JP, IT and KR;
  • 10 types of material – general, legal, marketing, finance, gaming, software, medical, technical, scientific, and tourism;
  • 10 leading (web-based) engines – Google, Microsoft (Bing), Amazon, DeepL, Systran, Baidu, Promt, IBM Watson, Globalese and Yandex;
  • 40 reviewers rating each project;
  • 30 strings per test; and
  • 12 words on average per string

This comes to a total of 40,000 separate tests (10 language pairs x 10 types of material x 10 NMT engines x 40 reviewers), each with at least 30 strings, i.e. 1,200,000 strings of 12 words each, resulting in an evaluation of approximately 14.4 million words. This evaluation is needed to create just one instance (!) of a real, comparative, human-based NMT quality index.
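
The arithmetic behind those totals, using the parameters listed above:

  engines, materials, language_pairs, reviewers = 10, 10, 10, 40
  strings_per_test, words_per_string = 30, 12

  tests = engines * materials * language_pairs * reviewers
  strings = tests * strings_per_test
  words = strings * words_per_string

  print(tests)              # 40,000 separate tests
  print(strings)            # 1,200,000 strings to evaluate
  print(words / 1_000_000)  # 14.4 million words for one instance of the index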

The challenge is clear: to produce just one instance of a real viable and useful NMT score, 4,000 linguists need to evaluate 1,200,000 strings equating to well over 14 million words!

The magnitude of the project, the number of people involved and the requirement to recruit, train, and monitor all the reviewers, as well as making sure, in real time, that they are doing the job correctly, are obviously daunting tasks, even for large NMT players, and certainly for traditional translation agencies.

Completing the entire process within a reasonable time (e.g. less than one day), so that the results are “fresh” and relevant makes it even harder.

There are not many translation agencies with the capacity, technology, and operational capability to run a project of that magnitude on a regular basis.

This is where One Hour Translation (OHT) excels. We have recruited, trained, and tested thousands of linguists in over 50 languages, and have already run well over 1,000,000 NMT rating and testing projects for our customers. By the end of April 2018, we published the first human-based NMT quality index (initially covering several engines and domains and later expanding), with the goal of promoting the use of NMT across the industry.

A word about the future

In the future, a better NMT quality index can be built using the same technology NMT is built on, i.e. deep-learning neural networks. Building a neural quality system is just like building an NMT system. The required ingredients are high-quality translations, high volume, and quality ratings / feedback.

With these ingredients, it is possible to build a deep-learning, neural network-based quality control system that will read a translation and score it like a human does. Once the NMT systems are working smoothly and a reliable, human-based quality score/feedback process has been developed, the next step will be to create a neural quality score.

Once a neural quality score is available, it will be further possible to have engines improve each other, and to create a self-learning and self-improving translation system by linking the neural quality score to the NMT engine (obviously it does not make sense to have a closed-loop system, as it cannot improve without additional external data).

With additional external translation data, this system will “teach itself” and learn to improve without the need for human feedback.

Google has done it already. Its AI subsidiary, DeepMind, developed AlphaGo, a neural network computer program that beat the world’s (human) Go champion. AlphaGo is now improving, becoming better and better, by playing against itself again and again – no people involved.

Reference: https://bit.ly/2HDXbTf

AI Interpreter Fail at China Summit Sparks Debate about Future of Profession

Tencent’s AI-powered translation engine, which was supposed to perform simultaneous transcription and interpreting at China’s Boao Forum for Asia last week, faltered badly and became the butt of jokes on social media. It even made headlines in the South China Morning Post, Hong Kong’s main English newspaper – which, incidentally, is owned by Tencent’s key rival Alibaba.

The Boao Forum, held in Hainan Province on April 8-11, 2018, is an annual nonprofit event that was started in 2001. Supported by the region’s governments, its purpose is to further progress and economic integration in Asia by bringing together leaders in politics, business and academia for high-end dialogs and networking.

Tencent is one of the tech giants of China, often dubbed the “B.A.T.” (for Baidu, Alibaba, Tencent; sometimes BATX if one includes Xiaomi). Its best-known products include the instant messengers WeChat and QQ, everyday apps used by just about all Chinese citizens as well as other ethnic Chinese around the world.

WeChat in China is pretty much an all-round, full service lifestyle mobile app in its local Chinese version. You could do just about anything in it these days – from buying train and movie tickets to making mutual fund investments to ordering groceries or an hourly maid from the neighbourhood.

In 2017, Tencent rolled out an AI powered translation engine called “Fanyijun”, which literally translates to “Mr. Translate”, since the Chinese character for “jun” is a polite, literary term for a male person.

What went Wrong?

Fanyijun already powers the in-app translator in WeChat and is also available as a free online service. However, it was supposed to make a high-profile debut at the Boao Forum together with Tencent’s “Zhiling” (literally, “Smart Listening”) speech recognition engine, showcasing the company’s ability to do real-time transcription and interpreting. In retrospect, it seems the publicity effort backfired on Tencent.

To be sure, human interpreters were still on hand to do the bulk of the interpreting work during the forum. However, Tencent used its AI engine to power the live translation and broadcast of some of the side conferences to screens next to the stage and for followers of the event within WeChat.

As a result, many users took screenshots of the embarrassing errors that appeared when the engine frequently went haywire, generating certain words needlessly and repeatedly, or got confused when some speakers spoke in an unstructured manner or used certain terminology incorrectly.

Chinese media cited a Tencent spokesperson who admitted that their system “did make errors” and “answered a few questions wrongly”. But he also said in their defense that the Boao Forum was a high-level, multi-faceted, multi-speaker, multilingual, discussion-based event, and that the environment was sometimes filled with echo and noise, adding to the challenges their system faced.

“They still need humans…”

The gloating hit a crescendo when someone circulated this screenshot from a WeChat group composed of freelance interpreters. It was an urgent request for English simultaneous interpreters to do a live webcast later that day for the Boao Forum.

One group member replied, “They still need humans…” Another said, “Don’t they have an interpreter device?” A third sarcastically added, “Where’s the AI?”

Tencent later clarified that this request was meant for engaging interpreters for their professional news team doing live reporting in Beijing, and not for the simultaneous interpreting team located onsite at the Boao Forum.

Tencent reportedly beat other heavyweight contenders such as Sogou and iFlytek to secure this prestigious demo opportunity at the Boao Forum after a three-month-long process. Sogou is the second-largest search engine in China and also provides a free online translator, built in part by leveraging its investment in the Chinese startup UTH International, which provides translation data and NMT engines. iFlytek is a listed natural language processing (NLP) company worth about USD 13 billion in market capitalization. Its speech recognition software is reportedly used daily by half a billion Chinese users, and it also sells a popular pocket translation device targeted at Chinese tourists going abroad.

But given what went down at the Boao Forum for “Mr. Translator”, Tencent’s competitors are probably seeing their ‘loss’ as a gain now. The social media gloating aside, this incident has sparked off an active online debate on the ‘what and when’ of AI replacing human jobs.

One netizen said on Sina Weibo, “A lot of people who casually say that AI can replace this or that job, are those who do not really understand or know what those jobs entail; translation included.”

However, Sogou news quoted a veteran interpreter who often accompanied government leaders on overseas visits. She said, “As an interpreter for 20 years, I believe AI will replace human translators sooner or later, at least in most day to day translation and the majority of conference interpreting. The former probably in 3-5 years, the latter in 10 years.”

She added that her opinions were informed by the fact that she frequently did translation work for IT companies. As such, she was well aware of the speed at which AI and processor chips were advancing, and hence did not encourage young people to view translation and interpreting as a lifelong career, as she considers it a sunset industry.

Reference: https://bit.ly/2qGLhxu

XTM International Announces XTM Cloud v11.1

London, April 16, 2018 — XTM International has released a new version of XTM Cloud. Building on the success of XTM v11, the new version adds many new features requested by users.

The integration with Google Sheets is a breakthrough achievement. XTM Connect for Google Sheets is intuitive and collaborative. Localization managers can push content for translation directly from chosen columns or entire sheets. Completed translations are delivered into specified cells and can be instantly shared with the rest of the team. The process is fully automated and involves no copy/pasting or file exports. As a result, translation takes less time, and there are no version conflicts between the localized documents and newer versions updated by copywriters.

Projects in XTM can now be assigned to language leads or in-house translators. The new user role has the rights to view and manage projects for its specified target languages. In this way, in-house translators can translate texts themselves or outsource them, depending on needs and workload. In effect, they can reduce turnaround time and gain extra flexibility to manage source text overflow.

“Our development strategy is focused on enhancing XTM with features that provide maximum value to our Enterprise and LSP users. We are delighted to release XTM Cloud v11.1, as it delivers a very useful set of enhancements to our growing customer base.” – said Bob Willans, CEO of XTM International.

Other main features include a new connector for Kentico, support for markdown (.md) source files, options to color or penalize language variant matches, and new REST and SOAP API methods.

For additional information about XTM and its new features, please visit https://xtm.cloud/release-notes/11.1/.

Reference: https://bit.ly/2HvnQS7

BOUTIQUE TRANSLATION AGENCIES: THE NEW GENERATION

There was a time when dinosaurs dominated the world of translation: huge great lumbering beasts of companies with offices in every major world city and thousands of contractors at their fingertips. They offered every language pair, every specialism and every service under the sun, all overseen by huge teams of project managers in vast offices filled with piles of paperwork. But things don’t stay the same forever, and with the rise of the internet and a new focus on niche services a very different kind of professional translation service is on the rise: the boutique translation agency.

They may be small, but don’t underestimate their appeal to translators and clients alike.

What are boutique translation agencies?

LIGHT ON THEIR FEET

Boutique translation agencies take their cue from boutique advertising companies, the new form of PR that aimed to offer something different to the behemoths of the ad world. Just like their marketing forerunners, boutique translation agencies are small, nimble and fast-paced. Unlike the larger firms that worry so much about economies of scale, boutique agencies offer specialised services with a high degree of personalisation and flexibility.

Boutique firms have staff that can react quickly and flexibly to any new challenge, because they aren’t spending their time churning out huge amounts of repetitive work. They’re free to follow opportunities, evolve and change rapidly through time, leaving big global corporations in their dust. They don’t have a vast translation team of unknown and untested contractors, but rather they work with a small and trusted group of contacts, so the relationships within the agency tend to be closer. This means that quality control is not a matter of ticking boxes as it is with larger companies, but rather comes down to close working relationships where managers have in-depth, detailed knowledge of all their staff’s skills and strengths, and can draw together the perfect team for each project.

EXACTLY WHAT YOU NEED

Specialisation is also one of the biggest strengths of these new and nimble agencies. Unlike massive international companies, they aren’t Jacks of all trades and masters of none. No one can truly specialise in everything, and larger companies run the risk of spreading themselves too thin at the expense of quality. Boutique agencies are at the other end of the scale, offering very specific niche services. They know their strengths and they know their target market’s needs, as well as having a comprehensive understanding of the language, culture or industry they specialise in.

Different firms have different ways of narrowing down to a specialisation. Some focus on a particular subject area or industry, for example legal, marketing or technical translations. These agencies focus on hiring translators who are experts within that industry, many of whom will have had a previous career elsewhere before becoming translators. Other agencies specialise in particular languages, amassing a team of native Russian translators, for example, but with a wide range of interests, knowledge and skills. These teams offer particular advantages because they can combine the different subject specialisms of their translators in line with the client’s needs. Many of these agencies also offer specialised services such as localisation, DTP or web services like SEO and web marketing, all in combination with translation. This allows a team of different professionals, all with a comprehensive understanding of your language pair or industry, to work together fluidly and produce an excellent finished product exactly to your specifications.

THE PERSONAL TOUCH

In line with the fantastic opportunities for specialisation that boutique agencies offer, clients and staff alike tend to find these firms are much more personal than the big multinationals.

Smaller agencies can offer a highly tailored and personalised service built around your needs rather than the company’s ‘way of doing things’. Instead of forcing you to fit their box, they will shape their work to suit your needs. You’re likely to experience less bureaucracy and paper pushing, because a smaller team can find common sense solutions instead of having to rely on endless protocols. And you’ll have access to the people that matter. Often a smaller translation company will be directly managed by the CEO, who isn’t a fat cat investor sitting in a board meeting or playing golf, but is more likely to be a translator him/herself. At the very least you’ll have a regular, designated contact person within the company over time, so you’ll have an opportunity to build a good working relationship with your own project manager. And with a smaller company the team that wins your business is the exact one that will work on your project; unlike some of the less scrupulous bigger companies they won’t impress you with the CVs of excellent translators and then farm your work out to untrained, poorly qualified individuals.

A by-product of all this is that boutique agencies tend to be more detail-oriented and creative than their larger cousins. Unbound by pointless rules and procedures they’re free to offer the kind of personalised service that has clients returning year after year.

THE CLIENT IS KING

Whereas big multinational corporations are bound by the bottom line, long-term relationships, reputation and old-fashioned business values mean everything to smaller companies. As they thrive by word-of-mouth and often keep their client list short, boutique agencies are heavily focused on client satisfaction and building trust. For smaller companies no account is too small to warrant their care and attention, and communication tends to be personal, efficient and meaningful.

Boutique agencies aren’t staffed by managers from other sectors with no real understanding of translation, and they don’t take on new translators with little evidence as to their skills and abilities. They tend to be run by passionate linguists who view their business as a vocation, not just a moneymaking exercise. That’s why you’ll often spot all kinds of added extra value when working with a boutique agency, along with a willingness to source additional services or skills in accordance with your needs. In short, they will go the extra mile for your business, because they know that’s how to win and keep custom.

THE BOTTOM LINE

Finally, you’ll get more bang for your buck with a boutique translation agency, as many of these companies offer outstanding value for money with no compromise on quality – in fact, often providing a more specialised and personalised service than a big provider of ‘off the peg’ translation solutions. They will be able to offer flexibility over rates and often have much lower overheads than multinationals. Some are based in countries with low tax rates and rents, while others save by managing their team online instead of assembling them in an office. Bearing all this in mind, a small budget to a global firm can often be quite a substantial one to a small agency, meaning you can get more for your money.

What aren’t boutique translation agencies?

EXCLUSIVE SERVICE, EVERYDAY PRICES

Boutique translation agencies needn’t be expensive. Although the term conjures an exclusive tailor-made experience, owing to the nature of these smaller companies you needn’t pay through the nose for it. For a start, they are less profit-oriented and more concerned with providing an excellent service, which is, after all, their unique selling point. Low overheads and innovative working practices also mean that if money is tight in your office a boutique agency might be just the right service provider for you.

FOCUSED, NOT LIMITED

Boutique translation agencies needn’t be limited in scope. Don’t confuse their emphasis on specialisation with a narrow focus. Any good small agency will have a network of highly skilled individuals on call, and can put together teams to tackle any text. The difference between these smaller translation agencies and the corporate giants is that boutique agencies know their limits and will not take on work on spec without knowing they can deliver. They also don’t keep huge numbers of staff on their permanent payroll just to cover any eventuality, so they can really save you money.

MIDDLEMEN BEGONE!

Small translation firms know that you want to pay for fantastic translation, not layer upon layer of middle management. You’ll have a project manager, whose role is to know the team inside out and be able to pick out the best individuals for your project. Good project managers are indispensable after all – but you won’t be paying for heads of business development, corporate strategists, marketing gurus, IT departments or any other of the staff members so indispensable to bigger clients. Instead you’ll find your team is flexible and diverse enough to tackle any of the challenges that come their way.

UP CLOSE AND PERSONAL

Boutique translation agencies are the very opposite of corporate. You’re not just a number on a spreadsheet and you won’t receive formulaic service – rather the whole experience will be shaped around you. These firms don’t tend to be concerned with growth at any cost, but rather they prioritise building and maintaining a cast-iron reputation in a specific field. There are no economies of scale, which means every client matters, and customer service is by nature at the very heart of everything they do.

It’s easy to see why these agencies are becoming more and more popular, and in some sectors are now starting to corner the translation market. Bigger companies are running scared and looking for ways to streamline their service offerings, but savvy clients are abandoning impersonal companies in droves, looking for something different. In the battle of David and Goliath you’d be forgiven for betting on the big guy, but don’t rule out the underdog. Putting meaning and value back at the heart of the translation process, it looks like these plucky contenders are here to stay.

Reference: https://bit.ly/2og0aWS