Smart devices and the future of CAT tools

CAT tools have been on the market for many years now, and yet they are still improving. New technologies and emerging needs among translators are triggering a shift from computer-aided translation tools to smart device-aided translation tools. Does the future of productivity lie in web-based translation environments?

The emergence of online translation environments

While CAT tools are now an indispensable part of the translator’s toolkit, it was not so long ago that professional translators had to work without them. Tools for computer-aided translation, not to be confused with online translation services like Google Translate, only emerged in the early 1990s. Although there were earlier attempts to create software that helps translators improve their quality, productivity and consistency, it was in the last decade of the twentieth century that such tools came into full swing. Nowadays translators can choose from at least 20 different CAT tools, both online and offline, to suit their needs, of which SDL Trados and MemoQ are by far the best known.
However, only 25 years after the introduction of mainstream translation software, a new era is on the horizon. The introduction of cloud technology, the rise of digital nomads, and the general availability of cheap and fast internet connections have led to a new branch on the CAT tool tree: translators can now use online translation environments, both free and paid, to work wherever they choose.

Translating online

The technological advancements of the last few years opened great opportunities for companies that looked beyond traditional CAT tools and wanted to pluck the low-hanging fruit of the cloud’s capabilities. Several players, both from inside and outside the translation industry, quickly introduced their own online variants of the desktop translation tools; examples include Smartling and Memsource (which has a desktop tool as well). These tools are browser-based, which means they are accessible as webpages and can be used wherever users want, as long as they have a compatible device and an internet connection. The online translation environments offer full functionality, often equivalent to that of the standard desktop tools. Users (in the case of Smartling and Memsource, mainly project managers) can create translation memories and term bases, set rules for quality assurance, and require users to perform several checks before they can deliver their translations. The tools also support the most common file formats, like Microsoft Office files, PDF files and HTML documents, as well as bilingual file types like XLIFF and the proprietary formats of Trados and MemoQ. In addition, they often have familiar user interfaces, with well-known toolbars and panels that make it easier for project managers and translators alike to find their way around the online CAT tool.

It should be clear that these new members of the CAT tool family are shaking up the CAT tool industry. It is therefore no surprise that, after the introduction of the new online CAT tools, developers of ‘traditional’ CAT tools also came up with online versions: MemoQ introduced MemoQ Web, while SDL brought the SDL Online Translation Editor to the table.

Web-based CAT tools for translators

The most important feature of the web-based CAT tools is, unsurprisingly, that they work in a browser. Most of them were initially designed to work on a desktop, offering translators a convenient tool with omnipresent accessibility while making it easier for project managers to distribute projects. Indeed, project managers only had to upload files, create or connect a translation memory, and send a link to multiple translators, making it easier to complete projects, shorten turnaround times, and circumvent lengthy discussions via email. But because these new online CAT tools were mainly aimed at agencies and project managers, they fell short of meeting the needs of translators who wanted to work on the go. Other bright minds therefore developed new web-based CAT tools that better supported the needs of the freelance translator: in the past few years Lilt and Smartcat were introduced, among others. The SDL Online Translation Editor has also been created with freelance professionals in mind, while MemoQ Web is more dedicated to project managers.

The biggest difference between tools for freelance translators and tools for project managers is their workflow. While project managers have loads of options to manage projects, tools like Lilt and Smartcat introduce only the options freelancers need: they can upload a file in various file types, create or use a translation memory (term bases are often not supported), work their way through the file, and complete the job. The tools have a familiar, simple user interface, so translators do not need to hunt for advanced options; often, however, powerful options are hidden under the bonnet, so these tools can genuinely compete with their desktop equivalents.
Another major advantage of CAT tools in the cloud is that they release new features quickly and respond to feature requests even faster, while traditional CAT tools often need months to implement, test, and introduce new features in a newly built (minor) version of the tool.

Another major difference is that many tools aimed at freelancers are free to use. They offer various plans for advanced users, often based on the number of characters translated, but the single free flavour comes without many of the options that paid users have access to.

Privacy concerns with online CAT environments

In the past few years the online CAT tools have quickly risen to a level at which they can compete with traditional computer-based CAT tools. Whereas traditional CAT tools evolved gradually, adding new features with every release, their online counterparts launched with feature sets matching the status quo of traditional CAT tools. They sometimes even introduced ground-breaking features that traditional CAT tools were not able to offer, like Lilt’s adaptive machine translation.
Yet among translators there is still much debate about adopting them. The most important concern is privacy. While locally installed software is generally considered the safe option, many translators are afraid to use cloud environments because of the risk of hacks and leaks that expose clients’ confidential information. At the same time, using a free online translation environment sometimes requires that translations be shared with the platform provider to improve the quality of generally available translation memories and machine translation services. Freelancers, whose business depends on credibility, simply cannot afford to share their clients’ information for the sake of improving their own productivity or flexibility.
On the other hand, early adopters and technology enthusiasts argue that the cloud is much safer than many personal computers thanks to continuous security updates. However, they are still only a small group in the world of translators.

From CAT to SAT?

Whatever the privacy concerns, the introduction of online CAT tools has made clear that they are here to stay. With the increasing adoption of online tools, lifestyles shifting towards working on the go, and the rise of digital nomadism, online translation environments are expected to be increasingly in demand in the future.
Although traditional CAT tools cannot be run on smart devices with an Android, iOS, or Windows Phone operating system, online CAT tools do not have this problem. That means they can be used without barriers on smartphones and tablets once they have been adopted on a computer. Indeed, they offer the same experience everywhere: being browser-based, they do not need to be adapted much to work in different operating environments. An added advantage is that users can start a task on their desktop, continue it while away, and complete it in a third environment.

Yet, despite the seemingly endless possibilities of online CAT tools, many of them still do not offer a flawless experience on smartphones and tablets. One of the biggest disadvantages of the browser-based tools is that they do not fit neatly onto the small screens of smart devices. A short experiment with a few translation platforms (Smartcat and Lilt; SDL’s Online Translation Editor returned an error) quickly showed that their user interfaces have problems on touch-enabled devices. While all elements of a CAT tool (the panel with the bilingual format, a panel with translation memory results, a concordance panel, and some other interface elements) are present, they often do not fit neatly. The interface appears fine in its initial state, but touching a text box to add a translation causes the panels to be rearranged every time. Furthermore, after touching the screen the on-screen keyboard pops up, often making (part of) the source text invisible. While this problem is apparent on tablets, it is even more problematic on smartphones with their even smaller screens. Working on a translation on the go using a tablet or smartphone therefore does not offer a seamless, flawless, or productive experience just yet.

Another problem is that rendering the translation environment on a tablet or smartphone requires considerable computing resources on some devices. So in order to make full use of an online CAT tool, users need to have a powerful tablet or smartphone that can execute scripts and render style sheets quickly to realize a productivity gain.

That brings us to the question of whether online CAT tools can fulfil the needs of professional translators. Basically, the answer is yes. Online CAT tools often work well on desktops. However, they are currently an online variant of computer-aided translation tools, which does not yet make them fully fledged smart device-aided translation (SAT) tools. The current generation of browser-based CAT tools is perfect to use on a laptop while on the go, but to benefit from their full potential on smartphones and tablets they still need to be better adapted to those devices. The future of CAT tools is in our hands, but it still needs to be adapted to our fingers.

Translators in the Algorithmic Age

Translation is being transformed by the forces of global business and new technology. This means that the human resources used in the industry – typically translators – need to reassess their role in the emerging translation landscape. This report focuses on the role of translators in an environment now driven by data at every level and disrupted by tougher competition, new management priorities, and a concerted effort to use machine learning in business generally. You can also download the no-time-to-read version of the report.

Files, Files Everywhere: The Subtle Power of Translation Alignment

Here’s the basic scenario: you have the translated versions of your documents, but the translation wasn’t performed in a CAT tool, so no translation memory exists. Meanwhile, these documents need to be updated or changed across the languages, you want to retain the existing elements, style and terminology, and you have since integrated CAT technology into your processes. The solution is a neat piece of language engineering called translation alignment.

Translation alignment is a native feature of most productivity tools for computer-assisted translation, but its application in real life is limited to very specific situations, so even language professionals rarely have an opportunity to use it. However, these situations do happen once in a while, and when they do, alignment usually comes as a trusty solution for process optimization. We will take a look at two actual cases to show you what exactly it does.

Example No. 1: A simple case

Project outline:

Three Word documents previously translated to one language, totaling 6000 unweighted words. Two new documents totaling around 2500 words that feature certain elements of the existing files and need to follow the existing style and terminology.

Project execution:

Since the translated documents were properly formatted and there were no layout issues, the alignment process was completed almost instantly. The software segmented the source files, and we matched the translated segments with only some minor tweaking of the segmentation. We then built a translation memory from those matched segments and added the new files to the project.
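The core of that matching step can be sketched in a few lines of Python. This is a simplified illustration of the idea, not any real CAT tool’s alignment engine, and the sample segments are invented:

```python
# Simplified sketch of segment matching: once both files are segmented,
# matched pairs become translation memory (TM) entries. Real aligners
# also handle 1-to-2 and 2-to-1 matches; here we assume the counts line up.

def build_tm(source_segments, target_segments):
    if len(source_segments) != len(target_segments):
        raise ValueError("Segment counts differ; manual tweaking needed")
    return dict(zip(source_segments, target_segments))

tm = build_tm(
    ["The printer is offline.", "Check the power cable."],
    ["Der Drucker ist offline.", "Prüfen Sie das Netzkabel."],
)
# tm now maps each source segment to its translation
```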

The result:

Thanks to the created translation assets, the final wordcount of the new content was around 1500 and our linguists were able to produce translation in accordance with the previously established style and terminology. The assets were preserved for use on future projects.

Example No.2: An extreme case of multilingual alignment

Project outline:

In one of our projects we had to develop translation assets in four language pairs, totaling roughly 30k words per language. The source materials were expanded with new content totaling about 20k words unweighted and the language assets had to be developed both to retain the existing style and terminology solution and to help the client switch to a new CAT platform.

Project execution:

Unfortunately, there was no workaround for ploughing through dozens of files, but once we organized the materials we could proceed to the alignment phase. Since these files were localized and some parts were even transcreated to match the target cultures, which also included changes in layout and differences in content, we knew that alignment was not going to be fully automated.

This is why native linguists in these languages performed the translation alignment and communicated with the client and the content producer during this phase. While this slowed the process a bit, it ultimately yielded the best results possible.

We then exported the created translation memory in the cross-platform TMX format, which allows use in different CAT tools, and the alignment phase was finished.
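For the curious, a minimal TMX file can be produced with Python’s standard library alone. This is an illustrative subset of TMX 1.4; real CAT tool exports carry richer header metadata, and the tool name and sample pair here are placeholders:

```python
# Write a bare-bones TMX 1.4 document from (source, target) pairs.
import xml.etree.ElementTree as ET

def to_tmx(pairs, src_lang="en", tgt_lang="es"):
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "srclang": src_lang, "datatype": "plaintext", "segtype": "sentence",
        "adminlang": "en", "o-tmf": "demo",
        "creationtool": "demo-aligner", "creationtoolversion": "0.1",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

xml_out = to_tmx([("Hello", "Hola")])
```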

The result:

With the TM applied, the weighted volume of new content was around 7k words. Our linguists localized the new materials in accordance with the existing conventions in the new CAT platform and the translation assets were saved for future use.

Wrap up

In both cases, translation alignment enabled us to reduce the volume of new content for translation and localization and to ensure stylistic and lexical consistency with the previously translated materials. It also provided an additional layer of real-time quality control and helped our linguists produce a better translation in less time.

Translation alignment is not an everyday operation, but it is good to know that when it is called to deliver the goods, this is exactly what it does.

Reference: https://bit.ly/2p5aYr0

What machine translation teaches us about the challenges ahead for AI

João Graça, co-founder and CTO of Unbabel, on what machine translation can teach us about the challenges still lying ahead for artificial intelligence.

Can you understand this sentence? Now try understanding the long and convoluted and unexpectedly – maybe never-ending, or maybe ending-sooner-than-you-think, but let’s hope it ends soon – nature of this alternative sentence.

The complexities of language can be an inconvenience to a reader. But even for today’s smartest machine learning algorithms, more translation challenges remain than advances in other fields would have you believe.

Translation challenges in particular are a good demonstration of the multitude of complexities machines must still overcome to catch up with human performance.

You say tomato

When it comes to translation, there are two categories of content. On one hand, you have “commodity” translation. Perhaps you want to point your phone at a menu and get a rough idea of what it says. Or you want to impress a colleague with a phrase from their local language.

Here, phrases are short, the content is often formal and errors aren’t life or death.

But on the other hand, you have interactions where context is key – understanding the intent of the writer or speaker, and the expectations of the reader or listener. Take any example where a business speaks to its customers – you better hope you are speaking their language respectfully when they have a complaint or problem.

It’s not enough to solve the problem at a superficial level, and achieving comparable “human quality” communication still has an enormous amount of research ahead of it. This need for perfection is why most research is focused on this second area.

In the examples below, I discuss the challenges still ahead for the translation industry, and touch on what they mean for how we use machine learning tech more broadly.

Challenge 1: Long-distance lookups

Many of the biggest challenges are structural.

A good example is long-distance lookups. If you are translating a sentence word by word and the order stays the same, you are just solving “what is the correct equivalent of this word?”

But once you start having to think about reordering the sentence, the problem space that has to be explored is exponentially larger. And in languages like Chinese and Japanese, you find verbs at the end of the sentence, potentially producing the longest distances possible.

The system needs to assess at least three reordering schemes. This is why these languages are so hard: you have to cater to very different grammatical patterns, very different vocabularies, and differences in how many characters make up each word.

Here, you can see how expanding problem spaces create difficulties in an area the human brain handles with ease.

Challenge 2: Taxonomy

The second major area of complexity involves different formats of data.

For example, conversational language has a completely different structure and appropriate models than formal documents. In areas like customer service translation, this makes a big difference. Nobody likes to feel like the representative of a company is being overly officious when handling their problem.

Therefore, any model that is able to learn from a volume of real human queries will have an advantage, and doubly so if it can draw them from a particular industry sector. Meanwhile, other models might rely on news stories or generic online text and output completely different results.

As with other machine learning challenges, the ability to learn from the most valuable and representative data can give a big advantage, or risk limiting taxonomical flexibility.

This brings us to context.

Challenge 3: Context

Most translation models still translate sentence by sentence, so they don’t take the context into account.

If they are translating a pronoun, they have no clue how that pronoun should be translated. They will randomly generate sentences that are formal or informal. They don’t guarantee consistency of terminology, for instance translating a legal term the same way throughout. There’s no way you can guarantee the whole document is correct.
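A hypothetical sketch of the kind of terminology consistency check a sentence-by-sentence system cannot perform on its own. The term map and segment pairs below are invented for illustration:

```python
# Flag a source term that is rendered inconsistently across translated
# segments, i.e. the document-level consistency sentence-level MT misses.
from collections import defaultdict

def find_inconsistent_terms(segment_pairs, term_map):
    """term_map: source term -> set of possible target renderings."""
    seen = defaultdict(set)
    for src, tgt in segment_pairs:
        for term, renderings in term_map.items():
            if term in src:
                for r in renderings:
                    if r in tgt:
                        seen[term].add(r)
    # keep only terms that were rendered in more than one way
    return {t: r for t, r in seen.items() if len(r) > 1}

issues = find_inconsistent_terms(
    [("The lease terminates in May.", "El arrendamiento termina en mayo."),
     ("Sign the lease today.", "Firme el contrato hoy.")],
    {"lease": {"arrendamiento", "contrato"}},
)
# 'lease' was rendered two different ways across the document
```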

The other problem is that the content is not always in the same language. Sometimes it’s one sentence in Chinese, one sentence in English. The sentences are much shorter, so you probably have to look much further back for context. This reaches its extreme in “chat” interactions.

And the context problem is different from translating an email. For example, if you are working on a legal document that is ten pages long, you would need to use the entire document for an accurate contextual translation.

This is next to impossible with current models – you have to find some way to summarise it. Otherwise, consistency is nearly impossible.

On the other hand, if you are translating something like SEO content, what you are actually translating is keywords that don’t form a sentence, just keywords by themselves. This means you turn to more dictionary-like translation, disambiguating with other words or the image associated with the keyword.

People think “Oh, we are in the age of unlimited data” but actually we are still enormously lacking in many ways.

Yes, we have a lot of data but often not enough relevant data.

Looking to the future

There will be many translation engines but what makes them different is their models.

The model is going to look at the data and predict patterns and assign them to different customers, and from then, will decide which voice/ language/ tone/ etc. to choose.

Current common public translation tools aren’t aware of this yet. They don’t even have knowledge of the document the translation came from, let alone the speaker or their translation preferences.

This will bring the next level of sophistication to this area. Machine learning, exercised against a use-specific corpus of language, will give fast and accurate translations, while being able to forward them to humans to finalise and learn from further.

Languages might still drive machines crazy – but with careful human thinking, we can teach them to persevere.

Reference: https://bit.ly/2PYYHAB

Is This The Beginning of UNMT?

Research at Facebook just made it easier to translate between languages without many translation examples. For example, from Urdu to English.

Neural Machine Translation

Neural Machine Translation (NMT) is the field concerned with using AI to translate between languages such as English and French. In 2015, researchers at the Montreal Institute for Learning Algorithms developed new AI techniques [1] which allowed machine-generated translations to finally work. Almost overnight, systems like Google Translate became orders of magnitude better.

While that leap was significant, it still required having sentence pairs in both languages, for example, “I like to eat” (English) and “me gusta comer” (Spanish). For translations between languages like Urdu and English, without many of these pairs, translation systems failed miserably. Since then, researchers have been building systems that can translate without sentence pairings, i.e. Unsupervised Neural Machine Translation (UNMT).

In the past year, researchers at Facebook, NYU, the University of the Basque Country and Sorbonne Universités made dramatic advancements which are finally enabling systems to translate without knowing that “house” means “casa” in Spanish.

Just a few days ago, Facebook AI Research (FAIR), published a paper [2] showing a dramatic improvement which allowed translations from languages like Urdu to English. “To give some idea of the level of advancement, an improvement of 1 BLEU point (a common metric for judging the accuracy of MT) is considered a remarkable achievement in this field; our methods showed an improvement of more than 10 BLEU points.”
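For readers unfamiliar with the metric, here is a deliberately simplified sketch of the BLEU idea in Python: modified n-gram precision combined with a brevity penalty. Real evaluations use established implementations such as sacreBLEU; this toy version only illustrates the mechanics.

```python
# Toy BLEU-like score: geometric mean of 1- and 2-gram precision,
# scaled by a brevity penalty that punishes short candidates.
import math
from collections import Counter

def simple_bleu(candidate, reference, max_n=2):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i+n]) for i in range(len(cand)-n+1))
        ref_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref)-n+1))
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped matches
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))  # brevity penalty
    return bp * geo_mean

score = simple_bleu("the cat sat on the mat", "the cat sat on the mat")
# a perfect match scores 1.0
```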

Check out more info at Forbes.

Let us know what you think about this new leap!

Here’s Why Neural Machine Translation is a Huge Leap Forward

Though machine translation has been around for decades, most of what you’ll read about it concerns its perceived proximity to the mythical “Babel Fish” (an instantaneous personal translation device) ready to replace each and every human translator. The part that gets left out is machine translation’s relationship with human translators. For a long time, this relationship was no more complex than post-editing badly translated text, a process most translators find to be a tiresome chore. With the advent of neural machine translation, however, machine translation is no longer just something that creates more tedious work for translators. It is now a partner to them, making them faster and their output more accurate.

So What’s the Big Deal?

Before we jump into the brave new translating world of tomorrow, let’s put the technology in context. Prior to neural machine translation, there have been two main paradigms in the history of the field. The first was rules-based machine translation (RBMT) and the second, dominant until very recently, was phrase-based statistical machine translation (SMT).

When building rules-based machine translation systems, linguists and computer scientists joined forces to write thousands of rules for translating text from one language to another. This was good enough for monolingual reviewers to get the general idea of important documents in an otherwise unmanageable body of content in a language they couldn’t read. But for the purposes of actually creating good translations, this approach has obvious flaws: it’s time-consuming and, naturally, results in low-quality translations.

Phrase-based SMT, on the other hand, looks at a large body of bilingual text and creates a statistical model of probable translations. The trouble with SMT is its reliance on supplemental systems. For instance, it is unable to associate synonyms or derivatives of a single word, requiring a separate system responsible for morphology. It also requires a language model to ensure fluency, but this is limited to a given word’s immediate surroundings. SMT is therefore prone to grammatical errors, and relatively inflexible when it encounters phrases that differ from those in its training data.
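The limited context of such a language model is easy to demonstrate with a toy bigram model (the corpus and sentences are invented for illustration): an agreement error five words apart slips through, because every adjacent word pair is attested somewhere in the training text.

```python
# Toy bigram language model: fluency is judged only on adjacent word pairs.
from collections import Counter

corpus = ("the dogs in the garden are barking . "
          "the cat in the garden is barking .").split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_score(sentence):
    words = sentence.split()
    score = 1.0
    for a, b in zip(words, words[1:]):
        if unigrams[a] == 0 or bigrams[(a, b)] == 0:
            return 0.0  # unseen pair: the model rejects the sentence
        score *= bigrams[(a, b)] / unigrams[a]
    return score

# "dogs ... is" is ungrammatical, but each adjacent pair occurs in the
# corpus, so the model happily assigns it a nonzero score.
score = bigram_score("the dogs in the garden is barking")
```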

Finally, here we are at the advent of neural machine translation. Virtually all NMT systems use what is known as “attentional encoder-decoder” architecture. The system has two main neural networks, one that receives a sentence (the encoder) and transforms it into a series of coordinates, or “vectors”. A decoder neural network then gets to work transforming those vectors back into text in another language, with an attention mechanism sitting in between, helping the decoder network focus on the important parts of the encoder output.
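The attention step can be sketched in plain Python (a bare-bones illustration with made-up vectors, not a real NMT layer): the decoder’s query is scored against each encoder state, the scores are softmax-normalised, and the result is a weighted “context” summary of the source.

```python
# Scaled dot-product attention over a toy set of encoder state vectors.
import math

def attention(query, encoder_states):
    d = len(query)
    # dot-product similarity between the query and each encoder state
    scores = [sum(q * s for q, s in zip(query, state)) / math.sqrt(d)
              for state in encoder_states]
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over source positions
    # context vector = attention-weighted sum of the encoder states
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(d)]
    return weights, context

states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention([1.0, 0.0], states)
# positions whose states overlap the query receive more weight
```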

The effect of this encoding is that an NMT system learns the similarity between words and phrases, grouping them together in space, whereas an SMT system just sees a bunch of unrelated words that are more or less likely to be present in a translation.

Interestingly, this architecture is what makes Google’s “zero-shot translation” possible. A well-trained multilingual NMT can decode the same encoded vector into different languages it knows, regardless of whether that particular source/target language combination was used in training.

As the decoder makes its way through the translation, it predicts words based on the entire sentence up to that point, which means it produces entire coherent sentences, unlike SMT. Unfortunately, this also means that any flaws appearing early in the sentence tend to snowball, dragging down the quality of the result. Some NMT models also struggle with words they don’t know, which tend to be rare words or proper nouns.

Despite its flaws, NMT represents a huge improvement in MT quality, and the flaws it does have happen to present opportunities.

Translators and Machine Translation: Together at Last

While improvements to MT typically mean increases in its usual applications (i.e. post-editing, automatic translation), the real winner with NMT is translators. This is particularly true when a translator is able to use it in real time as they translate, as opposed to post-editing MT output. When the translator actively works with an NMT engine to create a translation, the two build on and learn from each other: the engine offers up a translation the human may not have considered, and the human serves as a moderator, and in so doing, a teacher of the engine.

For example, during the translation process, when the translator corrects the beginning of a sentence, it improves the system’s chances of getting the rest of the translation right. Often all it takes is a nudge at the beginning of a sentence to fix the rest, and the snowball of mistakes unravels.

Meanwhile, NMT’s characteristic improvements in grammar and coherence mean that when it produces a correct translation, the translator spends less time fixing grammar and can skip post-editing altogether. When they have the opportunity to work together, translators and their NMT engines quite literally finish each other’s sentences. Besides speeding up the process, and here I’m speaking as a translator, it’s honestly a rewarding experience.

Where Do We Go Now?

Predicting the future is always a risky business, but provided the quality and accessibility of NMT continues to improve, it will gradually come to be an indispensable part of a translator’s toolbox, just as CAT tools and translation memory already have.

A lot of current research has to do with getting better data, and with building systems that need less data. Both of these areas will continue to improve MT quality and accelerate its usefulness to translators. Hopefully this usefulness will also reach more languages, especially ones with less data available for training. Once that happens, translators in those languages could get through more and more text, gradually improving the availability of quality text both for the public and for further MT training, in turn allowing those translators, having already built the groundwork, to move on to bigger challenges.

When done right, NMT has the potential to not just improve translators’ jobs, but to move the entire translation industry closer to its goal of being humanity’s Babel Fish. Not found in an app, or in an earbud, but in networks of people.


Reference: https://bit.ly/2CewZNs

Four Ways the Translator Role Will Change

Translators have been the staple of the translation business for decades. Linguistics, multilingual communication, and quality of language—this is their domain. They are grammarians and often self-admitted language nerds.

Translators are usually bilingual linguists who live in the country where their native language is spoken—it’s the best way for them to stay connected with linguistic and cultural changes. Their responsibility traditionally has been to faithfully render a source text into a specific target language.

But now, with the advent of more complicated content types such as highly branded content (like mobile apps or PPC ads), and with much higher volumes of content like User Generated Content (think customer reviews), all bets are off. The role of the translator has to evolve: translators now have to offer the right solution to new globalization problems…or risk being left behind.

This reality isn’t just relevant for translators: localization project managers need to know what new qualifications to look for as they try to match resources to their content. Four new specialist roles have evolved out of changing global content needs, allowing translators and linguists to expand their offerings and learn new skills.

Transcreators

In transcreation, a highly specialized linguist recreates the source content so that it’s appropriate for the target locale. The key term here is ‘recreates’, which means to re-invent or build again. The goal is to create content that inspires the same emotions in the target language as the source content does in the home market.

Typically, the process of transcreation applies to taglines, product names, slogans, and advertisement copy; anything highly branded.

The linguists performing this service are highly creative translators: senior, experienced professionals with lots of marketing content translation experience. They also might have agency expertise.

Many transcreators begin their professional career as translators, and as they gain proficiency in marketing content, they become adept at the re-creation process. If they are creative types, this expertise can lead them right into the specialization of transcreation.

Content creators or copywriters

In-country copywriters create materials from scratch for a target market—a highly creative process. There’s no actual translation here. While the resource is often an ad agency professional with copywriting experience, they may also be a translator—or have been one in the past. (It’s not uncommon for a translator with creative content experience to move into copywriting.) Like translators, these professionals must be in-country in order to represent the latest trends in that market.

Cultural consultants

These folks, who also reside in the target country, provide guidance to a client on the motivations and behaviors of target buyers. They are researchers and representatives of their culture. They also may be experts in local paid media, search marketing, social media, influencer marketing, CRO, and UX.

Whatever their areas of expertise, these in-country experts could, for example, plan and manage an international digital campaign, conduct focus groups to determine user preferences, or do demographic research to help an enterprise understand or identify their target client. Bilingual, in-country translators already have—or can learn—the skills required to become a cultural consultant.

Post-editors

It’s not uncommon for an enterprise with a maturing localization program to deploy Machine Translation (MT). And most MT programs involve some level of post-editing: the process by which a linguist edits the machine’s output to a level of quality agreed upon between the client and vendor.

Post-editing needs a different skillset than translation: instead of converting source text to target text faithfully, a post-editor has to understand how an MT engine operates and what errors might be typical, and then fix all issues required to meet the requested quality bar. It’s one part translator, one part linguistic reviewer, and one part Machine Translation specialist. Translators with good critical thinking skills can train to do this work.

The needs of global businesses give translators an opportunity to stretch and grow into a variety of other industry positions that make use of their unique skillset and cultural expertise. Will translators of the future do any actual translation? Only time will tell. In the meantime, these newer linguistic services are growing in demand, and thus, so will the need for talent.

Reference: https://bit.ly/2P1hIRJ

TAUS BIG DATA GETS BIGGER

Introduction

By July 2018, the TAUS Quality Dashboard benchmarking database had exceeded 100 million words. It is still small in comparison with TMS databases, but it is slowly becoming a relevant aggregation of translation project metadata from which we can start drawing early conclusions.

TAUS’s QD is fed by a handful of enterprise companies, a notable exception being Baltic LSP Synergium. Collectively, they add about 1 million words a day.

Translation memory (TM) is the main engine for producing translations at enterprise companies using TAUS DQF. Unedited matches from the TM account for 64 percent of all content translated, and edited fuzzy matches represent close to 10 percent more. Depending on the discount scheme vendors apply to matches, companies might be saving anywhere from 50 to 70 percent of their human translation budget with the help of CAT tools and TMS.

That could mean USD 7.5 – 10.5 million in savings for the whole sample of 100 million words, assuming the average price per word is USD 0.15. Technology for ten enterprise companies should cost around USD 1 million a year, warranting a 7 – 10x return on investment.
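
The arithmetic behind these figures can be sketched as follows. The USD 0.15 per-word rate and the 50 – 70 percent leverage band are the assumptions stated above, not measured values:

```python
# Back-of-the-envelope savings estimate from the figures above.
# Assumed inputs: full human rate of USD 0.15/word, 100M-word sample,
# and a 50-70 percent effective saving on TM-leveraged content.

WORDS = 100_000_000   # sample size reported by TAUS QD
RATE = 0.15           # assumed average price per word, USD

for saving_share in (0.50, 0.70):
    saved = WORDS * RATE * saving_share
    print(f"{saving_share:.0%} leverage -> USD {saved / 1e6:.1f}M saved")
```

At 50 percent leverage this yields USD 7.5 million and at 70 percent USD 10.5 million, matching the range quoted above.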

TAUS’s figure for savings is much greater than any previous benchmarks. Two years ago, using data from Memsource, I looked at the TM leverage using a database of 500 million words, and the median saving was at 36 percent. Today, TAUS shows a 50 – 70 percent economy. The difference is that most Memsource clients at that time were language services companies. Large LSPs usually deal with varied content from multiple clients. A significant portion of their content is new and has no corresponding matches in the memory. Content in the enterprise is more regular and repetitive, and thus the TAUS database can boast higher match rates.

The quest for an ideal TM + MT combo

According to the dataset, machine translation is nowhere close to replacing TM in the business of boosting human translator productivity. MT accounts for only roughly 12.5 percent of segments. Furthermore, most MT suggestions require some editing. However, drawing final conclusions would be unfair considering that the sample for MT is still very small.

TAUS is looking for an ideal threshold at which to replace TM with MT. The report splits the sample into two workflows. In the first, translation memory is combined with human translation. In the second, MT-supported workflow, the text goes through the TM first, and machine translation is used for segments where memory matches are below a quality threshold. At the moment, an early speculation is that the best threshold is a 70 percent match rate, above which MT becomes inefficient. Companies already use this cut-off point in practice, and TAUS’s objective is to check whether there is data to prove this is the most efficient way.
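
The routing logic of the second workflow can be sketched as follows. The 70 percent cut-off comes from the speculation above; the match scores and segment texts are invented for illustration:

```python
# Sketch of the MT-supported workflow: a segment with a fuzzy TM match
# at or above the threshold is pre-translated from the memory, and
# everything below the threshold is routed to machine translation.

FUZZY_THRESHOLD = 70  # percent; the cut-off point TAUS is evaluating

def route_segment(match_score: int) -> str:
    """Decide whether a segment is pre-translated from TM or sent to MT."""
    return "TM" if match_score >= FUZZY_THRESHOLD else "MT"

segments = [
    ("Click Save to continue.", 101),   # context match
    ("Click Apply to continue.", 85),   # high fuzzy match
    ("A brand-new paragraph.", 40),     # no useful match
]

for text, score in segments:
    print(f"{score:>3}% -> {route_segment(score)}: {text}")
```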

The search continues — through Levenshtein edit distances and tag-riddled segments.
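
The edit distance mentioned here is the classic Levenshtein measure, often used to quantify how much post-editing an MT suggestion needed. A minimal implementation:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# e.g. distance between an MT suggestion and its post-edited version
print(levenshtein("kitten", "sitting"))  # -> 3
```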

400 words an hour — the average productivity for a human translator

Finally, the dataset gives insight into human productivity. TAUS offers an online benchmarking tool, but the data there is skewed because most of the volume comes from TM and MT. Using the report data on human translation volumes, we were able to configure the visualization for languages with significant human-made volumes only (German, Baltic languages, Russian). The result: 400 words an hour without the help of technology. Pure, un-augmented human brain power.

A 7-hour full work day nets about 2,800 words, or roughly 11 pages. If only someone could sit for 7 hours straight to perform uninterrupted translating…

TMS databases could offer a more precise picture

The TAUS QD database of 100 million words is tiny compared to the massive silos on which cloud-based TMS companies sit. For instance, Memsource claims to have processed more than 20 billion words last year, though only about one third was actually translated. XTM says its public cloud clocked 14 billion source words, and on its private cloud clients uploaded billions more. In a recent presentation, Smartling claimed to have translated 8.5 billion words in 2017.

Companies track word counts differently, and none of them believe each other’s figures, but it is a good measure of the order of magnitude.

Though smaller, the TAUS database has the benefit of neutrality: you can trust that its numbers are true.

Reference: http://bit.ly/2vOOP49

A Beginner’s Guide to Machine Translation

What is Machine Translation?

Machine translation (MT) is automated translation by computer software. MT can be used to translate entire texts without any human input, or alongside human translators. The concept of MT started gaining traction in the early 1950s and has come a long way since. Many used to consider MT an inadequate alternative to human translators, but as the technology has advanced, more and more companies are turning to MT to aid human translators and optimize the localization process.

How Does Machine Translation Work?

Well, that depends on the type of machine translation engine. There are several kinds of MT software, which work in different ways. We will introduce three: rule-based, statistical, and neural.

Rule-based machine translation (RBMT) is the forefather of MT software. It is based on sets of grammatical and syntactical rules and phraseology of a language. RBMT links the structure of the source segment to the target segment, producing a result based on analysis of the rules of the source and target languages. The rules are developed by linguists and users can add terminology to override the MT and improve the translation quality.
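
As a toy illustration of the rule-based idea, the sketch below combines a bilingual lexicon with one linguist-written structural rule. The vocabulary and the single adjective-noun reordering rule are invented for illustration; real RBMT systems encode thousands of such rules:

```python
# Minimal rule-based translation sketch: dictionary lookup plus one
# reordering rule (in the target language, the adjective follows the
# noun it modifies). Lexicon entries are (translation, part of speech).

LEXICON = {"the": ("la", "det"), "red": ("roja", "adj"), "house": ("casa", "noun")}

def rbmt_translate(sentence: str) -> str:
    tokens = [LEXICON[w] for w in sentence.lower().split()]
    out, i = [], 0
    while i < len(tokens):
        if (tokens[i][1] == "adj" and i + 1 < len(tokens)
                and tokens[i + 1][1] == "noun"):
            out += [tokens[i + 1][0], tokens[i][0]]  # noun first, then adjective
            i += 2
        else:
            out.append(tokens[i][0])
            i += 1
    return " ".join(out)

print(rbmt_translate("the red house"))  # -> "la casa roja"
```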

Statistical MT (SMT) emerged in the age of big data and uses large amounts of existing translated texts together with statistical models and algorithms to generate translations. This system relies heavily on available multilingual corpora, and an average of two million words is needed to train the engine for a specific domain – which can be time- and resource-intensive. When using domain-specific data, SMT can produce good-quality translations, especially in the technical, medical, and financial fields.
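
The statistical idea can be sketched with a toy phrase table. The probabilities below are invented for illustration; in a real SMT system they would be estimated from a parallel corpus and combined with a language model and reordering model inside a decoder:

```python
# Toy phrase table: each source phrase maps to translation candidates
# with probabilities (invented here; normally learned from corpora).

PHRASE_TABLE = {
    "bank": [("banco", 0.7), ("orilla", 0.3)],          # financial vs. river bank
    "statement": [("extracto", 0.6), ("declaración", 0.4)],
}

def smt_pick(phrase: str) -> str:
    """Pick the most probable translation candidate for a source phrase."""
    return max(PHRASE_TABLE[phrase], key=lambda cand: cand[1])[0]

print(smt_pick("bank"))       # -> "banco"
print(smt_pick("statement"))  # -> "extracto"
```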

Neural MT (NMT) is a new approach built on deep neural networks. There are a variety of network architectures used in NMT, but typically the network can be divided into two components: an encoder, which reads the input sentence and generates a representation suitable for translation, and a decoder, which generates the actual translation. Words and even whole sentences are represented as vectors of real numbers in NMT. Compared to the previous generation of MT, NMT generates outputs which tend to be more fluent and grammatically accurate. Overall, NMT is a major step forward in MT quality. However, NMT may lag slightly behind previous approaches when it comes to translating rare words and terminology. Long and/or complex sentences are still an issue even for state-of-the-art NMT systems.
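
The encoder-decoder idea can be illustrated with a toy example. The tiny hand-made two-dimensional vectors below are purely illustrative; real NMT uses high-dimensional embeddings learned by recurrent or transformer layers, not averages and nearest-neighbour lookups:

```python
# Toy encoder-decoder sketch: words become vectors, the encoder
# compresses the sentence into a single vector, and a decoder step
# emits the closest target-side word. Vectors are invented by hand.

SRC = {"good": [0.9, 0.1], "morning": [0.2, 0.8]}
TGT = {"buenos": [0.9, 0.1], "días": [0.2, 0.8]}

def encode(words):
    """Encoder: average the word vectors into one sentence vector."""
    vecs = [SRC[w] for w in words]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]

def nearest(vec):
    """Decoder step: pick the target word whose vector is closest."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(TGT, key=lambda w: dist(TGT[w], vec))

print(encode(["good", "morning"]))  # the sentence representation
print(nearest(SRC["good"]))         # -> "buenos"
```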

The Pros and Cons of Machine Translation

So now you have a brief understanding of MT – but what does it mean for your translation workflow? How does it benefit you?

  • MT is incredibly fast and can translate thousands of words per minute.
  • It can translate into multiple languages at once which drastically reduces the amount of manpower needed.
  • Implementing MT into your localization process can do the heavy lifting for translators and free up their valuable time, allowing them to focus on the more intricate aspects of translation.
  • MT technology is developing rapidly, and is constantly advancing towards producing higher quality translations and reducing the need for post-editing.

There are many advantages to using MT, but we can’t ignore the disadvantages. MT does not always produce perfect translations. Unlike human translators, computers can’t understand context and culture, so MT can’t be used to translate anything and everything. Sometimes MT alone is suitable, at other times a combination of MT and human translation is best, and sometimes it is not suitable at all. MT is not a one-size-fits-all translation solution.

When Should You Use Machine Translation?

When translating creative or literary content, MT is not a suitable choice. This can also be the case when translating culturally specific texts. A good rule of thumb: the more complex your content is, the less suitable it is for MT.

For large volumes of content, especially if it has a short turnaround time, MT is very effective. If accuracy is not vital, MT can produce suitable translations at a fraction of the cost. Customer reviews, news monitoring, internal documents, and product descriptions are all good candidates.

Using a combination of MT along with a human translator post-editor opens the doors to a wider variety of suitable content.

Which MT Engine Should You Use?

Not all MT engines are created equal, but there is no specific MT engine for a specific kind of content. Publicly available MT engines are designed to translate most types of content; with custom MT engines, however, the training data can be tailored to a specific domain or content type.

Ultimately, choosing an MT engine is a process. You need to decide what kind of content you wish to translate, review security and privacy policies, run tests on text samples, choose post-editors, and weigh several other considerations. The key is to do your research before making a decision. And, if you are using a translation management system (TMS), be sure it is able to support your chosen MT engine.

Using Machine Translation and a Translation Management System

You can use MT on its own, but to get the maximum benefits we suggest integrating it with a TMS. With these technologies integrated, you will be able to leverage additional tools such as translation memories, term bases, and project management features to help streamline and optimize your localization strategy. You will have greater control over your translations, and be able to analyze the effectiveness of your MT engine.

Reference: http://bit.ly/2P85d7P