Tag: Translation Technology Development

Creative Destruction in the Localization Industry

Excerpts from an article with the same title, written by Ameesh Randeri in Multilingual Magazine.  Ameesh Randeri is part of the localization solutions department at Autodesk and manages the vendor and linguistic quality management functions. He has over 12 years of experience in the localization industry, having worked on both the buyer and seller sides.

The concept of creative destruction was derived from the works of Karl Marx by economist Joseph Schumpeter. Schumpeter elaborated on the concept in his 1942 book Capitalism, Socialism, and Democracy, where he described creative destruction as the “process of industrial mutation that incessantly revolutionizes the economic structure from within, incessantly destroying the old one, incessantly creating a new one.”

What began as a concept of economics started being used broadly across the spectrum to describe breakthrough innovation that requires invention and ingenuity — as well as breaking apart or destroying the previous order. To look for examples of creative destruction, just look around you. Artificial intelligence, machine learning and automation are creating massive efficiency gains and productivity increases, but they are also causing millions to lose jobs. Uber and other ride hailing apps worldwide are revolutionizing transport, but many traditional taxi companies are suffering.

The process of creative destruction and innovation is accelerating over time. To understand this, we can look at the Schumpeterian (Kondratieff) waves of technological innovation. We are currently in the fifth wave of innovation, ushered in by digital networks, the software industry and new media.

The effects of the digital revolution can be felt across the spectrum. The localization industry is no exception and is undergoing fast-paced digital disruption. There is a confluence of technologies in localization tools and processes that is ushering in major changes.

The localization industry: Drawing parallels from the Industrial Revolution

All of us are familiar with the Industrial Revolution. It commenced in the second half of the 18th century and went on until the mid-19th century. As a result of the Industrial Revolution, we witnessed a transition from hand production methods to machine-based methods and factories that facilitated mass production. It ushered in innovation and urbanization. It was creative destruction at its best. Looking back at the Industrial Revolution, we see that there were inflection points, following which there were massive surges and changes in the industry.

Translation has historically been a human and manual task. A translator looks at the source text and translates it while keeping in mind grammar, style, terminology and several other factors. Translation throughput is limited by a human's productivity, which severely limits the volume that can be translated in a given time. In 1764, James Hargreaves invented the spinning jenny, a machine that enabled an individual to produce multiple spools of thread simultaneously. Inventor Samuel Crompton innovated further and came up with the spinning mule, further improving the process. Next was the mechanization of cloth weaving through the power loom, invented by Edmund Cartwright. These innovators and their inventions completely transformed the textile industry.

For the localization industry, a similar innovation is machine translation (MT). Though research into MT had been going on for many years, it went mainstream post-2005. Rule-based and statistical MT engines were created, which resulted in drastic productivity increases. However, the quality was nowhere near what a human could produce, and hence MT engines became a supplemental technology, aiding humans and helping them increase productivity.

There was a 30%-60% productivity gain, depending on the language and engine used. There was fear that translators' roles would diminish. But rather than diminishing, their role evolved into post-editing.

The real breakthrough came in 2016 when Google and Microsoft went public with their neural machine translation (NMT) engines. The quality produced by NMT is not yet flawless, but it seems to be very close to human translation. It can also reproduce some of the finer nuances of writing style and creativity that were lacking in the rule-based and statistical machine translation engines. NMT is a big step forward in reducing the human footprint in the translation process. It is without a doubt an inflection point, and while not perfect yet, it has the same disruptive potential as the spinning jenny and the power loom: sharp productivity increases, lower prices and, since a machine is behind it, virtually unlimited volumes. This renews concerns about whether translators will be needed. It is to the translation industry what the spinning jenny was to textiles, where many manual workers were replaced by machines.

What history teaches us, though, is that while jobs tied to the existing task or technology are lost, newer ones are created to support the newer task or technology.

In the steel industry, two inventors charted a new course: Abraham Darby, who created a cheaper, easier method to produce cast iron using a coke-fueled furnace, and Henry Bessemer, who invented the Bessemer process, the first inexpensive process for mass-producing steel. The Bessemer process revolutionized steel manufacturing by decreasing its cost from £40 per long ton to £6–7 per long ton. Besides the reduction in cost, there were major increases in speed, and the need for labor decreased sharply.

The localization industry is seeing the creation of its own Bessemer process, called continuous localization. Simply explained, it is a fully connected and automated process in which content creators and developers produce source material that is passed for translation in continuous, small chunks. The translated content is continually merged back, facilitating continuous deployment and release. It is an extension of the agile approach and can be demonstrated with the example of mobile applications, where the latest updates are continually pushed to our phones in multiple languages. To facilitate continuous localization, vendor platforms or computer-assisted translation (CAT) tools need to be able to connect to client systems, or clients need to provide CAT tool-like interfaces for vendors and their resources to use. The process flows seamlessly from the developer or content creator creating content to the post-editor editing the machine-translated content. The Bessemer process in the steel industry paved the way for large-scale continuous and efficient steel production. Similarly, continuous localization has the potential to pave the way for large-scale, continuous and efficient localization, enabling companies to localize more content into more languages at lower prices.
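
To make the idea more concrete, here is a minimal sketch, in Python, of what a single cycle of such a pipeline might look like. The JSON string catalog, the function names and the stand-in TranslationClient are invented for illustration rather than taken from any particular CAT tool or TMS; a real pipeline would plug an MT engine and a post-editing step into the same loop.

```python
# Minimal sketch of one continuous-localization cycle. The catalog layout,
# load_catalog(), TranslationClient, etc. are invented for illustration; a real
# pipeline would use a TMS/CAT connector plus MT and post-editing.
import json
from pathlib import Path


def load_catalog(path: Path) -> dict:
    """Read a source-language string catalog (key -> source text)."""
    return json.loads(path.read_text(encoding="utf-8"))


def changed_strings(source: dict, previous_source: dict) -> dict:
    """Return only the keys that are new or whose source text changed."""
    return {k: v for k, v in source.items() if previous_source.get(k) != v}


class TranslationClient:
    """Stand-in for an MT/CAT connector; it only tags the text."""

    def translate(self, text: str, target_lang: str) -> str:
        return f"[{target_lang}] {text}"  # placeholder for MT + post-editing


def localize_increment(source_path: Path, target_path: Path, target_lang: str) -> None:
    source = load_catalog(source_path)
    state = (json.loads(target_path.read_text(encoding="utf-8"))
             if target_path.exists() else {"source": {}, "strings": {}})
    delta = changed_strings(source, state["source"])  # only the small new chunk
    client = TranslationClient()
    for key, text in delta.items():
        state["strings"][key] = client.translate(text, target_lang)
    state["source"] = source  # remember which source version was translated
    target_path.write_text(json.dumps(state, ensure_ascii=False, indent=2),
                           encoding="utf-8")

# A CI job could call localize_increment() on every merge, so translated catalogs
# stay in step with the source instead of waiting for a big batch handoff.
```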

There were many other disruptive technologies and processes that led to the Industrial Revolution. For the localization industry as well, there are several other tools and process improvements in play.

Audiovisual localization and interpretation: This is a theme that has been evolving in recent years. Players like Microsoft-Skype and Google have made improvements in the text-to-speech and speech-to-text arena. Text-to-speech has become more human-like, though it isn't quite there yet. Speech-to-text has improved significantly as well, with output quality going up and error rates going down. Interpretation is the other area where we see automated solutions springing up. Google's new headphones are one example of automated interpretation solutions.

Automated terminology extraction: This is one that hasn’t garnered as much attention and focus. While there is consensus that terminology is an important aspect of localization quality, it always seems to be relegated to a lower tier from a technological advancement standpoint. There are several interesting commercial as well as open source solutions that have greatly improved terminology extraction and reduced the false positives. This area could potentially be served by artificial intelligence and machine learning solutions in the future.
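
As a rough illustration of the underlying idea, the sketch below collects candidate terms by counting recurring n-grams and filtering out stopwords. It is deliberately naive and not modeled on any particular product; real extractors add linguistic filtering and statistical ranking on top of this.

```python
# Minimal sketch of frequency-based term-candidate extraction: collect recurring
# n-grams and rank them by frequency. A stoplist stands in for the linguistic
# filtering that real terminology tools apply.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "is", "on", "with"}


def candidate_terms(text: str, max_len: int = 3, min_freq: int = 2) -> list:
    tokens = re.findall(r"[a-zA-Z][a-zA-Z-]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # skip candidates that start or end with a stopword
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_freq]


print(candidate_terms("The translation memory stores segments; the translation "
                      "memory is queried for fuzzy matches in the translation memory."))
```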

Automated quality assurance (QA) checks: QA checks can be categorized into two main areas – functional and linguistic. In terms of functional QA, automations have been around for several years and have vastly improved over time. There is already exploration on applying machine learning and artificial intelligence to functional automations to predict bugs, to create scripts that are self-healing and so on. Linguistic QA on the other hand has seen some automation primarily in the areas of spelling and terminology checks. However, the automation is limited in what it can achieve and does not replace the need for human checks or audits. This is an area that could benefit hugely from artificial intelligence and machine learning.
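
A minimal sketch of one such linguistic check is shown below: it flags bilingual segments in which a glossary source term appears but the approved target term does not. The segment format and the two-entry glossary are invented for illustration; real QA tools run dozens of such checks against full termbases.

```python
# Minimal sketch of an automated linguistic QA check: flag segments where a
# source term from the glossary appears but its approved target term does not.
GLOSSARY = {"spindle": "Spindel", "toolbar": "Symbolleiste"}  # source -> approved target


def check_terminology(segments):
    """segments: iterable of (source_text, target_text) pairs."""
    issues = []
    for i, (src, tgt) in enumerate(segments):
        for src_term, tgt_term in GLOSSARY.items():
            if src_term.lower() in src.lower() and tgt_term.lower() not in tgt.lower():
                issues.append((i, src_term, tgt_term))
    return issues


segments = [
    ("Click the toolbar to start.", "Klicken Sie auf die Symbolleiste, um zu beginnen."),
    ("Replace the spindle first.", "Ersetzen Sie zuerst die Achse."),  # wrong term used
]
for idx, src_term, tgt_term in check_terminology(segments):
    print(f"Segment {idx}: expected '{tgt_term}' for '{src_term}'")
```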

Local language support using chatbots: Chatbots are fast becoming the first level of customer support for most companies. Most chatbots are still in English. However, we are starting to see chatbots in local languages powered by machine translation engines in the background thus enabling local language support for international customers.

Data (big or small): While data is not a tool, technology or process by itself, it is important to call it out. Data is central to many of the technologies and processes mentioned above. Without a good corpus, there is no machine translation. For automated terminology extraction and automated QA checks, the challenge is to have a corpus of data big enough to train the machine. In addition, metadata becomes critical. Today, metadata is important to provide translators with additional contextual information, to ensure higher-quality output. In the future, metadata will provide the same information to machines: to a machine translation system, to an automated QA check and so on. This highlights the importance of data!

The evolution in localization is nothing but the forces of creative destruction at work. Each new process or technology destroys an old way of operating and creates a new way forward. It also means that old jobs are being made redundant while new ones are being created.

How far is this future? Well, the entire process is extremely resource and technology intensive. Many companies will require a lot of time to adopt these practices. This provides the perfect opportunity for sellers to spruce up their offering and provide an automated digital localization solution. Companies with access to abundant resources or funding should be able to achieve this sooner. This is also why a pan-industry open source platform may accelerate this transformation.

Is There a Future in Freelance Translation? Let’s Talk About It!

While the demand for translation services is at a record high, many freelancers say their inflation-adjusted earnings seem to be declining. Why is this and can anything be done to reverse what some have labelled an irreversible trend?

Over the past few years globalization has brought unprecedented growth to the language services industry. Many have heard and answered the call. Census data shows that the number of translators and interpreters in the U.S. nearly doubled between 2008 and 2015, and, according to the Bureau of Labor Statistics, the employment outlook for translators and interpreters is projected to grow by 29% through 2024. In an interview with CNBC last year, ATA Past President David Rumsey stated: “As the economy becomes more globalized and businesses realize the need for translation and interpreting to market their products and services, opportunities for people with advanced language skills will continue to grow sharply.” Judging by the size of the industry—estimated at $33.5 billion back in 2012, and expected to reach $37 billion this year—it seems the demand for translation will only continue to increase.

Many long-time freelance translators, however, don’t seem to be benefitting from this growth, particularly those who don’t work with a lot of direct clients. Many report they’ve had to lower their rates and work more hours to maintain their inflation-adjusted earnings. Also, the same question seems to be popping up in articles, blogs, and online forums. Namely, if the demand for translation is increasing, along with opportunities for people with advanced language skills, why are many professional freelance translators having difficulty finding work that compensates translation for what it is—a time-intensive, complex process that requires advanced, unique, and hard-acquired skills?

Before attempting to discuss this issue, a quick disclaimer is necessary: for legal reasons, antitrust law prohibits members of associations from discussing specific rates. Therefore, the following will not mention translation rates per se. Instead, it will focus on why many experienced translators, in a booming translation market inundated by newcomers, are forced to switch gears or careers, and what can be done to reverse what some have labelled an irreversible trend.

THE (UNQUANTIFIABLE) ISSUE

I’ll be honest. Being an in-house translator with a steady salary subject to regular increases, I have no first-hand experience with the crisis many freelance translators are currently facing. But I have many friends and colleagues who do. We all do. Friends who tell us that they’ve lost long-standing clients because they couldn’t lower their rates enough to accommodate the clients’ new demands. Friends who have been translating for ages who are now wondering whether there’s a future in freelance translation.

Unfortunately, unlike the growth of the translation industry, the number of freelance translators concerned about the loss of their inflation-adjusted earnings and the future of the profession is impossible to quantify. But that doesn’t mean the problem is any less real. At least not judging by the increasing number of social media posts discussing the issue, where comments such as the ones below abound.

  • “Expenses go up, but rates have remained stagnant or decreased. It doesn’t take a genius to see that translation is slowly becoming a sideline industry rather than a full-time profession.”
  • “Some business economists claim that translation is a growth industry. The problem is that the growth is in volume, not rates.”
  • “Our industry has been growing, but average wages are going down. This means that cheap service is growing faster than quality.”

Back in 2010, Common Sense Advisory, a market research company specializing in translation and globalization, started discussing technology- and globalization-induced rate stagnation and analyzing potential causes. Now, almost 10 years later, let’s take another look at what created the crisis many freelance translators are facing today.

A LONG LIST OF INTERCONNECTED FACTORS

The causes leading to technology- and globalization-induced rate stagnation are so interconnected that it’s difficult to think of each one separately. Nevertheless, each deserves a spot on the following list.

Globalization, internet technology, and the growth of demand for translation services naturally resulted in a rise of the “supply.” In other words, an increasing number of people started offering their services as translators. Today, like all professionals affected by global competition, most freelance translators in the U.S., Canada, Australia, and Western Europe find themselves competing against a virtually infinite pool of translators who live in countries where the cost of living is much cheaper and are able to offer much lower rates. Whether those translators are genuine professional translators or opportunists selling machine translation to unsuspecting clients is almost immaterial. As the law of supply and demand dictates, when supply exceeds demand, prices generally fall.

1. The Sheer Number of Language Services Providers and the Business/Competition Model: The increase in global demand has also led to an increase in the number of language services providers (LSPs) entering the market. Today, there are seemingly thousands of translation agencies in a market dominated by top players. Forced to keep prices down and invest in advertising and sales to maintain their competitiveness, many agencies give themselves limited options to keep profits up—the most obvious being to cut direct costs (i.e., lower rates paid to translators). Whether those agencies make a substantial profit each year (or know anything about translation itself) is beside the point. There are many LSPs out there that follow a business model that is simply not designed to serve the interests of freelance translators. Interestingly enough, competing against each other on the basis of price alone doesn't seem to be serving their interests either, as it forces many LSPs into a self-defeating, downward spiral of dropping prices. As Luigi Muzii, an author, translator, terminologist, teacher, and entrepreneur who has been working in the industry for over 30 years, puts it:

“The industry as a whole behaves as if the market were extremely limited. It’s as if survival depended on open warfare […] by outright price competition. Constantly pushing the price down is clearly not a sustainable strategy in the long-term interests of the professional translation community.”

2. The Unregulated State of the Profession: In many countries, including the U.S., translation is a widely unregulated profession with low barriers to entry. There is also not a standardized career path stipulating the minimum level of training, experience, or credentials required. Despite the existence of ISO standards and certifications from professional associations around the globe, as long as the profession (and membership to many professional associations) remains open to anyone and everyone, competition will remain exaggeratedly and unnaturally high, keeping prices low or, worse, driving them down.

3. Technology and Technological “Improvements”: From the internet to computer-assisted translation (CAT) tools to machine translation, technology may not be directly related to technology- and globalization-induced rate stagnation, but there’s no denying it’s connected. The internet is what makes global communication and competition possible. CAT tools have improved efficiency so much in some areas that most clients have learned to expect three-tier pricing in all areas. Machine translation is what’s allowing amateurs to pass as professionals and driving the post-editing-of-machine-translation business that more and more LSPs rely on today. Whether machine translation produces quality translations, whether the post-editing of machine translation is time efficient, and whether “fuzzy matches” require less work than new content are all irrelevant questions, at least as things stand today. As long as technologies that improve (or claim to improve) efficiency exist, end clients will keep expecting prices to reflect those “improvements.”

4. Unaware, Unsuspecting, and Unconcerned Clients: Those of you who’ve read my article about “uneducated” clients may think that I’m obsessed with the subject, but to me it seems that most of the aforementioned factors have one common denominator: clients who are either unaware that all translations (and translators) are not created equal, or are simply unconcerned about the quality of the service they receive. These clients will not be willing to pay a premium price for a service they don’t consider to be premium.

One look at major translation bloopers and their financial consequences for companies such as HSBC, KFC, Ford, Pampers, Coca Cola, and many more is enough to postulate that many clients know little about translation (or the languages they’re having their texts translated into). They may be unaware that results (in terms of quality) are commensurate to a translator’s skills, experience, and expertise, the technique/technology used for translating, and the time spent on a project. And who’s to blame them? Anyone with two eyes is capable of looking at a bad paint job and seeing it for what it is, but it requires a trained eye to spot a poor translation and knowledge of the translation process itself (and language in general) to value translation for what it is.

Then there’s the (thankfully marginal) number of clients who simply don’t care about the quality of the service they receive, or whether the translation makes sense or not. This has the unfortunate effect of devaluing our work and the profession in the eyes of the general public. Regrettably, when something is perceived as being of little value, it doesn’t tend to fetch premium prices. As ATA Treasurer John Milan writes:

“When consumers perceive value, they [clients] are more willing to pay for it, which raises a series of questions for our market. Do buyers of language services understand the services being offered? What value do they put on them? […] All these variables will have an impact on final market rates.”

5. The Economy/The Economical State of Mind: Whether clients need or want to save money on language services, there’s no denying that everyone always seems to be looking for a bargain these days. Those of us who have outsourced translation on behalf of clients know that, more often than not, what drives a client’s decision to choose a service provider over another is price, especially when many LSPs make the same claims about their qualifications, quality assurance processes, and industry expertise.

6. Other Factors: From online platforms and auction sites that encourage price-based bidding and undifferentiated global competition, to LSPs making the post-editing of machine translation the cornerstone of their business, to professional translators willing to drop their rates to extreme lows, there are many other factors that may be responsible for the state of things. However, they’re more byproducts of the situation than factors themselves.

A VERY REAL CONCERN

Rising global competition and rate stagnation are hardly a unique situation. Today, freelance web designers, search engine optimization specialists, graphic designers, and many other professionals in the U.S., Canada, Australia, and Western Europe must compete against counterparts in India, China, and other parts of the world where the cost of living is much cheaper—with the difference that product/service quality isn’t necessarily sacrificed in the process. And that may be the major distinction between what’s happening in our industry and others: the risk posed to translation itself, both as an art form and as a product/service.

While some talk about the “uberization” or “uberification” of the translation industry or blame technology (namely, machine translation) for declining rates, others point a finger at a business model (i.e., the business/competition model) that marginalizes the best translators and creates a system where “bad translators are driving out the good ones.” The outcome seems to be the same no matter which theory we examine: the number of qualified translators (and the quality of translations) is in danger of going down over time. As Luigi Muzii explains:

“The unprecedented growth in demand for translation in tandem with the effect of Gresham’s Law [i.e., bad translators driving out the good ones] will lead inexorably to a chronic shortfall of qualified language specialists. The gap between the lower and the higher ends of the translation labor market is widening and the process will inevitably continue.”

Between 2006 and 2012, Common Sense Advisory conducted a regular business confidence survey among LSPs. During those years, there seemed to be an increase in the number of LSPs that reported having difficulty finding enough qualified language specialists to meet their needs. Since the number of translators varies depending on the language pair, the shortage may not yet be apparent in all segments of the industry, but the trend is obviously noticeable enough that an increasing number of professionals (translators, LSPs, business analysts, etc.) are worrying about it. And all are wondering the same thing: can anything be done to reverse it?

ARE THERE ANY “SOLUTIONS?”

In terms of solutions, two types have been discussed in recent years: micro solutions (i.e., individual measures that may help individual translators maintain their rates or get more work), and macro solutions (i.e., large-scale measures that may help the entire profession on a long-term basis).

On the micro-solution side, we generally find:

  • Differentiation (skills, expertise, productivity, degree, etc.)
  • Specialization (language, subject area, market, translation sub-fields such as transcreation)
  • Diversification (number of languages or services offered, etc.)
  • Presentation (marketing efforts, business practices, etc.)
  • Client education

Generally speaking, micro solutions tend to benefit only the person implementing them, although it can be argued that anything that can be done to improve one’s image as a professional and educate clients might also benefit the profession as a whole, albeit to a lesser degree.

On the macro-solution side, we find things that individual translators have somewhat limited power over. But professional associations (and even governments) may be able to help!

Large-Scale Client Education: Large-scale client education is possibly the cornerstone of change; the one thing that may change consumer perception and revalue the profession in the eyes of the general public. As ATA Treasurer John Milan puts it:

“Together, we can educate the public and ensure that our consumers value us more like diamonds and less like water”

Most professional associations around the globe already publish client education material, such as Translation, Getting it Right— A Guide to Buying Translation. Other initiatives designed to raise awareness about translation, such as ATA’s School Outreach Program, are also helpful because they educate the next generation of clients. But some argue that client education could be more “aggressive.” In other words, professional associations should not wait for inquiring clients to look for information, but take the information to everyone, carrying out highly visible public outreach campaigns (e.g., advertising, articles, and columns in the general media). ATA’s Public Relations Committee has been very active in this area, including publishing articles written by its Writers Group in over 85 trade and business publications.

Some have also mentioned that having professional associations take a clear position on issues such as machine translation and the post-editing of machine translation would also go a long way in changing consumer perception. In this regard, many salute ATA’s first Advocacy Day last October in Washington, DC, when 50 translators and interpreters approached the U.S. Congress on issues affecting our industry, including machine translation and the “lowest-price-technically-available” model often used by the government to contract language services. However, the success of large-scale client education may be hindered by one fundamental element, at least in the United States.

Language Education: I’m a firm believer that there are some things that one must have some personal experience with to value. For example, a small business owner might think that tax preparation is easy (and undervalue the service provided by his CPA) until he tries to prepare his business taxes himself and realizes how difficult and time consuming it is—not to mention the level of expertise required!

Similarly, monolingual people may be told or “understand” that translation is a complex process that requires a particular set of skills, or that being bilingual doesn’t make you a translator any more than having two hands makes you a concert pianist. But unless they have studied another language (or, in the case of bilingual people, have formally studied their second language or have tried their hand at translation), they’re not likely to truly comprehend the amount of work and expertise required to translate, or value translation for what it really is.

According to the U.S. Census Bureau, the vast majority of Americans (close to 80%) remain monolingual, and only 10% of the U.S. population speak another language well. In their 2017 report on the state of language education in the U.S., the Commission on Language Learning concluded that the U.S. lags behind most nations when it comes to language education and knowledge, and recommended a national strategy to improve access to language learning and “value language education as a persistent national need.”

Until language education improves and most potential clients have studied a second language, one might contend that the vast majority of Americans are likely to keep undervaluing translation services and that large-scale client education may not yield the hoped-for results. This leaves us with one option when it comes to addressing the technology- and globalization-induced rate stagnation conundrum.

Industry-Wide Regulations: In most countries, physicians are expected to have a medical degree, undergo certification, and get licensed to practice medicine. The same applies to dentists, nurses, lawyers, plumbers, electricians, and many other professions. In those fields, mandatory education, training, and/or licensing/certification establish core standards and set an expected proficiency level that clients have learned to expect and trust—a proficiency level that all clients value.

Whether we’re talking of regulating access to the profession itself or controlling access to professional associations or online bidding platforms, there’s no question that implementing industry-wide regulations would go a long way in limiting wild, undifferentiated competition and assuring clients that they are receiving the best possible service. While some may think that regulations are not a practical option, it may be helpful to remember that physicians didn’t always have to undergo training, certification, and licensing to practice medicine in the U.S. Today, however, around 85% of physicians in the U.S. are certified by an accredited medical board, and it’s safe to say that all American physicians have a medical degree and are licensed to practice medicine. And the general public wouldn’t want it any other way! Is it so implausible to expect that the same people who would let no one except a qualified surgeon operate on them would want no one except a qualified professional translate the maintenance manual of their nation’s nuclear reactors?

SO, WHAT DOES THE FUTURE HOLD FOR FREELANCE TRANSLATORS?

Generally speaking, most experts agree that the demand for translation services will keep growing, that technology will keep becoming more and more prevalent, and that the translation industry will become even more fragmented. According to Luigi Muzii:

In the immediate future, I see the translation industry remaining highly fragmented with an even larger concentration of the volume of business in the hands of a bunch of multi-language vendors who hire translators from the lower layer of the resource market to keep competing on price. This side of the industry will soon count for more than a half of the pie. The other side will be made up of tiny local boutique firms and tech-savvy translator pools making use of cutting-edge collaborative tools. […] The prevailing model will be “freeconomics,” where basic services are offered for free while advanced or special features are charged at a premium. The future is in disintermediation and collaboration. […] The winners will be those translators who can leverage their specialist linguistic skills by increasing their productivity with advances in technology.

The future of freelance translation, however, may be a bit more uncertain. Indeed, many argue that even with acute specialization, first-rate translation skills, and marketing abilities to match, many freelance translators’ chances at succeeding financially in the long term may be limited by the lack of industry regulations and the general public’s lack of language education/knowledge (i.e., the two factors that feed wild, undifferentiated competition). But that’s not to say there’s no hope.

At least that’s what learning about the history of vanilla production taught me. Growing and curing vanilla beans is a time-intensive, labor-intensive, intricate process. It’s a process that meant that for over 150 years vanilla was considered a premium product, and vanilla growers made a decent living. When vanillin (i.e., synthetic vanilla flavoring) became widely available in the 1950s, however, most food manufacturers switched to the less expensive alternative. After only a few decades, many vanilla growers were out of business and the ones who endured barely made a living, forced to lower prices or resort to production shortcuts (which reduced quality) to sell faster. During that period, the only people making a profit were the vanilla brokers. At the beginning of the 21st century, however, nutrition education and consumer demand for all-natural foods started turning things around, and by 2015 vanillin had fallen from grace and natural vanilla was in high demand again. By then, however, there were few vanilla growers left and climate change was affecting production and reducing supply significantly. Today, vanilla beans fetch 30–50 times the price they did during the vanillin era.

For those who may have missed the analogy: professional (freelance) translators are to the translation industry what the vanilla growers are to the food industry. Those who endure the current technology- and globalization-induced rate stagnation may eventually (if the forces at play can be harnessed) witness a resurgence. In the meantime, the best we can do is to keep doing what we do (provide quality service, educate our clients, fight for better language education in the U.S., and support our professional associations’ initiatives to improve things), and talk constructively about the issue instead of pretending that it doesn’t exist, that it won’t affect us, or that nothing can be done about it. If you’re reading this article, things have already started to change!

 

Reference: https://bit.ly/2K3t1Xe

SDL Cracks Russian to English Neural Machine Translation

On 19 June 2018, SDL published a press release to announce that its next-generation SDL Neural Machine Translation (NMT) 2.0 has mastered Russian to English translation, one of the toughest linguistic Artificial Intelligence (AI) problems to date.

SDL NMT 2.0 outperformed all industry standards, setting a benchmark for Russian to English machine translation, with over 90% of the system’s output labelled as perfect by professional Russian-English translators. The new SDL NMT 2.0 Russian engine is being made available to enterprise customers via SDL Enterprise Translation Server (ETS), a secure NMT product, enabling organizations to translate large volumes of information into multiple languages.

“One of the toughest linguistic challenges facing the machine translation community has been overcome by our team,” said Adolfo Hernandez, CEO, SDL. “It was the Russian language that first inspired the science and research behind machine translation, and since then it has always been a major challenge for the community. SDL has deployed breakthrough research strategies to master these difficult languages, and support the global expansion of its enterprise customers. We have pushed the boundaries and raised the performance bar even higher, and we are now paving the way for leadership in other complex languages.”

The linguistic properties and intricacies of the Russian language relative to English make it particularly challenging for MT systems to model. Russian is a highly inflected language with different syntax, grammar, and word order compared to English. Given the complexities created by these differences between the Russian and English languages, raising the translation quality has been an ongoing focus of the SDL Machine Learning R&D team.

“With over 15 years of research and innovation in machine translation, our scientists and engineers took up the challenge to bring Neural MT to the next level,” said Samad Echihabi, Head of Machine Learning R&D, SDL. “We have been evolving, optimizing and adapting our neural technology to deal with highly complex translation tasks such as Russian to English, with phenomenal results. A machine running SDL NMT 2.0 can now produce translations of Russian text virtually indistinguishable from what Russian-English bilingual humans can produce.”

SDL NMT 2.0 is optimized for both accuracy and fluency and provides a powerful paradigm to deal with morphologically rich languages. It has been designed to adapt to the quality and quantity of the data it is trained on leading to high learning efficiency. SDL NMT 2.0 is also developed with the enterprise in mind with a significant focus on translation production speed and user control via terminology support. This also adds another level of productivity to Language Services Providers, and SDL’s own translators will be first to get access and benefit from this development.

Powered by SDL NMT 2.0, SDL Enterprise Translation Server (ETS) transforms the way global enterprises understand, communicate, collaborate and do business enabling them to securely translate and deliver large volumes of content into one or more languages quickly. Offering total control and security of translation data, SDL ETS has been successfully used in the government sector as well for over a decade.

Machine Translation From the Cold War to Deep Learning

In the beginning

The story begins in 1933. Soviet scientist Peter Troyanskii presented “the machine for the selection and printing of words when translating from one language to another” to the Academy of Sciences of the USSR. The invention was super simple — it had cards in four different languages, a typewriter, and an old-school film camera.

The operator took the first word from the text, found a corresponding card, took a photo, and typed its morphological characteristics (noun, plural, genitive) on the typewriter. The typewriter’s keys encoded one of the features. The tape and the camera’s film were used simultaneously, making a set of frames with words and their morphology.

Despite all this, as often happened in the USSR, the invention was considered "useless". Troyanskii died of stenocardia after trying to finish his invention for 20 years. No one in the world knew about the machine until two Soviet scientists found his patents in 1956.

It was at the beginning of the Cold War. On January 7th 1954, at IBM headquarters in New York, the Georgetown–IBM experiment started. The IBM 701 computer automatically translated 60 Russian sentences into English for the first time in history.

However, the triumphant headlines hid one little detail. No one mentioned that the translated examples were carefully selected and tested to exclude any ambiguity. For everyday use, that system was no better than a pocket phrasebook. Nevertheless, a sort of arms race was launched: Canada, Germany, France, and especially Japan all joined the race for machine translation.

The race for machine translation

The vain struggles to improve machine translation lasted for forty years. In 1966, the US ALPAC committee, in its famous report, called machine translation expensive, inaccurate, and unpromising. They instead recommended focusing on dictionary development, which eliminated US researchers from the race for almost a decade.

Even so, it was these scientists' attempts, research, and developments that created the basis for modern Natural Language Processing. All of today's search engines, spam filters, and personal assistants appeared thanks to a bunch of countries spying on each other.

Rule-based machine translation (RBMT)

The first ideas surrounding rule-based machine translation appeared in the 70s. The scientists peered over the interpreters’ work, trying to compel the tremendously sluggish computers to repeat those actions. These systems consisted of:

  • Bilingual dictionary (RU -> EN)
  • A set of linguistic rules for each language (For example, nouns ending in certain suffixes such as -heit, -keit, -ung are feminine)

That’s it. If needed, systems could be supplemented with hacks, such as lists of names, spelling correctors, and transliterators.

PROMT and Systran are the most famous examples of RBMT systems. Just take a look at AliExpress to feel the soft breath of this golden age.

But even they had some nuances and subspecies.

Direct Machine Translation

This is the most straightforward type of machine translation. It divides the text into words, translates them, slightly corrects the morphology, and harmonizes syntax to make the whole thing sound right, more or less. When the sun goes down, trained linguists write the rules for each word.
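
As a toy illustration (and nothing like a production system), direct translation can be sketched in a few lines of Python: split the sentence, look each word up in a bilingual dictionary, and keep unknown words as they are.

```python
# Toy sketch of direct machine translation: split into words, look each one up
# in a bilingual dictionary, and keep unknown words as-is. No real morphology
# or syntax handling, which is exactly why the output reads so poorly.
RU_EN = {"я": "I", "вижу": "see", "дом": "house", "большой": "big"}


def direct_translate(sentence: str) -> str:
    return " ".join(RU_EN.get(word, word) for word in sentence.lower().split())


print(direct_translate("Я вижу большой дом"))  # -> "I see big house"
```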

The output returns some kind of translation. Usually, it’s quite crappy. It seems that the linguists wasted their time for nothing.

Modern systems do not use this approach at all, and modern linguists are grateful.

Transfer-based Machine Translation

In contrast to direct translation, we prepare first by determining the grammatical structure of the sentence, as we were taught at school. Then we manipulate whole constructions, not words. This helps to get quite a decent conversion of the word order in translation. In theory.

In practice, it still resulted in verbatim translation and exhausted linguists. On the one hand, it brought simplified general grammar rules. But on the other, it became more complicated because of the increased number of word constructions in comparison with single words.

Interlingual Machine Translation

In this method, the source text is transformed into an intermediate representation that is unified for all the world's languages (interlingua). It's the same interlingua Descartes dreamed of: a meta-language that follows universal rules and turns translation into a simple "back and forth" task. Next, the interlingua would be converted into any target language, and here was the singularity!

Because of the conversion, interlingua systems are often confused with transfer-based systems. The difference is that the linguistic rules are specific to each individual language and the interlingua, not to language pairs. This means we can add a third language to an interlingua system and translate between all three. We can't do this in transfer-based systems.

It looks perfect, but in real life it's not. It was extremely hard to create such a universal interlingua — a lot of scientists worked on it their whole lives. They did not succeed, but thanks to them we now have morphological, syntactic, and even semantic levels of representation. But the Meaning-Text Theory alone cost a fortune!

The idea of an intermediate language will be back. Let's wait awhile.

As you can see, all RBMT systems are dumb and terrifying, and that's the reason they are rarely used except for specific cases (like weather report translation, and so on). Among the advantages of RBMT often mentioned are its morphological accuracy (it doesn't confuse the words), reproducibility of results (all translators get the same result), and the ability to tune it to the subject area (to teach it terms specific to economists or programmers, for example).

Even if anyone were to succeed in creating an ideal RBMT, and linguists enhanced it with all the spelling rules, there would always be some exceptions: all the irregular verbs in English, separable prefixes in German, suffixes in Russian, and situations when people just say it differently. Any attempt to take into account all the nuances would waste millions of man hours.

And don’t forget about homonyms. The same word can have a different meaning in a different context, which leads to a variety of translations. How many meanings can you catch here: I saw a man on a hill with a telescope?

Languages did not develop based on a fixed set of rules — a fact which linguists love. They were much more influenced by the history of invasions over the past three hundred years. How could you explain that to a machine?

Forty years of the Cold War didn’t help in finding any distinct solution. RBMT was dead.

Example-based Machine Translation (EBMT)

Japan was especially interested in fighting for machine translation. There was no Cold War, but there were reasons: very few people in the country knew English. It promised to be quite an issue at the upcoming globalization party. So the Japanese were extremely motivated to find a working method of machine translation.

Rule-based English-Japanese translation is extremely complicated. The language structure is completely different, and almost all words have to be rearranged and new ones added. In 1984, Makoto Nagao from Kyoto University came up with the idea of using ready-made phrases instead of repeated translation.

Let’s imagine that we have to translate a simple sentence — “I’m going to the cinema.” And let’s say we’ve already translated another similar sentence — “I’m going to the theater” — and we can find the word “cinema” in the dictionary.

All we need is to figure out the difference between the two sentences, translate the missing word, and then not screw it up. The more examples we have, the better the translation.

I build phrases in unfamiliar languages exactly the same way!
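
Here is a toy sketch of that substitution idea in Python. The stored example, the tiny dictionary and the German target are all invented for illustration; real example-based systems did far more sophisticated matching and recombination.

```python
# Toy illustration of the example-based idea: reuse a stored translation pair
# and swap in the dictionary entry for the one word that differs.
EXAMPLE = ("I'm going to the theater", "Ich gehe ins Theater")  # stored pair
DICTIONARY = {"cinema": "Kino", "theater": "Theater"}


def translate_by_analogy(sentence: str) -> str:
    src_words = sentence.rstrip(".").split()
    ex_words = EXAMPLE[0].split()
    if len(src_words) != len(ex_words):
        raise ValueError("no close-enough example available")
    translation = EXAMPLE[1]
    for new_word, old_word in zip(src_words, ex_words):
        if new_word != old_word:
            # substitute the translation of the one differing word
            translation = translation.replace(DICTIONARY[old_word.lower()],
                                              DICTIONARY[new_word.lower()])
    return translation


print(translate_by_analogy("I'm going to the cinema"))  # -> "Ich gehe ins Kino"
```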

EBMT showed the light of day to scientists from all over the world: it turns out, you can just feed the machine with existing translations and not spend years forming rules and exceptions. Not a revolution yet, but clearly the first step towards it. The revolutionary invention of statistical translation would happen in just five years.

Statistical Machine Translation (SMT)

In early 1990, at the IBM Research Center, a machine translation system was first shown which knew nothing about rules and linguistics as a whole. It analyzed similar texts in two languages and tried to understand the patterns.

The idea was simple yet beautiful. An identical sentence in two languages was split into words, which were then matched up. This operation was repeated about 500 million times to count, for example, how many times the word "Das Haus" was translated as "house" vs "building" vs "construction", and so on.

If most of the time the source word was translated as "house", the machine used that. Note that we did not set any rules nor use any dictionaries — all conclusions were drawn by the machine, guided by statistics and the logic that "if people translate that way, so will I." And so statistical translation was born.

The method was much more efficient and accurate than all the previous ones. And no linguists were needed. The more texts we used, the better translation we got.

There was still one question left: how would the machine correlate the word “Das Haus,” and the word “building” — and how would we know these were the right translations?

The answer was that we wouldn’t know. At the start, the machine assumed that the word “Das Haus” equally correlated with any word from the translated sentence. Next, when “Das Haus” appeared in other sentences, the number of correlations with the “house” would increase. That’s the “word alignment algorithm,” a typical task for university-level machine learning.

The machine needed millions and millions of sentences in two languages to collect the relevant statistics for each word. How did we get them? Well, we decided to take the abstracts of the European Parliament and the United Nations Security Council meetings — they existed in the languages of all member countries and are now available for download as the UN Corpora and the Europarl Corpora.

Word-based SMT

In the beginning, the first statistical translation systems worked by splitting the sentence into words, since this approach was straightforward and logical. IBM’s first statistical translation model was called Model one. Quite elegant, right? Guess what they called the second one?

Model 1: “the bag of words”

Model one used a classical approach — to split into words and count stats. The word order wasn’t taken into account. The only trick was translating one word into multiple words. For example, “Der Staubsauger” could turn into “Vacuum Cleaner,” but that didn’t mean it would turn out vice versa.

Here’re some simple implementations in Python: shawa/IBM-Model-1.

Model 2: considering the word order in sentences

The lack of knowledge about word order became a problem for Model 1, and word order is very important in some cases.

Model 2 dealt with that: it memorized the usual place a word takes in the output sentence and shuffled the words at an intermediate step for a more natural sound. Things got better, but they were still kind of crappy.

Model 3: extra fertility

New words appeared in the translation quite often, such as articles in German or using "do" when negating in English. "Ich will keine Persimonen" → "I do not want Persimmons." To deal with it, two more steps were added to Model 3.

  • Inserting a NULL token when the machine decides a new word is needed
  • Choosing the right grammatical particle or word for each token-word alignment

Model 4: word alignment

Model 2 considered word alignment, but knew nothing about reordering. For example, adjectives would often switch places with the noun, and no matter how well the order was memorized, it wouldn't make the output better. Therefore, Model 4 took into account the so-called "relative order" — the model learned whether two words always switched places.

Model 5: bugfixes

Nothing new here. Model 5 got some more parameters for the learning and fixed the issue with conflicting word positions.

Despite their revolutionary nature, word-based systems still failed to deal with cases, gender, and homonymy. Every single word was translated in a single-true way, according to the machine. Such systems are not used anymore, as they’ve been replaced by the more advanced phrase-based methods.

Phrase-based SMT

This method is based on all the word-based translation principles: statistics, reordering, and lexical hacks. However, for learning, it split the text not only into words but also into phrases. These were n-grams, to be precise: contiguous sequences of n words in a row.

Thus, the machine learned to translate steady combinations of words, which noticeably improved accuracy.
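
For reference, here is what "n-grams" means in code. A real phrase-based system extracts and scores phrase pairs from word alignments; this sketch only enumerates the contiguous sequences.

```python
# Minimal sketch of the n-gram idea behind phrase-based SMT: enumerate contiguous
# word sequences up to length n.
def ngrams(sentence: str, max_n: int = 3) -> list:
    words = sentence.split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]


print(ngrams("the old house"))
# ['the', 'old', 'house', 'the old', 'old house', 'the old house']
```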

The trick was, the phrases were not always simple syntax constructions, and the quality of the translation dropped significantly if anyone who was aware of linguistics and the sentences' structure interfered. Frederick Jelinek, a pioneer of computational linguistics, joked about it once: "Every time I fire a linguist, the performance of the speech recognizer goes up."

Besides improving accuracy, the phrase-based translation provided more options in choosing the bilingual texts for learning. For the word-based translation, the exact match of the sources was critical, which excluded any literary or free translation. The phrase-based translation had no problem learning from them. To improve the translation, researchers even started to parse the news websites in different languages for that purpose.

Starting in 2006, everyone began to use this approach. Google Translate, Yandex, Bing, and other high-profile online translators worked as phrase-based right up until 2016. Each of you can probably recall the moments when Google either translated the sentence flawlessly or resulted in complete nonsense, right? The nonsense came from phrase-based features.

The good old rule-based approach consistently provided a predictable though terrible result. The statistical methods were surprising and puzzling. Google Translate turns “three hundred” into “300” without any hesitation. That’s called a statistical anomaly.

Phrase-based translation has become so popular that when you hear "statistical machine translation," that is what is actually meant. Up until 2016, all studies lauded phrase-based translation as the state of the art. Back then, no one even thought that Google was already stoking its fires, getting ready to change our whole image of machine translation.

Syntax-based SMT

This method should also be mentioned, briefly. Many years before the emergence of neural networks, syntax-based translation was considered "the future of translation," but the idea did not take off.

The proponents of syntax-based translation believed it was possible to merge it with the rule-based method. It’s necessary to do quite a precise syntax analysis of the sentence — to determine the subject, the predicate, and other parts of the sentence, and then to build a sentence tree. Using it, the machine learns to convert syntactic units between languages and translates the rest by words or phrases. That would have solved the word alignment issue once and for all.

The problem is, syntactic parsing works terribly, despite the fact that we considered it solved a while ago (since we have ready-made libraries for many languages). I tried to use syntactic trees for tasks a bit more complicated than parsing the subject and the predicate. And every single time I gave up and used another method.

Let me know in the comments if you succeed using it at least once.

Neural Machine Translation (NMT)

A quite amusing paper on using neural networks in machine translation was published in 2014. The Internet didn’t notice it at all, except Google — they took out their shovels and started to dig. Two years later, in November 2016, Google made a game-changing announcement.

The idea was close to transferring the style between photos. Remember apps like Prisma, which enhanced pictures in some famous artist's style? There was no magic. The neural network was taught to recognize the artist's paintings. Next, the last layers containing the network's decision were removed. The resulting stylized picture was just the intermediate image that the network produced. That's the network's fantasy, and we consider it beautiful.

If we can transfer the style to a photo, what if we try to impose another language on a source text? The text would be that precise "artist's style," and we would try to transfer it while keeping the essence of the image (in other words, the essence of the text).

Imagine I’m trying to describe my dog — average size, sharp nose, short tail, always barks. If I gave you this set of the dog’s features, and if the description was precise, you could draw it, even though you have never seen it.

Now, imagine the source text is the set of specific features. Basically, it means that you encode it, and let the other neural network decode it back to text, but in another language. The decoder only knows its language. It has no idea about the features' origin, but it can express them in, for example, Spanish. Continuing the analogy, it doesn't matter how you draw the dog — with crayons, watercolor or your finger. You paint it as you can.

Once again — one neural network can only encode the sentence into a specific set of features, and another one can only decode them back into text. Neither has any idea about the other, and each of them knows only its own language. Recall something? Interlingua is back. Ta-da.

The question is, how do we find those features? It's obvious when we're talking about the dog, but how do we deal with text? Thirty years ago, scientists already tried to create a universal language code, and it ended in total failure.

Nevertheless, we have deep learning now. And that's its essential task! The primary distinction between deep learning and classic neural networks lies precisely in the ability to search for those specific features, without any idea of their nature. If the neural network is big enough, and there are a couple of thousand video cards at hand, it's possible to find those features in text as well.

Theoretically, we can pass the features obtained from the neural networks to linguists, so that they can open up brave new horizons for themselves.

The question is, what type of neural network should be used for encoding and decoding? Convolutional Neural Networks (CNN) fit perfectly for pictures since they operate with independent blocks of pixels.

But there are no independent blocks in text — every word depends on its surroundings. Text, speech, and music are inherently sequential. So recurrent neural networks (RNNs) are the better choice for handling them, since they remember the previous result — the prior word, in our case.

RNNs are now used everywhere — Siri’s speech recognition (parsing a sequence of sounds, where each one depends on the previous), keyboard suggestions (memorize the prior word, guess the next), music generation, and even chatbots.
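To make the “memorize the prior, guess the next” idea concrete, here is a minimal sketch of a next-word RNN. The article names no framework, so PyTorch, the toy vocabulary, and all dimensions are my assumptions, not the author’s code.

```python
# A minimal next-word RNN sketch: the GRU's hidden state carries the "memory" of prior words.
import torch
import torch.nn as nn

vocab = ["<pad>", "the", "dog", "barks", "at", "night"]   # toy vocabulary (illustrative)
stoi = {w: i for i, w in enumerate(vocab)}

class NextWordRNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        emb = self.embed(token_ids)      # (batch, seq, emb_dim)
        states, _ = self.rnn(emb)        # each state remembers everything seen so far
        return self.out(states)          # a score for every next-word candidate at each step

model = NextWordRNN(len(vocab))
ids = torch.tensor([[stoi["the"], stoi["dog"]]])
logits = model(ids)
print(vocab[logits[0, -1].argmax().item()])  # the (untrained) model's guess for the next word
```

Trained on enough text, the same structure underlies keyboard suggestions and, with an encoder-decoder pair, translation.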

In two years, neural networks surpassed everything that had appeared in the past 20 years of translation. Neural translation contains 50% fewer word order mistakes, 17% fewer lexical mistakes, and 19% fewer grammar mistakes. The neural networks even learned to harmonize gender and case in different languages. And no one taught them to do so.

The most noticeable improvements occurred in fields where direct translation was never used. Statistical machine translation methods had always worked using English as the pivot language. Thus, if you translated from Russian to German, the machine first translated the text into English and then from English into German, which led to a double loss.

Neural translation doesn’t need that — only a decoder for the target language is required. That was the first time direct translation between languages with no common dictionary became possible.

The conclusion and the future

Everyone’s still excited about the idea of “Babel fish” — instant speech translation. Google has made steps towards it with its Pixel Buds, but in fact, it’s still not what we were dreaming of. Instant speech translation is different from regular translation: you need to know when to start translating and when to shut up and listen. I haven’t seen a suitable approach to solving this yet. Unless, maybe, Skype…

And here’s one more unexplored area: all the learning is still limited to sets of parallel texts. Even the deepest neural networks learn from parallel corpora; we can’t teach a neural network without providing it with a source. People, by contrast, can expand their lexicon by reading books or articles, even without translating them into their native language.

If people can do it, a neural network should be able to do it too, in theory. I found only one prototype that attempts to get a network that knows one language to read texts in another language in order to gain experience. I’d try it myself, but I’m silly. OK, that’s it.

Reference: https://bit.ly/2HCmT6v

The Stunning Variety of Job Titles in the Language Industry

The Stunning Variety of Job Titles in the Language Industry

Slator published an amazing report about the job titles used in the language industry on LinkedIn. They have identified over 600 unique titles… and counting! An impressive total for what is often referred to as a niche industry. Here they ask: what does it all mean?

Project Management

While Transcreation and Localization indicate that a Project Manager is operating within the language industry (rather than in Software or Construction, for example), the Associate, Senior and Principal prefixes are indicative of the job level. Hyphens also seem to be en vogue on LinkedIn, used mainly to denote the specific customer segment, as in the case of “Project Manager – Life Sciences”. We also see Language Manager or Translation Manager, although these seem to be more in use when a Project Manager is responsible for an in-house linguistic team.

Coordinator and Manager appear to be used somewhat interchangeably across the industry, but where one company uses both titles, Manager is usually more senior. So how do you tell where a Project Coordinator ends and a Project Manager begins, especially if the lines are blurred further with the Associate, Principal or Senior modifiers?

Some companies reserve the Project Manager title for those who are customer facing, while Coordinators might remain more internally focused (e.g. performing administrative and linguist-related tasks but not interfacing with the customers). To make this same distinction, some LSPs are increasingly using Customer Success Manager, a title that presumably has its origin among Silicon Valley startups.

The Program Manager title is also emerging as a mid to senior job title in Project Management on technology and other large accounts, with an element of people or portfolio management involved as well. In other companies, Account Manager can also be used to describe a similar role within Project Management, specifically customer focused, and often also involving a degree of people or performance management.

Confusingly, Account Managers in many LSPs are part of the Sales function, with revenue / retention targets attached. Likewise, the Customer Success Manager job title is broad and ambiguous since it can also apply to both Sales and Project Management staff.

Sales and Business Development

Across the Sales function, we find a similar array of job titles: from Business Development Manager and Senior Localization Strategy Consultant to Strategic Account Executive and Vice President of Sales. Preferences range from specific to vague on a spectrum of transparency, with the slightly softer BD title being more favored among the frontline Sales staff in LSPs. We also note the C-Suite title Chief Revenue Officer entering the arena as someone responsible for the revenue-generating activities of Marketing and Sales teams, and offer a special mention to the Bid Managers and Pre-Sales teams.

Solutions

At the center of the Sales, Operations and Technology Venn diagram are the Solutions teams, striving to solve the most complex of customer and prospective client “puzzles”. From the generic Solutions Architect, Director of Client Solutions, Solutions Consulting and Director of Technology Solutions, to the more specific Cloud Solutions Architect or Solutions Manager for Machine Intelligence, these individuals help make the promises of Sales a reality for the customer by enabling the Operations teams to deliver the right product in the right way.

Vendor Management

It’s a similar state of affairs across the Vendor Management function. Here we find Global Procurement Directors, Supplier Relations Managers, Area Sourcing Managers, Supply Chain Managers and Talent Program Managers, all dedicated to managing the pool of linguists and other linguistic subcontractors within an LSP.

Linguists

Arguably the lifeblood of the language industry, but not every LSP has them. Companies that do have a team of linguists in-house hire for roles such as Medical and Legal Interpreter, Senior Editor, Technical Translator, In-house Translator/Reviser and French Translator-Subtitler, with some multi-tasking as Translator / IT Manager and Account Manager / Translator.

Tech etc.

The Technology function(s) in LSPs can be a bit of a catch-all for employees working on IT, software development and functional QA activities, with many coming from outside the industry originally. The extent to which an LSP develops its own solutions in-house will determine the technicality of the job titles assigned to Technology staff, and some language industry old-timers may be hard-pressed to tell their Junior Full Stack Software Developer from their Senior UX Designer and their Product Managers from their Project Managers. Other Tech-type job roles include QA Automation Engineer, Associate Customer Support Engineer, Chief Information Officer, and Sound Engineer.

Back-Office

Perhaps the most standardized and least localization-specific area of the language industry, the back-office and shared-services functions house the likes of marketing, payroll, HR, finance, and accounting professionals. Behind the scenes here can be found HR Specialists and HR Generalists (and everything in between), your friendly Director of Talent Acquisition, as well as Financial Accounting Managers, Group Financial Controllers, and not forgetting General Counsel.

Why The Variety?

There are many elements at play in explaining the mind-blowing variety of job titles found in the language industry. Some of the key factors include:

  • Geography – While variants of the VP title are seen more in the US, Asia tends to favour Area or Country Managers. By contrast, Directors and Heads of are most likely to be found in Europe.
  • Customer Base – Some companies tap into the idea of using job titles strategically to mirror the language used by their clients, hence Customer Success Manager in a Tech-focused LSP, or Principal Project Manager in one servicing a Financial customer base.
  • Organizational Design – Flatter organizations typically differentiate less between job levels while others design progressively more senior titles as a people management / motivational tool. Internally, an employee may achieve levels of progression (junior, senior or level 1, 2, 3 etc.), without the external facing job title having changed. This contributes to giving companies a….
  • Competitive Edge – Helpfully, job titles that are ambiguous are less understandable to those outside the business, which can make it harder for competitors to poach the best employees.
  • Creative License – Since LinkedIn profiles are normally owned by individuals, employees have a certain leeway to embellish their actual job titles.

Alongside the obvious and the mundane, the vague and the ambiguous, there are also some intriguing job titles: we spotted Traffic Coordinator, People Ops and Quality Rater, to name just a few.

Reference: https://bit.ly/2JbQpl6

2018 European Language Industry Survey Results

2018 European Language Industry Survey Results

GALA published the 2018 survey results for the European language industry. According to the preamble, it is one of the most successful surveys of its kind.

With 1285 responses from 55 countries, including many outside Europe, this 2018 edition of the European Language Industry survey is the most successful one since its start in 2013.

This report analyses European trends rather than those in individual countries. Significant differences between countries will be highlighted if the number of answers from those countries is sufficiently high to draw meaningful conclusions.

Objectives of This Survey

The objectives of the survey have not changed compared to previous editions. It was not set up to gather exact quantitative data but to establish the mood of the industry. As such it does not replace other local, regional or global surveys of the language industry but adds the important dimensions of perception and trust which largely determine the actions of industry stakeholders.

The questions concerning the market as well as the open questions regarding trends and concerns are identical to those in the previous editions in order to detect changes in prevailing opinions.

The survey report covers many aspects of the language industry. We chose the aspects below to highlight:

Certification Requirements 

Companies report an increase in certification requirements in 2017 and consequently adjust their expectations for 2018 upward. Although most responding companies expect the requirements to stay at the current level, 25% of them expect an increase. Nobody is expecting a decrease.


Security Requirements

According to the respondents, the real increase in security requirements exceeded even the 2017 expectations, which led them to further increase their expectations for 2018.

Operational Practices

Outsourcing remains a popular practice among language service companies, with 40% indicating that they want to increase this practice. Only 2% report a decrease. Even more popular this year is MT post-editing: 37% report an increase and an additional 17% indicate that they are starting this practice.

Crowdsourcing and offshoring, both often debated in language industry forums, remain slow starters. This year, 5% of the companies report starting with crowdsourcing and 4% report increasing their use of this practice. Offshoring already has a slightly higher penetration, and 11% of the companies report increasing this practice, compared to 5% in 2017. An additional 3% want to start with the practice.

Note: the graph in the report does not represent actual usage of the practices, but the level of their expected development, determined as follows: [start * 2] + [increase] – [stop * 2] – [decrease].
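The scoring formula in that note is easy to reproduce. Below is a small sketch of it; the example call uses the MT post-editing figures quoted above (17% starting, 37% increasing) and assumes zero for the “stop” and “decrease” shares, which the excerpt does not state.

```python
# Expected-development score for a practice, as defined in the survey's note.
def expected_development(start, increase, stop, decrease):
    """[start * 2] + [increase] - [stop * 2] - [decrease], all in percent of respondents."""
    return 2 * start + increase - 2 * stop - decrease

# MT post-editing, assuming nobody reported stopping or decreasing (assumption, not survey data):
print(expected_development(start=17, increase=37, stop=0, decrease=0))  # -> 71
```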

Technology

Machine Translation

We will remember 2018 as the year in which more than 50% of both the companies and the individual language professionals reported that they are using MT in one form or another.

The technology cannot yet be considered mainstream, because only 22% of the LSCs and 19% of the individuals state that they are using it daily, but the number of companies and individuals that are not using it at all has dropped to 31% and 38% respectively.

This does not mean that MT users are enthusiastically embracing the technology, as the answers in the section about negative trends testify, but it is a strong indication that the market has accepted that machine translation is here to stay.

The survey results also show that using MT does not necessarily mean investing in MT. The most popular engine is still the free Google Translate. 52% of all respondents report that they are using the site, but we see a clear difference between the various categories of respondents. While more than 70% of the respondents in training institutes report that they are using the site, only 49% of the translation companies and 52% of the individual translators state the same.

CAT and Terminology Tools

This year’s results confirm the 2017 statement that the use of CAT tools is clearly more widespread in language service companies than in the individual professionals’ community. Less than 1% of the companies report that they are not using CAT tools, compared to 13% of the individual language professionals.

This year the survey tried to ascertain the level of competition on the CAT market. The survey results indicate that this CAT landscape is becoming more complex, but they also show that the SDL/TRADOS product suite still has a leading position in terms of installed base, with 67% of the respondents using one or more versions of the product (ranging from 56% of the training institutes to 79% of the translation companies).

MemoQ can currently be considered the most serious contender, with approximately 40% penetration. The top 5 is completed by Memsource, Wordfast and Across, which all remain below the 30% installed-base mark.

Not surprisingly, Multiterm (the terminology tool linked with the SDL/Trados suite) is the most popular terminology tool around – except for the basic Office-type tools that are used 50% more often than Multiterm, which itself is used 6 times more often than the next in line.

Translation Management Systems

The level of penetration of translation management systems in language service companies has not significantly changed compared to 2017, with 76% of the responding companies using some type of management system.

The most popular third-party system in this category is Plunet, followed by XTRF. SDL TMS, on the other hand, seems to be more often selected by training institutes and translation departments.

Recruitment and Training

Skill Level of New Master-Level Graduates

The results below refer to training institutes, translation companies and translation departments (359 respondents).

A majority of these respondents rate all skills of new graduates as either sufficiently developed or very well developed. Translation tool skills score lowest, despite the stronger cooperation between universities and translation professionals, and the efforts made by translation tool providers.

10 to 15% used the “not applicable” answer, which indicates that the person who completed the survey is not involved in recruitment and therefore was not comfortable giving an opinion.

Investment in Training or Professional Development

Which Type of Training Have You Organized or Attended in 2017?

The following chart presents the popularity of the various types of training across all respondent types.

Not surprisingly, the respondents representing training institutes, translation companies and translation departments report a higher-than-average number of trainings organised or followed. Given the importance of lifelong learning, the 15% of respondents that did not organise or follow any training in 2017 can – and should – be considered a wake-up call for the industry at large.

Return on Investment

Training institutions, translation companies and translation departments report a considerably higher impact of training on their performance than the individual professionals, who make up most of the respondents.

Trends for The Industry 

In this edition of the survey, the open question about trends that will dominate the industry has been split to allow the respondents to distinguish between positive and negative trends.

The fact that both language service companies and individual professionals see price pressure as a prevailing negative trend but at the same time expect a status quo on pricing indicates that they are fairly confident that they will be able to withstand the pressure.

Across the board, the increase of translation demand is the most often cited positive trend for 2018, with 16% of the respondents. Advances in technology in general (including CAT), machine translation, increased professionalism and a higher awareness by the market of the importance of language services complete the top 5. Interesting to note is that quite a few respondents, in particular individual professionals, expect that the lack of quality of machine translation can lead to an increased appreciation for the quality of human translation.

That same machine translation clearly remains number 2 among the negative trends, almost always correlated with the price pressure factor. The traditional fear that machine translation opens the door to lower quality and more competition from less qualified translators and translation companies remains strong.

The report also includes some insights. We chose the insights below to highlight:

1- Most European language service companies (LSCs) can be considered to be small.
2- The number of individual language professionals that work exclusively as subcontractors decreases with growing revenue.
3- Legal services remain undisputedly the most widely served type of customer for both respondent types: companies and individuals.
4- Machine translation engines that require financial or time investment have difficulty attracting more than minority interest.
5- Except for “client terms and conditions” and “insufficient demand”, language service companies score all challenges higher than individual professionals.

Conclusion

This 2018 edition of the European Language Industry survey reinforces the positive image that could already be seen in the 2017 results. Virtually all parameters point to higher confidence in the market, from expected sales levels, recruitment plans and investment intentions to the expectation that 2018 prices will be stable.

2018 is clearly the year of machine translation. This is the first year that more than half of the respondents declare that they are using the technology in one way or another. On the other hand, it is too soon to conclude that MT is now part of the translation reality, with only some 20% of the language service companies and independent language professionals reporting daily usage. Neural MT has clearly not yet brought the big change that the market is expecting.

Changes to the technology questions are giving us a better view of the actual use of CAT, MT and other technologies by the various categories of respondents. New questions about internships have brought us additional insights into the way the market is looking upon this important tool to bridge the gap between the universities and the professional world.

Reference: http://bit.ly/2HOJEpx

Top 5 Reasons Why Enterprises Rely on Machine Translation for Global Expansion

Top 5 Reasons Why Enterprises Rely on Machine Translation for Global Expansion

SDL published a whitepaper on the reasons why enterprises rely on machine translation for global expansion. The introduction states the case in point: language barriers between companies and their global customers stifle economic growth. In fact, forty-nine percent of executives say a language barrier has stood in the way of a major international business deal, and nearly two-thirds (64 percent) of those same executives say language barriers make it difficult to gain a foothold in international markets. Whether inside or outside your company, your global audiences prefer to read in their native languages. It speeds efficiency, increases receptivity and allows for easier processing of concepts.

SDL stated this point as a solution to the aforementioned challenge:

To break the language barrier and expand your global and multilingual footprint, there are opportunities to leverage both human translation and machine translation.

The paper then compares human translation and MT from the perspective of usage. Human translation is best for content that is legally binding, as well as high-value, branded content. However, it can be costly, can take weeks (or even months) to complete and can’t address all of the real-time needs of your business to serve multilingual prospects, partners and customers.

MT, on the other hand, is fast becoming an essential complement to human translation efforts. It is well suited for use as part of a human translation process, but it also solves high-volume and real-time content challenges that human translation cannot solve on its own, including the five that are the focus of this whitepaper.

First reason:  Online user activity and multilingual engagement

Whether it’s a web forum, blog, community content, customer review or a Wiki page, your online user-generated content (UGC) is a powerful tool for customer experience and can be a great opportunity to connect customers around your brand and products. These are rarely translated because the ever-fluctuating content requires real-time translation that is not possible with traditional translation options. However, this content is a valuable resource for resolving problems, providing information, building a brand and delivering a positive customer experience.

Machine translation provides a way for companies to quickly and affordably translate user reviews on e-commerce sites, comments on blogs or within online communities or forums, Wiki content and just about any other online UGC that helps provide support or information to your customers and prospects. While the translation isn’t perfect, its quality is sufficient for its primary purpose: information.

Second reason:  Global customer service and customer relationship management

The goal of any customer service department is to help customers find the right answer – and to stay off the phone. Phone support is typically expensive and inefficient for the company and can be frustrating for the customer. Today, customer service departments are working to enhance relationships with customers by offering support over as many self-service channels as possible, including knowledge base articles, email support and real-time chat.

However, due to its dynamic nature, this content often isn’t translated into different languages, which makes multilingual customer service agents necessary instead. Because of its real-time capabilities, capacity to handle large volumes of content and ability to lower costs, machine translation is an extremely attractive option for businesses with global customer support organizations.

There are two key online customer support areas that are strong candidates for machine translation:
• Real-time communication
• Knowledge base articles

Third reason:  International employee collaboration

Your employees are sharing information every day: proposals, product specifications, designs, documents. In a multinational company, they’re likely native speakers of languages other than the one spoken at headquarters. While these employees may speak your language very well, they most likely prefer to review complex concepts in their native languages. Reading in their native languages increases their mental processing speed and allows them to work better and faster.

Human translation isn’t possible in this scenario because of the time-sensitivity inherent to internal collaboration. But internal knowledge sharing doesn’t need the kind of letter perfect translation that public-facing documents often do. For internal content sharing, machine translation can provide an understandable translation that will help employees transcend language barriers. In addition, by granting all employees access to a machine translation solution, they are able to access and quickly translate external information as well without sending it through a lengthy translation process or exposing it outside of your walls.

This level of multilingual information sharing and information access can dramatically improve internal communications and knowledge sharing, increase employee satisfaction and retention and drive innovation among your teams.

Fourth reason:  Online security and protection of intellectual property

In an effort to be resourceful, your employees will likely seek out free translation methods like Google Translate or Microsoft Bing. These public, web-based machine translation tools are effective, but they allow your intellectual property to be mined to improve search results or for other needs. There is a simple test to determine if your company’s information is being submitted through public channels for translation: Simply have your IT department audit your firewalls to determine how much traffic is going to the IP addresses of online translation services. Many companies have been surprised by the volume of information going out of their organization this way.
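The audit the paper suggests can be as simple as tallying outbound traffic to known translation services. Below is a hedged sketch of that idea; the CSV export format, the column names, and the list of service hostnames are all illustrative assumptions, not part of the whitepaper.

```python
# Sketch: count outbound firewall-log traffic to public translation services.
# Assumes a CSV export with hypothetical "dest_host" and "bytes_out" columns.
import csv
from collections import Counter

TRANSLATION_HOSTS = {          # illustrative hostnames only
    "translate.google.com",
    "translate.googleapis.com",
    "www.bing.com",
}

def count_translation_traffic(log_path):
    bytes_per_host = Counter()
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            host = row.get("dest_host", "")
            if host in TRANSLATION_HOSTS:
                bytes_per_host[host] += int(row.get("bytes_out", 0))
    return bytes_per_host

# print(count_translation_traffic("firewall_export.csv"))
```

The exact fields depend on your firewall vendor; the point is simply that the volume of data leaving through free translation sites is measurable.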

This security hole can be plugged with a secure, enterprise-grade machine translation hosted on-premises or in a private cloud. With this type of solution, you can give employees a secure translation option for translation of documents, websites and more. And, of course, you’ll protect your valuable intellectual property by keeping it in-house, where it belongs.

Fifth reason:  Translation capacity and turnaround time for internal teams or agencies

Machine translation can improve the capacity and productivity of internal translation departments or language service providers (LSPs) by 30 percent or more and greatly reduces the cost of content translation. Large enterprises that translate massive volumes have seen increases of up to 300 percent in translation productivity when machine translation is used to generate the initial translation, which is then edited by skilled translators.

Here’s how it works: instead of starting with a raw document, translators start with a machine translation, which they review in a post-editing process. Translators edit and fine-tune the content for readability, accuracy and cultural sensitivity. By front-loading the process with a high-quality machine translation, translators are still able to provide high-quality content, but in a fraction of the time. 

Reference: https://bit.ly/2wXRQSt

A Gentle Introduction to Neural Machine Translation

A Gentle Introduction to Neural Machine Translation

One of the earliest goals for computers was the automatic translation of text from one language to another.

Automatic or machine translation is perhaps one of the most challenging artificial intelligence tasks given the fluidity of human language. Classically, rule-based systems were used for this task, which were replaced in the 1990s with statistical methods. More recently, deep neural network models achieve state-of-the-art results in a field that is aptly named neural machine translation.

In this post, you will discover the challenge of machine translation and the effectiveness of neural machine translation models.

After reading this post, you will know:

  • Machine translation is challenging given the inherent ambiguity and flexibility of human language.
  • Statistical machine translation replaces classical rule-based systems with models that learn to translate from examples.
  • Neural machine translation models fit a single model rather than a pipeline of fine-tuned models and currently achieve state-of-the-art results.

Let’s get started.

What is Machine Translation?

Machine translation is the task of automatically converting source text in one language to text in another language.

In a machine translation task, the input already consists of a sequence of symbols in some language, and the computer program must convert this into a sequence of symbols in another language.

— Page 98, Deep Learning, 2016.

Given a sequence of text in a source language, there is no one single best translation of that text to another language. This is because of the natural ambiguity and flexibility of human language. This makes the challenge of automatic machine translation difficult, perhaps one of the most difficult in artificial intelligence:

The fact is that accurate translation requires background knowledge in order to resolve ambiguity and establish the content of the sentence.

— Page 21, Artificial Intelligence, A Modern Approach, 3rd Edition, 2009.

Classical machine translation methods often involve rules for converting text in the source language to the target language. The rules are often developed by linguists and may operate at the lexical, syntactic, or semantic level. This focus on rules gives the name to this area of study: Rule-based Machine Translation, or RBMT.

RBMT is characterized with the explicit use and manual creation of linguistically informed rules and representations.

— Page 133, Handbook of Natural Language Processing and Machine Translation, 2011.

The key limitations of the classical machine translation approaches are both the expertise required to develop the rules, and the vast number of rules and exceptions required.

What is Statistical Machine Translation?

Statistical machine translation, or SMT for short, is the use of statistical models that learn to translate text from a source language to a target language given a large corpus of examples.

This task of using a statistical model can be stated formally as follows:

Given a sentence T in the target language, we seek the sentence S from which the translator produced T. We know that our chance of error is minimized by choosing that sentence S that is most probable given T. Thus, we wish to choose S so as to maximize Pr(S|T).

— A Statistical Approach to Machine Translation, 1990.

This formal specification makes explicit the maximizing of the probability of the output sequence given the input sequence of text. It also makes explicit the notion that there is a suite of candidate translations, and the need for a search process, or decoder, to select the single most likely translation from the model’s output probability distribution.
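Spelled out in the notation of the quote above, the usual way to read this is the noisy-channel decomposition via Bayes’ rule (the decomposition itself is standard SMT practice and is not stated explicitly in this excerpt):

```latex
\hat{S} \;=\; \arg\max_{S} \Pr(S \mid T)
        \;=\; \arg\max_{S} \frac{\Pr(T \mid S)\,\Pr(S)}{\Pr(T)}
        \;=\; \arg\max_{S} \Pr(T \mid S)\,\Pr(S)
```

Since $\Pr(T)$ does not depend on the candidate $S$, the decoder only needs a translation model $\Pr(T \mid S)$ and a language model $\Pr(S)$, which is exactly the pipeline of separately tuned components discussed below.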

Given a text in the source language, what is the most probable translation in the target language? […] how should one construct a statistical model that assigns high probabilities to “good” translations and low probabilities to “bad” translations?

— Page xiii, Syntax-based Statistical Machine Translation, 2017.

The approach is data-driven, requiring only a corpus of examples with both source and target language text. This means linguists are no longer required to specify the rules of translation.

This approach does not need a complex ontology of interlingua concepts, nor does it need handcrafted grammars of the source and target languages, nor a hand-labeled treebank. All it needs is data—sample translations from which a translation model can be learned.

— Page 909, Artificial Intelligence, A Modern Approach, 3rd Edition, 2009.

Quickly, the statistical approach to machine translation outperformed the classical rule-based methods to become the de-facto standard set of techniques.

Since the inception of the field at the end of the 1980s, the most popular models for statistical machine translation […] have been sequence-based. In these models, the basic units of translation are words or sequences of words […] These kinds of models are simple and effective, and they work well for many language pairs

— Syntax-based Statistical Machine Translation, 2017.

The most widely used techniques were phrase-based and focus on translating sub-sequences of the source text piecewise.

Statistical Machine Translation (SMT) has been the dominant translation paradigm for decades. Practical implementations of SMT are generally phrase-based systems (PBMT) which translate sequences of words or phrases where the lengths may differ

— Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016.

Although effective, statistical machine translation methods suffered from a narrow focus on the phrases being translated, losing the broader nature of the target text. The hard focus on data-driven approaches also meant that methods may have ignored important syntax distinctions known by linguists. Finally, the statistical approaches required careful tuning of each module in the translation pipeline.

What is Neural Machine Translation?

Neural machine translation, or NMT for short, is the use of neural network models to learn a statistical model for machine translation.

The key benefit to the approach is that a single system can be trained directly on source and target text, no longer requiring the pipeline of specialized systems used in statistical machine learning.

Unlike the traditional phrase-based translation system which consists of many small sub-components that are tuned separately, neural machine translation attempts to build and train a single, large neural network that reads a sentence and outputs a correct translation.

— Neural Machine Translation by Jointly Learning to Align and Translate, 2014.

As such, neural machine translation systems are said to be end-to-end systems as only one model is required for the translation.

The strength of NMT lies in its ability to learn directly, in an end-to-end fashion, the mapping from input text to associated output text.

— Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016.

Encoder-Decoder Model

Multilayer Perceptron neural network models can be used for machine translation, although the models are limited by a fixed-length input sequence where the output must be the same length.

These early models have been greatly improved upon recently through the use of recurrent neural networks organized into an encoder-decoder architecture that allow for variable length input and output sequences.

An encoder neural network reads and encodes a source sentence into a fixed-length vector. A decoder then outputs a translation from the encoded vector. The whole encoder–decoder system, which consists of the encoder and the decoder for a language pair, is jointly trained to maximize the probability of a correct translation given a source sentence.

— Neural Machine Translation by Jointly Learning to Align and Translate, 2014.

Key to the encoder-decoder architecture is the ability of the model to encode the source text into an internal fixed-length representation called the context vector. Interestingly, once encoded, different decoding systems could be used, in principle, to translate the context into different languages.

… one model first reads the input sequence and emits a data structure that summarizes the input sequence. We call this summary the “context” C. […] A second model, usually an RNN, then reads the context C and generates a sentence in the target language.

— Page 461, Deep Learning, 2016.
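Before moving on to attention, here is a minimal sketch of the encoder-decoder idea described above: one RNN compresses the source sentence into a fixed-length context vector, and a separate RNN generates target words from it (in principle, a second decoder for another language could read the same context). This is not Google’s system; PyTorch and the toy vocabulary sizes and dimensions are my assumptions.

```python
# Minimal encoder-decoder (seq2seq) sketch with a fixed-length context vector.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True)

    def forward(self, src_ids):
        _, h = self.rnn(self.embed(src_ids))
        return h                                   # (1, batch, hidden): the context vector C

class Decoder(nn.Module):
    def __init__(self, tgt_vocab, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, prev_ids, context):
        states, h = self.rnn(self.embed(prev_ids), context)
        return self.out(states), h                 # scores over the target vocabulary

encoder, decoder = Encoder(src_vocab=1000), Decoder(tgt_vocab=1200)
context = encoder(torch.randint(0, 1000, (1, 7)))               # encode a 7-token source sentence
logits, _ = decoder(torch.randint(0, 1200, (1, 1)), context)    # decode one step from the context
```

In training, encoder and decoder are optimized jointly to maximize the probability of the correct translation, exactly as the quote above describes.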

Encoder-Decoders with Attention

Although effective, the Encoder-Decoder architecture has problems with long sequences of text to be translated.

The problem stems from the fixed-length internal representation that must be used to decode each word in the output sequence.

The solution is the use of an attention mechanism that allows the model to learn where to place attention on the input sequence as each word of the output sequence is decoded.

Using a fixed-sized representation to capture all the semantic details of a very long sentence […] is very difficult. […] A more efficient approach, however, is to read the whole sentence or paragraph […], then to produce the translated words one at a time, each time focusing on a different part of the input sentence to gather the semantic details required to produce the next output word.

— Page 462, Deep Learning, 2016.
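As a rough sketch of what “placing attention” means in code: at each decoding step the decoder scores every source position and takes a weighted average of the encoder states instead of relying on a single fixed-length vector. The shapes and the dot-product scoring below are illustrative assumptions; production systems such as GNMT use more elaborate variants.

```python
# Dot-product attention over encoder states (toy shapes).
import torch
import torch.nn.functional as F

encoder_states = torch.randn(1, 7, 64)   # (batch, src_len, hidden): one state per source word
decoder_state = torch.randn(1, 64)       # current decoder hidden state

scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)  # (batch, src_len)
weights = F.softmax(scores, dim=-1)       # where to "place attention" on the input
context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)       # (batch, hidden)
```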

The encoder-decoder recurrent neural network architecture with attention is currently the state-of-the-art on some benchmark problems for machine translation. And this architecture is used at the heart of the Google Neural Machine Translation system, or GNMT, used in their Google Translate service.

… current state-of-the-art machine translation systems are powered by models that employ attention.

— Page 209, Neural Network Methods in Natural Language Processing, 2017.

Although effective, neural machine translation systems still suffer from some issues, such as scaling to larger vocabularies of words and the slow speed of training the models. These are the current areas of focus for large production neural translation systems, such as the Google system.

Three inherent weaknesses of Neural Machine Translation […]: its slower training and inference speed, ineffectiveness in dealing with rare words, and sometimes failure to translate all words in the source sentence.

— Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016.

Reference: https://bit.ly/2Cx8zxI

NEURAL MACHINE TRANSLATION: THE RISING STAR

NEURAL MACHINE TRANSLATION: THE RISING STAR

These days, language industry professionals simply can’t escape hearing about neural machine translation (NMT). However, there still isn’t enough information about the practical facts of NMT for translation buyers, language service providers, and translators. People often ask: is NMT intended for me? How will it change my life?

A Short History and Comparison

At the beginning of time – around the 1970s – the story began with rule-based machine translation (RBMT) solutions. The idea was to create grammatical rule sets for source and target languages, where machine translation is a kind of conversion process between the languages based on these rule sets. This concept works well with generic content, but adding new content, new language pairs, and maintaining the rule set is very time-consuming and expensive.

This problem was solved with statistical machine translation (SMT) around the late ‘80s and early ‘90s. SMT systems create statistical models by analyzing aligned source-target language data (training set) and use them to generate the translation. The advantage of SMT is the automatic learning process and the relatively easy adaptation by simply changing or extending the training set. The limitation of SMT is the training set itself: to create a usable engine, a large database of source-target segments is required. Additionally, SMT is not language independent in the sense that it is highly sensitive to the language combination and has a very hard time dealing with grammatically rich languages.

This is where neural machine translation (NMT) begins to shine: it can look at the sentence as a whole and can create associations between the phrases over an even longer distance within the sentence. The result is a convincing fluency and an improved grammatical correctness compared to SMT.

Statistical MT vs Neural MT

Both SMT and NMT work on a statistical basis and use source-target language segment pairs as their foundation. What’s the difference? What we typically call SMT is actually Phrase Based Statistical Machine Translation (PBSMT), meaning SMT is splitting the source segments into phrases. During the training process, SMT creates a translation model and a language model. The translation model stores the different translations of the phrases and the language model stores the probability of the sequence of phrases on the target side. During the translation phase, the decoder chooses the translation that gives the best result based on these two models. On a phrase or expression level, SMT (or PBSMT) performs well, but language fluency and grammar are not good.

‘Buch’ is aligned with ‘book’ twice and only once with ‘the’ and ‘a’ – the winner is the ‘Buch’-’book’ combination
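The counting idea behind that caption is easy to illustrate. The sketch below tallies which English words co-occur with ‘Buch’ across aligned segment pairs and picks the most frequent; the two-sentence mini-corpus is invented purely for illustration.

```python
# Toy word-alignment counting: which English word co-occurs most often with 'Buch'?
from collections import Counter

pairs = [
    ("das Buch", "the book"),   # invented aligned segment pairs
    ("ein Buch", "a book"),
]

counts = Counter()
for de, en in pairs:
    if "Buch" in de.split():
        counts.update(en.split())

print(counts.most_common(3))  # 'book' appears twice, 'the' and 'a' once each -- 'book' wins
```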

Neural Machine Translation, on the other hand, is using neural network-based, deep, machine learning technology. Words or even word chunks are transformed into “word vectors”. This means that ‘dog’ is not only representing the characters d, o and g, but it can contain contextual information from the training data. During the training phase, the NMT system tries to set the parameter weights of the neural network based on the reference values (source-target translation). Words appearing in similar context will get similar word vectors. The result is a neural network which can process source segments and transfer them into target segments. During translation, NMT is looking for a complete sentence, not just chunks (phrases). Thanks to the neural approach, it is not translating words, it’s transferring information and context. This is why fluency is much better than in SMT, but terminology accuracy is sometimes not perfect.

Similar words are closer to each other in a vector space
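To make the “closer in a vector space” point concrete, here is a tiny sketch comparing word vectors with cosine similarity. The three vectors are made-up toy values, not vectors learned by any real NMT system.

```python
# Cosine similarity between toy word vectors: similar contexts -> similar vectors.
import numpy as np

vectors = {
    "dog":     np.array([0.90, 0.80, 0.10]),
    "puppy":   np.array([0.85, 0.75, 0.20]),
    "invoice": np.array([0.10, 0.20, 0.95]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["dog"], vectors["puppy"]))    # close to 1: words used in similar contexts
print(cosine(vectors["dog"], vectors["invoice"]))  # much lower: unrelated words
```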

The Hardware

A popular GPU: NVIDIA Tesla

One big difference between SMT and NMT systems is that NMT requires Graphics Processing Units (GPUs), which were originally designed to help computers process graphics. These GPUs can calculate astonishingly fast – the latest cards have about 3,500 cores which can process data simultaneously. In fact, there is a small ongoing hardware revolution and GPU-based computers are the foundation for almost all deep learning and machine learning solutions. One of the great perks of this revolution is that nowadays, NMT is not only available for large enterprises, but also for small and medium-sized companies as well.

The Software

The main element, or ‘kernel’, of any NMT solution is the so-called NMT toolkit. There are a couple of NMT toolkits available, such as Nematus or openNMT, but the landscape is changing fast and more companies and universities are now developing their own toolkits. Since many of these toolkits are open-source solutions and hardware resources have become more affordable, the industry is experiencing an accelerating speed in toolkit R&D and NMT-related solutions.

On the other hand, as important as toolkits are, they are only one small part of a complex system, which contains frontend, backend, pre-processing and post-processing elements, parsers, filters, converters, and so on. These are all factors for anyone to consider before jumping into the development of an individual system. However, it is worth noting that the success of MT is highly community-driven and would not be where it is today without the open source community.

Corpora

A famous bilingual corpus: the Rosetta Stone

And here comes one of the most curious questions: what are the requirements of creating a well-performing NMT engine? Are there different rules compared to SMT systems? There are so many misunderstandings floating around on this topic that I think it’s a perfect opportunity to go into the details a little bit.

The main rules are nearly the same both for SMT and NMT systems. The differences are mainly that an NMT system is less sensitive and performs better in the same circumstances. As I have explained in an earlier blog post about SMT engine quality, the quality of an engine should always be measured in relation to the particular translation project for which you would like to use it.

These are the factors which will eventually influence the performance of an NMT engine:

Volume

Regardless of what you may have heard, volume is still very important for NMT engines, just like in the SMT world. There is no explicit rule on entry volumes, but what we can safely say is that the bare minimum is about 100,000 segment pairs. There are Globalese users who are successfully using engines built on 150,000 segments but, to be honest, this is more of an exception and requires special circumstances (like the right language combination; see below). The optimum volume starts at around 500,000 segment pairs (2 million words).

Quality

The quality of the training set plays an important role (garbage in, garbage out). Don’t add unqualified content to your engine just to increase the overall size of the training set.

Relevance

Applying the right engine to the right project is the first key to success. An engine trained on automotive content will perform well on car manual translation but will give back disappointing results when you try to use it for web content for the food industry.

This raises the question of whether the content (TMs) should be mixed. If you have enough domain-specific content you shouldn’t necessarily add more out-of-domain data to your engine, but if you have an insufficient volume of domain-specific data then adding generic content (e.g. from public sources) may help improve the quality. We always encourage our Globalese users to try different engine combinations with different training sets.

Content type

Content generated by possible non-native speaking users on a chat forum or marketing material requiring transcreation is always a challenge to any MT system. On the other hand, technical documentation with controlled language is a very good candidate for NMT.

Language combination

Unfortunately, language combination still has an impact on quality. The good news is that NMT has now opened up the option of using machine translation for languages like Japanese, Turkish, or Hungarian –  languages which had nearly been excluded from the machine translation club because of poor results provided by SMT. NMT has also helped solve the problem of long distance dependencies for German and the translation output is much smoother for almost all languages. But English combined with Latin languages still provides better results than, for example, English combined with Russian when using similar volumes and training set quality.

Expectations for the future

Neural Machine Translation is a big step ahead in quality, but it still isn’t magic. Nobody should expect that NMT will replace human translators anytime soon. What you CAN expect is that NMT can be a powerful productivity tool in the translation process and open new service options both for translation buyers and language service providers (see post-editing experience).

Training and Translation Time

When we started developing Globalese NMT, one of the most surprising experiences for us was that the training time was far shorter than we had previously anticipated. This is due to the amazingly fast evolution of hardware and software. With Globalese, we currently have an average training time of 50,000 segments per hour – this means that an average engine with 1 million segments can be trained within one day. The situation is even better when looking at translation times: with Globalese, we currently have an average translation time between 100 and 400 segments per minute, depending on the corpus size, segment length in the translation and training content.

Neural MT Post-editing Experience

One of the great changes neural machine translation brings along is that the overall language quality is much better when compared to the SMT world. This does not mean that the translation is always perfect. As stated by one of our testers: if it is right, then it is astonishingly good quality. The ratio of good and poor translation naturally varies depending on the engine, but good engines can provide about 50% (or even higher) of really good translation target text.

Here are some examples showcasing what NMT post-editors can expect:

DE original:

Der Rechnungsführer sorgt für die gebotenen technischen Vorkehrungen zur wirksamen Anwendung des FWS und für dessen Überwachung.

Reference human translation:

The accounting officer shall ensure appropriate technical arrangements for an effective functioning of the EWS and its monitoring.

Globalese NMT:

The accounting officer shall ensure the necessary technical arrangements for the effective use of the EWS and for its monitoring.

As you can see, the output is fluent, and the differences are more or less just preferential ones. This highlights another issue: automated quality metrics like the BLEU score are not really sufficient to measure quality. The example above is only about a 50% match in BLEU terms, but if we look at the quality, the rating should be much higher.
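You can reproduce that kind of gap yourself. The sketch below scores the EWS example above with NLTK’s sentence-level BLEU, using naive whitespace tokenization and smoothing; the exact number will differ from the author’s 50% figure, but it illustrates how a fluent, adequate translation can still score modestly against a single reference.

```python
# Sentence-level BLEU for the EWS example above (naive tokenization; illustrative only).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ("the accounting officer shall ensure appropriate technical arrangements "
             "for an effective functioning of the EWS and its monitoring").split()
hypothesis = ("the accounting officer shall ensure the necessary technical arrangements "
              "for the effective use of the EWS and for its monitoring").split()

score = sentence_bleu([reference], hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 2))  # well below 1.0, despite being an acceptable translation
```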

Let’s look at another example:

EN original

The concept of production costs must be understood as being net of any aid but inclusive of a normal level of profit.

Reference human translation:

Die Produktionskosten verstehen sich ohne Beihilfe, aber einschließlich eines normalen Gewinns.

Globalese NMT:

Der Begriff der Produktionskosten bezieht sich auf die Höhe der Beihilfe, aber einschließlich eines normalen Gewinns.

What is interesting here is that the first part of the sentence sounds good, but if you look at the content, the translation is not good. This is an example of a fluent output with a bad translation. This is a typical case in the NMT world and it emphasizes the point that post-editors must examine NMT output differently than they did for SMT – in SMT, bad grammar was a clear indicator that the translation had to be post-edited.

Post-editors who used to proof and correct SMT output have to change the way they are working and have to be more careful with proofreading, even if the NMT output looks alright at first glance. Also, services related to light post-editing will change – instead of correcting serious grammatical errors without checking the correctness of translation in order to create some readable content, the focus will shift to sorting out serious mistranslations. The funny thing is that one of the main problems in the SMT world was weak fluency and grammar, and now we have good fluency and grammar as an issue in the NMT world…

And finally:

DE original:

Aufgrund des rechtlichen Status der Beteiligten ist ein solcher Vorgang mit einer Beauftragung des liefernden Standorts und einer Berechnung der erbrachten Leistung verbunden.

Reference human translation:

The legal status of the companies involved in these activities means that this process is closely connected with placing orders at the location that is to supply the goods/services and calculating which goods/services they supply.

Globalese NMT:

Due to the legal status of the person, it may lead to this process at the site of the plant, and also a calculation of the completed technician.

This example shows that unfortunately, NMT can produce bad translations too. As I mentioned before, the ratio of good and bad NMT output you will face in a project always depends on the circumstances. Another weak point of NMT is that it currently cannot handle the terminology directly and it acts as a kind of “black box” with no option to directly influence the results.

Reference: https://bit.ly/2hBGsVh

How machine learning can be used to break down language barriers

How machine learning can be used to break down language barriers

Machine learning has transformed major aspects of the modern world with great success. Self-driving cars, intelligent virtual assistants on smartphones, and cybersecurity automation are all examples of how far the technology has come.

But of all the applications of machine learning, few have the potential to so radically shape our economy as language translation. The content of language translation is the perfect model for machine learning to tackle. Language operates on a set of predictable rules, but with a degree of variation that makes it difficult for humans to interpret. Machine learning, on the other hand, can leverage repetition, pattern recognition, and vast databases to translate faster than humans can.

There are other compelling reasons that indicate language will be one of the most important applications of machine learning. To begin with, there are over 6,500 spoken languages in the world, and many of the more obscure ones are spoken by poorer demographics who are frequently isolated from the global economy. Removing language barriers through technology connects more communities to global marketplaces. More people speak Mandarin Chinese than any other language in the world, making China’s growing middle class a prime market for U.S. companies if they can overcome the language barrier.

Let’s take a look at how machine learning is currently being applied to the language barrier problem, and how it might develop in the future.

Neural machine translation

Recently, language translation took an enormous leap forward with the emergence of a new machine translation technology called Neural Machine Translation (NMT). The emphasis should be on the “neural” component because the inner workings of the technology really do mimic the human mind. The architects behind NMT will tell you that they frequently struggle to understand how it comes to certain translations because of how quickly and accurately it delivers them.

“NMT can do what other machine translation methods have not done before – it achieves translation of entire sentences without losing meaning,” says Denis A. Gachot, CEO of SYSTRAN, a language translation technologies company. “This technology is of a caliber that deserves the attention of everyone in the field. It can translate at near-human levels of accuracy and can translate massive volumes of information exponentially faster than we can operate.”

The comparison to human translators is not a stretch anymore. Unlike the days of garbled Google Translate results, which continue to feed late night comedy sketches, NMT is producing results that rival those of humans. In fact, Systran’s Pure Neural Machine Translation product was preferred over human translators 41% of the time in one test.

Martin Volk, a professor at the Institute of Computational Linguistics at the University of Zurich, had this to say about neural machine translation in a 2017 Slator article:

“I think that as computing power inevitably increases, and neural learning mechanisms improve, machine translation quality will gradually approach the quality of a professional human translator over the coming two decades. There will be a point where in commercial translation there will no longer be a need for a professional human translator.”

Gisting to fluency

One telling metric to watch is gisting vs. fluency. Are the translations being produced communicating the gist of an idea, or fluently communicating details?

Previous iterations of language translation technology only achieved the level of gisting. These translations required extensive human support to be usable. NMT successfully pushes beyond gisting and communicates fluently. Now, with little to no human support, usable translations can be processed at the same level of quality as those produced by humans. Sometimes, the NMT translations are even superior.

Quality and accuracy are the main priorities of any translation effort. Any basic translation software can quickly spit out its best rendition of a body of text. To parse information correctly and deliver a fluent translation requires a whole different set of competencies. Volk also said, “Speed is not the key. We want to drill down on how information from sentences preceding and following the one being translated can be used to improve the translation.”

This opens up enormous possibilities for global commerce. Massive volumes of information traverse the globe every second, and quite a bit of that data needs to be translated into two or more languages. That is why successfully automating translation is so critical. Tasks like e-discovery, compliance, or any other business processes that rely on document accuracy can be accelerated exponentially with NMT.

Education, e-commerce, travel, diplomacy, and even international security work can be radically changed by the ability to communicate in your native language with people from around the globe.

Post language economy

Everywhere you look, language barriers are a speed check on global commerce. Whether that commerce involves government agencies approving business applications, customs checkpoints, massive document sharing, or e-commerce, fast and effective translation is essential.

If we look at language strictly as a means of sharing ideas and coordinating, it is somewhat inefficient. It is linear and has a lot of rules that make it difficult to use. Meaning can be obfuscated easily, and not everyone is equally proficient at using it. But the biggest drawback to language is simply that not everyone speaks the same one.

NMT has the potential to reduce and eventually eradicate that problem.

“You can think of NMT as part of your international go-to-market strategy,” writes Gachot. “In theory, the Internet erased geographical barriers and allowed players of all sizes from all places to compete in what we often call a ‘global economy.’ But we’re not all global competitors because not all of us can communicate in the 26 languages that have 50 million or more speakers. NMT removes language barriers, enabling new and existing players to be global communicators, and thus real global competitors. We’re living in the post-internet economy, and we’re stepping into the post-language economy.”

Machine learning has made substantial progress but has not yet cracked the code on language. It does have its shortcomings, namely when it faces slang, idioms, obscure dialects of prominent languages and creative or colorful writing. It shines, however, in the world of business, where jargon is defined and intentional. That in itself is a significant leap forward.

Reference: https://bit.ly/2Fwhuku