Creative Destruction in the Localization Industry

Excerpts from an article with the same title, written by Ameesh Randeri in Multilingual Magazine. Ameesh Randeri is part of the localization solutions department at Autodesk and manages the vendor and linguistic quality management functions. He has over 12 years of experience in the localization industry, having worked on both the buyer and seller sides.

The concept of creative destruction was derived from the works of Karl Marx by economist Joseph Schumpeter. Schumpeter elaborated on the concept in his 1942 book Capitalism, Socialism and Democracy, where he described creative destruction as the “process of industrial mutation that incessantly revolutionizes the economic structure from within, incessantly destroying the old one, incessantly creating a new one.”

What began as a concept in economics came to be used broadly across the spectrum to describe breakthrough innovation that requires invention and ingenuity – as well as breaking apart or destroying the previous order. To look for examples of creative destruction, just look around you. Artificial intelligence, machine learning and automation are creating massive efficiency gains and productivity increases, but they are also causing millions to lose jobs. Uber and other ride-hailing apps worldwide are revolutionizing transport, but many traditional taxi companies are suffering.

The process of creative destruction and innovation is accelerating over time. To understand this, we can look at the Schumpeterian (Kondratieff) waves of technological innovation. We are currently in the fifth wave of innovation, ushered in by digital networks, the software industry and new media.

The effects of the digital revolution can be felt across the spectrum. The localization industry is no exception and is undergoing fast-paced digital disruption. There is a confluence of technologies in localization tools and processes that is ushering in major changes.

The localization industry: Drawing parallels from the Industrial Revolution

All of us are familiar with the Industrial Revolution. It commenced in the second half of the 18th century and went on until the mid-19th century. As a result of the Industrial Revolution, we witnessed a transition from hand production methods to machine-based methods and factories that facilitated mass production. It ushered in innovation and urbanization. It was creative destruction at its best. Looking back at the Industrial Revolution, we see that there were inflection points, following which there were massive surges and changes in the industry.

Translation has historically been a human and manual task. A translator looks at the source text and translates it while keeping in mind grammar, style, terminology and several other factors. The translation throughput is limited by a human’s productivity, which severely limits the volume of translation and the time required. In 1764, James Hargreaves invented the spinning jenny, a machine that enabled an individual to produce multiple spools of thread simultaneously. Inventor Samuel Crompton innovated further and came up with the spinning mule, further improving the process. Next came the mechanization of cloth weaving through the power loom, invented by Edmund Cartwright. These innovators and their inventions completely transformed the textile industry.

For the localization industry, a similar innovation is machine translation (MT). Though research into MT had been going on for many years, it went mainstream post-2005. Rule-based and statistical MT engines were created, which resulted in drastic productivity increases. However, the quality was nowhere near what a human could produce, and hence MT engines became a supplemental technology, aiding humans and helping them increase productivity.

There was a 30%–60% productivity gain, depending on the language and engine used. There was fear that translators’ roles would diminish. But rather than diminish, their role evolved into post-editing.

The real breakthrough came in 2016, when Google and Microsoft went public with their neural machine translation (NMT) engines. The quality produced by NMT is not yet flawless, but it seems to be very close to human translation. It can also reproduce some of the finer nuances of writing style and creativity that were lacking in the rule-based and statistical machine translation engines. NMT is a big step forward in reducing the human footprint in the translation process. It is without a doubt an inflection point, and while not perfect yet, it has the same disruptive potential as the spinning jenny and the power loom: sharp productivity increases, lower prices and, since a machine is behind it, virtually unlimited volumes. Hence it renews concerns about whether translators will be needed at all. It is to the translation industry what the spinning jenny was to textiles, where several manual workers were replaced by machines.

What history teaches us, though, is that although jobs tied to an existing task or technology are lost, new ones are created to support the newer task or technology.

In the steel industry, two inventors charted a new course: Abraham Darby, who created a cheaper, easier method to produce cast iron using a coke-fueled furnace; and Henry Bessemer, who invented the Bessemer process, the first inexpensive process for mass-producing steel. The Bessemer process revolutionized steel manufacturing by decreasing its cost from £40 per long ton to £6–7 per long ton. Besides the reduction in cost, there were major increases in speed, and the need for labor decreased sharply.

The localization industry is seeing the creation of its own Bessemer process, called continuous localization. Simply explained, it is a fully connected and automated process in which content creators and developers produce source material that is passed for translation in continuous, small chunks. The translated content is continually merged back, facilitating continuous deployment and release. It is an extension of the agile approach, and it can be seen in mobile applications, where the latest updates are continually pushed to our phones in multiple languages. To facilitate continuous localization, vendor platforms or computer-assisted translation (CAT) tools need to be able to connect to client systems, or clients need to provide CAT-tool-like interfaces for vendors and their resources to use. The process flows seamlessly from the developer or content creator creating content to the post-editor editing the machine-translated content.

The Bessemer process in the steel industry paved the way for large-scale, continuous and efficient steel production. Similarly, continuous localization has the potential to pave the way for large-scale, continuous and efficient localization, enabling companies to localize more content, into more languages, at lower prices.
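To make the pipeline concrete, here is a minimal sketch in Python of one continuous-localization cycle, under stated assumptions: string resources live in simple key-value form, and translate_chunk is a hypothetical stand-in for whatever MT engine or CAT connector a given setup actually calls.

```python
import json

def translate_chunk(strings, target_lang):
    """Hypothetical stand-in for a call to an MT engine or CAT connector."""
    return {key: f"[{target_lang}] {text}" for key, text in strings.items()}

def merge_new_strings(source, target, target_lang):
    """One continuous-localization cycle: translate only the strings that are
    new or changed since the last merge, then merge them straight back."""
    pending = {k: v for k, v in source.items()
               if target.get(k, {}).get("src") != v}
    for key, text in translate_chunk(pending, target_lang).items():
        # Keep the source text next to the translation so the next cycle
        # can detect changed strings, not just brand-new ones.
        target[key] = {"src": source[key], "text": text}
    return target

# A developer adds one string; only that small chunk goes out for translation.
de = merge_new_strings({"greeting": "Hello"}, {}, "de")
de = merge_new_strings({"greeting": "Hello", "farewell": "Goodbye"}, de, "de")
print(json.dumps(de, indent=2, ensure_ascii=False))
```

In a real pipeline the cycle would be triggered by a commit hook or scheduler, and the merged file would feed straight into continuous deployment.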

There were many other disruptive technologies and processes that led to the Industrial Revolution. For the localization industry as well, there are several other tools and process improvements in play.

Audiovisual localization and interpretation: This theme has been evolving in recent years. Players like Microsoft (with Skype) and Google have made improvements in the text-to-speech and speech-to-text arena. Text-to-speech has become more human-like, though it isn’t quite there yet. Speech-to-text has improved significantly as well, with output quality going up and errors decreasing. Interpretation is another area where we see automated solutions springing up. Google’s new headphones are one example of automated interpretation solutions.

Automated terminology extraction: This is one area that hasn’t garnered as much attention and focus. While there is consensus that terminology is an important aspect of localization quality, it always seems to be relegated to a lower tier from a technological advancement standpoint. There are several interesting commercial as well as open source solutions that have greatly improved terminology extraction and reduced false positives. This area could potentially be served by artificial intelligence and machine learning solutions in the future.
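As a rough illustration of what such tools automate, the toy baseline below proposes frequent n-grams that neither start nor end with a stopword as term candidates. It is a naive frequency sketch, not any particular commercial or open source tool; real extractors add linguistic filtering and statistical association measures to cut false positives further.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "is", "by", "on"}

def candidate_terms(text, max_ngram=3, min_freq=2):
    """Propose frequent n-grams that neither start nor end with a stopword."""
    tokens = re.findall(r"[a-z][a-z-]*", text.lower())
    counts = Counter()
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue  # "of the", "translation of", etc. are unlikely terms
            counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]

print(candidate_terms(
    "Neural machine translation output is post-edited by linguists. "
    "Machine translation quality varies by language and engine."))
# -> [('machine', 2), ('translation', 2), ('machine translation', 2)]
```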

Automated quality assurance (QA) checks: QA checks can be categorized into two main areas – functional and linguistic. In terms of functional QA, automations have been around for several years and have vastly improved over time. There is already exploration into applying machine learning and artificial intelligence to functional automations to predict bugs, to create scripts that are self-healing and so on. Linguistic QA, on the other hand, has seen some automation, primarily in the areas of spelling and terminology checks. However, the automation is limited in what it can achieve and does not replace the need for human checks or audits. This is an area that could benefit hugely from artificial intelligence and machine learning.
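A minimal sketch of what today’s rule-based linguistic QA automation looks like; the specific checks and the tiny term base are illustrative assumptions, not taken from any particular tool.

```python
import re

def linguistic_qa(source, target, termbase):
    """Run a few rule-based checks and return a list of flagged issues."""
    issues = []
    # Placeholder parity: {name}-style placeholders must survive translation.
    if sorted(re.findall(r"\{\w+\}", source)) != sorted(re.findall(r"\{\w+\}", target)):
        issues.append("placeholder mismatch")
    # Terminology: if a source term occurs, its approved translation must too.
    for src_term, tgt_term in termbase.items():
        if src_term.lower() in source.lower() and tgt_term.lower() not in target.lower():
            issues.append(f"term '{src_term}' not rendered as '{tgt_term}'")
    # Simple mechanics, such as double spaces, are classic automated checks.
    if "  " in target:
        issues.append("double space in target")
    return issues

print(linguistic_qa("Open the {file} in the editor.",
                    "Öffnen Sie {datei} im Editor.",
                    {"editor": "Editor"}))
# -> ['placeholder mismatch']
```

Checks like these catch mechanical errors cheaply, which is exactly why the remaining judgment calls still need a human audit.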

Local language support using chatbots: Chatbots are fast becoming the first level of customer support for most companies. Most chatbots are still in English. However, we are starting to see chatbots in local languages, powered by machine translation engines in the background, thus enabling local language support for international customers.
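The pattern is easy to sketch: the bot itself stays English-only while a machine translation step bridges the customer’s language on the way in and out. Both functions below are hypothetical stand-ins, not a real bot or MT API.

```python
def mt_translate(text, source_lang, target_lang):
    """Stand-in for any machine translation API call."""
    return f"[{source_lang}->{target_lang}] {text}"

def english_bot(message):
    """Stand-in for the existing English-only support bot."""
    return "Please restart the application." if "crash" in message.lower() else "How can I help?"

def localized_support(user_message, user_lang):
    """Bridge a local-language customer to the English bot and back."""
    english_in = mt_translate(user_message, user_lang, "en")  # customer -> bot
    english_out = english_bot(english_in)                     # bot logic stays English-only
    return mt_translate(english_out, "en", user_lang)         # bot -> customer

# With a real MT engine, the German question would reach the bot in English.
print(localized_support("Die App stürzt beim Start ab.", "de"))
```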

Data (big or small): While data is not a tool, technology or process by itself, it is important to call it out. Data is central to a lot of the technologies and processes mentioned above. Without a good corpus, there is no machine translation. For automated terminology extraction and automated QA checks, the challenge is to have a corpus big enough to make it possible to train the machine. In addition, metadata becomes critical. Today, metadata provides translators with additional contextual information to ensure higher-quality output. In the future, metadata will provide the same information to machines – to a machine translation system, to an automated QA check and so on. This highlights the importance of data!

The evolution in localization is nothing but the forces of creative destruction at work. Each new process or technology destroys an old way of operating and creates a new way forward. It also means that old jobs are being made redundant while new ones are being created.

How far off is this future? Well, the entire process is extremely resource- and technology-intensive. Many companies will require a lot of time to adopt these practices. This provides the perfect opportunity for sellers to spruce up their offerings and provide an automated digital localization solution. Companies with access to abundant resources or funding should be able to achieve this sooner. This is also why a pan-industry open source platform may accelerate this transformation.

Mastering the art of Transcreation

Former British beauty queen, glamour model and celebrity Danielle Lloyd wanted a classy tattoo. Aside from the fact that classy tattoo is an oxymoron of the first order, her head-on collision with Latin is an object lesson in the importance of good translation. Her shoulder was supposed to read “To diminish me will only make me stronger.” It actually translates as “As who am I wearing away for myself, I only set (it) down for/on myself, strong man (that I am).”

Lloyd is far from alone in having nonsense inked into her skin. Another example that did the rounds on social media is the unfortunate woman who wanted to write, “I love [name of boyfriend],” down her spine in Hebrew, but ended up with, “Babylon is the world’s leading dictionary and translation software,” instead.

As famed oilwell firefighter Red Adair is credited with saying, if you think it’s expensive to hire a professional to do the job, wait till you hire an amateur. Yet, allegedly, many leading brands do go for the cheap – or perhaps, more accurately, the unthinking – option when it comes to translation.

Good translation doesn’t just mean faithful or accurate transcription from one language to another. If it did, Coca Cola in China would be known as “Mare stuffed with wax” or “Bite the wax tadpole,” which is what the Chinese characters that together make the sound ‘Coca Cola’ mean, depending on the dialect (Chinese characters have both a sound and a meaning). Instead it has a different name, pronounced “Kokou-Kolay,” which means “A pleasure in the mouth.” Experts in the field refer to this highly successful strategy as transcreation, or intelligent localisation, rather than mere translation.

UK-based localisation agency Conversis CEO and managing director Gary Muddyman says, “Translation is just one element of global brand management. A lot of time is invested coming up with brands and communications pieces for all global brands, in terms of visuals and graphics, but also the words. All that hard work is lost if you then do a poor translation job.

“People tend to think of translation as very simple conversion; you take a load of words and you put the appropriate words for that language into a sentence and it works out. It isn’t and it doesn’t. It’s much more complicated than that, particularly when you’re talking about external communications and specifically marketing and brand pieces.

“It’s as much about look and feel as cultural adaptation. Of course, there will also be legal and regulatory differences from country to country. There will be market norms that are different from country to country, distribution differences and so on. All of that would come into localisation. Translation is purely and simply the conversion of the words. Yes, it is important to brands, but only as part of a basket of considerations that you need to make.”

Muddyman was head of corporate development at HSBC and was involved in the “world’s local bank” brand initiative. “After nearly 500 man hours of work in coming up with the concept, nobody mentioned translation once,” he sighs. “That’s why I’m in the translation industry. I was the person who had to catch the ball, to try to work through that particular challenge and realise I didn’t even know where to start.”

Thankfully times have changed and some companies have developed a more thoughtful and enlightened approach to adapting their messaging to the panoply of international languages. Nick Parker, strategy partner at London-based brand language agency the Writer, believes there is growing recognition of the subliminal benefits of communicating like a native speaker. “Even the effort put into translating your message appropriately is important – it says to your customers that you’ve gone to the trouble of getting it right, which increases the strength of your relationship. Customers like and trust brands more when they make the effort, even if they make mistakes along the way,” he says.

Parker adds, “Look at Google. All its terms and conditions are archived so you can see how its language and tone have changed over time. It has got friendlier as time has gone on, the information is simpler and clearer, and it sounds more Google now. That rarely happens naturally. It usually is the result of a lot of work.” And investment.

Google was cited as an example by several experts in the field and all agree that it takes time and a whole lot of money to do it right. Language and transcreation agency thebigword’s chief commercial officer Josh Gould says, “Google is a multi-billion dollar company trading across a number of markets but with the same approachable, conversational tone in all. It does more training than any other company I’ve ever met. It really invests in its people and its suppliers’ people so that linguists speaking or writing as Google are well trained, aligned and motivated. It works.”

All those who praised Google noted that it is a tough gig with incredibly high standards. “They tell [recruits] it is Harvard. Not the Harvard of the business world, but Harvard period. The majority who enter fail,” notes a well-placed source. “It’s rigorous all right.”

This alludes to the central and somewhat thorny issue of control. Some characterise Google as an uber-controlling organisation that delivers consistency of brand experience through strict discipline. Others believe its investment of time and money in staff empowers them – liberates them – to deliver consistency by living the brand values.

Whichever the case, the spectrum of control is a pertinent consideration. “We organised a summit for all the brand language heads,” says Parker. “And while people from BT and PWC were talking about detailed guidelines, policies and process, the guy from Innocent was explaining how he spent four months looking for a Norwegian writer whose style and tone fitted with the brand’s ethos.”

Control versus empowerment is a question that is never going to be easy to answer, and perhaps not even possible to answer, given the vagaries of business and the nature of the organisation for which one works. Both approaches strive for consistency in delivery, which is what every brand custodian wants, but any marketer worth his or her salt also wants the brand to be meaningful to its audience, and that requires a much greater degree of flexibility.

McDonald’s has invested billions in installing its strapline, “I’m lovin’ it,” in the global consciousness, and for such a brand you’d expect consistency to be king, queen and all the courtiers, and for that particular phrase to be used exactly as is around the world. However, that would be a mistake in China, where love is a serious word. Traditionally the word is never said aloud. Even today lovers use “I like you” to communicate great affection without actually saying love, according to “The Little Book of Transcreation,” a marvelous volume by Guy Gilpin, CEO of global transcreation agency Mother Tongue. McDonald’s accepted the need to adapt and opted instead for “I just like (it),” which is more normal, more everyday vocabulary, easier on Chinese ears, and retains the youthful, confident street vibe of the English original.

There are many examples where constancy of global messaging or positioning would have been a mistake. A campaign by Intel in Brazil demonstrates the point. The English slogan, “Sponsors of Tomorrow,” translated directly into Portuguese, would imply that the brand doesn’t yet deliver on its promises. “In love with tomorrow” stays true to the values expressed in the rational original English line but, importantly, is much more in keeping with a Brazilian population falling more and more in love with the latest high-tech products.

When Motorola launched its Q phone in Canada it hadn’t foreseen the hilarity with which its marketing messaging would be received by French speakers. ‘Q’ sounds much like ‘cul’ – that is, ass – in French. Lines like “L’intelligence renouvelée” and “Si c’est important pour vous, c’est important pour votre Q” in common parlance became, “My ass. Renewed intelligence,” and, “If it’s important to you, it’s important to your ass.” Pepsi’s unwitting claim to rejuvenate the dead went down in the annals of advertising history as how not to pull off a global campaign. “Come alive with Pepsi!” actually means “Pepsi. Bring your ancestors back from the dead” in Chinese.

Haribo is an institution in its home market, Germany, and its strapline, “Haribo macht Kinder froh, und Erwachsene ebenso,” works perfectly there. Literally translated it becomes the stilted, dry and decidedly unmotivating, “Haribo makes kids happy, and adults too.” It doesn’t even rhyme, damn it. How much more appropriate for the brand is the reimagined UK version? “Kids and grown-ups love it so, the happy world of Haribo.”

Translating jingles can be a nightmare, as Gillette found. The German translation of “The best a man can get” comes out as “For the best inside a man,” which doesn’t make a whole lot of sense given that facial hair is on the outside, not the inside, of a man. In addition, the line was too short for the music and so had to be dragged out for longer than sounds natural. And it doesn’t even rhyme with Gillette. The tortured and tortuous result, “Für das Be-e-e-est-e-e im Ma-a-an,” became a national laughing stock.

This is where transcreation comes into its own. Transcreation is the process of adapting the messaging to resonate meaningfully with the local target audience while staying true to the meaning of the original and maintaining its intent, style, tone and context. As Gilpin puts it, “Transcreation allows brands to walk the fine line that ensures they are both fresh and relevant locally, and at the same time consistent globally.”

BMW has used the tagline “Freude am Fahren” (Pleasure in driving) since 1969 in Germany. “Would that be as effective as ‘The Ultimate Driving Machine’? Same idea, different words – that’s what transcreation is all about,” Gilpin says. Maintaining a consistent tone of voice in your communications across different languages and regions is a challenge, as Parker points out. “Your brand may have adopted a friendly, approachable tone of voice, so you have to look at what the indicators of friendliness are in the particular language you are translating into. Charming has a very specific feel in English; how do you work out what is charming in Swiss German? And what if what is charming in Swiss German turns out to be too idiosyncratic for your brand positioning?” he says. “There are no easy answers – it takes experience, research and creativity.”

Transcreation is about far more than words. Colours mean different things to different people. In the western world yellow is associated with cowardice, whereas in Japan it signifies courage. White is for weddings in the west and funerals in Asia. Red symbolises purity in India and passion in Europe. Lifestyle imagery must be sensitive to the audience’s experience, which is easy to observe and difficult to execute.

And don’t think going down the visual-only route is a get-out-of-jail-free card. A major pharmaceutical company decided on a visual-only treatment for the international launch of a new product, using pictures to explain the benefit: on the left, the ill patient; in the middle, the patient taking the medicine; and on the right, the final shot showing him recovered. The problem with that was that potential customers in the United Arab Emirates read right to left.

Anomalous though it is, it may be necessary for your brand to look and sound different, and say different things to different people, in order to maintain brand consistency. “There’s a fine line between brand dilution and true localisation,” says Gould. “But without localisation you are potentially harming your business and your brand, from credibility to sales – and you may never know by how much.”

Reference: https://bit.ly/2Krw9jm

Nimdzi Language Technology Atlas

For this first version, Nimdzi has mapped over 400 different tools, and the list is growing quickly. The Atlas consists of an infographic accompanied by a curated spreadsheet with software listings for various translation and interpreting needs.

As the language industry becomes more technical and complex, there is a growing need for easy-to-understand materials explaining available tech options. The Nimdzi Language Technology Atlas provides a useful view into the relevant technologies available today.

Software users can quickly find alternatives for their current tools and evaluate market saturation in each segment at a glance. Software developers can identify competition and find opportunities in the market with underserved areas.

Reference: https://bit.ly/2ticEyT

Is There a Future in Freelance Translation? Let’s Talk About It!

While the demand for translation services is at a record high, many freelancers say their inflation-adjusted earnings seem to be declining. Why is this and can anything be done to reverse what some have labelled an irreversible trend?

Over the past few years globalization has brought unprecedented growth to the language services industry. Many have heard and answered the call. Census data shows that the number of translators and interpreters in the U.S. nearly doubled between 2008 and 2015, and, according to the Bureau of Labor Statistics, the employment outlook for translators and interpreters is projected to grow by 29% through 2024. In an interview with CNBC last year, ATA Past President David Rumsey stated: “As the economy becomes more globalized and businesses realize the need for translation and interpreting to market their products and services, opportunities for people with advanced language skills will continue to grow sharply.” Judging by the size of the industry—estimated at $33.5 billion back in 2012, and expected to reach $37 billion this year—it seems the demand for translation will only continue to increase.

Many long-time freelance translators, however, don’t seem to be benefitting from this growth, particularly those who don’t work with a lot of direct clients. Many report they’ve had to lower their rates and work more hours to maintain their inflation-adjusted earnings. Also, the same question seems to be popping up in articles, blogs, and online forums. Namely, if the demand for translation is increasing, along with opportunities for people with advanced language skills, why are many professional freelance translators having difficulty finding work that compensates translation for what it is—a time-intensive, complex process that requires advanced, unique, and hard-acquired skills?

Before attempting to discuss this issue, a quick disclaimer is necessary: for legal reasons, antitrust law prohibits members of associations from discussing specific rates. Therefore, the following will not mention translation rates per se. Instead, it will focus on why many experienced translators, in a booming translation market inundated by newcomers, are forced to switch gears or careers, and what can be done to reverse what some have labelled an irreversible trend.

THE (UNQUANTIFIABLE) ISSUE

I’ll be honest. Being an in-house translator with a steady salary subject to regular increases, I have no first-hand experience with the crisis many freelance translators are currently facing. But I have many friends and colleagues who do. We all do. Friends who tell us that they’ve lost long-standing clients because they couldn’t lower their rates enough to accommodate the clients’ new demands. Friends who have been translating for ages who are now wondering whether there’s a future in freelance translation.

Unfortunately, unlike the growth of the translation industry, the number of freelance translators concerned about the loss of their inflation-adjusted earnings and the future of the profession is impossible to quantify. But that doesn’t mean the problem is any less real. At least not judging by the increasing number of social media posts discussing the issue, where comments such as the ones below abound.

  • “Expenses go up, but rates have remained stagnant or decreased. It doesn’t take a genius to see that translation is slowly becoming a sideline industry rather than a full-time profession.”
  • “Some business economists claim that translation is a growth industry. The problem is that the growth is in volume, not rates.”
  • “Our industry has been growing, but average wages are going down. This means that cheap service is growing faster than quality.”

Back in 2010, Common Sense Advisory, a market research company specializing in translation and globalization, started discussing technology- and globalization-induced rate stagnation and analyzing potential causes. Now, almost 10 years later, let’s take another look at what created the crisis many freelance translators are facing today.

A LONG LIST OF INTERCONNECTED FACTORS

The causes leading to technology- and globalization-induced rate stagnation are so interconnected that it’s difficult to think of each one separately. Nevertheless, each deserves a spot on the following list.

1. Global Competition: Globalization, internet technology, and the growth of demand for translation services naturally resulted in a rise of the “supply.” In other words, an increasing number of people started offering their services as translators. Today, like all professionals affected by global competition, most freelance translators in the U.S., Canada, Australia, and Western Europe find themselves competing against a virtually infinite pool of translators who live in countries where the cost of living is much lower and who are able to offer much lower rates. Whether those translators are genuine professional translators or opportunists selling machine translation to unsuspecting clients is almost immaterial. As the law of supply and demand dictates, when supply exceeds demand, prices generally fall.

2. The Sheer Number of Language Services Providers and the Business/Competition Model: The increase in global demand has also led to an increase in the number of language services providers (LSPs) entering the market. Today, there are seemingly thousands of translation agencies in a market dominated by top players. Forced to keep prices down and invest in advertising and sales to maintain their competitiveness, many agencies give themselves limited options to keep profits up—the most obvious being to cut direct costs (i.e., lower the rates paid to translators). Whether those agencies make a substantial profit each year (or know anything about translation itself) is beside the point. There are many LSPs out there that follow a business model that is simply not designed to serve the interests of freelance translators. Interestingly enough, competing against each other on the basis of price alone doesn’t seem to be serving their interests either, as it forces many LSPs into a self-defeating, downward spiral of dropping prices. As Luigi Muzii, an author, translator, terminologist, teacher, and entrepreneur who has been working in the industry for over 30 years, puts it:

“The industry as a whole behaves as if the market were extremely limited. It’s as if survival depended on open warfare […] by outright price competition. Constantly pushing the price down is clearly not a sustainable strategy in the long-term interests of the professional translation community.”

3. The Unregulated State of the Profession: In many countries, including the U.S., translation is a widely unregulated profession with low barriers to entry. There is also no standardized career path stipulating the minimum level of training, experience, or credentials required. Despite the existence of ISO standards and certifications from professional associations around the globe, as long as the profession (and membership in many professional associations) remains open to anyone and everyone, competition will remain exaggeratedly and unnaturally high, keeping prices low or, worse, driving them down.

4. Technology and Technological “Improvements”: From the internet to computer-assisted translation (CAT) tools to machine translation, technology may not be directly related to technology- and globalization-induced rate stagnation, but there’s no denying it’s connected. The internet is what makes global communication and competition possible. CAT tools have improved efficiency so much in some areas that most clients have learned to expect three-tier pricing in all areas (a sketch of how such pricing works follows this list). Machine translation is what’s allowing amateurs to pass as professionals and driving the post-editing-of-machine-translation business that more and more LSPs rely on today. Whether machine translation produces quality translations, whether the post-editing of machine translation is time efficient, and whether “fuzzy matches” require less work than new content are all irrelevant questions, at least as things stand today. As long as technologies that improve (or claim to improve) efficiency exist, end clients will keep expecting prices to reflect those “improvements.”

5. Unaware, Unsuspecting, and Unconcerned Clients: Those of you who’ve read my article about “uneducated” clients may think that I’m obsessed with the subject, but to me it seems that most of the aforementioned factors have one common denominator: clients who are either unaware that all translations (and translators) are not created equal, or are simply unconcerned about the quality of the service they receive. These clients will not be willing to pay a premium price for a service they don’t consider to be premium.

One look at major translation bloopers and their financial consequences for companies such as HSBC, KFC, Ford, Pampers, Coca Cola, and many more is enough to postulate that many clients know little about translation (or the languages they’re having their texts translated into). They may be unaware that results (in terms of quality) are commensurate with a translator’s skills, experience, and expertise, the technique/technology used for translating, and the time spent on a project. And who can blame them? Anyone with two eyes is capable of looking at a bad paint job and seeing it for what it is, but it requires a trained eye to spot a poor translation, and knowledge of the translation process itself (and language in general) to value translation for what it is.

Then there’s the (thankfully marginal) number of clients who simply don’t care about the quality of the service they receive, or whether the translation makes sense or not. This has the unfortunate effect of devaluing our work and the profession in the eyes of the general public. Regrettably, when something is perceived as being of little value, it doesn’t tend to fetch premium prices. As ATA Treasurer John Milan writes:

“When consumers perceive value, they [clients] are more willing to pay for it, which raises a series of questions for our market. Do buyers of language services understand the services being offered? What value do they put on them? […] All these variables will have an impact on final market rates.”

6. The Economy/The Economical State of Mind: Whether clients need or want to save money on language services, there’s no denying that everyone always seems to be looking for a bargain these days. Those of us who have outsourced translation on behalf of clients know that, more often than not, what drives a client’s decision to choose one service provider over another is price, especially when many LSPs make the same claims about their qualifications, quality assurance processes, and industry expertise.

7. Other Factors: From online platforms and auction sites that encourage price-based bidding and undifferentiated global competition, to LSPs making the post-editing of machine translation the cornerstone of their business, to professional translators willing to drop their rates to extreme lows, there are many other factors that may be responsible for the state of things. However, they’re more byproducts of the situation than factors themselves.
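To illustrate the three-tier pricing mentioned under factor 4, here is roughly how a CAT-tool analysis turns into billable volume. The tiers and discount multipliers below are hypothetical assumptions; actual discount grids vary by agreement.

```python
# Hypothetical tiers and discounts; real agreements vary widely.
TIER_MULTIPLIERS = {
    "new": 1.00,         # no TM match: full per-word rate
    "fuzzy": 0.60,       # partial TM match: discounted
    "repetition": 0.25,  # 100% match or repetition: heavily discounted
}

def weighted_words(analysis):
    """Turn a CAT-tool analysis {tier: word_count} into billable word volume."""
    return sum(TIER_MULTIPLIERS[tier] * count for tier, count in analysis.items())

# 1,000 new + 500 fuzzy + 300 repeated words bill as 1,375 weighted words:
# the efficiency gain flows straight into the client's price expectations.
print(weighted_words({"new": 1000, "fuzzy": 500, "repetition": 300}))
```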

A VERY REAL CONCERN

Rising global competition and rate stagnation are hardly a unique situation. Today, freelance web designers, search engine optimization specialists, graphic designers, and many other professionals in the U.S., Canada, Australia, and Western Europe must compete against counterparts in India, China, and other parts of the world where the cost of living is much cheaper—with the difference that product/service quality isn’t necessarily sacrificed in the process. And that may be the major distinction between what’s happening in our industry and others: the risk posed to translation itself, both as an art form and as a product/service.

While some talk about the “uberization” or “uberification” of the translation industry or blame technology (namely, machine translation) for declining rates, others point a finger at a business model (i.e., the business/competition model) that marginalizes the best translators and creates a system where “bad translators are driving out the good ones.” The outcome seems to be the same no matter which theory we examine: the number of qualified translators (and the quality of translations) is in danger of going down over time. As Luigi Muzii explains:

“The unprecedented growth in demand for translation in tandem with the effect of Gresham’s Law [i.e., bad translators driving out the good ones] will lead inexorably to a chronic shortfall of qualified language specialists. The gap between the lower and the higher ends of the translation labor market is widening and the process will inevitably continue.”

Between 2006 and 2012, Common Sense Advisory conducted a regular business confidence survey among LSPs. During those years, there seemed to be an increase in the number of LSPs that reported having difficulty finding enough qualified language specialists to meet their needs. Since the number of translators varies depending on the language pair, the shortage may not yet be apparent in all segments of the industry, but the trend is obviously noticeable enough that an increasing number of professionals (translators, LSPs, business analysts, etc.) are worrying about it. And all are wondering the same thing: can anything be done to reverse it?

ARE THERE ANY “SOLUTIONS?”

In terms of solutions, two types have been discussed in recent years: micro solutions (i.e., individual measures that may help individual translators maintain their rates or get more work), and macro solutions (i.e., large-scale measures that may help the entire profession on a long-term basis).

On the micro-solution side, we generally find:

  • Differentiation (skills, expertise, productivity, degree, etc.)
  • Specialization (language, subject area, market, translation sub-fields such as transcreation)
  • Diversification (number of languages or services offered, etc.)
  • Presentation (marketing efforts, business practices, etc.)
  • Client education

Generally speaking, micro solutions tend to benefit only the person implementing them, although it can be argued that anything that can be done to improve one’s image as a professional and educate clients might also benefit the profession as a whole, albeit to a lesser degree.

On the macro-solution side, we find things that individual translators have somewhat limited power over. But professional associations (and even governments) may be able to help!

Large-Scale Client Education: Large-scale client education is possibly the cornerstone of change; the one thing that may change consumer perception and revalue the profession in the eyes of the general public. As ATA Treasurer John Milan puts it:

“Together, we can educate the public and ensure that our consumers value us more like diamonds and less like water.”

Most professional associations around the globe already publish client education material, such as Translation: Getting it Right—A Guide to Buying Translation. Other initiatives designed to raise awareness about translation, such as ATA’s School Outreach Program, are also helpful because they educate the next generation of clients. But some argue that client education could be more “aggressive.” In other words, professional associations should not wait for inquiring clients to look for information, but take the information to everyone, carrying out highly visible public outreach campaigns (e.g., advertising, articles, and columns in the general media). ATA’s Public Relations Committee has been very active in this area, including publishing articles written by its Writers Group in over 85 trade and business publications.

Some have also mentioned that having professional associations take a clear position on issues such as machine translation and the post-editing of machine translation would also go a long way in changing consumer perception. In this regard, many salute ATA’s first Advocacy Day last October in Washington, DC, when 50 translators and interpreters approached the U.S. Congress on issues affecting our industry, including machine translation and the “lowest-price-technically-acceptable” model often used by the government to contract language services. However, the success of large-scale client education may be hindered by one fundamental element, at least in the United States.

Language Education: I’m a firm believer that there are some things that one must have some personal experience with to value. For example, a small business owner might think that tax preparation is easy (and undervalue the service provided by his CPA) until he tries to prepare his business taxes himself and realizes how difficult and time consuming it is—not to mention the level of expertise required!

Similarly, monolingual people may be told or “understand” that translation is a complex process that requires a particular set of skills, or that being bilingual doesn’t make you a translator any more than having two hands makes you a concert pianist. But unless they have studied another language (or, in the case of bilingual people, have formally studied their second language or have tried their hand at translation), they’re not likely to truly comprehend the amount of work and expertise required to translate, or value translation for what it really is.

According to the U.S. Census Bureau, the vast majority of Americans (close to 80%) remain monolingual, and only 10% of the U.S. population speak another language well. In their 2017 report on the state of language education in the U.S., the Commission on Language Learning concluded that the U.S. lags behind most nations when it comes to language education and knowledge, and recommended a national strategy to improve access to language learning and “value language education as a persistent national need.”

Until language education improves and most potential clients have studied a second language, one might contend that the vast majority of Americans are likely to keep undervaluing translation services and that large-scale client education may not yield the hoped-for results. This leaves us with one option when it comes to addressing the technology- and globalization-induced rate stagnation conundrum.

Industry-Wide Regulations: In most countries, physicians are expected to have a medical degree, undergo certification, and get licensed to practice medicine. The same applies to dentists, nurses, lawyers, plumbers, electricians, and many other professions. In those fields, mandatory education, training, and/or licensing/certification establish core standards and set an expected proficiency level that clients have learned to expect and trust—a proficiency level that all clients value.

Whether we’re talking about regulating access to the profession itself or controlling access to professional associations or online bidding platforms, there’s no question that implementing industry-wide regulations would go a long way toward limiting wild, undifferentiated competition and assuring clients that they are receiving the best possible service. While some may think that regulations are not a practical option, it may be helpful to remember that physicians didn’t always have to undergo training, certification, and licensing to practice medicine in the U.S. Today, however, around 85% of physicians in the U.S. are certified by an accredited medical board, and it’s safe to say that all American physicians have a medical degree and are licensed to practice medicine. And the general public wouldn’t want it any other way! Is it so implausible to expect that the same people who would let no one except a qualified surgeon operate on them would want no one except a qualified professional to translate the maintenance manual of their nation’s nuclear reactors?

SO, WHAT DOES THE FUTURE HOLD FOR FREELANCE TRANSLATORS?

Generally speaking, most experts agree that the demand for translation services will keep growing, that technology will keep becoming more and more prevalent, and that the translation industry will become even more fragmented. According to Luigi Muzii:

In the immediate future, I see the translation industry remaining highly fragmented with an even larger concentration of the volume of business in the hands of a bunch of multi-language vendors who hire translators from the lower layer of the resource market to keep competing on price. This side of the industry will soon count for more than a half of the pie. The other side will be made up of tiny local boutique firms and tech-savvy translator pools making use of cutting-edge collaborative tools. […] The prevailing model will be “freeconomics,” where basic services are offered for free while advanced or special features are charged at a premium. The future is in disintermediation and collaboration. […] The winners will be those translators who can leverage their specialist linguistic skills by increasing their productivity with advances in technology.

The future of freelance translation, however, may be a bit more uncertain. Indeed, many argue that even with acute specialization, first-rate translation skills, and marketing abilities to match, many freelance translators’ chances at succeeding financially in the long term may be limited by the lack of industry regulations and the general public’s lack of language education/knowledge (i.e., the two factors that feed wild, undifferentiated competition). But that’s not to say there’s no hope.

At least that’s what learning about the history of vanilla production taught me. Growing and curing vanilla beans is a time-intensive, labor-intensive, intricate process. It’s a process that meant that for over 150 years vanilla was considered a premium product, and vanilla growers made a decent living. When vanillin (i.e., synthetic vanilla flavoring) became widely available in the 1950s, however, most food manufacturers switched to the less expensive alternative. After only a few decades, many vanilla growers were out of business and the ones who endured barely made a living, forced to lower prices or resort to production shortcuts (which reduced quality) to sell faster. During that period, the only people making a profit were the vanilla brokers. At the beginning of the 21st century, however, nutrition education and consumer demand for all-natural foods started turning things around, and by 2015 vanillin had fallen from grace and natural vanilla was in high demand again. By then, however, there were few vanilla growers left and climate change was affecting production and reducing supply significantly. Today, vanilla beans fetch 30–50 times the price they did during the vanillin era.

For those who may have missed the analogy: professional (freelance) translators are to the translation industry what the vanilla growers are to the food industry. Those who endure the current technology- and globalization-induced rate stagnation may eventually (if the forces at play can be harnessed) witness a resurgence. In the meantime, the best we can do is to keep doing what we do (provide quality service, educate our clients, fight for better language education in the U.S., and support our professional associations’ initiatives to improve things), and talk constructively about the issue instead of pretending that it doesn’t exist, that it won’t affect us, or that nothing can be done about it. If you’re reading this article, things have already started to change!

Reference: https://bit.ly/2K3t1Xe

The Language Industry According to LinkedIn

Professional networking site LinkedIn has continued to grow since it was acquired by Microsoft for a whopping USD 26.2bn in late 2016. The site now has more than 500 million users and reportedly generated USD 1.3bn in revenues in the first quarter of 2018.

While many people continue to see LinkedIn as an online version of their resume, an increasing number of professionals find the site useful for personal branding, sales, business development, and research. Unlike other social media sites such as Facebook and Twitter, LinkedIn generates much of its revenue not from ad sales but from subscription services for recruiters and business development professionals. Paid subscribers are able to search LinkedIn’s extensive database in much more granular detail, which is useful for targeting potential recruits or prospective clients.

Some premium subscriptions, such as Sales Navigator, enable searches based on industry categories. One of the 147 such industry categories featured on LinkedIn is Translation and Localization. While not among the top categories – that honor goes to IT and Services (15m profiles), Financial Services (8.5m) and Computer Software (7.6m) – the Translation and Localization category still lists an impressive 603,700 professional profiles and 21,400 so-called “accounts”, i.e. LinkedIn company pages.

For what it’s worth, we sliced and diced that data and compiled a list of the top 50 countries by professional profiles and top 50 countries by company pages.

LinkedIn: Top 50 Countries in “Translation and Localization” (Personal)

Total number of personal LinkedIn profiles per country as of May 2, 2018 (top 50 countries) under industry category “Translation and Localization”

On a continental scale, Europe takes a clear lead over both North America and Asia. To the 11 translators apparently typing away in Antarctica, we salute you.

Language Industry on LinkedIn by Continent

Company pages and professional profiles in the “Translation & Localization” category by continent

Finally, let’s look at a selection of leading language industry providers and their following on the social network. Just as in real life (i.e. in terms of revenue), Lionbridge and TransPerfect battle it out for number of profiles and followers. Employees at SDL, meanwhile, seem to be more present on LinkedIn in general since, despite the relatively lower number of staff in the real world, SDL beats both TransPerfect and Lionbridge when it comes to LinkedIn profiles.

LinkedIn Presence of Large Language Service Providers

Profiles and Followers of 10 large language service providers

Of course, data from LinkedIn does not present a fully accurate picture of the size and distribution of the language industry in the real world. In Germany, to name just one example, LinkedIn struggles to gain a dominant position, competing with local alternatives such as Xing. Furthermore, translation and localization professionals working internally at large corporations may not choose Translation and Localization as their category but rather their employer’s industry.

That said, crunching LinkedIn’s Translation and Localization numbers is still interesting since it enables you to get a feel for just how big and widely-distributed this industry is.

Reference: https://bit.ly/2JP7iCa

SDL Cracks Russian to English Neural Machine Translation

On 19 June 2018, SDL published a press release to announce that its next-generation SDL Neural Machine Translation (NMT) 2.0 has mastered Russian to English translation, one of the toughest linguistic Artificial Intelligence (AI) problems to date.

SDL NMT 2.0 outperformed all industry standards, setting a benchmark for Russian to English machine translation, with over 90% of the system’s output labelled as perfect by professional Russian-English translators. The new SDL NMT 2.0 Russian engine is being made available to enterprise customers via SDL Enterprise Translation Server (ETS), a secure NMT product, enabling organizations to translate large volumes of information into multiple languages.

“One of the toughest linguistic challenges facing the machine translation community has been overcome by our team,” said Adolfo Hernandez, CEO, SDL. “It was the Russian language that first inspired the science and research behind machine translation, and since then it has always been a major challenge for the community. SDL has deployed breakthrough research strategies to master these difficult languages, and support the global expansion of its enterprise customers. We have pushed the boundaries and raised the performance bar even higher, and we are now paving the way for leadership in other complex languages.”

The linguistic properties and intricacies of the Russian language relative to English make it particularly challenging for MT systems to model. Russian is a highly inflected language with different syntax, grammar, and word order compared to English. Given the complexities created by these differences between the Russian and English languages, raising the translation quality has been an ongoing focus of the SDL Machine Learning R&D team.

“With over 15 years of research and innovation in machine translation, our scientists and engineers took up the challenge to bring Neural MT to the next level,” said Samad Echihabi, Head of Machine Learning R&D, SDL. “We have been evolving, optimizing and adapting our neural technology to deal with highly complex translation tasks such as Russian to English, with phenomenal results. A machine running SDL NMT 2.0 can now produce translations of Russian text virtually indistinguishable from what Russian-English bilingual humans can produce.”

SDL NMT 2.0 is optimized for both accuracy and fluency and provides a powerful paradigm for dealing with morphologically rich languages. It has been designed to adapt to the quality and quantity of the data it is trained on, leading to high learning efficiency. SDL NMT 2.0 is also developed with the enterprise in mind, with a significant focus on translation production speed and user control via terminology support. This also adds another level of productivity for language services providers, and SDL’s own translators will be the first to get access to and benefit from this development.

Powered by SDL NMT 2.0, SDL Enterprise Translation Server (ETS) transforms the way global enterprises understand, communicate, collaborate and do business, enabling them to securely translate and deliver large volumes of content into one or more languages quickly. Offering total control and security of translation data, SDL ETS has also been successfully used in the government sector for over a decade.

Six takeaways from LocWorld 37 in Warsaw

Over the past weekend, Warsaw welcomed Localization World 37, which gathered over 380 language industry professionals. Here is what Nimdzi has gathered from conversations at this premier industry conference.

1. A boom in data processing services

A new market has formed around preparing data to train machine learning algorithms. Among Lionbridge, Pactera, appen, and Welocalize – the leading LSPs that have staked a claim in this sector – revenue from these services already exceeds USD 100 million.

Pactera calls it “AI Enablement Services”, Lionbridge and Welocalize have labelled it “Global services for Machine Intelligence”, and appen prefers the title “data for machine learning enhanced by human touch.” What these companies really do is a variety of human tasks at scale:

  • Audio transcription
  • Proofreading
  • Annotation
  • Dialogue management

Humans help to train voice assistants and chatbots, image-recognition programs, and whatever else the Silicon Valley disruptors decide to unleash upon the world. One prime example came at the beginning of this year, when Lionbridge recorded thousands of children pronouncing scripted phrases for a child-voice recognition engine.

Machine learning and AI are the second biggest area for venture investment, according to dealroom.co. According to the International Data Corporation’s (IDC) forecast, spending is likely to more than quadruple in the next five years, from USD 12 billion in 2017 to USD 57.6 billion. Companies will need lots of accurate data to train their AI, hence there is a significant business opportunity in data cleaning. Compared to crowdsourcing platforms like Clickworker and Figure Eight, LSPs have a broader human resource management competence and can compete for a large slice of the market.

2. LSP AI: Separating fact from fantasy

Artificial intelligence was high on the agenda at #LocWorld 37, but apart from the advances in machine translation, nothing radically new was presented. If any LSPs use machine learning for linguist selection, ad hoc workflow building, or predictive quality analytics, they didn’t show it.

On the other hand, everyone is chiming in on the new buzzword. In a virtual show of hands at the AI panel discussion, an overwhelming proportion of LSP representatives voted that they already use some AI in their business. That’s pure exaggeration, to put it mildly.

3. Introducing Game Global

Locworld’s Game Localization Roundtable expanded this year into a fully-fledged sister conference – the Game Global Forum. The two-day event gathered just over 100 people, including teams from King.com, Electronic Arts, Square Enix, Ubisoft, Wooga, Zenimax / Bethesda, Sony, SEGA, Bluehole and other gaming companies.

We spoke to participants on the buying side, who found the content very relevant, and vendors were happy with the pricing – for roughly EUR 500, they were able to network with the world’s leading game localization buyers. This is much more affordable than the EUR 3,300+ price tag of the rival IQPC Game QA and Localization Conference.

Given the success of Game Global and the continued operation of the Brand2Global event, it’s fair to assume there is room for more industry-specific localization conferences.

4. TMS-buying rampage

Virtually every client company we spoke to at LocWorld is looking for a new translation management system. Some were looking for their first solution, while others were migrating from heavy systems to more lightweight cloud-based solutions. Language technology companies have picked up on this trend, bringing a record number of salespeople and unveiling new offerings.

Every buyer talked about the need for integration as well as end-to-end automation, and shared the “unless there is an integration, I won’t buy” sentiment. Both TMS providers and custom development companies such as Spartan Software are fully booked and churning out new connectors until the end of 2018.

5. Translation tech and LSPs gear up for media localization

Entrepreneurs following the news have noticed that all four of the year’s fastest organically growing companies are in the business of media localization. Their success made ripples that reached the general language services crowd. LSP voiceover and subtitling studios are overloaded, and conventional CAT tools will roll out media localization capabilities this year. MemoQ will get a subtitle editor with video preview, and GlobalLink plans to release a bigger set of features.

These features will make it easier for traditional LSPs to hop onto the media localization train that has already left the station. However, LSP systems won’t compare to specialized software packages tailored to the dubbing workflow: detecting and labeling the individual characters who speak in videos, tagging images with metadata, and the like.

Reference: https://bit.ly/2JZpkSM

Machine Translation From the Cold War to Deep Learning

In the beginning

The story begins in 1933. Soviet scientist Peter Troyanskii presented “the machine for the selection and printing of words when translating from one language to another” to the Academy of Sciences of the USSR. The invention was super simple — it had cards in four different languages, a typewriter, and an old-school film camera.

The operator took the first word from the text, found the corresponding card, took a photo, and typed its morphological characteristics (noun, plural, genitive) on the typewriter. Each typewriter key encoded one of these features. The tape and the camera’s film were used simultaneously, making a set of frames with words and their morphology.

Despite all this, as often happened in the USSR, the invention was considered “useless”. Troyanskii died of stenocardia after trying to finish his invention for 20 years. No one in the world knew about the machine until two Soviet scientists found his patents in 1956.

It was the beginning of the Cold War. On January 7th, 1954, at IBM headquarters in New York, the Georgetown–IBM experiment started. The IBM 701 computer automatically translated 60 Russian sentences into English for the first time in history.

However, the triumphant headlines hid one little detail. No one mentioned that the translated examples had been carefully selected and tested to exclude any ambiguity. For everyday use, that system was no better than a pocket phrasebook. Nevertheless, an arms race of sorts was launched: Canada, Germany, France, and especially Japan all joined the race for machine translation.

The race for machine translation

The vain struggles to improve machine translation lasted for forty years. In 1966, the US ALPAC committee, in its famous report, called machine translation expensive, inaccurate, and unpromising. They instead recommended focusing on dictionary development, which eliminated US researchers from the race for almost a decade.

Even so, it was only through those scientists’ attempts, research, and developments that the basis for modern natural language processing was created. All of today’s search engines, spam filters, and personal assistants appeared thanks to a bunch of countries spying on each other.

Rule-based machine translation (RBMT)

The first ideas surrounding rule-based machine translation appeared in the 70s. Scientists pored over interpreters’ work, trying to compel the tremendously sluggish computers of the time to repeat those actions. These systems consisted of:

  • Bilingual dictionary (RU -> EN)
  • A set of linguistic rules for each language (for example, nouns ending in certain suffixes such as -heit, -keit, -ung are feminine)

That’s it. If needed, systems could be supplemented with hacks, such as lists of names, spelling correctors, and transliterators.
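To make the recipe concrete, here is a minimal sketch of the direct, dictionary-plus-rules idea in Python. The micro-dictionary and the single syntax rule are invented for illustration; real RBMT systems carried hundreds of thousands of entries and rules.

    # Toy direct RBMT: a bilingual dictionary plus a couple of hand-written
    # rules. All entries are invented for illustration only.
    DICTIONARY = {  # RU -> EN, transliterated for readability
        "ya": "I",
        "idu": "go",
        "v": "to",
        "kino": "cinema",
    }

    def translate(sentence: str) -> str:
        # Rule 1: word-by-word dictionary lookup; unknown words pass through.
        words = [DICTIONARY.get(w, w) for w in sentence.lower().split()]
        text = " ".join(words)
        # Rule 2: crude syntax fix-up, since English wants an article here.
        return text.replace("to cinema", "to the cinema")

    print(translate("ya idu v kino"))  # -> "I go to the cinema"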

PROMT and Systran are the most famous examples of RBMT systems. Just take a look at Aliexpress to feel the soft breath of this golden age.

But even they had some nuances and subspecies.

Direct Machine Translation

This is the most straightforward type of machine translation. It divides the text into words, translates them, slightly corrects the morphology, and harmonizes syntax to make the whole thing sound right, more or less. When the sun goes down, trained linguists write the rules for each word.

The output returns some kind of translation. Usually, it’s quite crappy. It seems that the linguists wasted their time for nothing.

Modern systems do not use this approach at all, and modern linguists are grateful.

Transfer-based Machine Translation

In contrast to direct translation, we first prepare by determining the grammatical structure of the sentence, as we were taught at school. Then we manipulate whole constructions, not words. This helps to get a quite decent conversion of word order in translation. In theory.

In practice, it still resulted in verbatim translation and exhausted linguists. On the one hand, it brought simplified general grammar rules. But on the other, it became more complicated because of the increased number of word constructions compared with single words.

Interlingual Machine Translation

In this method, the source text is transformed into an intermediate representation that is unified for all the world’s languages (an interlingua). It’s the same interlingua Descartes dreamed of: a meta-language that follows universal rules and turns translation into a simple “back and forth” task. Next, the interlingua would convert to any target language, and here was the singularity!

Because of the conversion step, interlingua is often confused with transfer-based systems. The difference is that the linguistic rules are specific to each individual language and the interlingua, not to language pairs. This means we can add a third language to an interlingua system and translate between all three, which we can’t do in transfer-based systems.

It looks perfect, but in real life it’s not. It was extremely hard to create such a universal interlingua; many scientists worked on it their whole lives. They didn’t succeed, but thanks to them we now have morphological, syntactic, and even semantic levels of representation. But the Meaning-Text Theory alone cost a fortune!

The idea of an intermediate language will be back. Let’s wait awhile.

As you can see, all RBMT systems are dumb and terrifying, and that’s the reason they are rarely used except for specific cases (like weather report translation, and so on). Among the advantages of RBMT often mentioned are its morphological accuracy (it doesn’t confuse words), reproducibility of results (all translators get the same result), and the ability to tune it to a subject area (to teach it the terms specific to economists or programmers, for example).

Even if anyone were to succeed in creating an ideal RBMT, and linguists enhanced it with all the spelling rules, there would always be exceptions: all the irregular verbs in English, separable prefixes in German, suffixes in Russian, and situations where people just say things differently. Any attempt to take all the nuances into account would waste millions of man-hours.

And don’t forget about homonyms. The same word can have a different meaning in a different context, which leads to a variety of translations. How many meanings can you catch here: “I saw a man on a hill with a telescope”?

Languages did not develop based on a fixed set of rules, a fact which linguists love. They were much more influenced by the history of invasions over the past three hundred years. How could you explain that to a machine?

Forty years of the Cold War didn’t help in finding any distinct solution. RBMT was dead.

Example-based Machine Translation (EBMT)

Japan was especially interested in fighting for machine translation. There was no Cold War there, but there were reasons: very few people in the country knew English. It promised to be quite an issue at the upcoming globalization party, so the Japanese were extremely motivated to find a working method of machine translation.

Rule-based English-Japanese translation is extremely complicated. The language structure is completely different, and almost all words have to be rearranged and new ones added. In 1984, Makoto Nagao from Kyoto University came up with the idea of using ready-made phrases instead of repeated translation.

Let’s imagine that we have to translate a simple sentence — “I’m going to the cinema.” And let’s say we’ve already translated another similar sentence — “I’m going to the theater” — and we can find the word “cinema” in the dictionary.

All we need is to figure out the difference between the two sentences, translate the missing word, and then not screw it up. The more examples we have, the better the translation.

I build phrases in unfamiliar languages exactly the same way!
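Nagao’s idea fits in a few lines of code. Below is a hedged sketch of the matching step, reusing the cinema/theater example from above; the example base and the dictionary entries (including the Japanese) are invented for illustration.

    # Toy EBMT: translate a new sentence by finding a known example that
    # differs by exactly one word and swapping in that word's translation.
    EXAMPLES = {"I'm going to the theater": "Watashi wa gekijou ni ikimasu"}
    DICTIONARY = {"theater": "gekijou", "cinema": "eigakan"}

    def translate(sentence):
        new_words = sentence.split()
        for src, tgt in EXAMPLES.items():
            src_words = src.split()
            if len(src_words) != len(new_words):
                continue
            # Positions where the new sentence differs from the example.
            diffs = [i for i, (a, b) in enumerate(zip(src_words, new_words)) if a != b]
            if len(diffs) == 1:
                old, new = src_words[diffs[0]], new_words[diffs[0]]
                # Replace the old word's translation with the new word's.
                return tgt.replace(DICTIONARY[old], DICTIONARY[new])
        return None  # no close-enough example found

    print(translate("I'm going to the cinema"))  # -> "Watashi wa eigakan ni ikimasu"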

EBMT was a revelation to scientists all over the world: it turns out you can just feed the machine existing translations and not spend years forming rules and exceptions. Not a revolution yet, but clearly the first step towards it. The revolutionary invention of statistical translation would happen in just five years.

Statistical Machine Translation (SMT)

In early 1990, at the IBM Research Center, a machine translation system was shown for the first time that knew nothing about rules or linguistics as a whole. It analyzed similar texts in two languages and tried to understand the patterns.

The idea was simple yet beautiful. An identical sentence in two languages was split into words, which were matched afterwards. This operation was repeated about 500 million times to count, for example, how many times the word “Das Haus” translated as “house” vs “building” vs “construction”, and so on.

If most of the time the source word was translated as “house”, the machine used this. Note that we did not set any rules nor use any dictionaries; all conclusions were drawn by the machine, guided by statistics and the logic that “if people translate that way, so will I.” And so statistical translation was born.

The method was much more efficient and accurate than all the previous ones. And no linguists were needed. The more texts we used, the better the translation we got.

There was still one question left: how would the machine correlate the word “Das Haus” with the word “building”, and how would we know these were the right translations?

The answer was that we wouldn’t know. At the start, the machine assumed that the word “Das Haus” correlated equally with every word in the translated sentence. Next, when “Das Haus” appeared in other sentences, the number of correlations with “house” would increase. That’s the “word alignment algorithm,” a typical task for university-level machine learning.
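For the curious, here is a condensed sketch of that alignment idea in the spirit of IBM Model 1: start with uniform word-translation probabilities and let a few rounds of expectation-maximization sharpen them. The three-sentence corpus is invented for illustration.

    # Toy IBM Model 1-style EM: co-occurrence statistics alone resolve
    # which source word corresponds to which target word.
    from collections import defaultdict

    corpus = [
        ("das haus".split(), "the house".split()),
        ("das buch".split(), "the book".split()),
        ("ein buch".split(), "a book".split()),
    ]

    t = defaultdict(lambda: 1.0)  # t[(e, f)]: P(target word e | source word f)

    for _ in range(10):  # EM iterations
        count = defaultdict(float)
        total = defaultdict(float)
        for f_sent, e_sent in corpus:
            for e in e_sent:
                # E-step: spread each target word's mass over the source
                # words in proportion to the current probabilities.
                norm = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    count[(e, f)] += t[(e, f)] / norm
                    total[f] += t[(e, f)] / norm
        # M-step: re-estimate the translation probabilities.
        for (e, f) in count:
            t[(e, f)] = count[(e, f)] / total[f]

    print(round(t[("house", "haus")], 2))  # approaches 1.0 after training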

The machine needed millions and millions of sentences in two languages to collect the relevant statistics for each word. How did we get them? Well, we decided to take the abstracts of the European Parliament and the United Nations Security Council meetings; they were available in the languages of all member countries and are now available for download as the UN Corpora and Europarl Corpora.

Word-based SMT

In the beginning, the first statistical translation systems worked by splitting the sentence into words, since this approach was straightforward and logical. IBM’s first statistical translation model was called Model one. Quite elegant, right? Guess what they called the second one?

Model 1: “the bag of words”

Model 1 used a classical approach: split into words and count stats. Word order wasn’t taken into account. The only trick was translating one word into multiple words. For example, “Der Staubsauger” could turn into “Vacuum Cleaner,” but that didn’t mean it would work vice versa.

Here’re some simple implementations in Python: shawa/IBM-Model-1.

Model 2: considering the word order in sentences

Model 1’s lack of knowledge about a language’s word order became a problem, and word order is very important in some cases.

Model 2 dealt with that: it memorized the usual position a word takes in the output sentence and shuffled the words into a more natural order at an intermediate step. Things got better, but they were still kind of crappy.
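A hedged sketch of what that extra step looks like: alongside word translations, Model 2 keeps a table of alignment probabilities saying which target position a source position tends to land in. The French-English pair and all the numbers below are invented for illustration.

    # Toy Model 2-style reordering: pick each word's output slot from an
    # alignment table P(target pos i | source pos j, src len, tgt len).
    align = {
        (0, 0, 2, 2): 0.1, (0, 1, 2, 2): 0.9,  # FR "chat noir" -> EN "black cat":
        (1, 0, 2, 2): 0.9, (1, 1, 2, 2): 0.1,  # noun and adjective swap places
    }
    word = {"chat": "cat", "noir": "black"}

    def translate(src):
        words = src.lower().split()
        n = len(words)
        out = [None] * n
        for j, w in enumerate(words):
            # Place the translation at its most probable target position.
            i = max(range(n), key=lambda i: align.get((i, j, n, n), 0.0))
            out[i] = word[w]
        return " ".join(out)

    print(translate("chat noir"))  # -> "black cat"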

Model 3: extra fertility

New words appeared in the translation quite often, such as articles in German or “do” when negating in English. “Ich will keine Persimonen” → “I do not want Persimmons.” To deal with this, two more steps were added in Model 3:

  • Inserting a NULL token when the machine deems a new word necessary
  • Choosing the right grammatical particle or word for each token-word alignment

Model 4: word alignment

Model 2 considered word alignment, but knew nothing about reordering. For example, adjectives would often switch places with the noun, and no matter how well the order was memorized, it wouldn’t make the output better. Therefore, Model 4 took into account the so-called “relative order”: the model learned whether two words always switched places.

Model 5: bugfixes

Nothing new here. Model 5 got some more parameters for the learning and fixed the issue with conflicting word positions.

Despite their revolutionary nature, word-based systems still failed to deal with cases, gender, and homonymy. Every single word was translated in a single “true” way, according to the machine. Such systems are not used anymore; they’ve been replaced by the more advanced phrase-based methods.

Phrase-based SMT

This method is built on all the word-based translation principles: statistics, reordering, and lexical hacks. However, for learning, it split the text not only into words but also into phrases: n-grams, to be precise, which are contiguous sequences of n words in a row.

Thus, the machine learned to translate steady combinations of words, which noticeably improved accuracy.
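A minimal sketch of the lookup side of this idea: given a phrase table (hand-filled here with invented entries; real systems learned millions of them from data), translation greedily prefers the longest matching n-gram over single words.

    # Toy phrase-based lookup: the longest matching n-gram wins.
    PHRASE_TABLE = {
        ("ich", "gehe"): "I am going",
        ("ins", "kino"): "to the cinema",
        ("kino",): "cinema",
    }

    def translate(sentence):
        words = sentence.lower().split()
        out, i = [], 0
        while i < len(words):
            # Try the longest phrase starting at position i, then shorter ones.
            for n in range(len(words) - i, 0, -1):
                phrase = tuple(words[i:i + n])
                if phrase in PHRASE_TABLE:
                    out.append(PHRASE_TABLE[phrase])
                    i += n
                    break
            else:
                out.append(words[i])  # unknown word passes through
                i += 1
        return " ".join(out)

    print(translate("Ich gehe ins Kino"))  # -> "I am going to the cinema"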

The trick was that the phrases were not always simple syntactic constructions, and the quality of the translation dropped significantly whenever anyone aware of linguistics and sentence structure interfered. Frederick Jelinek, a pioneer of computational linguistics, once joked about it: “Every time I fire a linguist, the performance of the speech recognizer goes up.”

Besides improving accuracy, phrase-based translation provided more options in choosing bilingual texts for learning. For word-based translation, an exact match between the sources was critical, which excluded any literary or free translation. Phrase-based translation had no problem learning from them. To improve the translation, researchers even started to parse news websites in different languages for that purpose.

Starting in 2006, everyone began to use this approach. Google Translate, Yandex, Bing, and other high-profile online translators worked as phrase-based systems right up until 2016. Each of you can probably recall the moments when Google either translated a sentence flawlessly or produced complete nonsense, right? The nonsense was a side effect of the phrase-based machinery.

The good old rule-based approach consistently provided a predictable though terrible result. The statistical methods were surprising and puzzling. Google Translate turns “three hundred” into “300” without any hesitation. That’s called a statistical anomaly.

Phrase-based translation became so popular that when you hear “statistical machine translation,” it is what is actually meant. Up until 2016, all studies lauded phrase-based translation as the state of the art. Back then, no one even thought that Google was already stoking its fires, getting ready to change our whole image of machine translation.

Syntax-based SMT

This method should also be mentioned, briefly. Many years before the emergence of neural networks, syntax-based translation was considered “the future of translation,” but the idea did not take off.

The proponents of syntax-based translation believed it was possible to merge it with the rule-based method. The idea is to do quite a precise syntactic analysis of the sentence: determine the subject, the predicate, and the other parts of the sentence, and then build a sentence tree. Using it, the machine learns to convert syntactic units between languages and translates the rest by words or phrases. That would have solved the word alignment issue once and for all.
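To see what that first step involves, here is a toy syntactic analysis using NLTK’s chart parser with a tiny hand-written grammar; the grammar is invented for illustration, and production systems relied on broad-coverage statistical parsers instead.

    # Toy syntactic analysis: build the sentence tree that syntax-based
    # SMT would then map between languages. Requires `pip install nltk`.
    import nltk

    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        NP  -> Det N | 'I'
        VP  -> V NP
        Det -> 'a' | 'the'
        N   -> 'man' | 'telescope'
        V   -> 'saw'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("I saw a man".split()):
        tree.pretty_print()  # prints the tree with S, NP, VP nodes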

The problem is that syntactic parsing works terribly, despite the fact that we considered it solved a while ago (since we have ready-made libraries for many languages). I tried to use syntactic trees for tasks a bit more complicated than parsing out the subject and the predicate, and every single time I gave up and used another method.

Let me know in the comments if you succeed using it at least once.

Neural Machine Translation (NMT)

A quite amusing paper on using neural networks in machine translation was published in 2014. The Internet didn’t notice it at all, except Google — they took out their shovels and started to dig. Two years later, in November 2016, Google made a game-changing announcement.

The idea was close to transferring style between photos. Remember apps like Prisma, which enhanced pictures in some famous artist’s style? There was no magic. The neural network was taught to recognize the artist’s paintings. Next, the last layers containing the network’s decision were removed. The resulting stylized picture was just the intermediate image that the network produced. That’s the network’s fantasy, and we consider it beautiful.

If we can transfer style to a photo, what if we try to impose another language on a source text? The text would be that precise “artist’s style,” and we would try to transfer it while keeping the essence of the image (in other words, the essence of the text).

Imagine I’m trying to describe my dog — average size, sharp nose, short tail, always barks. If I gave you this set of the dog’s features, and if the description was precise, you could draw it, even though you have never seen it.

Now, imagine the source text is a set of specific features. Basically, it means that you encode it, and let another neural network decode it back into text, but in another language. The decoder only knows its own language. It has no idea about the features’ origin, but it can express them in, for example, Spanish. Continuing the analogy, it doesn’t matter how you draw the dog — with crayons, watercolor or your finger. You paint it as you can.

Once again: one neural network can only encode the sentence into a specific set of features, and another one can only decode them back into text. Neither knows anything about the other, and each of them knows only its own language. Recall something? The interlingua is back. Ta-da.
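A minimal sketch of this encoder-decoder pair, written here with PyTorch GRUs; the vocabulary sizes and dimensions are invented toy values, and real NMT systems add attention, subword units and much more.

    # Toy encoder-decoder: one network compresses the source sentence into
    # a feature vector, another unfolds it into the target language.
    import torch
    import torch.nn as nn

    SRC_VOCAB, TGT_VOCAB, EMB, HIDDEN = 1000, 1200, 64, 128

    class Encoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(SRC_VOCAB, EMB)
            self.rnn = nn.GRU(EMB, HIDDEN, batch_first=True)

        def forward(self, src_ids):
            _, hidden = self.rnn(self.embed(src_ids))
            return hidden  # the sentence's "set of features"

    class Decoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(TGT_VOCAB, EMB)
            self.rnn = nn.GRU(EMB, HIDDEN, batch_first=True)
            self.out = nn.Linear(HIDDEN, TGT_VOCAB)

        def forward(self, tgt_ids, hidden):
            output, hidden = self.rnn(self.embed(tgt_ids), hidden)
            return self.out(output), hidden  # scores over the target vocabulary

    src = torch.randint(0, SRC_VOCAB, (1, 7))  # a batch of one 7-token "sentence"
    tgt = torch.randint(0, TGT_VOCAB, (1, 5))
    features = Encoder()(src)                  # encode...
    logits, _ = Decoder()(tgt, features)       # ...and decode in the other language
    print(logits.shape)                        # torch.Size([1, 5, 1200])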

The question is, how do we find those features? It’s obvious when we’re talking about the dog, but how do we deal with text? Thirty years ago, scientists already tried to create a universal language code, and it ended in total failure.

Nevertheless, we have deep learning now. And that’s its essential task! The primary distinction between deep learning and classic neural networks lies precisely in the ability to search for those specific features, without any idea of their nature. If the neural network is big enough, and there are a couple of thousand video cards at hand, it’s possible to find those features in text as well.

Theoretically, we can pass the features obtained from the neural networks to linguists, so that they can open brave new horizons for themselves.

The question is, what type of neural network should be used for encoding and decoding? Convolutional Neural Networks (CNN) fit perfectly for pictures since they operate with independent blocks of pixels.

But there are no independent blocks in text; every word depends on its surroundings. Text, speech, and music are always sequential. So recurrent neural networks (RNN) are the best choice to handle them, since they remember the previous result: the prior word, in our case.

Now RNNs are used everywhere: Siri’s speech recognition (parsing a sequence of sounds, where the next depends on the previous), keyboard suggestions (memorize the prior, guess the next), music generation, and even chatbots.

In two years, neural networks surpassed everything that had appeared in the past 20 years of translation. Neural translation contains 50% fewer word order mistakes, 17% fewer lexical mistakes, and 19% fewer grammar mistakes. The neural networks even learned to harmonize gender and case in different languages. And no one taught them to do so.

The most noticeable improvements occurred in fields where direct translation had never been used. Statistical machine translation methods always worked using English as the key source: if you translated from Russian to German, the machine first translated the text into English and then from English into German, which led to a double loss.

Neural translation doesn’t need that: only a decoder is required for it to work. That was the first time that direct translation between languages with no common dictionary became possible.

The conclusion and the future

Everyone’s still excited about the idea of a “Babel fish”: instant speech translation. Google has made steps towards it with its Pixel Buds, but in fact, it’s still not what we were dreaming of. Instant speech translation is different from ordinary translation: you need to know when to start translating and when to shut up and listen. I haven’t seen suitable approaches to solving this yet. Unless, maybe, Skype…

And here’s one more empty area: all the learning is limited to sets of parallel text blocks. The deepest neural networks still learn from parallel texts. We can’t teach a neural network without providing it with a source. People, by contrast, can expand their lexicon by reading books or articles, even without translating them into their native language.

If people can do it, the neural network can do it too, in theory. I found only one prototype attempting to get a network that knows one language to read texts in another language in order to gain experience. I’d try it myself, but I’m silly. Ok, that’s it.

Reference: https://bit.ly/2HCmT6v

England’s Top Judge Predicts ‘the End of Interpreters’

The top judge in England and Wales has joined the machine translation debate. And he is not mincing his words. Speaking on “The Age of Reform” at the Sir Henry Brooke Annual Lecture on June 7, 2018, the Lord Chief Justice (LCJ) of England and Wales stated “I have little doubt that within a few years high quality simultaneous translation will be available and see the end of interpreters”.

The Lord Chief Justice is the Head of the Judiciary of England and Wales. He is also the President of the Courts of England and Wales and responsible for representing the views of the judiciary to Parliament and the Government.

In his speech, the LCJ, Ian Burnett, also described current online instant translation as “the technological equivalent of the steam-engine” and artificial intelligence as “the transformative technology of our age.”

He acknowledged, however, that the current ambition of “HMCTS [HM Courts & Tribunals Service] and Government is more modest but nonetheless important. It is to bring our systems up to date and to take advantage of widely available technology.”

The comment made by Lord Burnett of Maldon, who occupies one of the most senior judicial positions in the U.K., has been met with disbelief by some, with a number of industry professionals posting comments in response to an article published online by the Law Society Gazette on June 8, 2018.

“I have little doubt that within a few years high quality simultaneous translation will be available and see the end of interpreters” — Lord Burnett of Maldon

One anonymous comment read “…I feel that the LCJ simply does not have the slightest understanding of what interpreters do, or the difficulties they face, in the real world.” Another contributor said that “it is astonishing and very seriously worrying that any member of the judiciary, let alone the LCJ, can seriously think that a computer will in the foreseeable future, or even ever, be able accurately to translate the fine nuances of a legal argument or evidence.”

Interpretation services for the HMCTS are currently provided under a four-year MoJ contract worth GBP 232.4m (USD 289m), which thebigword took over from Capita TI in late 2016.

Slator reached out to language service provider (LSP) thebigword for comment, and CEO Larry Gould responded by agreeing on the one hand that “it is right to say that machine translation and AI are transforming the language sector, as they are many other parts of the economy.”

He continued by explaining that “our experiences have taught us that AI still has a long way to go in being able to deliver the subtleties and nuances of language. At the moment these can be lost very quickly with machine translation, and this could have a big impact on access to justice and law enforcement if it is rushed out too fast.”

“(…) this could have a big impact on access to justice and law enforcement if it is rushed out too fast” — Larry Gould, CEO, thebigword

For an interpreter’s perspective, Slator also contacted Dr Jonathan Downie PhD, AITI, whose PhD was on client expectations of interpreters. Downie told us that “The Lord Chief Justice has done all interpreters a favour by raising the issue of machine interpreting and showing how persuasive the PR around it has been. He is also right that legal interpreting is ripe for technological change.”

“We do have to remember however that so far the lab results of machine interpreting have been shown to be irrelevant to real-life. The Tencent fiasco with machine interpreting at the Boao Forum this year taught us that lesson, as has almost every public trial of the technology outside of basic conversations.”

“We do have to remember however that so far the lab results of machine interpreting have been shown to be irrelevant to real-life” — Dr Jonathan Downie PhD, AITI

“It may be meaningful that my challenge to machine interpreting companies to put their technology on trial at a realistic conference has been met with deafening silence. Could it be that they are not as convinced by their PR and marketing as the Lord Chief Justice seems to be?”

Reference: https://bit.ly/2JIotc2