Tag: Translators

Machine Translation From the Cold War to Deep Learning


In the beginning

The story begins in 1933. Soviet scientist Peter Troyanskii presented “the machine for the selection and printing of words when translating from one language to another” to the Academy of Sciences of the USSR. The invention was super simple — it had cards in four different languages, a typewriter, and an old-school film camera.

The operator took the first word from the text, found a corresponding card, took a photo, and typed its morphological characteristics (noun, plural, genitive) on the typewriter. The typewriter’s keys encoded one of the features. The tape and the camera’s film were used simultaneously, making a set of frames with words and their morphology.

Despite all this, as often happened in the USSR, the invention was considered “useless”. Troyanskii died of Stenocardia after trying to finish his invention for 20 years. No one in the world knew about the machine until two Soviet scientists found his patents in 1956.

It was at the beginning of the Cold War. On January 7th 1954, at IBM headquarters in New York, the Georgetown–IBM experiment started. The IBM 701 computer automatically translated 60 Russian sentences into English for the first time in history.

However, the triumphant headlines hid one little detail. No one mentioned that the translated examples were carefully selected and tested to exclude any ambiguity. For everyday use, that system was no better than a pocket phrasebook. Nevertheless, this sort of arms race was launched: Canada, Germany, France, and especially Japan all joined the race for machine translation.

The race for machine translation

The vain struggles to improve machine translation lasted for forty years. In 1966, the US ALPAC committee, in its famous report, called machine translation expensive, inaccurate, and unpromising. They instead recommended focusing on dictionary development, which eliminated US researchers from the race for almost a decade.

Even so, it was those scientists’ attempts, research, and developments that created the basis for modern Natural Language Processing. All of today’s search engines, spam filters, and personal assistants appeared thanks to a bunch of countries spying on each other.

Rule-based machine translation (RBMT)

The first ideas surrounding rule-based machine translation appeared in the 70s. Scientists pored over interpreters’ work, trying to compel the tremendously sluggish computers to repeat those actions. These systems consisted of:

  • Bilingual dictionary (RU -> EN)
  • A set of linguistic rules for each language (For example, nouns ending in certain suffixes such as -heit, -keit, -ung are feminine)

That’s it. If needed, systems could be supplemented with hacks, such as lists of names, spelling correctors, and transliterators.

PROMT and Systran are the most famous examples of RBMT systems. Just take a look at Aliexpress to feel the soft breath of this golden age.

But even they had some nuances and subspecies.

Direct Machine Translation

This is the most straightforward type of machine translation. It divides the text into words, translates them, slightly corrects the morphology, and harmonizes syntax to make the whole thing sound right, more or less. When the sun goes down, trained linguists write the rules for each word.

The output returns some kind of translation. Usually, it’s quite crappy. It seems that the linguists wasted their time for nothing.

Modern systems do not use this approach at all, and modern linguists are grateful.

Transfer-based Machine Translation

In contrast to direct translation, we prepare first by determining the grammatical structure of the sentence, as we were taught at school. Then we manipulate whole constructions, not words. This helps to get quite a decent conversion of the word order in translation. In theory.

In practice, it still resulted in verbatim translation and exhausted linguists. On the one hand, it brought simplified general grammar rules. On the other, it became more complicated because of the increased number of word constructions compared with single words.

Interlingual Machine Translation

In this method, the source text is transformed into an intermediate representation that is unified for all the world’s languages (interlingua). It’s the same interlingua Descartes dreamed of: a meta-language that follows universal rules and transforms translation into a simple “back and forth” task. Next, the interlingua would be converted to any target language, and here was the singularity!

Because of the conversion, interlingua is often confused with transfer-based systems. The difference is that the linguistic rules are specific to each individual language and the interlingua, not to language pairs. This means we can add a third language to an interlingua system and translate between all three. We can’t do this in transfer-based systems.

It looks perfect, but in real life it’s not. It was extremely hard to create such a universal interlingua — a lot of scientists worked on it their whole lives. They’ve not succeeded, but thanks to them we now have morphological, syntactic, and even semantic levels of representation. And the Meaning-Text Theory alone cost a fortune!

The idea of an intermediate language will be back. Let’s wait a while.

As you can see, all RBMT systems are dumb and terrifying, and that’s the reason they are rarely used except for specific cases (like weather report translation, and so on). Among the advantages of RBMT often mentioned are its morphological accuracy (it doesn’t confuse words), reproducibility of results (all translators get the same result), and the ability to tune it to the subject area (to teach it terms specific to economists or programmers, for example).

Even if anyone were to succeed in creating an ideal RBMT, and linguists enhanced it with all the spelling rules, there would always be some exceptions: all the irregular verbs in English, separable prefixes in German, suffixes in Russian, and situations when people just say it differently. Any attempt to take into account all the nuances would waste millions of man hours.

And don’t forget about homonyms. The same word can have a different meaning in a different context, which leads to a variety of translations. How many meanings can you catch here: I saw a man on a hill with a telescope?

Languages did not develop based on a fixed set of rules — a fact which linguists love. They were much more influenced by the history of invasions over the past three hundred years. How could you explain that to a machine?

Forty years of the Cold War didn’t help in finding any distinct solution. RBMT was dead.

Example-based Machine Translation (EBMT)

Japan was especially interested in fighting for machine translation. There was no Cold War, but there were reasons: very few people in the country knew English. It promised to be quite an issue at the upcoming globalization party. So the Japanese were extremely motivated to find a working method of machine translation.

Rule-based English-Japanese translation is extremely complicated. The language structure is completely different, and almost all words have to be rearranged and new ones added. In 1984, Makoto Nagao from Kyoto University came up with the idea of using ready-made phrases instead of repeated translation.

Let’s imagine that we have to translate a simple sentence — “I’m going to the cinema.” And let’s say we’ve already translated another similar sentence — “I’m going to the theater” — and we can find the word “cinema” in the dictionary.

All we need is to figure out the difference between the two sentences, translate the missing word, and then not screw it up. The more examples we have, the better the translation.

I build phrases in unfamiliar languages exactly the same way!

EBMT showed scientists from all over the world the light: it turns out you can just feed the machine existing translations and not spend years forming rules and exceptions. Not a revolution yet, but clearly the first step towards one. The revolutionary invention of statistical translation would happen just five years later.

Statistical Machine Translation (SMT)

In the early 1990s, at the IBM Research Center, a machine translation system was shown for the first time that knew nothing about rules and linguistics as a whole. It analyzed similar texts in two languages and tried to understand the patterns.

The idea was simple yet beautiful. An identical sentence in two languages was split into words, which were matched afterwards. This operation was repeated about 500 million times to count, for example, how many times the word “Das Haus” was translated as “house” vs “building” vs “construction”, and so on.

If most of the time the source word was translated as “house”, the machine used this. Note that we did not set any rules nor use any dictionaries — all conclusions were done by machine, guided by stats and the logic that “if people translate that way, so will I.” And so statistical translation was born.

The method was much more efficient and accurate than all the previous ones. And no linguists were needed. The more texts we used, the better translation we got.

There was still one question left: how would the machine correlate the word “Das Haus” and the word “building” — and how would we know these were the right translations?

The answer was that we wouldn’t know. At the start, the machine assumed that the word “Das Haus” correlated equally with every word in the translated sentence. Next, when “Das Haus” appeared in other sentences, the number of correlations with “house” would increase. That’s the “word alignment algorithm,” a typical task for university-level machine learning.

The machine needed millions and millions of sentences in two languages to collect the relevant statistics for each word. How did we get them? Well, we decided to take the abstracts of the European Parliament and the United Nations Security Council meetings — they were available in the languages of all member countries and are now available for download as the UN Corpora and Europarl Corpora.

Word-based SMT

In the beginning, the first statistical translation systems worked by splitting the sentence into words, since this approach was straightforward and logical. IBM’s first statistical translation model was called Model one. Quite elegant, right? Guess what they called the second one?

Model 1: “the bag of words”

Model one used a classical approach — split into words and count the stats. Word order wasn’t taken into account. The only trick was translating one word into multiple words. For example, “Der Staubsauger” could turn into “Vacuum Cleaner,” but that didn’t mean it would work the other way around.

Here are some simple implementations in Python: shawa/IBM-Model-1.
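
To make the alignment idea concrete, here is a minimal sketch of Model 1 training: start with uniform translation probabilities and re-estimate them with a few rounds of expectation-maximization over a toy parallel corpus. The corpus, iteration count and variable names are illustrative assumptions, not code taken from the repository above.

```python
from collections import defaultdict

# Toy parallel corpus (German -> English); illustrative only.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

src_vocab = {w for src, _ in corpus for w in src}
tgt_vocab = {w for _, tgt in corpus for w in tgt}

# Start with uniform translation probabilities t(e|f).
t = {f: {e: 1.0 / len(tgt_vocab) for e in tgt_vocab} for f in src_vocab}

for _ in range(10):                      # a handful of EM iterations
    counts = defaultdict(float)          # expected co-occurrence counts
    totals = defaultdict(float)
    for src, tgt in corpus:
        for e in tgt:
            norm = sum(t[f][e] for f in src)
            for f in src:
                frac = t[f][e] / norm    # how much f "explains" e in this pair
                counts[(f, e)] += frac
                totals[f] += frac
    for (f, e), c in counts.items():     # re-estimate t(e|f)
        t[f][e] = c / totals[f]

# After a few iterations, "haus" should clearly prefer "house".
print(sorted(t["haus"].items(), key=lambda kv: -kv[1])[:2])
```

Exactly these kinds of counts, collected over millions of real sentence pairs rather than three toy ones, are the statistics that word-based SMT relied on.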

Model 2: considering the word order in sentences

Model 1’s lack of knowledge about word order became a problem, and word order is very important in some cases.

Model 2 dealt with that: it memorized the usual position a word takes in the output sentence and shuffled the words into a more natural order at an intermediate step. Things got better, but they were still kind of crappy.

Model 3: extra fertility

New words appeared in the translation quite often, such as articles in German or using “do” when negating in English. “Ich will keine Persimonen” → “I do not want persimmons.” To deal with this, two more steps were added in Model 3.

  • NULL token insertion, if the machine decides a new word is necessary
  • Choosing the right grammatical particle or word for each token-word alignment

Model 4: word alignment

Model 2 considered word alignment, but knew nothing about reordering. For example, adjectives would often switch places with the noun, and no matter how well the order was memorized, it wouldn’t make the output better. Therefore, Model 4 took into account the so-called “relative order” — the model learned whether two words always switched places.

Model 5: bugfixes

Nothing new here. Model 5 got some more parameters for the learning and fixed the issue with conflicting word positions.

Despite their revolutionary nature, word-based systems still failed to deal with cases, gender, and homonymy. Every single word was translated in a single “true” way, according to the machine. Such systems are not used anymore, as they’ve been replaced by the more advanced phrase-based methods.

Phrase-based SMT

This method is based on all the word-based translation principles: statistics, reordering, and lexical hacks. However, for learning, it split the text not only into words but also into phrases — n-grams, to be precise: contiguous sequences of n words in a row.

Thus, the machine learned to translate steady combinations of words, which noticeably improved accuracy.
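
As a quick illustration of what those n-grams look like, here is a minimal sketch (the sentence is just an example):

```python
def ngrams(words, n):
    """All contiguous sequences of n words in a row."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "I am going to the cinema".split()
for n in (1, 2, 3):
    print(n, ngrams(sentence, n))
# Bigrams such as ('going', 'to') and ('the', 'cinema') are the "steady
# combinations of words" that phrase-based SMT learns to translate as units.
```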

The trick was that the phrases were not always simple syntactic constructions, and the quality of the translation dropped significantly if anyone aware of linguistics and sentence structure interfered. Frederick Jelinek, a pioneer of computational linguistics, once joked about it: “Every time I fire a linguist, the performance of the speech recognizer goes up.”

Besides improving accuracy, phrase-based translation provided more options in choosing bilingual texts for learning. For word-based translation, an exact match of the sources was critical, which excluded any literary or free translation. Phrase-based translation had no problem learning from them. To improve the translation, researchers even started to parse news websites in different languages for that purpose.

Starting in 2006, everyone began to use this approach. Google Translate, Yandex, Bing, and other high-profile online translators worked as phrase-based systems right up until 2016. Each of you can probably recall the moments when Google either translated a sentence flawlessly or produced complete nonsense, right? The nonsense was a feature of the phrase-based approach.

The good old rule-based approach consistently provided a predictable though terrible result. The statistical methods were surprising and puzzling. Google Translate turns “three hundred” into “300” without any hesitation. That’s called a statistical anomaly.

Phrase-based translation became so popular that when you hear “statistical machine translation,” that is what is actually meant. Up until 2016, all studies lauded phrase-based translation as the state of the art. Back then, no one even thought that Google was already stoking its fires, getting ready to change our whole image of machine translation.

Syntax-based SMT

This method should also be mentioned, briefly. Many years before the emergence of neural networks, syntax-based translation was considered “the future of translation,” but the idea did not take off.

The proponents of syntax-based translation believed it was possible to merge it with the rule-based method. The idea is to do quite a precise syntactic analysis of the sentence — to determine the subject, the predicate, and other parts of the sentence — and then to build a sentence tree. Using it, the machine learns to convert syntactic units between languages and translates the rest by words or phrases. That would have solved the word alignment issue once and for all.

The problem is that syntactic parsing works terribly, despite the fact that we considered it solved a while ago (as we have ready-made libraries for many languages). I tried to use syntactic trees for tasks a bit more complicated than parsing the subject and the predicate. And every single time I gave up and used another method.

Let me know in the comments if you succeed using it at least once.

Neural Machine Translation (NMT)

A quite amusing paper on using neural networks in machine translation was published in 2014. The Internet didn’t notice it at all, except Google — they took out their shovels and started to dig. Two years later, in November 2016, Google made a game-changing announcement.

The idea was close to transferring style between photos. Remember apps like Prisma, which enhanced pictures in some famous artist’s style? There was no magic. The neural network was taught to recognize the artist’s paintings. Next, the last layers containing the network’s decision were removed. The resulting stylized picture was just the intermediate image that the network produced. That’s the network’s fantasy, and we consider it beautiful.

If we can transfer style to a photo, what if we try to impose another language on a source text? The text would be that precise “artist’s style,” and we would try to transfer it while keeping the essence of the image (in other words, the essence of the text).

Imagine I’m trying to describe my dog — average size, sharp nose, short tail, always barks. If I gave you this set of the dog’s features, and if the description was precise, you could draw it, even though you have never seen it.

Now, imagine the source text is the set of specific features. Basically, it means that you encode it, and let another neural network decode it back into text, but in another language. The decoder only knows its own language. It has no idea about the features’ origin, but it can express them in, for example, Spanish. Continuing the analogy, it doesn’t matter how you draw the dog — with crayons, watercolor or your finger. You paint it as you can.

Once again — one neural network can only encode the sentence to a specific set of features, and another one can only decode them back into text. Both have no idea about each other, and each of them knows only its own language. Recall something? Interlingua is back. Ta-da.

The question is, how do we find those features? It’s obvious when we’re talking about the dog, but how do we deal with text? Thirty years ago scientists already tried to create a universal language code, and it ended in total failure.

Nevertheless, we have deep learning now. And that’s its essential task! The primary distinction between deep learning and classic neural networks lies precisely in the ability to search for those specific features, without any idea of their nature. If the neural network is big enough, and there are a couple of thousand video cards at hand, it’s possible to find those features in text as well.

Theoretically, we can pass the features gotten from the neural networks to the linguists, so that they can open brave new horizons for themselves.

The question is, what type of neural network should be used for encoding and decoding? Convolutional Neural Networks (CNN) fit perfectly for pictures since they operate with independent blocks of pixels.

But there are no independent blocks in text — every word depends on its surroundings. Text, speech, and music are always sequential. So recurrent neural networks (RNN) would be the best choice to handle them, since they remember the previous result — the prior word, in our case.

Now RNNs are used everywhere — Siri’s speech recognition (it parses a sequence of sounds, where the next depends on the previous), keyboard suggestions (memorize the prior, guess the next), music generation, and even chatbots.
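
To make the encoder-decoder picture above a bit more tangible, here is a minimal sketch of an RNN encoder and decoder, assuming PyTorch. The vocabulary sizes, dimensions and random “sentences” are arbitrary placeholders, and real NMT systems add attention, beam search and much more.

```python
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1200, 64, 128  # arbitrary toy sizes

class Encoder(nn.Module):
    """Reads the source sentence and compresses it into a feature vector."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)

    def forward(self, src_ids):                  # (batch, src_len)
        _, hidden = self.rnn(self.embed(src_ids))
        return hidden                            # the "set of features"

class Decoder(nn.Module):
    """Knows only the target language; unfolds the features back into words."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, tgt_ids, hidden):          # target prefix + encoder state
        output, _ = self.rnn(self.embed(tgt_ids), hidden)
        return self.out(output)                  # scores over target-language words

encoder, decoder = Encoder(), Decoder()
src = torch.randint(0, SRC_VOCAB, (1, 7))        # a fake 7-word source sentence
tgt = torch.randint(0, TGT_VOCAB, (1, 9))        # a fake 9-word target prefix
logits = decoder(tgt, encoder(src))
print(logits.shape)                              # torch.Size([1, 9, 1200])
```

The encoder’s final hidden state plays the role of the “set of features” described above; the decoder never sees the source words, only that state.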

In two years, neural networks surpassed everything that had appeared in the past 20 years of translation. Neural translation contains 50% fewer word order mistakes, 17% fewer lexical mistakes, and 19% fewer grammar mistakes. The neural networks even learned to harmonize gender and case in different languages. And no one taught them to do so.

The most noticeable improvements occurred in fields where direct translation was never used. Statistical machine translation methods always worked using English as the key source. Thus, if you translated from Russian to German, the machine first translated the text to English and then from English to German, which led to a double loss.

Neural translation doesn’t need that — only a decoder is required for it to work. That was the first time that direct translation between languages with no common dictionary became possible.

The conclusion and the future

Everyone’s still excited about the idea of a “Babel fish” — instant speech translation. Google has made steps towards it with its Pixel Buds, but in fact, it’s still not what we were dreaming of. Instant speech translation is different from ordinary translation. You need to know when to start translating and when to shut up and listen. I haven’t seen suitable approaches to solve this yet. Unless, maybe, Skype…

And here’s one more empty area: all the learning is limited to the set of parallel text blocks. The deepest neural networks still learn from parallel texts. We can’t teach a neural network without providing it with a source. People, instead, can complement their lexicon by reading books or articles, even without translating them into their native language.

If people can do it, a neural network can do it too, in theory. I found only one prototype attempting to get a network that knows one language to read texts in another language in order to gain experience. I’d try it myself, but I’m silly. Ok, that’s it.

Reference: https://bit.ly/2HCmT6v

The Stunning Variety of Job Titles in the Language Industry


Slator published an amazing report about the job titles used in the language industry on LinkedIn. They have identified over 600 unique titles… and counting! An impressive total for what is often referred to as a niche industry. Here they ask: what does it all mean?

Project Management

While Transcreation and Localization indicate that a Project Manager is operating within the language industry (rather than in Software or Construction, for example), the Associate, Senior and Principal prefixes are indicative of the job level. Hyphens also seem to be en vogue on LinkedIn, and are used mainly to denote the specific customer segment, as in the case of “Project Manager – Life Sciences”. We also see Language Manager or Translation Manager, although these seem to be more in use when a Project Manager is responsible for an inhouse linguistic team.

Coordinator and Manager appear to be used somewhat interchangeably across the industry, but where one company uses both titles, Manager is usually more senior. So how do you tell where a Project Coordinator ends and a Project Manager begins, especially if the lines are blurred further with the Associate, Principal or Senior modifiers?

Some companies reserve the Project Manager title for those who are customer facing, while Coordinators might remain more internally focused (e.g. performing administrative and linguist-related tasks but not interfacing with the customers). To make this same distinction, some LSPs are increasingly using Customer Success Manager, a title that presumably has its origin among Silicon Valley startups.

The Program Manager title is also emerging as a mid to senior job title in Project Management on technology and other large accounts, with an element of people or portfolio management involved as well. In other companies, Account Manager can also be used to describe a similar role within Project Management, specifically customer focused, and often also involving a degree of people or performance management.

Confusingly, Account Managers in many LSPs are part of the Sales function, with revenue / retention targets attached. Likewise, the Customer Success Manager job title is broad and ambiguous since it can also apply to both Sales and Project Management staff.

Sales and Business Development

Across the Sales function, we find a similar array of job titles: from Business Development Manager and Senior Localization Strategy Consultant to Strategic Account Executive and Vice President of Sales. Preferences range from specific to vague on a spectrum of transparency, with the slightly softer BD title being more favored among the frontline Sales staff in LSPs. We also note the C-Suite title Chief Revenue Officer entering the arena as someone responsible for the revenue generating activities of Marketing and Sales teams, and offer a special mention to the Bid Managers and Pre-Sales teams.

Solutions

At the center of the Sales, Operations and Technology Venn diagram are the Solutions teams, striving to solve the most complex of customer and prospective client “puzzles”. From the generic Solutions Architect, Director of Client Solutions, Solutions Consulting and Director of Technology Solutions, to the more specific Cloud Solutions Architect or Solutions Manager for Machine Intelligence, these individuals help make the promises of Sales a reality for the customer by enabling the Operations teams to deliver the right product in the right way.

Vendor Management

It’s a similar state of affairs across the Vendor Management function. Here we find Global Procurement Directors, Supplier Relations Managers, Area Sourcing Managers, Supply Chain Managers and Talent Program Managers, all dedicated to managing the pool of linguists and other linguistic subcontractors within an LSP.

Linguists

Arguably the lifeblood of the language industry, but not every LSP has them. Companies that do have a team of linguists inhouse hire for roles such as Medical and Legal Interpreter, Senior Editor, Technical Translator, Inhouse Translator/Reviser and French Translator-Subtitler, with some multi-tasking as Translator / IT Manager and Account Manager / Translator.

Tech etc.

The Technology function(s) in LSPs can be a bit of a catch-all for employees working on IT, software development and functional QA activities, with many coming from outside the industry originally. The extent to which an LSP develops its own solutions inhouse will determine the technicality of the job titles assigned to Technology staff, and some language industry old-timers may be hard-pressed to tell their Junior Full Stack Software Developer from their Senior UX Designer and their Product Managers from their Project Managers. Other Tech-type job roles include QA Automation Engineer, Associate Customer Support Engineer, Chief Information Officer, and Sound Engineer.

Back-Office

Perhaps the most standardized and least localization-specific area of the language industry, the back-office and shared-services functions house the likes of marketing, payroll, HR, finance, and accounting professionals. Behind the scenes here can be found HR Specialists, HR Generalists (and everything in between), your friendly Director of Talent Acquisition as well as Financial Accounting Managers, Group Financial Controllers, and not forgetting General Counsel.

Why The Variety?

There are many elements at play in explaining the mindblowing variety of job titles found in the language industry. Some of the key factors include:

  • Geography – While variants of the VP title are seen more in the US, Asia tends to favour Area or Country Managers. By contrast, Directors and Heads of are most likely to be found in Europe.
  • Customer Base – Some companies tap into the idea of using job titles strategically to mirror the language used by their clients, hence Customer Success Manager in a Tech-focused LSP, or Principal Project Manager in one servicing a Financial customer base.
  • Organizational Design – Flatter organizations typically differentiate less between job levels while others design progressively more senior titles as a people management / motivational tool. Internally, an employee may achieve levels of progression (junior, senior or level 1, 2, 3 etc.), without the external facing job title having changed. This contributes to giving companies a….
  • Competitive Edge – Helpfully, job titles that are ambiguous are less understandable to those outside the business, which can make it harder for competitors to poach the best employees.
  • Creative License – Since LinkedIn profiles are normally owned by individuals, employees have a certain leeway to embellish their actual job titles.

Alongside the obvious and the mundane, the vague and the ambiguous, there are also some intriguing job titles: we spotted Traffic Coordinator, People Ops and Quality Rater, to name just a few.

Reference: https://bit.ly/2JbQpl6

2018 European Language Industry Survey Results


GALA published the 2018 survey results for the European language industry. According to the preamble, it is one of the most successful surveys of its kind.

With 1285 responses from 55 countries, including many outside Europe, this 2018 edition of the European Language Industry survey is the most successful one since its start in 2013.

This report analyses European trends rather than those in individual countries. Significant differences between countries will be highlighted if the number of answers from those countries is sufficiently high to draw meaningful conclusions.

Objectives of This Survey

The objectives of the survey have not changed compared to previous editions. It was not set up to gather exact quantitative data but to establish the mood of the industry. As such it does not replace other local, regional or global surveys of the language industry but adds the important dimensions of perception and trust which largely determine the actions of industry stakeholders.

The questions concerning the market as well as the open questions regarding trends and concerns are identical to those in the previous editions in order to detect changes in prevailing opinions.

The survey report covers many aspects of the language industry. We chose the following aspects to highlight:

Certification Requirements 

Companies report an increase in certification requirements in 2017 and consequently adjust their expectations for 2018 upward. Although most responding companies expect the requirements to stay at the current level, 25% of them expect an increase. Nobody is expecting a decrease.


Security Requirements

According to the respondents, the real increase in security requirements exceeded even the 2017 expectations, which led them to further increase their expectations for 2018.

Operational Practices

Outsourcing remains a popular practice among language service companies, with 40% indicating that they want to increase this practice. Only 2% report a decrease. MT post-editing is even more popular this year: 37% report an increase and an additional 17% indicate that they are starting this practice.

Crowdsourcing and offshoring, both often debated in language industry forums, remain slow starters. This year 5% of the companies report starting with crowdsourcing and 4% report increasing their use of this practice. Offshoring already has a slightly higher penetration: 11% of the companies report increasing this practice, compared to 5% in 2017, and an additional 3% want to start with the practice.

Note: the graph in the original report does not represent actual usage of the practices, but the level of their expected development, determined as follows: [start * 2] + [increase] – [stop * 2] – [decrease].
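
As a quick worked example of that formula (a sketch only; the stop and decrease percentages for MT post-editing are not given in this excerpt, so they are assumed to be zero here):

```python
def expected_development(start, increase, stop, decrease):
    # Survey note: [start * 2] + [increase] - [stop * 2] - [decrease]
    return 2 * start + increase - 2 * stop - decrease

# MT post-editing figures quoted above: 17% starting, 37% increasing.
print(expected_development(start=17, increase=37, stop=0, decrease=0))  # 71
```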

Technology

Machine Translation

We will remember 2018 as the year in which more than 50% of both the companies and the individual language professionals reported that they are using MT in one form or another.

The technology cannot yet be considered mainstream, because only 22% of the LSCs and 19% of the individuals state that they are using it daily, but the number of companies and individuals that are not using it at all has dropped to 31% and 38% respectively.

This does not mean that MT users are enthusiastically embracing the technology, as the answers in the section about negative trends testify, but it is a strong indication that the market has accepted that machine translation is here to stay.

The survey results also show that using MT does not necessarily mean investing in MT. The most popular engine is still the free Google Translate. 52% of all respondents report that they are using the site, but we see a clear difference between the various categories of respondents. While more than 70% of the respondents in training institutes report that they are using the site, only 49% of the translation companies and 52% of the individual translators state the same.

CAT and Terminology Tools

This year’s results confirm the 2017 statement that the use of CAT tools is clearly more widespread in language service companies than in the individual professionals’ community. Less than 1% of the companies report that they are not using CAT tools, compared to 13% of the individual language professionals.

This year the survey tried to ascertain the level of competition on the CAT market. The survey results indicate that this CAT landscape is becoming more complex, but they also show that the SDL/TRADOS product suite still has a leading position in terms of installed base, with 67% of the respondents using one or more versions of the product (ranging from 56% of the training institutes to 79% of the translation companies).

MemoQ can currently be considered as the most serious contender, with approx. 40% penetration. The top 5 is completed with Memsource, Wordfast and Across, which all remain below the 30% installed base mark.

Not surprisingly, Multiterm (the terminology tool linked with the SDL/Trados suite) is the most popular terminology tool around – except for the basic Office-type tools that are used 50% more often than Multiterm, which itself is used 6 times more often than the next in line.

Translation Management Systems

The level of penetration of translation management systems in language service companies has not significantly changed compared to 2017, with 76% of the responding companies using some type of management system.

The most popular 3rd party system in this category is Plunet, followed by XTRF. SDLTMS on the other hand seems to be more often selected by training institutes and translation departments.

Recruitment and Training

Skill Level of New Master-Level Graduates

The results below refer to training institutes, translation companies and translation departments (359 respondents).

A majority of these respondents rate all skills of new graduates as either sufficiently developed or very well developed. Translation tool skills score lowest, despite the stronger cooperation between universities and translation professionals, and the efforts made by translation tool providers.

10 to 15% used the “not applicable” answer, which indicates that the person who completed the survey is not involved in recruitment and therefore was not comfortable giving an opinion.

Investment in Training or Professional Development

Which Type of Training Have You Organized or Attended in 2017?

The following chart presents the popularity of the various types of training across all respondent types.

Not surprisingly, the respondents representing training institutes, translation companies and translation departments report a higher than average number of trainings organised or followed. Given the importance of lifelong learning, the 15% of respondents that did not organise or follow any training in 2017 can – and should – be considered a wakeup call for the industry at large.

Return on Investment

Training institutions, translation companies and translation departments report a considerably higher impact of training on their performance than the individual professionals, which make up most of the respondents.

Trends for The Industry 

In this edition of the survey, the open question about trends that will dominate the industry has been split to allow the respondents to distinguish between positive and negative trends.

The fact that both language service companies and individual professionals see price pressure as a prevailing negative trend but at the same time expect a status quo on pricing indicates that they are fairly confident that they will be able to withstand the pressure.

Across the board, the increase of translation demand is the most often cited positive trend for 2018, with 16% of the respondents. Advances in technology in general (including CAT), machine translation, increased professionalism and a higher awareness by the market of the importance of language services complete the top 5. Interesting to note is that quite a few respondents, in particular individual professionals, expect that the lack of quality of machine translation can lead to an increased appreciation for the quality of human translation.

That same machine translation clearly remains number 2 among the negative trends, almost always correlated with the price pressure factor. The traditional fear that machine translation opens the door to lower quality and more competition from lower qualified translators and translation companies remains strong.

The report also includes some insights. We chose the following insights to highlight:

1- Most European language service companies (LSCs) can be considered to be small.

2- The number of individual language professionals that work exclusively as subcontractors decreases with growing revenue.

3- Legal services remain undisputedly the most widely served type of customer for both respondent types: companies and individuals.

4- Machine translation engines that require financial or time investment have difficulty attracting more than minority interest.

5- Except for “client terms and conditions” and “insufficient demand”, language service companies score all challenges higher than individual professionals.

Conclusion

This 2018 edition of the European Language Industry survey reinforces the positive image that could already be seen in the 2017 results. Virtually all parameters point to higher confidence in the market, from expected sales levels, recruitment plans and investment intentions to the expectation that 2018 prices will be stable.

2018 is clearly the year of machine translation. This is the first year that more than half of the respondents declare that they are using the technology in one way or another. On the other hand, it is too soon to conclude that MT is now part of the translation reality, with only some 20% of the language service companies and independent language professionals reporting daily usage. Neural MT has clearly not yet brought the big change that the market is expecting.

Changes to the technology questions are giving us a better view of the actual use of CAT, MT and other technologies by the various categories of respondents. New questions about internships have brought us additional insights into the way the market looks upon this important tool to bridge the gap between the universities and the professional world.

Reference: http://bit.ly/2HOJEpx

The GDPR for translators: all you need to know (and do!)


1. What is the General Data Protection Regulation?

The General Data Protection Regulation, in short GDPR, is a European regulatory framework that is designed to harmonize data privacy laws across Europe. Preparation of the GDPR took four years and the regulation was finally approved by the EU Parliament on 14 April 2016. Afterwards there was a striking silence all over Europe, but with the enforcement date set for 25 May 2018, companies have worked increasingly hard in the past months to make sure that they uphold the requirements of the regulation.

The GDPR replaces the Data Protection Directive 95/46/EC. It was designed to protect and empower the data privacy of all European citizens and to reshape the way organizations approach data privacy. While the term GDPR is used all over the world, many companies have their own designation. For instance, in the Netherlands the term is translated as ‘Algemene Verordening Gegevensbescherming’ (AVG).
More information about the GDPR can be found on the special portal created by the European Union.

2. To whom does the GDPR apply?

The GDPR applies to the processing of personal data by controllers and processors in the EU. It does not matter whether the processing takes place in the EU or not. It is, however, even more extensive as it also applies to the processing of personal data of data subjects in the EU by a controller or processor who is not established in the EU when they offer goods or services to EU citizens (irrespective of whether payment is required). Finally, the GDPR applies to the monitoring of behaviour that takes place within the EU as well. If a business outside the EU processes the data of EU citizens, it is required to appoint a representative in the EU.
So in short, the GDPR applies to every entity that

  • processes personal data from EU citizens (whether they process these data in the EU or not),
  • monitors behaviour that takes place in the EU.

In fact, this means that companies inside and outside the EU that offer or sell goods or services to EU citizens (paid or not) should apply the principles.

3. Controllers, processors, data subjects?

Yes, it is confusing, but let’s keep it short:

  • Controllers are parties that control the data.
  • Processors are parties that process the data, such as third parties that process the data for … ehm controllers.
  • Data subjects are parties whose data are controlled and processed by … you guessed it.

A controller is the entity that determines the purposes, conditions and means of processing personal data. The processor processes the personal data on behalf of the controller.

4. Sounds like a business horror. Can I opt out?

Not in any easy way. Oh wait, you can by moving outside the EU, getting rid of your European clients and clients with translation jobs about their European clients, and focusing only on everything that is not EU related. But staying compliant is much easier for the future, although it means considerable hassle for the time being.

5. What happens if I do not take it seriously?

Of course the European Union thought about that before you did and they included a generous clause: if you breach the GDPR, you can be fined up to 4% of your annual global turnover or €20 Million (whichever is greater). This is the maximum fine that can be imposed for the most serious infringements, like insufficient customer consent to process data or violating the core of Privacy by Design concepts.
There is a tiered approach to fines. For instance a company can be fined 2% if it does not have its records in order (article 28), if it does not notify the supervising authority and data subject (remember?) about a breach or if it does not conduct an impact assessment.
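
As a back-of-the-envelope illustration of that cap (a sketch only, not legal advice; the turnover figures are invented):

```python
def max_gdpr_fine_eur(annual_global_turnover_eur):
    # Upper bound for the most serious infringements:
    # 4% of annual global turnover or EUR 20 million, whichever is greater.
    return max(0.04 * annual_global_turnover_eur, 20_000_000)

print(max_gdpr_fine_eur(5_000_000))      # small LSP: still capped at 20,000,000.0
print(max_gdpr_fine_eur(1_000_000_000))  # large enterprise: 40,000,000.0
```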

6. So adhering to the GDPR is a no-brainer?

Yes indeed. Although you certainly should use your brains. Until now it was easy to impress all parties involved by using long and unreadable contracts, but the GDPR finally puts an end to that. Companies will no longer be able to use long unintelligible terms and conditions full of legalese. They need to ask consent for processing data and the request for consent must be given in an understandable and accessible form. Consent must be clear and distinguishable from other matters and provided in an intelligible and easily accessible form, using clear and plain language. Apart from that, all data subjects (just to check) should be able to withdraw their consent as easily as they gave it.

7. So I need to involve all people for whom I process data?

Yes. You need to ask their consent, but you need to give them access to the data you hold about them as well. EU citizens from whom you collect or process data have a few rights:

  • Right to access
    People can ask you to confirm whether or not personal data concerning them is being processed. They can also ask where these data are processed and for what purpose. If someone makes use of their right to access, you need to provide a copy of the personal data in an electronic format. And yes, that should happen free of charge.
  • Right to be Forgotten
    The right to be forgotten entitles the people you collect data from to require you to erase their personal data, cease further dissemination of the data, and potentially have third parties halt processing of the data. There are a few conditions however: article 17 states that the data should no longer be relevant to the original purposes for processing, or a data subject should have withdrawn his or her consent.
  • Data Portability
    The GDPR introduces the concept of data portability. This grants persons a right to receive the personal data they have previously provided about themselves in a ‘commonly us[able] and machine readable format’. EU citizens can then transmit that data to another controller.

8. What are these personal data you are talking about?

The GDPR pivots around the concept of ‘personal data’. This is any information related to a natural person that can be used to directly or indirectly identify the person. You might think about a person’s name, photo, email address, bank details, posts on social networking websites, medical information, or a computer IP address.

9. How does this affect my translation business?

As a freelance translator or translation agency you are basically a processor. (And if you are an EU citizen you are a data subject as well, but let’s keep that out of the scope of this discussion.)
The actual impact of the GDPR on your translation business differs greatly. If you are a technical translator or literary translator, chances are that you do not process the personal data of the so-called ‘data subjects’. In that case compliance should not be a heavy burden, although you should, of course, make sure that everything is in order.
However, if you are a medical translator for instance, translating personal health records, or if you are a sworn translator, translating certificates and other personal stuff, you have somewhat more work to do.

10. Great, you made it perfectly clear. How to proceed?

The best approach to ensure compliance with the GDPR is to follow a checklist. You might choose this 5-step guide, for instance. However, if that sounds too easy you might use this 10-page document with complex language to show off your GDPR skills. You will find a short summary below:

1. Get insight into your data
Understand which kind of personal data you own and look at where the data comes from, how you collected it and how you plan to use it.

2. Ask explicit consent to collect data
People need to give free, specific, informed and unambiguous consent. If someone does not respond, does not opt in themselves or is inactive, you should not consider them as having given consent. This also means you should re-consider the ways you ask for consent: chances are that your current methods to get the necessary consent are not GDPR compliant.

3. Communicate how and why you collect data
Tell your clients how you collect data, why you do that and how long you plan to retain the data. Do not forget to include which personal data you collect, how you do that, for which purpose you process them, which rights the person in question has, in what way they can complain and what process you use to send their data to third parties.
NOTE: This needs thorough consideration if you make use of the cloud (i.e. Dropbox or Google Drive) to share translations with clients or if you use cloud-based CAT tools for translation.

4. Show that you are GDPR compliant
The GDPR requires you to show that you are compliant. So identify the legal basis for data processing, document your procedures and update your privacy policy.
NOTE: If you are outsourcing translation jobs to other translators, you should sign a data processing agreement (DPA) with them.

5. Make sure you have a system to remove personal data
Imagine what happens when someone makes use of their right to access or to be forgotten. If you do not have their data readily available, you will waste time finding it and risk still not being compliant. So make sure you have an efficient system to fulfil the rights of all those people whose data you are processing.

So, the GDPR is no joke

It is definitely not funny for any of us, but we need to comply. To be compliant or not to be compliant: that is the question. The easiest way to do that is the required Privacy Impact Assessment, so you know which data you collect or process and what the weak links and bottlenecks are. Following an easy guide will then help to establish the necessary controls. Opting out is not an option, but making sure your data subjects (still know what they are?) are opting in is.

Reference: https://bit.ly/2L3GVZL

How to become a localization project manager


Excerpts from an article with the same title, written by Olga Melnikova in Multilingual Magazine.  Olga Melnikova is a project manager at Moravia and an adjunct professor at the Middlebury Institute of International Studies. She has ten years of experience in the language industry. She holds an MA in translation and localization management and two degrees in language studies.

I decided to talk to people who have been in the industry for a while, who have seen it evolve and know where it’s going. My main question was: what should a person do to start a localization project manager career? I interviewed several experts who shared their vision and perspectives — academics, industry professionals and recruiters. I spoke with Mimi Moore, account manager at Anzu Global, a recruiting company for the localization industry; Tucker Johnson, managing director of Nimdzi Insights; Max Troyer, translation and localization management program coordinator at MIIS, and Jon Ritzdorf, senior solution architect at Moravia and an adjunct professor at the University of Maryland and at MIIS. All of them are industry veterans and have extensive knowledge and understanding of its processes.

Why localization project management?

The first question is: Why localization project management? Why is this considered a move upwards compared to the work of linguists who are the industry lifeblood? According to Renato Beninatto and Tucker Johnson’s The General Theory of the Translation Company, “project management is the most crucial function of the LSP. Project management has the potential to most powerfully impact an LSP’s ability to add value to the language services value chain.” “Project managers are absolutely the core function in a localization company,” said Johnson. “It is important to keep in mind that language services providers do not sell translation, they sell services. Project managers are responsible for coordinating and managing all of the resources that need to be coordinated in order to deliver to the client: they are managing time, money, people and technology.”


Nine times out of ten, Johnson added, the project manager is the face of the company to the client. “Face-to-face contact and building the relationship are extremely important.” This is why The General Theory of the Translation Company regards project management as one of the core functions of any language service provider (LSP). This in no way undermines the value of all the other industry players, especially linguists who do the actual translation work. However, the industry cannot do without PMs because “total value is much higher than the original translations. This added value is at the heart of the language services industry.” This is why clients are happy to pay higher prices to work with massive multiple services providers instead of working directly with translators.

Who are they?

The next question is, how have current project managers become project managers? “From the beginning, when the industry started 20 years ago, there were no specialized training programs for project managers,” Troyer recounted. “So there were two ways. One is you were a translator, but wanted to do something else — become an editor, for example, or start to manage translators. The other route was people working in a business that goes global. So there were two types of people who would become project managers — former translators or people who were assigned localization as a job task.”

According to Ritzdorf, this is still the case in many companies. “I am working with project managers from three prospective clients right now, all of whom do not have a localization degree and are all in localization positions. Did they end up there because they wanted to? Maybe not. They did not end up there because they said ‘Wow, I really want to become a head of localization.’ They just ended up there by accident, like a lot of people do.”

“There are a lot of people who work in a company and who have never heard of localization, but guess what? It is their job now to do localization, and they have to figure it out all by themselves,” Moore confirmed. “When the company decides to go international, they have to find somebody to manage that,” said Ritzdorf.

Regionalization


The first to mention regionalization was Ritzdorf, and then other interviewees confirmed it exists. Ritzdorf lives on the East Coast of the United States, but comes to the West Coast to teach at MIIS, so he sees the differences. “There are areas where localization is a thing, which means when you walk into a company, they actually know about localization. Since there are enough people who understand what localization is, they want someone with a background in it.” Silicon Valley is a great example, said Ritzdorf. MIIS is close; there is a localization community that includes organizations like Women in Localization; and there are networking events like IMUG. “People live and breathe localization. However, there is a totally different culture in other regions, which is very fragmented. There are tons of little companies in other parts of the US, and the situation there is different. If I am a small LSP owner in Wisconsin or Ohio, what are my chances of finding someone with a degree or experience to fill a localization position for a project manager? Extremely low. This is why I may hire a candidate who has an undergraduate degree in French literature, for example. Or in linguistics, languages — at least something.”

The recruiters’ perspective


Nimdzi Insights conducted an interesting study about hiring criteria for localization project manager positions (Figure 1). Some 75 respondents (both LSPs and clients) were asked how important, on a scale of 1 to 5, a variety of qualifications are for project management positions. The responses show a few trends. Top priorities for clients are previous localization experience and a college degree, followed by years of experience and proficiency in more than one language. Top criteria for LSPs are reputation and a college degree, also followed by experience and proficiency in more than one language.

Moore said that when clients want to hire a localization project manager, the skills they are looking for are familiarity with computer assisted translation (CAT) tools “and an understanding of issues that can arise during localization — like quality issues, for example. Compared to previous years, more technical skills are required by both clients and vendors: CAT tools, WorldServer, machine translation knowledge, sometimes WordPress or basic engineering. When I started, they were nice-to-haves, but certainly not mandatory.”

Technical skill is not enough, however. “Both hard and soft skills are important. You need hard skills because the industry has become a lot more technical as far as software, tools and automation are concerned. You need soft skills to deal with external and internal stakeholders, and one of the main things is working under pressure because you are juggling so many things.”

Moore also mentioned some red flags that would cause Anzu not to hire a candidate. “Sometimes an applicant does not demonstrate good English skills in phone interviews. Having good communication skills is important for a client-facing position. Also, people sometimes exaggerate their skills or experience. Another red flag is if the person has a bad track record (if they change jobs every nine months, for example).”

Anzu often hires for project management contract positions in large companies. “Clients usually come to us when they need a steady stream of contractors (three or six months), then in three or six months there will be other contractors. The positions are usually project managers or testers. If you already work fulltime, a contract position may not be that attractive. However, if you are a newcomer or have just graduated, and you want to get some experience, then it is a great opportunity. You would spend three, six or 12 months at a company, and it is a very good line on the résumé.”

Do you need a localization degree? 

There is no firm answer to the question of whether or not you need a degree. If you don’t know what you should do, it can certainly help. Troyer discussed how the localization program at MIIS has evolved to fit current real-world pressures. “The program was first started in 2004, and it started small. At first we offered CAT tools, localization project management and software localization courses. This is the core you need to become a project manager. Then the program evolved and we introduced introductory and then advanced levels for many courses. There are currently four or five courses focusing on translation technology.” Recent additions to the curriculum include advanced JavaScript classes, advanced project management and program management. Natural language processing and computational linguistics will be added down the road. “The industry is driving this move because students will need skills to go in and localize Siri into many languages,” said Troyer.

The program at MIIS is a two-year master’s. It can be reduced to one year for those who already have experience. There are other degrees
available, as well as certification programs offered by institutions such as the University of Washington and The Localization Institute.

Moore said that though a localization degree is not a must, it has a distinct advantage. A lot of students have internships that give them experience. They also know tools, which makes their résumés better fit clients’ job descriptions.

However, both Troyer and Ritzdorf said you don’t necessarily need a degree. “If you have passion for languages and technology, you can get the training on your own,” said Troyer. “Just teach yourself these skills, network on your own and try to break into the industry.”

The future of localization project management

Automation, artificial intelligence and machine learning are affecting all industries, and localization is not an exception. However, all the interviewees forecast that there will be more localization jobs in the future.

According to Johnson, there is high project management turnover on the vendor side because if a person is a good manager, they never stay in this position for more than five years. “After that, they either get a job on the client’s side to make twice as much money and have a much easier job, or their LSP has to promote them to senior positions such as group manager or program director.”

“There is a huge opportunity to stop doing things that are annoying,” said Troyer. “Automation will let professionals work on the human side of things and let the machines run day-to-day tasks. Letting the machine send files back and forth will allow humans to spend more time looking at texts and thinking about what questions a translator can ask. This will give them more time for building a personal relationship with the client. We are taking these innovations into consideration for the curriculum, and I often spend time during classes asking, ‘How can you automate this?’”

Moore stated that “we have seen automation change workflows over the last ten years and reduce the project manager’s workload, with files being automatically moved through each step in the localization process. Also, automation and machine translation go hand-in-hand to make the process faster, more efficient and cost-effective.”

NEURAL MACHINE TRANSLATION: THE RISING STAR

These days, language industry professionals simply can’t escape hearing about neural machine translation (NMT). However, there still isn’t enough information about the practical facts of NMT for translation buyers, language service providers, and translators. People often ask: is NMT intended for me? How will it change my life?

A Short History and Comparison

At the beginning of time – around the 1970s – the story began with rule-based machine translation (RBMT) solutions. The idea was to create grammatical rule sets for source and target languages, where machine translation is a kind of conversion process between the languages based on these rule sets. This concept works well with generic content, but adding new content, new language pairs, and maintaining the rule set is very time-consuming and expensive.

This problem was solved with statistical machine translation (SMT) around the late ‘80s and early ‘90s. SMT systems create statistical models by analyzing aligned source-target language data (training set) and use them to generate the translation. The advantage of SMT is the automatic learning process and the relatively easy adaptation by simply changing or extending the training set. The limitation of SMT is the training set itself: to create a usable engine, a large database of source-target segments is required. Additionally, SMT is not language independent in the sense that it is highly sensitive to the language combination and has a very hard time dealing with grammatically rich languages.

This is where neural machine translation (NMT) begins to shine: it can look at the sentence as a whole and can create associations between the phrases over an even longer distance within the sentence. The result is a convincing fluency and an improved grammatical correctness compared to SMT.

Statistical MT vs Neural MT

Both SMT and NMT work on a statistical basis and use source-target language segment pairs as their raw material. So what is the difference? What we typically call SMT is actually Phrase-Based Statistical Machine Translation (PBSMT), meaning SMT splits the source segments into phrases. During the training process, SMT creates a translation model and a language model. The translation model stores the different translations of the phrases, while the language model stores the probability of each sequence of phrases on the target side. During the translation phase, the decoder chooses the translation that gives the best result based on these two models. At the phrase or expression level, SMT (or PBSMT) performs well, but language fluency and grammar are not good.

‘Buch’ is aligned with ‘book’ twice and only once with ‘the’ and ‘a’ – the winner is the ‘Buch’-’book’ combination
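To make the two models a little more concrete, here is a minimal, purely illustrative sketch in Python. The phrase table and the bigram language model probabilities below are invented toy values, not the output of any real training process; the point is only how a decoder combines the two scores to pick a translation.

import math
from itertools import product

# Translation model: source phrase -> possible target phrases with probabilities.
phrase_table = {
    "das Buch": {"the book": 0.7, "a book": 0.3},
    "ist gut": {"is good": 0.8, "is well": 0.2},
}

# Language model: probability of a target phrase following the previous one.
bigram_lm = {
    ("<s>", "the book"): 0.6, ("<s>", "a book"): 0.4,
    ("the book", "is good"): 0.7, ("the book", "is well"): 0.1,
    ("a book", "is good"): 0.5, ("a book", "is well"): 0.2,
}

def score(source_phrases, candidate):
    # Combine translation-model and language-model log-probabilities.
    logp, prev = 0.0, "<s>"
    for src, tgt in zip(source_phrases, candidate):
        logp += math.log(phrase_table[src][tgt])            # translation model
        logp += math.log(bigram_lm.get((prev, tgt), 1e-6))  # language model
        prev = tgt
    return logp

source = ["das Buch", "ist gut"]
candidates = product(*(phrase_table[p] for p in source))    # all phrase combinations
best = max(candidates, key=lambda c: score(source, c))
print(" ".join(best))                                       # -> the book is good

A real PBSMT decoder such as Moses also handles reordering, unknown words and beam search, all of which this sketch leaves out entirely.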

Neural Machine Translation, on the other hand, uses neural network-based deep learning technology. Words or even word chunks are transformed into “word vectors”. This means that ‘dog’ does not only represent the characters d, o and g; it can also contain contextual information from the training data. During the training phase, the NMT system tries to set the parameter weights of the neural network based on the reference values (source-target translations). Words appearing in similar contexts get similar word vectors. The result is a neural network which can process source segments and transfer them into target segments. During translation, NMT looks at the complete sentence, not just chunks (phrases). Thanks to the neural approach, it is not translating words; it is transferring information and context. This is why fluency is much better than in SMT, but terminology accuracy is sometimes not perfect.

Similar words are closer to each other in a vector space
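As a rough illustration of what “similar word vectors” means, the following sketch compares a few hand-made, three-dimensional toy vectors with cosine similarity. Real NMT embeddings are learned from data and have hundreds of dimensions; the numbers here are invented purely to show the idea.

import numpy as np

# Hand-made toy vectors (invented values for illustration only).
vectors = {
    "dog":  np.array([0.9, 0.1, 0.2]),
    "cat":  np.array([0.8, 0.2, 0.3]),
    "book": np.array([0.1, 0.9, 0.7]),
}

def cosine(a, b):
    # Cosine similarity: close to 1.0 means similar direction in vector space.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["dog"], vectors["cat"]))   # high - similar context
print(cosine(vectors["dog"], vectors["book"]))  # low  - different context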

The Hardware

A popular GPU: NVIDIA Tesla

One big difference between SMT and NMT systems is that NMT requires Graphics Processing Units (GPUs), which were originally designed to help computers process graphics. These GPUs can calculate astonishingly fast – the latest cards have about 3,500 cores which can process data simultaneously. In fact, there is a small ongoing hardware revolution, and GPU-based computers are the foundation for almost all deep learning and machine learning solutions. One of the great perks of this revolution is that nowadays, NMT is available not only to large enterprises but to small and medium-sized companies as well.
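For readers who want to check whether their own machine could realistically train an engine, here is a small sketch assuming a PyTorch-based toolkit (OpenNMT-py, for example, runs on PyTorch). It simply reports any CUDA-capable GPU visible to the framework.

import torch

# Report any CUDA-capable GPU visible to PyTorch. Training on CPU is possible
# but usually orders of magnitude slower.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("No GPU found - expect very long training times on CPU.")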

The Software

The main element, or ‘kernel’, of any NMT solution is the so-called NMT toolkit. There are a couple of NMT toolkits available, such as Nematus or OpenNMT, but the landscape is changing fast and more companies and universities are now developing their own toolkits. Since many of these toolkits are open-source solutions and hardware resources have become more affordable, the industry is experiencing an accelerating pace of toolkit R&D and NMT-related solutions.

On the other hand, as important as toolkits are, they are only one small part of a complex system, which contains frontend, backend, pre-processing and post-processing elements, parsers, filters, converters, and so on. These are all factors for anyone to consider before jumping into the development of an individual system. However, it is worth noting that the success of MT is highly community-driven and would not be where it is today without the open source community.

Corpora

A famous bilingual corpus: the Rosetta Stone

And here comes one of the most curious questions: what are the requirements of creating a well-performing NMT engine? Are there different rules compared to SMT systems? There are so many misunderstandings floating around on this topic that I think it’s a perfect opportunity to go into the details a little bit.

The main rules are nearly the same both for SMT and NMT systems. The differences are mainly that an NMT system is less sensitive and performs better in the same circumstances. As I have explained in an earlier blog post about SMT engine quality, the quality of an engine should always be measured in relation to the particular translation project for which you would like to use it.

These are the factors which will eventually influence the performance of an NMT engine:

Volume

Regardless of what you may have heard, volume is still very important for NMT engines, just as it is in the SMT world. There is no explicit rule on entry volumes, but what we can safely say is that the bare minimum is about 100,000 segment pairs. There are Globalese users who are successfully using engines created from 150,000 segments, but to be honest, this is more of an exception and requires special circumstances (like the right language combination, see below). The optimum volume starts around 500,000 segment pairs (2 million words).
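As a quick way to put these numbers into practice, the sketch below counts segment pairs in a parallel corpus stored as two aligned plain-text files (one segment per line) and compares the total with the thresholds mentioned above. The file names are placeholders.

def count_segment_pairs(src_path, tgt_path):
    # Count lines in two aligned plain-text files (one segment per line).
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        src_lines = sum(1 for _ in src)
        tgt_lines = sum(1 for _ in tgt)
    if src_lines != tgt_lines:
        raise ValueError(f"Corpus is not aligned: {src_lines} vs {tgt_lines} lines")
    return src_lines

pairs = count_segment_pairs("train.de", "train.en")   # placeholder file names
if pairs < 100_000:
    print(f"{pairs} pairs: below the practical minimum for NMT training.")
elif pairs < 500_000:
    print(f"{pairs} pairs: workable, but more in-domain data would help.")
else:
    print(f"{pairs} pairs: a solid starting volume.")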

Quality

The quality of the training set plays an important role (garbage in, garbage out). Don’t add unqualified content to your engine just to increase the overall size of the training set.
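One common way to keep obvious garbage out of a training set is a simple filter that drops empty, untranslated or wildly unbalanced segment pairs. The sketch below shows the idea; the length-ratio threshold is an arbitrary example, not a recommendation.

def keep_pair(src, tgt, max_ratio=3.0):
    # Drop pairs that are empty, identical (likely untranslated) or wildly
    # unbalanced in length. The ratio threshold is an arbitrary example.
    src, tgt = src.strip(), tgt.strip()
    if not src or not tgt:
        return False
    if src == tgt:
        return False
    ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
    return ratio <= max_ratio

pairs = [("Das ist ein Test.", "This is a test."),
         ("Siehe Kapitel 3.", ""),
         ("OK", "OK")]
print([p for p in pairs if keep_pair(*p)])   # only the first pair survives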

Relevance

Applying the right engine to the right project is the first key to success. An engine trained on automotive content will perform well on car manual translation but will give back disappointing results when you try to use it for web content for the food industry.

This raises the question of whether the content (TMs) should be mixed. If you have enough domain-specific content you shouldn’t necessarily add more out-of-domain data to your engine, but if you have an insufficient volume of domain-specific data then adding generic content (e.g. from public sources) may help improve the quality. We always encourage our Globalese users to try different engine combinations with different training sets.

Content type

Content generated by possibly non-native-speaking users on a chat forum, or marketing material requiring transcreation, is always a challenge for any MT system. On the other hand, technical documentation with controlled language is a very good candidate for NMT.

Language combination

Unfortunately, language combination still has an impact on quality. The good news is that NMT has now opened up the option of using machine translation for languages like Japanese, Turkish, or Hungarian –  languages which had nearly been excluded from the machine translation club because of poor results provided by SMT. NMT has also helped solve the problem of long distance dependencies for German and the translation output is much smoother for almost all languages. But English combined with Latin languages still provides better results than, for example, English combined with Russian when using similar volumes and training set quality.

Expectations for the future

Neural Machine Translation is a big step ahead in quality, but it still isn’t magic. Nobody should expect that NMT will replace human translators anytime soon. What you CAN expect is that NMT can be a powerful productivity tool in the translation process and open new service options both for translation buyers and language service providers (see post-editing experience).

Training and Translation Time

When we started developing Globalese NMT, one of the most surprising experiences for us was that the training time was far shorter than we had previously anticipated. This is due to the amazingly fast evolution of hardware and software. With Globalese, we currently have an average training time of 50,000 segments per hour – this means that an average engine with 1 million segments can be trained within one day. The situation is even better when looking at translation times: with Globalese, we currently have an average translation time between 100 and 400 segments per minute, depending on the corpus size, segment length in the translation and training content.
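A quick back-of-the-envelope calculation shows where these claims come from; the figures are simply the averages quoted above for Globalese, so actual times will vary with hardware, corpus and segment length.

training_speed = 50_000     # segments per hour (average quoted above)
corpus_size = 1_000_000     # segments in the training set
print(f"Training: ~{corpus_size / training_speed:.0f} hours")      # ~20 hours

job_size = 10_000           # segments to translate (hypothetical job)
for speed in (100, 400):    # segments per minute (range quoted above)
    print(f"Translation at {speed}/min: ~{job_size / speed:.0f} minutes")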

Neural MT Post-editing Experience

One of the great changes neural machine translation brings along is that the overall language quality is much better when compared to the SMT world. This does not mean that the translation is always perfect. As stated by one of our testers: if it is right, then it is astonishingly good quality. The ratio of good to poor translations naturally varies depending on the engine, but good engines can deliver really good target text for about 50% of segments, or even more.

Here are some examples showcasing what NMT post-editors can expect:

DE original:

Der Rechnungsführer sorgt für die gebotenen technischen Vorkehrungen zur wirksamen Anwendung des FWS und für dessen Überwachung.

Reference human translation:

The accounting officer shall ensure appropriate technical arrangements for an effective functioning of the EWS and its monitoring.

Globalese NMT:

The accounting officer shall ensure the necessary technical arrangements for the effective use of the EWS and for its monitoring.

As you can see, the output is fluent, and the differences are more or less just preferential ones. This highlights another issue: automated quality metrics like the BLEU score are not really sufficient to measure quality. The example above is only a 50% match in terms of BLEU, but if we look at the actual quality, the rating should be much higher.
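For anyone who wants to reproduce this kind of comparison, a sentence-level BLEU score can be computed with NLTK as shown below. The exact value depends on tokenization and smoothing, so it will not match a system-level score exactly; the point is only that a perfectly acceptable translation can still score modestly against a single reference.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ("The accounting officer shall ensure appropriate technical "
             "arrangements for an effective functioning of the EWS and "
             "its monitoring.").split()
hypothesis = ("The accounting officer shall ensure the necessary technical "
              "arrangements for the effective use of the EWS and for its "
              "monitoring.").split()

# Smoothing avoids zero scores when a higher-order n-gram has no match.
score = sentence_bleu([reference], hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"Sentence BLEU: {score:.2f}")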

Let’s look at another example:

EN original

The concept of production costs must be understood as being net of any aid but inclusive of a normal level of profit.

Reference human translation:

Die Produktionskosten verstehen sich ohne Beihilfe, aber einschließlich eines normalen Gewinns.

Globalese NMT:

Der Begriff der Produktionskosten bezieht sich auf die Höhe der Beihilfe, aber einschließlich eines normalen Gewinns.

What is interesting here is that the first part of the sentence sounds good, but if you look at the content, the translation is not good. This is an example of fluent output with a bad translation. This is a typical case in the NMT world, and it emphasizes that post-editors must examine NMT output differently than they did SMT output – in SMT, bad grammar was a clear indicator that the translation needed post-editing.

Post-editors who used to proof and correct SMT output have to change the way they are working and have to be more careful with proofreading, even if the NMT output looks alright at first glance. Also, services related to light post-editing will change – instead of correcting serious grammatical errors without checking the correctness of translation in order to create some readable content, the focus will shift to sorting out serious mistranslations. The funny thing is that one of the main problems in the SMT world was weak fluency and grammar, and now we have good fluency and grammar as an issue in the NMT world…

And finally:

DE original:

Aufgrund des rechtlichen Status der Beteiligten ist ein solcher Vorgang mit einer Beauftragung des liefernden Standorts und einer Berechnung der erbrachten Leistung verbunden.

Reference human translation:

The legal status of the companies involved in these activities means that this process is closely connected with placing orders at the location that is to supply the goods/services and calculating which goods/services they supply.

Globalese NMT:

Due to the legal status of the person, it may lead to this process at the site of the plant, and also a calculation of the completed technician.

This example shows that unfortunately, NMT can produce bad translations too. As I mentioned before, the ratio of good and bad NMT output you will face in a project always depends on the circumstances. Another weak point of NMT is that it currently cannot handle the terminology directly and it acts as a kind of “black box” with no option to directly influence the results.

Reference: https://bit.ly/2hBGsVh

GDPR. Understanding the Translation Journey

“We only translate content into the languages of the EU, so we are covered with regard to the GDPR clauses relating to international transfers.”

Right? Wrong.

The GDPR imposes restrictions on the transfer of personal data outside the European Union (EU), to third-party countries or international organizations. While there are provisions that refer to your ability to do this with the appropriate safeguards in place, how confident are you that you’re not jeopardising GDPR-compliance with outdated translation processes?

Let’s consider the following:

  1. 85% of companies cannot identify whether they send personal information externally as part of their translation process.
  2. The translation process is complex – it isn’t a simple case of sending content from you to your translator. Translating a single document into 10 languages involves 150 data exchanges (or ‘file handoffs’). Multiply this by dozens of documents and you have a complex task of coordinating thousands of highly sensitive documents – some of which may contain personal data.

With different file versions, translators, editors, complex graphics, subject matter experts and in-country reviewers, the truth is that content is flying back and forth around the world faster than we can imagine. Designed with speed of delivery and time to market in mind, these workflows overlook the fact that partners might not share the same compliance credentials.

Where exactly is my data?

Given that we know email is not secure – let us think about what happens when you use a translation portal or an enterprise translation management system.

Once you’ve transferred the content for translation, the translation agency or provider downloads and processes that data on its premises before allocating the work to linguists and other teams (let’s hope these are in the EU and they are GDPR compliant).

Alternatively, the software you have used to share your content may process the data to come up with your Translation Memory leverage and spend – in which case better check your End User Licence Agreement to ensure you know where that processing (and backup) takes place.

After that has happened the content is distributed to the translators to work on. Even if all the languages you translate into are in the EU – are you SURE that your translators are physically located there too?

And what about your translation agency’s project management team? How exactly do they handle files that require Desktop Publishing or file engineering? Are these teams located onshore in the EU, or offshore to control costs? If the latter, what systems are they using, and how can you ensure no copies of your files are sitting on servers outside of your control?

These are just some of the questions you should be asking now to fully understand where your translation data is located.

What can I do?

If you haven’t already – now is the time to open a conversation with your partner about your data protection needs and what they are doing as a business to ensure compliance. They should be able to tell you exactly which borders your data crosses during the translation process, where it’s stored and what they’re doing to help with Translation Memory management. They should also provide you with a controlled environment that you can use across the entire translation supply chain, so that no data ever leaves the system.

Of course, there are many considerations to take into account when it comes to GDPR. But looking at the complexity of translating large volumes of content – are you still confident that your translation processes are secure?

Reference: https://bit.ly/2vmKKX5

Europe’s New Privacy Regulation GDPR Is Changing How LSPs Handle Content

GDPR, the General Data Protection Regulation, is soon to be introduced across Europe, and is prompting language service providers (LSPs) to update policies and practices relating to their handling of all types of personal data.

The GDPR comes into effect on 25 May 2018 and supersedes the existing Data Protection Directive of 1995. It introduces some more stringent requirements on how the personal data of EU citizens are treated.

Specifically, LSPs must demonstrate that they are compliant in the way that they handle any type of personal data that at some point flows through their business. Personal data means any information by which a person can be identified, such as a name, location, photo, email address, bank details…the list goes on.

Therefore, LSPs need to ensure that all data, from employee records and supplier agreements to client contact information and content for translation, are handled appropriately.

What personal data do LSPs handle?

Aside from the actual content for translation, an LSP is likely to possess a vast array of personal data including Sales and Marketing data (prospective client details, mailing lists, etc.), existing client data (customer names, emails, POs, etc.), HR and Recruitment data (candidate and employee data including CVs, appraisals, addresses, etc.) and Supplier (freelance) data (bank details, contact details, performance data, CVs, etc.).

In this respect, the challenges that LSPs will face are not significantly different from most other service businesses, and there are lots of resources that outline the requirements and responsibilities for complying with GDPR. For example, the Europa website details some key points, and ICO (for the UK) has a self-assessment readiness toolkit for businesses.

What about content for translation?

Content that a client sends you for translation may also contain personal information. Some of these documents are easy enough to identify by their nature (such as birth, death and marriage certificates, HR records, and medical records), but personal data might also be considered to extend to cases where you receive an internal communication from a customer that includes a quote from the company CEO, for example.

Short-term challenges

It is important to be able to interpret what the GDPR means for LSPs generally, and for your business specifically. The impact of the regulation will become clearer over time, but it throws up some potentially crucial questions in the immediate term, such as:

  • What the risks are for LSPs who continue to store personal data within translation memories and machine translation engines;
  • What the implications are for sharing personal data with suppliers outside of the EU / EEA, and specifically in countries deemed to be inadequate with respect to GDPR obligations (even a mid-sized LSP would work with hundreds of freelancers outside the EU);
  • How binding corporate rules can be applied to LSPs with a global presence;
  • Whether obliging suppliers to work in an online environment could help LSPs to comply with certain GDPR obligations

Longer-term considerations

While the GDPR presents a challenge to LSPs in the short-term, it may also impact on the longer-term purchasing habits within the industry.

For example, if LSPs are penalized for sharing personal data with freelancers located within inadequate countries (of which there is a long list), LSPs could be forced to outsource translation work within the EU / EEA / adequate countries only or even insource certain language combinations entirely, potentially driving up the cost of translation spend for some languages.

Or, if a client company is penalized for sharing personal data with a subcontractor (i.e. an LSP or freelancer) without the full knowledge and consent of the person the information relates to (known as the data subject), will they be more inclined to employ alternative buying models for their language needs: e.g. to source freelancers directly or via digital marketplaces, or implement in-house translation models of their own?

Get informed

Although most LSPs are well-acquainted with data privacy, there are a lot of unknowns around the impact of GDPR, and LSPs would be wise to tread especially carefully when it comes to handling personal data, in particular post-25 May.

Perhaps the noise around GDPR turns out to be hot air, but with companies in breach of the regulation facing possible penalties that the GDPR recommends should be “effective, proportionate and dissuasive”, it is essential to get informed, and quickly.

Reference: https://bit.ly/2Jwh9g6