Month: May 2018

GDPR. Understanding the Translation Journey

“We only translate content into the languages of the EU, so we are covered with regard to the GDPR clauses relating to international transfers.”

Right? Wrong.

The GDPR imposes restrictions on the transfer of personal data outside the European Union (EU) to third-party countries or international organizations. While there are provisions that allow you to do this with the appropriate safeguards in place, how confident are you that you’re not jeopardising GDPR compliance with outdated translation processes?

Let’s consider the following:

  1. 85% of companies cannot identify whether they send personal information externally as part of their translation process.
  2. The translation process is complex – it isn’t a simple case of sending content from you to your translator. Translating one document alone into 10 languages involves 150 data exchanges (or ‘file handoffs’). Multiply this by dozens of documents and you have the complex task of coordinating thousands of highly sensitive documents – some of which may contain personal data.
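To see how quickly the handoffs pile up, the arithmetic can be sketched in a few lines. The figure of 150 exchanges for one document in 10 languages comes from the article; the 15-handoffs-per-language breakdown below is an assumption used purely for illustration.

```python
# Rough illustration of how translation file handoffs multiply.
# HANDOFFS_PER_LANGUAGE is an assumed breakdown (client -> agency ->
# translator -> editor -> DTP -> reviewer and back, over several
# rounds); the article's own figure is ~150 exchanges for one
# document into 10 languages.

HANDOFFS_PER_LANGUAGE = 15  # assumption for illustration

def total_handoffs(documents: int, languages: int) -> int:
    """Estimate total file exchanges for a translation project."""
    return documents * languages * HANDOFFS_PER_LANGUAGE

print(total_handoffs(1, 10))   # one document, 10 languages -> 150
print(total_handoffs(50, 10))  # 50 documents -> 7,500 exchanges
```

Each of those exchanges is a point where personal data may cross a border or land on a server outside your control.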

With different file versions, translators, editors, complex graphics, subject matter experts and in-country reviewers, the truth is that content is flying back and forth around the world faster than we can imagine. Designed with speed of delivery and time to market in mind, these workflows overlook the fact that partners might not share the same compliance credentials.

Where exactly is my data?

Given that we know email is not secure, let’s think about what happens when you use a translation portal or an enterprise translation management system.

Once you’ve transferred the content for translation, the translation agency or provider downloads and processes that data on its premises before allocating the work to linguists and other teams (let’s hope these are in the EU and they are GDPR compliant).

Alternatively, the software you have used to share your content may process the data to calculate your Translation Memory leverage and spend – in which case you had better check your End User Licence Agreement to ensure you know where that processing (and backup) takes place.

After that has happened, the content is distributed to the translators to work on. Even if all the languages you translate into are official EU languages – are you SURE that your translators are physically located in the EU too?

And what about your translation agency’s project management team? How exactly do they handle files that require Desktop Publishing or file engineering? Are these teams located onshore in the EU or offshore to control costs? If the latter, what systems are they using, and how can you ensure no copies of your files are sitting on servers outside of your control?

These are just some of the questions you should be asking now to fully understand where your translation data is located.

What can I do?

If you haven’t already – now is the time to open a conversation with your partner about your data protection needs and what they are doing as a business to ensure compliance. They should be able to tell you exactly which borders your data crosses during the translation process, where it’s stored and what they’re doing to help with Translation Memory management. They should also provide you with a controlled environment that you can use across the entire translation supply chain, so that no data ever leaves the system.

Of course, there are many considerations to take into account when it comes to GDPR. But looking at the complexity of translating large volumes of content – are you still confident that your translation processes are secure?

Reference: https://bit.ly/2vmKKX5

Europe’s New Privacy Regulation GDPR Is Changing How LSPs Handle Content

GDPR, the General Data Protection Regulation, is soon to be introduced across Europe, and is prompting language service providers (LSPs) to update policies and practices relating to their handling of all types of personal data.

The GDPR comes into effect on 25 May 2018 and supersedes the existing Data Protection Directive of 1995. It introduces more stringent requirements on how the personal data of EU citizens are treated.

Specifically, LSPs must demonstrate that they are compliant in the way that they handle any type of personal data that at some point flows through their business. Personal data means any information by which a person can be identified, such as a name, location, photo, email address, bank details…the list goes on.

Therefore, LSPs need to ensure that all data, from employee records and supplier agreements to client contact information and content for translation, are handled appropriately.

What personal data do LSPs handle?

Aside from the actual content for translation, an LSP is likely to possess a vast array of personal data including Sales and Marketing data (prospective client details, mailing lists, etc.), existing client data (customer names, emails, POs, etc.), HR and Recruitment data (candidate and employee data including CVs, appraisals, addresses, etc.) and Supplier (freelance) data (bank details, contact details, performance data, CVs, etc.).

In this respect, the challenges that LSPs will face are not significantly different from most other service businesses, and there are lots of resources that outline the requirements and responsibilities for complying with GDPR. For example, the Europa website details some key points, and ICO (for the UK) has a self-assessment readiness toolkit for businesses.

What about content for translation?

Content that a client sends you for translation may also contain personal information. Some of these documents are easy enough to identify by their nature (such as birth, death, and marriage certificates, HR records, and medical records), but personal data might also be considered to extend to cases where you receive an internal communication from a customer that includes a quote from the company CEO, for example.

Short-term challenges

It is important to be able to interpret what the GDPR means for LSPs generally, and for your business specifically. The impact of the regulation will become clearer over time, but it throws up some potentially crucial questions in the immediate, such as:

  • What the risks are for LSPs who continue to store personal data within translation memories and machine translation engines;
  • What the implications are for sharing personal data with suppliers outside of the EU / EEA, and specifically in countries deemed to be inadequate with respect to GDPR obligations (even a mid-sized LSP would work with hundreds of freelancers outside the EU);
  • How binding corporate rules can be applied to LSPs with a global presence;
  • Whether obliging suppliers to work in an online environment could help LSPs to comply with certain GDPR obligations.

Longer-term considerations

While the GDPR presents a challenge to LSPs in the short term, it may also affect longer-term purchasing habits within the industry.

For example, if LSPs are penalized for sharing personal data with freelancers located in inadequate countries (of which there is a long list), LSPs could be forced to outsource translation work within the EU / EEA / adequate countries only, or even insource certain language combinations entirely, potentially driving up translation costs for some languages.

Or, if a client company is penalized for sharing personal data with a subcontractor (i.e. an LSP or freelancer) without the full knowledge and consent of the person the information relates to (known as the data subject), will they be more inclined to employ alternative buying models for their language needs: e.g. to source freelancers directly or via digital marketplaces, or implement in-house translation models of their own?

Get informed

Although most LSPs are well-acquainted with data privacy, there are a lot of unknowns around the impact of GDPR, and LSPs would be wise to tread especially carefully when it comes to handling personal data, in particular post-25 May.

Perhaps the noise around GDPR turns out to be hot air, but with companies in breach of the regulation facing possible penalties that the GDPR recommends should be “effective, proportionate and dissuasive”, it is essential to get informed, and quickly.

Reference: https://bit.ly/2Jwh9g6

How Lingotek Uses AI to Optimize Vendor Management

Language Services Vendor Management is a complex management task. It requires vetting multiple language services providers (LSPs), requesting multiple bids and comparing different rate structures. It can include literally hundreds of projects to monitor and manage to ensure on-time delivery. Adding to the complexity, LSPs typically use several different computer-assisted translation (CAT) tools and maintain multiple linguistic assets in various offline locations. How well translation is managed has a direct effect on the company’s globalization goals and its ability to execute an agile go-to-market strategy.

No one makes vendor management easier than Lingotek. Our groundbreaking artificial intelligence (AI)-driven app inside our industry-leading translation management system (TMS) is a cost-efficient localization platform that simplifies vendor management, enhances efficiency, accelerates delivery, and optimizes budgets and costs to reduce your translation spend.

What is Artificial Intelligence?

Artificial Intelligence (AI) is simply technology that learns. AI uses data and experience to perform tasks that would otherwise require human intelligence and effort. When applied to Vendor Management, it creates a foundation for trigger-based automation, rule-driven systems, and data collection.

How does Lingotek use AI to optimize vendor management?

Lingotek continues to spearhead innovation in the translation industry with a Vendor Management app that brings AI-driven automation and multilingual business intelligence to translation management. The entire process for managing vendors (vendor selection, cost and spend tracking, and vendor performance monitoring) is now easier and more automated. With this data, organizations can easily and repeatedly select vendors who provide the highest translation quality and who consistently deliver jobs on time.

Integrated & automated vendor selection

The Vendor Management app simplifies and consolidates the process for requesting quotes, setting rates and pricing, choosing vendors, managing deadlines, tracking spending, and measuring translator quality and performance. The dashboard displays all of the information needed for tracking and evaluating which vendors are providing the highest quality translation and meeting deadlines. This gives project managers insights to better manage workloads and resources for maximum throughput.

  • Automatic vendor assignment based on language, industry, timeline, and more.
  • Automated bid requests, rate charts & invoicing.
  • Monitor costs and billing information within the TMS.
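As a rough illustration of how trigger-based, rule-driven vendor assignment of this kind can work, here is a minimal sketch. The vendor fields, names, and scoring weights are hypothetical assumptions for the purpose of illustration, not Lingotek’s actual data model or API.

```python
# Hypothetical sketch of rule-driven vendor auto-assignment:
# filter vendors by language and industry, then rank the eligible
# ones by a weighted blend of quality and on-time delivery.
# All names, fields, and weights below are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Vendor:
    name: str
    languages: set
    industries: set
    on_time_rate: float   # historical on-time delivery, 0..1
    quality_score: float  # reviewer-rated quality, 0..1

def pick_vendor(vendors, language, industry):
    """Return the best-scoring vendor that supports the job's
    language and industry, or None if no vendor is eligible."""
    eligible = [v for v in vendors
                if language in v.languages and industry in v.industries]
    if not eligible:
        return None
    # Assumed weighting: quality counts slightly more than punctuality.
    return max(eligible,
               key=lambda v: 0.6 * v.quality_score + 0.4 * v.on_time_rate)

vendors = [
    Vendor("LSP-A", {"de", "fr"}, {"legal"}, 0.95, 0.90),
    Vendor("LSP-B", {"de"}, {"legal", "medical"}, 0.80, 0.97),
]
best = pick_vendor(vendors, "de", "legal")
print(best.name)  # LSP-A (0.92 vs 0.902 under the assumed weights)
```

In a production system the same rule structure would be fed by the historical quality and delivery metrics the dashboard collects, rather than hand-entered values.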

Centralized tracking of rates, costs & spending

The Vendor Management app automates many of the steps required to create a language services purchase order and to closely track translation spending. The app also tracks the leveraging of translation memories (TM) to gauge the efficient reuse of linguistic assets across the enterprise. In addition, it provides at-a-glance rate charts and quick reference for:

  • Integrated cost reporting inside the TMS.
  • Total translation expenses by date, job, or vendor.
  • Aggregation of data to simplify invoice creation.

Automatic cost calculation

Lingotek’s vendor management includes auto-calculation of costs, even when specific jobs have been skipped or cancelled. A project manager can manually skip or cancel a phase, target, or entire document.

With the active monitoring offered by our Intelligent Workflows, jobs can also be auto-skipped or auto-cancelled in order to ensure on-time delivery. When this happens, our AI-driven Vendor Management system is able to proactively alert vendors of the skipped and/or cancelled job, ensure that additional work cannot be performed on those skipped and/or cancelled jobs, and then automatically calculate the costs for the work that was completed before the job was cancelled.

This makes invoicing a breeze, as project managers and vendor managers no longer have to worry about notifying vendors of changes made to the project mid-stream, or figure out how much work was done after the fact in order to manually calculate their costs.
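The billing logic described above can be sketched in a few lines. The per-word billing model and the rate used are illustrative assumptions, not Lingotek’s actual pricing scheme.

```python
# Minimal sketch of auto-calculating costs for cancelled or skipped
# jobs: only work completed before cancellation is billed.
# The per-word model and rate are assumptions for illustration.

def job_cost(words_total, words_completed, rate_per_word, cancelled=False):
    """Bill only the completed words when a job is cancelled or
    skipped; otherwise bill the full word count."""
    billable = words_completed if cancelled else words_total
    return round(billable * rate_per_word, 2)

print(job_cost(10_000, 10_000, 0.12))                 # full job -> 1200.0
print(job_cost(10_000, 3_500, 0.12, cancelled=True))  # cancelled -> 420.0
```

Capturing the completed word count at the moment of cancellation is what removes the after-the-fact reconciliation work described above.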

Intelligence & insight to optimize your supply chain

Get more data-driven insight and control over your localization supply chain. The dashboard displays tracking and evaluating information on vendors, so you can easily select vendors who provide the highest translation quality and consistently deliver jobs on time. This gives you much-needed insight to better manage workloads and resources for maximum throughput.

  • Vendor-specific intelligence.
  • Evaluate vendor performance & quality through SLA compliance metrics.
  • Monitor project delivery & efficiency by vendor.
  • Get key metrics on costs, turnaround time, word counts, missed deadlines.

As the technology improves, we recommend that all providers review their operations to learn where they could take best advantage of AI.

–Common Sense Advisory, “The Journey to Project Management Automation”

Discover the Benefits of Lingotek’s AI-Driven Vendor Management

The new Vendor Management app gives enterprise localization managers, vendor managers, and project managers revolutionary new tools for managing multiple language services providers (LSPs) and projects. Automating vendor management provides critical operational efficiency to enable more scalable globalization strategies and to optimize your localization supply chain to create a more cost-efficient localization network.

Lingotek’s AI-driven Vendor Management can reduce the need for project managers to perform routine tasks that can be automated. Instead, they can use that time for solving problems that AI can’t solve. When you implement better process automation, project managers are left with time for tasks that are more valuable to the organization. They can focus on exception management: problem solving and responding to urgent issues.

Reference: https://bit.ly/2wONm0C

A New Way to Measure NMT Quality

Neural Machine Translation (NMT) systems produce very high quality translations, and are poised to radically change the professional translation industry. These systems require quality feedback / scores on an ongoing basis. Today, the prevalent method is via Bilingual Evaluation Understudy (BLEU), but methods like this are no longer fit for purpose.

A better approach is to have a number of native speakers assess NMT output and rate the quality of each translation. One Hour Translation (OHT) is doing just that: our new NMT index was released in late April 2018 and is fully available for the translation community to use.

A new age of MT

NMT marks a new age in automatic machine translation. Unlike the technologies developed over the past 60 years, the well-trained and tested NMT systems available today have the potential to replace human translators.

Aside from processing power, the main factors that impact NMT performance are:

  • the amount and quality of initial training materials, and
  • an ongoing quality-feedback process

For a NMT system to work well, it needs to be properly trained, i.e. “fed” with hundreds of thousands (and in some cases millions) of correct translations. It also requires feedback on the quality of the translations it produces.

NMT is the future of translation. It is already much better than previous MT technologies, but issues with training and quality assurance are impeding progress.

NMT is a “disruptive technology” that will change the way most translations are performed. It has taken over 50 years, but machine translation can now be used to replace human translators in many cases.

So what is the problem?

While NMT systems could potentially revolutionize the translation market, their development and adoption are hampered by the lack of quality input, insufficient means of testing the quality of the translations and the challenge of providing translation feedback.

These systems also require a lot of processing power, an issue which should be solved in the next few years, thanks to two main factors. Firstly, Moore’s law, which predicts that processing power doubles every 18 months, also applies to NMT, meaning that processing power will continue to increase exponentially. Secondly, as more companies become aware of the cost benefit of using NMT, more and more resources will be allocated for NMT systems.

Measuring quality is a different and more problematic challenge. Today, algorithms such as BLEU, METEOR, and TER try to predict automatically what a human being would say about the quality of a given machine translation. While these tests are fast, easy, and inexpensive to run (because they are simply software applications), their value is very limited. They do not provide an accurate quality score for the translation, and they fail to estimate what a human reviewer would say about the translation quality (a quick scan of the text in question by a human would reveal the issues with the existing quality tests).
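To make the limitation concrete, metrics in this family ultimately reduce to n-gram overlap between the MT output and a reference translation. The sketch below is a deliberately simplified unigram precision, not the full BLEU algorithm (which adds higher-order n-grams, clipping across multiple references, and a brevity penalty).

```python
# Simplified illustration of what BLEU-family metrics measure:
# word overlap with a reference translation. A fluent paraphrase
# scores poorly because its exact words differ, even though a human
# reviewer would rate the translation as good.

from collections import Counter

def unigram_precision(hypothesis: str, reference: str) -> float:
    """Fraction of hypothesis words that appear in the reference
    (clipped by the reference's word counts)."""
    hyp = hypothesis.lower().split()
    ref_counts = Counter(reference.lower().split())
    matches = sum(min(c, ref_counts[w]) for w, c in Counter(hyp).items())
    return matches / len(hyp) if hyp else 0.0

ref = "the cat sat on the mat"
print(unigram_precision("the cat sat on the mat", ref))       # 1.0
print(unigram_precision("a feline rested upon the rug", ref))  # ~0.167
```

The second hypothesis is a perfectly reasonable translation of the same idea, yet only one of its six words overlaps the reference; this gap between surface overlap and human judgement is exactly the weakness the article describes.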

Simply put, translation quality scores generated by computer programs that predict what a human would say about the translation are just not good enough.

With more major corporations including Google, Amazon, Facebook, Bing, Systran, Baidu, and Yandex joining the game, producing an accurate quality score for NMT translations becomes a major problem that has a direct negative impact on the adoption of NMT systems.

There must be a better way!

We need a better way to evaluate NMT systems, i.e. something that replicates the original intention more closely and can mirror what a human would say about the translation.

The solution seems simple: instead of having some software try to predict what a human would say about the translation, why not just ask enough people to rate the quality of each translation? While this solution is simple, direct, and intuitive, doing it right and in a way that is statistically significant means running numerous evaluation projects at one time.

NMT systems are highly specialized, meaning that if a system has been trained using travel and tourism content, testing it with technical material will not produce the best results. Thus, each type of material has to be tested and scored separately. In addition, the rating must be done for every major language pair, since some NMT engines perform better in particular languages. Furthermore, to be statistically significant, at least 40 people need to rate each project per language, per type of material, per engine. Besides that, each project should have at least 30 strings.

Checking one language pair with one type of material translated with one engine is relatively straightforward: 40 reviewers each check and rate the same neural machine translation consisting of about 30 strings. This approach produces relatively solid (statistically significant) results, and repeating it over time also produces a trend, making it possible to find out whether or not the NMT system is getting better.

The key to doing this one isolated evaluation is selecting the right reviewers and making sure they do their job correctly. As one might expect, using freelancers for the task requires some solid quality control procedures to make sure the answers are not “fake” or “random.”

At that magnitude (one language, one type of material, one NMT engine), the task is manageable, even when run manually. It becomes more difficult when an NMT vendor, user, or LSP wants to test 10 languages and 10 different types of material with 40 reviewers each. In this case, each evaluation requires between 400 reviewers (1 NMT engine x 1 type of material x 10 language pairs x 40 reviewers) and 4,000 reviewers (1 NMT engine x 10 types of material x 10 language pairs x 40 reviewers).
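The headcounts above follow directly from multiplying the evaluation dimensions; a quick sketch of the arithmetic:

```python
# Reviewer slots needed for a full evaluation grid: each reviewer
# rates one (engine, material type, language pair) project.

def reviewers_needed(engines, material_types, language_pairs, per_project=40):
    """Total reviewer slots across the evaluation grid."""
    return engines * material_types * language_pairs * per_project

print(reviewers_needed(1, 1, 10))   # 1 engine, 1 material type -> 400
print(reviewers_needed(1, 10, 10))  # 1 engine, 10 material types -> 4000
```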

Running a human-based quality score is a major task, even for just one NMT vendor. It requires up to 4,000 reviewers working on thousands of projects.

This procedure is relevant for every NMT vendor who wants to know the real value of their system and obtain real human feedback for the translations it produces.

The main challenge is of course finding, testing, screening, training, and monitoring thousands of reviewers in various countries and languages — monitoring their work while they handle tens of thousands of projects in parallel.

The greater good – industry level quality score

Looking at the greater good, what is really needed is a standardised NMT quality score for the industry to employ, measuring all of the various systems using the same benchmark, strings, and reviewers, in order to compare like-for-like performance. Since the performance of NMT systems can vary dramatically between different types of materials and languages, a real human-based comparison using the same group of linguists and the same source material is the only way to produce real comparative results. Such scores will be useful both for the individual NMT vendor or user and for the end customer or LSP trying to decide which engine to use.

To produce the same tests on an industry-relevant level is a larger undertaking. Using 10 NMT engines, 10 types of material, 10 language pairs and 40 reviewers, the parameters of the project can be outlined as follows:

  • Assuming the top 10 language pairs are evaluated, i.e. EN > ES, FR, DE, PT-BR, AR, RU, CN, JP, IT and KR;
  • 10 types of material – general, legal, marketing, finance, gaming, software, medical, technical, scientific, and tourism;
  • 10 leading (web-based) engines – Google, Microsoft (Bing), Amazon, DeepL, Systran, Baidu, Promt, IBM Watson, Globalese and Yandex;
  • 40 reviewers rating each project;
  • 30 strings per test; and
  • 12 words on average per string.

This comes to a total of 40,000 separate tests (10 language pairs x 10 types of material x 10 NMT engines x 40 reviewers), each with at least 30 strings, i.e. 1,200,000 strings of 12 words each, resulting in an evaluation of approximately 14.4 million words. This evaluation is needed to create just one instance (!) of a real, comparative, human-based NMT quality index.
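The totals can be verified with the same multiplication:

```python
# Scale of one full instance of the proposed human-based NMT index,
# using the parameters listed above.
language_pairs = 10
material_types = 10
engines = 10
reviewers_per_project = 40
strings_per_test = 30
words_per_string = 12

tests = language_pairs * material_types * engines * reviewers_per_project
strings = tests * strings_per_test
words = strings * words_per_string

print(tests)    # 40000 separate tests
print(strings)  # 1200000 strings
print(words)    # 14400000 words (~14.4 million)
```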

The challenge is clear: to produce just one instance of a real viable and useful NMT score, 4,000 linguists need to evaluate 1,200,000 strings equating to well over 14 million words!

The magnitude of the project, the number of people involved and the requirement to recruit, train, and monitor all the reviewers, as well as making sure, in real time, that they are doing the job correctly, are obviously daunting tasks, even for large NMT players, and certainly for traditional translation agencies.

Completing the entire process within a reasonable time (e.g. less than one day), so that the results are “fresh” and relevant makes it even harder.

There are not many translation agencies with the capacity, technology, and operational capability to run a project of that magnitude on a regular basis.

This is where One Hour Translation (OHT) excels. OHT has recruited, trained, and tested thousands of linguists in over 50 languages, and has already run well over 1,000,000 NMT rating and testing projects for its customers. By the end of April 2018, OHT published the first human-based NMT quality index (initially covering several engines and domains and later expanding), with the goal of promoting the use of NMT across the industry.

A word about the future

In the future, a better NMT quality index can be built using the same technology NMT itself is built on, i.e. deep-learning neural networks. Building a neural quality system is just like building an NMT system. The required ingredients are high-quality translations, high volume, and quality ratings/feedback.

With these ingredients, it is possible to build a deep-learning, neural-network-based quality control system that will read the translation and score it like a human does. Once the NMT systems are working smoothly and a reliable, human-based quality score/feedback has been developed, the next step will be to create a neural quality score.

Once a neural quality score is available, it will be further possible to have engines improve each other, and to create a self-learning and self-improving translation system by linking the neural quality score to the NMT engine (obviously it does not make sense to have a closed-loop system, as it cannot improve without additional external data).

With additional external translation data, this system will “teach itself” and learn to improve without the need for human feedback.

Google has done it already. Its AI subsidiary, DeepMind, developed AlphaGo, a neural network computer program that beat the world’s (human) Go champion. AlphaGo is now improving, becoming better and better, by playing against itself again and again – no people involved.

Reference: https://bit.ly/2HDXbTf