Tag: Translation Technology Development

Lessons in Building a Language Industry Startup

Bryan Forrester, CEO of Boostlingo; Matt Conger, CEO of Cadence Translate; and Jeffrey Sandford, Co-Founder and CTO of Wovn Technologies, joined Smart on stage to share their experiences and insights with the 120 senior executives in attendance.

Read full article from here.

VideoLocalize: A Case Study in Innovation

VideoLocalize is a video localization platform that was developed by the Boffin Language Group to address a well-known challenge in the area of video localization. In its current form, VideoLocalize integrates a synchronization tool with voice-talent and project management capabilities, allowing the end-to-end management of video localization projects. It wasn’t originally conceived this way, however, and the journey that the Boffin Language Group undertook under the leadership of its President, George Zhao, is a case study in innovation.

Read full article from here.

Translators and Technology: Friends or Foes?

Technology of many kinds is creeping into the translation industry at all levels. As a result, some participants in this magical process of transforming a text to fit a different linguistic, cultural, and sociological community can feel quite uneasy, or even anxious. Will machine translation (MT) reach parity with human translation (HT)? Will there still be a need for translators?

Read full article from here.

How Augmented Translation Will Redefine the Value of Translators

Norbert Oroszi, CEO of translation software company memoQ, joined the speaker lineup at a sold-out SlatorCon San Francisco 2018 to reflect on the role of humans and machines in shaping the future of translation technology.

To lay the groundwork, Oroszi began by drawing a comparison between the role of technology in the automotive industry and localization. More than 100 years ago when technology hit the car industry to enable mass production of vehicles, many were fearful that machines would replace humans, but technology did not take jobs away from workers in the car industry. Instead, automation augmented human capabilities, redefined the value of workers, and facilitated what became an automotive revolution.

Read full article from here.

SYSTRAN presents its latest translation engines: huge quality & speed improvement!

The latest version of our AI-powered Translation Software designed for Businesses

SYSTRAN Pure Neural® Server is our new generation of enterprise translation software based on Artificial Intelligence and Neural networks. It provides outstanding professional quality with the highest standards in data safety.

Our R&D team, which works to provide corporate users with state-of-the-art translation technology tailored for business, has just released a new generation of Neural MT engines. SYSTRAN’s new engines are developed with OpenNMT-tf, our AI framework using the latest TensorFlow features, and backed by a proprietary new training process: Infinite Training.

These innovations have two major impacts for businesses:

  • Better translation quality & fluency: the new engines exploit SAT (Self Attentional Transformers) neural networks that improve contextual translation.
  • Better performance: translation speed (char./sec.) is improved by 10 to 30 times on CPU hardware compared to previous generation engines.

For more info, please visit: https://bit.ly/2QgAMMq

Smart devices and the future of CAT tools

CAT tools have been on the market for many years, yet they are still improving. New technologies and emerging needs from translators are triggering a shift from computer-aided translation tools to smart device-aided translation tools. Does the future of productivity lie in web-based translation environments?

The emergence of online translation environments

While CAT tools are nowadays indispensable in the translator’s toolkit, it is not long ago that professional translators had to work without them. Tools for computer-aided translation, not to be confused with online translation tools like Google Translate, only emerged in the early 1990s. Although there may have been earlier attempts to create software that helps translators improve their quality, productivity and consistency, it was in the last decade of the last century that they came into full swing. Nowadays translators can choose from at least 20 different CAT tools, both online and offline, to suit their needs, of which SDL Trados and MemoQ are by far the best known.
However, only 25 years after the introduction of mainstream translation software, a new era is on the horizon. The introduction of cloud technology, the rise of digital nomads, and the general availability of cheap and fast internet connections have led to a new branch on the CAT tool tree: translators can now use online translation environments, both free and paid, to work wherever they choose.

Translating online

The technological advances of the last couple of years opened great opportunities for companies that looked beyond traditional CAT tools and wanted to pluck the low-hanging fruit of the cloud’s capabilities. Several professionals, both from inside and outside the translation industry, quickly introduced their own online variants of the desktop translation tools. Examples include Smartling and Memsource (which has a desktop tool as well). These tools are browser-based, which means that they are accessible as webpages and can be used wherever users want, as long as they have a compatible device and an internet connection. The online translation environments offer full functionality, often equivalent to that of the standard desktop tools. Users (in the case of Smartling and Memsource, mainly project managers) can create translation memories and term bases, set rules for quality assurance and require translators to perform several checks before they can deliver their translations. The tools also support the most common file formats, like Microsoft Office files, PDF files and HTML documents, as well as bilingual file types like XLIFF and the proprietary formats of Trados and MemoQ. In addition, they often have familiar user interfaces, with well-known toolbars and panels that make it easier for project managers and translators alike to find their way around the online CAT tool.

It is clear that the new members of the CAT tool family are shaking up the CAT tool industry. It is therefore no surprise that, after the introduction of new online CAT tools, developers of ‘traditional’ CAT tools also came up with online versions. MemoQ introduced MemoQ Web, while SDL brought SDL Online Translation Editor to the table.

Web-based CAT tools for translators

The most important feature of web-based CAT tools is, unsurprisingly, that they work in a browser. Most of them were initially designed to work on a desktop, offering translators a convenient tool that is accessible anywhere while at the same time making it easier for project managers to distribute projects. Indeed, project managers only had to upload files, create or connect a translation memory, and send a link to multiple translators, making it easier to complete projects, shorten the turnaround time, and avoid lengthy discussions via email. But because these new online CAT tools were mainly directed at agencies and project managers, they fell short of meeting the needs of translators who wanted to work on the go. Other bright minds therefore developed new web-based CAT tools that better supported the needs of the freelance translator: in the past few years Lilt and Smartcat were introduced, among others. The SDL Online Translation Editor has also been created with freelance professionals in mind, while MemoQ Web is aimed more at project managers.

The biggest difference between tools for freelance translators and project managers is their workflow. While project managers have loads of options to manage projects, tools like Lilt and Smartcat introduce only the options freelancers need: they can upload a file in different file types, create or use a translation memory (term bases are often not supported), work their way through the file, and complete the job. The tools have a familiar and simple user interface, so translators do not need to look for advanced options, but often, powerful options are hidden under the bonnet, so they can really compete with their desktop equivalents.
Another major advantage of CAT tools in the cloud is that they frequently release new features quickly and respond to feature requests even faster, while traditional CAT tools often require months for implementing, testing, and introducing new features in a newly built (minor) version of their tool.

Another major difference is that many tools aimed at freelancers are free to use. They offer various paid plans for advanced users, often based on the number of characters being translated, but there is only one free flavour, and it comes without many of the options that paid users have access to.

Privacy concerns with online CAT environments

In the past few years the online CAT tools have quickly risen to the level at which they can compete with traditional computer-based CAT tools. Where CAT tools have evolved and added new features with every new release, their online counterparts were introduced according to the status quo of traditional CAT tools. They sometimes even introduced ground-breaking new features that traditional CAT tools were not able to offer, like Lilt’s adaptive machine translation.
Yet among translators there is still much debate about their adoption. The most important concern is that of privacy. While local computer resources are generally considered a safe option, many translators are afraid to use cloud environments because of the risk of hacks and leaks that expose clients’ confidential information. At the same time, using a free online translation environment sometimes requires that translations are shared with the platform provider to improve the quality of generally available translation memories and machine translation services. Freelancers, whose business depends on credibility, simply cannot afford to share their clients’ information for the sake of improving their productivity or flexibility.
On the other hand, early adopters and technology enthusiasts argue that the cloud is much safer than many computers thanks to continuous security updates. However, they are only a small group in the world of translators.

From CAT to SAT?

Whatever the privacy concerns, the introduction of online CAT tools has so far made clear that they are here to stay. With the increasing adoption of online tools, lifestyles shifting to working on the go, and digital nomadism, it is expected that online translation environments will be increasingly in demand in the future.
Although traditional CAT tools do not offer any opportunities to be run on smart devices with an Android, iOS, or Windows Phone operating system, online CAT tools do not have this problem. That means that they can be used without barriers on smartphones and tablets, once they have been adopted on a computer. Indeed they offer the same experience everywhere as they are browser-based and do not need to be adapted much to work in different operating environments. An added advantage of this possibility is that users can start a task on their desktop, then work on it while away, and complete it in a third environment.

Yet, despite the seemingly endless possibilities of online CAT tools, many of them still do not offer a flawless experience on smartphones and tablets. One of the biggest disadvantages of the browser-based tools is that they do not fit neatly onto the small screens of smart devices. A short experiment with a few translation platforms (Smartcat and Lilt; SDL’s Online Translation Editor returned an error) quickly showed that the user interface has problems on touch-enabled devices. While all elements of a CAT tool (the panel with the bilingual format, a panel with translation memory results, a concordance panel, and some other interface elements) are present, they often do not fit neatly. The interface appears fine in its initial state, but touching a text box to add a translation causes the panels to be re-arranged every time. Furthermore, after touching the screen, the on-screen keyboard pops up, often hiding (a part of) the source text. While this problem is apparent on tablets, it is even more problematic on smartphones with their even smaller screens. Working on a translation on the go using a tablet or smartphone therefore does not offer a seamless, flawless, or productive experience just yet.

Another problem is that rendering the translation environment on a tablet or smartphone requires considerable computing resources on some devices. So in order to make full use of an online CAT tool, users need to have a powerful tablet or smartphone that can execute scripts and render style sheets quickly to realize a productivity gain.

That brings us to the question of whether online CAT tools can fulfil the needs of professional translators. Basically, the answer is yes. Online CAT tools often work well on desktops. However, they are currently an online variant of computer-aided translation tools. That does not mean that they are ready to serve as fully fledged smart device-based translation tools (SAT). The current generation of browser-based CAT tools is perfect to use with laptops while one is on the go, but in order to realize their full potential on smartphones and tablets they still need to be better adapted to these devices. The future of CAT tools is in our hands, but it still needs to be adapted to our fingers.

Files, Files Everywhere: The Subtle Power of Translation Alignment

Here’s the basic scenario: you have the translated versions of your documents, but the translation wasn’t performed in a CAT tool. These documents now need to be updated or changed across the languages, you want to retain the existing elements, style and terminology, and you have integrated CAT technology into your processes in the meantime, so you have to build a translation memory. The solution is a neat piece of language engineering called translation alignment.

Translation alignment is a native feature of most productivity tools for computer-assisted translation, but its application in real life is limited to very specific situations, so even language professionals rarely have an opportunity to use it. However, these situations do happen once in a while, and when they do, alignment usually comes as a trusty solution for process optimization. We will take a look at two actual cases to show you what exactly it does.

Example No. 1: A simple case

Project outline:

Three Word documents previously translated to one language, totaling 6000 unweighted words. Two new documents totaling around 2500 words that feature certain elements of the existing files and need to follow the existing style and terminology.

Project execution:

Since the translated documents were properly formatted and there were no layout issues, the alignment process was completed almost instantly. The software was able to segment the source files and we matched the translated segments, with some minor tweaking of segmentation. We then built a translation memory from those matched segments and added the new files to the project.
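The segment-and-match step can be illustrated with a minimal sketch. It assumes naive sentence splitting and one-to-one pairing by position, and it flags pairs with a suspicious length ratio for manual review. The function names, the `max_ratio` threshold and the sample sentences are hypothetical, and real CAT tools use more sophisticated alignment than this.

```python
import re

def split_segments(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def align(source_text, target_text, max_ratio=2.0):
    """Pair segments by position; flag pairs whose lengths differ a lot.

    max_ratio is an assumed threshold, not a value from any CAT tool.
    """
    src = split_segments(source_text)
    tgt = split_segments(target_text)
    if len(src) != len(tgt):
        # Positional pairing only works when both sides segment identically.
        raise ValueError("segment counts differ; manual alignment needed")
    pairs = []
    for s, t in zip(src, tgt):
        ratio = max(len(s), len(t)) / min(len(s), len(t))
        pairs.append({"source": s, "target": t, "needs_review": ratio > max_ratio})
    return pairs

# Hypothetical sample content, for illustration only.
source = "Press the power button. Wait for the light to turn green."
target = "Appuyez sur le bouton d'alimentation. Attendez que le voyant passe au vert."
translation_memory = align(source, target)
```

The resulting list of matched pairs plays the role of the translation memory: each new segment can be looked up against the stored source segments before it goes to a linguist.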

The result:

Thanks to the created translation assets, the final wordcount of the new content was around 1500 and our linguists were able to produce translation in accordance with the previously established style and terminology. The assets were preserved for use on future projects.
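How a raw word count shrinks to a weighted one can be sketched as follows. The match bands, their weights and the word breakdown below are invented for illustration; they are not the figures from this project, and every agency defines its own grid.

```python
# Hypothetical weighting grid: percentage of the full effort per match band.
WEIGHT_PERCENT = {
    "new": 100,    # no match in the translation memory
    "fuzzy": 50,   # partial match that needs editing
    "exact": 10,   # identical segment that only needs a check
}

def weighted_word_count(words_per_band):
    """Sum the words in each band, scaled by that band's weight."""
    total = sum(words * WEIGHT_PERCENT[band] for band, words in words_per_band.items())
    return total / 100

# Invented breakdown of a 2500-word job, chosen only to show the arithmetic.
breakdown = {"new": 1000, "fuzzy": 700, "exact": 800}
raw_total = sum(breakdown.values())
weighted_total = weighted_word_count(breakdown)
```

With this made-up grid, the more segments the translation memory can match, the further the weighted total falls below the raw total.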

Example No. 2: An extreme case of multilingual alignment

Project outline:

In one of our projects we had to develop translation assets in four language pairs, totaling roughly 30k words per language. The source materials were expanded with new content totaling about 20k words unweighted and the language assets had to be developed both to retain the existing style and terminology solution and to help the client switch to a new CAT platform.

Project execution:

Unfortunately, there was no workaround for ploughing through dozens of files, but once we organized the materials we could proceed to the alignment phase. Since these files were localized and some parts were even transcreated to match the target cultures, which also included changes in layout and differences in content, we knew that alignment was not going to be fully automated.

This is why native linguists in these languages performed the translation alignment and communicated with the client and the content producer during this phase. While this slowed the process a bit, it ultimately yielded the best results possible.

We then exported the created translation memory in the cross-platform TMX format, which allows use in different CAT tools, and the alignment phase was finished.
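For readers unfamiliar with TMX, the sketch below writes a handful of aligned pairs as a minimal TMX 1.4 document using Python’s standard library. The function name, the header values (such as the tool name) and the sample pair are assumptions for illustration; a CAT tool’s own export will carry far more metadata.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, src_lang, tgt_lang):
    """Return a minimal TMX document containing the given (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "alignment-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

# Hypothetical aligned pair, for illustration only.
tmx_text = build_tmx([("Save your changes.", "Enregistrez vos modifications.")], "en", "fr")
```

Because every translation unit simply lists one variant per language, the same file can be imported into any tool that reads TMX.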

The result:

With the TM applied, the weighted volume of new content was around 7k words. Our linguists localized the new materials in accordance with the existing conventions in the new CAT platform and the translation assets were saved for future use.

Wrap up

In both cases, translation alignment enabled us to reduce the volume of the new content for translation and localization and ensure stylistic and lexical consistency with the previously translated materials. It also provided an additional, real-time quality control and helped our linguists produce a better translation in less time.

Translation alignment is not an everyday operation, but it is good to know that when it is called to deliver the goods, this is exactly what it does.

Reference: https://bit.ly/2p5aYr0

What machine translation teaches us about the challenges ahead for AI

João Graça, co-founder and CTO of Unbabel, on what machine translation can teach us about the challenges still lying ahead for artificial intelligence.

Can you understand this sentence? Now try understanding the long and convoluted and unexpectedly – maybe never-ending, or maybe ending-sooner-than-you-think, but let’s hope it ends soon – nature of this alternative sentence.

The complexities of language can be an inconvenience to a reader. But even to today’s smartest machine learning algorithms, there are more translation challenges remaining than advances in other fields would have you believe.

These challenges in particular are a good demonstration of the multitude of complexities that still remain for machines to catch up with human performance.

You say tomato

When it comes to translation, there are two categories of content. On one hand, you have “commodity” translation. Perhaps you want to point your phone at a menu and get a rough idea of what it is. Or you want to impress a colleague with a phrase from their local language.

Here, phrases are short, the content is often formal and errors aren’t life or death.

But on the other hand, you have interactions where context is key – understanding the intent of the writer or speaker, and the expectations of the reader or listener. Take any example where a business speaks to its customers – you had better hope you are speaking their language respectfully when they have a complaint or problem.

It’s not enough to solve the problem at a superficial level, and achieving comparably “human quality” communication still requires an enormous amount of research. This need for perfection is why most research is focused on this second area.

In the examples below, I discuss the challenges still ahead for the translation industry, and touch on what they mean for how we use machine learning tech more broadly.

Challenge 1: Long-distance lookups

Many of the biggest challenges are structural.

A good example is long-distance lookups. If you are translating a sentence word by word and the word order stays the same, the problem is just “what is the correct equivalent of this for that?”

But once you start having to think about reordering the sentence, the problem space that has to be explored is exponentially larger. And in languages like Chinese and Japanese, you find verbs at the end of the sentence, potentially producing the longest distances possible.
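A rough way to see how quickly the problem space grows is to count the possible orderings of a sentence, which is the factorial of its length. This is only an upper bound offered as a sketch, since real systems never consider every ordering; the function name is an assumption.

```python
import math

def possible_orderings(word_count):
    """Number of ways to arrange word_count distinct words (n factorial)."""
    return math.factorial(word_count)

# Sentence lengths chosen arbitrarily for illustration.
growth = {n: possible_orderings(n) for n in (5, 10, 20)}
for n, count in growth.items():
    print(f"{n} words: {count} orderings")
```

Going from 5 to 10 words multiplies the number of orderings by more than thirty thousand, which is why a verb that moves to the far end of a sentence is so costly to handle.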

The system needs to assess at least three reordering systems. This is why these languages are so hard: you have to cater to very different grammatical patterns, very different vocabularies, and different numbers of characters in each word.

Here, you can see how expanding problem spaces create difficulties in an area the human brain handles with ease.

Challenge 2: Taxonomy

The second major area of complexity involves different formats of data.

For example, conversational language has a completely different structure from formal documents and calls for different models. In areas like customer service translation, this makes a big difference. Nobody likes to feel that a company representative is being overly officious when handling their problem.

Therefore, any model that is able to learn from a volume of real human queries will have an advantage — and doubly so if it’s able to take it from a particular industry sector. Meanwhile, other models might be relying on news stories or generic online text, and output completely different results.

Similarly, with other machine learning challenges, the ability to learn from the most valuable and representative data can give a big advantage – or risk limiting taxonomical flexibility.

This brings us to context.

Challenge 3: Context

Most translation models still translate sentence by sentence, so they don’t take the context into account.

If they are translating a pronoun, they have no clue which pronoun should be translated. They will randomly generate sentences that are formal or informal. They don’t guarantee consistency of terminology – for instance, translating a legal term correctly in the same way throughout. There’s no way you can guarantee the whole document is correct.
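The terminology problem can be made concrete with a small check that scans sentence-level output for competing renderings of the same source term. This is a sketch using plain substring matching; the function name, the candidate list and the sample sentences are hypothetical, and a production check would need proper tokenization and inflection handling.

```python
def renderings_used(pairs, source_term, candidate_renderings):
    """Collect which candidate renderings appear wherever source_term occurs."""
    used = set()
    for source, target in pairs:
        if source_term.lower() in source.lower():
            for candidate in candidate_renderings:
                if candidate.lower() in target.lower():
                    used.add(candidate)
    return used

# Hypothetical output of a system that translates each sentence in isolation.
pairs = [
    ("The agreement enters into force today.", "Der Vertrag tritt heute in Kraft."),
    ("Either party may terminate the agreement.", "Jede Partei kann die Vereinbarung kündigen."),
]
used = renderings_used(pairs, "agreement", ["Vertrag", "Vereinbarung"])
is_consistent = len(used) <= 1
```

Each sentence is acceptable on its own, yet the document as a whole uses two different words for one legal term, which is exactly what a sentence-by-sentence model cannot notice.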

The other problem is the content is not always in the same language. Sometimes it’s one sentence in Chinese, one sentence in English. The sentences are much shorter, so you probably have to look much higher for context. This reaches its extreme in “chat” interactions.

And the context problem is different than if you were translating an email. For example, if you are doing a legal document and the document is ten pages long, you would need to use the entire document for an accurate contextual translation.

This is next to impossible with current models – you have to find some way to summarise it. Otherwise, consistency is nearly impossible.

On the other hand, if you are translating for something like SEO, what you are actually translating is keywords that don’t form a sentence, just keywords by themselves. This means you turn to more dictionary-like translation, and use other words or the associated image to disambiguate.

People think “Oh, we are in the age of unlimited data” but actually we are still enormously lacking in many ways.

Yes, we have a lot of data but often not enough relevant data.

Looking to the future

There will be many translation engines but what makes them different is their models.

The model is going to look at the data, predict patterns and assign them to different customers, and from there decide which voice, language, tone and so on to choose.

Current public translation tools aren’t aware of this yet. They don’t even know which document the text came from, let alone the speaker or their translation preferences.

This will bring in the next level of sophistication in this area. Machine learning, exercised against a use-specific corpus of language, will give fast and accurate translations, while being able to forward them to humans to finalise and learn from further.

Languages might still drive machines crazy – but with careful human thinking, we can teach them to persevere.

Reference: https://bit.ly/2PYYHAB