Tag: CAT Tools

Machine Translation From the Cold War to Deep Learning


In the beginning

The story begins in 1933. Soviet scientist Peter Troyanskii presented “the machine for the selection and printing of words when translating from one language to another” to the Academy of Sciences of the USSR. The invention was super simple — it had cards in four different languages, a typewriter, and an old-school film camera.

The operator took the first word from the text, found a corresponding card, took a photo, and typed its morphological characteristics (noun, plural, genitive) on the typewriter. The typewriter’s keys encoded one of the features. The tape and the camera’s film were used simultaneously, making a set of frames with words and their morphology.

Despite all this, as often happened in the USSR, the invention was considered “useless”. Troyanskii died of stenocardia after trying to finish his invention for 20 years. No one in the world knew about the machine until two Soviet scientists found his patents in 1956.

It was at the beginning of the Cold War. On January 7th 1954, at IBM headquarters in New York, the Georgetown–IBM experiment started. The IBM 701 computer automatically translated 60 Russian sentences into English for the first time in history.

However, the triumphant headlines hid one little detail. No one mentioned that the translated examples were carefully selected and tested to exclude any ambiguity. For everyday use, that system was no better than a pocket phrasebook. Nevertheless, a sort of arms race was launched: Canada, Germany, France, and especially Japan all joined the race for machine translation.

The race for machine translation

The vain struggles to improve machine translation lasted for forty years. In 1966, the US ALPAC committee, in its famous report, called machine translation expensive, inaccurate, and unpromising. They instead recommended focusing on dictionary development, which eliminated US researchers from the race for almost a decade.

Even so, it was only these scientists’ attempts, research, and developments that created the basis for modern Natural Language Processing. All of today’s search engines, spam filters, and personal assistants appeared thanks to a bunch of countries spying on each other.

Rule-based machine translation (RBMT)

The first ideas surrounding rule-based machine translation appeared in the 70s. The scientists peered over the interpreters’ work, trying to compel the tremendously sluggish computers to repeat those actions. These systems consisted of:

  • Bilingual dictionary (RU -> EN)
  • A set of linguistic rules for each language (For example, nouns ending in certain suffixes such as -heit, -keit, -ung are feminine)

That’s it. If needed, systems could be supplemented with hacks, such as lists of names, spelling correctors, and transliterators.

PROMT and Systran are the most famous examples of RBMT systems. Just take a look at Aliexpress to feel the soft breath of this golden age.

But even they had some nuances and subspecies.

Direct Machine Translation

This is the most straightforward type of machine translation. It divides the text into words, translates them, slightly corrects the morphology, and harmonizes syntax to make the whole thing sound right, more or less. When the sun goes down, trained linguists write the rules for each word.

The output returns some kind of translation. Usually, it’s quite crappy. It seems that the linguists wasted their time for nothing.

Modern systems do not use this approach at all, and modern linguists are grateful.

Transfer-based Machine Translation

In contrast to direct translation, we prepare first by determining the grammatical structure of the sentence, as we were taught at school. Then we manipulate whole constructions, not words. This helps to get quite decent conversion of the word order in translation. In theory.

In practice, it still resulted in verbatim translation and exhausted linguists. On the one hand, it brought simplified general grammar rules. But on the other, it became more complicated because of the increased number of word constructions in comparison with single words.

Interlingual Machine Translation

In this method, the source text is transformed into an intermediate representation that is unified for all the world’s languages (the interlingua). It’s the same interlingua Descartes dreamed of: a meta-language which follows universal rules and turns translation into a simple “back and forth” task. Next, the interlingua would be converted to any target language, and here was the singularity!

Because of the conversion, interlingual translation is often confused with transfer-based systems. The difference is that the linguistic rules are specific to each individual language and the interlingua, not to language pairs. This means we can add a third language to an interlingua system and translate between all three — something we can’t do with transfer-based systems.

It looks perfect, but in real life it’s not. It was extremely hard to create such a universal interlingua — a lot of scientists worked on it their whole lives. They did not succeed, but thanks to them we now have morphological, syntactic, and even semantic levels of representation. But the Meaning-Text theory alone cost a fortune!

The idea of an intermediate language will be back. Let’s wait a while.

As you can see, all RBMT systems are dumb and terrifying, and that’s the reason they are rarely used except for specific cases (like weather report translation, and so on). Among the advantages of RBMT that are often mentioned are its morphological accuracy (it doesn’t confuse words), the reproducibility of results (all translators get the same result), and the ability to tune it to a subject area (to teach it terms specific to economists or programmers, for example).

Even if anyone were to succeed in creating an ideal RBMT, and linguists enhanced it with all the spelling rules, there would always be some exceptions: all the irregular verbs in English, separable prefixes in German, suffixes in Russian, and situations when people just say it differently. Any attempt to take into account all the nuances would waste millions of man hours.

And don’t forget about homonyms. The same word can have a different meaning in a different context, which leads to a variety of translations. How many meanings can you catch here: I saw a man on a hill with a telescope?

Languages did not develop based on a fixed set of rules — a fact which linguists love. They were much more influenced by the history of invasions in the past three hundred years. How could you explain that to a machine?

Forty years of the Cold War didn’t help in finding any distinct solution. RBMT was dead.

Example-based Machine Translation (EBMT)

Japan was especially interested in fighting for machine translation. There was no Cold War, but there were reasons: very few people in the country knew English. It promised to be quite an issue at the upcoming globalization party. So the Japanese were extremely motivated to find a working method of machine translation.

Rule-based English-Japanese translation is extremely complicated. The language structure is completely different, and almost all words have to be rearranged and new ones added. In 1984, Makoto Nagao from Kyoto University came up with the idea of using ready-made phrases instead of repeated translation.

Let’s imagine that we have to translate a simple sentence — “I’m going to the cinema.” And let’s say we’ve already translated another similar sentence — “I’m going to the theater” — and we can find the word “cinema” in the dictionary.

All we need is to figure out the difference between the two sentences, translate the missing word, and then not screw it up. The more examples we have, the better the translation.

I build phrases in unfamiliar languages exactly the same way!

EBMT showed the light of day to scientists from all over the world: it turns out, you can just feed the machine with existing translations and not spend years forming rules and exceptions. Not a revolution yet, but clearly the first step towards it. The revolutionary invention of statistical translation would happen in just five years.

Statistical Machine Translation (SMT)

In early 1990, at the IBM Research Center, a machine translation system was shown for the first time that knew nothing about rules and linguistics as a whole. It analyzed similar texts in two languages and tried to understand the patterns.

The idea was simple yet beautiful. An identical sentence in two languages was split into words, which were matched afterwards. This operation was repeated about 500 million times to count, for example, how many times the word “Das Haus” was translated as “house” vs. “building” vs. “construction”, and so on.

If most of the time the source word was translated as “house”, the machine used this. Note that we did not set any rules nor use any dictionaries — all conclusions were drawn by the machine, guided by stats and the logic that “if people translate that way, so will I.” And so statistical translation was born.

The method was much more efficient and accurate than all the previous ones. And no linguists were needed. The more texts we used, the better translation we got.

There was still one question left: how would the machine correlate the word “Das Haus,” and the word “building” — and how would we know these were the right translations?

The answer was that we wouldn’t know. At the start, the machine assumed that the word “Das Haus” equally correlated with any word from the translated sentence. Next, when “Das Haus” appeared in other sentences, the number of correlations with the “house” would increase. That’s the “word alignment algorithm,” a typical task for university-level machine learning.

The machine needed millions and millions of sentences in two languages to collect the relevant statistics for each word. How did we get them? Well, we decided to take the abstracts of the European Parliament and the United Nations Security Council meetings — they were available in the languages of all member countries and are now available for download as the UN Corpora and the Europarl Corpora.

Word-based SMT

In the beginning, the first statistical translation systems worked by splitting the sentence into words, since this approach was straightforward and logical. IBM’s first statistical translation model was called Model 1. Quite elegant, right? Guess what they called the second one?

Model 1: “the bag of words”

Model 1 used a classical approach — split the text into words and count the stats. The word order wasn’t taken into account. The only trick was translating one word into multiple words. For example, “Der Staubsauger” could turn into “Vacuum Cleaner,” but that didn’t mean it would turn out vice versa.

Here are some simple implementations in Python: shawa/IBM-Model-1.
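
For readers who want to see the idea in code, below is a minimal sketch of Model 1’s word-alignment training loop (expectation-maximization over a toy corpus). It illustrates the algorithm described above and is not the code from the shawa/IBM-Model-1 repository; the toy sentence pairs and the number of iterations are assumptions made for the example.

```python
# Minimal IBM Model 1 EM sketch over a toy parallel corpus (illustrative only).
from collections import defaultdict

corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

# Start with uniform translation probabilities t(target_word | source_word).
t = defaultdict(lambda: 1.0)

for _ in range(10):                      # EM iterations
    counts = defaultdict(float)          # expected co-occurrence counts
    totals = defaultdict(float)
    for src, tgt in corpus:
        for e in tgt:
            norm = sum(t[(e, f)] for f in src)
            for f in src:
                frac = t[(e, f)] / norm  # how much f "explains" e in this pair
                counts[(e, f)] += frac
                totals[f] += frac
    for (e, f), c in counts.items():     # M-step: re-estimate t(e | f)
        t[(e, f)] = c / totals[f]

print(round(t[("house", "haus")], 2))    # converges towards 1.0
print(round(t[("the", "das")], 2))       # "das" ends up aligned with "the"
```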

Model 2: considering the word order in sentences

The lack of knowledge about languages’ word order became a problem for Model 1, and word order is very important in some cases.

Model 2 dealt with that: it memorized the usual place a word takes in the output sentence and shuffled the words for a more natural sound at the intermediate step. Things got better, but they were still kind of crappy.

Model 3: extra fertility

New words appeared in the translation quite often, such as articles in German or using “do” when negating in English. “Ich will keine Persimonen” → “I do not want Persimmons.” To deal with it, two more steps were added to Model 3.

  • The NULL token insertion, if the machine considers the necessity of a new word
  • Choosing the right grammatical particle or word for each token-word alignment

Model 4: word alignment

Model 2 considered the word alignment, but knew nothing about reordering. For example, adjectives would often switch places with the noun, and no matter how well the order was memorized, it wouldn’t make the output better. Therefore, Model 4 took into account the so-called “relative order” — the model learned whether two words always switched places.

Model 5: bugfixes

Nothing new here. Model 5 got some more parameters for the learning and fixed the issue with conflicting word positions.

Despite their revolutionary nature, word-based systems still failed to deal with cases, gender, and homonymy. Every single word was translated in a single “true” way, according to the machine. Such systems are not used anymore, as they’ve been replaced by the more advanced phrase-based methods.

Phrase-based SMT

This method is based on all the word-based translation principles: statistics, reordering, and lexical hacks. However, for learning, it split the text not only into words but also into phrases. These were n-grams, to be precise: contiguous sequences of n words in a row.

Thus, the machine learned to translate steady combinations of words, which noticeably improved accuracy.
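
To make “splitting into n-grams” concrete, here is a tiny sketch that extracts contiguous word sequences from a sentence. It assumes simple whitespace tokenization; real phrase-based systems extract phrase pairs from word-aligned bilingual corpora, which is considerably more involved.

```python
# Extract all n-grams (contiguous runs of n words) from a sentence.
def ngrams(sentence, n):
    words = sentence.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("I am going to the cinema", 2))
# ['I am', 'am going', 'going to', 'to the', 'the cinema']
print(ngrams("I am going to the cinema", 3))
# ['I am going', 'am going to', 'going to the', 'to the cinema']
```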

The trick was, the phrases were not always simple syntax constructions, and the quality of the translation dropped significantly if anyone who was aware of linguistics and sentence structure interfered. Frederick Jelinek, a pioneer of computational linguistics, once joked about it: “Every time I fire a linguist, the performance of the speech recognizer goes up.”

Besides improving accuracy, phrase-based translation provided more options in choosing the bilingual texts for learning. For word-based translation, the exact match of the sources was critical, which excluded any literary or free translation. Phrase-based translation had no problem learning from them. To improve the translation, researchers even started to parse news websites in different languages.

Starting in 2006, everyone began to use this approach. Google Translate, Yandex, Bing, and other high-profile online translators worked as phrase-based right up until 2016. Each of you can probably recall the moments when Google either translated the sentence flawlessly or resulted in complete nonsense, right? The nonsense came from phrase-based features.

The good old rule-based approach consistently provided a predictable though terrible result. The statistical methods were surprising and puzzling. Google Translate turns “three hundred” into “300” without any hesitation. That’s called a statistical anomaly.

Phrase-based translation became so popular that when you hear “statistical machine translation,” it is what is actually meant. Up until 2016, all studies lauded phrase-based translation as the state of the art. Back then, no one even thought that Google was already stoking its fires, getting ready to change our whole image of machine translation.

Syntax-based SMT

This method should also be mentioned, briefly. Many years before the emergence of neural networks, syntax-based translation was considered “the future of translation,” but the idea did not take off.

The proponents of syntax-based translation believed it was possible to merge it with the rule-based method. It’s necessary to do quite a precise syntax analysis of the sentence — to determine the subject, the predicate, and other parts of the sentence, and then to build a sentence tree. Using it, the machine learns to convert syntactic units between languages and translates the rest by words or phrases. That would have solved the word alignment issue once and for all.

The problem is that syntactic parsing works terribly, despite the fact that we considered it solved a while ago (since there are ready-made libraries for many languages). I tried to use syntactic trees for tasks a bit more complicated than parsing the subject and the predicate, and every single time I gave up and used another method.

Let me know in the comments if you succeed using it at least once.

Neural Machine Translation (NMT)

A quite amusing paper on using neural networks in machine translation was published in 2014. The Internet didn’t notice it at all, except Google — they took out their shovels and started to dig. Two years later, in November 2016, Google made a game-changing announcement.

The idea was close to transferring the style between photos. Remember apps like Prisma, which enhanced pictures in some famous artist’s style? There was no magic. The neural network was taught to recognize the artist’s paintings. Next, the last layers containing the network’s decision were removed. The resulting stylized picture was just the intermediate image that the network got. That’s the network’s fantasy, and we consider it beautiful.

If we can transfer the style to a photo, what if we try to impose another language on a source text? The text would be that precise “artist’s style,” and we would try to transfer it while keeping the essence of the image (in other words, the essence of the text).

Imagine I’m trying to describe my dog — average size, sharp nose, short tail, always barks. If I gave you this set of the dog’s features, and if the description was precise, you could draw it, even though you have never seen it.

Now, imagine the source text is the set of specific features. Basically, it means that you encode it, and let the other neural network decode it back to text, but in another language. The decoder only knows its language. It has no idea about the features’ origin, but it can express them in, for example, Spanish. Continuing the analogy, it doesn’t matter how you draw the dog — with crayons, watercolor or your finger. You paint it as you can.

Once again — one neural network can only encode the sentence to the specific set of features, and another one can only decode them back to text. Neither has any idea about the other, and each of them knows only its own language. Recall something? Interlingua is back. Ta-da.

The question is, how do we find those features? It’s obvious when we’re talking about the dog, but how to deal with the text? Thirty years ago scientists already tried to create the universal language code, and it ended in a total failure.

Nevertheless, we have deep learning now. And that’s its essential task! The primary distinction between deep learning and classic neural networks lies precisely in the ability to search for those specific features, without any idea of their nature. If the neural network is big enough, and there are a couple of thousand video cards at hand, it’s possible to find those features in the text as well.

Theoretically, we can pass the features gotten from the neural networks to the linguists, so that they can open brave new horizons for themselves.

The question is, what type of neural network should be used for encoding and decoding? Convolutional Neural Networks (CNN) fit perfectly for pictures since they operate with independent blocks of pixels.

But there are no independent blocks in the text — every word depends on its surroundings. Text, speech, and music are always consistent. So recurrent neural networks (RNN) would be the best choice to handle them, since they remember the previous result — the prior word, in our case.

Now RNNs are used everywhere — Siri’s speech recognition (it’s parsing the sequence of sounds, where the next depends on the previous), keyboard suggestions (memorize the prior word, guess the next), music generation, and even chatbots.
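
To make the encoder-decoder idea concrete, here is a minimal sketch in PyTorch: one recurrent network compresses the source sentence into its “set of features” (a hidden state), and a separate network decodes those features into the target language. The layer sizes, vocabulary sizes and toy inputs are illustrative assumptions, not the architecture of any production system.

```python
# A toy encoder-decoder (seq2seq) sketch; sizes and inputs are made up.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, src_ids):            # src_ids: (batch, src_len) token ids
        _, state = self.rnn(self.embed(src_ids))
        return state                       # the "features" describing the sentence

class Decoder(nn.Module):
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tgt_ids, state):     # previously generated target tokens + features
        output, state = self.rnn(self.embed(tgt_ids), state)
        return self.out(output), state     # scores over the target vocabulary

# The encoder only ever sees the source language, the decoder only the target:
encoder, decoder = Encoder(vocab_size=8000), Decoder(vocab_size=9000)
features = encoder(torch.randint(0, 8000, (1, 6)))              # encode a 6-token sentence
logits, _ = decoder(torch.randint(0, 9000, (1, 1)), features)   # start decoding from the features
```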

In two years, neural networks surpassed everything that had appeared in the past 20 years of translation. Neural translation contains 50% fewer word order mistakes, 17% fewer lexical mistakes, and 19% fewer grammar mistakes. The neural networks even learned to harmonize gender and case in different languages. And no one taught them to do so.

The most noticeable improvements occurred in fields where direct translation was never used. Statistical machine translation methods always worked using English as the key source. Thus, if you translated from Russian to German, the machine first translated the text into English and then from English into German, which led to a double loss.

Neural translation doesn’t need that — only a decoder is required so it can work. That was the first time that direct translation between languages with no common dictionary became possible.

The conclusion and the future

Everyone’s still excited about the idea of “Babel fish” — instant speech translation. Google has made steps towards it with its Pixel Buds, but in fact, it’s still not what we were dreaming of. Instant speech translation is different from the usual translation. You need to know when to start translating and when to shut up and listen. I haven’t seen suitable approaches to solve this yet. Unless, maybe, Skype…

And here’s one more empty area: all the learning is limited to the set of parallel text blocks. The deepest neural networks still learn from parallel texts. We can’t teach a neural network without providing it with a source. People, instead, can complement their lexicon by reading books or articles, even without translating them into their native language.

If people can do it, the neural network can do it too, in theory. I found only one prototype attempting to incite the network, which knows one language, to read the texts in another language in order to gain experience. I’d try it myself, but I’m silly. Ok, that’s it.

Reference: https://bit.ly/2HCmT6v

How to become a localization project manager


Excerpts from an article with the same title, written by Olga Melnikova in Multilingual Magazine.  Olga Melnikova is a project manager at Moravia and an adjunct professor at the Middlebury Institute of International Studies. She has ten years of experience in the language industry. She holds an MA in translation and localization management and two degrees in language studies.

I decided to talk to people who have been in the industry for a while, who have seen it evolve and know where it’s going. My main question was: what should a person do to start a localization project manager career? I interviewed several experts who shared their vision and perspectives — academics, industry professionals and recruiters. I spoke with Mimi Moore, account manager at Anzu Global, a recruiting company for the localization industry; Tucker Johnson, managing director of Nimdzi Insights; Max Troyer, translation and localization management program coordinator at MIIS, and Jon Ritzdorf, senior solution architect at Moravia and an adjunct professor at the University of Maryland and at MIIS. All of them are industry veterans and have extensive knowledge and understanding of its processes.

Why localization project management?

The first question is: Why localization project management? Why is this considered a move upwards compared to the work of linguists who are the industry lifeblood? According to Renato Beninatto and Tucker Johnson’s The General Theory of the Translation Company, “project management is the most crucial function of the LSP. Project management has the potential to most powerfully impact an LSP’s ability to add value to the language services value chain.” “Project managers are absolutely the core function in a localization company,” said Johnson. “It is important to keep in mind that language services providers do not sell translation, they sell services. Project managers are responsible for coordinating and managing all of the resources that need to be coordinated in order to deliver to the client: they are managing time, money, people and technology.”


Nine times out of ten, Johnson added, the project manager is the face of the company to the client. “Face-to-face contact and building the relationship are extremely important.” This is why The General Theory of the Translation Company regards project management as one of the core functions of any language service provider (LSP). This in no way undermines the value of all the other industry players, especially linguists who do the actual translation work. However, the industry cannot do without PMs because “total value is much higher than the original translations. This added value is at the heart of the language services industry.” This is why clients are happy to pay higher prices to work with massive multiple-service providers instead of working directly with translators.

Who are they?

The next question is, how have current project managers become project managers? “From the beginning, when the industry started 20 years ago, there were no specialized training programs for project managers,” Troyer recounted. “So there were two ways. One is you were a translator, but wanted to do something else — become an editor, for example, or start to manage translators. The other route was people working in a business that goes global. So there were two types of people who would become project managers — former translators or people who were assigned localization as a job task.”

According to Ritzdorf, this is still the case in many companies. “I am working with project managers from three prospective clients right now, all of whom do not have a localization degree and are all in localization positions. Did they end up there because they wanted to? Maybe not. They did not end up there because they said ‘Wow, I really want to become a head of localization.’ They just ended up there by accident, like a lot of people do.”

“There are a lot of people who work in a company and who have never heard of localization, but guess what? It is their job now to do localization, and they have to figure it out all by themselves,” Moore confirmed. “When the company decides to go international, they have to find somebody to manage that,” said Ritzdorf.

Regionalization


The first to mention regionalization was Ritzdorf, and then other interviewees confirmed it exists. Ritzdorf lives on the East Coast of the United States, but comes to the West Coast to teach at MIIS, so he sees the differences. “There are areas where localization is a thing, which means when you walk into a company, they actually know about localization. Since there are enough people who understand what localization is, they want someone with a background in it.” Silicon Valley is a great example, said Ritzdorf. MIIS is close; there is a localization community that includes organizations like Women in Localization; and there are networking events like IMUG. “People live and breathe localization. However, there is a totally different culture in other regions, which is very fragmented. There are tons of little companies in other parts of the US, and the situation there is different. If I am a small LSP owner in Wisconsin or Ohio, what are my chances of finding someone with a degree or experience to fill a localization position for a project manager? Extremely low. This is why I may hire a candidate who has an undergraduate degree in French literature, for example. Or in linguistics, languages — at least something.”

The recruiters’ perspective


Nimdzi Insights conducted an interesting study about hiring criteria for localization project manager positions (Figure 1). Some 75 respondents (both LSPs and clients) were asked how important, on a scale of 1 to 5, a variety of qualifications are for project management positions. The responses show a few trends. Top priorities for clients are previous localization experience and a college degree, followed by years of experience and proficiency in more than one language. Top criteria for LSPs are reputation and a college degree, also followed by experience and proficiency in more than one language.

Moore said that when clients want to hire a localization project manager, the skills they are looking for are familiarity with computer-assisted translation (CAT) tools “and an understanding of issues that can arise during localization — like quality issues, for example. Compared to previous years, more technical skills are required by both clients and vendors: CAT tools, WorldServer, machine translation knowledge, sometimes WordPress or basic engineering. When I started, they were nice-to-haves, but certainly not mandatory.”

Technical skill is not enough, however. “Both hard and soft skills are important. You need hard skills because the industry has become a lot more technical as far as software, tools and automation are concerned. You need soft skills to deal with external and internal stakeholders, and one of the main things is working under pressure because you are juggling so many things.”

Moore also mentioned some red flags that would cause Anzu not to hire a candidate. “Sometimes an applicant does not demonstrate good English skills in phone interviews. Having good communication skills is important for a client-facing position. Also, people sometimes exaggerate their skills or experience. Another red flag is if the person has a bad track record (if they change jobs every nine months, for example).”

Anzu often hires for project management contract positions in large companies. “Clients usually come to us when they need a steady stream of contractors (three or six months), then in three or six months there will be other contractors. The positions are usually project managers or testers. If you already work full time, a contract position may not be that attractive. However, if you are a newcomer or have just graduated, and you want to get some experience, then it is a great opportunity. You would spend three, six or 12 months at a company, and it is a very good line on the résumé.”

Do you need a localization degree? 

There is no firm answer to the question of whether or not you need a degree. If you don’t know what you should do, it can certainly help. Troyer discussed how the localization program at MIIS has evolved to fit current real-world pressures. “The program was first started in 2004, and it started small. We were first giving CAT tools, localization project management and software localization courses. This is the core you need to become a project manager. Then the program evolved and we introduced introductory and then advanced levels for many courses. There are currently four or five courses focusing on translation technology.” Recent additions to the curriculum include advanced JavaScript classes, advanced project management and program management. Natural language processing and computational linguistics will be added down the road. “The industry is driving this move because students will need skills to go in and localize Siri into many languages,” said Troyer.

The program at MIIS is a two-year master’s. It can be reduced to one year for those who already have experience. There are other degrees available, as well as certification programs offered by institutions such as the University of Washington and The Localization Institute.

Moore said that though a localization degree is not a must, it has a distinct advantage. A lot of students have internships that give them experience. They also know tools, which makes their résumés better fit clients’ job descriptions.

However, both Troyer and Ritzdorf said you don’t necessarily need a degree. “If you have passion for languages and technology, you can get the training on your own,” said Troyer. “Just teach yourself these skills, network on your own and try to break into the industry.”

The future of localization project management

Automation, artificial intelligence and machine learning are affecting all industries, and localization is not an exception. However, all the interviewees forecast that there will be more localization jobs in the future.

According to Johnson, there is high project management turnover on the vendor side because if a person is a good manager, they never stay in this position for more than five years. “After that, they either get a job on the client’s side to make twice as much money and have a much easier job, or their LSP has to promote them to senior positions such as group manager or program director.”

“There is a huge opportunity to stop doing things that are annoying,” said Troyer. “Automation will let professionals work on the human side of things and let the machines run day-to-day tasks. Letting the machine send files back and forth will allow humans to spend more time looking at texts and thinking about what questions a translator can ask. This will give them more time for building a personal relationship with the client. We are taking these innovations into consideration for the curriculum, and I often spend time during classes asking, ‘How can you automate this?’”

Moore stated that “we have seen automation change workflows over the last ten years and reduce the project manager’s workload, with files being automatically moved through each step in the localization process. Also, automation and machine translation go hand-in-hand to make the process faster, more efficient and cost-effective.”

Uberization of Translation by Jonckers


WordsOnline Cloud Based Platform Explained…

Just over a year ago, Jonckers announced the launch of its unique cloud-based management platform, WordsOnline. The concept evolved from working in partnership with eCommerce customers, processing over 30 million words each month. Jonckers knew that faster time to market is key for sectors such as retail to get products and messages to their audience. They needed to keep up with this demand and build on their speedy solutions.

Jonckers identified that when dealing with higher volumes, the traditional batch-and-project methodology for processing translation was not as effective. Waiting weeks for large-volume deliveries, arranging thousands of files to allocate to multiple linguists and keeping trackers up to date was taking its toll! Quality Assurance checks were also putting on-time deliveries at risk – the batches allocated to linguists were simply too large and the timescales too long to manage QA within the timeframes.

It was clear a paradigm shift was needed. Jonckers’ conclusion: develop a technology-powered continuous delivery solution.

What is WordsOnline?

It’s a state-of-the-art, cloud-based TMS (Translation Management System) accommodating both the traditional localization workflow (project-based) and the continuous delivery model.

What is a continuous delivery model?

It is a model without handoffs or handbacks. Through an API, WordsOnline syncs with the customer’s system and downloads the content to be translated into the Jonckers-powered database. That content is then split into small sets of strings (defined on a case-by-case basis) and made immediately available to edit and translate online. It is based on the Uber business model of fast, efficient supply and demand. Jonckers’ resourcing team ensures premium resource capacity to guarantee content is continuously processed.

What type of content does WordsOnline process?

The purpose of the WordsOnline platform is fast turnaround. The content processed so far by this impressive system is mostly large-scale documentation, product descriptions and MT training material. However, the platform has been designed to process and deliver all file and content types, and to be adaptable to any volume, language, timeframe and file format.

What are the key advantages of using WordsOnline?

• Faster turnaround time – Jonckers are able to process massive amounts of data that, after translation, will be pushed to review and back to the customer in a continuous cycle.

• Price – WordsOnline applies TM, then Jonckers’ NMT engine or the customer’s engine if preferred. The volumes processed allow a more attractive and cost effective price point.

• Control – Project Managers can monitor the volume of words being processed, translated, reviewed and pushed back to the customer’s system. There are several other features which also allow rating of resources and analytics for a comprehensive overview of every job.

What are the key features of WordsOnline?

The WordsOnline linguist database interface includes a ratings platform so clients can monitor the delivery and quality of resources:

The live Dashboard interface allows clients to follow the progress of the content, performance of the MT engine, stats etc…

In short, the process is completely ‘Uberized’: Jonckers is making translation as simple as uploading your files… tracking the progress… receiving the final translation delivery! It’s as simple as that.

Reference: https://bit.ly/2HnoGjF

Exclusive Look Inside MemoQ Zen

MemoQ launched a beta version of MemoQ Zen, a new online CAT tool. MemoQ Zen brings you the joy of translation, without the hassle. Experience the benefits of an advanced CAT tool, delivered to your browser in a simple and clean interface. You can get early access through this link by adding your email address. Then, MemoQ’s team will activate your email address.

Note: it is preferable to use a Gmail account.

These are exclusive screenshots from inside MemoQ Zen, as our blog got early access:

Once the user logs in, this home page appears:

Clicking on adding a new job will lead to these details:

You can upload documents from your computer or add files from your Google Drive. The second option needs access to your drive. After choosing the files to be uploaded, you’ll complete the required details for adding new jobs.

In the working days field, MemoQ Zen excludes Saturdays and Sundays from the total workdays. This option helps in planning the actual days required to get the task done. After uploading the files and adding the details, a new job will be created on your job board.

Clicking on view statistics will lead to viewing the analysis report. Unfortunately, it can’t be saved.

Clicking on translate will lead to opening an online editor for the CAT tool.

TM and TB matches are shown in the right pane. Other regular options such as copying tags, joining segments, and concordance search are there. Preview mode can be enabled as well. Unfortunately, copying source to target isn’t available.

QA error alerts appear after confirming each segment. After clicking on the alert, the error will appear like this. You can check ignore, in case it is a false error.

While translating, the progress is updated in the main view.

Clicking on fetch will download the clean target file to your computer. TMs and TBs aren’t available to upload, add, create or even download yet.

Clicking on done will mark the job as completed.

That’s it! An easy, to-the-point tool with a clean UI and direct options. It still needs development to meet industry requirements (e.g. adding TMs and TBs), but it’s a good start, and as the MemoQ Zen website states:

We created memoQ Zen to prove that an advanced CAT tool doesn’t need to be complicated. It is built on the same memoQ technology that is used by hundreds of companies and thousands of translators every day.

We are releasing it as a limited beta because we want to listen to you from day one. As a gesture, it will also stay free as long as the beta phase lasts.

MemoQ’s First Release in 2018: MemoQ 8.4


Kilgray released memoQ 8.4, its first release for 2018. Improvements come in five main areas: user experience, terminology, filters, performance, and server workflows. Read on for details:

1- User Experience 

A- Customer Insights Program

The memoQ Customer Insights program will feature two major initiatives:

Usage Data Collection: When you work with memoQ, you are given the choice to enable sending data about how you use the software. Not all types of data will be collected. For more details, check here: https://goo.gl/fgppMh

The Design Lab: A loosely knit community where you can share your insights, opinions and knowledge. In exchange, we will evolve memoQ to be a user-friendly tool that meets your needs and solves your problems. For more details and how to join: https://goo.gl/dbHHeD

B- Comments in online projects

We have re-worked the way comments work in online projects. Now, project managers can delete any comment anywhere, while non-PM users can only delete their own comments. Also, users can edit only their own comments.

C- Task Tracker Progress Messages

In memoQ 8.4, Task Tracker progress messages are shown more consistently. From now on, the Task Tracker will display proper progress messages whenever TMs and TBs are exported. When you export a TM or TB, the message “In progress…” will be displayed as soon as the export begins, and “Done.” when the export completes.

In addition, you will be able to open the location where the export was saved by using the Open folder icon.

2- Terminology

A- Import and export term bases with images

With memoQ 8.4, you can now import and export term bases with images.

B- Forbidden terms in the spotlight

MemoQ 8.4 adds new functionality to work with forbidden terms more effectively and transparently. Forbidden terms will be marked in the term editor for easy identification and highlighted in red in exported and imported term bases.

C- Filters & QA settings

MemoQ 8.4 features small improvements that will facilitate the way you work with terminology while boosting efficiency. You can now determine which of the term bases assigned to the project you want to use for quality assurance in a specific project.

D- More effective stop word lists

MemoQ 8.4 improves stop word list functionality to make term extraction sessions more productive. By improving your stop word lists you can reduce the number of term candidates you need to process in a term extraction session.

E- Filter field in term extraction

MemoQ 8.4 introduces a more user-friendly filter field on the term extraction screen featuring the history of the term extraction session.

F- Smart search settings in QTerm

From now on, when you log into QTerm, you will see the same settings you used the last time on the search page (term bases to search, view, languages, term matching). This is particularly useful if you typically use QTerm for term lookup in a specific language combination and/or with specific term bases.

G- Entry relationships in QTerm

If you establish a symmetrical (homonym, synonym, antonym, cohyponym) or an anti-symmetrical (hyponym, hypernym) entry relationship in one entry with another, the corresponding relationship is also created in the other entry.

H- Easy Term Search

MemoQ 8.4 now offers memoQWeb external users simple and easy access to QTerm term bases for lookup.

I- Filtering Options

The “Begins with” filter condition and search option has been revamped and it now features a more user-friendly term matching interface.

3- Performance

A- Improvements in responsiveness​

When you download memoQ 8.4, you will experience performance improvements in the following areas:
  • Opening the memoQ dashboard,
  • Opening projects,
  • Opening translation documents,
  • Scrolling through resources,
  • Faster rendering of various screens.
Note: The degree of improvements in performance you experience depends on your hardware configuration.

B- MemoQ server back-up

With memoQ 8.4, backing up your server should be faster. We have improved the performance of this task by decreasing back-up time by up to 50%.

Note: The improvement in backup duration may not be significant for memoQ Servers running on SSD drives.

4- Document Import and Export

A- Import filter for subtitles and dubbing script

The new import filter in memoQ 8.4 can handle two subtitle formats:

  • .srt files
  • custom-made .xlsx.

The preview displaying live video will be a plugin based on Preview SDK.

B- ZIP Filter

The new filter offers a generic option for handling ZIP packages. It will display the files of the archive as embedded documents. It will also be possible to import only some of them.

5- Server-to-server Workflows

A- Lookup on Enterprise TM

Until now, memoQ servers around the world have resembled big powerful giants that are incapable of “talking to each other”. This is now going to change.
MemoQ is investing effort in developing this new technology, which will add significant value to customers using the following workflows:
  • Client + Vendor
  • Making use of several memoQ servers.

The projects created from packages now have direct access to the parent TMs. It is done through the child server, so firewalls can be configured to let the traffic through. Project Managers can deliver with one click.

Edit Distance in Translation Industry


In computational linguistics, edit distance, or Levenshtein distance, is a way of quantifying how dissimilar two strings (e.g., words) are by counting the minimum number of operations required to transform one string into the other. The edit distance between a and b is the minimum-weight series of edit operations that transforms a into b. One of the simplest sets of edit operations is the one defined by Levenshtein in 1966:

1- Insertion.

2- Deletion.

3- Substitution.

In Levenshtein’s original definition, each of these operations has unit cost (except that substitution of a character by itself has zero cost), so the Levenshtein distance is equal to the minimum number of operations required to transform a to b.

For example, the Levenshtein distance between “kitten” and “sitting” is 3. A minimal edit script that transforms the former into the latter is:

  • kitten – sitten (substitution of “s” for “k”).
  • sitten – sittin (substitution of “i” for “e”).
  • sittin – sitting (insertion of “g” at the end).
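
Here is a minimal sketch of the Levenshtein distance computed with the classic dynamic-programming approach; it reproduces the “kitten”/“sitting” example above.

```python
# Levenshtein distance via dynamic programming (unit costs, as defined above).
def levenshtein(a, b):
    # dist[i][j] = edit distance between a[:i] and b[:j]
    dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dist[i][0] = i                       # delete all of a[:i]
    for j in range(len(b) + 1):
        dist[0][j] = j                       # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # deletion
                             dist[i][j - 1] + 1,          # insertion
                             dist[i - 1][j - 1] + sub)    # substitution (or match)
    return dist[len(a)][len(b)]

print(levenshtein("kitten", "sitting"))      # 3, matching the example above
```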

What are the applications of edit distance in the translation industry?

1- Spell Checkers

Edit distance is applied in automatic spelling correction, which can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question.

2- Machine Translation Evaluation and Post Editing

Edit distance can be used to compare a postedited file to the machine-translated output that was the starting point for the postediting. When you calculate the edit distance, you are calculating the “effort” that the posteditor made to improve the quality of the machine translation to a certain level. Starting from the same source content and the same MT output, if you perform a light postediting and a full postediting, the edit distance for each task will be different: the full, human-quality postediting is expected to have a higher edit distance, because more changes are needed. This means that you can distinguish light and full postediting using the edit distance.

Therefore, the edit distance is a kind of “word count” measure of the effort, similar in a way to the word count used to quantify the work of translators throughout the localization industry. It also helps in evaluating the quality of an MT engine by comparing the raw MT output to the version postedited by a human translator.
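
As a rough illustration of measuring postediting effort, the sketch below counts word-level edits between a raw MT output and its postedited version. The word-level granularity and the example sentences are assumptions made for the illustration; the metrics actually used in production differ in their details.

```python
# Word-level edit distance as a crude "postediting effort" measure (illustrative).
def word_edit_distance(a, b):
    a, b = a.split(), b.split()
    dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dist[i][0] = i
    for j in range(len(b) + 1):
        dist[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1, dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)
    return dist[len(a)][len(b)]

raw_mt   = "The house are located near of the station"
postedit = "The house is located near the station"
print(word_edit_distance(raw_mt, postedit))   # 2: one substitution, one deletion
```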

3- Fuzzy Match

In translation memories, edit distance is the technique of finding strings that match a pattern approximately (rather than exactly). Translation memories provide suggestions to translators, and fuzzy matches are used to measure the effort made to improve those suggestions.
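
A toy illustration of fuzzy matching against a small TM is sketched below. It uses Python’s standard-library SequenceMatcher similarity as a stand-in for an edit-distance-based score, and the 75% threshold and sample TM entries are assumptions; commercial CAT tools use their own, more elaborate scoring.

```python
# Score TM segments against a new source segment and keep the best fuzzy hit.
from difflib import SequenceMatcher

tm = {
    "The printer is out of paper.": "Der Drucker hat kein Papier mehr.",
    "Press the OK button to continue.": "Drücken Sie OK, um fortzufahren.",
}

def best_fuzzy_match(segment, tm, threshold=0.75):
    best = max(tm, key=lambda s: SequenceMatcher(None, segment, s).ratio())
    score = SequenceMatcher(None, segment, best).ratio()
    return (best, tm[best], round(score * 100)) if score >= threshold else None

print(best_fuzzy_match("The printer is out of toner.", tm))
# The "paper" segment comes back as a high fuzzy match (roughly 89%),
# so the translator only adjusts the differing word instead of retranslating.
```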

LookAhead Feature – Towards Faster Translation Results

To facilitate your work in SDL Trados Studio 2017 SR1, SDL has powered it with the LookAhead feature. LookAhead is an in-memory lookup and retrieval mechanism which ensures that your translation search results are displayed quickly when you activate a segment for translation. LookAhead technology radically improves the retrieval speed of TM search results, especially for long or complex source text. Once your source text is loaded in SDL Trados Studio, the application starts matching source text strings against the available translation resources (TMs, termbases or machine translation) in the background for the next two segments after the current one. As a result, you are instantly provided with translation hits for each segment that has matching translation results.

How to enable LookAhead?

  1. Go to File, and select Options.
  2. In the Options dialog, in the navigation tree, expand Editor.
  3. Select Automation.
  4. Under Translation Memory, select the Enable LookAhead checkbox.

How to Cut Localization Costs with Translation Technology


What is translation technology?

Translation technologies are sets of software tools designed to process translation materials and help linguists in their everyday tasks. They are divided into three main subcategories:

Machine Translation (MT)

Translation tasks are performed by machines (computers) either on the basis of statistical models (MT engines execute translation tasks on the basis of accumulated translated materials) or neural models (MT engines based on artificial intelligence). The computer-translated output is edited by professional human linguists through the process of postediting, which may be more or less demanding depending on the language combination, the complexity of the materials and the volume of content.

Computer-Aided Translation (CAT)

Computer-aided or computer-assisted translation is performed by professional human translators who use specific CAT or productivity software tools to optimize their process and increase their output.

Providing a perfect combination of technological advantages and human expertise, CAT software packages are the staple tools of the language industry. CAT tools are essentially advanced text editors that break the source content into segments and split the screen into source and target fields, which in and of itself makes the translator’s job easier. However, they also include an array of advanced features that enable the optimization of the translation/localization process, enhance the quality of output and save time and resources. For this reason, they are also called productivity tools.
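
As a toy illustration of the segmentation just described (and not any particular tool’s segmentation rules), the sketch below breaks source text into sentence segments and pairs each one with an empty target field for the translator to fill in.

```python
# Naive sentence segmentation into a source/target grid (illustrative only).
import re

def segment(text):
    # Split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s.strip()]

source = "The printer is out of paper. Load A4 paper into tray 1! Then press OK."
grid = [{"source": seg, "target": ""} for seg in segment(source)]
for row in grid:
    print(row)
```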

Figure 1 – CAT software in use

The most important features of productivity tools include:

  • Translation Asset Management
  • Advanced grammar and spell checkers
  • Advanced source and target text search
  • Concordance search.

Standard CAT tools include Across Language Server, SDL Trados Studio, SDL GroupShare, SDL Passolo, memoQ, Memsource Cloud, Wordfast, Translation Workspace and others, and they come both in the form of installed software and cloud solutions.

Quality Assurance (QA)

Quality assurance tools are used for various quality control checks during and after the translation/localization process. These tools use sophisticated algorithms to check spelling, consistency, general and project-specific style, code and layout integrity and more.

All productivity tools have built-in QA features, but there are also dedicated quality assurance tools such as Xbench and Verifika QA.

What is a translation asset?

We all know that information has value and the same holds true for translated information. This is why previously translated/localized and edited textual elements in a specific language pair are regarded as translation assets in the language industry – once translated/localized and approved, textual elements do not need to be translated again and no additional resources are spent. These elements that are created, managed and used with productivity tools include:

Translation Memories (TM)

Translation memories are segmented databases containing previously translated elements in a specific language pair that can be reused and recycled in further projects. Productivity software calculates the percentage of similarity between the new content for translation/localization and the existing segments that were previously translated, edited and proofread, and the linguist team is able to access this information, use it and adapt it where necessary. This percentage has a direct impact on costs associated with a translation/localization project and the time required for project completion, as the matching segments cost less and require less time for processing.
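
To illustrate how match percentages can translate into costs, here is a toy weighted-word-count sketch. The match bands and discount weights are purely illustrative assumptions; every tool and LSP defines its own grid.

```python
# Map fuzzy-match percentages to weighted word counts for pricing (toy example).
MATCH_BANDS = [       # (minimum match %, share of the full word rate)
    (100, 0.10),
    (95, 0.30),
    (85, 0.60),
    (75, 0.80),
    (0, 1.00),        # "no match": full rate
]

def weighted_words(word_count, match_percent):
    for minimum, weight in MATCH_BANDS:
        if match_percent >= minimum:
            return word_count * weight

# A 10,000-word project, split by match category (match % -> word count):
analysis = {100: 2500, 92: 1500, 80: 2000, 0: 4000}
total = sum(weighted_words(words, match) for match, words in analysis.items())
print(total)          # 6750.0 weighted words instead of 10,000
```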

Figure 2 – Translation memory in use (aligned sample from English to German)

Translation memories are usually developed during the initial stages of a translation/localization project and they grow over time, progressively cutting localization costs and reducing the time required for project completion. However, translation memories require regular maintenance, i.e. cleaning for this very reason, as the original content may change and new terminology may be adopted.

In cases where an approved translation of a document exists but was produced without productivity tools, translation memories can be created through the process of alignment:

Figure 3 – Document alignment example

Source and target documents are broken into segments that are subsequently matched to produce a TM file that can be used for a project.

Termbases (TB)

Termbases or terminology bases (TB) are databases containing translations of specific terms in a specific language pair that provide assistance to the linguist team and assure lexical consistency throughout projects.

Termbases can be developed before the project, when specific terminology translations have been confirmed by all stakeholders (client, content producer, linguist), or during the project, as the terms are defined. They are particularly useful in the localization of medical devices, technical materials and software.

Glossaries

Unlike termbases, glossaries are monolingual documents explaining specific terminology in either source or target language. They provide further context to linguists and can be used for the development of terminology bases.

Benefits of Translation Technology

The primary purpose of all translation technology is the optimization and unification of the translation/localization process, as well as providing the technological infrastructure that facilitates work and full utilization of the expertise of professional human translators.

As we have already seen, translation memories, once developed, provide immediate price reduction (that varies depending on the source materials and the amount of matching segments, but may run up to 20% in the initial stages and it may only grow over time), but the long-term, more subtle benefits of the smart integration of translation technology are the ones that really make a difference and they include:

Human Knowledge with Digital Infrastructure

While it has a limited application, machine translation still does not yield satisfactory results that can be used for commercial purposes. All machine translations need to be postedited by professional linguists and this process is known to take more time and resources instead of less.

On the other hand, translation performed in productivity tools is performed by people, translation assets are checked and approved by people, specific terminology is developed in collaboration with the client, content producers, marketing managers, subject-field experts and all other stakeholders, eventually providing a perfect combination of human expertise, feel and creativity, and technological solutions.

Time Saving

Professional human linguists are able to produce more in less time. Productivity software, TMs, TBs and glossaries all reduce the valuable hours of research and translation, and enable linguists to perform their tasks in a timely manner, with technological infrastructure acting as a stylistic and lexical guide.

This eventually enables the timely release of a localized product/service, with all the necessary quality checks performed.

Consistent Quality Control

The use of translation technology itself represents real-time quality control, as linguists rely on previously proofread and quality-checked elements, and maintain the established style, terminology and quality used in previous translations.

Brand Message Consistency

Translation assets enable the consistent use of a particular tone, style and intent of the brand in all translation/localization projects. This means that the specific features of a corporate message for a particular market/target group will remain intact even if the linguist team changes on future projects.

Code / Layout Integrity Preservation

Translation technology enables the preservation of features of the original content across translated/localized versions, regardless of whether the materials are intended for printing or online publishing.

Different solutions are developed for different purposes. For example, advanced cloud-based solutions for the localization of WordPress-powered websites enable full preservation of codes and other technical elements, save a lot of time and effort in advance and optimize complex multilingual localization projects.

Wrap-up

In a larger scheme of things, all these benefits eventually spell long-term cost/time savings and a leaner translation/localization process due to their preventive functions that, in addition to direct price reduction, provide consistency, quality control and preservation of the integrity of source materials.

Reference: https://goo.gl/r5kmCJ

Wordfast Releases Wordfast Pro 5.4 and Wordfast Anywhere 5.0


Wordfast today released version 5.4 of its platform independent desktop tool, Wordfast Pro. Notable features and improvements include Adaptive Transcheck, a new Segment Changes report format, a new feedback proxy tool, and the ability to connect to Wordfast Anywhere TMs and glossaries. This latest feature puts the power of server-based TMs and glossaries into the hands of desktop users for free.

Wordfast also recently released Wordfast Anywhere 5.0 which includes a localized user interface (UI) in French and Spanish. The UI is ready to be translated to other languages with a collaborative translation page accessible through a user’s profile.

Wordfast will be showcasing the interconnectivity of Wordfast Pro and Wordfast Anywhere during its 4th annual user conference – Wordfast Forward – to take place on June 1-2, 2018 in Cascais, Portugal. For more details about the program, please see the dedicated conference page.

Reference: https://goo.gl/hstxKp