Tag: Translation Technology Development

AI Interpreter Fail at China Summit Sparks Debate about Future of Profession

Tencent’s AI-powered translation engine, which was supposed to perform simultaneous transcription and interpreting at China’s Boao Forum for Asia last week, faltered badly and became the butt of jokes on social media. It even made headlines in the South China Morning Post, Hong Kong’s main English newspaper – which, incidentally, is owned by Tencent’s key rival Alibaba.

The Boao Forum, held in Hainan Province on April 8-11, 2018, is an annual nonprofit event that was started in 2001. Supported by the region’s governments, its purpose is to further progress and economic integration in Asia by bringing together leaders in politics, business and academia for high-end dialogs and networking.

Tencent is one of the tech giants of China, part of the trio often dubbed “B.A.T.” (for Baidu, Alibaba, Tencent; sometimes BATX if one includes Xiaomi). Its best-known products include the instant messengers WeChat and QQ, everyday apps used by just about all Chinese citizens as well as other ethnic Chinese around the world.

In its local Chinese version, WeChat is an all-round, full-service lifestyle mobile app. Users can do just about anything in it these days – from buying train and movie tickets to making mutual fund investments to ordering groceries or an hourly maid from the neighbourhood.

In 2017, Tencent rolled out an AI-powered translation engine called “Fanyijun”, which literally translates to “Mr. Translate”, since the Chinese character “jun” is a polite, literary term for a male person.

What Went Wrong?

Fanyijun already powers the in-app translator in WeChat and is also available as a free online service. However, it was supposed to make a high-profile debut at the Boao Forum together with Tencent’s “Zhiling” (literally, “Smart Listening”) speech recognition engine, showcasing the company’s ability to do real-time transcription and interpreting. In retrospect, the publicity effort backfired on Tencent.

To be sure, human interpreters were still on hand to do the bulk of the interpreting work during the forum. However, Tencent used its AI engine to power the live translation and broadcast of some of the side conferences to screens next to the stage and for followers of the event within WeChat.

Many users took screenshots of the embarrassing results: the engine frequently went haywire, generating certain words needlessly and repeatedly, and got confused when some speakers spoke in an unstructured manner or used certain terminology incorrectly.

Chinese media cited a Tencent spokesperson who admitted that the system “did make errors” and “answered a few questions wrongly”. In its defense, he noted that the Boao Forum was a high-level, multi-faceted, multi-speaker, multilingual, discussion-based event, and that the environment was sometimes filled with echo and noise, which added to the challenges the system faced.

“They still need humans…”

The gloating hit a crescendo when someone circulated a screenshot from a WeChat group composed of freelance interpreters. It was an urgent request for English simultaneous interpreters to do a live webcast later that day for the Boao Forum.

One group member replied, “They still need humans…” Another said, “Don’t they have an interpreter device?” A third sarcastically added, “Where’s the AI?”

Tencent later clarified that this request was meant for engaging interpreters for their professional news team doing live reporting in Beijing, and not for the simultaneous interpreting team located onsite at the Boao Forum.

Tencent reportedly beat other heavyweight contenders such as Sogou and iFlytek to secure this prestigious demo opportunity at the Boao Forum after a three-month selection process. Sogou, the second-largest search engine in China, also provides a free online translator, built in part by leveraging its investment in Chinese startup UTH International, which provides translation data and NMT engines. iFlytek is a listed natural language processing (NLP) company with a market capitalization of about USD 13 billion. Its speech recognition software is reportedly used daily by half a billion Chinese users, and it also sells a popular pocket translation device targeted at Chinese tourists going abroad.

But given what went down at the Boao Forum for “Mr. Translate”, Tencent’s competitors are probably seeing their ‘loss’ as a gain now. The social media gloating aside, the incident has sparked an active online debate on the ‘what and when’ of AI replacing human jobs.

One netizen said on Sina Weibo, “A lot of people who casually say that AI can replace this or that job, are those who do not really understand or know what those jobs entail; translation included.”

However, Sogou news quoted a veteran interpreter who often accompanied government leaders on overseas visits. She said, “As an interpreter for 20 years, I believe AI will replace human translators sooner or later, at least in most day to day translation and the majority of conference interpreting. The former probably in 3-5 years, the latter in 10 years.”

She added that her opinion was informed by the fact that she frequently did translation work for IT companies, so she was well aware of the speed at which AI and processor chips were advancing. Hence, she did not encourage young people to view translation and interpreting as a lifelong career, considering it a sunset industry.

Reference: https://bit.ly/2qGLhxu

XTM International Announces XTM Cloud v11.1

London, April 16, 2018 — XTM International has released a new version of XTM Cloud. Building on the success of XTM v11, the new version adds many new features requested by users.

The integration with Google Sheets is a breakthrough achievement. XTM Connect for Google Sheets is intuitive and collaborative. Localization managers can push content for translation directly from chosen columns or entire sheets. Completed translations are delivered into specified cells and can be instantly shared with the rest of the team. The process is fully automated and involves no copy/pasting or file exports. As a result, translation takes less time, and there are no version conflicts between the localized documents and newer versions updated by copywriters.

Projects in XTM can now be assigned to language leads or in-house translators. The new user role has the rights to view and manage projects for its specified target languages. In-house translators can thus translate texts themselves or outsource them, depending on need and workload. In effect, they can reduce turnaround time and gain extra flexibility to manage source-text overflow.

“Our development strategy is focused on enhancing XTM with features that provide maximum value to our Enterprise and LSP users. We are delighted to release XTM Cloud v11.1, as it delivers a very useful set of enhancements to our growing customer base,” said Bob Willans, CEO of XTM International.

Other main features include a new connector for Kentico, support for markdown (.md) source files, options to color or penalize language variant matches, and new REST and SOAP API methods.

For additional information about XTM and its new features, please visit https://xtm.cloud/release-notes/11.1/.

Reference: https://bit.ly/2HvnQS7

Uberization of Translation by Jonckers

WordsOnline Cloud-Based Platform Explained…

Just over a year ago, Jonckers announced the launch of its unique cloud-based management platform, WordsOnline. The concept evolved from working in partnership with eCommerce customers, processing over 30 million words each month. Jonckers knew that faster time to market is key for sectors such as retail to get products and messages to their audiences, and it needed to keep up with this demand and build on its speedy solutions.

Jonckers identified that, at higher volumes, the traditional batch-and-project methodology for processing translation was not as effective. Waiting weeks for large-volume deliveries, arranging thousands of files for allocation to multiple linguists, and keeping trackers up to date was taking its toll. Quality assurance checks were also putting on-time deliveries at risk: the batches allocated to linguists were simply too large, and the timescales too long, to manage QA within the required timeframes.

It was clear a paradigm shift was needed. Jonckers’ conclusion: develop a technology-powered continuous delivery solution.

What is WordsOnline?

It’s a state-of-the-art, cloud-based TMS (translation management system) accommodating both the traditional, project-based localization workflow and the continuous delivery model.

What is a continuous delivery model?

It is a model without handoffs or handbacks. Through an API, WordsOnline syncs with the customer’s system and downloads the content to be translated into the Jonckers-powered database. That content is then split into small sets of strings (defined on a case-by-case basis) and made immediately available to translate and edit online. It is based on the Uber business model of fast, efficient supply and demand.
Jonckers’ resourcing team ensures premium resource capacity to guarantee content is continuously processed.
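The splitting step of this model can be sketched in Python as follows; the batch size and segment names are illustrative assumptions, not Jonckers’ actual parameters:

```python
from itertools import islice

def batch_strings(segments, batch_size=20):
    """Yield small batches of strings so each batch can be claimed and
    translated immediately, instead of one large project hand-off."""
    it = iter(segments)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# 95 incoming product descriptions split into batches of 20:
segments = [f"Product description {i}" for i in range(95)]
batches = list(batch_strings(segments, batch_size=20))
print([len(b) for b in batches])  # → [20, 20, 20, 20, 15]
```

Small batches are what make the continuous cycle possible: each one can be assigned, translated, and QA-checked independently while the rest of the content is still arriving.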

What type of content does WordsOnline process?

The purpose of the WordsOnline platform is fast turnaround. The content processed so far is mostly large-scale documentation, product descriptions, and MT training material. However, the platform has been designed to process and deliver all file and content types; it was developed specifically to adapt to any volume, language, timeframe, and file format.

What are the key advantages of using WordsOnline?

• Faster turnaround time – Jonckers can process massive amounts of data that, after translation, are pushed to review and back to the customer in a continuous cycle.

• Price – WordsOnline applies TM, then Jonckers’ NMT engine (or the customer’s engine, if preferred). The volumes processed allow a more attractive and cost-effective price point.

• Control – Project Managers can monitor the volume of words being processed, translated, reviewed, and pushed back to the customer’s system. Several other features allow rating of resources and analytics for a comprehensive overview of every job.

What are the key features of WordsOnline?

The WordsOnline linguist database interface includes a ratings platform so clients can monitor the delivery and quality of resources:

The live Dashboard interface allows clients to follow the progress of the content, the performance of the MT engine, stats, etc.

In short, the process is completely ‘Uberized’: Jonckers makes translation as simple as uploading your files, tracking the progress, and receiving the final delivery. It’s as simple as that.

Reference: https://bit.ly/2HnoGjF

DQF: What Is It and How Does It Work?

What does DQF stand for?

DQF stands for the Dynamic Quality Framework. Quality is considered “dynamic” because translation quality requirements change depending on the content type, the purpose of the content, and its audience.

Why is DQF the industry benchmark?

DQF has been co-created since January 2011 by over fifty companies and organizations. Contributors include translation buyers, translation service providers, and translation technology suppliers. Practitioners continue to define requirements and best practices as they evolve through regular meetings and events.

How does DQF work?

DQF provides a commonly agreed approach to selecting the most appropriate translation quality evaluation model(s) and metrics for specific quality requirements. The underlying process, technology, and resources affect the choice of quality evaluation model. The DQF Content Profiling, Guidelines, and Knowledge base are used when creating or refining a quality assurance program. DQF provides a shared language, guidance on process, and standardized metrics to help users execute quality programs more consistently and effectively, improving efficiency within organizations and throughout supply chains. The result is increased customer satisfaction and a more credible quality assurance function in the translation industry.

The Content Profiling feature is used to help select the most appropriate quality evaluation model for specific requirements. This leads to the Knowledge base where you find best practices, metrics, step-by-step guides, reference templates, and use cases. The Guidelines are publicly available summaries for parts of the Knowledge base as well as related topics.

What is included in DQF?

1. Content Profiling and Knowledge base

The DQF Content Profiling Wizard is used to help select the most appropriate quality evaluation model for specific requirements. In the Knowledge Base you find supporting best practices, metrics, step-by-step guides, reference templates, use cases and more.

2. Tools

A set of tools that allows users to do different types of evaluations: adequacy, fluency, error review, productivity measurement, MT ranking and comparison. The DQF tools can be used in the cloud, offline or indirectly through the DQF API.

3. Quality Dashboard

The Quality Dashboard is available as an industry-shared platform. In the dashboard, evaluation and productivity data is visualized in a flexible reporting environment. Users can create customized reports or filter data to be reflected in the charts. Both internal and external benchmarking is supported, offering the possibility to monitor one’s own development and to compare results to industry highs, lows and averages.

4. API

The DQF API allows users to assess productivity, efficiency and quality on the fly while in the translation production mode. Developers and integrators are invited to use the API and connect with DQF from within their TMS or CAT tool environments.
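As an illustration of what such an integration might look like from inside a CAT tool, here is a hypothetical Python sketch that packages per-segment productivity data for reporting. The field names and the commented endpoint are invented for the example; they are not the actual DQF API, whose documentation should be consulted for real integrations.

```python
import json

def build_segment_event(project_id, source, target, edit_time_ms):
    """Package per-segment productivity data that a CAT-tool plugin
    might report on the fly while the translator works.
    All field names here are illustrative assumptions."""
    return json.dumps({
        "projectId": project_id,
        "sourceSegment": source,
        "targetSegment": target,
        "editTimeMs": edit_time_ms,
    })

payload = build_segment_event("demo-42", "Hello world", "Hallo Welt", 5300)
# A plugin would then POST this payload to the quality platform, e.g.:
# requests.post(BASE_URL + "/segments", data=payload, headers=auth_headers)
print(payload)
```

The point of such an API is that quality and productivity data flow out of the translation environment automatically, with no manual export step.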

Reference: TAUS

SDL and TAUS Integration Offers Brands Industry Benchmarking Framework

SDL, a leader in global content management, translation and digital experience, today announced an integration between SDL Translation Management System (TMS), and the TAUS Dynamic Quality Framework (DQF), a comprehensive set of tools that help brands benchmark the quality, productivity and efficiency of translation projects against industry standards.

The SDL TMS integration with TAUS DQF enables everyone involved in the translation supply chain – translators, reviewers, and managers – to improve the performance of their translation projects by learning from peers and implementing industry best practice. Teams can also use TAUS’ dynamic tools and models to assess and compare the quality of their translation output – both human and machine – with the industry’s averages for errors, fluency, and post-editing productivity.

This enables brands to maintain quality – at extreme scale – and eliminate inefficiencies in the way content is created, managed, translated, and delivered to global audiences.

“One marketing campaign alone could involve translating over 50 pieces of content – and that’s just in one language. Imagine the complexity involved in translating content into over a dozen languages?” said Jim Saunders, Chief Product Officer, SDL. “Brands need a robust way to ensure quality when dealing with such high volumes of content. Our ongoing integrations with TAUS DQF tackle this challenge by fostering a knowledge-filled environment that creates improved ways to deliver and translate content.”

“Translating large volumes of content quickly can present enormous quality issues, and businesses are increasingly looking to learn from peers – and implement best-practices that challenge traditional approaches,” said TAUS Director, Jaap van der Meer. “Our development teams have worked closely with SDL to develop an integration that encourages companies not just to maintain high standards, but innovate and grow their business.”

The TAUS DQF offers a comprehensive set of tools, best practices, metrics, reports and data to help the industry set benchmarking standards. Its Quality Dashboard is available as an industry-shared platform, where evaluation and productivity data is presented in a flexible reporting environment. SDL TMS, now integrated within the TAUS DQF, is used by many Fortune 500 companies across most industries.

SDL already provides TAUS-ready packages for enterprise customers with its other language solutions. Customers of SDL WorldServer benefit from a connector to the TAUS DQF platform, enabling project managers to add and track a project’s productivity on the TAUS Quality Dashboard. Users can access both SDL WorldServer and SDL TMS through their SDL Trados Studio desktop, making it easy to share projects with the TAUS platform.

All SDL’s integrations with TAUS are designed to help centralize and manage a brand’s translation operations, resulting in lower translation costs, higher-quality translations and more efficient translation processes.

Reference: https://bit.ly/2EslqhA

Exclusive Look Inside MemoQ Zen

MemoQ launched a beta version of MemoQ Zen, a new online CAT tool. MemoQ Zen brings you the joy of translation without the hassle: the benefits of an advanced CAT tool, delivered to your browser in a simple and clean interface. You can get early access through this link by adding your email address; MemoQ’s team will then activate it.

Note: it is preferable to use a Gmail account.

These are exclusive screenshots from inside MemoQ Zen, as our blog got early access:

Once the user logs in, this home page appears:

Clicking on adding new job will lead to these details:

You can upload documents from your computer or add files from your Google Drive; the second option requires access to your drive. After choosing the files to be uploaded, you’ll complete the required details for the new job.

In the working days field, MemoQ Zen excludes Saturdays and Sundays from the total workdays. This option helps in planning the actual days required to get the task done. After uploading the files and adding the details, a new job will appear on your job board.

Clicking on view statistics shows the analysis report. Unfortunately, it can’t be saved.

Clicking on translate opens the CAT tool’s online editor.

TM and TB matches are shown in the right pane. Other standard options, such as copying tags, joining segments, and concordance search, are there. A preview mode can be enabled as well. Unfortunately, copying source to target isn’t available.

QA error alerts appear after confirming each segment. After clicking on an alert, the error appears like this. You can check ignore in case it is a false positive.

While translating, the progress updates in the main view.

Clicking on fetch downloads the (clean) target file to your computer. TMs and TBs can’t yet be uploaded, added, created, or even downloaded.

Clicking on done will mark the job as completed.

That’s it! An easy, to-the-point tool with a clean UI and direct options. It still needs development to meet industry requirements (e.g., adding TMs and TBs), but it’s a good start, and as the MemoQ Zen website states:

We created memoQ Zen to prove that an advanced CAT tool doesn’t need to be complicated. It is built on the same memoQ technology that is used by hundreds of companies and thousands of translators every day.

We are releasing it as a limited beta because we want to listen to you from day one. As a gesture, it will also stay free as long as the beta phase lasts.

New Frontiers in Linguistic Quality Evaluation

When it comes to translating content, everyone wants to find the highest quality translation at the lowest price. A recent report on the results of an industry survey of over 550 respondents revealed that translation quality is over four times more important than cost. For translation buyers, finding a language service provider (LSP) that can consistently deliver high quality is significantly more important than price. For LSPs, finding linguists who can deliver cost-effective, high-quality translation is key to customer happiness.

Meeting quality expectations is even more difficult with the demand for higher volume. The Common Sense Advisory Global Market Survey 2018 of the top 100 LSPs predicts a trend toward continued growth. Three-quarters of those surveyed reported increased revenue and 80 percent reported an increase in work volume. That’s why improving and automating the tools and processes for evaluating linguistic quality are more important than ever. LSPs and enterprise localization groups need to look for quality evaluation solutions that are scalable and agile enough to meet the growing demand.

Evaluating the quality of translation is a two-step process. The first step involves Quality Assurance (QA) systems and tools used by the translator to monitor and correct the quality of their work, and the second step is Translation Quality Assessment (TQA) or Linguistic Quality Evaluation (LQE), which evaluates quality using a model of defined values, parameters, and scoring based on representative sampling.

The Current State of Linguistic Quality Evaluation & Scoring

Many enterprise-level localization departments have staff specifically dedicated to evaluating translation quality. The challenge for these quality managers is creating and maintaining an easy-to-use system for efficiently scoring vendor quality.

Today, even the most sophisticated localization departments resort to spreadsheets and labor-intensive manual processes. The most commonly used LQE and scoring methods rely on offline, sequential processing. A March 2017 Common Sense Advisory, Inc. brief, “Translation Quality and In-Context Review Tools,” observed that the most widely used translation quality scorecards “suffer from a lack of integration.”

“Many LSPs continue to rely on in-house spreadsheet-based scorecards. They may be reluctant to switch to other tools that require process changes or that would raise direct costs. Unfortunately, these simple error-counting tools are typically inefficient because they don’t tie in with other production tools and cannot connect errors to specific locations in translated content. In addition, they are seldom updated to reflect TQA best practices, and it is common for providers to use scorecards that were developed many years earlier with unclear definitions and procedures.”

In an age of digital transformation and real-time cloud technology, LQE is overdue for an automated, integrated solution.

Reducing manual processes = reducing human error

One critical step to ensure quality translation is to reduce the number of manual processes and to automate evaluation as much as possible. There is a direct correlation between the number of manual processes and the increased likelihood of errors. These usually occur when cutting and pasting from the content management system (CMS) into spreadsheets and back again.

Evaluation scorecards, typically managed with spreadsheets, are very labor intensive. The spreadsheets usually include columns for languages, projects, sample word counts, categories, and error types. They also can include complex algorithms for scoring severity. To evaluate quality segment by segment requires copying and pasting what was corrected, the severities of each, etc.

To perform sample testing, localization quality managers extract some percentage of the total project to examine. If the project contains thousands of documents, they may use an equation – ten percent of the total word count, for example. They will then export those documents, load them into ApSIC Xbench, Okapi CheckMate, or some other tool for checking quality programmatically, and open a spreadsheet to enter quality feedback and/or issues. When the quality evaluation is complete, it is cut and pasted back into the CAT tool, often with annotations.
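A minimal sketch of the ten-percent sampling rule described above, assuming per-document word counts are known (the document names and the simple alphabetical selection order are illustrative):

```python
def sample_for_review(doc_word_counts, fraction=0.10):
    """Pick documents (in a fixed order) until at least `fraction`
    of the project's total word count is covered."""
    total = sum(doc_word_counts.values())
    target = total * fraction
    chosen, covered = [], 0
    for name, words in sorted(doc_word_counts.items()):
        if covered >= target:
            break
        chosen.append(name)
        covered += words
    return chosen, covered

# Hypothetical project: 6,500 words across four files
docs = {"guide.docx": 4000, "ui.xlsx": 800, "faq.html": 1200, "notes.txt": 500}
chosen, covered = sample_for_review(docs)
print(chosen, covered)  # the sample covers at least 10% of the total words
```

In practice a quality manager might sample by content type or risk level rather than alphabetically, but the word-count target works the same way.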

LSPs resort to these less-than-desirable scoring methods because, until now, there haven’t been any tools on the market to create or administer a quality program at scale.

The New Frontier of Linguistic Quality Evaluation

Centralized quality management inside a TMS

Top-tier cloud translation management system (TMS) platforms now have the ability to make assessing vendor quality easier and more automated with LQE and scoring inside the TMS. It can be purchased as a TMS add-on or clients can outsource quality evaluation and assessment to LSPs offering quality services using this innovative LQE technology.

The centralized storage of information and the agile change management that a full API and cloud technology can provide eliminates the need to rely on error-prone manual processes. It centralizes quality management, supports flexible and dynamic scoring, and incorporates LQE as a seamless part of the workflow.

Currently, localization quality managers have to go into the TMS to get their sample, bulk select and download the information. With integrated LQE, there are no offline tasks to slow down the evaluation process or that can lead to human error. Quality evaluation is easily added to the workflow template by selecting from a list of published quality programs. From there, tasks are automatically assigned, and quality evaluation is performed in an integrated CAT tool/workbench, including running programmatic quality checks on the translated content.

Creating an LQE program inside the TMS

Creating and setting up a quality program can be challenging and time-consuming, but it ensures that everyone identifies quality issues the same way, which simplifies and improves communication about what constitutes quality. It requires a sophisticated level of experience: those who aren’t particularly skilled at LQE run the risk of costly inefficiencies and unreliable reporting.

The latest LQE software has the ability to base a quality program on an industry standard, such as the TAUS Dynamic Quality Framework (DQF) or the EU Multidimensional Quality Metrics (MQM). Because these standards can be overly complex and may contain more error types than needed, the software allows you to create a custom quality program by selecting elements of each.

Define error types, categories and severities

Inside the TMS, quality managers can create and define the core components of their quality program by defining error types, categories, and severities.

Severity levels range from major (errors that can affect product delivery or legal liability) to minor (errors that don’t impact comprehension but could have been stated more clearly). An error-rate model counts the errors and produces a percentage score, starting at 100% and deducting points for each error. Because it is important to differentiate how serious each error is, a numerical multiplier is applied to account for severity. The less common rubric model begins at zero, and points are added if the translation meets specific requirements – for example, awarding points for adherence to terminology and style guides.
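The error-rate model with severity multipliers can be sketched as follows; the severity weights and the per-1,000-words baseline are illustrative assumptions, since each quality program defines its own values:

```python
# Illustrative severity multipliers -- real quality programs define their own.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def error_rate_score(errors, word_count, per_words=1000):
    """Error-rate model: start at 100% and deduct severity-weighted
    error points, normalised to a fixed word-count baseline so that
    documents of different sizes are comparable."""
    points = sum(SEVERITY_WEIGHTS[severity] for severity in errors)
    normalised = points * per_words / word_count
    return max(0.0, 100.0 - normalised)

# A 2,000-word sample with two minor errors and one major error:
print(error_rate_score(["minor", "minor", "major"], word_count=2000))  # → 96.5
```

Normalising to a standard word count is what gives the apples-to-apples comparison: the same three errors in a 500-word sample would cost four times as many points as in this 2,000-word sample.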

Publishing

After creating your quality program, you need to think about how you are going to publish and distribute the quality program. Change management can become a nightmare if the program isn’t centralized. A cloud-based program allows you to publish, change, and unpublish quickly, so if you make an adjustment to a severity level designation, you have the ability to notify all users of the change immediately.

A cloud LQE app lets you keep prior versions of quality programs for historical reference, so translations will be held to the standards that applied at the time of translation, and not necessarily the most current standard. If your TMS doesn’t include this functionality, consider publishing your quality program on a wiki or in one of the many options for cloud-storage. This provides a centralized place that everyone is referring back to, instead of an offline spreadsheet.

Flexible and dynamic scoring

Scorecards, as CSA mentioned, need to be dynamic – based on content type, domain, connector, etc. – to manage translation in and out of the translation technology. Not all content requires the same quality level. A discussion forum or blog post may not need the level of review that a legal document or customer-facing brochure might require. The new frontier in flexible and dynamic scoring is an algorithm that can set up scorecards automatically depending on content type.

The algorithm also lets you establish a standardized word count as a baseline for comparing quality scores among documents of different sizes. This gives you an apples-to-apples comparison, because the same number of errors should be viewed differently in a 500-word document than in a 5,000-word sample. To create an accurate and efficient weighting or total error point system, flexibility is important.

Feedback loop

The most critical component for improving quality is making feedback accessible to all parties involved: linguists and translators, reviewers, quality managers, and clients. When all parties have access to feedback, it improves communication and reduces the back-and-forth that occurs when debating the subjective elements of scoring. Clear communication and consistently presented scoring help reviewers provide the appropriate feedback quickly and easily.

Continuous, real-time feedback also creates an opportunity for improvement that is immediate. In offline scoring, a linguist may continue making the same mistake in several other projects before learning about the error. Cloud LQE enables real-time feedback that not only corrects an issue, but also trains linguists to improve the quality for the next (or even current) project.

The transparency this provides moves the entire process toward more objectivity, and the more objective the feedback, the less discussion is required to get clarification when a quality issue arises.

Quality reporting

Once linguistic quality evaluation has been done, you want to be able to review the data for quality reporting purposes. Cloud LQE allows reporting to be shared, so that clients can see issues affecting quality over time. You can track quality over time, by project and by locale, for all targets. Easy-to-read pie charts display the number of quality issues in each category such as terminology, style, language, and accuracy. This lets you monitor trends over time and to use that objective data for insights into improving quality delivery.

Conclusion

The new frontier in LQE is a cloud-based solution that improves user experience by streamlining quality evaluation. It reduces ambiguity, improves communication, and creates an objective platform to discuss and resolve quality issues.

With a single app for managing quality, LSPs and enterprise quality managers can streamline project setup and don’t have to rely on labor-intensive spreadsheets to describe or score the quality program. The minimal effort required to set up an online program is more than offset by the efficiency gains. You don’t have to move from Microsoft Excel to Word and then to a computer-assisted translation (CAT) tool; it’s now all in one place.

Efficiency of communication is also improved, making it easier for everyone to be on the same page when it comes to creation, scoring, publishing, and feedback. Improved quality data collection and reporting lets you monitor trends over time and use the objective data to inform your strategic decision making to improve translation quality.

As the CSA industry survey discovered, it’s not the price of translation that matters most, it’s the quality, so now may be the time to go boldly into this new LQE frontier.

Reference: https://bit.ly/2ItLRWF

Edit Distance in Translation Industry

Edit Distance in Translation Industry

In computational linguistics, edit distance, or Levenshtein distance, is a way of quantifying how dissimilar two strings (e.g., words) are by counting the minimum number of operations required to transform one string into the other. The edit distance between two strings a and b is the minimum-weight series of edit operations that transforms a into b. One of the simplest sets of edit operations is the one defined by Levenshtein in 1966:

1- Insertion

2- Deletion

3- Substitution

In Levenshtein’s original definition, each of these operations has unit cost (except that substitution of a character by itself has zero cost), so the Levenshtein distance is equal to the minimum number of operations required to transform a to b.

For example, the Levenshtein distance between “kitten” and “sitting” is 3. A minimal edit script that transforms the former into the latter is:

  • kitten – sitten (substitution of “s” for “k”)
  • sitten – sittin (substitution of “i” for “e”)
  • sittin – sitting (insertion of “g” at the end)
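
The distance above can be computed with the standard dynamic-programming algorithm. The following is a minimal sketch, not the implementation of any particular tool:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to transform string a into string b."""
    # prev[j] holds the distance between the prefix of a seen so far
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[len(b)]

print(levenshtein("kitten", "sitting"))  # 3
```

The result matches the three-step edit script listed above.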

What are the applications of edit distance in the translation industry?

1- Spell Checkers

Edit distance is applied in automatic spelling correction, where candidate corrections for a misspelled word are determined by selecting words from a dictionary that have a low distance to the word in question.
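
A minimal sketch of such candidate selection follows; the dictionary and the distance threshold are illustrative assumptions:

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance edits, closest first."""
    scored = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]

print(suggest("translater", ["translator", "translation", "transfer"]))
# → ['translator']
```

Real spell checkers add frequency information and phonetic rules on top of this, but the distance filter is the core idea.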

2- Machine Translation Evaluation and Post Editing

Edit distance can be used to compare a postedited file to the machine translated output that was the starting point for the postediting. When you calculate the edit distance, you are calculating the “effort” that the posteditor made to improve the quality of the machine translation to a certain level. Starting from the same source content and MT output, light postediting and full postediting will produce different edit distances: full postediting, which targets human quality, is expected to yield a higher edit distance because more changes are needed. This means you can distinguish light from full postediting by measuring the edit distance.

Therefore, the edit distance is a kind of “word count” measure of the effort, similar in a way to the word count used to quantify the work of translators throughout the localization industry. It also helps in evaluating the quality of an MT engine by comparing the raw MT output to the version postedited by a human translator.
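
For postediting effort, the distance is often computed over words rather than characters and normalized by segment length. The sketch below shows one such measure; the token-level distance and the normalization by reference length are common conventions, not a fixed industry standard:

```python
def word_edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance computed over whitespace tokens."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, th in enumerate(h, 1):
        curr = [i]
        for j, tr in enumerate(r, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (th != tr)))
        prev = curr
    return prev[-1]

mt = "the cat sat in the mat"   # raw MT output
pe = "the cat sat on the mat"   # postedited version
effort = word_edit_distance(mt, pe) / max(len(pe.split()), 1)
print(f"{effort:.2f}")  # 1 edit over 6 words → 0.17
```

A light postedit of the same segment would typically produce a lower score than a full postedit, which is exactly the distinction described above.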

3- Fuzzy Match

In translation memories, fuzzy matching is the technique of finding stored segments that match a new source segment approximately (rather than exactly). Translation memories provide such segments as suggestions to translators, and the edit distance between a suggestion and the final translation measures the effort made to improve it.
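
A toy fuzzy-match lookup can be sketched with Python’s difflib similarity ratio. Real CAT tools use their own, more elaborate scoring, and the TM content and threshold below are illustrative only:

```python
import difflib

def fuzzy_match(segment: str, tm: dict, threshold: float = 0.75):
    """Return (score, source, target) for the best TM entry whose
    similarity to the new segment meets the threshold, else None."""
    best = None
    for source, target in tm.items():
        score = difflib.SequenceMatcher(None, segment, source).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best

tm = {"Click the Save button": "Klicken Sie auf Speichern"}
print(fuzzy_match("Click the Save button now", tm))
```

Here the new segment differs only by one trailing word, so the match scores above 90% and the stored German translation is offered as a suggestion.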

How to Cut Localization Costs with Translation Technology

How to Cut Localization Costs with Translation Technology

What is translation technology?

Translation technologies are sets of software tools designed to process translation materials and help linguists in their everyday tasks. They are divided into three main subcategories:

Machine Translation (MT)

Translation tasks are performed by machines (computers) on the basis of either statistical models (MT engines that translate on the basis of accumulated translated material) or neural models (MT engines based on artificial neural networks). The computer-translated output is then edited by professional human linguists in a process called postediting, which may be more or less demanding depending on the language combination, the complexity of the materials and the volume of content.

Computer-Aided Translation (CAT)

Computer-aided or computer-assisted translation is performed by professional human translators who use specific CAT or productivity software tools to optimize their process and increase their output.

Providing a perfect combination of technological advantages and human expertise, CAT software packages are the staple tools of the language industry. CAT tools are essentially advanced text editors that break the source content into segments, and split the screen into source and target fields which in and of itself makes the translator’s job easier. However, they also include an array of advanced features that enable the optimization of the translation/localization process, enhance the quality of output and save time and resources. For this reason, they are also called productivity tools.

Figure 1 – CAT software in use

The most important features of productivity tools include:

  • Translation Asset Management
  • Advanced grammar and spell checkers
  • Advanced source and target text search
  • Concordance search

Standard CAT tools include Across Language Server, SDL Trados Studio, SDL GroupShare, SDL Passolo, memoQ, Memsource Cloud, Wordfast, Translation Workspace and others, and they come both as installed software and as cloud solutions.

Quality Assurance (QA)

Quality assurance tools are used for various quality control checks during and after the translation/localization process. These tools use sophisticated algorithms to check spelling, consistency, general and project-specific style, code and layout integrity and more.

All productivity tools have built-in QA features, but there are also dedicated quality assurance tools such as Xbench and Verifika QA.
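
One typical consistency check, flagging source segments that were translated in more than one way, can be sketched as follows (a simplification of what dedicated QA tools actually do):

```python
from collections import defaultdict

def inconsistent_translations(segments):
    """Given (source, target) pairs, return the source segments
    that have more than one distinct translation."""
    seen = defaultdict(set)
    for source, target in segments:
        seen[source].add(target)
    return {src: tgts for src, tgts in seen.items() if len(tgts) > 1}

pairs = [
    ("Save", "Speichern"),
    ("Cancel", "Abbrechen"),
    ("Save", "Sichern"),  # same source, different translation
]
print(inconsistent_translations(pairs))
```

A QA tool would report the flagged segment so a reviewer can decide which translation to keep throughout the project.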

What is a translation asset?

We all know that information has value and the same holds true for translated information. This is why previously translated/localized and edited textual elements in a specific language pair are regarded as translation assets in the language industry – once translated/localized and approved, textual elements do not need to be translated again and no additional resources are spent. These elements that are created, managed and used with productivity tools include:

Translation Memories (TM)

Translation memories are segmented databases containing previously translated elements in a specific language pair that can be reused and recycled in further projects. Productivity software calculates the percentage of similarity between the new content for translation/localization and the existing segments that were previously translated, edited and proofread, and the linguist team is able to access this information, use it and adapt it where necessary. This percentage has a direct impact on costs associated with a translation/localization project and the time required for project completion, as the matching segments cost less and require less time for processing.
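
How match percentages translate into cost can be illustrated with a weighted word count. The bands and rate factors below are hypothetical, as every vendor defines its own discount grid:

```python
# Hypothetical match bands: (minimum TM similarity, fraction of the
# full per-word rate charged for segments in that band).
BANDS = [
    (1.00, 0.10),  # 100% matches
    (0.95, 0.30),  # 95-99% fuzzy matches
    (0.75, 0.60),  # 75-94% fuzzy matches
    (0.00, 1.00),  # no usable match: full rate
]

def weighted_words(segments):
    """segments: list of (word_count, tm_similarity) tuples.
    Returns the billable word count after applying the match bands."""
    total = 0.0
    for words, similarity in segments:
        for floor, factor in BANDS:
            if similarity >= floor:
                total += words * factor
                break
    return total

# 100 new words + 50 words with a 100% match + 30 words at 80% similarity
print(weighted_words([(100, 0.0), (50, 1.0), (30, 0.8)]))  # 123.0
```

So 180 raw words are billed as 123 weighted words here, which is where the cost reduction from a well-maintained TM comes from.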

Figure 2 – Translation memory in use (aligned sample from English to German)

Translation memories are usually developed during the initial stages of a translation/localization project and they grow over time, progressively cutting localization costs and reducing the time required for project completion. For this very reason, however, translation memories require regular maintenance, i.e. cleaning, as the original content may change and new terminology may be adopted.

When an approved translation of a document exists but was produced without productivity tools, a translation memory can be created through the process of alignment:

Figure 3 – Document alignment example

Source and target documents are broken into segments that are subsequently matched to produce a TM file that can be used for a project.
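
In the simplest case, alignment pairs segments by position. The sketch below assumes a clean 1:1 correspondence between source and target segments; real alignment tools also handle merged, split and omitted sentences:

```python
def align(source_segments, target_segments):
    """Naive positional alignment: pair the i-th source segment with
    the i-th target segment to produce TM entries."""
    return list(zip(source_segments, target_segments))

src = ["Open the file.", "Save your changes."]
tgt = ["Öffnen Sie die Datei.", "Speichern Sie Ihre Änderungen."]
tm_entries = align(src, tgt)
print(tm_entries[0])  # ('Open the file.', 'Öffnen Sie die Datei.')
```

Each resulting pair becomes one TM segment that can be reviewed and then reused in future projects.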

Termbases (TB)

Termbases or terminology bases (TB) are databases containing translations of specific terms in a specific language pair that provide assistance to the linguist team and assure lexical consistency throughout projects.

Termbases can be developed before the project, when specific terminology translations have been confirmed by all stakeholders (client, content producer, linguist), or during the project, as the terms are defined. They are particularly useful in the localization of medical devices, technical materials and software.

Glossaries

Unlike termbases, glossaries are monolingual documents explaining specific terminology in either source or target language. They provide further context to linguists and can be used for the development of terminology bases.

Benefits of Translation Technology

The primary purpose of all translation technology is the optimization and unification of the translation/localization process, as well as providing the technological infrastructure that facilitates work and full utilization of the expertise of professional human translators.

As we have already seen, translation memories, once developed, provide an immediate price reduction (one that varies with the source materials and the share of matching segments, but may reach 20% in the initial stages and only grows over time). However, it is the long-term, more subtle benefits of the smart integration of translation technology that really make a difference, and they include:

Human Knowledge with Digital Infrastructure

While machine translation has its applications, it still does not yield results of satisfactory quality for most commercial purposes. Machine-translated output needs to be postedited by professional linguists, and this process is known to sometimes take more time and resources rather than less.

On the other hand, translation performed in productivity tools is performed by people, translation assets are checked and approved by people, specific terminology is developed in collaboration with the client, content producers, marketing managers, subject-field experts and all other stakeholders, eventually providing a perfect combination of human expertise, feel and creativity, and technological solutions.

Time Saving

Professional human linguists are able to produce more in less time. Productivity software, TMs, TBs and glossaries all reduce the valuable hours of research and translation, and enable linguists to perform their tasks in a timely manner, with technological infrastructure acting as a stylistic and lexical guide.

This eventually enables the timely release of a localized product/service, with all the necessary quality checks performed.

Consistent Quality Control

The use of translation technology itself represents real-time quality control, as linguists rely on previously proofread and quality-checked elements, and maintain the established style, terminology and quality used in previous translations.

Brand Message Consistency

Translation assets enable the consistent use of a particular tone, style and intent of the brand in all translation/localization projects. This means that the specific features of a corporate message for a particular market/target group will remain intact even if the linguist team changes on future projects.

Code / Layout Integrity Preservation

Translation technology enables the preservation of features of the original content across translated/localized versions, regardless of whether the materials are intended for printing or online publishing.

Different solutions are developed for different purposes. For example, advanced cloud-based solutions for the localization of WordPress-powered websites enable full preservation of codes and other technical elements, save a lot of time and effort in advance and optimize complex multilingual localization projects.

Wrap-up

In the larger scheme of things, all these benefits eventually spell long-term cost and time savings and a leaner translation/localization process, thanks to their preventive functions that, in addition to direct price reduction, provide consistency, quality control and preservation of the integrity of source materials.

Reference: https://goo.gl/r5kmCJ