Tag: TEnT

Kilgray Language Terminal

By introducing the Language Terminal platform, Kilgray aims to offer added value to translators.

Here is what you can do with this fourth release of Language Terminal:

Task assignment. Use the Documents tab of project records to upload and assign documents to other Language Terminal users.

Vendors’ list. Search for and find fellow Language Terminal members, and add them to the list of your preferred vendors.

Search for language service vendors. Search Language Terminal for fellow users based on their language pairs, subject field expertise, translation tool experience, and many other details.

Professional details in your profile. Your clients can now find you on Language Terminal. Language Terminal stores and displays your language and subject field expertise, translation tool experience, association memberships, and your full curriculum vitae if you wish.

Project register for freelancers. Turn Language Terminal into your project register. Click Projects, create project records – and then you can put together quotes, send them to your clients, collect and send your delivery, and track the status of your projects. This service comes free of charge.

Back up your translation projects. If you have backed up your translation projects in the memoQ (.mqbkf) format, or you have them in a .zip file, store them here in the cloud, and restore them anywhere where memoQ runs.

Import InDesign documents with preview. Process Adobe® InDesign™ documents, and produce XLIFF files that have live preview in memoQ. Import native INDD files, not just IDMLs and INXs. (Caution: the output will be in the InDesign CS6 format.)

Light resource marketplace. Share your memoQ light resources with other Language Terminal members, and gain access to theirs. Save others the effort of putting together a complex filter configuration or segmentation rule, and save yourself time by using resources that others have made available.

 

How to use Across

Interactive Tutorials on “Translating with Across”

Interactive Tutorials on “Project Management with Across”

Internal fuzzy matches in memoQ, Wordfast Pro and Studio 2011

Calculating internal fuzzy matches means that when a segment in a source file partially repeats another segment of the same file, it is counted as an internal fuzzy match.
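As a rough illustration, the sketch below flags internal fuzzy matches by comparing each segment only with the segments that come before it in the same file. It assumes Python's difflib similarity ratio and a hypothetical 75% threshold; memoQ, Wordfast Pro and Studio each use their own scoring, so the scores will not match what those tools report.

```python
from difflib import SequenceMatcher

def internal_fuzzy_matches(segments, threshold=0.75):
    """Flag segments that partially repeat an earlier segment of the same file.

    Returns (index, earlier index, score) tuples. Exact repeats (score 1.0) are
    skipped, since tools usually report them separately as repetitions.
    """
    hits = []
    for i, seg in enumerate(segments):
        best_j, best_score = None, 0.0
        for j in range(i):  # only segments that appear earlier in the file
            score = SequenceMatcher(None, segments[j], seg).ratio()
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and threshold <= best_score < 1.0:
            hits.append((i, best_j, round(best_score, 2)))
    return hits

segments = [
    "Press the red button to start the machine.",
    "Close the lid.",
    "Press the green button to start the machine.",
    "Close the lid.",
]
print(internal_fuzzy_matches(segments))  # [(2, 0, 0.95)]
```

The third segment is reported as an internal fuzzy match of the first, while the fourth segment, an exact repeat of the second, is left out.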

This video by CATguru illustrates how memoQ, Wordfast Pro and Studio 2011 deal with internal fuzzy matches.

Translation Memory

Translation memory: a computer-aided translation program. In essence, it is a database (the “memory”) that stores translated sentences (translation units or segments) together with their respective source segments. For each new segment to be translated, the program scans the database for a previous source segment that matches the new segment exactly or approximately (a fuzzy match) and, if it finds one, suggests the corresponding target segment as a possible translation. The translator can then accept, modify or reject the suggested translation.
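A minimal sketch of the lookup described above, assuming a toy English-to-French memory held in a Python dict. difflib's similarity ratio stands in for a commercial tool's fuzzy-match score, and the 70% threshold is likewise an assumption.

```python
from difflib import SequenceMatcher

# Toy translation memory (English -> French): stored source segment -> target segment.
tm = {
    "The printer is out of paper.": "L'imprimante n'a plus de papier.",
    "Close the cover.": "Fermez le capot.",
}

def similarity(a, b):
    """Character-based similarity between 0.0 and 1.0 (difflib's ratio)."""
    return SequenceMatcher(None, a, b).ratio()

def lookup(tm, new_segment, threshold=0.7):
    """Return (match type, stored source, suggested target, score), or None."""
    if new_segment in tm:  # exact match: the strings are identical
        return ("exact", new_segment, tm[new_segment], 1.0)
    best = max(tm, key=lambda source: similarity(source, new_segment))
    score = similarity(best, new_segment)
    if score >= threshold:  # approximate (fuzzy) match
        return ("fuzzy", best, tm[best], round(score, 2))
    return None  # no usable match: translate from scratch

print(lookup(tm, "Close the cover."))
print(lookup(tm, "The printer is out of toner."))
print(lookup(tm, "Turn off the power."))
```

The translator still decides whether to accept, modify or reject whatever `lookup` suggests.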

Translation memory system: refers to a type of machine-aided human translation tool that stores previous translations and offers these translations when identical or similar sentences are encountered when translating new materials.

Similarity match: a type of matching scheme for the free-form queries in a computer-aided translation system. The queries are first passed through the system and the browser performs a similarity match between the internal representation of the queries and the internal representation of each sentence in the database. In this way, both surface similarity and structural similarities can be matched.

Source: A Dictionary of Translation Technology, Chan Sin-wai, The Chinese University Press, 2004

Translation memory has been defined as a “multilingual text archive containing (segmented, aligned, parsed and classified) multilingual texts, allowing storage and retrieval of aligned multilingual text segments against various search conditions” (EAGLES 1996, the Expert Advisory Group on Language Engineering Standards). Unlike machine translation systems, which generate translations automatically, translation memory systems leave professional translators in charge of deciding whether to accept or reject a term or an equivalent phrase or segment suggested by the system during the translation process. Virtually all TM systems are language-independent and support international character sets that represent many, if not all, alphabets and scripts digitally.

Translation memory technology works by reusing previously translated texts and their originals in order to facilitate the production of new translations. It can also interface with databases of stored specialized terminology that can be accessed and retrieved for reuse in new translations.

A translation memory system has no linguistic component, and two different approaches are employed to extract translation segments from the previously stored texts. These are known as perfect matching and fuzzy matching.

•  A perfect or exact match occurs when a new source language segment is completely identical, including spelling, punctuation and inflections, to an old segment stored in the database, that is, in the TM.

•  Unlike a perfect match, a fuzzy match occurs when an old and a new source language segment are similar but not exactly identical. Even a very small difference such as punctuation leads to a fuzzy match.

As the degree of similarity between old source segments in the database or memory and new source text segments currently being translated may vary, an algorithm is used to calculate a percentage that expresses the degree of match. The higher the fuzzy match percentage, the closer the similarity between the two source language segments. The user can set the threshold percentage at a high level, for instance 90%, to restrict retrieval to old source language segments that differ only slightly from the new source language segment. Alternatively, the threshold can be set at a low level, for instance 10%, to allow the translation memory to retrieve segments only weakly related to the new segment. Segments that mean the same thing but differ in format, such as dates, measurements, times and spellings, all fall into the fuzzy match category, although some systems categorize them differently. Some systems allow for the automatic processing of such changes. Polysemous and homonymous words, that is, homographs, always need careful handling and present a challenge.
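The effect of the threshold can be sketched with a similarity percentage and a filter. This is an illustration only: difflib's ratio is an assumed stand-in for the undisclosed formulas of commercial tools, and the three stored segments are made up.

```python
from difflib import SequenceMatcher

def match_percentage(old_source, new_source):
    """Similarity of two source segments as a whole-number percentage.

    difflib's ratio() is a stand-in: commercial TM tools use their own,
    usually undisclosed, scoring formulas.
    """
    return round(100 * SequenceMatcher(None, old_source, new_source).ratio())

def retrieve(memory, new_source, threshold):
    """Return (percentage, stored source) pairs at or above threshold, best first."""
    scored = [(match_percentage(source, new_source), source) for source in memory]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)

memory = ["Save the file.", "Save the files.", "Open the door."]
print(retrieve(memory, "Save the file.", threshold=90))  # only close matches
print(retrieve(memory, "Save the file.", threshold=10))  # weakly related segments too
```

At 90% only the two near-identical segments come back; at 10% the unrelated “Open the door.” is retrieved as well, because it shares “ the ” and the final period.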

Segmentation is the process of breaking a text up into units consisting of a word or a string of words that is linguistically acceptable. Segmentation is needed for a TM to perform the matching (perfect and fuzzy) process. A pair of old source and target language texts is usually segmented into individual pairs of sentences. However, not all parts of texts, particularly specialist texts, are in sentence form; exceptions include headings, lists and bullet points. As a result, different units of segmentation are needed. A translator can decide the length of a segment, but punctuation is often used as an indicator. Each segment is then allocated a unique number or tag by the system. It is important to note that while segmentation is quite natural for languages written in Latin-based alphabets, it is rather alien to languages such as Chinese, Thai and Vietnamese, which are written continuously without any spaces between characters. Thus, other methods of segmentation are required to determine the beginning and end of a segment in such cases. New segments can be added to the TM during translation; alternatively, previously translated source language texts and their translations can be entered into the memory through a process of text alignment.
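The punctuation-based approach can be sketched as a small, hypothetical segmenter. It treats a line break, or a period, exclamation point or question mark followed by whitespace, as a segment boundary, so headings and list items become segments of their own, and it numbers the segments in order. It does not handle abbreviations and would not work for scripts such as Chinese or Thai, which need other methods.

```python
import re

def segment(text):
    """Split text into sentence-like segments and give each a unique number.

    A line break always ends a segment; inside a line, . ! or ? followed by
    whitespace ends a segment. Abbreviations are NOT handled here.
    """
    segments = []
    for line in text.splitlines():
        for piece in re.split(r"(?<=[.!?])\s+", line.strip()):
            if piece:  # skip empty lines
                segments.append(piece)
    return list(enumerate(segments, start=1))

text = "Installation\nUnpack the archive. Run the installer!\n\n- Restart the computer"
for number, seg in segment(text):
    print(number, seg)
```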

Source: Translation and Technology, C.K. Quah, Palgrave Macmillan, 2006

Most simply, a TM can be viewed as a list of source text segments explicitly aligned with their target text counterparts. The resulting structure is sometimes referred to as a parallel corpus or a bitext. Translation units are stored in the TM database. Some sophisticated TM programs use a type of technology called a neural network to store information. A neural network allows information to be retrieved more quickly than a sequential search technique. The essential idea behind a TM system is that it allows a translator to reuse or recycle previously translated segments. Reusing a previous translation in a new text is sometimes referred to as “leveraging”.

How does a TM system work? This technology works by automatically comparing a new source text against a database of texts that have already been translated. When a translator has a new segment to translate, the TM system consults the database to see if this new segment corresponds to a previously translated segment. If a matching segment is found, the TM system presents the translator with the previous translation, and the translator decides whether or not to incorporate it into the new translation.

Segmentation: In most instances, the basic unit of segmentation is the sentence. However, not all text is written in sentence form. Headings, list items and table cells are familiar elements of text, but they may not strictly qualify as sentences. Therefore, many TM systems allow the user to define other units of segmentation in addition to sentences. These units can include sentence fragments or entire paragraphs. Deciding what constitutes a segment is not a trivial task. How can the TM system identify sentences? Punctuation marks such as periods, exclamation points and question marks are typically used. Problematic cases include abbreviations, section headings and embedded sentences. Some of these problems can be resolved by incorporating stop lists (e.g. lists of abbreviations that do not indicate the end of a sentence, such as Mrs. and e.g.) into the TM system. An additional issue is that the segmentation units used in the source text may not correspond exactly to those used in the translation. This lack of one-to-one correspondence can create difficulties for automatic alignment programs.
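A stop list can be added to a punctuation-based segmenter as in the sketch below. The stop list contents are a hypothetical example, and the approach has a known weakness: a listed abbreviation that really does end a sentence (for example a final “etc.”) will not trigger a break.

```python
import re

# Hypothetical stop list: abbreviations whose final period does not end a sentence.
STOP_LIST = {"Mr.", "Mrs.", "Dr.", "e.g.", "i.e.", "etc."}

def segment_with_stop_list(text, stop_list=STOP_LIST):
    """Split text at tokens ending in . ! or ?, unless the token is in the stop list.

    Tokens are whitespace-separated words, so runs of whitespace inside a
    sentence are collapsed to single spaces in the output.
    """
    segments, current = [], []
    for token in text.split():
        current.append(token)
        if re.search(r"[.!?]$", token) and token not in stop_list:
            segments.append(" ".join(current))
            current = []
    if current:  # trailing text without sentence-final punctuation
        segments.append(" ".join(current))
    return segments

text = "Mrs. Smith arrived. She brought tools, e.g. a hammer. Is that all?"
print(segment_with_stop_list(text))
```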

Matches: most TM systems present the user with a number of different types of segment matches. The most common types are exact, fuzzy, and term matches. Research is being done on full and sub-segment matches. Exact matches are the most straightforward or perfect matches.

An exact match is 100% identical to the segment that the translator is currently translating, both linguistically and in terms of formatting. The process used by the TM system to identify perfectly matching segments is one of strict pattern matching. This means that the two strings must be identical in every way, including spelling, punctuation, inflection, numbers, and even formatting. Any segment in the new source text that does not match an original segment precisely will not produce an exact match. The translator is not forced to accept the translation proposed by the TM system. Even when a segment is identical, translators are concerned with translating complete texts rather than isolated segments, so it is important to read the proposed translation in its new context to be sure that it is both stylistically appropriate and semantically correct.

Full matches occur when a new source segment differs from a stored TM unit only in terms of so-called variable elements, which are sometimes referred to as “placeables” or “named entities”. Variable elements include numbers, dates, times, currencies, measurements, and sometimes proper names. These elements typically require some kind of special treatment in a text. TM systems need to ignore variable elements for matching purposes.
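One way to treat variable elements is to mask them before comparing, as in this sketch. It assumes that only numbers count as placeables (real systems also recognize dates, currencies, measurements and so on) and that the numbers appear in the stored translation in the same written form as in the source; the German example segment is illustrative.

```python
import re

# Hypothetical placeable pattern: whole numbers or decimals such as 7, 2.5 or 2,5.
PLACEABLE = re.compile(r"\d+(?:[.,]\d+)?")

def mask_placeables(segment):
    """Replace every variable element (here: numbers only) with a fixed token."""
    return PLACEABLE.sub("<NUM>", segment)

def match_type(stored_source, new_source):
    if stored_source == new_source:
        return "exact"
    if mask_placeables(stored_source) == mask_placeables(new_source):
        return "full"  # identical apart from variable elements
    return "other"

def adapt_target(stored_source, stored_target, new_source):
    """Swap the new numbers into the stored translation.

    Each old number is mapped to the new number in the same position; locale
    formats (e.g. 2.5 vs 2,5) would need extra handling.
    """
    mapping = dict(zip(PLACEABLE.findall(stored_source), PLACEABLE.findall(new_source)))
    return PLACEABLE.sub(lambda m: mapping.get(m.group(0), m.group(0)), stored_target)

stored_source = "Tighten the screw to 5 Nm."
stored_target = "Ziehen Sie die Schraube mit 5 Nm an."
new_source = "Tighten the screw to 7 Nm."
print(match_type(stored_source, new_source))  # full
print(adapt_target(stored_source, stored_target, new_source))
```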

Fuzzy matches are approximate or partial matches. A fuzzy match retrieves a segment that is similar, but not identical, to the new source segment. Some TM systems use color coding to illustrate various types of differences between the new source text segment and the retrieved segment. The degree of similarity in a fuzzy match can range from 1% to 99%, and the user generally has the ability to set the sensitivity threshold to allow the TM system to locate previously translated segments that may differ only slightly from the new source text segment or segments that vary greatly. If the sensitivity threshold is set too high, there is a risk that the TM will produce “silence”: potentially useful partial matches will not be retrieved. However, if it is set too low, the system will produce “noise”: the suggested translations that are retrieved will be too different from the new source text segment and therefore not helpful. When the threshold is very low, a match may be made on the basis of very general words (“the”, “and”) and the overall content of the retrieved segment may contain little of value for helping the translator to translate the new segment. Many translators prefer to set the threshold somewhere between 60% and 70%. Although fuzzy matching can be useful, it requires careful proofreading and editing to ensure that the proposed translation is appropriate for inclusion in the new target text.
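Color coding of differences can be imitated in plain text. In the sketch below, which assumes word-level comparison with difflib, `[-word-]` marks words found only in the retrieved segment and `{+word+}` marks words found only in the new source segment; actual tools use their own colors and comparison rules.

```python
from difflib import SequenceMatcher

def mark_differences(retrieved, new):
    """Show word-level differences between a retrieved TM source and the new source.

    [-word-] = only in the retrieved segment, {+word+} = only in the new segment.
    """
    old_words, new_words = retrieved.split(), new.split()
    out = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, old_words, new_words).get_opcodes():
        if op == "equal":
            out.extend(old_words[i1:i2])
        if op in ("delete", "replace"):
            out.extend(f"[-{w}-]" for w in old_words[i1:i2])
        if op in ("insert", "replace"):
            out.extend(f"{{+{w}+}}" for w in new_words[j1:j2])
    return " ".join(out)

print(mark_differences("Click the red button.", "Click the green button."))
# Click the [-red-] {+green+} button.
```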

Term matches are produced through the process of active terminology recognition, which essentially constitutes automatic dictionary lookup. If one or more terms are recognized as being in the term base, the TM system points to the appropriate term records, and the translator can then make use of the relevant information they contain. This means that even when no exact or fuzzy matches are found for source text segments, the translator might at least find some translation equivalents for individual terms in the term base.
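Active terminology recognition can be approximated by scanning each segment for term base entries. This sketch assumes a tiny, hypothetical English-to-German term base, whole-word and case-insensitive matching, and a “longest term wins” rule for overlaps; it does not handle inflected forms such as plurals.

```python
import re

# Hypothetical English -> German term base used only for illustration.
TERM_BASE = {
    "hard disk": "Festplatte",
    "disk": "Datenträger",
    "power supply": "Netzteil",
}

def recognize_terms(segment, term_base=TERM_BASE):
    """Return (term as found, target equivalent) pairs in order of appearance.

    Longer terms are tried first, so "hard disk" is reported instead of "disk"
    where the two overlap.
    """
    hits, taken = [], []
    for term in sorted(term_base, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, segment, re.IGNORECASE):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue  # already covered by a longer term
            taken.append((m.start(), m.end()))
            hits.append((m.start(), m.group(0), term_base[term]))
    return [(found, target) for _, found, target in sorted(hits)]

print(recognize_terms("Replace the hard disk and the power supply."))
```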

Sub-segment matching falls partway between fuzzy and term matching. In fuzzy matching, the two segments must have a number of elements in common in order for a match to be established. In term matching, the new source segment is compared against entries in the term base. In the case of sub-segment matching, the elements that are compared are smaller chunks of segments. This means that a match can be retrieved between two small chunks of segments, even if the complete segments do not have a high degree of overall similarity. When both segments contain a very similar chunk, the translator may be able to reuse that chunk. As a further refinement, a combined full-segment/sub-segment approach allows the TM system to compare the new source text segment against the stored TM automatically. It begins by examining complete segments, first looking for exact matches and then for fuzzy matches, and if no such match is found at the segment level, it compares increasingly smaller chunks in an effort to find a match. In this way, the translator may be presented with sub-segment matches originating from several different segments, even if none of those complete segments qualified as a fuzzy match.
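The combined strategy can be sketched as a cascade: exact match first, then the best fuzzy match, then shared chunks of words. This is a simplification under stated assumptions: difflib stands in for the matching engine, the 75% fuzzy threshold and the three-word minimum chunk are hypothetical, and instead of comparing “increasingly smaller chunks” step by step, the sketch directly collects every shared run of at least three consecutive words.

```python
from difflib import SequenceMatcher

def subsegment_matches(tm, segment, min_words=3):
    """Return (shared chunk, stored source, stored target) for every run of at
    least min_words consecutive words the segment shares with a stored source."""
    words = segment.split()
    hits = []
    for source, target in tm.items():
        matcher = SequenceMatcher(None, source.split(), words, autojunk=False)
        for block in matcher.get_matching_blocks():
            if block.size >= min_words:
                chunk = " ".join(words[block.b:block.b + block.size])
                hits.append((chunk, source, target))
    return hits

def lookup_cascade(tm, segment, fuzzy_threshold=0.75):
    """Try an exact match, then the best fuzzy match, then sub-segment matches."""
    if segment in tm:
        return ("exact", tm[segment])
    score, source = max((SequenceMatcher(None, s, segment).ratio(), s) for s in tm)
    if score >= fuzzy_threshold:
        return ("fuzzy", tm[source])
    hits = subsegment_matches(tm, segment)
    return ("sub-segment", hits) if hits else ("no match", None)

tm = {
    "Remove the battery cover.": "Retirez le couvercle de la batterie.",
    "Clean the lens with a soft cloth.": "Nettoyez l'objectif avec un chiffon doux.",
}
print(lookup_cascade(tm, "Switch off the camera and wipe the lens with a soft cloth."))
```

Neither stored segment is similar enough overall to count as a fuzzy match, but the chunk “the lens with a soft cloth.” is still offered for reuse.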

This strategy is similar to the approach used in example-based machine translation (EBMT). The principal difference between a TM as a support tool and a full-fledged EBMT system is basically a question of who has primary responsibility for analyzing the segments and formulating the target text. With a TM, this responsibility stays with the human translator, whereas with EBMT, the computer is responsible for producing a complete draft of a target text, though this may still need to be post-edited by a human translator.

No matches: when no match is found, the translator must translate from scratch. Another option is to use a machine translation system to translate the portions of the source text for which no match was found in the TM.

There are two main ways in which translations can be entered into the TM database: through interactive translation or through post-translation alignment. Interactive translation has the potential to produce a TM that is high in quality but initially low in volume, whereas post-translation alignment has the potential to produce a TM that is higher in volume but (possibly) lower in quality. It is entirely possible to build a TM using a combination of both.

Interactive translation is the most straightforward way for translators to construct a TM, adding translation units to the memory as they go along. Each time the translator translates a source text segment, the paired translation unit can be stored in the TM database. Once a segment has been translated and stored, it immediately becomes part of the TM. This means that if that segment, or a similar one, occurs again in the text, even in the very next sentence, the previous translation is suggested to the translator automatically. The translator then has the choice of accepting the previous translation or editing it if the context requires a change. Note that many TM systems can also be networked, which means that multiple translators can contribute to one TM, and the volume of data that it contains can be built up more quickly. In a networked situation, it is possible to give different types of privileges to different users in order to exercise some form of quality control. For example, all users can be given permission to consult the TM, but the ability to add new translation units (TUs) can be restricted to revisers or senior translators.
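A networked TM with role-based privileges might be modeled as in the sketch below. The class, its method names and the role names are hypothetical; real systems define their own permission schemes. A unit added by an authorized user is available to everyone at once, mirroring the immediate reuse described above.

```python
class SharedTM:
    """Toy networked TM: every user may consult it, only some roles may add units."""

    # Hypothetical roles allowed to add translation units.
    WRITER_ROLES = {"reviser", "senior translator"}

    def __init__(self):
        self.units = {}  # source segment -> target segment

    def consult(self, source):
        """Any user may look up a stored translation (None if there is none)."""
        return self.units.get(source)

    def add(self, role, source, target):
        """Store a translation unit; it is reusable immediately, even in the next sentence."""
        if role not in self.WRITER_ROLES:
            raise PermissionError(f"role {role!r} may consult the TM but not add to it")
        self.units[source] = target

shared = SharedTM()
shared.add("senior translator", "Press OK.", "Appuyez sur OK.")
print(shared.consult("Press OK."))  # Appuyez sur OK.
try:
    shared.add("translator", "Press Cancel.", "Appuyez sur Annuler.")
except PermissionError as err:
    print(err)
```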

Working with an existing TM: there are two main methods, interactive mode and batch mode. A translator working in interactive mode works through the new source text segment by segment, and the TM system attempts to match the new source text segments against the segments stored in the database. As each new segment is translated, the TU is immediately added to the TM and is available for reuse the next time an identical or similar segment is encountered. The second method, batch translation, sometimes referred to as pre-translation, is offered by most TM systems: a user can run a complete source text through the system, and whenever it finds an exact match, it automatically replaces the new source text segment with the translation stored in the TM. Segments for which no match is found must later be translated by either a human translator or a machine translation system. In either case, the entire text must then be post-edited by a human translator to ensure that the replacements made by the system were correct. If the translator makes changes to any matches that were inserted automatically, these changes can subsequently be added to the TM to keep it up to date.
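Batch pre-translation boils down to a single pass over the source text. In this sketch, which assumes a dict-based TM and an optional `mt` callable standing in for a real machine translation engine (both hypothetical), exact matches are inserted, and the indices of all other segments are returned so they can be routed to a translator or an MT system. The whole draft still has to be post-edited.

```python
def pretranslate(source_segments, tm, mt=None):
    """Batch mode: insert stored translations wherever an exact match exists.

    Segments without an exact match are left as None, or filled with an MT
    draft if an mt callable is supplied. Returns the draft plus the indices
    of segments that did not come from the TM.
    """
    draft, untranslated = [], []
    for i, segment in enumerate(source_segments):
        target = tm.get(segment)
        if target is None:
            untranslated.append(i)
            if mt is not None:
                target = mt(segment)
        draft.append(target)
    return draft, untranslated

tm = {"Close the cover.": "Fermez le capot."}
source = ["Close the cover.", "Check the oil level."]
print(pretranslate(source, tm))
print(pretranslate(source, tm, mt=lambda s: "[MT] " + s))  # stand-in for a real MT engine
```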

TM systems are often integrated with other tools:

– With terminology-management systems: the TM system compares the source text segments against the previously translated segments stored in the TM database and, at the same time, using a process known as active terminology recognition, the terminology-management system (TMS) compares the individual terms contained in each source text segment against the terms contained in the term base. If a term is recognized as being in the term base, the translator’s attention is drawn to the fact that an entry exists for it, and the translator can view the term record and then insert the term from the record directly into the target text.

–  With bilingual concordancers – which allow the user to retrieve all instances of a specific search string and view these occurrences in their immediate context. This means that a translator can ask to see all the occurrences of any text fragment (not just a pre-defined segment) that appear anywhere in the TM, along with their translation equivalents. This allows the translator to quickly view the search string in context together with its translations, which may not always be the same.
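A bilingual concordance search can be sketched as a substring search over the stored source segments that returns each hit with some surrounding context and the translation of its whole segment. The case-insensitive matching and the 20-character context window are assumptions for illustration.

```python
import re

def concordance(tm, query, width=20):
    """Find every occurrence of query (case-insensitive) in the stored source segments.

    tm is a list of (source, target) pairs. Returns (left context, hit,
    right context, target) so each hit can be read next to its translation.
    """
    results = []
    for source, target in tm:
        for m in re.finditer(re.escape(query), source, re.IGNORECASE):
            left = source[max(0, m.start() - width):m.start()]
            right = source[m.end():m.end() + width]
            results.append((left, m.group(0), right, target))
    return results

tm = [
    ("Check the oil level.", "Vérifiez le niveau d'huile."),
    ("The oil filter is dirty.", "Le filtre à huile est sale."),
    ("Close the door.", "Fermez la porte."),
]
for left, hit, right, target in concordance(tm, "oil"):
    print(f"{left}[{hit}]{right}  |  {target}")
```

The two hits show that the same source word can appear in different target forms (“d'huile”, “à huile”).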

–  With machine translation systems – where a new source text is first compared against a TM, which will replace those segments for which exact matches are retrieved. The segments that are still untranslated can be fed into a machine translation system, which produces a draft translation. The entire document is then passed on to a human translator for post-editing. The final translation can be aligned with the original source text and stored in the TM database for future reuse.

Source: Computer-Aided Translation Technology,  Lynne Bowker, University of Ottawa Press, 2002

Most current commercial TM systems offer a quantitative evaluation of the match in the form of a score, often expressed as a percentage and sometimes called a fuzzy match score or similar. How this “score” is arrived at can be quite complex, and it is not usually made explicit in commercial systems, for proprietary reasons.

In all systems, matching is essentially based on character-string similarity, but many systems allow the user to indicate weightings for other factors, such as the source of the example, formatting differences, and even the significance of certain words. The character-string similarity calculation uses the well-established concept of “sequence comparison”, also known as the “string-edit distance” because of its use in spell checkers, or more formally the “Levenshtein distance” after the Russian mathematician Vladimir Levenshtein, who introduced it in 1965. The string-edit distance is a measure of the minimum number of insertions, deletions and substitutions needed to change one sequence of letters into another. For example, “waiter” can be changed into “waitress” with one deletion and three insertions; if substitutions are allowed, two insertions and one substitution suffice, giving a distance of 3. The measure can be adjusted to weight in favor of insertions, deletions or substitutions, or to favor contiguous deletions over non-contiguous ones. In fact, sequence comparison, which works on any sequence of symbols (characters, words, digits, etc.), has a huge number of applications, ranging from file comparison in computers to speech recognition (sound waves represented as sequences of digits), comparison of genetic sequences such as DNA, and image processing; anything that can be digitized can be compared using Levenshtein distance.
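A compact sketch of the standard dynamic-programming computation of the Levenshtein distance follows. The `substitution_cost` parameter is an illustrative addition showing how the measure can be weighted: with a cost of 2, a substitution is never cheaper than a deletion plus an insertion, so the result equals the insertion/deletion-only count.

```python
def levenshtein(a, b, substitution_cost=1):
    """Minimum cost of insertions, deletions and substitutions turning a into b.

    Fills the usual (len(a)+1) x (len(b)+1) table row by row, keeping only
    the previous row in memory.
    """
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,      # delete ca
                current[j - 1] + 1,   # insert cb
                previous[j - 1] + (0 if ca == cb else substitution_cost),  # keep or substitute
            ))
        previous = current
    return previous[-1]

print(levenshtein("waiter", "waitress"))                       # 3
print(levenshtein("waiter", "waitress", substitution_cost=2))  # 4
```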

Source: “Translation Memory Systems”, Harold Somers, Computers and Translation, A translator’s guide, 2003

 

Concordance Search in 3 CAT tools

Concordance Search enables you to search translation memories for a word or words. Results are shown as complete translation units: the original text followed by its translation.

Concordance: Wordfast Classic (Video by CATguru)

Concordance: Trados Workbench (Video by CATguru)

Concordance: Déjà Vu (Video by CATguru)

 

Among stand-alone tools that offer a concordance feature are ApSIC Xbench (bilingual) as well as AntConc and DocFetcher (monolingual).

Your First Translation with Wordfast Pro

You can translate files using Wordfast Pro as follows:

  • Create/open a project (“File” menu).
  • Create/open a translation memory (“Translation Memory” menu > “New/Select TM”).
  • Open the source file you want to translate (“File” menu > “Open”).
  • Insert the translation in the target cell and move to the next segment using Alt+Down.
  • Make sure you insert the tags/placeables: press Ctrl+Alt+Right to select a tag and Ctrl+Alt+Down to insert it.
  • After finalizing the translation, press Alt+End and click “Save”.
  • Select “Save Translated file” from the “File” menu to generate the translated file in its original format, ready to send back to your client.

Fluency, a TEnT that you might like

Fluency Translation Suite 2013 provides several useful features that help translators not just handle files, but translate faster and more accurately. Fluency includes all the major components you’d expect to see in a CAT (computer-aided translation) tool, and a whole lot more.

Access to all the major industry file formats including Trados compatibility
Revolutionary interface for a natural translation environment
Integrated and customizable web resources for language translation
Extensive Machine Translation for Post MT workflows
Support for 3X as many languages as any other Computer Aided Translation tool
Built from the ground up to support all Unicode languages
Automatically reversing language pairs to maximize TMs
Integrated terminologies, glossaries, and dictionaries
Speech to command provides quick navigation support
Text to Speech support for the blind and visually impaired
Video tutorials to get you up and translating
Easy computer to computer synchronization in our translation management system
Quality assurance processes to help catch errors, including:
Tracked term enforcement
Blacklist enforcement
Personal terminology enforcement
Case and punctuation enforcement
Translation memory enforcement
Number enforcement

PRODUCT FEATURES

Simple yet powerful interface gets you going quickly and easily
Supports all major translation file formats
Terminology available in over 35 language pairs
Integrated browsers provide dynamic web resources
Easily add and manage your personal terminology
Intuitive interface reduces eyestrain
Online resources a click away
Translator suggested features added regularly
Automatic updates give you the latest features

EXTENSIVE DICTIONARY DATABASE

Integrated terminology is available in over 35 languages. Fluency integrates some of the largest multilingual terminology databases in the world into one interface. These extensive dictionaries provide a quick reference that is dynamically updated as the translation proceeds.

TRANSLATION PROCESS

Fluency displays both the source text and the completed portion of the translation in text-editor-like panes so the translator can see the context of the segments. Fluency’s WYSIWYG interface is a big step up from the grid-view of other translation products. The translator still translates one sentence at a time, allowing the translator to focus on the meaning of individual words and phrases. Clicking on a specific word also automatically populates the resources tabs with definitions, translation memories, web searches, and more.

CUSTOMIZABLE RESOURCE TABS

Fluency’s easy-access resource tabs make having a dozen open windows a thing of the past. Translators can easily navigate through extensive dictionary databases, online dictionaries, web searches, translation memories and more, all within a single interface. Users are free to organize the resource tabs to suit their workflow. In addition, four “My URL” tabs allow users to specify their own custom online references.

TRANSLATION MEMORIES IMPORT/EXPORT/MANAGEMENT

Translation Memories (TMs) greatly enhance and accelerate translation projects by storing phrases that have been translated previously. TM functionality within Fluency includes automatic population of 100% to 30% matches (user selected), automatic sub-segment lookup down to 5%, user-specified concordance search in the source and target, and much more. The translation of recurring terms and phrases can be inserted with the click of a button, saving translators time and energy. Fluency can import TMs in many file formats, including TMX, Bilingual TTX, and Doc files. Fluency also provides an easy-to-use TM management interface and export functionality.

SUPPORTS MAJOR TRANSLATION MEMORY FORMATS

TMX
Bilingual TTX
Bilingual Word
XML
Access MDB
Trados TWBExport
Trados SDLTM
Wordfast txt export

SUPPORTS MAJOR TERMINOLOGY FORMATS

TBX
CSV
Tab
Text
Microsoft Excel

CUSTOMIZABLE TOOLS AND RESOURCES

Provides full translation memory management
Provides full personal terminology management
Synchronized, easy-to-use tools: all you need, all in one place
Integrated terminology: both bilingual and monolingual dictionaries
Auto Research Resources: Highlight a word and get unprecedented, instantaneous access to dictionaries, translation memories, web search engines, and machine translation sites; one click then lets you apply just the right word(s) to your translation
Focus on translating: your document’s format is automatically preserved
Alignment Tool: create translation memories from past translation projects
Use Transcription and OCR tools to convert image-only scans to text for Fluency

TERMINOLOGY IMPORT/EXPORT/MANAGEMENT

Terminology tools in Fluency are truly revolutionary. Fluency allows the translator to import their personal terminology alongside the built-in terminology, add new terminology while translating, and enforce terminology rules. During translation, each segment is automatically analyzed for potential terms, which, in turn, makes resource research quick and easy.

SUPPORTS ALL MAJOR SOURCE AND TARGET FORMATS

Microsoft Word®, Excel®, PowerPoint®, Visio®, Publisher®* (2000, 2003, 2007*, 2010*)
Bilingual Word (Trados unclean) files
Adobe® PDF
Adobe InDesign® IDML and INX (CS – CS6)
Adobe Framemaker® (.mif)
SDL Trados Bilingual TTX (.ttx)
SDL Trados Project files (.sdlppx)
SDL Trados WorldServer package files (.wsxz)
XLIFF, SDLXLIFF
WordFast (.txml)
Text
RTF
HTML
XML
String files (.string)
Portable Object files (.po)

SUPPORTS ALL MAJOR MACHINE TRANSLATION ENGINES

Microsoft
Microsoft Translator Hub (Gold Partner)
Google
Systran
Language Weaver
Translated.net / MyMemory

You can download Fluency Translation Software Trial at: http://www.westernstandard.com/Fluency/Trial.aspx

What’s new in memoQ 2013

memoQ 2013 R2 features to improve the work of language professionals

Monolingual review. Update your translation memory by importing reviewed monolingual documents. Export your translated document and send it for review. Your translation memory can then be updated with the changes made to the monolingual document by importing the file back into memoQ, whether you work with the translator pro or the project manager edition of memoQ.

PDF import with full formatting. Imported PDF files will keep the layout of the original document, and translated texts can be exported as .doc or .docx files.

Microsoft Word integration enhancements. Use Microsoft Word dictionaries for spell-checking, and have fonts in .docx files changed automatically to the most widespread fonts when you translate between European and Asian (CCJK) languages. Comments added to your source document will appear as comments or as translatable content in the translation grid, depending on your preference. Comments added during translation can be exported into a Word file. You can also export change-tracked DOCX files to display the changes the reviewer made.

Startup wizard. The memoQ Startup wizard guides you through a number of basic configuration options to set up memoQ according to your needs: adjusting font sizes, setting the layout of the translation grid, configuring machine translation and online term base plug-ins are just a few of the many options that you can set up using this wizard.

Language detection. When you select a document for import, memoQ tells you what language it is in and offers to create the right project. You can also lock segments in the text that are in a different language.

memoQ TM search tool. Search your translation memories for words or phrases outside memoQ, copy the results to the Windows clipboard, and paste them into another application.

New filters for YAML and JSON file formats. You can now import both of these file formats into memoQ.

Enhanced compatibility with other translation tools including Wordfast TMX import and non-segmented SDLXLIFF import.


memoQ 2013 features to improve the work of translators

Language Terminal: the freelance translator’s project management tool. Keep track of your translation jobs easily: create quotes after analyzing translation documents, keep track of jobs and payments, and archive all your projects, quotations and files in the cloud without needing any third-party tool.

memoQ web search. You can search Google for an expression or look up words in a dictionary without opening an internet browser.

Translating concordance. In addition to offering concordance hits, memoQ suggests translations for certain expressions from the target segments.

Multiple comments to segments, advanced highlighting. Mark part of the text and add a comment to it. Using different colors, you can indicate whether the comment is for information, a warning, an error or anything else.

Translate numbers-only segments. Before starting a translation project, you can fill all segments which only contain numbers with the appropriate numbers in the target locale format – without human intervention.
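memoQ's own implementation is not described here. As a hypothetical sketch only, assuming numbers-only segments written with en-US separators and a German-style target locale, the conversion could look like this; segments that are not purely numeric are left for the translator.

```python
import re

# en-US style number: optional comma thousands groups, optional decimal part.
NUMBER_ONLY = re.compile(r"(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?")

def localize_number_segment(segment, thousands=".", decimal=","):
    """Rewrite a numbers-only segment such as '1,234.56' in a target-locale format.

    Returns None when the segment is not numbers-only. The defaults assume a
    German-style target ('1.234,56'). Existing grouping is converted, but
    missing thousands separators are not added.
    """
    text = segment.strip()
    if not NUMBER_ONLY.fullmatch(text):
        return None
    integer, _, fraction = text.partition(".")
    integer = integer.replace(",", thousands)
    return integer + (decimal + fraction if fraction else "")

for seg in ["1,234.56", "2013", "Page 3"]:
    print(seg, "->", localize_number_segment(seg))
```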

Fuzzy terminology lookup. memoQ will also find term base entries that are similar to the original term.

Edit distance reporting. When you review a translated text, you may want to know how many segments you edited and how much work the editing took. You can find this out with a single click by running an analysis.

Superscript and subscript support. Translators can freely switch any part of the text to superscript or subscript, just as they do with bold, italic and underline formatting.

Document name in view. When creating a view, users will be able to see which document the view’s segment came from.

Smaller improvements include an improved fragment assembly, support for new file formats, including GetText PO files, the XLIFF:doc and TIPP package formats of the Interoperability Now! Initiative, and a TMX filter for enhanced management and review of translation memory contents.