Solving the Post Edit Puzzle by Paul Filkin (reposted with permission; original post)
It would be very arrogant of me to suggest that I have the solution for measuring the effort that goes into post-editing translations, wherever they originated from, but in particular machine translation. So let’s table that right away because there are many ways to measure, and pay for, post-editing work and I’m not going to suggest a single answer to suit everyone.
But I think I can safely say that finding a way to measure, and pay for, post-editing translations in a consistent way that provides good visibility into how many changes have been made, and allows you to build a cost model you can be happy with, is something many companies and translators are still investigating.
The first problem of course is that when you use Machine Translation you can’t see where the differences are between the suggested translation and the one you wish to provide.
I’m reliably informed that a better translation of this Machine Translated text would require a number of changes that would equate to a 58% fuzzy match:
But how would I know this, and is it correct to view the effort required here as a 58% fuzzy match and pay the translator on this basis? Well, if you could measure the effort in this way there are a few obvious things to consider. First of all, you don’t know it was a 58% match until after the work is complete. Secondly, a 58% post-edit analysis may not require the same effort as a 58% fuzzy match from a traditional Translation Memory, because no information is provided to help the translator see what needs to be changed in the first place. Thirdly, when post-editing it may be quite appropriate to be satisfied that the structure is basically correct and the meaning can be understood, with no need to rewrite the text to the style and polish you might be expected to deliver for a legal or medical task.
You could consider other methods such as a fixed rate, an hourly rate, a productivity measure (useful on large Projects perhaps… not so much on small jobs), or perhaps a way to measure the quality of the original Machine Translation and the likely effort to “improve” it. All of these are subject to further variables, such as the quality of the Machine Translation in the first place, which can vary not only between Machine Translation engines but also with the quality of the source text and the language pair in question. There are probably more factors outside the scope of this article that I haven’t mentioned at all, but the overriding consideration in my opinion is that you want to be able to provide fair compensation for the translator (or post-editor) and balance this with a fair return for the work giver, who has probably adopted and invested in Machine Translation looking for better returns for their own business.
All of these methods have their pros and cons, some allowing you to estimate the costs of the translation before you do the work, and some only once the work has been completed. In this article I want to discuss one possible solution based on the post-edit analysis approach I introduced at the start. This uses an application written by Patrick Hartnett, of SDLXLIFF to Legacy Converter, SDLXLIFF Compare and TAUS Search TM Provider fame.
Post-Edit Compare is a standalone application (available on the OpenExchange from here: Post-Edit Compare) that uses the Studio APIs to compare Studio Projects. The idea is that you take a copy of the original pretranslated Project and then compare it with a post-edited version. The pretranslation could be based on machine translation or a conventional Translation Memory, or you could even use this to compare a human-translated Project before and after client review, for example.
The main interface looks like this, with the files for the original Project on the left and the post-edited Project on the right. There are various controls to display more or fewer files as you see fit; in this view, for example, the red files have been post-edited whilst the black ones are exactly the same, but I could filter the view to show only the files that have changed.
On the right-hand side you can see a Projects Pane. This allows you to have many Comparison Projects on the go at the same time and easily navigate between them, which will be a godsend for a busy Project Manager as the Project list grows. It also maintains the file name alignment functionality, which is useful if the filenames have changed slightly during the course of a Project, as it can quickly re-match the files and save the information in the right place.
Once you have selected the Projects you wish to compare you can create various reports out of the box to show things like these:
- Studio summary analysis providing the post-edit statistics across all files in the project and an associated value for the work. You can also get an overview at a glance of many of the statistics by reviewing the nicely laid out graphical representations:
- At a more detailed level you can drill into the individual changes and see exactly what was changed, by whom if track changes were used for the editing, and the Edit Distance calculation (based around the Damerau-Levenshtein method, for those of you interested in the theory of these things). In this screenshot you can see that segment 16 was the result of post-editing a low 50% fuzzy match from a Translation Memory, whereas segment 23 was a post-edit of a Machine Translated segment. The help files supplied with the application explain in some detail what all the columns are for and how they are calculated:
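For those of you who do like the theory, here is a minimal Python sketch of the optimal-string-alignment variant of Damerau-Levenshtein, together with a hypothetical percentage score derived from it. The normalisation (dividing the distance by the longer string’s length) is my own illustration of how an edit distance can be turned into a fuzzy-match-style percentage; the application’s actual weighting and scoring may well differ:

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal String Alignment variant of Damerau-Levenshtein:
    counts insertions, deletions, substitutions and adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # cost of deleting all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # cost of inserting all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def pem_score(mt_output: str, post_edited: str) -> int:
    """Hypothetical percentage score: 100 means no edits were needed."""
    dist = osa_distance(mt_output, post_edited)
    longest = max(len(mt_output), len(post_edited), 1)
    return round(100 * (1 - dist / longest))
```

So an unedited segment scores 100, and the more characters the post-editor has to touch, the lower the score falls, which is broadly the intuition behind reading a heavily edited segment as a “58% match”.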
There is a lot more information in the reporting than this, and you have a lot of built in options within the tool itself that can reduce or increase the level of detail you want to see. I also know the developer is very keen to extend the reporting capability as he has built in an excellent reporting tool, so I’m sure your feedback as you use the tool will be welcomed. But for now the only other part I want to look at briefly is the costs shown in the Post-Edit Modifications Analysis… Studio doesn’t report on costs out of the box so this is another nice feature within this tool.
The detailed Post-Edit Modification (PEM) calculations allow the application to assign the PEM changes to the appropriate analysis band. So just like Studio these are based on analysis bands (100%, 95% – 99%, 85% – 94% etc.) and it can apply a price calculation based on rates for the different analysis band categories. It does this by storing the rates in something called Price Groups.
These Price Groups are very powerful and give you the ability to create different price groups for different Companies or Translators, and also for different language pairs. Once you have set these up in the way you wish, you can then generate the appropriate costs for the changes in a post-editing Project at the click of a button… more or less!
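To make the band-based pricing idea concrete, here is a hypothetical Price Group sketched in Python. The band boundaries echo the Studio-style analysis bands mentioned above, but the per-word rates (and the simple flat rate per band) are invented for the example and are not the tool’s defaults:

```python
# Hypothetical price group: (band low %, band high %, rate per word).
# Boundaries and rates are illustrative only.
PRICE_GROUP = [
    (100, 100, 0.01),  # no changes needed
    (95, 99, 0.02),
    (85, 94, 0.04),
    (75, 84, 0.06),
    (50, 74, 0.08),
    (0, 49, 0.10),     # effectively a new translation
]

def rate_for(score: int) -> float:
    """Find the per-word rate for a post-edit score by scanning the bands."""
    for low, high, rate in PRICE_GROUP:
        if low <= score <= high:
            return rate
    raise ValueError(f"no analysis band covers score {score}")

def segment_cost(word_count: int, score: int) -> float:
    """Cost of one segment: word count times the rate for its band."""
    return round(word_count * rate_for(score), 2)
```

Summing `segment_cost` over all segments in a Project would then give the kind of per-band cost breakdown the tool reports; you would keep one such table per Company, Translator or language pair, just as the Price Groups allow.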
I’m not going to go into too much detail on this aspect because it would be worth downloading a trial version to take a look yourself. The price of the full version is € 49,99 and in my opinion it is well worth it. The use cases are numerous… here are a few off the top of my head:
- An Agency looking for a way to agree appropriate rates for post-editing work and then be able to consistently pay on this basis with different Price Groups for different linguistic requirements perhaps;
- A Translator looking for a way to agree, or charge, rates for handling occasional requests for post-editing work. If you are not part of a regular post-editing workflow then it’s a convenient way of playing with the rates to see whether the occasional post-edit assignment can work for you, and then storing your rates to make it easier in the future;
- A Project Manager can use the Reporting to quickly and easily check the number of changes needed to correct translation work and then provide appropriate feedback, with accurate details, to the translator to help them meet quality expectations for final delivery of target files… and this could be less editing or more depending on the context.
Whilst I deliberately didn’t go into too much detail around the use of Machine Translation and the different ways of handling compensation for post-editing effort, I’d be interested to hear your views on this topic in the comments, and also, if you try the application, what you thought of it.