‘Human Parity Achieved’ in MT

According to Microsoft’s March 14, 2018 research paper, “Achieving Human Parity on Automatic Chinese to English News Translation,” several variants of a new NMT system they developed achieved “human parity,” i.e., they were judged equal in quality to human translations (the paper defines human quality as “professional human translations on the WMT 2017 Chinese to English news task”).

Microsoft devised a new human evaluation system to reach this convenient conclusion, but first they had to make “human parity” less nebulous and better defined.

Microsoft’s research defines human parity as follows: “If a bilingual human judges the quality of a candidate translation produced by a human to be equivalent to one produced by a machine, then the machine has achieved human parity.”

In mathematical, testable terms, human parity is achieved “if there is no statistically significant difference between human quality scores for a test set of candidate translations from a machine translation system and the scores for the corresponding human translations.”
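
To make the testable definition concrete, here is a minimal sketch of how such a check might look. It is not the paper's actual procedure: the choice of a paired sign-flip permutation test, the 0.05 threshold, and the function names `paired_permutation_test` and `no_significant_difference` are all assumptions made for illustration. It expects one human quality score per segment for the machine output and one for the corresponding human translation.

```python
import random
from statistics import mean


def paired_permutation_test(machine_scores, human_scores,
                            n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired per-segment scores.

    Hypothetical helper: the test choice is an assumption, not the
    procedure used in the Microsoft paper. Returns an approximate p-value.
    """
    if len(machine_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")

    # Per-segment difference between machine and human quality scores.
    diffs = [m - h for m, h in zip(machine_scores, human_scores)]
    observed = abs(mean(diffs))

    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


def no_significant_difference(machine_scores, human_scores, alpha=0.05):
    """True if the test fails to find a significant difference at level alpha."""
    return paired_permutation_test(machine_scores, human_scores) >= alpha
```

A paired test is used here because both sets of scores cover the same source sentences. Under the quoted definition, a system would count as having reached parity when `no_significant_difference` returns `True` for its scores against the human reference scores.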

Microsoft made everything about this new research open source, citing external validation and future research as the reasons.