Quantisation of Neural Machine Translation models

When large amounts of training data are available, the quality of Neural MT engines increases with the size of the model. However, larger models mean decoding with more parameters, which makes the engine slower at inference time. Improving the trade-off between model compactness and translation quality is therefore an active research topic. One way to obtain more compact models is quantisation: representing each parameter with a smaller, fixed number of bits (for example, 8-bit integers instead of 32-bit floats), which reduces both memory footprint and computational cost. In this post we take a look at a paper that achieves Transformer Neural MT models four times more compact via quantisation into 8-bit values, with no loss in translation quality as measured by BLEU.
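To make the idea concrete, here is a minimal sketch of per-tensor uniform 8-bit quantisation in NumPy. This illustrates the general technique, not the specific scheme used in the paper, and the function names are hypothetical:

```python
import numpy as np

def quantise_int8(w: np.ndarray):
    """Uniformly quantise a float32 tensor to int8 plus a per-tensor scale."""
    scale = np.abs(w).max() / 127.0  # map the largest magnitude onto the int8 range
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from its int8 representation."""
    return q.astype(np.float32) * scale

# An int8 copy of a weight matrix takes 4x less memory than float32.
w = np.random.randn(512, 512).astype(np.float32)
q, scale = quantise_int8(w)
w_hat = dequantise(q, scale)
print(q.nbytes / w.nbytes)       # -> 0.25
print(np.abs(w - w_hat).max())   # small quantisation error
```

In practice, schemes of this kind are applied to the weight matrices of the Transformer (and sometimes to activations as well), with the scales chosen so that the quantisation error has negligible effect on BLEU.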

Read more here
