254 — The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
Chen, Firat & Bapna et al. (1804.09849)
Read on 01 May 2018
There are many challenges facing automated machine translation (MT) techniques (#79). And translation in general is a very tough nut to crack. Or, as Google Translate would say,
Translation generally severe nut crack
Despite this, we have recently seen enormous strides toward automated end-to-end MT. Google's RNN-based seq2seq models were a huge improvement over earlier phrase-based statistical approaches, and convolutional sequence-to-sequence models followed; both were then outperformed by the Transformer model.
In this paper, Google puts forward RNMT+ together with a series of hybrid models that mix the existing approaches:
An RNMT+ model — a recurrent neural machine translation model in which separate encoder and decoder stacks built from recurrent (LSTM) layers are connected through a multi-head attention network, while borrowing training refinements popularized by the Transformer. The hybrid models then pair a self-attentive (Transformer-style) encoder with this recurrent decoder; a rough sketch of the basic encoder-attention-decoder layout is given below.
This new model outperforms the previous state of the art on the WMT14 En→Fr and En→De benchmarks (as measured by BLEU), though it requires more compute than all other tested MT models besides the largest Transformer network.
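To make the RNMT+ description above concrete, here is a minimal sketch of an encoder-attention-decoder model in PyTorch. It is illustrative only: the class name, layer counts, dimensions, and the use of dot-product nn.MultiheadAttention (the paper uses multi-head additive attention, plus extras like per-gate layer normalization and synchronous training) are my assumptions, not the paper's exact configuration.

```python
# Sketch of an RNMT+-style recurrent encoder-decoder with multi-head attention.
# Hyperparameters and attention type are illustrative, not the paper's setup.
import torch
import torch.nn as nn

class RNMTPlusSketch(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d_model=512, n_heads=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        # Encoder: a stack of bidirectional LSTM layers.
        self.encoder = nn.LSTM(d_model, d_model // 2, num_layers=2,
                               bidirectional=True, batch_first=True)
        # Decoder: a stack of unidirectional LSTM layers.
        self.decoder = nn.LSTM(d_model, d_model, num_layers=2,
                               batch_first=True)
        # Multi-head attention connecting decoder states to encoder outputs
        # (dot-product here; the paper uses multi-head additive attention).
        self.attention = nn.MultiheadAttention(d_model, n_heads,
                                               batch_first=True)
        self.out = nn.Linear(2 * d_model, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        enc_out, _ = self.encoder(self.src_emb(src_ids))    # (B, S, d_model)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids))    # (B, T, d_model)
        # Each decoder state attends over all encoder states.
        ctx, _ = self.attention(dec_out, enc_out, enc_out)  # (B, T, d_model)
        # Combine context and decoder state, then project to the vocabulary.
        return self.out(torch.cat([dec_out, ctx], dim=-1))  # (B, T, tgt_vocab)

# Toy usage: next-token logits for a small random batch.
model = RNMTPlusSketch(src_vocab=1000, tgt_vocab=1000)
logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```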