79 — Six Challenges for Neural Machine Translation
I took a really amazing Machine Translation course with Philipp Koehn at Hopkins a few years ago, so I was excited to read this paper which acts as a topical review of the challenges faced by machine-translation systems in the age of neural networks. Notably, it
This paper covers six (and a half) industry-wide challenges in the neural machine translation (NMT) space, and then proposes some ways of addressing these challenges in new contributions.
The problems (in the order of introduction in the paper):
Domain Mismatch. Translations generated by neural nets often degrade badly when the text being translated comes from a different domain than the training data, because the right translation of a word depends on the domain it appears in. One example of this might be training an NMT system on a medical corpus and then trying to translate movie subtitles. (In this out-of-domain setting, NMT performs far worse than statistical machine translation, or SMT.)
This is likely a result of overfitting to the domain-specific language and not adequately capturing the nuances of the language itself.
Amount of Training Data. This is a well-known problem in machine learning in general: neural networks require very large amounts of training data. While SMT improves steadily as the training set grows, NMT requires a large corpus before it begins to produce usable translations at all. In other words, for small training sets, SMT is probably still the better solution.
Rare Words. Rare words are a problem for both SMT and NMT systems; unsurprisingly, they embed poorly in an NMT system’s latent space. Interestingly, NMT systems correctly translate words they’ve never seen before 60% of the time, whereas SMT only achieves 53%.
Long Sentences. Large amounts of temporal or spatial context are a problem for all neural networks, and long sentences with a lot of semantic context are no exception. SMT already outperforms NMT on sentences longer than about 60 words, and NMT fails almost completely on very long (>80 word) sentences.
Word Alignment. One fundamental challenge faced by machine translation systems is that of word alignment: it is useful to determine single word-to-word matchings in order to understand the “reasoning” of an MT system. This is notoriously hard for neural networks because it is, in a way, a question of explainability.
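As a rough illustration of what "word-to-word matchings" could look like in practice, here is a minimal sketch. It assumes the NMT system exposes an attention matrix (one row per target word, one column per source word) and reads off the highest-weighted source word for each target word. The function name align_from_attention, the sentence pair, and the weights are all made up for this example; a real system's attention is often far less tidy, which is part of why this is hard.

```python
def align_from_attention(attention, source, target):
    """Return one (target_word, source_word) pair per target word.

    attention[i][j] is assumed to be the weight that target word i
    places on source word j (hypothetical values, not real model output).
    """
    alignment = []
    for i, row in enumerate(attention):
        # Pick the source position with the largest attention weight.
        best_j = max(range(len(row)), key=lambda j: row[j])
        alignment.append((target[i], source[best_j]))
    return alignment


# Invented example sentence pair and attention weights.
source = ["das", "Haus", "ist", "klein"]
target = ["the", "house", "is", "small"]
attention = [
    [0.85, 0.05, 0.05, 0.05],  # the
    [0.10, 0.80, 0.05, 0.05],  # house
    [0.05, 0.10, 0.75, 0.10],  # is
    [0.05, 0.05, 0.10, 0.80],  # small
]

for target_word, source_word in align_from_attention(attention, source, target):
    print(target_word, "<-", source_word)
```

Taking the argmax is the simplest possible choice; it forces exactly one source word per target word, so it cannot represent one-to-many or unaligned words.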
Beam Search. The “beam” is the breadth of translation candidates that are considered by the MT system at each step. Wider beams (larger values) generally yield better results (as measured by BLEU), but only up to a point. With very wide beams, NMT systems will occasionally pick up a candidate that starts with a bad initial word, which biases the resulting translation toward a less accurate output.
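To make the mechanics of the beam concrete, here is a minimal sketch of beam search over a toy model. Everything in it is an assumption for illustration: TOY_MODEL is an invented table mapping a prefix to next-word probabilities, and a prefix that is missing from the table is treated as a finished translation. In this toy case a beam of 2 finds a higher-probability output than a beam of 1; it does not reproduce the degradation seen with very wide beams, which depends on how a real model scores its candidates.

```python
import math

# Hypothetical toy "model": prefix (tuple of words) -> next-word probabilities.
# A prefix that is not in the table counts as a finished hypothesis.
TOY_MODEL = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"cat": 0.55, "dog": 0.45},
    ("the",): {"cat": 0.9, "dog": 0.1},
}


def beam_search(model, beam_size, max_len=10):
    """Return the best (words, log_probability) pair found with the given beam size."""
    beam = [((), 0.0)]  # each entry is (prefix, cumulative log probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beam:
            next_probs = model.get(prefix)
            if next_probs is None:
                # Finished hypothesis: carry it over unchanged.
                candidates.append((prefix, score))
            else:
                for word, prob in next_probs.items():
                    candidates.append((prefix + (word,), score + math.log(prob)))
        # Keep only the beam_size highest-scoring candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        new_beam = candidates[:beam_size]
        if new_beam == beam:
            break  # nothing was extended, so every hypothesis is finished
        beam = new_beam
    return beam[0]


for size in (1, 2):
    words, log_prob = beam_search(TOY_MODEL, beam_size=size)
    print(size, " ".join(words), round(math.exp(log_prob), 2))
```

With a beam of 1 the search commits to "a" because it is the most likely first word, and ends at "a cat" (probability 0.33). With a beam of 2 it also keeps "the" alive and ends at "the cat" (probability 0.36).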