Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al.)
June 21, 2024
-instead of phrase-based statistical machine translation, which is built from many separately tuned components, use a single neural network, trained jointly end to end, that reads a source sentence and outputs its translation
Problem: previous neural approaches use encoder-decoder architectures that encode the whole source sentence into a single fixed-length vector, from which the decoder generates the translation
-the vector's size does not grow with the sentence, so all the necessary information has to be compressed into it; the longer the sentence, the less fine-grained detail survives
-as sentences get longer, translation quality deteriorates
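For contrast, the basic RNN encoder-decoder that the paper improves on can be written as follows (notation follows the paper's background section: f and q are nonlinear functions, s_t is the decoder hidden state; Sutskever et al. used an LSTM for f and let q simply return the last hidden state). The thing to notice is that c has the same size whatever the source length T_x is, and the decoder conditions on that same c at every step.

```latex
\begin{aligned}
h_t &= f(x_t, h_{t-1}) && \text{encoder state at source position } t \\
c &= q(\{h_1, \ldots, h_{T_x}\}) && \text{single fixed-length summary, e.g. } q(\cdot) = h_{T_x} \\
p(y_t \mid y_1, \ldots, y_{t-1}, c) &= g(y_{t-1}, s_t, c) && \text{same } c \text{ used for every target word}
\end{aligned}
```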
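Solution: an encoder-decoder that jointly learns to align and translate: it encodes the source sentence into a sequence of vectors and adaptively chooses a subset of them while decoding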
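-each time the model generates a word, it (soft-)searches for the positions in the source sentence where the most relevant information is concentrated (a learned soft alignment, now usually called additive attention; it is encoder-decoder attention, not self-attention, which relates positions within a single sequence)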
-each target word is predicted from the context vector built from those source positions, the decoder's hidden state, and all previously generated target words
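-"annotations" (h_j) are not the attention itself but the encoder outputs: each h_j concatenates the forward and backward hidden states of a bidirectional RNN at source position j, so it contains information about the whole input sequence with a strong focus on the words surrounding the j-th word; the attention weights decide how much each annotation contributes to the context vector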
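The alignment step can be sketched for one decoder step. In the paper's formulation the decoder state is s_i = f(s_{i-1}, y_{i-1}, c_i), the context vector is c_i = sum_j alpha_ij h_j with alpha_ij = exp(e_ij) / sum_k exp(e_ik), and the score e_ij = a(s_{i-1}, h_j) comes from a small feedforward network, v_a^T tanh(W_a s_{i-1} + U_a h_j), trained jointly with the rest of the model. The pure-Python sketch is illustrative only: the helper names (`annotations`, `additive_attention`) are hypothetical, the forward/backward RNN states are supplied directly instead of being computed by a real BiRNN, and `W_a`, `U_a`, `v_a` are hand-picked toy values, not learned weights.

```python
import math


def matvec(M, v):
    # Multiply matrix M (given as a list of rows) by vector v.
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def annotations(forward_states, backward_states):
    # Hypothetical helper: h_j = [forward h_j ; backward h_j], i.e. list
    # concatenation of the two BiRNN states at each source position j.
    return [f + b for f, b in zip(forward_states, backward_states)]


def additive_attention(s_prev, H, W_a, U_a, v_a):
    # Hypothetical helper for one decoder step i.
    # s_prev: previous decoder state s_{i-1}; H: list of annotations h_j.
    ws = matvec(W_a, s_prev)  # W_a s_{i-1}, shared across all positions j
    scores = []
    for h_j in H:
        uh = matvec(U_a, h_j)  # U_a h_j
        hidden = [math.tanh(a + b) for a, b in zip(ws, uh)]
        scores.append(sum(v * z for v, z in zip(v_a, hidden)))  # e_ij
    alpha = softmax(scores)  # alpha_ij: weights over source positions, sum to 1
    # c_i = sum_j alpha_ij h_j: weighted average of the annotations.
    dim = len(H[0])
    context = [sum(a * h_j[k] for a, h_j in zip(alpha, H)) for k in range(dim)]
    return alpha, context


# Toy example: 3 source positions, 1-dim forward and backward states,
# so each annotation has 2 dimensions (all values are made up).
fwd = [[0.1], [0.5], [0.9]]
bwd = [[0.7], [0.2], [-0.3]]
H = annotations(fwd, bwd)
s_prev = [0.3, -0.1]
W_a = [[0.5, 0.0], [0.0, 0.5]]
U_a = [[1.0, 0.0], [0.0, 1.0]]
v_a = [1.0, 1.0]
alpha, c = additive_attention(s_prev, H, W_a, U_a, v_a)
print([round(a, 3) for a in alpha], [round(x, 3) for x in c])
```

Setting `U_a` to all zeros makes every score identical, so the weights become uniform and c_i is just the average annotation; the learned `U_a` term is what lets the model prefer some source positions over others for each target word.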