Neural Machine Translation By Jointly Learning to Align and Translate (Bahdanau et al.)

June 21, 2024

-instead of a phrase-based system built from many separately tuned sub-components, train a single neural network that reads a source sentence and outputs its translation

Problem: previous approaches employ encoder-decoder architectures that encode the whole source sentence into a single fixed-length vector

-this means that as sentences get longer, the same fixed-size vector has to compress more content, so fine-grained ("granular") information is lost

-as sentences get longer, translation quality deteriorates
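
For reference, a sketch of the fixed-length bottleneck in the basic RNN encoder-decoder, using the notation from the paper's background section (f, q and g are generic nonlinear functions, h_t is the encoder hidden state, s_t the decoder hidden state):

```latex
\begin{align}
h_t &= f(x_t, h_{t-1}), \qquad c = q(\{h_1, \dots, h_{T_x}\}) \\
p(y_t \mid y_1, \dots, y_{t-1}, c) &= g(y_{t-1}, s_t, c)
\end{align}
```

The same vector c is used at every decoding step, whatever the source length T_x.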

Solution: encoder-decoder that jointly learns to align and translate

-each time the model generates a word, it (soft-)searches for a set of positions in the source sentence where the most relevant information is concentrated (attention from the decoder over the source sentence, not self-attention)

-generate each new word based on the context vector associated with those source positions + all previously generated target words

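-the encoder hidden states are called "annotations" in the text (they are what attention is computed over, not the attention itself); annotation h_i comes from a bidirectional RNN and contains information about the whole input sequence with a strong focus on the parts surrounding the i-th word; the context vector is a weighted sum of the annotations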
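
A minimal sketch of one attention step, to make the bullets above concrete. It follows the alignment model given in the paper, e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j), with a softmax over source positions and a weighted sum of annotations. The function names, the dimensions, and the random (untrained) parameters are assumptions for illustration only; in the real model the annotations come from a bidirectional RNN and all weights are learned jointly with the translator.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(s_prev, annotations, W_a, U_a, v_a):
    """One decoder step: score every annotation, normalise, build the context."""
    Ws = matvec(W_a, s_prev)  # shared across all source positions
    scores = []
    for h_j in annotations:
        Uh = matvec(U_a, h_j)
        hidden = [math.tanh(a + b) for a, b in zip(Ws, Uh)]
        scores.append(sum(v * t for v, t in zip(v_a, hidden)))  # e_ij
    weights = softmax(scores)  # alpha_ij, sums to 1 over j
    dim = len(annotations[0])
    # c_i = sum_j alpha_ij * h_j
    context = [sum(w * h[k] for w, h in zip(weights, annotations))
               for k in range(dim)]
    return weights, context


def random_matrix(rows, cols, rng):
    """Hypothetical helper: untrained parameters for the demo."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T_x, n_dec, n_ann, n_align = 5, 4, 6, 3  # illustrative sizes only
annotations = random_matrix(T_x, n_ann, rng)          # stand-in for h_1..h_Tx
s_prev = [rng.uniform(-1, 1) for _ in range(n_dec)]   # stand-in for s_{i-1}
W_a = random_matrix(n_align, n_dec, rng)
U_a = random_matrix(n_align, n_ann, rng)
v_a = [rng.uniform(-1, 1) for _ in range(n_align)]

weights, context = additive_attention(s_prev, annotations, W_a, U_a, v_a)
print("attention weights:", [round(w, 3) for w in weights])
print("context vector length:", len(context))
```

The context vector is recomputed at every decoding step from the current decoder state, so unlike the fixed-length encoding it can focus on different source words for each output word.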