tags: study, paper, DSMI lab
paper: Effective Approaches to Attention-based Neural Machine Translation
Introduction
- Neural Machine Translation (NMT) requires minimal domain knowledge and is conceptually simple
- NMT generalizes well to very long word sequences => no need to store gigantic phrase tables as in standard phrase-based MT
- The concept of “attention”: learn alignments between different modalities
- image caption generation task: visual features of a picture vs. text description
- speech recognition task: speech frames vs. text
- Proposed method: novel types of attention-based models
- global approach
- local approach
Neural Machine Translation
- Goal: translate the source sentence $x_1, x_2, \dots, x_n$ to the target sentence $y_1, y_2, \dots, y_m$
- A basic form of NMT consists of two components:
- Encoder: computes a representation $s$ of the source sentence
- Decoder: generates one target word at a time
$p(y_j|y_{<j},s)=\text{softmax}(g(h_j))$, where $g$ is a transformation function that outputs a vocabulary-sized vector and $h_j$ is the decoder RNN hidden state.
- Training objective: $J_t=\sum_{(x,y)\in D}-\log p(y|x)$, where $D$ is the parallel training corpus.
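A minimal sketch of this decoder step and the loss in PyTorch (sizes and names here are illustrative, not taken from the paper's code):

```python
import torch
import torch.nn as nn

# Illustrative sizes, not from the paper's code.
vocab_size, hidden_size = 50000, 1000

# g: transformation that maps the RNN hidden state h_j to a
# vocabulary-sized vector of scores.
g = nn.Linear(hidden_size, vocab_size)

h_j = torch.randn(1, hidden_size)     # decoder hidden state at step j
logits = g(h_j)                       # vocabulary-sized vector
p = torch.softmax(logits, dim=-1)     # p(y_j | y_<j, s)

# Training objective: -log p(y|x), summed over the parallel corpus D.
y_j = torch.tensor([42])              # index of the gold target word (hypothetical)
loss = nn.functional.cross_entropy(logits, y_j)  # = -log p(y_j | y_<j, s)
```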
Attention-based model
- Global Attention
Differences compared with Bahdanau et al.:
- Bahdanau uses a bidirectional encoder
- Bahdanau uses deep-output and max-out layers
- Bahdanau uses a different alignment function (the paper's simpler alternatives work better; see the sketch below): $e_{ij}=v^T \tanh(W_a h_i+U_a \hat{h}_j)$
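A sketch of global attention with the paper's simple dot score, assuming batch-first tensors (all shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def global_attention(h_t, enc_outputs):
    """Global attention with the dot alignment score (a sketch).

    h_t:         (batch, hidden)          current decoder hidden state
    enc_outputs: (batch, src_len, hidden) all encoder hidden states
    Returns the context vector c_t and the alignment weights a_t.
    """
    # score(h_t, h_s) = h_t . h_s for every source position s
    scores = torch.bmm(enc_outputs, h_t.unsqueeze(2)).squeeze(2)  # (batch, src_len)
    a_t = F.softmax(scores, dim=1)                                # align over source words
    c_t = torch.bmm(a_t.unsqueeze(1), enc_outputs).squeeze(1)     # weighted sum of h_s
    return c_t, a_t
```

In the paper, $c_t$ is then combined with $h_t$ into the attentional vector $\tilde{h}_t=\tanh(W_c[c_t;h_t])$, which feeds the output softmax.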
- Local Attention
- Global attention attends to all source words for every target word, which is computationally expensive when the source sentence is long; local attention instead attends to a small window of source positions (see the sketch below).
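A sketch of the paper's local-p variant, which predicts an aligned position $p_t$ and reweights the alignment with a Gaussian of standard deviation $D/2$. For brevity this softmaxes over all positions and reweights, instead of slicing the exact $[p_t-D, p_t+D]$ window as in the paper; layer names and sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_size, D = 1000, 10
W_p = nn.Linear(hidden_size, hidden_size, bias=False)
v_p = nn.Linear(hidden_size, 1, bias=False)

def local_p_attention(h_t, enc_outputs):
    """Local-p attention (a sketch).

    h_t:         (batch, hidden)
    enc_outputs: (batch, src_len, hidden)
    """
    batch, src_len, _ = enc_outputs.shape
    # p_t = S * sigmoid(v_p^T tanh(W_p h_t)), with S the source length
    p_t = src_len * torch.sigmoid(v_p(torch.tanh(W_p(h_t)))).squeeze(1)  # (batch,)

    # Dot-score alignment over all positions...
    scores = torch.bmm(enc_outputs, h_t.unsqueeze(2)).squeeze(2)
    a_t = F.softmax(scores, dim=1)

    # ...reweighted by a Gaussian centered at p_t (sigma = D/2), so
    # positions far from the predicted alignment get negligible weight.
    s = torch.arange(src_len, dtype=torch.float).unsqueeze(0)  # (1, src_len)
    a_t = a_t * torch.exp(-((s - p_t.unsqueeze(1)) ** 2) / (2 * (D / 2) ** 2))

    c_t = torch.bmm(a_t.unsqueeze(1), enc_outputs).squeeze(1)
    return c_t, a_t
```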
- Input-feeding approach
- makes the model fully aware of previous alignment choices by feeding the attentional vector back into the next decoder step (see the sketch below)
- creates a very deep network spanning both horizontally (across time steps) and vertically (across layers)
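A sketch of one decoder step with input feeding: the previous attentional vector $\tilde{h}_{t-1}$ is concatenated with the current target embedding before entering the LSTM (names and sizes are illustrative):

```python
import torch
import torch.nn as nn

emb_size, hidden_size = 1000, 1000
cell = nn.LSTMCell(emb_size + hidden_size, hidden_size)  # wider input for input feeding
W_c = nn.Linear(2 * hidden_size, hidden_size)

def decode_step(y_emb, h_tilde_prev, state, enc_outputs):
    """One input-feeding decoder step (a sketch).

    y_emb:        (batch, emb_size)     embedding of the previous target word
    h_tilde_prev: (batch, hidden_size)  attentional vector from the previous step
    state:        (h, c) LSTM state
    enc_outputs:  (batch, src_len, hidden_size)
    """
    # Concatenate the previous attentional vector with the word embedding,
    # so the decoder is aware of earlier alignment choices.
    h, c = cell(torch.cat([y_emb, h_tilde_prev], dim=1), state)
    # Global attention (dot score) over the encoder states.
    scores = torch.bmm(enc_outputs, h.unsqueeze(2)).squeeze(2)
    a = torch.softmax(scores, dim=1)
    ctx = torch.bmm(a.unsqueeze(1), enc_outputs).squeeze(1)
    # h~_t = tanh(W_c [c_t; h_t]); this is fed back at the next step.
    h_tilde = torch.tanh(W_c(torch.cat([ctx, h], dim=1)))
    return h_tilde, (h, c)
```

Unrolling this step across time while stacking LSTM layers is what creates the deep network noted above.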
Experiments
Training Data: WMT'14 English-German
- 4.5M sentence pairs.
- 116M English words, 110M German words
- vocabularies: top 50K most frequent words for both languages.
Model:
- stacked LSTMs with 4 layers
- each layer with 1000 cells
- 1000 dimensional embeddings
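A sketch of this configuration in PyTorch (the attention and output layers from the previous sections would sit on top of the decoder):

```python
import torch.nn as nn

# The experiments' configuration: 50K vocabularies, 1000-dim embeddings,
# and 4-layer stacked LSTMs with 1000 cells per layer.
vocab_size, emb_size, hidden_size, num_layers = 50000, 1000, 1000, 4

src_embedding = nn.Embedding(vocab_size, emb_size)
tgt_embedding = nn.Embedding(vocab_size, emb_size)
encoder = nn.LSTM(emb_size, hidden_size, num_layers=num_layers, batch_first=True)
decoder = nn.LSTM(emb_size, hidden_size, num_layers=num_layers, batch_first=True)
generator = nn.Linear(hidden_size, vocab_size)  # the g(.) transformation
```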
Results:
English-German results
German-English results
Analysis
- Sample Translations
Problems of the baseline model:
- mistranslates person names
- mistranslates double negation
Reference
- PyTorch implementation: https://github.com/AotY/Pytorch-NMT
- Slides: https://slideplayer.com/slide/7710523/