tags: study
paper
DSMI lab
paper: Effective Approaches to Attention-based Neural Machine Translation
Introduction
- Neural Machine Translation (NMT) requires minimal domain knowledge and is conceptually simple
- NMT generalizes well to very long word sequences and does not need to explicitly store gigantic phrase tables or language models
- The concept of “attention”: learn alignments between different modalities
	- image caption generation task: visual features of a picture vs. text description
	- speech recognition task: speech frames vs. text
- Proposed method: two novel types of attention-based models
	- global approach: attends to all source words at every time step
	- local approach: attends to only a subset of source words at a time
Neural Machine Translation
- Goal: translate a source sentence $x_1, x_2, \dots, x_n$ into a target sentence $y_1, y_2, \dots, y_m$
- A basic form of NMT consists of two components:
	- Encoder: computes a representation $s$ of the source sentence
- Decoder: generates one target word at a time
$p(y_j \mid y_{<j}, s) = \mathrm{softmax}(g(h_j))$, where $g$ is a transformation function that outputs a vocabulary-sized vector and $h_j$ is the RNN hidden unit.
- Training objective: $J_t = \sum_{(x,y) \in D} -\log p(y \mid x)$, where $D$ is the parallel training corpus (see the sketch below).
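A minimal PyTorch sketch of this encoder-decoder setup and the negative log-likelihood objective (toy sizes; names like `TinyNMT`, `src`, `tgt_in` are illustrative, not the paper's code):

```python
import torch
import torch.nn as nn

class TinyNMT(nn.Module):
    """Toy encoder-decoder: p(y_j | y_<j, s) = softmax(g(h_j))."""
    def __init__(self, src_vocab=1000, tgt_vocab=1000, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.g = nn.Linear(dim, tgt_vocab)  # g: hidden state -> vocabulary-sized vector

    def forward(self, src, tgt_in):
        _, s = self.encoder(self.src_emb(src))        # s: source representation (final states)
        h, _ = self.decoder(self.tgt_emb(tgt_in), s)  # h_j for every target position
        return self.g(h)                              # logits; softmax is applied inside the loss

# J_t = sum over (x, y) in D of -log p(y | x), via cross-entropy on next-word prediction
model = TinyNMT()
src = torch.randint(0, 1000, (2, 7))      # toy batch of source word ids
tgt_in = torch.randint(0, 1000, (2, 5))   # target prefix y_<j
tgt_out = torch.randint(0, 1000, (2, 5))  # gold next words y_j
logits = model(src, tgt_in)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), tgt_out.reshape(-1))
loss.backward()
```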
Attention-based model
- Bahdanau et al. use a bidirectional encoder
- Bahdanau et al. use a deep-output and a maxout layer
- Bahdanau et al. use a more complicated alignment function (the paper's simpler score functions perform better): $e_{ij} = v_a^\top \tanh(W_a h_i + U_a \bar{h}_j)$ (see the score-function sketch after this list)
- Local attention: focuses on a small window of source positions centered at a predicted aligned position $p_t$, which is cheaper than attending to the whole source sentence (see the local-attention sketch after this list)
- Input-feeding approach: concatenate the attentional vector $\tilde{h}_t$ with the decoder input at the next time step
	- makes the model fully aware of previous alignment choices
	- creates a very deep network spanning both horizontally (across time) and vertically (across layers)
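For reference, a small sketch contrasting Bahdanau's additive score with the dot-product score (one of the paper's simpler global-attention choices); dimensions and tensor names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 8
h_t = torch.randn(1, dim)    # current decoder hidden state
h_s = torch.randn(5, dim)    # encoder hidden states (5 source words)

# Bahdanau-style additive score: e_j = v^T tanh(W_a h_t + U_a h_bar_j)
W_a = nn.Linear(dim, dim, bias=False)
U_a = nn.Linear(dim, dim, bias=False)
v = nn.Linear(dim, 1, bias=False)
additive = v(torch.tanh(W_a(h_t) + U_a(h_s))).squeeze(-1)  # shape: (5,)

# Dot-product score: score(h_t, h_bar_s) = h_t . h_bar_s
dot = h_s @ h_t.squeeze(0)                                  # shape: (5,)

# Either score yields alignment weights and a context vector the same way
a_t = F.softmax(dot, dim=-1)  # alignment weights over source positions
c_t = a_t @ h_s               # context vector: weighted sum of encoder states
```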
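And a rough sketch of local-p attention (a Gaussian-weighted window of half-width D around a predicted position p_t) plus input feeding; the tensor names and sizes here are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

S, dim, D = 12, 8, 2          # source length, hidden size, window half-width
h_s = torch.randn(S, dim)     # encoder states
h_t = torch.randn(dim)        # current decoder state

# Predict an aligned source position p_t in [0, S): p_t = S * sigmoid(v_p^T tanh(W_p h_t))
W_p, v_p = torch.randn(dim, dim), torch.randn(dim)
p_t = S * torch.sigmoid(v_p @ torch.tanh(W_p @ h_t))

# Dot-product alignment, then favor positions near p_t with a Gaussian (sigma = D / 2)
scores = h_s @ h_t
positions = torch.arange(S, dtype=torch.float)
gauss = torch.exp(-((positions - p_t) ** 2) / (2 * (D / 2) ** 2))
a_t = F.softmax(scores, dim=-1) * gauss
c_t = a_t @ h_s               # context vector from the local window

# Attentional vector h~_t = tanh(W_c [c_t; h_t]); input feeding concatenates it
# with the next target word embedding before the next decoder step
W_c = torch.randn(dim, 2 * dim)
h_tilde = torch.tanh(W_c @ torch.cat([c_t, h_t]))
next_word_emb = torch.randn(dim)                           # embedding of the next target word
next_decoder_input = torch.cat([next_word_emb, h_tilde])   # size 2 * dim
```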
Experiments
Training data: WMT'14 English-German
- 4.5M sentence pairs.
- 116M English words, 110M German words
- vocabularies: top 50K most frequent words for both languages
Model:
- stacked LSTM with 4 layers
- each layer has 1000 cells
- 1000-dimensional embeddings (a configuration sketch follows below)
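A hedged sketch of an encoder with these sizes (4-layer stacked LSTM, 1000 cells per layer, 1000-dimensional embeddings, 50K vocabulary); the class name, dropout value, and wiring are assumptions, not the authors' code:

```python
import torch.nn as nn

class StackedLSTMEncoder(nn.Module):
    """Encoder with the reported sizes: 4 layers x 1000 cells, 1000-dim embeddings, 50K vocab."""
    def __init__(self, vocab_size=50000, emb_dim=1000, hidden=1000, num_layers=4, dropout=0.2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=num_layers,
                            dropout=dropout, batch_first=True)

    def forward(self, src_ids):
        # src_ids: (batch, src_len) word indices from the 50K-word vocabulary
        outputs, (h_n, c_n) = self.lstm(self.embed(src_ids))
        return outputs, (h_n, c_n)  # outputs feed attention; (h_n, c_n) initializes the decoder
```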
Results:
Analysis
Reference
- PyTorch implementation: https://github.com/AotY/Pytorch-NMT
- Slides: https://slideplayer.com/slide/7710523/