tags: study
paper
DSMI lab
paper: Sequence to Sequence Learning with Neural Networks
Abstract and Introduction
- DNNs cannot be used to map sequences to sequences directly because they only work when the dimensionality of the input and output is fixed and known
- Task: English-to-French translation on the WMT’14 dataset
- Advantages of the proposed method:
- Goal: given an input sentence $(x_1, x_2, \ldots, x_T)$ and its corresponding output sentence $(y_1, y_2, \ldots, y_{T'})$ (where $T$ need not equal $T'$), estimate $p(y_1, y_2, \ldots, y_{T'} \mid x_1, x_2, \ldots, x_T)$
- $p(y_1, y_2, \ldots, y_{T'} \mid x_1, x_2, \ldots, x_T)=\prod_{t=1}^{T'} p(y_{t} \mid v, y_1, \ldots, y_{t-1})$, where $v$ is the fixed-dimensional representation of the input sequence given by the encoder LSTM's last hidden state, and each distribution $p(y_{t} \mid v, y_1, \ldots, y_{t-1})$ is represented with a softmax over all the words in the vocabulary (see the sketch after this list)
- Feeding the input sequence in reversed order works better
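As a concrete illustration of the factorization above, here is a minimal sketch, assuming PyTorch (the module layout, the layer sizes, and names like `Seq2Seq.log_prob` are my own, not the paper's code): the encoder LSTM's final state plays the role of $v$, and the conditional log-probability is the sum of per-step log-softmax scores.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2Seq(nn.Module):
    """Encoder-decoder LSTM; sizes here are illustrative, not the paper's."""
    def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512, layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, layers, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, layers, batch_first=True)
        self.proj = nn.Linear(hidden, tgt_vocab)

    def log_prob(self, src_ids, tgt_ids):
        """log p(y_1..y_T' | x_1..x_T) = sum_t log softmax(. | v, y_<t)[y_t]."""
        # v = the encoder's final (hidden, cell) state, used to initialize the decoder
        _, v = self.encoder(self.src_emb(src_ids))
        # teacher forcing: feed y_0..y_{T'-1} (position 0 is a BOS token), predict y_1..y_T'
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids[:, :-1]), v)
        logp = F.log_softmax(self.proj(dec_out), dim=-1)      # (batch, T'-1, vocab)
        gold = tgt_ids[:, 1:].unsqueeze(-1)                   # next-token targets
        return logp.gather(-1, gold).squeeze(-1).sum(dim=1)   # (batch,)

# toy usage with random token ids (batch of 1)
model = Seq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (1, 7))   # in the paper this would be the reversed source
tgt = torch.randint(0, 120, (1, 9))
print(model.log_prob(src, tgt))
```

The same quantity is what the paper maximizes during training and uses to score hypotheses during beam-search decoding.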
Experiment
Dataset: WMT’14 English to French dataset
- 12M sentences
- Vocabulary: 160,000 most frequently used English words and 80,000 most frequently used French words
- Words that do not appear in the vocabulary are replaced with "UNK" (see the vocabulary sketch after this list)
- Evaluation: decoding directly with the LSTM, and also rescoring the 1000-best lists generated by the baseline SMT system
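A small sketch of the vocabulary step above, assuming plain whitespace tokenization (the toy corpus and helper names are illustrative): keep only the most frequent words on each side and map everything else to "UNK".

```python
from collections import Counter

def build_vocab(sentences, max_size):
    """Keep the max_size most frequent words; everything else maps to UNK."""
    counts = Counter(word for sent in sentences for word in sent.split())
    words = [w for w, _ in counts.most_common(max_size)]
    return {w: i for i, w in enumerate(["UNK"] + words)}

def to_ids(sentence, vocab):
    unk = vocab["UNK"]
    return [vocab.get(w, unk) for w in sentence.split()]

# toy corpus standing in for the WMT'14 data
en_corpus = ["the cat sat on the mat", "the dog sat"]
fr_corpus = ["le chat est assis sur le tapis", "le chien est assis"]

en_vocab = build_vocab(en_corpus, max_size=160_000)  # paper: 160k source words
fr_vocab = build_vocab(fr_corpus, max_size=80_000)   # paper: 80k target words
print(to_ids("the cat barked", en_vocab))            # unseen word -> UNK id
```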
Objective: maximize the log probability of a correct translation $T$ given the source sentence $S$, averaged over the training set $\mathcal{S}$: $\frac{1}{|\mathcal{S}|}\sum_{(T,S)\in\mathcal{S}}\log p(T\mid S)$; at test time, output the most likely translation $\hat{T}=\mathop{\arg\max}\limits_{T} p(T\mid S)$
Decoding: left-to-right beam search (a sketch of the algorithm appears at the end of these notes)
Reverse the source sentence:
- Improves the performance, but the authors don’t have a complete explanation XDD
- After the input sentence is reversed, its first few words are much closer to the first few words of the output sentence; the paper attributes the gain to the many short-term dependencies this introduces (the "minimal time lag" between corresponding words shrinks). Getting the beginning of the output right then makes the rest of the generation more accurate (something like "a good start is half the battle"?)
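A tiny preprocessing sketch of the reversal trick (the helper name and the `<EOS>` marker placement are my own, though the paper does train the decoder to emit an end-of-sentence symbol): only the source side is reversed, the target keeps its normal order.

```python
def make_training_pair(src_tokens, tgt_tokens):
    """Reverse the source sentence; leave the target in its original order."""
    src = list(reversed(src_tokens))      # "a b c"  ->  "c b a"
    tgt = list(tgt_tokens) + ["<EOS>"]    # decoder learns when to stop
    return src, tgt

print(make_training_pair("john admires mary".split(),
                         "jean admire marie".split()))
# (['mary', 'admires', 'john'], ['jean', 'admire', 'marie', '<EOS>'])
```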
Training details
- 1000-dimensional word embeddings (the paper does not say how the word embeddings are obtained)
- LSTM: 4 layers, 1000 cells in each layer
- Parameters initialized from a uniform distribution between -0.08 and 0.08
- Parallelization: 8 GPUs in total, about 10 days of training
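The reported hyperparameters, written out as a hedged configuration sketch (assuming PyTorch and separate encoder/decoder LSTMs; the exact module layout and the output projection layer are assumptions, and at full vocabulary size this allocates well over a gigabyte of parameters):

```python
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB = 160_000, 80_000
EMB, HIDDEN, LAYERS = 1000, 1000, 4        # 1000-dim embeddings, 4 layers x 1000 cells

encoder_emb = nn.Embedding(SRC_VOCAB, EMB)
decoder_emb = nn.Embedding(TGT_VOCAB, EMB)
encoder = nn.LSTM(EMB, HIDDEN, num_layers=LAYERS)
decoder = nn.LSTM(EMB, HIDDEN, num_layers=LAYERS)
output_proj = nn.Linear(HIDDEN, TGT_VOCAB)

# initialize every parameter uniformly in [-0.08, 0.08], as reported in the paper
for module in (encoder_emb, decoder_emb, encoder, decoder, output_proj):
    for p in module.parameters():
        nn.init.uniform_(p, -0.08, 0.08)
```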
Experimental Results
The best result is obtained by ensembling LSTMs trained from different random initializations.
Model analysis:
- Can distinguish sentences that use the same words in a different order, as well as handle sentences that express the same meaning with different wording
- Performance remains good on long sentences (left plot: x-axis is sentence length)
- Performance also holds up when a sentence contains many rare words (right plot: x-axis is the average frequency rank, within the whole vocabulary, of the words appearing in the sentence)
Supplementary notes
- SMT system: Statistical Machine Translation
- Builds a statistical translation model through statistical analysis of large parallel corpora
- Does not rely on hand-written grammar rules, so it generalizes easily to translation between different language pairs
- Word-based translation: translate word by word
- Phrase-based translation: group words into phrases where appropriate and translate phrase by phrase
- Syntax-based translation: use syntactic analysis (e.g., a parse tree) to guide the translation
- Hierarchical phrase-based translation: a combination of phrase-based and syntax-based translation
- Beam search: a sketch of the algorithm is given below
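The paper's decoder uses a simple left-to-right beam search: each partial hypothesis in the beam is extended by every vocabulary word, only the B most likely partials are kept, and a hypothesis is moved to the finished set as soon as <EOS> is appended. Below is a hedged, model-agnostic sketch; `step_fn` (returning next-token log-probabilities given the prefix) and the toy example are my own assumptions, not the paper's code.

```python
import math

def beam_search(step_fn, bos, eos, beam_size=2, max_len=20):
    """Left-to-right beam search.

    step_fn(prefix) -> dict mapping next token -> log-probability.
    Returns the highest-scoring hypothesis as (token list, score).
    """
    beam = [([bos], 0.0)]            # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beam:
            for tok, logp in step_fn(prefix).items():
                candidates.append((prefix + [tok], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for prefix, score in candidates:
            if prefix[-1] == eos:
                finished.append((prefix, score))   # complete hypothesis leaves the beam
            else:
                beam.append((prefix, score))       # keep as a partial hypothesis
            if len(beam) == beam_size:
                break
        if not beam:
            break
    return max(finished + beam, key=lambda c: c[1])

# toy step function standing in for the decoder's softmax (token 0 = <EOS>, 1 = <BOS>)
def toy_step(prefix):
    table = {1: {2: math.log(0.6), 3: math.log(0.4)},
             2: {3: math.log(0.7), 0: math.log(0.3)},
             3: {0: math.log(0.9), 2: math.log(0.1)}}
    return table[prefix[-1]]

print(beam_search(toy_step, bos=1, eos=0, beam_size=2))
```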