Sequence to Sequence Learning with Neural Networks

tags: study paper DSMI lab

paper: Sequence to Sequence Learning with Neural Networks

Abstract and Introduction

  • DNNs cannot be used to map sequences to sequences, because they only work when the dimensionality of the input and output is fixed and known
  • Task: English to French translation task from the WMT’14 dataset
  • Advantages of the proposed method:
    • Minimal assumptions on the sequence structure: it can handle any kind of sequence
    • Sensitive to word order
    • Does well on long sentences

The model

  • Goal: given an input sentence $(x_1,x_2,…,x_T)$ and its corresponding output sentence $(y_1, y_2, …,y_{T'})$ (where $T$ need not equal $T'$), the goal is to estimate $p(y_1, y_2, …,y_{T'}|x_1,x_2,…,x_T)$
  • $p(y_1, y_2, …,y_{T'}|x_1,x_2,…,x_T)=\prod_{t=1}^{T'}p(y_{t}|v,y_1,…,y_{t-1})$, where $v$ is the fixed-dimensional representation of the input sequence produced by the encoder LSTM, and each distribution $p(y_{t}|v,y_1,…,y_{t-1})$ is represented with a softmax over all the words in the vocabulary
  • Feeding the input sequence in reversed order gives better results
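
A minimal PyTorch sketch of this encoder-decoder factorization (a sketch only: the class and variable names are illustrative and the hyperparameter defaults mirror the sizes reported later in the training details, not the authors' actual code). The encoder LSTM compresses the source sentence into its final state, which plays the role of $v$; the decoder LSTM starts from that state and emits a softmax over the target vocabulary at every step.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    # Illustrative sizes: the paper uses 4 layers of 1000 cells
    # and 1000-dimensional word embeddings.
    def __init__(self, src_vocab, tgt_vocab, emb=1000, hidden=1000, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, layers)
        self.decoder = nn.LSTM(emb, hidden, layers)
        self.out = nn.Linear(hidden, tgt_vocab)  # logits -> softmax over vocabulary

    def forward(self, src, tgt):
        # src: (T, batch), tgt: (T', batch); T need not equal T'.
        _, state = self.encoder(self.src_emb(src))   # final state acts as v
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)                     # scores for p(y_t | v, y_1..y_{t-1})
```

During training, `tgt` is the gold translation shifted right (teacher forcing), so each output position models the distribution $p(y_{t}|v,y_1,…,y_{t-1})$ from the factorization above.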

Experiment

  • Dataset: WMT’14 English to French dataset

    • 12M sentences
    • Vocabulary: 160,000 most frequently used English words and 80,000 most frequently used French words
    • Words that do not appear in the vocabulary are replaced with the special token "UNK"
    • Evaluation: the model is also used to rescore the 1000-best lists generated by the baseline SMT system
  • Objective: training maximizes the log probability of a correct translation $T$ given the source sentence $S$; at test time, the most likely translation is produced: $\hat{T}=\mathop{argmax}\limits_{T}p(T|S)$

  • Decoding uses a left-to-right beam search (see the sketch in the supplementary notes below)

  • Reverse the source sentence:

    • Improves the performance, but they don't have a complete explanation XDD
    • When the input sentence is reversed, its first few words end up much closer to the first few words of the output sentence; this helps the model generate the beginning of the output more accurately, which in turn makes the later words more accurate as well (in the spirit of "a good start is half the battle"?)
  • Training details

    • 1000-dimensional word embeddings (though the paper does not say how the word embeddings are obtained)
    • LSTM: 4 layers, 1000 cells in each layer
    • Parameters initialized from a uniform distribution between -0.08 and 0.08
    • Parallelized training: 8 GPUs in total, about 10 days of training (a training-step sketch follows the model analysis below)
  • Experimental Results
    The best results are obtained by ensembling LSTMs trained from different random initializations

  • Model analysis:

    • The model can tell apart sentences that use the same words in different orders, and recognizes sentences with the same meaning expressed in different words
    • Performance remains good on long sentences (left figure: the x-axis is sentence length)
    • Performance also holds up when a sentence contains many rare words (right figure: the x-axis is the average rank of the sentence's words in the frequency-sorted vocabulary)
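
Tying the training details together, here is a minimal sketch of one training step (a sketch under assumptions: it reuses the illustrative Seq2Seq class from the model sketch above; the learning rate of 0.7 and the gradient-norm clipping threshold of 5 are the values reported in the paper, while the data pipeline and batching are omitted):

```python
import torch
import torch.nn as nn

model = Seq2Seq(src_vocab=160000, tgt_vocab=80000)

# Initialize all parameters uniformly in [-0.08, 0.08], as in the paper.
for p in model.parameters():
    nn.init.uniform_(p, -0.08, 0.08)

optimizer = torch.optim.SGD(model.parameters(), lr=0.7)  # plain SGD, per the paper
criterion = nn.CrossEntropyLoss()

def train_step(src, tgt_in, tgt_out):
    # tgt_in is the gold target shifted right; tgt_out is the gold target.
    src = torch.flip(src, dims=[0])          # the source-reversal trick
    logits = model(src, tgt_in)              # (T', batch, tgt_vocab)
    loss = criterion(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), 5.0)  # clip the gradient norm
    optimizer.step()
    return loss.item()  # minimizing this loss maximizes log p(T|S)
```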

Supplementary notes

  • SMT system: Statistical Machine Translation
    • Builds a statistical translation model by statistical analysis of large parallel corpora
    • Does not rely on hand-written grammar rules, so it is easy to extend to translation between other language pairs
    • Word-based translation: translates word by word
    • Phrase-based translation: groups words into phrases where appropriate and translates phrase by phrase
    • Syntax-based translation: uses syntactic analysis (e.g., parse trees) as the basis for translation
    • Hierarchical phrase-based translation: a combination of phrase-based and syntax-based
  • Beam search: a minimal sketch of the algorithm follows below
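
A minimal, model-agnostic sketch of left-to-right beam search (`step_fn` is a hypothetical stand-in for one step of the decoder: given a prefix, it returns candidate next tokens with their log probabilities; the paper reports that even a beam size of 2 already captures most of the benefit):

```python
def beam_search(step_fn, start_token, eos_token, beam_size=2, max_len=50):
    """Keep the beam_size most probable partial hypotheses at each step."""
    beams = [([start_token], 0.0)]        # (prefix, cumulative log probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in step_fn(prefix):
                candidates.append((prefix + [token], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos_token:
                finished.append((prefix, score))   # hypothesis is complete
            else:
                beams.append((prefix, score))      # still being extended
        if not beams:                              # every hypothesis has ended
            break
    return max(finished + beams, key=lambda c: c[1])
```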
