Effective Approaches to Attention-based Neural Machine Translation

tags: study paper DSMI lab

paper: Effective Approaches to Attention-based Neural Machine Translation

Introduction

  • Neural Machine Translation (NMT) requires minimal domain knowledge and is conceptually simple
  • NMT generalizes well to very long word sequences and does not need to store huge phrase tables
  • The concept of “attention”: learn alignments between different modalities
    • image caption generation task: visual features of a picture vs. the text description
    • speech recognition task: speech frames vs. text
  • Proposed method: novel types of attention-based models
    • global approach
    • local approach

Neural Machine Translation

  • Goal: translate the source sentence $x_1, x_2, \dots, x_n$ to the target sentence $y_1, y_2, \dots, y_m$
  • A basic form of NMT consists of two components:
    • Encoder: compute the representation s for each sentence
    • Decoder: generates one target word at a time
      $p(y_j \mid y_{<j}, s) = \mathrm{softmax}(g(h_j))$, where $g$ is a transformation function that outputs a vocabulary-sized vector and $h_j$ is the RNN hidden state.
  • Training objective: $J_t = \sum_{(x,y) \in D} -\log p(y \mid x)$, where $D$ is the parallel training corpus.
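As a sanity check, the decoder's per-step prediction and one term of the objective can be sketched in NumPy (the sizes, random weights, and gold-word index here are illustrative assumptions, not values from the paper):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical sizes: hidden dim 4, vocabulary size 10.
rng = np.random.default_rng(0)
W_out = rng.normal(size=(10, 4))   # g: projects h_j to vocab-sized logits
h_j = rng.normal(size=4)           # decoder RNN hidden state at step j

p = softmax(W_out @ h_j)           # p(y_j | y_<j, s): distribution over the vocabulary
y_j = 3                            # index of the gold target word (made up)
nll = -np.log(p[y_j])              # one term of the training objective J_t
```

Summing this negative log-likelihood over every target word of every sentence pair in D gives the full objective.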

Attention-based Model

  1. Global Attention

    Differences compared with Bahdanau's model:
  • Bahdanau uses a bidirectional encoder
  • Bahdanau uses deep-output and maxout layers
  • Bahdanau uses a different alignment function (the paper finds its simpler alternatives work better): $e_{ij} = v^\top \tanh(W_a h_i + U_a \bar{h}_j)$
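The core of global attention — score every source hidden state against the current decoder state, softmax into alignment weights, and take a weighted average — can be sketched as follows (using the paper's simple "dot" score; the dimensions and random states are toy assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
H_s = rng.normal(size=(5, 4))   # 5 encoder (source) hidden states, dim 4
h_t = rng.normal(size=4)        # decoder hidden state at step t

scores = H_s @ h_t              # "dot" alignment score against every source state
a_t = softmax(scores)           # alignment weights over all source positions
c_t = a_t @ H_s                 # context vector: attention-weighted average of H_s
```

In the paper, $c_t$ is then combined with $h_t$ to produce the attentional hidden state used for prediction.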
  2. Local Attention
  • Global attention is computationally costly when the source sentence is long, since it attends over every source position; local attention restricts attention to a small window.
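A rough sketch of the local-p variant, which scores only a window of size $2D+1$ around a predicted position $p_t$ and weights the softmax by a Gaussian centered at $p_t$ (in the paper $p_t$ is predicted by a small network; here it is hard-coded, and all dimensions are toy assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(2)
S, dim, D = 20, 4, 3            # source length, hidden dim, window half-width
H_s = rng.normal(size=(S, dim))
h_t = rng.normal(size=dim)

p_t = 8.4                       # predicted aligned position (assumed fixed here)
lo = max(0, int(p_t) - D)
hi = min(S, int(p_t) + D + 1)
window = H_s[lo:hi]             # only 2D+1 source states, not all S

scores = window @ h_t
gauss = np.exp(-((np.arange(lo, hi) - p_t) ** 2) / (2 * (D / 2) ** 2))
a_t = softmax(scores) * gauss   # softmax weights favoring positions near p_t
c_t = a_t @ window              # context built from the window alone
```

The cost per decoding step is thus independent of the source length S.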

  3. Input-feeding approach
  • makes the model fully aware of previous alignment choices
  • creates a very deep network spanning both horizontally and vertically
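Input feeding amounts to concatenating the previous step's attentional vector with the current target embedding before it enters the decoder RNN; a minimal sketch with assumed toy dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
emb_dim, hid = 4, 4
x_t = rng.normal(size=emb_dim)        # embedding of the current target word
h_tilde_prev = rng.normal(size=hid)   # attentional vector from the previous step

# Input feeding: the RNN input is the concatenation, so the next hidden
# state "sees" which source words were attended to at the previous step.
rnn_input = np.concatenate([x_t, h_tilde_prev])
```

This is the "horizontal" depth mentioned above: each step's attention output feeds into the next step's computation.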

Experiments

  • Training Data: WMT’14

    • 4.5M sentence pairs.
    • 116M English words, 110M German words
    • vocabularies: top 50K most frequent words for both languages
  • Model:

    • stacking LSTM models with 4 layers
    • each layer has 1000 cells
    • 1000 dimensional embeddings
  • Results:

    • English-German results

    • German-English results

Analysis


  • Sample Translations
    Problems with the baseline model:
    • person names are translated incorrectly
    • double negatives are translated incorrectly

Reference
