Effective Approaches to Attention-based Neural Machine Translation

tags: study paper DSMI lab

paper: Effective Approaches to Attention-based Neural Machine Translation

Introduction

  • Neural Machine Translation (NMT) requires minimal domain knowledge and is conceptually simple
  • NMT generalizes well to very long word sequences => it does not need to store gigantic phrase tables as in standard phrase-based MT
  • The concept of “attention”: learn alignments between different modalities
    • image caption generation task: visual features of a picture vs. the text description
    • speech recognition task: speech frames vs. text
  • Proposed method: novel types of attention-based models
    • global approach
    • local approach

Neural Machine Translation

  • Goal: translate the source sentence $x_1, x_2,…,x_n$ to the target sentence $y_1, y_2,…,y_m$
  • A basic form of NMT consists of two components:
    • Encoder: computes a representation $s$ of the source sentence
    • Decoder: generates one target word at a time
      $p(y_j|y_{<j},s)=softmax(g(h_j))$, where $g$ is a transformation function that outputs a vocabulary-sized vector and $h_j$ is the RNN hidden state.
  • Training objective: $J_t=\sum_{(x,y)\in D}-\log p(y|x)$, where $D$ is the parallel training corpus (a minimal code sketch of the decoder step and this objective follows below).
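For concreteness, here is a minimal PyTorch sketch of this decoder step and the training objective; the dimensions and the names `g`, `decoder_step`, and `training_objective` are illustrative, not from the paper:

```python
import torch.nn as nn
import torch.nn.functional as F

hidden_size, vocab_size = 1000, 50000
g = nn.Linear(hidden_size, vocab_size)  # g: hidden state -> vocabulary-sized vector

def decoder_step(h_j):
    """p(y_j | y_<j, s) = softmax(g(h_j)) for one target position, in log-space."""
    return F.log_softmax(g(h_j), dim=-1)

def training_objective(h, y):
    """J_t = -log p(y|x): negative log-likelihood of the target words."""
    log_probs = decoder_step(h)  # (batch, tgt_len, vocab)
    return F.nll_loss(log_probs.reshape(-1, vocab_size), y.reshape(-1), reduction='sum')
```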

Attention-based Models

  1. Global Attention

    Differences compared with Bahdanau et al. (2015):
  • Bahdanau uses a bidirectional encoder
  • Bahdanau uses deep-output and max-out layers
  • Bahdanau uses a different alignment function (the simpler alignment functions proposed here work better): $e_{ij}=v^T tanh(W_a h_i+U_a \hat{h}_j)$
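As a concrete illustration, below is a hedged PyTorch sketch of one global-attention step using the paper's dot score $h_t^\top \bar{h}_s$; the attentional state is $\tilde{h}_t = tanh(W_c[c_t; h_t])$ as in the paper, but the shapes and names in the code are assumptions:

```python
import torch
import torch.nn.functional as F

def global_attention(h_t, enc, W_c):
    """h_t: (batch, hidden) decoder state; enc: (batch, src_len, hidden) encoder states."""
    # dot score: score(h_t, h_bar_s) = h_t . h_bar_s for every source position s
    scores = torch.bmm(enc, h_t.unsqueeze(2)).squeeze(2)  # (batch, src_len)
    a_t = F.softmax(scores, dim=1)                        # alignment weights
    c_t = torch.bmm(a_t.unsqueeze(1), enc).squeeze(1)     # context vector (batch, hidden)
    # attentional hidden state: h_tilde = tanh(W_c [c_t; h_t])
    h_tilde = torch.tanh(W_c(torch.cat([c_t, h_t], dim=1)))
    return h_tilde, a_t
```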
  2. Local Attention
  • Global attention is computationally costly when the source sentence is long.
  • Local attention instead attends, for each target word, only to a small window of source positions $[p_t-D, p_t+D]$ around a predicted aligned position $p_t$ (see the sketch below).
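Here is a sketch of the local-p (predictive alignment) variant, where $p_t = S \cdot sigmoid(v_p^\top tanh(W_p h_t))$ and the alignment weights are reweighted by a Gaussian centered at $p_t$ with $\sigma = D/2$; for simplicity this sketch scores all source positions instead of truncating to the window as the paper does:

```python
import torch
import torch.nn.functional as F

def local_p_attention(h_t, enc, W_p, v_p, D=10):
    """h_t: (batch, hidden); enc: (batch, src_len, hidden); D: window half-width."""
    S = enc.size(1)
    # predicted aligned position: p_t = S * sigmoid(v_p^T tanh(W_p h_t)), shape (batch,)
    p_t = S * torch.sigmoid(torch.tanh(W_p(h_t)) @ v_p)
    scores = torch.bmm(enc, h_t.unsqueeze(2)).squeeze(2)  # dot score, (batch, src_len)
    a_t = F.softmax(scores, dim=1)
    # favor source positions near p_t with a Gaussian of sigma = D / 2
    pos = torch.arange(S, dtype=h_t.dtype, device=h_t.device).unsqueeze(0)
    a_t = a_t * torch.exp(-(pos - p_t.unsqueeze(1)) ** 2 / (2 * (D / 2) ** 2))
    c_t = torch.bmm(a_t.unsqueeze(1), enc).squeeze(1)     # context vector
    return c_t, a_t
```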

  3. Input-feeding approach
  • feed the attentional vector $\tilde{h}_t$ as an additional input at the next time step, making the model fully aware of previous alignment choices (see the sketch below)
  • this creates a very deep network spanning both horizontally (across time steps) and vertically (across layers).
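A minimal sketch of input-feeding, where the previous attentional vector $\tilde{h}_{t-1}$ is concatenated with the current target embedding before the decoder LSTM step (names and sizes are illustrative):

```python
import torch
import torch.nn as nn

embed_size, hidden_size = 1000, 1000
cell = nn.LSTMCell(embed_size + hidden_size, hidden_size)  # input widened by h_tilde

def decode_step(y_emb, h_tilde_prev, state):
    """One decoder step; `state` is the LSTM cell's (h, c) pair."""
    # concatenating h_tilde_{t-1} feeds previous alignment choices back into the model
    lstm_in = torch.cat([y_emb, h_tilde_prev], dim=1)
    h, c = cell(lstm_in, state)
    return h, (h, c)
```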

Experiments

  • Training Data: WMT’14

    • 4.5M sentence pairs.
    • 116M English words, 110M German words
    • vocabularies: top 50K most frequent words for both languages.
  • Model:

    • stacked LSTM model with 4 layers (see the sketch after this list)
    • each layer has 1000 cells
    • 1000-dimensional embeddings
  • Results:

    • English-German results

    • German-English results
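A sketch of the model configuration listed above in PyTorch (50K vocabulary, 1000-dimensional embeddings, 4 stacked LSTM layers of 1000 cells each); this is only the module setup, not the full training pipeline:

```python
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=50000, embedding_dim=1000)
encoder = nn.LSTM(input_size=1000, hidden_size=1000, num_layers=4, batch_first=True)
```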

Analysis


  • Sample Translations
    Problems with the baseline model:
    • mistranslating person names
    • mistranslating double negations

Reference

  • Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. Effective Approaches to Attention-based Neural Machine Translation. EMNLP 2015.
