Enriching Word Vectors with Subword Information
Background:
Traditionally, word embeddings represent each vocabulary word with its own distinct vector. This ignores the internal structure of words (e.g., some languages are morphologically rich; Spanish, for instance, has up to around 40 inflected forms per verb).
The main idea of this paper is to take this internal structure into account when training the embeddings.
Subword Model
Represent each word $w$ as a bag of character $n$-grams:
- Add special boundary symbols < and > at the beginning and end of each word, which makes it possible to distinguish prefixes and suffixes from other character sequences.
- Include the word $w$ itself in its set of $n$-grams.
Example ($n = 3$):
word `where` → character $n$-grams `<wh`, `whe`, `her`, `ere`, `re>`, plus the special sequence `<where>`
- Represent a word by the sum of the vector representations of its $n$-grams (see the sketch below).
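A minimal sketch of this representation, assuming `ngram_vectors` is a plain dict mapping character $n$-grams to NumPy vectors (an illustrative simplification; the paper actually hashes $n$-grams into a fixed number of buckets):

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """All character n-grams of `word` with boundary symbols, plus the word itself."""
    marked = "<" + word + ">"
    grams = [marked[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(marked) - n + 1)]
    grams.append(marked)  # the special sequence for the whole word
    return grams

def word_vector(word, ngram_vectors, dim=300):
    """Represent a word as the sum of the vectors of its character n-grams."""
    vec = np.zeros(dim)
    for g in char_ngrams(word):
        vec += ngram_vectors.get(g, np.zeros(dim))
    return vec

print(char_ngrams("where", n_min=3, n_max=3))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```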
Experiment
Human similarity judgement
word similarity
Models and baselines:
- Baselines: skipgram (sg), cbow
- Proposed model: for words that appear in the evaluation dataset but not in the training data (i.e., out-of-vocabulary words), two variants are used:
| model name | description |
|---|---|
| sisg- | use null vectors for these words |
| sisg | take the sum of the word's n-gram vectors |
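A small sketch of how the two variants build a vector for an out-of-vocabulary word, reusing the `word_vector` helper from the sketch above (`word_vectors` and `ngram_vectors` are illustrative names, not the paper's code):

```python
import numpy as np

def query_vector(word, word_vectors, ngram_vectors, mode="sisg", dim=300):
    """Vector lookup that differs only for words unseen during training."""
    if word in word_vectors:
        return word_vectors[word]                     # in-vocabulary: use trained vector
    if mode == "sisg-":
        return np.zeros(dim)                          # sisg-: null vector for OOV words
    return word_vector(word, ngram_vectors, dim=dim)  # sisg: sum of its n-gram vectors
```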
Evaluation:
Compute the Spearman rank correlation between human judgements and the cosine similarity between the vector representations.
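A sketch of this evaluation, assuming `pairs` is a list of (word1, word2, human_score) tuples and `vec` maps a word to its vector (both names are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def word_similarity_eval(pairs, vec):
    """Spearman rank correlation between human scores and model cosine similarities."""
    human = [score for _, _, score in pairs]
    model = [cosine(vec(w1), vec(w2)) for w1, w2, _ in pairs]
    return spearmanr(human, model).correlation
```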
Result:
In the results table, the first column is the language and the second is the dataset.
The method is also compared with several other models:
Note that the first four models and the last two models were trained on different training data.
Word analogy tasks
$A$ is to $B$ as $C$ is to $D$
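Analogy questions are typically answered with vector arithmetic: pick the word $D$ whose vector is most cosine-similar to $v_B - v_A + v_C$. A sketch, assuming `vectors` is a dict of unit-normalised NumPy word vectors (an illustrative name, not the paper's code):

```python
import numpy as np

def analogy(a, b, c, vectors):
    """Return D such that A is to B as C is to D, excluding the query words themselves."""
    target = vectors[b] - vectors[a] + vectors[c]
    target /= np.linalg.norm(target)
    best_word, best_sim = None, float("-inf")
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue
        sim = float(np.dot(target, vec))  # cosine similarity for unit-length vectors
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word
```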
Result : Accuracy of word analogy tasks
In the results table, the first column is the language and the second is the dataset.
There is a large improvement on syntactic tasks.
There is almost no improvement on semantic questions.
Language Model
Model:
An LSTM architecture is used for training; see Section 5.6 of the paper for the LSTM details.
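A minimal sketch of such a language model in PyTorch (my choice of framework, not necessarily the paper's), where the embedding layer can optionally be initialised from pre-trained vectors such as sg or sisg; all hyper-parameters are illustrative:

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=650, pretrained=None):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        if pretrained is not None:  # e.g. sg or sisg pre-trained word vectors
            self.embed.weight.data.copy_(torch.as_tensor(pretrained, dtype=torch.float))
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        emb = self.embed(tokens)               # (batch, seq_len, emb_dim)
        output, state = self.lstm(emb, state)
        return self.decoder(output), state     # logits over the vocabulary
```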
Result: Test perplexity on the language modeling task
Several settings are compared when computing the test perplexity, listed below:
| model name | description |
|---|---|
| LSTM | model without pre-trained word vectors |
| sg | model with word vectors pre-trained without subword information |
| sisg | model with the paper's subword-based vectors |
Effect of the size of the training data
Effect of the size of n-grams
Nearest neighbors
Character n-grams and morphemes
The goal is to check whether the most important $n$-grams in a word correspond to morphemes (see the sketch below).
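One way to sketch this check (an approximation, not necessarily the paper's exact procedure): rank each $n$-gram by how much the word's representation changes when that $n$-gram is left out of the sum, reusing `char_ngrams` and the assumed `ngram_vectors` dict from the earlier sketch:

```python
import numpy as np

def ngram_importance(word, ngram_vectors, dim=300):
    """Rank a word's character n-grams by the cosine drop when each one is removed."""
    grams = [g for g in char_ngrams(word) if g in ngram_vectors]
    full = sum((ngram_vectors[g] for g in grams), np.zeros(dim))
    ranked = []
    for g in grams:
        reduced = full - ngram_vectors[g]
        cos = np.dot(full, reduced) / (np.linalg.norm(full) * np.linalg.norm(reduced) + 1e-9)
        ranked.append((g, 1.0 - cos))  # larger drop in similarity = more important n-gram
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```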