Attention is not not explanation

2020-06-24 nlp study group PV:

tags: `study` `paper` `DSMI lab`

(不太喜歡這篇的表達方式，很多雙重否定@@，而且結構有點亂)

Introduction

這篇是基於”Attention is not explanation”(以下簡稱 ==J&W==)這篇論文的探討。目的不是在說attention有解釋性，而是在說”attention 不一定沒有解釋性”。文中有點出幾個在”Attention is not explanation”的實驗和論述裡面，潛在的文問題。並且提出一些比較好的分析方式(雖然我沒有完全被說服)。

Main Claim

Attention Distribution is not a Primitive

The base attention weights are not assigned arbitrarily by the model, but rather computed by an integral component whose parameters were trained alongside the rest of the layers; the way they work depends on each other.

Attention是有參與模型訓練的，並不是獨立於模型存在。J&W的把attention random permute 的實驗不是那麼適當。在製造adversary的時候應該要retrain model.

Existence does not Entail Exclusivity

We hold that attention scores are used as providing an explanation; not the explanation.

找到另一個可解釋的方式不代表本來的方式沒有解釋性。解釋性不具有唯一性。尤其對於binary classification task，input是很多個字(維度很大)，output是0~1，在降維的過程中容易有比較大的彈性。

Defining Explanation

J&W 沒有清楚的定義甚麼是explanation，其實在過去的文獻對於explanation有不同定義。提到AI的可解釋性，常會出現下列三個名詞。

transparency: 找到model中可以令人理解的部分。

Attention mechanisms do provide a look into the inner workings of a model, as they produce an easily-understandable weighting of hidden states.
explainability
- 可以提供決策
- 可以模擬人類從過去所發生的事情中進行推斷的能力
interpretability
- relationship between input and output (類似回歸裡面的regressor)
- 需要專家幫忙鑑定

Experiments

Dataset

跟J&W做一樣的實驗，但只有做 binary classification，沒有做QA。

1. Diabetes: whether a patient is diagnosed with diabetes from their ICU discharge summary
Anemia: hether the patient is diagnosed with acute (neg.) or chronic (pos.) anemia
IMDb: positive or negative sentiment from movie reviews
SST:positive or negative sentiment from sentences
AgNews:the topic of news articles as either world (neg.) or business (pos.)
20News:the topic of news articles as either baseball (neg.) or hockey (pos.)

Model

same as J&W

single-layer bidirectional LSTM with tanh activation
attention layer
softmax prediction
hyperparameters are set to be same as J&W

Uniform as the Adversary

Attention is not explanation if you don’t need it

每個字給相同的權重在某些task上面其實跟attention一樣好，那在這些task上面，attention的確沒什麼用。

Variance within a Model

(其實我不太懂這個實驗到底要幹嘛)

Test whether the variances observed by J&W between trained attention scores and adversarially-obtained ones are unusual.
Train 8 models with different initial seeds
Plot the distribution of JSD(Jensen-Shannon Divergence) attention weights by the models
IMDB, SST, Anemia are robust to with seed changes.
(e): J&W 所產生出的advesary attentions 確實和本來的model很不一樣
(d):negative-label instances in Diabetes dataset subject to relatively arbitrary distributions from the different random seeds.因此對於分布差很多的attention weights，效果可能還是不錯
(f): 所以J&W 所產生出的advesary attentions 不是足夠adversarial

Training an Adversary

所以要怎麼樣才能建立合理的Adversary呢?上面有提到attention weight 不是獨立於model而存在的，所以每換一組attention weight都應該train a new model

Given a base model $M_b$, train a model $M_a$ which
- can provide similar prediction scores
- its distribution of attention weights should be very different from $M_b$
Loss function: $L(M_a, M_b)=\sum^N_{i=1}TVD(\hat{y}_a^i,\hat{y}_b^i)-\lambda KL(\alpha_a^i||\alpha_b^i)$
$TVD(\hat{y}_a^i,\hat{y}_b^i)=\frac{1}{2}|\hat{y}_a^i-\hat{y}_b^i|$
以下的圖:
- 曲線越convex 表示attention weight 越可以被操控。
- x軸是某個model和$M_b$的attention JSD
圖例:
- 三角形: 固定$\lambda$ (不同dataet自己最好的$\lambda$)，使用不同random seeds 所train 出來的model
- 正方形: uniform weight model
- 加號: J&W’s adversary model
- 點點: 不同$\lambda$所train 出來的model

Diagnosing attention distribution by guiding simpler models

使用RNN系列的 model，會有前後字的影響，其實很難排除前後字是不是會影響解釋性，所以這篇使用MLP(non-contextual model，不能看左右鄰居)來診斷if attention weights provide better guides.

選擇一組pretrained attention weight (有對比MLP自己學)
train MLP
實驗結果有發現用本來的model還是最好的(so attention provides better guide)

Conclusion

首先要定義好explanation是甚麼
J&W 的實驗有不少漏洞，我們提供了比要合理的實驗方式
從MLP的實驗可以看到attention is somehow meaningful
Future work: 擴展實驗到QA tasks、不是英文的語言、請專家鑑定

tags: study paper DSMI lab