Long Range Arena: A Benchmark For Efficient Transformers
Conference: ICLR
Year: 2021
link: https://openreview.net/pdf?id=qVyeW-grC2k
This is a new benchmark for evaluating the quality and efficiency of efficient Transformer ("xformer") variants
Data scope tested:
Lengths: 1K to 16K tokens (K = thousand)
Modalities: text, natural and synthetic images, and mathematical expressions requiring similarity, structural, and visual-spatial reasoning
Criteria for selecting the benchmark tasks
- Generality: All efficient Transformers models should be applicable to our tasks. For instance, given that not all xformer models are able to perform autoregressive decoding (Wang et al., 2020), we include tasks that only require encoding.
- Simplicity: The tasks should have a simple setup. All factors that make comparisons difficult should be removed. This encourages simple models instead of cumbersome pipelined approaches. For instance, we avoid including any particular data augmentation and consider pretraining to be out of scope of this benchmark.
- Challenging: The tasks should be difficult enough for current models to ensure there is room for improvement to encourage future research in this direction.
- Long inputs: The input sequence lengths should be reasonably long since assessing how different models capture long-range dependencies is a core focus of LRA.
- Probing diverse aspects: The set of tasks should assess different capabilities of models like their ability to model relations and hierarchical/spatial structures, generalization capability, etc.
- Non-resource intensive and accessible: The benchmark should be deliberately designed to be lightweight so that it is accessible to researchers without industry-grade computing resources.
Task
Long ListOps
Test: the ability to reason over hierarchically structured data in a long-context setting
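A minimal sketch of what a ListOps-style input looks like. The operator names (MAX, MIN, MED, SM = sum modulo 10) come from the ListOps dataset, but the tokenization and evaluator below are simplified illustrations, not the benchmark's exact format:

```python
def eval_listops(tokens):
    """Recursively evaluate a tokenized ListOps-style expression."""
    ops = {
        "[MAX": max,
        "[MIN": min,
        "[MED": lambda xs: sorted(xs)[len(xs) // 2],  # median
        "[SM": lambda xs: sum(xs) % 10,               # sum modulo 10
    }

    def parse(pos):
        tok = tokens[pos]
        if tok in ops:
            args, pos = [], pos + 1
            while tokens[pos] != "]":
                val, pos = parse(pos)
                args.append(val)
            return ops[tok](args), pos + 1  # skip the closing "]"
        return int(tok), pos + 1            # single-digit operand

    value, _ = parse(0)
    return value

# max(4, min(2, 7), 0) = 4
expr = "[MAX 4 [MIN 2 7 ] 0 ]".split()
print(eval_listops(expr))  # -> 4
```

The model never sees the tree explicitly: it receives only the flat token sequence (up to 2K tokens in LRA) and must recover the nesting on its own.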
Byte-Level Text Classification
Test: read real-world data and long documents
Text classification in particular is associated with many real-world applications
Byte-Level Document Retrieval
Test: the ability to encode long documents into compressed representations that can be stored and used for similarity-based matching
Image Classification On Sequences Of Pixels
The image is flattened into a sequence of pixels, so the model must capture the 2D spatial structure from a 1D sequence (analogous to how the previous tasks require capturing hierarchical structure).
Extra modules such as a CNN stem are not allowed.
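A small sketch of how an image becomes a token sequence for this task. LRA maps 32 × 32 CIFAR-10 images to a single grayscale channel, so each 8-bit pixel intensity acts as one discrete token; the random array below is just a stand-in image:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a 32x32 grayscale CIFAR-10 image (8-bit intensities).
image = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)

# Row-major flattening gives a 1D sequence of length 32 * 32 = 1024.
tokens = image.flatten()
print(tokens.shape)  # -> (1024,)

# The token vocabulary is at most the 256 possible pixel values.
print(int(tokens.max()) < 256)  # -> True
```

The Transformer then treats these 1,024 pixel tokens exactly like text tokens, with no convolutional preprocessing.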
Pathfinder (Long-Range Spatial Dependency)
Test: learning long-range spatial dependencies. (also as sequences of pixels)
The model must make a binary decision: are two points, represented as circles, connected by a path consisting of dashes?
Pathfinder-X
The image resolution increases from 32 × 32 to 128 × 128, growing the sequence length from 1,024 to 16,384 tokens.
Tests whether the same algorithmic challenge becomes harder to a different extent when sequence lengths are much longer.