Long Range Arena: A Benchmark For Efficient Transformers
Conference: ICLR
Year: 2021
link: https://openreview.net/pdf?id=qVyeW-grC2k
This is a new benchmark for evaluating the quality and efficiency of efficient Transformer ("xformer") variants
Data scope tested:
Lengths: 1K to 16K tokens (K = thousand)
Modalities: text, natural and synthetic images, and mathematical expressions requiring similarity, structural, and visual-spatial reasoning
Criteria for selecting the benchmark tasks
- Generality: All efficient Transformers models should be applicable to our tasks. For instance, given that not all xformer models are able to perform autoregressive decoding (Wang et al., 2020), we include tasks that only require encoding.
- Simplicity: The tasks should have a simple setup. All factors that make comparisons difficult should be removed. This encourages simple models instead of cumbersome pipelined approaches. For instance, we avoid including any particular data augmentation and consider pretraining to be out of scope of this benchmark.
- Challenging: The tasks should be difficult enough for current models to ensure there is room for improvement to encourage future research in this direction.
- Long inputs: The input sequence lengths should be reasonably long since assessing how different models capture long-range dependencies is a core focus of LRA.
- Probing diverse aspects: The set of tasks should assess different capabilities of models like their ability to model relations and hierarchical/spatial structures, generalization capability, etc.
- Non-resource intensive and accessible: The benchmark should be deliberately designed to be lightweight so that it is accessible to researchers without industry-grade computing resources.
Task
Long ListOps
Test: the ability to reason over hierarchically structured data in a long-context setting
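A minimal sketch of what a ListOps-style input looks like. The operator names (MAX, MIN, MED, SM = sum modulo 10) come from the ListOps dataset, but the tokenization and evaluator below are simplified illustrations, not the benchmark's exact format:

```python
def eval_listops(tokens):
    """Recursively evaluate a tokenized ListOps-style expression."""
    ops = {
        "[MAX": max,
        "[MIN": min,
        "[MED": lambda xs: sorted(xs)[len(xs) // 2],  # median
        "[SM": lambda xs: sum(xs) % 10,               # sum modulo 10
    }

    def parse(pos):
        tok = tokens[pos]
        if tok in ops:
            args, pos = [], pos + 1
            while tokens[pos] != "]":
                val, pos = parse(pos)
                args.append(val)
            return ops[tok](args), pos + 1  # skip the closing "]"
        return int(tok), pos + 1            # single-digit operand

    value, _ = parse(0)
    return value

# max(4, min(2, 7), 0) = 4
expr = "[MAX 4 [MIN 2 7 ] 0 ]".split()
print(eval_listops(expr))  # -> 4
```

The model never sees the tree explicitly: it receives only the flat token sequence (up to 2K tokens in LRA) and must recover the nesting on its own.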
Byte-Level Text Classification
Test: read real-world data and long documents
Text classification in particular is associated with many real-world applications
Byte-Level Document Retrieval
Test: the ability to encode long documents into compressed representations that can be stored and used for similarity-based matching
Image Classification On Sequences Of Pixels
The image is flattened into a sequence of pixels, so the model must capture the 2D spatial structure from a 1D sequence (analogous to how the previous tasks require capturing hierarchical structure).
Extra modules such as a CNN stem are not allowed.
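A small sketch of how an image becomes a token sequence for this task. LRA maps 32 × 32 CIFAR-10 images to a single grayscale channel, so each 8-bit pixel intensity acts as one discrete token; the random array below is just a stand-in image:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a 32x32 grayscale CIFAR-10 image (8-bit intensities).
image = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)

# Row-major flattening gives a 1D sequence of length 32 * 32 = 1024.
tokens = image.flatten()
print(tokens.shape)  # -> (1024,)

# The token vocabulary is at most the 256 possible pixel values.
print(int(tokens.max()) < 256)  # -> True
```

The Transformer then treats these 1,024 pixel tokens exactly like text tokens, with no convolutional preprocessing.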
Pathfinder (Long-Range Spatial Dependency)
Test: learning long-range spatial dependencies. (also as sequences of pixels)
The model must make a binary decision: are two points, represented as circles, connected by a path consisting of dashes?
Pathfinder-X
The image resolution increases from 32 × 32 to 128 × 128, growing the sequence length from 1,024 to 16,384 tokens.
Tests whether the same algorithmic challenge becomes harder to a different extent when sequence lengths are much longer.