SOTAVerified

Long-range modeling

A new task for testing the long-sequence modeling capabilities and efficiency of language models.

Image credit: SCROLLS: Standardized CompaRison Over Long Language Sequences

Papers

Showing 51–95 of 95 papers

| Title | Status | Hype |
| --- | --- | --- |
| Sparse Modular Activation for Efficient Sequence Modeling | Code | 1 |
| The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks | Code | 1 |
| Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation | Code | 1 |
| Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator | Code | 1 |
| Focus Your Attention (with Adaptive IIR Filters) | — | 0 |
| T-former: An Efficient Transformer for Image Inpainting | Code | 1 |
| A General-Purpose Multilingual Document Encoder | Code | 0 |
| RFR-WWANet: Weighted Window Attention-Based Recovery Feature Resolution Network for Unsupervised Image Registration | Code | 0 |
| HST-MRF: Heterogeneous Swin Transformer with Multi-Receptive Field for Medical Image Segmentation | — | 0 |
| CoLT5: Faster Long-Range Transformers with Conditional Computation | — | 0 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | Code | 2 |
| Token Transformer: Can class token help window-based transformer build better long-range interactions? | — | 0 |
| What Makes Convolutional Models Great on Long Sequence Modeling? | Code | 1 |
| CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling | Code | 1 |
| Pose Guided Human Image Synthesis with Partially Decoupled GAN | — | 0 |
| Multi-scale Attention Network for Single Image Super-Resolution | Code | 1 |
| Liquid Structural State-Space Models | Code | 2 |
| Mega: Moving Average Equipped Gated Attention | Code | 2 |
| Adapting Pretrained Text-to-Text Models for Long Text Sequences | Code | 1 |
| CNSNet: A Cleanness-Navigated-Shadow Network for Shadow Removal | Code | 0 |
| Simplified State Space Layers for Sequence Modeling | Code | 2 |
| Investigating Efficiently Extending Transformers for Long Input Summarization | Code | 3 |
| U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration? | Code | 1 |
| Efficient Long-Text Understanding with Short-Text Models | Code | 1 |
| Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration | Code | 1 |
| How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections | Code | 0 |
| On the Parameterization and Initialization of Diagonal State Space Models | Code | 0 |
| 0/1 Deep Neural Networks via Block Coordinate Descent | — | 0 |
| ChordMixer: A Scalable Neural Attention Model for Sequences with Different Lengths | Code | 1 |
| UL2: Unifying Language Learning Paradigms | Code | 1 |
| Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention | Code | 1 |
| Diagonal State Spaces are as Effective as Structured State Spaces | Code | 0 |
| SCROLLS: Standardized CompaRison Over Long Language Sequences | Code | 1 |
| Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks | Code | 1 |
| LongT5: Efficient Text-To-Text Transformer for Long Sequences | Code | 1 |
| Efficiently Modeling Long Sequences with Structured State Spaces | Code | 1 |
| Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions | — | 0 |
| Long-Range Modeling of Source Code Files with eWASH: Extended Window Access by Syntax Hierarchy | — | 0 |
| Sparse Factorization of Large Square Matrices | Code | 0 |
| Image Super-Resolution With Non-Local Sparse Attention | Code | 1 |
| DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning | Code | 1 |
| Gated Relational Graph Attention Networks | — | 0 |
| Long Range Arena: A Benchmark for Efficient Transformers | Code | 1 |
| Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition | Code | 1 |
| V4D: 4D Convolutional Neural Networks for Video-level Representation Learning | Code | 1 |
Page 2 of 2

No leaderboard results yet.