Long-Context Understanding

Papers

Showing 1–10 of 81 papers

Title	Date	Tasks	Status	Hype
Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models	Jul 13, 2025	Attribute, Benchmarking	Code Available	0
Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?	Jun 20, 2025	Book summarization, Long-Context Understanding	Code Available	1
PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding	Jun 18, 2025	Long-Context Understanding	Unverified	0
DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration	Jun 6, 2025	Computational Efficiency, Language Modeling	Code Available	1
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training	Jun 5, 2025	Language Modeling, Language Modelling	Unverified	0
ATLAS: Learning to Optimally Memorize the Context at Test Time	May 29, 2025	Common Sense Reasoning, Language Modeling	Unverified	0
SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences	May 27, 2025	16k, Long-Context Understanding	Code Available	0
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression	May 26, 2025	Language Modeling, Language Modelling	Code Available	1
MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models	May 26, 2025	Data Compression, Long-Context Understanding	Code Available	1
Beyond Needle(s) in the Embodied Haystack: Environment, Architecture, and Training Considerations for Long Context Reasoning	May 22, 2025	Long-Context Understanding	Unverified	0


Datasets: MMNeedle, Ada-LEval (BestAnswer), Ada-LEval (TSort), L-Eval, LongBench

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4o	1 Image, 4*4 Stitching, Exact Accuracy	83	—	Unverified
2	GPT-4V	1 Image, 4*4 Stitching, Exact Accuracy	54.72	—	Unverified
3	Gemini Pro 1.5	1 Image, 4*4 Stitching, Exact Accuracy	39.85	—	Unverified
4	Gemini Pro 1.0	1 Image, 4*4 Stitching, Exact Accuracy	24.78	—	Unverified
5	LLaVA-Llama-3	1 Image, 4*4 Stitching, Exact Accuracy	17.5	—	Unverified
6	Claude 3 Opus	1 Image, 4*4 Stitching, Exact Accuracy	12.3	—	Unverified
7	IDEFICS2-8B	1 Image, 4*4 Stitching, Exact Accuracy	7.8	—	Unverified
8	InstructBLIP-Flan-T5-XXL	1 Image, 4*4 Stitching, Exact Accuracy	6.2	—	Unverified
9	CogVLM2-Llama-3	1 Image, 4*4 Stitching, Exact Accuracy	0.9	—	Unverified
10	mPLUG-Owl-v2	1 Image, 4*4 Stitching, Exact Accuracy	0.3	—	Unverified