Long-Context Understanding

Papers

Showing 26–50 of 81 papers

Title	Date	Tasks	Status	Hype
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning	May 11, 2023	1 Image, 2*2 Stitching; Diversity	Code Available	2
Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?	Jun 20, 2025	Book summarization; Long-Context Understanding	Code Available	1
DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration	Jun 6, 2025	Computational Efficiency; Language Modeling	Code Available	1
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression	May 26, 2025	Language Modeling; Language Modelling	Code Available	1
MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models	May 26, 2025	Data Compression; Long-Context Understanding	Code Available	1
LiveLongBench: Tackling Long-Context Understanding for Spoken Texts from Live Streams	Apr 24, 2025	Long-Context Understanding; Spoken Language Understanding	Code Available	1
LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement	Apr 22, 2025	Benchmarking; Language Modeling	Code Available	1
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning	Mar 14, 2025	Long-Context Understanding	Code Available	1
Self-Taught Agentic Long Context Understanding	Feb 21, 2025	Long-Context Understanding	Code Available	1
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models	Feb 11, 2025	Code Generation; Instruction Following	Code Available	1
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios	Dec 12, 2024	Logical Reasoning; Long-Context Understanding	Code Available	1
GATEAU: Selecting Influential Samples for Long Context Alignment	Oct 21, 2024	Instruction Following; Long-Context Understanding	Code Available	1
BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression	Oct 20, 2024	In-Context Learning; Long-Context Understanding	Code Available	1
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?	Oct 3, 2024	8k; Document Summarization	Code Available	1
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness	Jun 28, 2024	Long-Context Understanding	Code Available	1
From Text to Pixel: Advancing Long-Context Understanding in MLLMs	May 23, 2024	Language Modeling; Language Modelling	Code Available	1
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs	Apr 16, 2024	Long-Context Understanding; Token Reduction	Code Available	1
Gemini: A Family of Highly Capable Multimodal Models	Dec 19, 2023	1 Image, 2*2 Stitching; Arithmetic Reasoning	Code Available	1
Marathon: A Race Through the Realm of Long Context with Large Language Models	Dec 15, 2023	Long-Context Understanding; Multiple-choice	Code Available	1
LooGLE: Can Long-Context Language Models Understand Long Contexts?	Nov 8, 2023	In-Context Learning; Long-Context Understanding	Code Available	1
S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models	Oct 23, 2023	Long-Context Understanding	Code Available	1
Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models	Jul 13, 2025	Attribute; Benchmarking	Code Available	0
PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding	Jun 18, 2025	Long-Context Understanding	Unverified	0
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training	Jun 5, 2025	Language Modeling; Language Modelling	Unverified	0
ATLAS: Learning to Optimally Memorize the Context at Test Time	May 29, 2025	Common Sense Reasoning; Language Modeling	Unverified	0

Datasets: MMNeedle, Ada-LEval (BestAnswer), Ada-LEval (TSort), L-Eval, LongBench

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4o	1 Image, 4*4 Stitching, Exact Accuracy	83	—	Unverified
2	GPT-4V	1 Image, 4*4 Stitching, Exact Accuracy	54.72	—	Unverified
3	Gemini Pro 1.5	1 Image, 4*4 Stitching, Exact Accuracy	39.85	—	Unverified
4	Gemini Pro 1.0	1 Image, 4*4 Stitching, Exact Accuracy	24.78	—	Unverified
5	LLaVA-Llama-3	1 Image, 4*4 Stitching, Exact Accuracy	17.5	—	Unverified
6	Claude 3 Opus	1 Image, 4*4 Stitching, Exact Accuracy	12.3	—	Unverified
7	IDEFICS2-8B	1 Image, 4*4 Stitching, Exact Accuracy	7.8	—	Unverified
8	InstructBLIP-Flan-T5-XXL	1 Image, 4*4 Stitching, Exact Accuracy	6.2	—	Unverified
9	CogVLM2-Llama-3	1 Image, 4*4 Stitching, Exact Accuracy	0.9	—	Unverified
10	mPLUG-Owl-v2	1 Image, 4*4 Stitching, Exact Accuracy	0.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GPT-4-Turbo-1106	1k	74	—	Unverified
2	GPT-4-Turbo-0125	1k	73.5	—	Unverified
3	Claude-2	1k	65	—	Unverified
4	GPT-3.5-Turbo-1106	1k	61.5	—	Unverified
5	InternLM2-7b	1k	58.6	—	Unverified
6	Vicuna-13b-v1.5-16k	1k	53.4	—	Unverified
7	ChatGLM3-6b-32k	1k	39.8	—	Unverified
8	Vicuna-7b-v1.5-16k	1k	37	—	Unverified
9	LongChat-7b-v1.5-32k	1k	32.4	—	Unverified
10	ChatGLM2-6b-32k	1k	31.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GPT-4-Turbo-1106	2k	18.5	—	Unverified
2	GPT-4-Turbo-0125	2k	15.5	—	Unverified
3	Vicuna-13b-v1.5-16k	2k	5.4	—	Unverified
4	Vicuna-7b-v1.5-16k	2k	5.3	—	Unverified
5	LongChat-7b-v1.5-32k	2k	5.3	—	Unverified
6	InternLM2-7b	2k	5.1	—	Unverified
7	Claude-2	2k	5	—	Unverified
8	GPT-3.5-Turbo-1106	2k	4	—	Unverified
9	ChatGLM3-6b-32k	2k	2.3	—	Unverified
10	ChatGLM2-6b-32k	2k	0.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GALI(Llama3-8b-ins-4k-to-16k)	Average Score	59.21	—	Unverified
2	GALI(Llama3-8b-ins-4k-to-32k)	Average Score	59.1	—	Unverified
3	GALI(Llama3-8b-ins-8k-to-32k)	Average Score	42.79	—	Unverified
4	GALI(Llama3-8b-ins-8k-to-16k)	Average Score	42.32	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GALI(Llama3-8b-ins-4k-to-16k)	Average Score	46.22	—	Unverified
2	GALI(Llama3-8b-ins-8k-to-32k)	Average Score	45.38	—	Unverified
3	GALI(Llama3-8b-ins-8k-to-16k)	Average Score	45.17	—	Unverified