Long-Context Understanding

Papers


Showing 51–81 of 81 papers

Title	Date	Tasks	Status
Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks	Jan 11, 2025	Code Generation; HumanEval	Unverified
Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning	Sep 4, 2024	Long-Context Understanding; Multi-Objective Reinforcement Learning	Unverified
XL^2Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies	Apr 8, 2024	Long-Context Understanding; Reading Comprehension	Unverified
Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise	Jul 16, 2024	Diagnostic; Long-Context Understanding	Unverified
LIFT: Improving Long Context Understanding Through Long Input Fine-Tuning	Dec 18, 2024	In-Context Learning; Long-Context Understanding	Unverified
LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning	Feb 20, 2025	In-Context Learning; Long-Context Understanding	Unverified
Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning	Feb 18, 2025	2k; Long-Context Understanding	Unverified
Enhancing Scientific Reproducibility Through Automated BioCompute Object Creation Using Retrieval-Augmented Generation from Publications	Sep 23, 2024	Hallucination; Long-Context Understanding	Unverified
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities	Jul 19, 2024	4k; 8k	Unverified
Towards Robust Evaluation of STEM Education: Leveraging MLLMs in Project-Based Learning	May 16, 2025	Hallucination; Information Retrieval	Unverified
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?	Feb 4, 2025	Arithmetic Reasoning; Code Generation	Unverified
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores	Apr 23, 2025	Long-Context Understanding; token-classification	Unverified
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models	Feb 3, 2024	Logical Reasoning; Long-Context Understanding	Unverified
Beyond Needle(s) in the Embodied Haystack: Environment, Architecture, and Training Considerations for Long Context Reasoning	May 22, 2025	Long-Context Understanding	Unverified
PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding	Jun 18, 2025	Long-Context Understanding	Unverified
Repository Structure-Aware Training Makes SLMs Better Issue Resolver	Dec 26, 2024	Long-Context Understanding	Unverified
ATLAS: Learning to Optimally Memorize the Context at Test Time	May 29, 2025	Common Sense Reasoning; Language Modeling	Unverified
Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long Context Evaluation Tasks	Sep 10, 2024	Long-Context Understanding; Retrieval	Unverified
Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration	May 24, 2023	Long-Context Understanding	Unverified
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis	Jul 24, 2023	Code Generation; Denoising	Unverified
What matters when building vision-language models?	May 3, 2024	1 Image, 2*2 Stitching; Image Retrieval	Unverified
Anomaly Detection of Tabular Data Using LLMs	Jun 24, 2024	Anomaly Detection; Long-Context Understanding	Unverified
How Effective Is Self-Consistency for Long-Context Problems?	Nov 2, 2024	Long-Context Understanding; Position	Unverified
Token Weighting for Long-Range Language Modeling	Mar 12, 2025	Language Modeling; Language Modelling	Code Available
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training	Jun 5, 2025	Language Modeling; Language Modelling	Code Available
Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding	Jun 4, 2024	Articles; Long-Context Understanding	Code Available
Too Long, Didn't Model: Decomposing LLM Long-Context Understanding With Novels	May 20, 2025	Language Modeling; Language Modelling	Code Available
SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning	Feb 19, 2025	Long-Context Understanding	Code Available
A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)	Feb 4, 2025	Long-Context Understanding	Code Available
Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models	Jul 13, 2025	Attribute; Benchmarking	Code Available
SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences	May 27, 2025	16k; Long-Context Understanding	Code Available


Datasets: MMNeedle, Ada-LEval (BestAnswer), Ada-LEval (TSort), L-Eval, LongBench

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4o	1 Image, 4*4 Stitching, Exact Accuracy	83	—	Unverified
2	GPT-4V	1 Image, 4*4 Stitching, Exact Accuracy	54.72	—	Unverified
3	Gemini Pro 1.5	1 Image, 4*4 Stitching, Exact Accuracy	39.85	—	Unverified
4	Gemini Pro 1.0	1 Image, 4*4 Stitching, Exact Accuracy	24.78	—	Unverified
5	LLaVA-Llama-3	1 Image, 4*4 Stitching, Exact Accuracy	17.5	—	Unverified
6	Claude 3 Opus	1 Image, 4*4 Stitching, Exact Accuracy	12.3	—	Unverified
7	IDEFICS2-8B	1 Image, 4*4 Stitching, Exact Accuracy	7.8	—	Unverified
8	InstructBLIP-Flan-T5-XXL	1 Image, 4*4 Stitching, Exact Accuracy	6.2	—	Unverified
9	CogVLM2-Llama-3	1 Image, 4*4 Stitching, Exact Accuracy	0.9	—	Unverified
10	mPLUG-Owl-v2	1 Image, 4*4 Stitching, Exact Accuracy	0.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GPT-4-Turbo-1106	1k	74	—	Unverified
2	GPT-4-Turbo-0125	1k	73.5	—	Unverified
3	Claude-2	1k	65	—	Unverified
4	GPT-3.5-Turbo-1106	1k	61.5	—	Unverified
5	InternLM2-7b	1k	58.6	—	Unverified
6	Vicuna-13b-v1.5-16k	1k	53.4	—	Unverified
7	ChatGLM3-6b-32k	1k	39.8	—	Unverified
8	Vicuna-7b-v1.5-16k	1k	37	—	Unverified
9	LongChat-7b-v1.5-32k	1k	32.4	—	Unverified
10	ChatGLM2-6b-32k	1k	31.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GPT-4-Turbo-1106	2k	18.5	—	Unverified
2	GPT-4-Turbo-0125	2k	15.5	—	Unverified
3	Vicuna-13b-v1.5-16k	2k	5.4	—	Unverified
4	Vicuna-7b-v1.5-16k	2k	5.3	—	Unverified
5	LongChat-7b-v1.5-32k	2k	5.3	—	Unverified
6	InternLM2-7b	2k	5.1	—	Unverified
7	Claude-2	2k	5	—	Unverified
8	GPT-3.5-Turbo-1106	2k	4	—	Unverified
9	ChatGLM3-6b-32k	2k	2.3	—	Unverified
10	ChatGLM2-6b-32k	2k	0.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GALI(Llama3-8b-ins-4k-to-16k)	Average Score	59.21	—	Unverified
2	GALI(Llama3-8b-ins-4k-to-32k)	Average Score	59.1	—	Unverified
3	GALI(Llama3-8b-ins-8k-to-32k)	Average Score	42.79	—	Unverified
4	GALI(Llama3-8b-ins-8k-to-16k)	Average Score	42.32	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GALI(Llama3-8b-ins-4k-to-16k)	Average Score	46.22	—	Unverified
2	GALI(Llama3-8b-ins-8k-to-32k)	Average Score	45.38	—	Unverified
3	GALI(Llama3-8b-ins-8k-to-16k)	Average Score	45.17	—	Unverified