Long-Context Understanding

Papers

Showing 51–75 of 81 papers

Title	Date	Tasks	Status
Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning	Sep 4, 2024	Long-Context Understanding; Multi-Objective Reinforcement Learning	Unverified
XL^2Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies	Apr 8, 2024	Long-Context Understanding; Reading Comprehension	Unverified
Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise	Jul 16, 2024	Diagnostic; Long-Context Understanding	Unverified
LIFT: Improving Long Context Understanding Through Long Input Fine-Tuning	Dec 18, 2024	In-Context Learning; Long-Context Understanding	Unverified
LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning	Feb 20, 2025	In-Context Learning; Long-Context Understanding	Unverified
Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning	Feb 18, 2025	2k; Long-Context Understanding	Unverified
Enhancing Scientific Reproducibility Through Automated BioCompute Object Creation Using Retrieval-Augmented Generation from Publications	Sep 23, 2024	Hallucination; Long-Context Understanding	Unverified
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training	Jun 5, 2025	Language Modeling; Language Modelling	Unverified
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities	Jul 19, 2024	4k; 8k	Unverified
Towards Robust Evaluation of STEM Education: Leveraging MLLMs in Project-Based Learning	May 16, 2025	Hallucination; Information Retrieval	Unverified
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?	Feb 4, 2025	Arithmetic Reasoning; Code Generation	Unverified
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores	Apr 23, 2025	Long-Context Understanding; token-classification	Unverified
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models	Feb 3, 2024	Logical Reasoning; Long-Context Understanding	Unverified
Beyond Needle(s) in the Embodied Haystack: Environment, Architecture, and Training Considerations for Long Context Reasoning	May 22, 2025	Long-Context Understanding	Unverified
PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding	Jun 18, 2025	Long-Context Understanding	Unverified
Repository Structure-Aware Training Makes SLMs Better Issue Resolver	Dec 26, 2024	Long-Context Understanding	Unverified
ATLAS: Learning to Optimally Memorize the Context at Test Time	May 29, 2025	Common Sense Reasoning; Language Modeling	Unverified
Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long Context Evaluation Tasks	Sep 10, 2024	Long-Context Understanding; Retrieval	Unverified
Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration	May 24, 2023	Long-Context Understanding	Unverified
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis	Jul 24, 2023	Code Generation; Denoising	Unverified
What matters when building vision-language models?	May 3, 2024	1 Image, 2*2 Stitching; Image Retrieval	Unverified
Anomaly Detection of Tabular Data Using LLMs	Jun 24, 2024	Anomaly Detection; Long-Context Understanding	Unverified
How Effective Is Self-Consistency for Long-Context Problems?	Nov 2, 2024	Long-Context Understanding; Position	Unverified
Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks	Jan 11, 2025	Code Generation; HumanEval	Unverified
Token Weighting for Long-Range Language Modeling	Mar 12, 2025	Language Modeling; Language Modelling	Code Available

Datasets: MMNeedle, Ada-LEval (BestAnswer), Ada-LEval (TSort), L-Eval, LongBench

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4o	1 Image, 4*4 Stitching, Exact Accuracy	83	—	Unverified
2	GPT-4V	1 Image, 4*4 Stitching, Exact Accuracy	54.72	—	Unverified
3	Gemini Pro 1.5	1 Image, 4*4 Stitching, Exact Accuracy	39.85	—	Unverified
4	Gemini Pro 1.0	1 Image, 4*4 Stitching, Exact Accuracy	24.78	—	Unverified
5	LLaVA-Llama-3	1 Image, 4*4 Stitching, Exact Accuracy	17.5	—	Unverified
6	Claude 3 Opus	1 Image, 4*4 Stitching, Exact Accuracy	12.3	—	Unverified
7	IDEFICS2-8B	1 Image, 4*4 Stitching, Exact Accuracy	7.8	—	Unverified
8	InstructBLIP-Flan-T5-XXL	1 Image, 4*4 Stitching, Exact Accuracy	6.2	—	Unverified
9	CogVLM2-Llama-3	1 Image, 4*4 Stitching, Exact Accuracy	0.9	—	Unverified
10	mPLUG-Owl-v2	1 Image, 4*4 Stitching, Exact Accuracy	0.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GPT-4-Turbo-1106	1k	74	—	Unverified
2	GPT-4-Turbo-0125	1k	73.5	—	Unverified
3	Claude-2	1k	65	—	Unverified
4	GPT-3.5-Turbo-1106	1k	61.5	—	Unverified
5	InternLM2-7b	1k	58.6	—	Unverified
6	Vicuna-13b-v1.5-16k	1k	53.4	—	Unverified
7	ChatGLM3-6b-32k	1k	39.8	—	Unverified
8	Vicuna-7b-v1.5-16k	1k	37	—	Unverified
9	LongChat-7b-v1.5-32k	1k	32.4	—	Unverified
10	ChatGLM2-6b-32k	1k	31.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GPT-4-Turbo-1106	2k	18.5	—	Unverified
2	GPT-4-Turbo-0125	2k	15.5	—	Unverified
3	Vicuna-13b-v1.5-16k	2k	5.4	—	Unverified
4	Vicuna-7b-v1.5-16k	2k	5.3	—	Unverified
5	LongChat-7b-v1.5-32k	2k	5.3	—	Unverified
6	InternLM2-7b	2k	5.1	—	Unverified
7	Claude-2	2k	5	—	Unverified
8	GPT-3.5-Turbo-1106	2k	4	—	Unverified
9	ChatGLM3-6b-32k	2k	2.3	—	Unverified
10	ChatGLM2-6b-32k	2k	0.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GALI(Llama3-8b-ins-4k-to-16k)	Average Score	59.21	—	Unverified
2	GALI(Llama3-8b-ins-4k-to-32k)	Average Score	59.1	—	Unverified
3	GALI(Llama3-8b-ins-8k-to-32k)	Average Score	42.79	—	Unverified
4	GALI(Llama3-8b-ins-8k-to-16k)	Average Score	42.32	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GALI(Llama3-8b-ins-4k-to-16k)	Average Score	46.22	—	Unverified
2	GALI(Llama3-8b-ins-8k-to-32k)	Average Score	45.38	—	Unverified
3	GALI(Llama3-8b-ins-8k-to-16k)	Average Score	45.17	—	Unverified