Long-Context Understanding

Papers

Showing 1–25 of 81 papers

Title	Date	Tasks	Status	Hype	Score
RULER: What's the Real Context Size of Your Long-Context Language Models?	Apr 9, 2024	Long-Context Understanding	Code Available	9	5
InternLM2 Technical Report	Mar 26, 2024	4k; Long-Context Understanding	Code Available	9	5
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	Jun 9, 2023	Chatbot; Language Modelling	Code Available	7	5
GPT-4 Technical Report	Mar 15, 2023	answerability prediction; Arithmetic Reasoning	Code Available	6	5
GLM-130B: An Open Bilingual Pre-trained Model	Oct 5, 2022	Language Modeling; Language Modelling	Code Available	6	5
CogVLM: Visual Expert for Pretrained Language Models	Nov 6, 2023	1 Image, 2*2 Stitching; FS-MEVQA	Code Available	5	5
Long-context LLMs Struggle with Long In-context Learning	Apr 2, 2024	2k; In-Context Learning	Code Available	5	5
Kimi-VL Technical Report	Apr 10, 2025	Long-Context Understanding; Mathematical Reasoning	Code Available	5	5
Gated Delta Networks: Improving Mamba2 with Delta Rule	Dec 9, 2024	Common Sense Reasoning; Language Modeling	Code Available	4	5
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration	Nov 7, 2023	1 Image, 2*2 Stitching; Decoder	Code Available	4	5
M+: Extending MemoryLLM with Scalable Long-Term Memory	Feb 1, 2025	16k; GPU	Code Available	3	5
Retrieval Head Mechanistically Explains Long-Context Factuality	Apr 24, 2024	Continual Pretraining; Hallucination	Code Available	3	5
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	Mar 8, 2024	1 Image, 2*2 Stitching; Code Generation	Code Available	3	5
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images	Mar 18, 2024	Long-Context Understanding; TextVQA	Code Available	3	5
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding	Aug 28, 2023	16k; Code Completion	Code Available	3	5
Recurrent Context Compression: Efficiently Expanding the Context Window of LLM	Jun 10, 2024	Long-Context Understanding; Question Answering	Code Available	2	5
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA	Jun 25, 2024	Benchmarking; Long-Context Understanding	Code Available	2	5
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance	Feb 12, 2025	Benchmarking; Long-Context Understanding	Code Available	2	5
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models	Jun 17, 2024	Benchmarking	Code Available	2	5
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models	Sep 24, 2024	Long-Context Understanding; Text Generation	Code Available	2	5
LongProLIP: A Probabilistic Vision-Language Model with Long Context Text	Mar 11, 2025	Language Modeling; Language Modelling	Code Available	2	5
FABLES: Evaluating faithfulness and content selection in book-length summarization	Apr 1, 2024	Long-Context Understanding	Code Available	2	5
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning	May 11, 2023	1 Image, 2*2 Stitching; Diversity	Code Available	2	5
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks	Apr 9, 2024	Answer Selection; Long-Context Understanding	Code Available	2	5
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression	Jun 21, 2024	GPU; Language Modeling	Code Available	2	5

Datasets: MMNeedle, Ada-LEval (BestAnswer), Ada-LEval (TSort), L-Eval, LongBench

Benchmark Results

Dataset: MMNeedle

#	Model	Metric	Claimed	Verified	Status
1	GPT-4o	1 Image, 4*4 Stitching, Exact Accuracy	83	—	Unverified
2	GPT-4V	1 Image, 4*4 Stitching, Exact Accuracy	54.72	—	Unverified
3	Gemini Pro 1.5	1 Image, 4*4 Stitching, Exact Accuracy	39.85	—	Unverified
4	Gemini Pro 1.0	1 Image, 4*4 Stitching, Exact Accuracy	24.78	—	Unverified
5	LLaVA-Llama-3	1 Image, 4*4 Stitching, Exact Accuracy	17.5	—	Unverified
6	Claude 3 Opus	1 Image, 4*4 Stitching, Exact Accuracy	12.3	—	Unverified
7	IDEFICS2-8B	1 Image, 4*4 Stitching, Exact Accuracy	7.8	—	Unverified
8	InstructBLIP-Flan-T5-XXL	1 Image, 4*4 Stitching, Exact Accuracy	6.2	—	Unverified
9	CogVLM2-Llama-3	1 Image, 4*4 Stitching, Exact Accuracy	0.9	—	Unverified
10	mPLUG-Owl-v2	1 Image, 4*4 Stitching, Exact Accuracy	0.3	—	Unverified

Dataset: Ada-LEval (BestAnswer)

#	Model	Metric	Claimed	Verified	Status
1	GPT-4-Turbo-1106	1k	74	—	Unverified
2	GPT-4-Turbo-0125	1k	73.5	—	Unverified
3	Claude-2	1k	65	—	Unverified
4	GPT-3.5-Turbo-1106	1k	61.5	—	Unverified
5	InternLM2-7b	1k	58.6	—	Unverified
6	Vicuna-13b-v1.5-16k	1k	53.4	—	Unverified
7	ChatGLM3-6b-32k	1k	39.8	—	Unverified
8	Vicuna-7b-v1.5-16k	1k	37	—	Unverified
9	LongChat-7b-v1.5-32k	1k	32.4	—	Unverified
10	ChatGLM2-6b-32k	1k	31.2	—	Unverified

Dataset: Ada-LEval (TSort)

#	Model	Metric	Claimed	Verified	Status
1	GPT-4-Turbo-1106	2k	18.5	—	Unverified
2	GPT-4-Turbo-0125	2k	15.5	—	Unverified
3	Vicuna-13b-v1.5-16k	2k	5.4	—	Unverified
4	LongChat-7b-v1.5-32k	2k	5.3	—	Unverified
5	Vicuna-7b-v1.5-16k	2k	5.3	—	Unverified
6	InternLM2-7b	2k	5.1	—	Unverified
7	Claude-2	2k	5	—	Unverified
8	GPT-3.5-Turbo-1106	2k	4	—	Unverified
9	ChatGLM3-6b-32k	2k	2.3	—	Unverified
10	ChatGLM2-6b-32k	2k	0.9	—	Unverified

Dataset: L-Eval

#	Model	Metric	Claimed	Verified	Status
1	GALI(Llama3-8b-ins-4k-to-16k)	Average Score	59.21	—	Unverified
2	GALI(Llama3-8b-ins-4k-to-32k)	Average Score	59.1	—	Unverified
3	GALI(Llama3-8b-ins-8k-to-32k)	Average Score	42.79	—	Unverified
4	GALI(Llama3-8b-ins-8k-to-16k)	Average Score	42.32	—	Unverified

Dataset: LongBench

#	Model	Metric	Claimed	Verified	Status
1	GALI(Llama3-8b-ins-4k-to-16k)	Average Score	46.22	—	Unverified
2	GALI(Llama3-8b-ins-8k-to-32k)	Average Score	45.38	—	Unverified
3	GALI(Llama3-8b-ins-8k-to-16k)	Average Score	45.17	—	Unverified