SOTAVerified

Long-Context Understanding

Papers

Showing 51-81 of 81 papers

| Title | Status | Hype |
|---|---|---|
| Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks | | 0 |
| Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning | | 0 |
| XL^2Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies | | 0 |
| Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise | | 0 |
| LIFT: Improving Long Context Understanding Through Long Input Fine-Tuning | | 0 |
| LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning | | 0 |
| Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning | | 0 |
| Enhancing Scientific Reproducibility Through Automated BioCompute Object Creation Using Retrieval-Augmented Generation from Publications | | 0 |
| ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities | | 0 |
| Towards Robust Evaluation of STEM Education: Leveraging MLLMs in Project-Based Learning | | 0 |
| Can LLMs Maintain Fundamental Abilities under KV Cache Compression? | | 0 |
| MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores | | 0 |
| Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models | | 0 |
| Beyond Needle(s) in the Embodied Haystack: Environment, Architecture, and Training Considerations for Long Context Reasoning | | 0 |
| PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding | | 0 |
| Repository Structure-Aware Training Makes SLMs Better Issue Resolver | | 0 |
| ATLAS: Learning to Optimally Memorize the Context at Test Time | | 0 |
| Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long Context Evaluation Tasks | | 0 |
| Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration | | 0 |
| A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis | | 0 |
| What matters when building vision-language models? | | 0 |
| Anomaly Detection of Tabular Data Using LLMs | | 0 |
| How Effective Is Self-Consistency for Long-Context Problems? | | 0 |
| Token Weighting for Long-Range Language Modeling | Code | 0 |
| MesaNet: Sequence Modeling by Locally Optimal Test-Time Training | Code | 0 |
| Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding | Code | 0 |
| Too Long, Didn't Model: Decomposing LLM Long-Context Understanding With Novels | Code | 0 |
| SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning | Code | 0 |
| A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI) | Code | 0 |
| Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models | Code | 0 |
| SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4o | 1 Image, 4*4 Stitching, Exact Accuracy | 83 | | Unverified |
| 2 | GPT-4V | 1 Image, 4*4 Stitching, Exact Accuracy | 54.72 | | Unverified |
| 3 | Gemini Pro 1.5 | 1 Image, 4*4 Stitching, Exact Accuracy | 39.85 | | Unverified |
| 4 | Gemini Pro 1.0 | 1 Image, 4*4 Stitching, Exact Accuracy | 24.78 | | Unverified |
| 5 | LLaVA-Llama-3 | 1 Image, 4*4 Stitching, Exact Accuracy | 17.5 | | Unverified |
| 6 | Claude 3 Opus | 1 Image, 4*4 Stitching, Exact Accuracy | 12.3 | | Unverified |
| 7 | IDEFICS2-8B | 1 Image, 4*4 Stitching, Exact Accuracy | 7.8 | | Unverified |
| 8 | InstructBLIP-Flan-T5-XXL | 1 Image, 4*4 Stitching, Exact Accuracy | 6.2 | | Unverified |
| 9 | CogVLM2-Llama-3 | 1 Image, 4*4 Stitching, Exact Accuracy | 0.9 | | Unverified |
| 10 | mPLUG-Owl-v2 | 1 Image, 4*4 Stitching, Exact Accuracy | 0.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4-Turbo-1106 | 1k | 74 | | Unverified |
| 2 | GPT-4-Turbo-0125 | 1k | 73.5 | | Unverified |
| 3 | Claude-2 | 1k | 65 | | Unverified |
| 4 | GPT-3.5-Turbo-1106 | 1k | 61.5 | | Unverified |
| 5 | InternLM2-7b | 1k | 58.6 | | Unverified |
| 6 | Vicuna-13b-v1.5-16k | 1k | 53.4 | | Unverified |
| 7 | ChatGLM3-6b-32k | 1k | 39.8 | | Unverified |
| 8 | Vicuna-7b-v1.5-16k | 1k | 37 | | Unverified |
| 9 | LongChat-7b-v1.5-32k | 1k | 32.4 | | Unverified |
| 10 | ChatGLM2-6b-32k | 1k | 31.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4-Turbo-1106 | 2k | 18.5 | | Unverified |
| 2 | GPT-4-Turbo-0125 | 2k | 15.5 | | Unverified |
| 3 | Vicuna-13b-v1.5-16k | 2k | 5.4 | | Unverified |
| 4 | Vicuna-7b-v1.5-16k | 2k | 5.3 | | Unverified |
| 5 | LongChat-7b-v1.5-32k | 2k | 5.3 | | Unverified |
| 6 | InternLM2-7b | 2k | 5.1 | | Unverified |
| 7 | Claude-2 | 2k | 5 | | Unverified |
| 8 | GPT-3.5-Turbo-1106 | 2k | 4 | | Unverified |
| 9 | ChatGLM3-6b-32k | 2k | 2.3 | | Unverified |
| 10 | ChatGLM2-6b-32k | 2k | 0.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GALI(Llama3-8b-ins-4k-to-16k) | Average Score | 59.21 | | Unverified |
| 2 | GALI(Llama3-8b-ins-4k-to-32k) | Average Score | 59.1 | | Unverified |
| 3 | GALI(Llama3-8b-ins-8k-to-32k) | Average Score | 42.79 | | Unverified |
| 4 | GALI(Llama3-8b-ins-8k-to-16k) | Average Score | 42.32 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GALI(Llama3-8b-ins-4k-to-16k) | Average Score | 46.22 | | Unverified |
| 2 | GALI(Llama3-8b-ins-8k-to-32k) | Average Score | 45.38 | | Unverified |
| 3 | GALI(Llama3-8b-ins-8k-to-16k) | Average Score | 45.17 | | Unverified |