| Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Jun 17, 2024 | Benchmarking | Code Available | 2 |
| Recurrent Context Compression: Efficiently Expanding the Context Window of LLM | Jun 10, 2024 | Long-Context Understanding; Question Answering | Code Available | 2 |
| Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding | Jun 4, 2024 | Articles; Long-Context Understanding | Code Available | 0 |
| From Text to Pixel: Advancing Long-Context Understanding in MLLMs | May 23, 2024 | Language Modeling; Language Modelling | Code Available | 1 |
| Equipping Transformer with Random-Access Reading for Long-Context Understanding | May 21, 2024 | Chunking; Long-Context Understanding | — Unverified | 0 |
| What matters when building vision-language models? | May 3, 2024 | 1 Image, 2*2 Stitching; Image Retrieval | — Unverified | 0 |
| Retrieval Head Mechanistically Explains Long-Context Factuality | Apr 24, 2024 | Continual Pretraining; Hallucination | Code Available | 3 |
| Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs | Apr 16, 2024 | Long-Context Understanding; Token Reduction | Code Available | 1 |
| RULER: What's the Real Context Size of Your Long-Context Language Models? | Apr 9, 2024 | Long-Context Understanding | Code Available | 9 |
| Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks | Apr 9, 2024 | Answer Selection; Long-Context Understanding | Code Available | 2 |
| XL^2Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies | Apr 8, 2024 | Long-Context Understanding; Reading Comprehension | — Unverified | 0 |
| Long-context LLMs Struggle with Long In-context Learning | Apr 2, 2024 | 2k; In-Context Learning | Code Available | 5 |
| FABLES: Evaluating faithfulness and content selection in book-length summarization | Apr 1, 2024 | Long-Context Understanding | Code Available | 2 |
| InternLM2 Technical Report | Mar 26, 2024 | 4k; Long-Context Understanding | Code Available | 9 |
| LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | Mar 18, 2024 | Long-Context Understanding; TextVQA | Code Available | 3 |
| Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | Mar 8, 2024 | 1 Image, 2*2 Stitching; Code Generation | Code Available | 3 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | Mar 4, 2024 | 1 Image, 2*2 Stitching; Arithmetic Reasoning | — Unverified | 0 |
| Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models | Feb 3, 2024 | Logical Reasoning; Long-Context Understanding | — Unverified | 0 |
| Gemini: A Family of Highly Capable Multimodal Models | Dec 19, 2023 | 1 Image, 2*2 Stitching; Arithmetic Reasoning | Code Available | 1 |
| Marathon: A Race Through the Realm of Long Context with Large Language Models | Dec 15, 2023 | Long-Context Understanding; Multiple-choice | Code Available | 1 |
| LooGLE: Can Long-Context Language Models Understand Long Contexts? | Nov 8, 2023 | In-Context Learning; Long-Context Understanding | Code Available | 1 |
| mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | Nov 7, 2023 | 1 Image, 2*2 Stitching; Decoder | Code Available | 4 |
| CogVLM: Visual Expert for Pretrained Language Models | Nov 6, 2023 | 1 Image, 2*2 Stitching; FS-MEVQA | Code Available | 5 |
| S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models | Oct 23, 2023 | Long-Context Understanding | Code Available | 1 |
| LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding | Aug 28, 2023 | 16k; Code Completion | Code Available | 3 |