| Scaling Synthetic Data Creation with 1,000,000,000 Personas | Jun 28, 2024 | Language Modeling, Language Modelling | Code Available | 11 |
| NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context? | Jul 16, 2024 | 4k, 8k | Code Available | 9 |
| LLaVA-CoT: Let Vision Language Models Reason Step-by-Step | Nov 15, 2024 | Logical Reasoning, Multimodal Reasoning | Code Available | 7 |
| PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation | Jan 20, 2025 | Language Modeling, Language Modelling | Code Available | 7 |
| SGLang: Efficient Execution of Structured Language Model Programs | Dec 12, 2023 | Few-Shot Learning, Language Modeling | Code Available | 6 |
| Training Compute-Optimal Large Language Models | Mar 29, 2022 | Anachronisms, Analogical Similarity | Code Available | 6 |
| Training Large Language Models to Reason in a Continuous Latent Space | Dec 9, 2024 | Logical Reasoning | Code Available | 5 |
| From System 1 to System 2: A Survey of Reasoning Large Language Models | Feb 24, 2025 | Logical Reasoning | Code Available | 5 |
| SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models | Jun 15, 2025 | Logical Reasoning, Reinforcement Learning (RL) | Code Available | 5 |
| Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Nov 4, 2024 | Logical Reasoning, Mathematical Problem-Solving | Code Available | 5 |
| MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI | Nov 27, 2023 | Complex Query Answering, Logical Reasoning | Code Available | 5 |
| Knowledge Fusion of Large Language Models | Jan 19, 2024 | Code Generation, Common Sense Reasoning | Code Available | 4 |
| R1-Onevision: An Open-Source Multimodal Large Language Model Capable of Deep Reasoning | Feb 24, 2025 | Language Modeling, Language Modelling | Code Available | 4 |
| LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL | Mar 10, 2025 | Logical Reasoning, Multimodal Reasoning | Code Available | 4 |
| OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning | Dec 31, 2024 | Benchmarking, Logical Reasoning | Code Available | 4 |
| GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations | Feb 19, 2024 | Card Games, Logical Reasoning | Code Available | 3 |
| Faithful Logical Reasoning via Symbolic Chain-of-Thought | May 28, 2024 | Logical Reasoning | Code Available | 3 |
| A Survey on Large Language Model Acceleration based on KV Cache Management | Dec 27, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Reasoning with Language Model Prompting: A Survey | Dec 19, 2022 | Arithmetic Reasoning, Common Sense Reasoning | Code Available | 3 |
| Measuring AI Ability to Complete Long Tasks | Mar 18, 2025 | Logical Reasoning | Code Available | 3 |
| LLM4Drive: A Survey of Large Language Models for Autonomous Driving | Nov 2, 2023 | Autonomous Driving, Few-Shot Learning | Code Available | 3 |
| Advancing LLM Reasoning Generalists with Preference Trees | Apr 2, 2024 | Benchmarking, Code Generation | Code Available | 3 |
| Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models | Apr 19, 2023 | Logical Reasoning | Code Available | 3 |
| InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners | Apr 19, 2025 | Action Generation, Logical Reasoning | Code Available | 2 |
| Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples | Jun 9, 2024 | ARC, Diversity | Code Available | 2 |