| ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning | Mar 19, 2022 | Chart Question Answering, Logical Reasoning | Code Available | 2 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Dec 8, 2021 | Abstract Algebra, Anachronisms | Code Available | 2 |
| Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation | May 27, 2025 | Large Language Model, Logical Reasoning | Code Available | 1 |
| Large Language Models for Planning: A Comprehensive and Systematic Survey | May 26, 2025 | Logical Reasoning, Navigate | Code Available | 1 |
| Do Large Language Models Excel in Complex Logical Reasoning with Formal Language? | May 22, 2025 | Logical Reasoning | Code Available | 1 |
| NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning | May 21, 2025 | General Reinforcement Learning, Logical Reasoning | Code Available | 1 |
| Learning to Reason via Mixture-of-Thought for Logical Reasoning | May 21, 2025 | Logical Reasoning, Natural Language Inference | Code Available | 1 |
| Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues? | May 19, 2025 | Logical Reasoning, Optical Character Recognition | Code Available | 1 |
| LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? | May 18, 2025 | Logical Reasoning, Multimodal Reasoning | Code Available | 1 |
| BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs | May 18, 2025 | Logical Reasoning | Code Available | 1 |
| Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration | Apr 17, 2025 | Geometry Problem Solving, Large Language Model | Code Available | 1 |
| Alice: Proactive Learning with Teacher's Demonstrations for Weak-to-Strong Generalization | Apr 9, 2025 | Logical Reasoning, Mathematical Reasoning | Code Available | 1 |
| QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? | Mar 28, 2025 | Logical Reasoning, Math | Code Available | 1 |
| AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models | Feb 24, 2025 | Logical Reasoning, Multiple-choice | Code Available | 1 |
| Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models | Feb 16, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation | Feb 10, 2025 | Logical Reasoning | Code Available | 1 |
| Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation | Dec 24, 2024 | Graph Question Answering, Hallucination | Code Available | 1 |
| WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model | Dec 13, 2024 | Autonomous Driving, Decision Making | Code Available | 1 |
| RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios | Dec 12, 2024 | Logical Reasoning, Long-Context Understanding | Code Available | 1 |
| ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression | Dec 4, 2024 | 2k, Logical Reasoning | Code Available | 1 |
| The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units | Nov 4, 2024 | Logical Reasoning | Code Available | 1 |
| LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation | Nov 1, 2024 | Logical Reasoning, Sequential Decision Making | Code Available | 1 |
| Neuro-symbolic Learning Yielding Logical Constraints | Oct 28, 2024 | Logical Reasoning | Code Available | 1 |
| Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning | Oct 10, 2024 | Language Modelling, Large Language Model | Code Available | 1 |
| Automatic Curriculum Expert Iteration for Reliable LLM Reasoning | Oct 10, 2024 | Hallucination, Logical Reasoning | Code Available | 1 |
| GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Oct 7, 2024 | GSM8K, Logical Reasoning | Code Available | 1 |
| RATIONALYST: Pre-training Process-Supervision for Improving Reasoning | Oct 1, 2024 | Logical Reasoning | Code Available | 1 |
| VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning | Sep 3, 2024 | Chart Question Answering, Data Visualization | Code Available | 1 |
| LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models | Aug 28, 2024 | Benchmarking, Logical Reasoning | Code Available | 1 |
| CHECKWHY: Causal Fact Verification via Argument Structure | Aug 20, 2024 | Fact Verification, Logical Reasoning | Code Available | 1 |
| Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Jul 11, 2024 | EEG, Language Modeling | Code Available | 1 |
| R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning | Jul 8, 2024 | Logical Reasoning | Code Available | 1 |
| ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models | Jul 7, 2024 | Fairness, General Knowledge | Code Available | 1 |
| LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Jul 6, 2024 | Logical Reasoning, Mathematical Reasoning | Code Available | 1 |
| PUZZLES: A Benchmark for Neural Algorithmic Reasoning | Jun 29, 2024 | Decision Making, Logical Reasoning | Code Available | 1 |
| VideoVista: A Versatile Benchmark for Video Understanding and Reasoning | Jun 17, 2024 | Anomaly Detection, Logical Reasoning | Code Available | 1 |
| A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners | Jun 16, 2024 | Logical Reasoning | Code Available | 1 |
| LogiCode: an LLM-Driven Framework for Logical Anomaly Detection | Jun 7, 2024 | Anomaly Detection, Binary Classification | Code Available | 1 |
| LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models | Apr 23, 2024 | Logical Reasoning, Question Answering | Code Available | 1 |
| LeanReasoner: Boosting Complex Logical Reasoning with Lean | Mar 20, 2024 | Automated Theorem Proving, Logical Reasoning | Code Available | 1 |
| SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials | Feb 22, 2024 | Chart Question Answering, Language Modeling | Code Available | 1 |
| OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models | Feb 21, 2024 | General Knowledge, Logical Reasoning | Code Available | 1 |
| Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs | Feb 18, 2024 | Logical Reasoning | Code Available | 1 |
| The Quantified Boolean Bayesian Network: Theory and Experiments with a Logical Graphical Model | Feb 9, 2024 | Information Retrieval, Language Modelling | Code Available | 1 |
| Conditional and Modal Reasoning in Large Language Models | Jan 30, 2024 | Logical Reasoning | Code Available | 1 |
| Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions | Jan 17, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models | Jan 1, 2024 | Code Generation, In-Context Learning | Code Available | 1 |
| TEILP: Time Prediction over Knowledge Graphs via Logical Reasoning | Dec 25, 2023 | Knowledge Graphs, Logical Reasoning | Code Available | 1 |
| Advancing Abductive Reasoning in Knowledge Graphs through Complex Logical Hypothesis Generation | Dec 25, 2023 | Knowledge Graphs, Logical Reasoning | Code Available | 1 |
| Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Dec 14, 2023 | Language Modeling, Language Modelling | Code Available | 1 |