| GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Oct 7, 2024 | GSM8K, Logical Reasoning | Code Available | 1 |
| RATIONALYST: Pre-training Process-Supervision for Improving Reasoning | Oct 1, 2024 | Logical Reasoning | Code Available | 1 |
| VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning | Sep 3, 2024 | Chart Question Answering, Data Visualization | Code Available | 1 |
| LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models | Aug 28, 2024 | Benchmarking, Logical Reasoning | Code Available | 1 |
| CHECKWHY: Causal Fact Verification via Argument Structure | Aug 20, 2024 | Fact Verification, Logical Reasoning | Code Available | 1 |
| Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Jul 11, 2024 | EEG, Language Modeling | Code Available | 1 |
| R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning | Jul 8, 2024 | Logical Reasoning | Code Available | 1 |
| ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models | Jul 7, 2024 | Fairness, General Knowledge | Code Available | 1 |
| LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Jul 6, 2024 | Logical Reasoning, Mathematical Reasoning | Code Available | 1 |
| PUZZLES: A Benchmark for Neural Algorithmic Reasoning | Jun 29, 2024 | Decision Making, Logical Reasoning | Code Available | 1 |
| VideoVista: A Versatile Benchmark for Video Understanding and Reasoning | Jun 17, 2024 | Anomaly Detection, Logical Reasoning | Code Available | 1 |
| A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners | Jun 16, 2024 | Logical Reasoning | Code Available | 1 |
| LogiCode: an LLM-Driven Framework for Logical Anomaly Detection | Jun 7, 2024 | Anomaly Detection, Binary Classification | Code Available | 1 |
| LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models | Apr 23, 2024 | Logical Reasoning, Question Answering | Code Available | 1 |
| LeanReasoner: Boosting Complex Logical Reasoning with Lean | Mar 20, 2024 | Automated Theorem Proving, Logical Reasoning | Code Available | 1 |
| SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials | Feb 22, 2024 | Chart Question Answering, Language Modeling | Code Available | 1 |
| OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models | Feb 21, 2024 | General Knowledge, Logical Reasoning | Code Available | 1 |
| Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs | Feb 18, 2024 | Logical Reasoning | Code Available | 1 |
| The Quantified Boolean Bayesian Network: Theory and Experiments with a Logical Graphical Model | Feb 9, 2024 | Information Retrieval, Language Modelling | Code Available | 1 |
| Conditional and Modal Reasoning in Large Language Models | Jan 30, 2024 | Logical Reasoning | Code Available | 1 |
| Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions | Jan 17, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models | Jan 1, 2024 | Code Generation, In-Context Learning | Code Available | 1 |
| TEILP: Time Prediction over Knowledge Graphs via Logical Reasoning | Dec 25, 2023 | Knowledge Graphs, Logical Reasoning | Code Available | 1 |
| Advancing Abductive Reasoning in Knowledge Graphs through Complex Logical Hypothesis Generation | Dec 25, 2023 | Knowledge Graphs, Logical Reasoning | Code Available | 1 |
| Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Dec 14, 2023 | Language Modeling, Language Modelling | Code Available | 1 |