| ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models | Jul 7, 2024 | Fairness, General Knowledge | Code Available | 1 |
| LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Jul 6, 2024 | Logical Reasoning, Mathematical Reasoning | Code Available | 1 |
| Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games | Jul 5, 2024 | Logical Reasoning | Unverified | 0 |
| Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring | Jul 4, 2024 | Logical Reasoning | Unverified | 0 |
| PUZZLES: A Benchmark for Neural Algorithmic Reasoning | Jun 29, 2024 | Decision Making, Logical Reasoning | Code Available | 1 |
| Scaling Synthetic Data Creation with 1,000,000,000 Personas | Jun 28, 2024 | Language Modeling, Language Modelling | Code Available | 11 |
| FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | Jun 27, 2024 | Decision Making, Logical Reasoning | Unverified | 0 |
| Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism | Jun 26, 2024 | Logical Reasoning | Unverified | 0 |
| LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic | Jun 25, 2024 | ARC, Logical Reasoning | Unverified | 0 |
| Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models | Jun 24, 2024 | Logical Reasoning, Natural Language Understanding | Code Available | 0 |
| Large Language Models Are Cross-Lingual Knowledge-Free Reasoners | Jun 24, 2024 | Cross-Lingual Transfer, Logical Reasoning | Code Available | 0 |
| Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy | Jun 23, 2024 | Bilevel Optimization, Imitation Learning | Unverified | 0 |
| Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference | Jun 21, 2024 | Logical Reasoning | Unverified | 0 |
| Pathformer: Recursive Path Query Encoding for Complex Logical Query Answering | Jun 21, 2024 | Knowledge Graphs, Logical Reasoning | Unverified | 0 |
| The neural correlates of logical-mathematical symbol systems processing resemble that of spatial cognition more than natural language processing | Jun 20, 2024 | Logical Reasoning | Unverified | 0 |
| Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models | Jun 18, 2024 | Logical Reasoning | Code Available | 0 |
| VideoVista: A Versatile Benchmark for Video Understanding and Reasoning | Jun 17, 2024 | Anomaly Detection, Logical Reasoning | Code Available | 1 |
| Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment | Jun 17, 2024 | Logical Reasoning, Math | Unverified | 0 |
| Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars | Jun 16, 2024 | Automated Theorem Proving, Logical Reasoning | Code Available | 0 |
| City-LEO: Toward Transparent City Management Using LLM with End-to-End Optimization | Jun 16, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Ontology Embedding: A Survey of Methods, Applications and Resources | Jun 16, 2024 | Logical Reasoning, Ontology Embedding | Code Available | 2 |
| A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners | Jun 16, 2024 | Logical Reasoning | Code Available | 1 |
| Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam | Jun 14, 2024 | Fairness, Logical Reasoning | Code Available | 0 |
| Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs | Jun 13, 2024 | Arithmetic Reasoning, Fact Verification | Code Available | 2 |
| Large Language Models are Limited in Out-of-Context Knowledge Reasoning | Jun 11, 2024 | Attribute, Logical Reasoning | Code Available | 0 |