| Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models | Aug 21, 2024 | Logical Reasoning, Motion Synthesis | Unverified | 0 |
| SarcasmBench: Towards Evaluating Large Language Models on Sarcasm Understanding | Aug 21, 2024 | Logical Reasoning, Mathematical Reasoning | Unverified | 0 |
| CHECKWHY: Causal Fact Verification via Argument Structure | Aug 20, 2024 | Fact Verification, Logical Reasoning | Code Available | 1 |
| A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models | Aug 16, 2024 | Logical Reasoning, valid | Unverified | 0 |
| Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions | Aug 16, 2024 | Descriptive, Hallucination | Unverified | 0 |
| LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image | Aug 14, 2024 | Autonomous Driving, Logical Reasoning | Unverified | 0 |
| Can Large Language Models Reason? A Characterization via 3-SAT | Aug 13, 2024 | Logical Reasoning | Unverified | 0 |
| P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for data pruning in LLM Training | Aug 10, 2024 | Diversity, Logical Reasoning | Unverified | 0 |
| Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset | Aug 8, 2024 | Logical Reasoning | Code Available | 0 |
| Automated Theorem Provers Help Improve Large Language Model Reasoning | Aug 7, 2024 | Formal Logic, Language Modeling | Unverified | 0 |
| Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation | Aug 7, 2024 | Logical Reasoning, Recommendation Systems | Unverified | 0 |
| Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference | Aug 4, 2024 | Logical Reasoning, Prompt Engineering | Unverified | 0 |
| Deceptive AI systems that give explanations are more convincing than honest AI systems and can amplify belief in misinformation | Jul 31, 2024 | Logical Reasoning, Misinformation | Unverified | 0 |
| CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge | Jul 30, 2024 | In-Context Learning, Knowledge Graphs | Unverified | 0 |
| Take A Step Back: Rethinking the Two Stages in Visual Reasoning | Jul 29, 2024 | Logical Reasoning, Question Answering | Unverified | 0 |
| Logic Distillation: Learning from Code Function by Function for Planning and Decision-making | Jul 28, 2024 | Decision Making, Knowledge Distillation | Unverified | 0 |
| An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought | Jul 22, 2024 | Form, Logical Reasoning | Unverified | 0 |
| Step-by-Step Reasoning to Solve Grid Puzzles: Where do LLMs Falter? | Jul 20, 2024 | Logical Reasoning | Code Available | 0 |
| An Explainable Fast Deep Neural Network for Emotion Recognition | Jul 20, 2024 | Attribute, Emotion Classification | Unverified | 0 |
| NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context? | Jul 16, 2024 | 4k, 8k | Code Available | 9 |
| Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Jul 12, 2024 | Logical Reasoning, Multiple-choice | Code Available | 0 |
| Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Jul 11, 2024 | EEG, Language Modeling | Code Available | 1 |
| Analyzing Large language models chatbots: An experimental approach using a probability test | Jul 10, 2024 | Chatbot, Logical Reasoning | Unverified | 0 |
| Why should we ever automate moral decision making? | Jul 10, 2024 | Decision Making, Ethics | Unverified | 0 |
| R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning | Jul 8, 2024 | Logical Reasoning | Code Available | 1 |
| ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models | Jul 7, 2024 | Fairness, General Knowledge | Code Available | 1 |
| LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Jul 6, 2024 | Logical Reasoning, Mathematical Reasoning | Code Available | 1 |
| Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games | Jul 5, 2024 | Logical Reasoning | Unverified | 0 |
| Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring | Jul 4, 2024 | Logical Reasoning | Unverified | 0 |
| PUZZLES: A Benchmark for Neural Algorithmic Reasoning | Jun 29, 2024 | Decision Making, Logical Reasoning | Code Available | 1 |
| Scaling Synthetic Data Creation with 1,000,000,000 Personas | Jun 28, 2024 | Language Modeling, Language Modelling | Code Available | 11 |
| FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | Jun 27, 2024 | Decision Making, Logical Reasoning | Unverified | 0 |
| Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism | Jun 26, 2024 | Logical Reasoning | Unverified | 0 |
| LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic | Jun 25, 2024 | ARC, Logical Reasoning | Unverified | 0 |
| Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models | Jun 24, 2024 | Logical Reasoning, Natural Language Understanding | Code Available | 0 |
| Large Language Models Are Cross-Lingual Knowledge-Free Reasoners | Jun 24, 2024 | Cross-Lingual Transfer, Logical Reasoning | Code Available | 0 |
| Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy | Jun 23, 2024 | Bilevel Optimization, Imitation Learning | Unverified | 0 |
| Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference | Jun 21, 2024 | Logical Reasoning | Unverified | 0 |
| Pathformer: Recursive Path Query Encoding for Complex Logical Query Answering | Jun 21, 2024 | Knowledge Graphs, Logical Reasoning | Unverified | 0 |
| The neural correlates of logical-mathematical symbol systems processing resemble that of spatial cognition more than natural language processing | Jun 20, 2024 | Logical Reasoning | Unverified | 0 |
| Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models | Jun 18, 2024 | Logical Reasoning | Code Available | 0 |
| VideoVista: A Versatile Benchmark for Video Understanding and Reasoning | Jun 17, 2024 | Anomaly Detection, Logical Reasoning | Code Available | 1 |
| Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment | Jun 17, 2024 | Logical Reasoning, Math | Unverified | 0 |
| Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars | Jun 16, 2024 | Automated Theorem Proving, Logical Reasoning | Code Available | 0 |
| City-LEO: Toward Transparent City Management Using LLM with End-to-End Optimization | Jun 16, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Ontology Embedding: A Survey of Methods, Applications and Resources | Jun 16, 2024 | Logical Reasoning, Ontology Embedding | Code Available | 2 |
| A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners | Jun 16, 2024 | Logical Reasoning | Code Available | 1 |
| Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam | Jun 14, 2024 | Fairness, Logical Reasoning | Code Available | 0 |
| Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs | Jun 13, 2024 | Arithmetic Reasoning, Fact Verification | Code Available | 2 |
| Large Language Models are Limited in Out-of-Context Knowledge Reasoning | Jun 11, 2024 | Attribute, Logical Reasoning | Code Available | 0 |