| A Structured Unplugged Approach for Foundational AI Literacy in Primary Education | May 27, 2025 | Logical Reasoning, Misconceptions | Code Available | 0 |
| Interleaved Reasoning for Large Language Models via Reinforcement Learning | May 26, 2025 | Logical Reasoning, Math | —Unverified | 0 |
| Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles | May 26, 2025 | ARC, Logical Reasoning | —Unverified | 0 |
| Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers | May 26, 2025 | Logical Reasoning, Mathematical Problem-Solving | Code Available | 0 |
| CP-Router: An Uncertainty-Aware Router Between LLM and LRM | May 26, 2025 | Conformal Prediction, Logical Reasoning | —Unverified | 0 |
| ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding | May 25, 2025 | Chart Understanding, Logical Reasoning | Code Available | 0 |
| MARCO: Meta-Reflection with Cross-Referencing for Code Reasoning | May 23, 2025 | Logical Reasoning | —Unverified | 0 |
| Towards Competent AI for Fundamental Analysis in Finance: A Benchmark Dataset and Evaluation | May 22, 2025 | Financial Analysis, Logical Reasoning | —Unverified | 0 |
| Reasoning in Neurosymbolic AI | May 22, 2025 | Fairness, Logical Reasoning | —Unverified | 0 |
| Sudoku-Bench: Evaluating creative reasoning with Sudoku variants | May 22, 2025 | Diversity, Logical Reasoning | Code Available | 0 |
| SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas | May 20, 2025 | Benchmarking, Logical Reasoning | —Unverified | 0 |
| Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning | May 20, 2025 | Logical Reasoning, Mathematical Reasoning | —Unverified | 0 |
| Curriculum Abductive Learning | May 18, 2025 | Logical Reasoning | —Unverified | 0 |
| System Prompt Poisoning: Persistent Attacks on Large Language Models Beyond User Injection | May 10, 2025 | Logical Reasoning, RAG | —Unverified | 0 |
| Learning Symbolic Persistent Macro-Actions for POMDP Solving Over Time | May 6, 2025 | Computational Efficiency, Decision Making | —Unverified | 0 |
| HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking | May 5, 2025 | Logical Reasoning | —Unverified | 0 |
| Reasoning Capabilities and Invariability of Large Language Models | May 1, 2025 | Logical Reasoning | Code Available | 0 |
| A Report on the llms evaluating the high school questions | Apr 30, 2025 | Logical Reasoning | —Unverified | 0 |
| LR-IAD:Mask-Free Industrial Anomaly Detection with Logical Reasoning | Apr 28, 2025 | Anomaly Detection, Logical Reasoning | Code Available | 0 |
| POLYRAG: Integrating Polyviews into Retrieval-Augmented Generation for Medical Applications | Apr 21, 2025 | Hallucination, Logical Reasoning | —Unverified | 0 |
| CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs | Apr 21, 2025 | Claim Verification, Logical Reasoning | Code Available | 0 |
| HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation | Apr 19, 2025 | Explainable Recommendation, Logical Reasoning | —Unverified | 0 |
| Multi-Stage Retrieval for Operational Technology Cybersecurity Compliance Using Large Language Models: A Railway Casestudy | Apr 18, 2025 | Hallucination, Logical Reasoning | —Unverified | 0 |
| LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models | Apr 18, 2025 | Logical Reasoning | —Unverified | 0 |
| Context-Awareness and Interpretability of Rare Occurrences for Discovery and Formalization of Critical Failure Modes | Apr 18, 2025 | Knowledge Graphs, Logical Reasoning | —Unverified | 0 |
| LAD-Reasoner: Tiny Multimodal Models are Good Reasoners for Logical Anomaly Detection | Apr 17, 2025 | Anomaly Detection, Logical Reasoning | —Unverified | 0 |
| d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | Apr 16, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| PuzzleBench: A Fully Dynamic Evaluation Framework for Large Multimodal Models on Puzzle Solving | Apr 15, 2025 | Logical Reasoning, Visual Question Answering (VQA) | —Unverified | 0 |
| MediSee: Reasoning-based Pixel-level Perception in Medical Images | Apr 15, 2025 | Logical Reasoning, Reasoning Segmentation | —Unverified | 0 |
| VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | Apr 14, 2025 | Logical Reasoning, Multimodal Reasoning | —Unverified | 0 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Apr 9, 2025 | Autonomous Driving, Language Modeling | Code Available | 0 |
| Socrates or Smartypants: Testing Logic Reasoning Capabilities of Large Language Models with Logic Programming-based Test Oracles | Apr 9, 2025 | Logical Fallacies, Logical Reasoning | Code Available | 0 |
| Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification | Apr 7, 2025 | Logical Reasoning, Math | —Unverified | 0 |
| Provable Failure of Language Models in Learning Majority Boolean Logic via Gradient Descent | Apr 7, 2025 | Logical Reasoning | —Unverified | 0 |
| Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition | Apr 4, 2025 | Logical Reasoning | —Unverified | 0 |
| Adaptive Rectification Sampling for Test-Time Compute Scaling | Apr 2, 2025 | GSM8K, Logical Reasoning | Code Available | 0 |
| VGRP-Bench: Visual Grid Reasoning Puzzle Benchmark for Large Vision-Language Models | Mar 29, 2025 | Logical Reasoning | —Unverified | 0 |
| Negation: A Pink Elephant in the Large Language Models' Room? | Mar 28, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning | Mar 26, 2025 | Logical Reasoning | —Unverified | 0 |
| Rosetta-PL: Propositional Logic as a Benchmark for Large Language Model Reasoning | Mar 25, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| (G)I-DLE: Generative Inference via Distribution-preserving Logit Exclusion with KL Divergence Minimization for Constrained Decoding | Mar 23, 2025 | Logical Reasoning | —Unverified | 0 |
| A Study on Neuro-Symbolic Artificial Intelligence: Healthcare Perspectives | Mar 23, 2025 | Benchmarking, Common Sense Reasoning | —Unverified | 0 |
| Enhancing Retrieval Systems with Inference-Time Logical Reasoning | Mar 22, 2025 | Computational Efficiency, Logical Reasoning | —Unverified | 0 |
| LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning | Mar 21, 2025 | Code Generation, Deep Reinforcement Learning | —Unverified | 0 |
| Bridging Technology and Humanities: Evaluating the Impact of Large Language Models on Social Sciences Research with DeepSeek-R1 | Mar 20, 2025 | Large Language Model, Logical Reasoning | —Unverified | 0 |
| From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models | Mar 20, 2025 | Logical Reasoning | —Unverified | 0 |
| Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack | Mar 18, 2025 | 8k, Benchmarking | —Unverified | 0 |
| 3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o | Mar 17, 2025 | Logical Reasoning, Prompt Engineering | —Unverified | 0 |
| Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models | Mar 12, 2025 | Logical Reasoning, Survey | —Unverified | 0 |
| Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation | Mar 12, 2025 | All, counterfactual | —Unverified | 0 |