| HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation | Apr 19, 2025 | Explainable Recommendation, Logical Reasoning | Unverified | 0 |
| Context-Awareness and Interpretability of Rare Occurrences for Discovery and Formalization of Critical Failure Modes | Apr 18, 2025 | Knowledge Graphs, Logical Reasoning | Unverified | 0 |
| LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models | Apr 18, 2025 | Logical Reasoning | Unverified | 0 |
| Multi-Stage Retrieval for Operational Technology Cybersecurity Compliance Using Large Language Models: A Railway Casestudy | Apr 18, 2025 | Hallucination, Logical Reasoning | Unverified | 0 |
| Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration | Apr 17, 2025 | Geometry Problem Solving, Large Language Model | Code Available | 1 |
| LAD-Reasoner: Tiny Multimodal Models are Good Reasoners for Logical Anomaly Detection | Apr 17, 2025 | Anomaly Detection, Logical Reasoning | Unverified | 0 |
| d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | Apr 16, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| PuzzleBench: A Fully Dynamic Evaluation Framework for Large Multimodal Models on Puzzle Solving | Apr 15, 2025 | Logical Reasoning, Visual Question Answering (VQA) | Unverified | 0 |
| MediSee: Reasoning-based Pixel-level Perception in Medical Images | Apr 15, 2025 | Logical Reasoning, Reasoning Segmentation | Unverified | 0 |
| VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | Apr 14, 2025 | Logical Reasoning, Multimodal Reasoning | Unverified | 0 |
| Socrates or Smartypants: Testing Logic Reasoning Capabilities of Large Language Models with Logic Programming-based Test Oracles | Apr 9, 2025 | Logical Fallacies, Logical Reasoning | Code Available | 0 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Apr 9, 2025 | Autonomous Driving, Language Modeling | Code Available | 0 |
| Alice: Proactive Learning with Teacher's Demonstrations for Weak-to-Strong Generalization | Apr 9, 2025 | Logical Reasoning, Mathematical Reasoning | Code Available | 1 |
| Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification | Apr 7, 2025 | Logical Reasoning, Math | Unverified | 0 |
| Provable Failure of Language Models in Learning Majority Boolean Logic via Gradient Descent | Apr 7, 2025 | Logical Reasoning | Unverified | 0 |
| Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition | Apr 4, 2025 | Logical Reasoning | Unverified | 0 |
| Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing | Apr 3, 2025 | Benchmarking, Logical Reasoning | Code Available | 2 |
| Adaptive Rectification Sampling for Test-Time Compute Scaling | Apr 2, 2025 | GSM8K, Logical Reasoning | Code Available | 0 |
| Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Mar 31, 2025 | Logical Reasoning, Multiple-choice | Code Available | 2 |
| VGRP-Bench: Visual Grid Reasoning Puzzle Benchmark for Large Vision-Language Models | Mar 29, 2025 | Logical Reasoning | Unverified | 0 |
| Negation: A Pink Elephant in the Large Language Models' Room? | Mar 28, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? | Mar 28, 2025 | Logical Reasoning, Math | Code Available | 1 |
| ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning | Mar 26, 2025 | Logical Reasoning | Unverified | 0 |
| Rosetta-PL: Propositional Logic as a Benchmark for Large Language Model Reasoning | Mar 25, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| A Study on Neuro-Symbolic Artificial Intelligence: Healthcare Perspectives | Mar 23, 2025 | Benchmarking, Common Sense Reasoning | Unverified | 0 |