| Pensez: Less Data, Better Reasoning -- Rethinking French LLM | Mar 17, 2025 | Large Language Model, Math | Unverified | 0 |
| EXAONE Deep: Reasoning Enhanced Language Models | Mar 16, 2025 | Math | Code Available | 1 |
| SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? | Mar 16, 2025 | Board Games, Card Games | Unverified | 0 |
| Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data | Mar 13, 2025 | Large Language Model, Math | Unverified | 0 |
| VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search | Mar 13, 2025 | Image Retrieval, Math | Code Available | 1 |
| Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Mar 13, 2025 | Domain Generalization, Math | Code Available | 4 |
| Understanding the Logical Capabilities of Large Language Models via Out-of-Context Representation Learning | Mar 13, 2025 | In-Context Learning, Math | Unverified | 0 |
| Conformal Prediction Sets for Deep Generative Models via Reduction to Conformal Regression | Mar 13, 2025 | Code Generation, Conformal Prediction | Unverified | 0 |
| StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error | Mar 13, 2025 | Math | Code Available | 0 |
| The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory | Mar 13, 2025 | Math, Multiple-choice | Unverified | 0 |