| Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL | Feb 17, 2025 | Code GenerationMath | CodeCode Available | 1 |
| Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping | Feb 16, 2025 | Code GenerationInstruction Following | CodeCode Available | 1 |
| Graders should cheat: privileged information enables expert-level automated evaluations | Feb 16, 2025 | Math | —Unverified | 0 |
| Dyve: Thinking Fast and Slow for Dynamic Process Verification | Feb 16, 2025 | Math | CodeCode Available | 1 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Feb 16, 2025 | Computational EfficiencyGSM8K | CodeCode Available | 0 |
| 1bit-Merging: Dynamic Quantized Merging for Large Language Models | Feb 15, 2025 | Code GenerationMath | —Unverified | 0 |
| MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Feb 13, 2025 | BenchmarkingMath | —Unverified | 0 |
| CRANE: Reasoning with constrained LLM generation | Feb 13, 2025 | Code GenerationMath | —Unverified | 0 |
| Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving | Feb 12, 2025 | Mathmultimodal interaction | —Unverified | 0 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Feb 12, 2025 | GSM8KMath | CodeCode Available | 0 |