SOTAVerified

Math

Papers

Showing 251–300 of 1596 papers

Title | Status | Hype
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | Code | 1
SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings | Code | 1
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start | Code | 1
ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | Code | 1
REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning | Code | 1
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging | Code | 1
Value-Guided Search for Efficient Chain-of-Thought Reasoning | Code | 1
RaDeR: Reasoning-aware Dense Retrieval Models | Code | 1
Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning | Code | 1
Decoupled Visual Interpretation and Linguistic Reasoning for Math Problem Solving | Code | 1
Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs | Code | 1
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges | Code | 1
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | Code | 1
Training Step-Level Reasoning Verifiers with Formal Verification Tools | Code | 1
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning | Code | 1
Let's Verify Math Questions Step by Step | Code | 1
Efficient RL Training for Reasoning Models via Length-Aware Optimization | Code | 1
HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | Code | 1
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | Code | 1
Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning | Code | 1
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code | Code | 1
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | Code | 1
DeepCritic: Deliberate Critique with Large Language Models | Code | 1
Efficient Reasoning for LLMs through Speculative Chain-of-Thought | Code | 1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1
Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation | Code | 1
The Jailbreak Tax: How Useful are Your Jailbreak Outputs? | Code | 1
Efficient Process Reward Model Training via Active Learning | Code | 1
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Code | 1
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression | Code | 1
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models | Code | 1
Large (Vision) Language Models are Unsupervised In-Context Learners | Code | 1
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Code | 1
Entropy-Based Adaptive Weighting for Self-Training | Code | 1
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? | Code | 1
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models | Code | 1
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Code | 1
EXAONE Deep: Reasoning Enhanced Language Models | Code | 1
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search | Code | 1
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Code | 1
PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Code | 1
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1
Self-Training Elicits Concise Reasoning in Large Language Models | Code | 1
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | Code | 1
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Code | 1
How to Get Your LLM to Generate Challenging Problems for Evaluation | Code | 1
Reasoning with Reinforced Functional Token Tuning | Code | 1
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Code | 1
Thinking Preference Optimization | Code | 1
Page 6 of 32
