SOTAVerified

Math

Papers

Showing 351-375 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models | | 0 |
| MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts | | 0 |
| MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training | Code | 0 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1 |
| Self-Training Elicits Concise Reasoning in Large Language Models | Code | 1 |
| Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning | | 0 |
| Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | Code | 1 |
| Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Code | 2 |
| Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning | | 0 |
| SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution | | 0 |
| From Euler to AI: Unifying Formulas for Mathematical Constants | Code | 0 |
| Learning Decentralized Swarms Using Rotation Equivariant Graph Neural Networks | Code | 0 |
| Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Code | 2 |
| Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning | Code | 0 |
| Reasoning with Latent Thoughts: On the Power of Looped Transformers | | 0 |
| DISC: Dynamic Decomposition Improves LLM Inference Scaling | | 0 |
| SBSC: Step-By-Step Coding for Improving Mathematical Olympiad Performance | | 0 |
| Inference Computation Scaling for Feature Augmentation in Recommendation Systems | | 0 |
| Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning | | 0 |
| The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer | Code | 0 |
| Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Code | 1 |
| How to Get Your LLM to Generate Challenging Problems for Evaluation | Code | 1 |
| S*: Test Time Scaling for Code Generation | Code | 7 |
| GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks | Code | 0 |
| Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Code | 7 |
Page 15 of 64

No leaderboard results yet.