SOTAVerified

Math

Papers

Showing 1-50 of 1596 papers

Title | Status | Hype
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Code | 14
Qwen2.5 Technical Report | Code | 13
Qwen2.5-Coder Technical Report | Code | 11
AgentRxiv: Towards Collaborative Autonomous Research | Code | 9
s1: Simple test-time scaling | Code | 9
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model | Code | 9
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Code | 9
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Code | 9
EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Code | 7
OpenThoughts: Data Recipes for Reasoning Models | Code | 7
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | Code | 7
TTRL: Test-Time Reinforcement Learning | Code | 7
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | Code | 7
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Code | 7
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Code | 7
S*: Test Time Scaling for Code Generation | Code | 7
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! | Code | 7
Kimi k1.5: Scaling Reinforcement Learning with LLMs | Code | 7
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking | Code | 7
O1 Replication Journey: A Strategic Progress Report -- Part 1 | Code | 7
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback | Code | 7
StarCoder 2 and The Stack v2: The Next Generation | Code | 7
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | Code | 7
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models | Code | 7
Mistral 7B | Code | 6
Qwen Technical Report | Code | 6
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | Code | 6
GPT-4 Technical Report | Code | 6
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Code | 6
Reinforcement Learning from Human Feedback | Code | 5
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | Code | 5
LIMO: Less is More for Reasoning | Code | 5
Process Reinforcement through Implicit Rewards | Code | 5
Free Process Rewards without Process Labels | Code | 5
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models | Code | 5
LiveBench: A Challenging, Contamination-Limited LLM Benchmark | Code | 5
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | Code | 5
MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit | Code | 5
Evolutionary Optimization of Model Merging Recipes | Code | 5
Common 7B Language Models Already Possess Strong Math Capabilities | Code | 5
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Code | 5
Energy-Based Transformers are Scalable Learners and Thinkers | Code | 4
Skywork Open Reasoner 1 Technical Report | Code | 4
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision | Code | 4
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset | Code | 4
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Code | 4
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction | Code | 4
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Code | 4
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Code | 4
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Code | 4

No leaderboard results yet.