SOTAVerified

Math

Papers

Showing 1-50 of 1596 papers

Title | Status | Hype
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Code | 14
Qwen2.5 Technical Report | Code | 13
Qwen2.5-Coder Technical Report | Code | 11
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Code | 9
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model | Code | 9
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Code | 9
s1: Simple test-time scaling | Code | 9
AgentRxiv: Towards Collaborative Autonomous Research | Code | 9
O1 Replication Journey: A Strategic Progress Report -- Part 1 | Code | 7
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback | Code | 7
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking | Code | 7
OpenThoughts: Data Recipes for Reasoning Models | Code | 7
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! | Code | 7
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | Code | 7
S*: Test Time Scaling for Code Generation | Code | 7
Kimi k1.5: Scaling Reinforcement Learning with LLMs | Code | 7
StarCoder 2 and The Stack v2: The Next Generation | Code | 7
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | Code | 7
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Code | 7
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models | Code | 7
EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Code | 7
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | Code | 7
TTRL: Test-Time Reinforcement Learning | Code | 7
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Code | 7
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | Code | 6
Qwen Technical Report | Code | 6
Mistral 7B | Code | 6
GPT-4 Technical Report | Code | 6
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Code | 6
Process Reinforcement through Implicit Rewards | Code | 5
LiveBench: A Challenging, Contamination-Limited LLM Benchmark | Code | 5
MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit | Code | 5
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models | Code | 5
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | Code | 5
Common 7B Language Models Already Possess Strong Math Capabilities | Code | 5
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | Code | 5
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Code | 5
LIMO: Less is More for Reasoning | Code | 5
Evolutionary Optimization of Model Merging Recipes | Code | 5
Free Process Rewards without Process Labels | Code | 5
Reinforcement Learning from Human Feedback | Code | 5
Dive into Deep Learning | Code | 4
LLaMA Pro: Progressive LLaMA with Block Expansion | Code | 4
Lean Workbook: A large-scale Lean problem set formalized from natural language math problems | Code | 4
LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover | Code | 4
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Code | 4
Let's Verify Step by Step | Code | 4
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Code | 4
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Code | 4
How is ChatGPT's behavior changing over time? | Code | 4

No leaderboard results yet.