SOTAVerified

Math

Papers

Showing 76-100 of 1596 papers

Title | Status | Hype
ToRL: Scaling Tool-Integrated RL | Code | 3
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | Code | 3
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Code | 3
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought | Code | 3
Scaling up Masked Diffusion Models on Text | Code | 3
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Code | 3
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Code | 3
TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON | Code | 3
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Code | 3
Step-level Value Preference Optimization for Mathematical Reasoning | Code | 3
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | Code | 3
MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Code | 3
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | Code | 3
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | Code | 3
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3
Rho-1: Not All Tokens Are What You Need | Code | 3
BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models | Code | 3
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models | Code | 3
Noise Contrastive Alignment of Language Models with Explicit Rewards | Code | 3
Self-Discover: Large Language Models Self-Compose Reasoning Structures | Code | 3
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline | Code | 3
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Code | 3
Llemma: An Open Language Model For Mathematics | Code | 3
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition | Code | 3
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Code | 3
Page 4 of 64

No leaderboard results yet.