SOTAVerified

Math

Papers

Showing 51-100 of 1596 papers

Title | Status | Hype
Galactica: A Large Language Model for Science | Code | 4
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Code | 4
How is ChatGPT's behavior changing over time? | Code | 4
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Code | 4
Energy-Based Transformers are Scalable Learners and Thinkers | Code | 4
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Code | 4
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision | Code | 4
LLaMA Pro: Progressive LLaMA with Block Expansion | Code | 4
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Code | 4
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset | Code | 4
Dive into Deep Learning | Code | 4
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Code | 4
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | Code | 4
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | Code | 4
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Code | 4
Thinkless: LLM Learns When to Think | Code | 3
ThoughtSource: A central hub for large language model reasoning data | Code | 3
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | Code | 3
TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON | Code | 3
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Code | 3
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Code | 3
SymForce: Symbolic Computation and Code Generation for Robotics | Code | 3
ToRL: Scaling Tool-Integrated RL | Code | 3
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Code | 3
BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models | Code | 3
Step-level Value Preference Optimization for Mathematical Reasoning | Code | 3
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Code | 3
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3
Self-Discover: Large Language Models Self-Compose Reasoning Structures | Code | 3
Learning to Reason under Off-Policy Guidance | Code | 3
Llemma: An Open Language Model For Mathematics | Code | 3
Spurious Rewards: Rethinking Training Signals in RLVR | Code | 3
Training Verifiers to Solve Math Word Problems | Code | 3
Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Code | 3
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition | Code | 3
Rho-1: Not All Tokens Are What You Need | Code | 3
General-Reasoner: Advancing LLM Reasoning Across All Domains | Code | 3
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks | Code | 3
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | Code | 3
PAL: Program-aided Language Models | Code | 3
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Code | 3
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | Code | 3
Noise Contrastive Alignment of Language Models with Explicit Rewards | Code | 3
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | Code | 3
MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Code | 3
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory | Code | 3
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Code | 3
Scaling up Masked Diffusion Models on Text | Code | 3
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models | Code | 3
MathArena: Evaluating LLMs on Uncontaminated Math Competitions | Code | 3
Page 2 of 32

No leaderboard results yet.