SOTAVerified

Math

Papers

Showing 51-100 of 1596 papers

Title | Status | Hype
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Code | 4
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Code | 4
Reasoning with Language Model is Planning with World Model | Code | 4
Galactica: A Large Language Model for Science | Code | 4
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | Code | 4
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Code | 4
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | Code | 4
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Code | 4
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision | Code | 4
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Code | 4
Energy-Based Transformers are Scalable Learners and Thinkers | Code | 4
How is ChatGPT's behavior changing over time? | Code | 4
Dive into Deep Learning | Code | 4
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset | Code | 4
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Code | 4
ThoughtSource: A central hub for large language model reasoning data | Code | 3
TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON | Code | 3
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | Code | 3
Thinkless: LLM Learns When to Think | Code | 3
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Code | 3
Learning to Reason under Off-Policy Guidance | Code | 3
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Code | 3
SymForce: Symbolic Computation and Code Generation for Robotics | Code | 3
ToRL: Scaling Tool-Integrated RL | Code | 3
Step-level Value Preference Optimization for Mathematical Reasoning | Code | 3
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3
Spurious Rewards: Rethinking Training Signals in RLVR | Code | 3
Self-Discover: Large Language Models Self-Compose Reasoning Structures | Code | 3
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Code | 3
Llemma: An Open Language Model For Mathematics | Code | 3
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Code | 3
Training Verifiers to Solve Math Word Problems | Code | 3
Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Code | 3
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Code | 3
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | Code | 3
General-Reasoner: Advancing LLM Reasoning Across All Domains | Code | 3
Rho-1: Not All Tokens Are What You Need | Code | 3
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks | Code | 3
PAL: Program-aided Language Models | Code | 3
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Code | 3
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | Code | 3
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | Code | 3
Noise Contrastive Alignment of Language Models with Explicit Rewards | Code | 3
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition | Code | 3
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | Code | 3
MathArena: Evaluating LLMs on Uncontaminated Math Competitions | Code | 3
BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models | Code | 3
Scaling up Masked Diffusion Models on Text | Code | 3
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory | Code | 3
MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Code | 3
Page 2 of 32

No leaderboard results yet.