SOTAVerified

Math

Papers

Showing 51–100 of 1596 papers

| Title | Status | Hype |
|---|---|---|
| OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | Code | 4 |
| Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Code | 4 |
| LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover | Code | 4 |
| MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Code | 4 |
| Lean Workbook: A large-scale Lean problem set formalized from natural language math problems | Code | 4 |
| OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Code | 4 |
| InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Code | 4 |
| ReFT: Reasoning with Reinforced Fine-Tuning | Code | 4 |
| LLaMA Pro: Progressive LLaMA with Block Expansion | Code | 4 |
| How is ChatGPT's behavior changing over time? | Code | 4 |
| Let's Verify Step by Step | Code | 4 |
| Reasoning with Language Model is Planning with World Model | Code | 4 |
| Galactica: A Large Language Model for Science | Code | 4 |
| Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | Code | 4 |
| Dive into Deep Learning | Code | 4 |
| Spurious Rewards: Rethinking Training Signals in RLVR | Code | 3 |
| MathArena: Evaluating LLMs on Uncontaminated Math Competitions | Code | 3 |
| General-Reasoner: Advancing LLM Reasoning Across All Domains | Code | 3 |
| Thinkless: LLM Learns When to Think | Code | 3 |
| MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | Code | 3 |
| Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Code | 3 |
| An Empirical Study on Prompt Compression for Large Language Models | Code | 3 |
| Learning to Reason under Off-Policy Guidance | Code | 3 |
| Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | Code | 3 |
| Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory | Code | 3 |
| ToRL: Scaling Tool-Integrated RL | Code | 3 |
| Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | Code | 3 |
| Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Code | 3 |
| DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought | Code | 3 |
| Scaling up Masked Diffusion Models on Text | Code | 3 |
| MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Code | 3 |
| Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Code | 3 |
| TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON | Code | 3 |
| Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Code | 3 |
| Step-level Value Preference Optimization for Mathematical Reasoning | Code | 3 |
| Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | Code | 3 |
| MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Code | 3 |
| MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | Code | 3 |
| Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | Code | 3 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3 |
| Rho-1: Not All Tokens Are What You Need | Code | 3 |
| BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models | Code | 3 |
| Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models | Code | 3 |
| Noise Contrastive Alignment of Language Models with Explicit Rewards | Code | 3 |
| Self-Discover: Large Language Models Self-Compose Reasoning Structures | Code | 3 |
| MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline | Code | 3 |
| RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Code | 3 |
| Llemma: An Open Language Model For Mathematics | Code | 3 |
| How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition | Code | 3 |
| ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Code | 3 |

No leaderboard results yet.