SOTAVerified

Math

Papers

Showing 101–150 of 1596 papers

Title | Status | Hype
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition | Code | 3
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Code | 3
Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Code | 3
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks | Code | 3
Spurious Rewards: Rethinking Training Signals in RLVR | Code | 3
Evaluating Mathematical Reasoning Beyond Accuracy | Code | 2
PaLM: Scaling Language Modeling with Pathways | Code | 2
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition | Code | 2
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Code | 2
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models | Code | 2
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Code | 2
Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus | Code | 2
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Code | 2
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | Code | 2
Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem | Code | 2
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Code | 2
Offline Reinforcement Learning for LLM Multi-Step Reasoning | Code | 2
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Code | 2
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Code | 2
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Code | 2
Essential-Web v1.0: 24T tokens of organized web data | Code | 2
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Code | 2
Meta Prompting for AI Systems | Code | 2
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Code | 2
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | Code | 2
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | Code | 2
Memorizing Transformers | Code | 2
An Expression Tree Decoding Strategy for Mathematical Equation Generation | Code | 2
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision | Code | 2
Meta-Design Matters: A Self-Design Multi-Agent System | Code | 2
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Code | 2
Measuring Mathematical Problem Solving With the MATH Dataset | Code | 2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset | Code | 2
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2
Accelerating Sparse Deep Neural Networks | Code | 2
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models | Code | 2
Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling | Code | 2
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Code | 2
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Code | 2
MegaMath: Pushing the Limits of Open Math Corpora | Code | 2
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities | Code | 2
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning | Code | 2
MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | Code | 2
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | Code | 2
Agent Lumos: Unified and Modular Training for Open-Source Language Agents | Code | 2
AdaptThink: Reasoning Models Can Learn When to Think | Code | 2
MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | Code | 2
Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Code | 2
Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | Code | 2
