SOTAVerified

Math

Papers

Showing 201-250 of 1596 papers

Title | Status | Hype
Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Code | 2
Can AI Assistants Know What They Don't Know? | Code | 2
SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese | Code | 2
Tuning Language Models by Proxy | Code | 2
SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models | Code | 2
MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Code | 2
YUAN 2.0: A Large Language Model with Localized Filtering-based Attention | Code | 2
System 2 Attention (is something you might need too) | Code | 2
Meta Prompting for AI Systems | Code | 2
Agent Lumos: Unified and Modular Training for Open-Source Language Agents | Code | 2
An Expression Tree Decoding Strategy for Mathematical Equation Generation | Code | 2
MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Code | 2
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | Code | 2
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Code | 2
ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs | Code | 2
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Code | 2
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning | Code | 2
GPT Can Solve Mathematical Problems Without a Calculator | Code | 2
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | Code | 2
Cumulative Reasoning with Large Language Models | Code | 2
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities | Code | 2
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models | Code | 2
Progressive-Hint Prompting Improves Reasoning in Large Language Models | Code | 2
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models | Code | 2
Specializing Smaller Language Models towards Multi-Step Reasoning | Code | 2
A Survey of Deep Learning for Mathematical Reasoning | Code | 2
Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem | Code | 2
Language Models are Multilingual Chain-of-Thought Reasoners | Code | 2
PaLM: Scaling Language Modeling with Pathways | Code | 2
Memorizing Transformers | Code | 2
Accelerating Sparse Deep Neural Networks | Code | 2
Full Page Handwriting Recognition via Image to Sequence Extraction | Code | 2
Measuring Mathematical Problem Solving With the MATH Dataset | Code | 2
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Code | 1
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Code | 1
The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains | Code | 1
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Code | 1
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Code | 1
OJBench: A Competition Level Code Benchmark For Large Language Models | Code | 1
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Code | 1
Steering LLM Thinking with Budget Guidance | Code | 1
RePO: Replay-Enhanced Policy Optimization | Code | 1
Resa: Transparent Reasoning Models via SAEs | Code | 1
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | Code | 1
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning | Code | 1
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Code | 1
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Code | 1
STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework | Code | 1
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | Code | 1
Page 5 of 32

No leaderboard results yet.