Math

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 151–200 of 1596 papers

Title	Date	Tasks	Status	Hype
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models	Feb 1, 2025	Math	CodeCode Available	2
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate	Jan 29, 2025	Instruction FollowingMath	CodeCode Available	2
Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling	Jan 20, 2025	Imitation LearningLanguage Modeling	CodeCode Available	2
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics	Jan 8, 2025	MathMathematical Reasoning	CodeCode Available	2
Offline Reinforcement Learning for LLM Multi-Step Reasoning	Dec 20, 2024	GSM8KMath	CodeCode Available	2
ProcessBench: Identifying Process Errors in Mathematical Reasoning	Dec 9, 2024	GSM8KMath	CodeCode Available	2
Preference Optimization for Reasoning with Pseudo Feedback	Nov 25, 2024	GSM8KMath	CodeCode Available	2
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training	Nov 24, 2024	MathMixture-of-Experts	CodeCode Available	2
Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus	Nov 19, 2024	Formal LogicLogical Reasoning	CodeCode Available	2
Flaming-hot Initiation with Regular Execution Sampling for Large Language Models	Oct 28, 2024	DiversityMath	CodeCode Available	2
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch	Oct 24, 2024	MathMathematical Reasoning	CodeCode Available	2
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model	Oct 17, 2024	Math	CodeCode Available	2
JudgeBench: A Benchmark for Evaluating LLM-based Judges	Oct 16, 2024	Math	CodeCode Available	2
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization	Oct 11, 2024	GSM8KLanguage Modeling	CodeCode Available	2
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models	Oct 10, 2024	Math	CodeCode Available	2
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code	Oct 10, 2024	MathMathematical Reasoning	CodeCode Available	2
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models	Oct 10, 2024	GSM8KMath	CodeCode Available	2
Steering Large Language Models between Code Execution and Textual Reasoning	Oct 4, 2024	Code GenerationMath	CodeCode Available	2
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment	Oct 2, 2024	GSM8KMath	CodeCode Available	2
Archon: An Architecture Search Framework for Inference-Time Techniques	Sep 23, 2024	Hyperparameter OptimizationInstruction Following	CodeCode Available	2
Balancing LoRA Performance and Efficiency with Simple Shard Sharing	Sep 19, 2024	Computational EfficiencyGSM8K	CodeCode Available	2
Training Language Models to Self-Correct via Reinforcement Learning	Sep 19, 2024	HumanEvalMath	CodeCode Available	2
VAE Explainer: Supplement Learning Variational Autoencoders with Interactive Visualization	Sep 13, 2024	Math	CodeCode Available	2
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models	Sep 4, 2024	GSM8KMath	CodeCode Available	2
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models	Aug 1, 2024	Math	CodeCode Available	2
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process	Jul 29, 2024	GSM8KMath	CodeCode Available	2
Weak-to-Strong Reasoning	Jul 18, 2024	GSM8KMath	CodeCode Available	2
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?	Jul 1, 2024	MathMathematical Reasoning	CodeCode Available	2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data	Jun 26, 2024	BenchmarkingMath	CodeCode Available	2
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models	Jun 25, 2024	DiversityMath	CodeCode Available	2
Adaptable Logical Control for Large Language Models	Jun 19, 2024	MathText Generation	CodeCode Available	2
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	Jun 18, 2024	Arithmetic ReasoningMath	CodeCode Available	2
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models	Jun 13, 2024	MathQuantization	CodeCode Available	2
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning	Jun 7, 2024	Instruction FollowingMath	CodeCode Available	2
Yuan 2.0-M32: Mixture of Experts with Attention Router	May 28, 2024	ARCMath	CodeCode Available	2
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters	May 27, 2024	BenchmarkingGSM8K	CodeCode Available	2
Autoformalizing Euclidean Geometry	May 27, 2024	Math	CodeCode Available	2
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models	May 24, 2024	Common Sense ReasoningLanguage Modelling	CodeCode Available	2
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark	May 20, 2024	College MathematicsGSM8K	CodeCode Available	2
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning	May 5, 2024	GSM8KMath	CodeCode Available	2
Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards	Apr 16, 2024	GSM8KMath	CodeCode Available	2
Evaluating Mathematical Reasoning Beyond Accuracy	Apr 8, 2024	MathMathematical Reasoning	CodeCode Available	2
MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems	Apr 6, 2024	Logical ReasoningMath	CodeCode Available	2
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline	Apr 3, 2024	MathMathematical Problem-Solving	CodeCode Available	2
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision	Mar 14, 2024	MathReinforcement Learning (RL)	CodeCode Available	2
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap	Feb 29, 2024	Math	CodeCode Available	2
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers	Feb 29, 2024	GSM8KMath	CodeCode Available	2
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset	Feb 22, 2024	DiversityMath	CodeCode Available	2
Reformatted Alignment	Feb 19, 2024	GSM8KHallucination	CodeCode Available	2
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic	Feb 19, 2024	Instruction FollowingMath	CodeCode Available	2

Show:10 25 50

← PrevPage 4 of 32Next →

No leaderboard results yet.