SOTAVerified

Math

Papers


Showing 951–975 of 1596 papers

Title	Date	Tasks	Status	Hype
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic	Feb 19, 2024	Instruction Following, Math	Code Available	2
Reformatted Alignment	Feb 19, 2024	GSM8K, Hallucination	Code Available	2
LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks	Feb 18, 2024	Math	Unverified	0
Orca-Math: Unlocking the potential of SLMs in Grade School Math	Feb 16, 2024	Arithmetic Reasoning, GSM8K	Unverified	0
Language Models as Science Tutors	Feb 16, 2024	GSM8K, Math	Code Available	1
Language Models with Conformal Factuality Guarantees	Feb 15, 2024	Conformal Prediction, Language Modeling	Unverified	0
Mathematical Opportunities in Digital Twins (MATH-DT)	Feb 15, 2024	Math	Unverified	0
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	Feb 15, 2024	Arithmetic Reasoning, GSM8K	Code Available	4
GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving	Feb 15, 2024	Geometry Problem Solving, Math	Code Available	1
AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails	Feb 14, 2024	Language Modeling, Language Modelling	Code Available	0
Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications	Feb 14, 2024	Math	Unverified	0
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data	Feb 14, 2024	Automated Theorem Proving, Language Modelling	Code Available	1
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements	Feb 13, 2024	GSM8K, Math	Unverified	0
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages	Feb 12, 2024	Automated Theorem Proving, Benchmarking	Unverified	0
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models	Feb 12, 2024	Language Modeling, Language Modelling	Code Available	3
Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts	Feb 12, 2024	Continual Pretraining, GSM8K	Code Available	2
Understanding the Progression of Educational Topics via Semantic Matching	Feb 10, 2024	Math	Unverified	0
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning	Feb 9, 2024	Data Augmentation, GSM8K	Code Available	4
V-STaR: Training Verifiers for Self-Taught Reasoners	Feb 9, 2024	Code Generation, Math	Unverified	0
Noise Contrastive Alignment of Language Models with Explicit Rewards	Feb 8, 2024	Language Modelling, Math	Code Available	3
In-Context Principle Learning from Mistakes	Feb 8, 2024	GSM8K, In-Context Learning	Code Available	0
Self-Discover: Large Language Models Self-Compose Reasoning Structures	Feb 6, 2024	Math	Code Available	3
RevOrder: A Novel Method for Enhanced Arithmetic in Language Models	Feb 6, 2024	GSM8K, Math	Unverified	0
Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation	Feb 5, 2024	Knowledge Graphs, Math	Code Available	1
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models	Feb 5, 2024	Arithmetic Reasoning, Math	Code Available	9

