Math

Papers

Showing 951–1000 of 1596 papers

Title	Date	Tasks	Status	Hype
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic	Feb 19, 2024	Instruction Following, Math	Code Available	2
Reformatted Alignment	Feb 19, 2024	GSM8K, Hallucination	Code Available	2
LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks	Feb 18, 2024	Math	Unverified	0
Orca-Math: Unlocking the potential of SLMs in Grade School Math	Feb 16, 2024	Arithmetic Reasoning, GSM8K	Unverified	0
Language Models as Science Tutors	Feb 16, 2024	GSM8K, Math	Code Available	1
Language Models with Conformal Factuality Guarantees	Feb 15, 2024	Conformal Prediction, Language Modeling	Unverified	0
Mathematical Opportunities in Digital Twins (MATH-DT)	Feb 15, 2024	Math	Unverified	0
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	Feb 15, 2024	Arithmetic Reasoning, GSM8K	Code Available	4
GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving	Feb 15, 2024	Geometry Problem Solving, Math	Code Available	1
AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails	Feb 14, 2024	Language Modeling, Language Modelling	Code Available	0
Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications	Feb 14, 2024	Math	Unverified	0
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data	Feb 14, 2024	Automated Theorem Proving, Language Modelling	Code Available	1
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements	Feb 13, 2024	GSM8K, Math	Unverified	0
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages	Feb 12, 2024	Automated Theorem Proving, Benchmarking	Unverified	0
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models	Feb 12, 2024	Language Modeling, Language Modelling	Code Available	3
Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts	Feb 12, 2024	Continual Pretraining, GSM8K	Code Available	2
Understanding the Progression of Educational Topics via Semantic Matching	Feb 10, 2024	Math	Unverified	0
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning	Feb 9, 2024	Data Augmentation, GSM8K	Code Available	4
V-STaR: Training Verifiers for Self-Taught Reasoners	Feb 9, 2024	Code Generation, Math	Unverified	0
Noise Contrastive Alignment of Language Models with Explicit Rewards	Feb 8, 2024	Language Modelling, Math	Code Available	3
In-Context Principle Learning from Mistakes	Feb 8, 2024	GSM8K, In-Context Learning	Code Available	0
Self-Discover: Large Language Models Self-Compose Reasoning Structures	Feb 6, 2024	Math	Code Available	3
RevOrder: A Novel Method for Enhanced Arithmetic in Language Models	Feb 6, 2024	GSM8K, Math	Unverified	0
Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation	Feb 5, 2024	Knowledge Graphs, Math	Code Available	1
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models	Feb 5, 2024	Arithmetic Reasoning, Math	Code Available	9
Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision	Feb 5, 2024	GSM8K, Math	Unverified	0
Improving Assessment of Tutoring Practices using Retrieval-Augmented Generation	Feb 4, 2024	Hallucination, Math	Unverified	0
MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models	Feb 2, 2024	Language Modelling, Large Language Model	Code Available	1
Salsa Fresca: Angular Embeddings and Pre-Training for ML Attacks on Learning With Errors	Feb 2, 2024	Math	Unverified	0
Large Language Models for Mathematical Reasoning: Progresses and Challenges	Jan 31, 2024	Diversity, Math	Unverified	0
Efficient Tool Use with Chain-of-Abstraction Reasoning	Jan 30, 2024	Math, Mathematical Reasoning	Unverified	0
Taxonomy of Mathematical Plagiarism	Jan 30, 2024	Math, Question Answering	Code Available	0
ReGAL: Refactoring Programs to Discover Generalizable Abstractions	Jan 29, 2024	Date Understanding, Math	Code Available	1
GAPS: Geometry-Aware Problem Solver	Jan 29, 2024	Geometry Problem Solving, Math	Unverified	0
YODA: Teacher-Student Progressive Learning for Language Models	Jan 28, 2024	GSM8K, Math	Unverified	0
Exploring Educational Equity: A Machine Learning Approach to Unravel Achievement Disparities in Georgia	Jan 25, 2024	Math	Unverified	0
Can AI Assistants Know What They Don't Know?	Jan 24, 2024	Math, Open-Domain Question Answering	Code Available	2
TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks	Jan 23, 2024	Math, Question Answering	Code Available	1
Using Java Geometry Expert as Guide in the Preparations for Math Contests	Jan 22, 2024	Math	Unverified	0
SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese	Jan 22, 2024	Diversity, GSM8K	Code Available	2
Over-Reasoning and Redundant Calculation of Large Language Models	Jan 21, 2024	GSM8K, Math	Code Available	1
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning	Jan 19, 2024	GSM8K, Math	Code Available	1
Augmenting Math Word Problems via Iterative Question Composing	Jan 17, 2024	Math, Mathematical Reasoning	Code Available	1
Large Language Models Are Neurosymbolic Reasoners	Jan 17, 2024	Common Sense Reasoning, Math	Code Available	1
ReFT: Reasoning with Reinforced Fine-Tuning	Jan 17, 2024	GSM8K, Math	Code Available	4
Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions	Jan 17, 2024	Arithmetic Reasoning, Code Generation	Code Available	1
Tuning Language Models by Proxy	Jan 16, 2024	Domain Adaptation, Math	Code Available	2
Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination	Jan 16, 2024	GSM8K, Language Modeling	Unverified	0
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline	Jan 16, 2024	GSM8K, Math	Code Available	3
SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models	Jan 15, 2024	Math, Mathematical Reasoning	Code Available	2
