Math

Papers

Showing 851–900 of 1596 papers

Title	Date	Tasks	Status	Hype
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context	May 8, 2024	Math, Mathematical Reasoning	Code Available	1
MAmmoTH2: Scaling Instructions from the Web	May 6, 2024	Chatbot, GSM8K	Unverified	0
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning	May 5, 2024	GSM8K, Math	Code Available	2
Assessing and Verifying Task Utility in LLM-Powered Applications	May 3, 2024	Math	Unverified	0
Math Multiple Choice Question Generation via Human-Large Language Model Collaboration	May 1, 2024	Language Modeling, Language Modelling	Unverified	0
GOLD: Geometry Problem Solver with Natural Language Description	May 1, 2024	Math	Code Available	1
A Careful Examination of Large Language Model Performance on Grade School Arithmetic	May 1, 2024	GSM8K, Language Modeling	Unverified	0
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning	May 1, 2024	ARC, GSM8K	Code Available	3
Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models	May 1, 2024	Math	Unverified	0
Iterative Reasoning Preference Optimization	Apr 30, 2024	ARC, GSM8K	Unverified	0
PECC: Problem Extraction and Coding Challenges	Apr 29, 2024	Code Generation, Math	Code Available	1
Small Language Models Need Strong Verifiers to Self-Correct Reasoning	Apr 26, 2024	Math	Code Available	0
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding	Apr 25, 2024	GSM8K, HellaSwag	Code Available	3
AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation	Apr 25, 2024	Code Generation, Math	Code Available	1
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems	Apr 23, 2024	Arithmetic Reasoning, GSM8K	Code Available	1
Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training	Apr 22, 2024	Math, Mathematical Reasoning	Unverified	0
MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit	Apr 22, 2024	Math	Code Available	5
PARAMANU-GANITA: Language Model with Mathematical Capabilities	Apr 22, 2024	Domain Adaptation, GSM8K	Unverified	0
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone	Apr 22, 2024	Language Modeling, Language Modelling	Unverified	0
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank	Apr 19, 2024	Distractor Generation, Math	Unverified	0
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing	Apr 18, 2024	Arithmetic Reasoning, GSM8K	Code Available	1
On the Empirical Complexity of Reasoning and Planning in LLMs	Apr 17, 2024	Math	Unverified	0
Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards	Apr 16, 2024	GSM8K, Math	Code Available	2
Mental Stress Detection: Development and Evaluation of a Wearable In-Ear Plethysmography	Apr 12, 2024	Math, Mental Stress Detection	Unverified	0
Rho-1: Not All Tokens Are What You Need	Apr 11, 2024	All, Continual Pretraining	Code Available	3
Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems	Apr 10, 2024	Math	Unverified	0
MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education	Apr 10, 2024	Math	Unverified	0
Evaluating Mathematical Reasoning Beyond Accuracy	Apr 8, 2024	Math, Mathematical Reasoning	Code Available	2
MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification	Apr 7, 2024	Image Comprehension, Math	Code Available	0
FRACTAL: Fine-Grained Scoring from Aggregate Text Labels	Apr 7, 2024	Math, Multiple Instance Learning	Unverified	0
MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems	Apr 6, 2024	Logical Reasoning, Math	Code Available	2
Data Augmentation with In-Context Learning and Comparative Evaluation in Math Word Problem Solving	Apr 5, 2024	Data Augmentation, In-Context Learning	Unverified	0
BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models	Apr 3, 2024	GPU, Math	Code Available	3
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline	Apr 3, 2024	Math, Mathematical Problem-Solving	Code Available	2
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT	Apr 3, 2024	Benchmarking, General Knowledge	Code Available	1
LM^2: A Simple Society of Language Models Solves Complex Reasoning	Apr 2, 2024	Math, MedQA	Code Available	0
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models	Apr 2, 2024	Distractor Generation, In-Context Learning	Code Available	0
HyperCLOVA X Technical Report	Apr 2, 2024	Instruction Following, Machine Translation	Unverified	0
Exploring the Mystery of Influential Data for Mathematical Reasoning	Apr 1, 2024	Math, Mathematical Reasoning	Unverified	0
Stable Code Technical Report	Apr 1, 2024	Code Completion, Language Modelling	Unverified	0
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations	Apr 1, 2024	Benchmarking, Math	Unverified	0
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models	Apr 1, 2024	In-Context Learning, Math	Code Available	0
What is in Your Safe Data? Identifying Benign Data that Breaks Safety	Apr 1, 2024	Math	Code Available	1
Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange	Mar 30, 2024	Math, Mathematical Problem-Solving	Code Available	0
ML2SC: Deploying Machine Learning Models as Smart Contracts on the Blockchain	Mar 28, 2024	Math	Unverified	0
Large Language Models Are Struggle to Cope with Unreasonability in Math Problems	Mar 28, 2024	Math	Unverified	0
Scaling up ridge regression for brain encoding in a massive individual fMRI dataset	Mar 28, 2024	CPU, Math	Code Available	0
Few-Shot Recalibration of Language Models	Mar 27, 2024	Math, MMLU	Unverified	0
The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian	Mar 27, 2024	Language Modelling, Math	Unverified	0
Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization	Mar 26, 2024	Automated Theorem Proving, GSM8K	Code Available	1
