Math

Papers

Showing 351–400 of 1596 papers

Title	Date	Tasks	Status	Hype
MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents	Jul 24, 2024	Math	Code Available	1
Nerva: a Truly Sparse Implementation of Neural Networks	Jul 24, 2024	Math	Code Available	1
Toward Adaptive Reasoning in Large Language Models with Thought Rollback	Jul 21, 2024	Arithmetic Reasoning, Math	Code Available	1
Learning Goal-Conditioned Representations for Language Reward Models	Jul 18, 2024	GSM8K, Math	Code Available	1
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish	Jul 17, 2024	Math, Multiple-choice	Code Available	1
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling	Jul 13, 2024	Benchmarking, Math	Code Available	1
AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models	Jul 11, 2024	Language Modelling, Math	Code Available	1
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning	Jul 4, 2024	Avg, GSM8K	Code Available	1
Eliminating Position Bias of Language Models: A Mechanistic Approach	Jul 1, 2024	Math, object-detection	Code Available	1
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning	Jun 30, 2024	GSM8K, Math	Code Available	1
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs	Jun 24, 2024	Instruction Following, Math	Code Available	1
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold	Jun 20, 2024	Math, Reinforcement Learning (RL)	Code Available	1
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback	Jun 20, 2024	Binary Classification, GSM8K	Code Available	1
CityGPT: Empowering Urban Spatial Cognition of Large Language Models	Jun 20, 2024	Code Generation, Math	Code Available	1
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles	Jun 18, 2024	Arithmetic Reasoning, Code Generation	Code Available	1
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling	Jun 17, 2024	GSM8K, Math	Code Available	1
Collective Constitutional AI: Aligning a Language Model with Public Input	Jun 12, 2024	Language Modeling, Language Modelling	Code Available	1
DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning	Jun 6, 2024	Math	Code Available	1
TAIA: Large Language Models are Out-of-Distribution Data Learners	May 30, 2024	Math	Code Available	1
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions	May 29, 2024	Benchmarking, Dialogue Understanding	Code Available	1
JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models	May 23, 2024	Knowledge Distillation, Math	Code Available	1
Multiple-Choice Questions are Efficient and Robust LLM Evaluators	May 20, 2024	GSM8K, HumanEval	Code Available	1
TANQ: An open domain dataset of table answered questions	May 13, 2024	Math, Open-Domain Question Answering	Code Available	1
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context	May 8, 2024	Math, Mathematical Reasoning	Code Available	1
GOLD: Geometry Problem Solver with Natural Language Description	May 1, 2024	Math	Code Available	1
PECC: Problem Extraction and Coding Challenges	Apr 29, 2024	Code Generation, Math	Code Available	1
AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation	Apr 25, 2024	Code Generation, Math	Code Available	1
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems	Apr 23, 2024	Arithmetic Reasoning, GSM8K	Code Available	1
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing	Apr 18, 2024	Arithmetic Reasoning, GSM8K	Code Available	1
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT	Apr 3, 2024	Benchmarking, General Knowledge	Code Available	1
What is in Your Safe Data? Identifying Benign Data that Breaks Safety	Apr 1, 2024	Math	Code Available	1
Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization	Mar 26, 2024	Automated Theorem Proving, GSM8K	Code Available	1
Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices	Mar 19, 2024	Math	Code Available	1
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?	Mar 14, 2024	Hallucination, image-classification	Code Available	1
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models	Mar 4, 2024	Data Augmentation, GSM8K	Code Available	1
Brilla AI: AI Contestant for the National Science and Maths Quiz	Mar 4, 2024	Math, Question Answering	Code Available	1
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning	Mar 2, 2024	Math, Misconceptions	Code Available	1
Case-Based or Rule-Based: How Do Transformers Do the Math?	Feb 27, 2024	Math, Systematic Generalization	Code Available	1
Stepwise Self-Consistent Mathematical Reasoning with Large Language Models	Feb 24, 2024	Math, Mathematical Reasoning	Code Available	1
MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations	Feb 24, 2024	Language Modeling, Language Modelling	Code Available	1
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models	Feb 22, 2024	Math, Mathematical Reasoning	Code Available	1
Language Models as Science Tutors	Feb 16, 2024	GSM8K, Math	Code Available	1
GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving	Feb 15, 2024	Geometry Problem Solving, Math	Code Available	1
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data	Feb 14, 2024	Automated Theorem Proving, Language Modelling	Code Available	1
Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation	Feb 5, 2024	Knowledge Graphs, Math	Code Available	1
MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models	Feb 2, 2024	Language Modelling, Large Language Model	Code Available	1
ReGAL: Refactoring Programs to Discover Generalizable Abstractions	Jan 29, 2024	Date Understanding, Math	Code Available	1
TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks	Jan 23, 2024	Math, Question Answering	Code Available	1
Over-Reasoning and Redundant Calculation of Large Language Models	Jan 21, 2024	GSM8K, Math	Code Available	1
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning	Jan 19, 2024	GSM8K, Math	Code Available	1
