Math

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 751–800 of 1596 papers

Title	Date	Tasks	Status	Hype
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish	Jul 17, 2024	MathMultiple-choice	CodeCode Available	1
A LLM Benchmark based on the Minecraft Builder Dialog Agent Task	Jul 17, 2024	MathMinecraft	—Unverified	0
CCoE: A Compact LLM with Collaboration of Experts	Jul 16, 2024	Language ModellingLarge Language Model	—Unverified	0
Reasoning with Large Language Models, a Survey	Jul 16, 2024	Few-Shot LearningIn-Context Learning	—Unverified	0
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling	Jul 13, 2024	BenchmarkingMath	CodeCode Available	1
Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models	Jul 12, 2024	GSM8KMath	—Unverified	0
TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models	Jul 12, 2024	Code GenerationMath	—Unverified	0
Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors	Jul 12, 2024	Language ModelingLanguage Modelling	CodeCode Available	0
AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models	Jul 11, 2024	Language ModellingMath	CodeCode Available	1
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist	Jul 11, 2024	GSM8KMath	—Unverified	0
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On	Jul 11, 2024	GSM8KMath	—Unverified	0
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine	Jul 11, 2024	Contrastive LearningLanguage Modelling	CodeCode Available	4
ConvNLP: Image-based AI Text Detection	Jul 9, 2024	Domain GeneralizationMath	—Unverified	0
Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models	Jul 9, 2024	Math	CodeCode Available	0
Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns?	Jul 6, 2024	Math	CodeCode Available	0
Smart Vision-Language Reasoners	Jul 5, 2024	MathMathematical Reasoning	CodeCode Available	0
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning	Jul 4, 2024	AvgGSM8K	CodeCode Available	1
Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior	Jul 2, 2024	Language ModelingLanguage Modelling	CodeCode Available	0
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?	Jul 1, 2024	MathMathematical Reasoning	CodeCode Available	2
Eliminating Position Bias of Language Models: A Mechanistic Approach	Jul 1, 2024	Mathobject-detection	CodeCode Available	1
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning	Jun 30, 2024	GSM8KMath	CodeCode Available	1
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning	Jun 29, 2024	Binary ClassificationGSM8K	—Unverified	0
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models	Jun 28, 2024	DiversityMath	—Unverified	0
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting	Jun 28, 2024	Bilevel OptimizationInstruction Following	—Unverified	0
LiveBench: A Challenging, Contamination-Limited LLM Benchmark	Jun 27, 2024	ArticlesInstruction Following	CodeCode Available	5
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions	Jun 27, 2024	Distractor GenerationMath	CodeCode Available	0
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs	Jun 26, 2024	Arithmetic ReasoningGSM8K	CodeCode Available	3
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data	Jun 26, 2024	BenchmarkingMath	CodeCode Available	2
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models	Jun 25, 2024	DiversityMath	CodeCode Available	2
Task Oriented In-Domain Data Augmentation	Jun 24, 2024	Data AugmentationMath	—Unverified	0
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs	Jun 24, 2024	Instruction FollowingMath	CodeCode Available	1
Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions	Jun 20, 2024	Active LearningMath	—Unverified	0
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold	Jun 20, 2024	MathReinforcement Learning (RL)	CodeCode Available	1
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback	Jun 20, 2024	Binary ClassificationGSM8K	CodeCode Available	1
Towards Infinite-Long Prefix in Transformer	Jun 20, 2024	Mathparameter-efficient fine-tuning	CodeCode Available	0
CityGPT: Empowering Urban Spatial Cognition of Large Language Models	Jun 20, 2024	Code GenerationMath	CodeCode Available	1
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning	Jun 20, 2024	GSM8KHeuristic Search	—Unverified	0
Adaptable Logical Control for Large Language Models	Jun 19, 2024	MathText Generation	CodeCode Available	2
Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever	Jun 19, 2024	MathSemantic Similarity	—Unverified	0
Can LLMs Reason in the Wild with Programs?	Jun 19, 2024	GSM8KMath	CodeCode Available	0
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	Jun 18, 2024	Arithmetic ReasoningMath	CodeCode Available	2
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools	Jun 18, 2024	AllGSM8K	CodeCode Available	14
Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems	Jun 18, 2024	In-Context LearningMath	—Unverified	0
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles	Jun 18, 2024	Arithmetic ReasoningCode Generation	CodeCode Available	1
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts	Jun 17, 2024	Math	—Unverified	0
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling	Jun 17, 2024	GSM8KMath	CodeCode Available	1
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence	Jun 17, 2024	16kLanguage Modeling	CodeCode Available	9
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation	Jun 17, 2024	Image GenerationMath	CodeCode Available	0
Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment	Jun 17, 2024	Logical ReasoningMath	—Unverified	0
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning	Jun 16, 2024	BenchmarkingMath	—Unverified	0

Show:10 25 50

← PrevPage 16 of 32Next →

No leaderboard results yet.