Math

Papers

Showing 751–775 of 1596 papers

Title	Date	Tasks	Status	Hype
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish	Jul 17, 2024	Math, Multiple-choice	Code Available	1
A LLM Benchmark based on the Minecraft Builder Dialog Agent Task	Jul 17, 2024	Math, Minecraft	Unverified	0
Reasoning with Large Language Models, a Survey	Jul 16, 2024	Few-Shot Learning, In-Context Learning	Unverified	0
CCoE: A Compact LLM with Collaboration of Experts	Jul 16, 2024	Language Modelling, Large Language Model	Unverified	0
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling	Jul 13, 2024	Benchmarking, Math	Code Available	1
Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models	Jul 12, 2024	GSM8K, Math	Unverified	0
TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models	Jul 12, 2024	Code Generation, Math	Unverified	0
Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors	Jul 12, 2024	Language Modeling, Language Modelling	Code Available	0
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On	Jul 11, 2024	GSM8K, Math	Unverified	0
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist	Jul 11, 2024	GSM8K, Math	Unverified	0
AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models	Jul 11, 2024	Language Modelling, Math	Code Available	1
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine	Jul 11, 2024	Contrastive Learning, Language Modelling	Code Available	4
ConvNLP: Image-based AI Text Detection	Jul 9, 2024	Domain Generalization, Math	Unverified	0
Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models	Jul 9, 2024	Math	Code Available	0
Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns?	Jul 6, 2024	Math	Code Available	0
Smart Vision-Language Reasoners	Jul 5, 2024	Math, Mathematical Reasoning	Code Available	0
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning	Jul 4, 2024	Avg, GSM8K	Code Available	1
Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior	Jul 2, 2024	Language Modeling, Language Modelling	Code Available	0
Eliminating Position Bias of Language Models: A Mechanistic Approach	Jul 1, 2024	Math, object-detection	Code Available	1
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?	Jul 1, 2024	Math, Mathematical Reasoning	Code Available	2
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning	Jun 30, 2024	GSM8K, Math	Code Available	1
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning	Jun 29, 2024	Binary Classification, GSM8K	Unverified	0
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models	Jun 28, 2024	Diversity, Math	Unverified	0
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting	Jun 28, 2024	Bilevel Optimization, Instruction Following	Unverified	0
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions	Jun 27, 2024	Distractor Generation, Math	Code Available	0

No leaderboard results yet.