SOTAVerified

Math

Papers

Showing 751-775 of 1596 papers

Title | Status | Hype
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish | Code | 1
A LLM Benchmark based on the Minecraft Builder Dialog Agent Task | | 0
Reasoning with Large Language Models, a Survey | | 0
CCoE: A Compact LLM with Collaboration of Experts | | 0
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Code | 1
Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models | | 0
TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models | | 0
Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors | Code | 0
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | | 0
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | | 0
AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models | Code | 1
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Code | 4
ConvNLP: Image-based AI Text Detection | | 0
Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models | Code | 0
Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns? | Code | 0
Smart Vision-Language Reasoners | Code | 0
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | Code | 1
Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior | Code | 0
Eliminating Position Bias of Language Models: A Mechanistic Approach | Code | 1
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | Code | 2
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning | Code | 1
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning | | 0
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models | | 0
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting | | 0
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Code | 0
Page 31 of 64

No leaderboard results yet.