SOTAVerified

Math

Papers

Showing 851–900 of 1596 papers

| Title | Status | Hype |
|---|---|---|
| VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context | Code | 1 |
| MAmmoTH2: Scaling Instructions from the Web | | 0 |
| Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | Code | 2 |
| Assessing and Verifying Task Utility in LLM-Powered Applications | | 0 |
| Math Multiple Choice Question Generation via Human-Large Language Model Collaboration | | 0 |
| GOLD: Geometry Problem Solver with Natural Language Description | Code | 1 |
| A Careful Examination of Large Language Model Performance on Grade School Arithmetic | | 0 |
| Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | Code | 3 |
| Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models | | 0 |
| Iterative Reasoning Preference Optimization | | 0 |
| PECC: Problem Extraction and Coding Challenges | Code | 1 |
| Small Language Models Need Strong Verifiers to Self-Correct Reasoning | Code | 0 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3 |
| AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation | Code | 1 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Code | 1 |
| Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training | | 0 |
| MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit | Code | 5 |
| PARAMANU-GANITA: Language Model with Mathematical Capabilities | | 0 |
| Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | | 0 |
| Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank | | 0 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | Code | 1 |
| On the Empirical Complexity of Reasoning and Planning in LLMs | | 0 |
| Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards | Code | 2 |
| Mental Stress Detection: Development and Evaluation of a Wearable In-Ear Plethysmography | | 0 |
| Rho-1: Not All Tokens Are What You Need | Code | 3 |
| Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems | | 0 |
| MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education | | 0 |
| Evaluating Mathematical Reasoning Beyond Accuracy | Code | 2 |
| MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification | Code | 0 |
| FRACTAL: Fine-Grained Scoring from Aggregate Text Labels | | 0 |
| MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | Code | 2 |
| Data Augmentation with In-Context Learning and Comparative Evaluation in Math Word Problem Solving | | 0 |
| BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models | Code | 3 |
| ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Code | 2 |
| Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Code | 1 |
| LM^2: A Simple Society of Language Models Solves Complex Reasoning | Code | 0 |
| Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Code | 0 |
| HyperCLOVA X Technical Report | | 0 |
| Exploring the Mystery of Influential Data for Mathematical Reasoning | | 0 |
| Stable Code Technical Report | | 0 |
| IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations | | 0 |
| Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models | Code | 0 |
| What is in Your Safe Data? Identifying Benign Data that Breaks Safety | Code | 1 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Code | 0 |
| ML2SC: Deploying Machine Learning Models as Smart Contracts on the Blockchain | | 0 |
| Large Language Models Are Struggle to Cope with Unreasonability in Math Problems | | 0 |
| Scaling up ridge regression for brain encoding in a massive individual fMRI dataset | Code | 0 |
| Few-Shot Recalibration of Language Models | | 0 |
| The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian | | 0 |
| Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization | Code | 1 |
Page 18 of 32

No leaderboard results yet.