SOTAVerified

Math

Papers

Showing 451-475 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math Images | | 0 |
| Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation | Code | 1 |
| Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages | | 0 |
| Kimi k1.5: Scaling Reinforcement Learning with LLMs | Code | 7 |
| Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament | Code | 1 |
| An Optimal Transport approach to arbitrage correction: Application to volatility Stress-Tests | | 0 |
| Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs | | 0 |
| RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems? | | 0 |
| Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling | Code | 2 |
| Control LLM: Controlled Evolution for Intelligence Retention in LLM | Code | 1 |
| Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective | | 0 |
| Language Representation Favored Zero-Shot Cross-Domain Cognitive Diagnosis | Code | 0 |
| Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback | | 0 |
| Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision | Code | 0 |
| ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Code | 0 |
| Can Vision-Language Models Evaluate Handwritten Math? | Code | 0 |
| ZNO-Eval: Benchmarking reasoning capabilities of large language models in Ukrainian | Code | 1 |
| Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Code | 1 |
| Cascaded Self-Evaluation Augmented Training for Efficient Multimodal Large Language Models | | 0 |
| A General Retrieval-Augmented Generation Framework for Multimodal Case-Based Reasoning Applications | | 0 |
| Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction | Code | 0 |
| rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking | Code | 7 |
| End-to-End Bangla AI for Solving Math Olympiad Problem Benchmark: Leveraging Large Language Model Using Integrated Approach | | 0 |
| URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics | Code | 2 |
| A Diversity-Enhanced Knowledge Distillation Model for Practical Math Word Problem Solving | Code | 0 |

No leaderboard results yet.