SOTAVerified

Math

Papers

Showing 601–625 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems | | 0 |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Code | 0 |
| LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks | Code | 1 |
| JudgeBench: A Benchmark for Evaluating LLM-based Judges | Code | 2 |
| MIND: Math Informed syNthetic Dialogues for Pretraining LLMs | | 0 |
| Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling | | 0 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Code | 0 |
| Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning | | 0 |
| Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps | | 0 |
| CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning | Code | 1 |
| Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces | | 0 |
| HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Code | 1 |
| Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning | | 0 |
| OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models | Code | 5 |
| Testing GPT-4-o1-preview on math and science problems: A follow-up study | | 0 |
| Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Code | 2 |
| SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Code | 4 |
| The Geometry of Concepts: Sparse Autoencoder Feature Structure | Code | 1 |
| VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models | Code | 2 |
| Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Code | 2 |
| Cognitive Noise and Altruistic Preferences | | 0 |
| Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Code | 0 |
| MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Code | 2 |
| Herald: A Natural Language Annotated Lean 4 Dataset | | 0 |
| Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders | | 0 |
Page 25 of 64

No leaderboard results yet.