SOTAVerified

Math

Papers

Showing 301–350 of 1596 papers

Title | Status | Hype
Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL | Code | 1
Dyve: Thinking Fast and Slow for Dynamic Process Verification | Code | 1
Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping | Code | 1
Do Large Language Model Benchmarks Test Reliability? | Code | 1
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | Code | 1
Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis | Code | 1
Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation | Code | 1
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament | Code | 1
Control LLM: Controlled Evolution for Intelligence Retention in LLM | Code | 1
ZNO-Eval: Benchmarking reasoning capabilities of large language models in Ukrainian | Code | 1
Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Code | 1
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning | Code | 1
CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis | Code | 1
Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Code | 1
CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models | Code | 1
Entropy-Regularized Process Reward Model | Code | 1
HARP: A challenging human-annotated math reasoning benchmark | Code | 1
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs | Code | 1
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Code | 1
Training and Evaluating Language Models with Template-based Data Generation | Code | 1
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues | Code | 1
Problem-Oriented Segmentation and Retrieval: Case Study on Tutoring Conversations | Code | 1
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? | Code | 1
UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts | Code | 1
Aioli: A Unified Optimization Framework for Language Model Data Mixing | Code | 1
Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models | Code | 1
Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency | Code | 1
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Code | 1
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes | Code | 1
Non-myopic Generation of Language Models for Reasoning and Planning | Code | 1
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks | Code | 1
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning | Code | 1
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Code | 1
The Geometry of Concepts: Sparse Autoencoder Feature Structure | Code | 1
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback | Code | 1
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits | Code | 1
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search | Code | 1
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning | Code | 1
MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | Code | 1
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | Code | 1
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Code | 1
Sirius: Contextual Sparsity with Correction for Efficient LLMs | Code | 1
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Code | 1
What makes math problems hard for reinforcement learning: a case study | Code | 1
SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models | Code | 1
Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning | Code | 1
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization | Code | 1
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Code | 1
On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents | Code | 1
Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching | Code | 1

No leaderboard results yet.