SOTAVerified

Math

Papers

Showing 601-650 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Code | 0 |
| LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks | Code | 1 |
| When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems | | 0 |
| JudgeBench: A Benchmark for Evaluating LLM-based Judges | Code | 2 |
| MIND: Math Informed syNthetic Dialogues for Pretraining LLMs | | 0 |
| Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling | | 0 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Code | 0 |
| Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning | | 0 |
| CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning | Code | 1 |
| Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps | | 0 |
| Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces | | 0 |
| HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Code | 1 |
| Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning | | 0 |
| OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models | Code | 5 |
| Testing GPT-4-o1-preview on math and science problems: A follow-up study | | 0 |
| Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Code | 2 |
| SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Code | 4 |
| The Geometry of Concepts: Sparse Autoencoder Feature Structure | Code | 1 |
| VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models | Code | 2 |
| Cognitive Noise and Altruistic Preferences | | 0 |
| Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Code | 2 |
| MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Code | 2 |
| Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Code | 0 |
| Herald: A Natural Language Annotated Lean 4 Dataset | | 0 |
| Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders | | 0 |
| Subtle Errors Matter: Preference Learning via Error-injected Self-editing | | 0 |
| O1 Replication Journey: A Strategic Progress Report -- Part 1 | Code | 7 |
| Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning | | 0 |
| Solving Functional Optimization with Deep Networks and Variational Principles | | 0 |
| DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback | Code | 1 |
| Give me a hint: Can LLMs take a hint to solve math problems? | Code | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | | 0 |
| Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths | | 0 |
| fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models | | 0 |
| Rule-based Data Selection for Large Language Models | | 0 |
| Intriguing Properties of Large Language and Vision Models | | 0 |
| Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification | | 0 |
| BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts | | 0 |
| Steering Large Language Models between Code Execution and Textual Reasoning | Code | 2 |
| Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model | | 0 |
| Towards the Pedagogical Steering of Large Language Models for Tutoring: A Case Study with Modeling Productive Failure | Code | 0 |
| Geometry is All You Need: A Unified Taxonomy of Matrix and Tensor Factorization for Compression of Generative Language Models | | 0 |
| CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | | 0 |
| Llama SLayer 8B: Shallow Layers Hold the Key to Knowledge Injection | Code | 0 |
| Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | | 0 |
| Deep Knowledge Tracing for Personalized Adaptive Learning at Historically Black Colleges and Universities | | 0 |
| Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo | | 0 |
| PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation | | 0 |
| An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings | Code | 0 |
| Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks | | 0 |

No leaderboard results yet.