SOTAVerified

Math

Papers

Showing 901-950 of 1596 papers

| Title | Status | Hype |
|---|---|---|
| Can Stories Help LLMs Reason? Curating Information Space Through Narrative | | 0 |
| ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning | | 0 |
| From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | | 0 |
| Mixture of Parrots: Experts improve memorization more than reasoning | | 0 |
| MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | | 0 |
| Polyak's Heavy Ball Method Achieves Accelerated Local Rate of Convergence under Polyak-Lojasiewicz Inequality | | 0 |
| JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation | | 0 |
| Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration | | 0 |
| Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation | | 0 |
| No more hard prompts: SoftSRV prompting for synthetic data generation | | 0 |
| PromptHive: Bringing Subject Matter Experts Back to the Forefront with Collaborative Prompt Engineering for Educational Content Creation | | 0 |
| On Designing Effective RL Reward at Training Time for LLM Reasoning | | 0 |
| Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology | | 0 |
| LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems | | 0 |
| Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning | | 0 |
| Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens | | 0 |
| SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | Code | 0 |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Code | 0 |
| When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems | | 0 |
| Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling | | 0 |
| MIND: Math Informed syNthetic Dialogues for Pretraining LLMs | | 0 |
| Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps | | 0 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Code | 0 |
| Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning | | 0 |
| Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning | | 0 |
| Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces | | 0 |
| Testing GPT-4-o1-preview on math and science problems: A follow-up study | | 0 |
| Cognitive Noise and Altruistic Preferences | | 0 |
| Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Code | 0 |
| Herald: A Natural Language Annotated Lean 4 Dataset | | 0 |
| Subtle Errors Matter: Preference Learning via Error-injected Self-editing | | 0 |
| Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders | | 0 |
| Give me a hint: Can LLMs take a hint to solve math problems? | Code | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | | 0 |
| Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning | | 0 |
| Solving Functional Optimization with Deep Networks and Variational Principles | | 0 |
| Intriguing Properties of Large Language and Vision Models | | 0 |
| Rule-based Data Selection for Large Language Models | | 0 |
| Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths | | 0 |
| fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models | | 0 |
| Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification | | 0 |
| BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts | | 0 |
| Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model | | 0 |
| Geometry is All You Need: A Unified Taxonomy of Matrix and Tensor Factorization for Compression of Generative Language Models | | 0 |
| CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | | 0 |
| Towards the Pedagogical Steering of Large Language Models for Tutoring: A Case Study with Modeling Productive Failure | Code | 0 |
| Llama SLayer 8B: Shallow Layers Hold the Key to Knowledge Injection | Code | 0 |
| Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | | 0 |
| An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings | Code | 0 |
| Evaluating Robustness of Reward Models for Mathematical Reasoning | | 0 |

No leaderboard results yet.