SOTAVerified

Math

Papers

Showing 551–600 of 1596 papers

Title | Status | Hype
Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus | Code | 2
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues | Code | 1
MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs | Code | 0
RESOLVE: Relational Reasoning with Symbolic and Object-Level Features Using Vector Symbolic Processing | Code | 0
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? | Code | 1
Problem-Oriented Segmentation and Retrieval: Case Study on Tutoring Conversations | Code | 1
UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts | Code | 1
OpenAI-o1 AB Testing: Does the o1 model really do good reasoning in math problem solving? | | 0
VISTA: Visual Integrated System for Tailored Automation in Math Problem Generation Using LLM | | 0
Aioli: A Unified Optimization Framework for Language Model Data Mixing | Code | 1
Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams | | 0
Meta-Reasoning Improves Tool Use in Large Language Models | Code | 0
Self-Consistency Preference Optimization | | 0
Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology | | 0
Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification | Code | 0
Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models | Code | 1
Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models | | 0
STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing | | 0
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models | | 0
Improving Math Problem Solving in Large Language Models Through Categorization and Strategy Tailoring | | 0
Automated Feedback in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses | | 0
Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency | Code | 1
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Code | 1
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation | | 0
Flaming-hot Initiation with Regular Execution Sampling for Large Language Models | Code | 2
Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks? | Code | 0
Library Learning Doesn't: The Curious Case of the Single-Use "Library" | Code | 0
Can Stories Help LLMs Reason? Curating Information Space Through Narrative | | 0
Mixture of Parrots: Experts improve memorization more than reasoning | | 0
ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning | | 0
Scaling up Masked Diffusion Models on Text | Code | 3
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch | Code | 2
From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | | 0
MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | | 0
Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation | | 0
Non-myopic Generation of Language Models for Reasoning and Planning | Code | 1
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes | Code | 1
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration | | 0
Polyak's Heavy Ball Method Achieves Accelerated Local Rate of Convergence under Polyak-Lojasiewicz Inequality | | 0
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation | | 0
PromptHive: Bringing Subject Matter Experts Back to the Forefront with Collaborative Prompt Engineering for Educational Content Creation | | 0
No more hard prompts: SoftSRV prompting for synthetic data generation | | 0
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Code | 4
Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology | | 0
On Designing Effective RL Reward at Training Time for LLM Reasoning | | 0
Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning | | 0
Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens | | 0
LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems | | 0
SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | Code | 0
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model | Code | 2

No leaderboard results yet.