SOTAVerified

GSM8K

Papers

Showing 151-200 of 439 papers

| Title | Status | Hype |
|---|---|---|
| Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement | Code | 1 |
| Solving Math Word Problems by Combining Language Models With Symbolic Solvers | Code | 1 |
| Boosted Prompt Ensembles for Large Language Models | Code | 1 |
| Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning | Code | 1 |
| Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions | Code | 1 |
| Self-Consistency Improves Chain of Thought Reasoning in Language Models | Code | 1 |
| GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems | | 0 |
| DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression | Code | 0 |
| KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | | 0 |
| CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | | 0 |
| Activation Steering for Chain-of-Thought Compression | Code | 0 |
| Scaling Speculative Decoding with Lookahead Reasoning | Code | 0 |
| Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | | 0 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Code | 0 |
| Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute | | 0 |
| Excessive Reasoning Attack on Reasoning LLMs | | 0 |
| Re-Initialization Token Learning for Tool-Augmented Large Language Models | Code | 0 |
| LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing | | 0 |
| LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment | | 0 |
| Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty | | 0 |
| PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models | | 0 |
| Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Code | 0 |
| Slimming Down LLMs Without Losing Their Minds | | 0 |
| Unsupervised Elicitation of Language Models | Code | 0 |
| Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | | 0 |
| Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | | 0 |
| Text-to-LoRA: Instant Transformer Adaption | Code | 0 |
| Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | | 0 |
| Evaluation of LLMs for mathematical problem solving | | 0 |
| Model Unlearning via Sparse Autoencoder Subspace Guided Projections | | 0 |
| Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation | | 0 |
| Discriminative Policy Optimization for Token-Level Reward Models | Code | 0 |
| Maximizing Confidence Alone Improves Reasoning | | 0 |
| CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models | | 0 |
| The Price of Format: Diversity Collapse in LLMs | Code | 0 |
| Efficient Data Selection at Scale via Influence Distillation | | 0 |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | | 0 |
| System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts | | 0 |
| Steering LLM Reasoning Through Bias-Only Adaptation | | 0 |
| AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting | Code | 0 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | Code | 0 |
| PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models | | 0 |
| Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | | 0 |
| Dual Decomposition of Weights and Singular Value Low Rank Adaptation | | 0 |
| DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | | 0 |
| Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst | | 0 |
| RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs | | 0 |
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | | 0 |
| Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping | | 0 |
| AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |