SOTAVerified

GSM8K

Papers

Showing 201-250 of 439 papers

Title | Status | Hype
CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | | 0
Cost-Saving LLM Cascades with Early Abstention | | 0
CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models | | 0
CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | | 0
Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic | | 0
D^2LoRA: Data-Driven LoRA Initialization for Low Resource Tasks | | 0
Dialectical Behavior Therapy Approach to LLM Prompting | | 0
Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models | | 0
Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? | | 0
DiversiGATE: A Comprehensive Framework for Reliable Large Language Models | | 0
DNA 1.0 Technical Report | | 0
Does your data spark joy? Performance gains from domain upsampling at the end of training | | 0
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs | | 0
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | | 0
Dual Decomposition of Weights and Singular Value Low Rank Adaptation | | 0
Dynamic Parallel Tree Search for Efficient LLM Reasoning | | 0
Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models | | 0
Efficient Data Selection at Scale via Influence Distillation | | 0
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | | 0
Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | | 0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | | 0
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | | 0
Evaluation of LLMs for mathematical problem solving | | 0
Evolutionary Pre-Prompt Optimization for Mathematical Reasoning | | 0
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | | 0
Excessive Reasoning Attack on Reasoning LLMs | | 0
Explicit Knowledge Transfer for Weakly-Supervised Code Generation | | 0
Exploring an LM to generate Prolog Predicates from Mathematics Questions | | 0
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | | 0
Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty | | 0
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | | 0
Fine-Grained Self-Endorsement Improves Factuality and Reasoning | | 0
First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning | | 0
Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute | | 0
From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education | | 0
From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting | | 0
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference | | 0
GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems | | 0
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements | | 0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | | 0
Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization | | 0
Improve Mathematical Reasoning in Language Models by Automated Process Supervision | | 0
Improving Complex Reasoning with Dynamic Prompt Corruption: A soft prompt Optimization Approach | | 0
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification | | 0
InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion | | 0
Instance-adaptive Zero-shot Chain-of-Thought Prompting | | 0
Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models | | 0
Interpretable Math Word Problem Solution Generation Via Step-by-step Planning | | 0
Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs | | 0
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified