SOTAVerified

GSM8K

Papers

Showing 301–325 of 439 papers

Title | Status | Hype
Pheromone-based Learning of Optimal Reasoning Paths | | 0
PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models | | 0
PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning | | 0
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches | | 0
PORT: Preference Optimization on Reasoning Traces | | 0
Position-Aware Depth Decay Decoding (D^3): Boosting Large Language Model Inference Efficiency | | 0
Predicting Emergent Capabilities by Finetuning | | 0
Premise Order Matters in Reasoning with Large Language Models | | 0
PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models | | 0
Prompt Baking | | 0
Prompt Engineering a Prompt Engineer | | 0
Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression | | 0
Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control | | 0
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | | 0
Quasi-random Multi-Sample Inference for Large Language Models | | 0
Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks | | 0
Question Tokens Deserve More Attention: Enhancing Large Language Models without Training through Step-by-Step Reading and Question Attention Recalibration | | 0
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | | 0
RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought | | 0
ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning | | 0
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | | 0
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths | | 0
Reasoning Robustness of LLMs to Adversarial Typographical Errors | | 0
Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models | | 0
Self-Consistency Preference Optimization | | 0
Page 13 of 18

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified