SOTAVerified

GSM8K

Papers

Showing 276–300 of 439 papers

| Title | Status | Hype |
| --- | --- | --- |
| PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning | | 0 |
| PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches | | 0 |
| PORT: Preference Optimization on Reasoning Traces | | 0 |
| Position-Aware Depth Decay Decoding (D^3): Boosting Large Language Model Inference Efficiency | | 0 |
| Predicting Emergent Capabilities by Finetuning | | 0 |
| Evolutionary Pre-Prompt Optimization for Mathematical Reasoning | | 0 |
| Premise Order Matters in Reasoning with Large Language Models | | 0 |
| PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models | | 0 |
| Evaluation of LLMs for mathematical problem solving | | 0 |
| Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | | 0 |
| Prompt Baking | | 0 |
| Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | | 0 |
| Prompt Engineering a Prompt Engineer | | 0 |
| Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression | | 0 |
| Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control | | 0 |
| Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | | 0 |
| Quasi-random Multi-Sample Inference for Large Language Models | | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | | 0 |
| Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks | | 0 |
| Question Tokens Deserve More Attention: Enhancing Large Language Models without Training through Step-by-Step Reading and Question Attention Recalibration | | 0 |
| Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | | 0 |
| Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | | 0 |
| Efficient Data Selection at Scale via Influence Distillation | | 0 |
| Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models | | 0 |
| RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |