SOTAVerified

GSM8K

Papers

Showing 51-75 of 439 papers

| Title | Status | Hype |
|---|---|---|
| Synthetic Data RL: Task Definition Is All You Need | Code | 2 |
| SLOT: Sample-specific Language Model Optimization at Test-time | Code | 2 |
| Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning | Code | 1 |
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | - | 0 |
| Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping | - | 0 |
| AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | - | 0 |
| S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | - | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | - | 0 |
| Rewriting Pre-Training Data Boosts LLM Performance in Math and Code | Code | 1 |
| Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients | - | 0 |
| Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | - | 0 |
| TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students | Code | 0 |
| NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | Code | 1 |
| Local Prompt Optimization | - | 0 |
| Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | - | 0 |
| AutoJudge: Judge Decoding Without Manual Annotation | - | 0 |
| Efficient Reasoning for LLMs through Speculative Chain-of-Thought | Code | 1 |
| Training Large Language Models to Reason via EM Policy Gradient | - | 0 |
| Dynamic Early Exit in Reasoning Models | Code | 2 |
| Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | - | 0 |
| Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | - | 0 |
| Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | Code | 3 |
| Question Tokens Deserve More Attention: Enhancing Large Language Models without Training through Step-by-Step Reading and Question Attention Recalibration | - | 0 |
| Supervised Optimism Correction: Be Confident When LLMs Are Sure | - | 0 |
| Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | - | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Xolver | Accuracy | 98.1 | - | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | - | Unverified |