SOTAVerified

GSM8K

Papers

Showing 61–70 of 439 papers

| Title | Status | Hype |
|---|---|---|
| Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | | 0 |
| TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students | Code | 0 |
| NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | Code | 1 |
| Local Prompt Optimization | | 0 |
| Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | | 0 |
| AutoJudge: Judge Decoding Without Manual Annotation | | 0 |
| Efficient Reasoning for LLMs through Speculative Chain-of-Thought | Code | 1 |
| Training Large Language Models to Reason via EM Policy Gradient | | 0 |
| Dynamic Early Exit in Reasoning Models | Code | 2 |
| Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |