SOTAVerified

GSM8K

Papers

Showing 201–225 of 439 papers

| Title | Status | Hype |
|---|---|---|
| S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | | 0 |
| Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients | | 0 |
| TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students | Code | 0 |
| Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | | 0 |
| Local Prompt Optimization | | 0 |
| Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | | 0 |
| AutoJudge: Judge Decoding Without Manual Annotation | | 0 |
| Training Large Language Models to Reason via EM Policy Gradient | | 0 |
| Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | | 0 |
| Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | | 0 |
| Question Tokens Deserve More Attention: Enhancing Large Language Models without Training through Step-by-Step Reading and Question Attention Recalibration | | 0 |
| Supervised Optimism Correction: Be Confident When LLMs Are Sure | | 0 |
| Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | | 0 |
| Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | | 0 |
| Sample, Don't Search: Rethinking Test-Time Alignment for Language Models | | 0 |
| Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Code | 0 |
| Adaptive Rectification Sampling for Test-Time Compute Scaling | Code | 0 |
| Exploring LLM Reasoning Through Controlled Prompt Variations | Code | 0 |
| D^2LoRA: Data-Driven LoRA Initialization for Low Resource Tasks | | 0 |
| Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Code | 0 |
| Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | | 0 |
| Improving Complex Reasoning with Dynamic Prompt Corruption: A soft prompt Optimization Approach | | 0 |
| Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models | | 0 |
| Position-Aware Depth Decay Decoding (D^3): Boosting Large Language Model Inference Efficiency | | 0 |
Page 9 of 18

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |