SOTAVerified

GSM8K

Papers

Showing 76-100 of 439 papers

Title | Status | Hype
SEAL: Steerable Reasoning Calibration of Large Language Models for Free | Code | 2
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models |  | 0
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency |  | 0
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Code | 0
Large (Vision) Language Models are Unsupervised In-Context Learners | Code | 1
Exploring LLM Reasoning Through Controlled Prompt Variations | Code | 0
Adaptive Rectification Sampling for Test-Time Compute Scaling | Code | 0
Entropy-Based Adaptive Weighting for Self-Training | Code | 1
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2
Qwen2.5-Omni Technical Report | Code | 7
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Code | 0
D^2LoRA: Data-Driven LoRA Initialization for Low Resource Tasks |  | 0
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Code | 2
SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging | Code | 1
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs |  | 0
Improving Complex Reasoning with Dynamic Prompt Corruption: A soft prompt Optimization Approach |  | 0
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models | Code | 1
Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models |  | 0
Position-Aware Depth Decay Decoding (D^3): Boosting Large Language Model Inference Efficiency |  | 0
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning |  | 0
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models |  | 0
PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Code | 1
DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Code | 0
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation | Code | 0
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 |  | Unverified
2 | Orange-mini | 0-shot MRR | 98 |  | Unverified