SOTAVerified

GSM8K

Papers

Showing 76–100 of 439 papers

Title | Status | Hype
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Code | 2
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models | Code | 2
Progressive-Hint Prompting Improves Reasoning in Large Language Models | Code | 2
Language Models are Multilingual Chain-of-Thought Reasoners | Code | 2
Large Language Models are Zero-Shot Reasoners | Code | 2
IRanker: Towards Ranking Foundation Model | Code | 1
CommVQ: Commutative Vector Quantization for KV Cache Compression | Code | 1
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Code | 1
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties | Code | 1
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models | Code | 1
Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning | Code | 1
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code | Code | 1
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | Code | 1
Efficient Reasoning for LLMs through Speculative Chain-of-Thought | Code | 1
Large (Vision) Language Models are Unsupervised In-Context Learners | Code | 1
Entropy-Based Adaptive Weighting for Self-Training | Code | 1
SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging | Code | 1
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models | Code | 1
PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Code | 1
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1
Self-Training Elicits Concise Reasoning in Large Language Models | Code | 1
SMART: Self-Aware Agent for Tool Overuse Mitigation | Code | 1
MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking | Code | 1
Entropy-Regularized Process Reward Model | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 |  | Unverified
2 | Orange-mini | 0-shot MRR | 98 |  | Unverified