SOTAVerified

GSM8K

Papers

Showing 151–175 of 439 papers

| Title | Status | Hype |
|---|---|---|
| Large Language Models as Optimizers | Code | 1 |
| Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries | Code | 1 |
| Design of Chain-of-Thought in Math Problem Solving | Code | 1 |
| SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models | Code | 1 |
| Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties | Code | 1 |
| Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Code | 1 |
| Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Code | 0 |
| COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement | Code | 0 |
| SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Code | 0 |
| Exploring LLM Reasoning Through Controlled Prompt Variations | Code | 0 |
| Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning | Code | 0 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Code | 0 |
| ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Code | 0 |
| Scaling Speculative Decoding with Lookahead Reasoning | Code | 0 |
| Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems | Code | 0 |
| SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | Code | 0 |
| Activation Steering for Chain-of-Thought Compression | Code | 0 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | Code | 0 |
| CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation | Code | 0 |
| Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Code | 0 |
| Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting | Code | 0 |
| Adaptive Rectification Sampling for Test-Time Compute Scaling | Code | 0 |
| Re-Initialization Token Learning for Tool-Augmented Large Language Models | Code | 0 |
| EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Code | 0 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Code | 0 |
Page 7 of 18

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |