SOTAVerified

GSM8K

Papers

Showing 1-25 of 439 papers

Title | Status | Hype
GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems | | 0
DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression | Code | 0
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | | 0
CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | | 0
Activation Steering for Chain-of-Thought Compression | Code | 0
any4: Learned 4-bit Numeric Representation for LLMs | Code | 2
IRanker: Towards Ranking Foundation Model | Code | 1
Scaling Speculative Decoding with Lookahead Reasoning | Code | 0
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | | 0
CommVQ: Commutative Vector Quantization for KV Cache Compression | Code | 1
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Code | 0
Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute | | 0
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing | | 0
Re-Initialization Token Learning for Tool-Augmented Large Language Models | Code | 0
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Code | 1
Excessive Reasoning Attack on Reasoning LLMs | | 0
LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment | | 0
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Code | 0
Slimming Down LLMs Without Losing Their Minds | | 0
Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty | | 0
PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models | | 0
Unsupervised Elicitation of Language Models | Code | 0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | | 0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | | 0
Text-to-LoRA: Instant Transformer Adaption | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified