SOTAVerified

GSM8K

Papers

Showing 226-250 of 439 papers

Title | Status | Hype
Excessive Reasoning Attack on Reasoning LLMs | | 0
Explicit Knowledge Transfer for Weakly-Supervised Code Generation | | 0
Exploring an LM to generate Prolog Predicates from Mathematics Questions | | 0
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | | 0
Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty | | 0
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | | 0
Fine-Grained Self-Endorsement Improves Factuality and Reasoning | | 0
First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning | | 0
Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute | | 0
From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education | | 0
From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting | | 0
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference | | 0
GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems | | 0
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements | | 0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | | 0
Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization | | 0
Improve Mathematical Reasoning in Language Models by Automated Process Supervision | | 0
Improving Complex Reasoning with Dynamic Prompt Corruption: A soft prompt Optimization Approach | | 0
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification | | 0
InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion | | 0
Instance-adaptive Zero-shot Chain-of-Thought Prompting | | 0
Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models | | 0
Interpretable Math Word Problem Solution Generation Via Step-by-step Planning | | 0
Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs | | 0
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified