SOTAVerified

GSM8K

Papers

Showing 201–210 of 439 papers

Title | Status | Hype
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Code | 0
Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving | Code | 0
In-Context Principle Learning from Mistakes | Code | 0
LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity | Code | 0
A mixed policy to improve performance of language models on math problems | Code | 0
How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective | Code | 0
DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression | Code | 0
GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment | Code | 0
MathScale: Scaling Instruction Tuning for Mathematical Reasoning | Code | 0
AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified