SOTAVerified

GSM8K

Papers

Showing 376-400 of 439 papers

Title | Status | Hype
Unsupervised Elicitation of Language Models |  | 0
UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities |  | 0
Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning |  | 0
When is the consistent prediction likely to be a correct prediction? |  | 0
YODA: Teacher-Student Progressive Learning for Language Models |  | 0
SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models |  | 0
SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models |  | 0
Self-Consistency Boosts Calibration for Math Reasoning |  | 0
SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | Code | 0
Re-Initialization Token Learning for Tool-Augmented Large Language Models | Code | 0
Scaling Speculative Decoding with Lookahead Reasoning | Code | 0
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Code | 0
Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems | Code | 0
Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems | Code | 0
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation | Code | 0
PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning | Code | 0
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Code | 0
SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Code | 0
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | Code | 0
Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Code | 0
DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Code | 0
NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models | Code | 0
A mixed policy to improve performance of language models on math problems | Code | 0
Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting | Code | 0
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 |  | Unverified
2 | Orange-mini | 0-shot MRR | 98 |  | Unverified