SOTAVerified

GSM8K

Papers

Showing 351–400 of 439 papers

| Title | Status | Hype |
|---|---|---|
| System-2 Mathematical Reasoning via Enriched Instruction Tuning | | 0 |
| Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | | 0 |
| Teaching Small Language Models to Reason | | 0 |
| The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | | 0 |
| The ART of LLM Refinement: Ask, Refine, and Trust | | 0 |
| The Role of Deductive and Inductive Reasoning in Large Language Models | | 0 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | | 0 |
| Think before you speak: Training Language Models With Pause Tokens | | 0 |
| Think Beyond Size: Adaptive Prompting for More Effective Reasoning | | 0 |
| Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs | | 0 |
| TinyGSM: achieving >80% on GSM8k with small language models | | 0 |
| Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models | | 0 |
| Towards Multilingual LLM Evaluation for European Languages | | 0 |
| Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | | 0 |
| Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | | 0 |
| Training Chain-of-Thought via Latent-Variable Inference | | 0 |
| Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | | 0 |
| Training Large Language Models to Reason via EM Policy Gradient | | 0 |
| Transcending Scaling Laws with 0.1% Extra Compute | | 0 |
| Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning | | 0 |
| TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling | | 0 |
| Uncertainty Aware Learning for Language Model Alignment | | 0 |
| Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | | 0 |
| Unlocking Structured Thinking in Language Models with Cognitive Prompting | | 0 |
| Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures | | 0 |
| Unsupervised Elicitation of Language Models | | 0 |
| UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities | | 0 |
| Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning | | 0 |
| When is the consistent prediction likely to be a correct prediction? | | 0 |
| YODA: Teacher-Student Progressive Learning for Language Models | | 0 |
| SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models | | 0 |
| SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models | | 0 |
| Self-Consistency Boosts Calibration for Math Reasoning | | 0 |
| SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | Code | 0 |
| Re-Initialization Token Learning for Tool-Augmented Large Language Models | Code | 0 |
| Scaling Speculative Decoding with Lookahead Reasoning | Code | 0 |
| Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Code | 0 |
| Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems | Code | 0 |
| Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems | Code | 0 |
| CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation | Code | 0 |
| PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning | Code | 0 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Code | 0 |
| SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Code | 0 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | Code | 0 |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Code | 0 |
| DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Code | 0 |
| NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models | Code | 0 |
| A mixed policy to improve performance of language models on math problems | Code | 0 |
| Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting | Code | 0 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Code | 0 |
Page 8 of 9

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |