SOTAVerified

GSM8K

Papers

Showing 151–200 of 439 papers

| Title | Status | Hype |
| --- | --- | --- |
| Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | Code | 1 |
| LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Code | 1 |
| Markovian Transformers for Informative Language Modeling | Code | 1 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Code | 1 |
| UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts | Code | 1 |
| Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Code | 0 |
| COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement | Code | 0 |
| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Code | 0 |
| The Price of Format: Diversity Collapse in LLMs | Code | 0 |
| Exploring LLM Reasoning Through Controlled Prompt Variations | Code | 0 |
| Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning | Code | 0 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Code | 0 |
| Activation Steering for Chain-of-Thought Compression | Code | 0 |
| AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations | Code | 0 |
| Text-to-LoRA: Instant Transformer Adaption | Code | 0 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | Code | 0 |
| SMART: Self-learning Meta-strategy Agent for Reasoning Tasks | Code | 0 |
| CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation | Code | 0 |
| Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting | Code | 0 |
| Adaptive Rectification Sampling for Test-Time Compute Scaling | Code | 0 |
| EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Code | 0 |
| ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Code | 0 |
| LLM2: Let Large Language Models Harness System 2 Reasoning | Code | 0 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Code | 0 |
| SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Code | 0 |
| SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | Code | 0 |
| Re-Initialization Token Learning for Tool-Augmented Large Language Models | Code | 0 |
| Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Code | 0 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Code | 0 |
| Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Code | 0 |
| Can LLMs Reason in the Wild with Programs? | Code | 0 |
| Scaling Speculative Decoding with Lookahead Reasoning | Code | 0 |
| DIVE: Diversified Iterative Self-Improvement | Code | 0 |
| ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Code | 0 |
| Distilling Reasoning Capabilities into Smaller Language Models | Code | 0 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Code | 0 |
| Discriminative Policy Optimization for Token-Level Reward Models | Code | 0 |
| DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory | Code | 0 |
| PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning | Code | 0 |
| Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems | Code | 0 |
| Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation | Code | 0 |
| DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Code | 0 |
| Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving | Code | 0 |
| In-Context Principle Learning from Mistakes | Code | 0 |
| A mixed policy to improve performance of language models on math problems | Code | 0 |
| How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective | Code | 0 |
| DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression | Code | 0 |
| NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models | Code | 0 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Code | 0 |
Page 4 of 9

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |