GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 351–400 of 439 papers

Title	Date	Tasks	Status
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation	Jun 25, 2024	ARCBenchmarking	CodeCode Available
PORT: Preference Optimization on Reasoning Traces	Jun 23, 2024	ARCGSM8K	—Unverified
Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation	Jun 20, 2024	GSM8KLanguage Model Evaluation	CodeCode Available
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning	Jun 20, 2024	GSM8KHeuristic Search	—Unverified
Can LLMs Reason in the Wild with Programs?	Jun 19, 2024	GSM8KMath	CodeCode Available
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation	Jun 16, 2024	Continual LearningGSM8K	CodeCode Available
Uncertainty Aware Learning for Language Model Alignment	Jun 7, 2024	GSM8KLanguage Modeling	—Unverified
Does your data spark joy? Performance gains from domain upsampling at the end of training	Jun 5, 2024	GSM8KHumanEval	—Unverified
Improve Mathematical Reasoning in Language Models by Automated Process Supervision	Jun 5, 2024	GSM8KMath	—Unverified
GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment	May 30, 2024	GSM8KKnowledge Distillation	CodeCode Available
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths	May 30, 2024	GSM8KHumanEval	—Unverified
Arithmetic Reasoning with LLM: Prolog Generation & Permutation	May 28, 2024	Arithmetic ReasoningData Augmentation	—Unverified
Multi-Reference Preference Optimization for Large Language Models	May 26, 2024	GSM8KTruthfulQA	—Unverified
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time	May 25, 2024	GSM8KMath	—Unverified
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving	May 20, 2024	GSM8KMath	—Unverified
Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications	May 14, 2024	GSM8KMath	—Unverified
MathDivide: Improved mathematical reasoning by large language models	May 12, 2024	GSM8KLogical Reasoning	—Unverified
MAmmoTH2: Scaling Instructions from the Web	May 6, 2024	ChatbotGSM8K	—Unverified
A Careful Examination of Large Language Model Performance on Grade School Arithmetic	May 1, 2024	GSM8KLanguage Modeling	—Unverified
Iterative Reasoning Preference Optimization	Apr 30, 2024	ARCGSM8K	—Unverified
PARAMANU-GANITA: Language Model with Mathematical Capabilities	Apr 22, 2024	Domain AdaptationGSM8K	—Unverified
Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?	Apr 19, 2024	GSM8K	—Unverified
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models	Apr 18, 2024	GSM8KMMLU	—Unverified
Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning	Apr 17, 2024	GSM8KNavigate	—Unverified
Automatic Prompt Selection for Large Language Models	Apr 3, 2024	GSM8KQuestion Answering	—Unverified
Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression	Mar 30, 2024	GSM8KRelation	—Unverified
Supervisory Prompt Training	Mar 26, 2024	GSM8KSentence	—Unverified
Self-Consistency Boosts Calibration for Math Reasoning	Mar 14, 2024	GSM8KMath	—Unverified
Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control	Mar 11, 2024	Code GenerationDiversity	—Unverified
MathScale: Scaling Instruction Tuning for Mathematical Reasoning	Mar 5, 2024	GSM8KMath	CodeCode Available
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning	Mar 4, 2024	GSM8KMath	—Unverified
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs	Feb 26, 2024	GSM8KMath	—Unverified
Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models	Feb 24, 2024	GSM8KMathematical Reasoning	—Unverified
Fine-Grained Self-Endorsement Improves Factuality and Reasoning	Feb 23, 2024	GSM8KLanguage Modeling	—Unverified
SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning	Feb 20, 2024	Arithmetic ReasoningGSM8K	—Unverified
Can Separators Improve Chain-of-Thought Prompting?	Feb 16, 2024	8kGSM8K	—Unverified
Orca-Math: Unlocking the potential of SLMs in Grade School Math	Feb 16, 2024	Arithmetic ReasoningGSM8K	—Unverified
Premise Order Matters in Reasoning with Large Language Models	Feb 14, 2024	GSM8KMathematical Problem-Solving	—Unverified
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements	Feb 13, 2024	GSM8KMath	—Unverified
The Unreasonable Effectiveness of Eccentric Automatic Prompts	Feb 9, 2024	Arithmetic ReasoningGSM8K	—Unverified
In-Context Principle Learning from Mistakes	Feb 8, 2024	GSM8KIn-Context Learning	CodeCode Available
RevOrder: A Novel Method for Enhanced Arithmetic in Language Models	Feb 6, 2024	GSM8KMath	—Unverified
Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision	Feb 5, 2024	GSM8KMath	—Unverified
YODA: Teacher-Student Progressive Learning for Language Models	Jan 28, 2024	GSM8KMath	—Unverified
Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination	Jan 16, 2024	GSM8KLanguage Modeling	—Unverified
Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities	Dec 22, 2023	ChatbotGSM8K	—Unverified
From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting	Dec 18, 2023	DiversityGSM8K	—Unverified
TinyGSM: achieving >80% on GSM8k with small language models	Dec 14, 2023	Arithmetic ReasoningGSM8K	—Unverified
Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning	Dec 14, 2023	Arithmetic ReasoningFew-Shot Learning	—Unverified
Training Chain-of-Thought via Latent-Variable Inference	Nov 28, 2023	GSM8K	—Unverified

Show:10 25 50

← PrevPage 8 of 9Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified