SOTAVerified

StrategyQA

StrategyQA aims to measure the ability of models to answer questions that require multi-step implicit reasoning.
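
As a rough illustration of what "multi-step implicit reasoning" means here, the sketch below models one yes/no question together with the intermediate steps a solver has to infer on its own. The class name `ImplicitQuestion`, its field names, and the `accuracy` helper are assumptions made for this example, not the benchmark's actual schema or official scoring code. The question text is borrowed from the title of a paper in the list below, and the listed steps are one plausible decomposition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImplicitQuestion:
    """A yes/no question whose reasoning steps are not stated in the question.

    Hypothetical container for illustration; not the dataset's real format.
    """
    question: str
    answer: bool                                     # gold yes/no label
    steps: List[str] = field(default_factory=list)   # illustrative decomposition


def accuracy(predictions: List[bool], examples: List[ImplicitQuestion]) -> float:
    """Fraction of yes/no predictions that match the gold labels."""
    if len(predictions) != len(examples):
        raise ValueError("predictions and examples must have the same length")
    if not examples:
        return 0.0
    correct = sum(p == ex.answer for p, ex in zip(predictions, examples))
    return correct / len(examples)


example = ImplicitQuestion(
    question="Did Aristotle use a laptop?",
    answer=False,
    steps=[
        "When did Aristotle live?",
        "When was the laptop invented?",
        "Is the second date before the first?",
    ],
)

# A model that answers "no" gets this single example right.
print(accuracy([False], [example]))  # 1.0
```

None of the steps appears in the question itself, which is what separates this kind of item from a question that spells out its own reasoning chain.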

Source: BIG-bench

Papers

Showing 1-25 of 40 papers

Title | Status | Hype
Training Compute-Optimal Large Language Models | Code | 6
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Code | 4
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2
PaLM: Scaling Language Modeling with Pathways | Code | 2
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | Code | 1
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Code | 1
CR-LT-KGQA: A Knowledge Graph Question Answering Dataset Requiring Commonsense Reasoning and Long-Tail Knowledge | Code | 1
Self-Consistency Improves Chain of Thought Reasoning in Language Models | Code | 1
AutoReason: Automatic Few-Shot Reasoning Decomposition | Code | 1
Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks | Code | 1
Visconde: Multi-document QA with GPT-3 and Neural Reranking | Code | 1
Improving Planning with Large Language Models: A Modular Agentic Architecture | Code | 1
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies | Code | 1
Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation | Code | 1
Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models | | 0
Self-Evaluation Guided Beam Search for Reasoning | | 0
Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs | | 0
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning | | 0
A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions | | 0
Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval | | 0
Better Retrieval May Not Lead to Better Question Answering | | 0
Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models | | 0
Dialectical Behavior Therapy Approach to LLM Prompting | | 0
Fusing Bidirectional Chains of Thought and Reward Mechanisms A Method for Enhancing Question-Answering Capabilities of Large Language Models for Chinese Intangible Cultural Heritage | | 0
IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions | | 0

No leaderboard results yet.