SOTAVerified

StrategyQA

StrategyQA aims to measure the ability of models to answer questions that require multi-step implicit reasoning.

Source: BIG-bench

Papers

Showing 1-40 of 40 papers

Title | Status | Hype
Training Compute-Optimal Large Language Models | Code | 6
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Code | 4
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2
PaLM: Scaling Language Modeling with Pathways | Code | 2
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | Code | 1
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Code | 1
CR-LT-KGQA: A Knowledge Graph Question Answering Dataset Requiring Commonsense Reasoning and Long-Tail Knowledge | Code | 1
Self-Consistency Improves Chain of Thought Reasoning in Language Models | Code | 1
AutoReason: Automatic Few-Shot Reasoning Decomposition | Code | 1
Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks | Code | 1
Visconde: Multi-document QA with GPT-3 and Neural Reranking | Code | 1
Improving Planning with Large Language Models: A Modular Agentic Architecture | Code | 1
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies | Code | 1
Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation | Code | 1
Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models | | 0
Self-Evaluation Guided Beam Search for Reasoning | | 0
Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs | | 0
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning | | 0
A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions | | 0
Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval | | 0
Better Retrieval May Not Lead to Better Question Answering | | 0
Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models | | 0
Dialectical Behavior Therapy Approach to LLM Prompting | | 0
Fusing Bidirectional Chains of Thought and Reward Mechanisms: A Method for Enhancing Question-Answering Capabilities of Large Language Models for Chinese Intangible Cultural Heritage | | 0
IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions | | 0
Improving Attributed Text Generation of Large Language Models via Preference Learning | | 0
Large Language Models Are Also Good Prototypical Commonsense Reasoners | | 0
Learning to Decompose: Hypothetical Question Decomposition Based on Comparable Texts | | 0
Meta-prompting Optimized Retrieval-augmented Generation | | 0
Proof of Thought: Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | | 0
Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks | | 0
The ART of LLM Refinement: Ask, Refine, and Trust | | 0
Towards Uncertainty-Aware Language Agent | | 0
Unraveling Indirect In-Context Learning Using Influence Functions | | 0
DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Code | 0
Rationale-Aware Answer Verification by Pairwise Self-Evaluation | Code | 0
Distilling Reasoning Capabilities into Smaller Language Models | Code | 0
Tailoring Self-Rationalizers with Multi-Reward Distillation | Code | 0
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions | Code | 0
Voting or Consensus? Decision-Making in Multi-Agent Debate | Code | 0

No leaderboard results yet.