SOTAVerified

StrategyQA

StrategyQA measures the ability of models to answer yes/no questions that require implicit multi-step reasoning, where the reasoning steps are not stated in the question itself.
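To make "implicit multi-step reasoning" concrete, below is a minimal sketch of a StrategyQA-style item and a yes/no accuracy check. The field names (`question`, `answer`, `decomposition`) follow the dataset description in "Did Aristotle Use a Laptop?" (listed below); treat them as assumptions about the release format, not this site's API.

```python
# A StrategyQA-style example: a yes/no question whose answer requires
# combining facts via reasoning steps that the question never states.
examples = [
    {
        "question": "Did Aristotle use a laptop?",
        "answer": False,  # gold label is boolean (yes/no)
        "decomposition": [  # the implicit reasoning steps, made explicit
            "When did Aristotle live?",
            "When was the laptop invented?",
            "Is #2 before #1?",
        ],
    },
]

def accuracy(preds, golds):
    """Fraction of boolean predictions that match the gold answers."""
    correct = sum(p == g for p, g in zip(preds, golds))
    return correct / len(golds)

preds = [False]  # a model's predicted yes/no answers, in dataset order
golds = [ex["answer"] for ex in examples]
print(accuracy(preds, golds))  # → 1.0
```

Because the task is binary, exact-match accuracy is the standard metric reported on the papers below.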

Source: BIG-bench

Papers

Showing 1–25 of 40 papers

| Title | Code | Hype |
| --- | --- | --- |
| Training Compute-Optimal Large Language Models | Code | 6 |
| Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Code | 4 |
| PaLM: Scaling Language Modeling with Pathways | Code | 2 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2 |
| AutoReason: Automatic Few-Shot Reasoning Decomposition | Code | 1 |
| Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | Code | 1 |
| CR-LT-KGQA: A Knowledge Graph Question Answering Dataset Requiring Commonsense Reasoning and Long-Tail Knowledge | Code | 1 |
| Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation | Code | 1 |
| Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Code | 1 |
| Improving Planning with Large Language Models: A Modular Agentic Architecture | Code | 1 |
| Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks | Code | 1 |
| Visconde: Multi-document QA with GPT-3 and Neural Reranking | Code | 1 |
| Self-Consistency Improves Chain of Thought Reasoning in Language Models | Code | 1 |
| Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies | Code | 1 |
| Fusing Bidirectional Chains of Thought and Reward Mechanisms: A Method for Enhancing Question-Answering Capabilities of Large Language Models for Chinese Intangible Cultural Heritage | — | 0 |
| Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models | — | 0 |
| DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Code | 0 |
| Voting or Consensus? Decision-Making in Multi-Agent Debate | Code | 0 |
| Unraveling Indirect In-Context Learning Using Influence Functions | — | 0 |
| Dialectical Behavior Therapy Approach to LLM Prompting | — | 0 |
| Rationale-Aware Answer Verification by Pairwise Self-Evaluation | Code | 0 |
| A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions | — | 0 |
| Proof of Thought: Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | — | 0 |
| Meta-prompting Optimized Retrieval-augmented Generation | — | 0 |
| Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks | — | 0 |
Page 1 of 2

No leaderboard results yet.