| HeadlineCause: A Dataset of News Headlines for Detecting Causalities | Aug 28, 2021 | Commonsense Causal Reasoning, Common Sense Reasoning | Code Available | 1 | 5 |
| Counterfactual reasoning: Do language models need world knowledge for causal understanding? | Dec 6, 2022 | counterfactual, Counterfactual Reasoning | Code Available | 1 | 5 |
| A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge | Jun 3, 2022 | Question Answering, Visual Question Answering | Code Available | 1 | 5 |
| Dense X Retrieval: What Retrieval Granularity Should We Use? | Dec 11, 2023 | Retrieval, Sentence | Code Available | 1 | 5 |
| Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration | Sep 30, 2023 | World Knowledge | Code Available | 1 | 5 |
| A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering | Nov 13, 2023 | Decision Making, Explanation Generation | Code Available | 1 | 5 |
| Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios | May 26, 2023 | counterfactual, Counterfactual Reasoning | Code Available | 1 | 5 |
| GRILLBot In Practice: Lessons and Tradeoffs Deploying Large Language Models for Adaptable Conversational Task Assistants | Feb 12, 2024 | Code Generation, Management | Code Available | 1 | 5 |
| Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs? | Aug 20, 2023 | Knowledge Graphs, World Knowledge | Code Available | 1 | 5 |
| Imagine This! Scripts to Compositions to Videos | Apr 10, 2018 | Retrieval, World Knowledge | Code Available | 1 | 5 |