| A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering | Nov 13, 2023 | Decision Making, Explanation Generation | Code Available | 1 | 5 |
| Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs? | Aug 20, 2023 | Knowledge Graphs, World Knowledge | Code Available | 1 | 5 |
| InGram: Inductive Knowledge Graph Embedding via Relation Graphs | May 31, 2023 | Entity Embeddings, Graph Embedding | Code Available | 1 | 5 |
| Knowledge Editing through Chain-of-Thought | Dec 23, 2024 | knowledge editing, World Knowledge | Code Available | 1 | 5 |
| Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias | May 9, 2024 | Data Visualization, Language Modeling | Code Available | 1 | 5 |
| Blow the Dog Whistle: A Chinese Dataset for Cant Understanding with Common Sense and World Knowledge | Apr 6, 2021 | Common Sense Reasoning, World Knowledge | Code Available | 1 | 5 |
| BLADE: Benchmarking Language Model Agents for Data-Driven Science | Aug 19, 2024 | Benchmarking, Decision Making | Code Available | 1 | 5 |
| Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios | May 26, 2023 | counterfactual, Counterfactual Reasoning | Code Available | 1 | 5 |
| Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language | Mar 1, 2021 | Sentence, World Knowledge | Code Available | 1 | 5 |
| CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models | Sep 27, 2024 | Reinforcement Learning (RL), World Knowledge | Code Available | 1 | 5 |