| A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge | Jun 3, 2022 | Question Answering, Visual Question Answering | Code Available | 1 | 5 |
| Differentially Private Federated Knowledge Graphs Embedding | May 17, 2021 | Graph Embedding, Knowledge Graph Embedding | Code Available | 1 | 5 |
| Off-Policy General Value Functions to Represent Dynamic Role Assignments in RoboCup 3D Soccer Simulation | Feb 18, 2014 | Reinforcement Learning, Reinforcement Learning (RL) | Code Available | 1 | 5 |
| Pretrained Language Model Embryology: The Birth of ALBERT | Oct 6, 2020 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU | Oct 7, 2023 | Multi-task Language Understanding, World Knowledge | Code Available | 1 | 5 |
| Large Scale Knowledge Washing | May 26, 2024 | Decoder, Memorization | Code Available | 1 | 5 |
| Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models | Dec 18, 2024 | Contrastive Learning, Knowledge Graphs | Code Available | 1 | 5 |
| Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models | May 15, 2024 | AI Agent, World Knowledge | Code Available | 1 | 5 |
| Retrieve-Rewrite-Answer: A KG-to-Text Enhanced LLMs Framework for Knowledge Graph Question Answering | Sep 20, 2023 | Graph Question Answering, Language Modeling | Code Available | 1 | 5 |
| ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval | Oct 24, 2024 | Image Retrieval, Retrieval | Code Available | 0 | 5 |
| MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization | Sep 22, 2021 | Articles, Document Summarization | Code Available | 0 | 5 |
| Mitigating Hallucination in Fictional Character Role-Play | Jun 25, 2024 | Hallucination, World Knowledge | Code Available | 0 | 5 |
| A surprisal oracle for when every layer counts | Dec 4, 2024 | Common Sense Reasoning, Language Modeling | Code Available | 0 | 5 |
| Causal interventions expose implicit situation models for commonsense language understanding | Jun 6, 2023 | World Knowledge | Code Available | 0 | 5 |
| MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations | Jun 25, 2025 | World Knowledge | Code Available | 0 | 5 |
| Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations | Mar 27, 2024 | Attribute, Diagnostic | Code Available | 0 | 5 |
| Arrows are the Verbs of Diagrams | Aug 1, 2018 | BIG-bench Machine Learning, World Knowledge | Code Available | 0 | 5 |
| MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty | Aug 13, 2024 | Mathematical Reasoning, Question Answering | Code Available | 0 | 5 |
| Massively Multilingual Language Models for Cross Lingual Fact Extraction from Low Resource Indian Languages | Feb 9, 2023 | Form, Knowledge Graphs | Code Available | 0 | 5 |
| Memory-Modular Classification: Learning to Generalize with Memory Replacement | Apr 8, 2025 | Classification, image-classification | Code Available | 0 | 5 |
| Mitigating Temporal Misalignment by Discarding Outdated Facts | May 24, 2023 | Question Answering, Retrieval | Code Available | 0 | 5 |
| LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation | Mar 25, 2025 | counterfactual, Decision Making | Code Available | 0 | 5 |
| Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge | Oct 23, 2023 | Phrase Grounding, World Knowledge | Code Available | 0 | 5 |
| LLM4CD: Leveraging Large Language Models for Open-World Knowledge Augmented Cognitive Diagnosis | May 14, 2025 | cognitive diagnosis, World Knowledge | Code Available | 0 | 5 |
| LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model | May 3, 2024 | Image Captioning, Instruction Following | Code Available | 0 | 5 |