| Bravo MaRDI: A Wikibase Powered Knowledge Graph on Mathematics | Sep 20, 2023 | World Knowledge | Code Available | 0 | 5 |
| DynaBench: A benchmark dataset for learning dynamical systems from low-resolution data | Jun 9, 2023 | World Knowledge | Code Available | 0 | 5 |
| My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism | Dec 27, 2020 | Common Sense Reasoning, Natural Language Understanding | Code Available | 0 | 5 |
| Fact-or-Fair: A Checklist for Behavioral Testing of AI Models on Fairness-Related Queries | Feb 9, 2025 | Diversity, Fairness | Code Available | 0 | 5 |
| AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge | Dec 18, 2024 | Benchmarking, World Knowledge | Code Available | 0 | 5 |
| NLITrans at SemEval-2018 Task 12: Transfer of Semantic Knowledge for Argument Comprehension | Apr 23, 2018 | Position, Sentence | Code Available | 0 | 5 |
| DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension | Feb 1, 2019 | Dialogue Understanding, Multiple-choice | Code Available | 0 | 5 |
| BottleHumor: Self-Informed Humor Explanation using the Information Bottleneck Principle | Feb 22, 2025 | World Knowledge | Code Available | 0 | 5 |
| Morph Call: Probing Morphosyntactic Content of Multilingual Transformers | Apr 26, 2021 | Common Sense Reasoning, MORPH | Code Available | 0 | 5 |
| DORA The Explorer: Directed Outreaching Reinforcement Action-Selection | Apr 11, 2018 | Reinforcement Learning, Reinforcement Learning (RL) | Code Available | 0 | 5 |
| Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering | Apr 22, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment | Jun 24, 2025 | Informativeness, reinforcement-learning | Code Available | 0 | 5 |
| QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs | Dec 16, 2024 | Benchmarking, Common Sense Reasoning | Code Available | 0 | 5 |
| An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models | Sep 6, 2021 | Knowledge Probing, Prompt Engineering | Code Available | 0 | 5 |
| A Study of Implicit Ranking Unfairness in Large Language Models | Nov 13, 2023 | Data Augmentation, Fairness | Code Available | 0 | 5 |
| Does Commonsense help in detecting Sarcasm? | Sep 17, 2021 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Mitigating Hallucination in Fictional Character Role-Play | Jun 25, 2024 | Hallucination, World Knowledge | Code Available | 0 | 5 |
| Mitigating Temporal Misalignment by Discarding Outdated Facts | May 24, 2023 | Question Answering, Retrieval | Code Available | 0 | 5 |
| MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization | Sep 22, 2021 | Articles, Document Summarization | Code Available | 0 | 5 |
| Anchoring Path for Inductive Relation Prediction in Knowledge Graphs | Dec 21, 2023 | Inductive Relation Prediction, Knowledge Graphs | Code Available | 0 | 5 |
| MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations | Jun 25, 2025 | World Knowledge | Code Available | 0 | 5 |
| Bidirectional LMs are Better Knowledge Memorizers? A Benchmark for Real-world Knowledge Injection | May 18, 2025 | Memorization, World Knowledge | Code Available | 0 | 5 |
| BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models | May 8, 2024 | Knowledge Graphs, Language Modeling | Code Available | 0 | 5 |
| Memory-Modular Classification: Learning to Generalize with Memory Replacement | Apr 8, 2025 | Classification, image-classification | Code Available | 0 | 5 |
| Advancing and Benchmarking Personalized Tool Invocation for LLMs | May 7, 2025 | Benchmarking, World Knowledge | Code Available | 0 | 5 |
| Massively Multilingual Language Models for Cross Lingual Fact Extraction from Low Resource Indian Languages | Feb 9, 2023 | Form, Knowledge Graphs | Code Available | 0 | 5 |
| Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations | Mar 27, 2024 | Attribute, Diagnostic | Code Available | 0 | 5 |
| Logic Attention Based Neighborhood Aggregation for Inductive Knowledge Graph Embedding | Nov 4, 2018 | Graph Embedding, Knowledge Graph Completion | Code Available | 0 | 5 |
| LoRec: Large Language Model for Robust Sequential Recommendation against Poisoning Attacks | Jan 31, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Locating and Extracting Relational Concepts in Large Language Models | Jun 19, 2024 | World Knowledge | Code Available | 0 | 5 |
| LoFTI: Localization and Factuality Transfer to Indian Locales | Jul 16, 2024 | World Knowledge | Code Available | 0 | 5 |
| LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation | Mar 25, 2025 | counterfactual, Decision Making | Code Available | 0 | 5 |
| Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge | Oct 23, 2023 | Phrase Grounding, World Knowledge | Code Available | 0 | 5 |
| Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges | May 16, 2025 | Benchmarking, State Estimation | Code Available | 0 | 5 |
| CoRTEx: Contrastive Learning for Representing Terms via Explanations with Applications on Constructing Biomedical Knowledge Graphs | Dec 13, 2023 | Clustering, Contrastive Learning | Code Available | 0 | 5 |
| Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning | Jun 18, 2024 | Benchmarking, World Knowledge | Code Available | 0 | 5 |
| LLM4CD: Leveraging Large Language Models for Open-World Knowledge Augmented Cognitive Diagnosis | May 14, 2025 | cognitive diagnosis, World Knowledge | Code Available | 0 | 5 |
| LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description | Aug 9, 2024 | Diversity, Instruction Following | Code Available | 0 | 5 |
| LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model | May 3, 2024 | Image Captioning, Instruction Following | Code Available | 0 | 5 |
| MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty | Aug 13, 2024 | Mathematical Reasoning, Question Answering | Code Available | 0 | 5 |
| Modeling Semantic Plausibility by Injecting World Knowledge | Apr 2, 2018 | World Knowledge | Code Available | 0 | 5 |
| Language models show human-like content effects on reasoning tasks | Jul 14, 2022 | Language Modelling, Logical Reasoning | Code Available | 0 | 5 |
| Large Language Models Need Consultants for Reasoning: Becoming an Expert in a Complex Human System Through Behavior Simulation | Mar 27, 2024 | Common Sense Reasoning, World Knowledge | Code Available | 0 | 5 |
| Contextual Knowledge Pursuit for Faithful Visual Synthesis | Nov 29, 2023 | Language Modelling, Retrieval | Code Available | 0 | 5 |
| Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Jan 31, 2024 | Benchmarking, Change Detection | Code Available | 0 | 5 |
| Log Probabilities Are a Reliable Estimate of Semantic Plausibility in Base and Instruction-Tuned Language Models | Mar 21, 2024 | Sentence, World Knowledge | Code Available | 0 | 5 |
| Knowledge Graph Completion with Mixed Geometry Tensor Factorization | Apr 3, 2025 | Knowledge Graph Completion, Knowledge Graphs | Code Available | 0 | 5 |
| Knowledge Boundary and Persona Dynamic Shape A Better Social Media Agent | Mar 28, 2024 | World Knowledge | Code Available | 0 | 5 |
| Knowledge Generation -- Variational Bayes on Knowledge Graphs | Jan 21, 2021 | Decoder, Graph Matching | Code Available | 0 | 5 |
| Language Model Behavior: A Comprehensive Survey | Mar 20, 2023 | Language Modeling, Language Modelling | Code Available | 0 | 5 |