| HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation | Jun 26, 2025 | counterfactualCounterfactual Reasoning | —Unverified | 0 |
| KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Jun 24, 2025 | HallucinationHallucination Evaluation | CodeCode Available | 1 |
| MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations | May 20, 2025 | Fact CheckingHallucination | CodeCode Available | 0 |
| Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards | May 7, 2025 | BenchmarkingHallucination | CodeCode Available | 1 |
| Mitigating Image Captioning Hallucinations in Vision-Language Models | May 6, 2025 | HallucinationHallucination Evaluation | —Unverified | 0 |
| Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs | Apr 30, 2025 | HallucinationHallucination Evaluation | —Unverified | 0 |
| Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best? | Mar 27, 2025 | HallucinationHallucination Evaluation | —Unverified | 0 |
| Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs | Mar 26, 2025 | HallucinationHallucination Evaluation | —Unverified | 0 |
| Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | Mar 25, 2025 | HallucinationHallucination Evaluation | CodeCode Available | 1 |
| Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization | Mar 3, 2025 | HallucinationHallucination Evaluation | CodeCode Available | 0 |
| TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Feb 19, 2025 | Dataset GenerationGSM8K | CodeCode Available | 0 |
| Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization | Nov 15, 2024 | HallucinationHallucination Evaluation | —Unverified | 0 |
| DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine | Nov 14, 2024 | FormHallucination | CodeCode Available | 0 |
| DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark | Nov 5, 2024 | Data AugmentationHallucination | CodeCode Available | 0 |
| Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models | Oct 30, 2024 | HallucinationHallucination Evaluation | CodeCode Available | 0 |
| A Survey of Hallucination in Large Visual Language Models | Oct 20, 2024 | HallucinationHallucination Evaluation | —Unverified | 0 |
| LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Oct 13, 2024 | HallucinationHallucination Evaluation | CodeCode Available | 0 |
| TLDR: Token-Level Detective Reward Model for Large Vision Language Models | Oct 7, 2024 | HallucinationHallucination Evaluation | —Unverified | 0 |
| Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization | Sep 22, 2024 | HallucinationHallucination Evaluation | CodeCode Available | 0 |
| FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs | Sep 20, 2024 | HallucinationHallucination Evaluation | —Unverified | 0 |
| Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering | Sep 19, 2024 | HallucinationHallucination Evaluation | CodeCode Available | 1 |
| Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models | Aug 18, 2024 | AttributeHallucination | CodeCode Available | 1 |
| Enhancing LLM's Cognition via Structurization | Jul 23, 2024 | HallucinationHallucination Evaluation | CodeCode Available | 1 |
| GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework | Jul 15, 2024 | HallucinationHallucination Evaluation | —Unverified | 0 |
| Lynx: An Open Source Hallucination Evaluation Model | Jul 11, 2024 | HallucinationHallucination Evaluation | —Unverified | 0 |
| AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | Jun 16, 2024 | HallucinationHallucination Evaluation | CodeCode Available | 3 |
| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Jun 13, 2024 | BenchmarkingHallucination | CodeCode Available | 0 |
| HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation | Jun 11, 2024 | HallucinationHallucination Evaluation | CodeCode Available | 0 |
| CHARP: Conversation History AwaReness Probing for Knowledge-grounded Dialogue Systems | May 24, 2024 | DiagnosticHallucination | —Unverified | 0 |
| TextSquare: Scaling up Text-Centric Visual Instruction Tuning | Apr 19, 2024 | HallucinationHallucination Evaluation | —Unverified | 0 |
| Can We Catch the Elephant? A Survey of the Evolvement of Hallucination Evaluation on Natural Language Generation | Apr 18, 2024 | HallucinationHallucination Evaluation | —Unverified | 0 |
| Exploring and Evaluating Hallucinations in LLM-Powered Code Generation | Apr 1, 2024 | Code GenerationHallucination | —Unverified | 0 |
| PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset | Mar 17, 2024 | AttributeCommon Sense Reasoning | CodeCode Available | 1 |
| DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models | Mar 1, 2024 | HallucinationHallucination Evaluation | CodeCode Available | 1 |
| TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | Feb 27, 2024 | Contrastive LearningHallucination | CodeCode Available | 2 |
| Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | Feb 24, 2024 | HallucinationHallucination Evaluation | —Unverified | 0 |
| Do Androids Know They're Only Dreaming of Electric Sheep? | Dec 28, 2023 | HallucinationHallucination Evaluation | —Unverified | 0 |
| Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Dec 25, 2023 | HallucinationHallucination Evaluation | CodeCode Available | 1 |
| Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites | Dec 4, 2023 | HallucinationHallucination Evaluation | CodeCode Available | 1 |
| UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation | Nov 26, 2023 | BenchmarkingHallucination | CodeCode Available | 1 |
| HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data | Nov 22, 2023 | Attributecounterfactual | CodeCode Available | 1 |
| Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization | Nov 15, 2023 | Abstractive Text SummarizationHallucination | CodeCode Available | 1 |
| AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Nov 13, 2023 | AttributeHallucination | CodeCode Available | 1 |
| HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models | Oct 23, 2023 | DiagnosticHallucination | CodeCode Available | 2 |
| ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks | Oct 19, 2023 | HallucinationHallucination Evaluation | —Unverified | 0 |
| Analyzing and Mitigating Object Hallucination in Large Vision-Language Models | Oct 1, 2023 | HallucinationHallucination Evaluation | CodeCode Available | 1 |
| Evaluation and Analysis of Hallucination in Large Vision-Language Models | Aug 29, 2023 | HallucinationHallucination Evaluation | CodeCode Available | 1 |
| MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models | Aug 17, 2023 | Decision MakingHallucination | CodeCode Available | 2 |
| HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models | May 19, 2023 | HallucinationHallucination Evaluation | CodeCode Available | 2 |