| Title | Date | Tags | Code |
| --- | --- | --- | --- |
| Phare: A Safety Probe for Large Language Models | May 16, 2025 | Diagnostic, Hallucination | Code Available |
| A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs | May 13, 2025 | Hallucination, Uncertainty Quantification | Code Available |
| EmbodiedAgent: A Scalable Hierarchical Approach to Overcome Practical Challenge in Multi-Robot Control | Apr 14, 2025 | Hallucination | Code Available |
| Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity | Feb 9, 2024 | Conformal Prediction, Hallucination | Code Available |
| Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow | Feb 28, 2025 | Hallucination, Object | Code Available |
| MMRel: A Relation Understanding Benchmark in the MLLM Era | Jun 13, 2024 | Diversity, Hallucination | Code Available |
| Deficiency-Aware Masked Transformer for Video Inpainting | Jul 17, 2023 | Hallucination, Image Inpainting | Code Available |
| Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering | May 5, 2025 | Hallucination, Question Answering | Code Available |
| DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation | Jun 9, 2024 | Common Sense Reasoning, Denoising | Code Available |
| Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation | May 12, 2023 | Hallucination, In-Context Learning | Code Available |
| MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory | Apr 17, 2024 | Hallucination, Language Modeling | Code Available |
| JDocQA: Japanese Document Question Answering Dataset for Generative Language Models | Mar 28, 2024 | Hallucination, Question Answering | Code Available |
| MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context | Jul 3, 2024 | Hallucination, Response Generation | Code Available |
| Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models | Jun 5, 2025 | Diagnostic, Hallucination | Code Available |
| MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models | Jun 9, 2025 | Diagnostic, Hallucination | Code Available |
| Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Jun 16, 2023 | Benchmarking, Evidence Selection | Code Available |
| Med-HALT: Medical Domain Hallucination Test for Large Language Models | Jul 28, 2023 | Hallucination, Information Retrieval | Code Available |
| Doc2Query--: When Less is More | Jan 9, 2023 | Hallucination, Retrieval | Code Available |
| Detecting and Preventing Hallucinations in Large Vision Language Models | Aug 11, 2023 | 16k, Hallucination | Code Available |
| Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards | May 7, 2025 | Benchmarking, Hallucination | Code Available |
| AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation | Oct 4, 2023 | Hallucination, Text Generation | Code Available |
| Detecting Hallucinated Content in Conditional Neural Sequence Generation | Nov 5, 2020 | Abstractive Text Summarization, Hallucination | Code Available |
| ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark | Jan 9, 2025 | Fairness, Hallucination | Code Available |
| Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites | Dec 4, 2023 | Hallucination, Hallucination Evaluation | Code Available |
| DiffFuSR: Super-Resolution of all Sentinel-2 Multispectral Bands using Diffusion Models | Jun 13, 2025 | All, Hallucination | Code Available |