| CoMT: Chain-of-Medical-Thought Reduces Hallucination in Medical Report Generation | Jun 17, 2024 | Diagnostic, Hallucination | Unverified | 0 |
| MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts | Jun 17, 2024 | Hallucination, Mixture-of-Experts | Code Available | 1 |
| Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Jun 17, 2024 | Benchmarking | Code Available | 2 |
| mDPO: Conditional Preference Optimization for Multimodal Large Language Models | Jun 17, 2024 | Hallucination, Language Modeling | Code Available | 2 |
| Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals | Jun 16, 2024 | Hallucination | Unverified | 0 |
| Post-hoc Utterance Refining Method by Entity Mining for Faithful Knowledge Grounded Conversations | Jun 16, 2024 | Hallucination, Misinformation | Code Available | 0 |
| AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | Jun 16, 2024 | Hallucination, Hallucination Evaluation | Code Available | 3 |
| Detecting and Evaluating Medical Hallucinations in Large Vision Language Models | Jun 14, 2024 | Hallucination, Medical Visual Question Answering | Unverified | 0 |
| MMRel: A Relation Understanding Benchmark in the MLLM Era | Jun 13, 2024 | Diversity, Hallucination | Code Available | 1 |
| Understanding Hallucinations in Diffusion Models through Mode Interpolation | Jun 13, 2024 | Hallucination, Image Generation | Code Available | 2 |
| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Jun 13, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs | Jun 12, 2024 | Code Generation, Hallucination | Code Available | 1 |
| Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Jun 12, 2024 | Audio captioning, Hallucination | Code Available | 2 |
| Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis | Jun 11, 2024 | Hallucination, Language Modelling | Unverified | 0 |
| REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy | Jun 11, 2024 | Diversity, Hallucination | Code Available | 1 |
| Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Jun 11, 2024 | Hallucination, Image Description | Code Available | 2 |
| A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation | Jun 11, 2024 | Hallucination | Code Available | 0 |
| HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation | Jun 11, 2024 | Hallucination, Hallucination Evaluation | Code Available | 0 |
| On the Hallucination in Simultaneous Machine Translation | Jun 11, 2024 | Hallucination, Machine Translation | Code Available | 0 |
| Progressive Query Expansion for Retrieval Over Cost-constrained Data Sources | Jun 11, 2024 | Hallucination, Retrieval | Unverified | 0 |
| Estimating the Hallucination Rate of Generative AI | Jun 11, 2024 | Hallucination, In-Context Learning | Unverified | 0 |
| DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation | Jun 9, 2024 | Common Sense Reasoning, Denoising | Code Available | 1 |
| Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation | Jun 8, 2024 | Abstractive Text Summarization, Dialogue Generation | Unverified | 0 |
| CRAG -- Comprehensive RAG Benchmark | Jun 7, 2024 | Hallucination, Language Modelling | Code Available | 3 |
| An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models | Jun 7, 2024 | Hallucination, parameter-efficient fine-tuning | Code Available | 1 |