| VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models | Jun 24, 2024 | Hallucination, Video Understanding | Unverified | 0 |
| Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models | Jun 24, 2024 | Common Sense Reasoning, Hallucination | Code Available | 1 |
| Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models | Jun 24, 2024 | Hallucination, Image Generation | Code Available | 0 |
| Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs | Jun 22, 2024 | Hallucination, Uncertainty Quantification | Code Available | 2 |
| Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework | Jun 20, 2024 | Hallucination, Question Answering | Code Available | 2 |
| Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | Jun 20, 2024 | Caption Generation, Hallucination | Unverified | 0 |
| From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment | Jun 20, 2024 | Descriptive, Hallucination | Unverified | 0 |
| HIGHT: Hierarchical Graph Tokenization for Molecule-Language Alignment | Jun 20, 2024 | Graph Neural Network, Hallucination | Unverified | 0 |
| Large Language Models are Skeptics: False Negative Problem of Input-conflicting Hallucination | Jun 20, 2024 | Hallucination | Unverified | 0 |
| Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases | Jun 19, 2024 | 8k, Hallucination | Code Available | 2 |
| Knowledge Graph-Enhanced Large Language Models via Path Selection | Jun 19, 2024 | Hallucination, Knowledge Graphs | Code Available | 1 |
| StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation | Jun 19, 2024 | Hallucination, Retrieval | Code Available | 0 |
| Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors | Jun 18, 2024 | Hallucination, Language Modeling | Code Available | 0 |
| Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding | Jun 18, 2024 | Hallucination | Code Available | 1 |
| RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation | Jun 18, 2024 | Hallucination, RAG | Unverified | 0 |
| What Matters in Memorizing and Recalling Facts? Multifaceted Benchmarks for Knowledge Probing in Language Models | Jun 18, 2024 | Decoder, Hallucination | Unverified | 0 |
| On-Policy Fine-grained Knowledge Feedback for Hallucination Mitigation | Jun 18, 2024 | Hallucination, Response Generation | Code Available | 0 |
| Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning? | Jun 18, 2024 | Attribute, Hallucination | Unverified | 0 |
| Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models | Jun 18, 2024 | Hallucination | Unverified | 0 |
| InternalInspector I^2: Robust Confidence Estimation in LLMs through Internal States | Jun 17, 2024 | Benchmarking, Contrastive Learning | Unverified | 0 |
| Self-training Large Language Models through Knowledge Detection | Jun 17, 2024 | Hallucination, Language Modeling | Code Available | 0 |
| Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector | Jun 17, 2024 | 2k, Hallucination | Code Available | 1 |
| Mitigating Large Language Model Hallucination with Faithful Finetuning | Jun 17, 2024 | Hallucination, Language Modeling | Unverified | 0 |
| Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs | Jun 17, 2024 | counterfactual, Hallucination | Code Available | 0 |
| Hallucination Mitigation Prompts Long-term Video Understanding | Jun 17, 2024 | Answer Generation, Hallucination | Code Available | 0 |
| CoMT: Chain-of-Medical-Thought Reduces Hallucination in Medical Report Generation | Jun 17, 2024 | Diagnostic, Hallucination | Unverified | 0 |
| MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts | Jun 17, 2024 | Hallucination, Mixture-of-Experts | Code Available | 1 |
| Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Jun 17, 2024 | Benchmarking | Code Available | 2 |
| mDPO: Conditional Preference Optimization for Multimodal Large Language Models | Jun 17, 2024 | Hallucination, Language Modeling | Code Available | 2 |
| Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals | Jun 16, 2024 | Hallucination | Unverified | 0 |
| Post-hoc Utterance Refining Method by Entity Mining for Faithful Knowledge Grounded Conversations | Jun 16, 2024 | Hallucination, Misinformation | Code Available | 0 |
| AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | Jun 16, 2024 | Hallucination, Hallucination Evaluation | Code Available | 3 |
| Detecting and Evaluating Medical Hallucinations in Large Vision Language Models | Jun 14, 2024 | Hallucination, Medical Visual Question Answering | Unverified | 0 |
| MMRel: A Relation Understanding Benchmark in the MLLM Era | Jun 13, 2024 | Diversity, Hallucination | Code Available | 1 |
| Understanding Hallucinations in Diffusion Models through Mode Interpolation | Jun 13, 2024 | Hallucination, Image Generation | Code Available | 2 |
| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Jun 13, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs | Jun 12, 2024 | Code Generation, Hallucination | Code Available | 1 |
| Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Jun 12, 2024 | Audio captioning, Hallucination | Code Available | 2 |
| Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis | Jun 11, 2024 | Hallucination, Language Modelling | Unverified | 0 |
| REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy | Jun 11, 2024 | Diversity, Hallucination | Code Available | 1 |
| Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Jun 11, 2024 | Hallucination, Image Description | Code Available | 2 |
| A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation | Jun 11, 2024 | Hallucination | Code Available | 0 |
| HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation | Jun 11, 2024 | Hallucination, Hallucination Evaluation | Code Available | 0 |
| On the Hallucination in Simultaneous Machine Translation | Jun 11, 2024 | Hallucination, Machine Translation | Code Available | 0 |
| Progressive Query Expansion for Retrieval Over Cost-constrained Data Sources | Jun 11, 2024 | Hallucination, Retrieval | Unverified | 0 |
| Estimating the Hallucination Rate of Generative AI | Jun 11, 2024 | Hallucination, In-Context Learning | Unverified | 0 |
| DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation | Jun 9, 2024 | Common Sense Reasoning, Denoising | Code Available | 1 |
| Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation | Jun 8, 2024 | Abstractive Text Summarization, Dialogue Generation | Unverified | 0 |
| CRAG -- Comprehensive RAG Benchmark | Jun 7, 2024 | Hallucination, Language Modelling | Code Available | 3 |
| An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models | Jun 7, 2024 | Hallucination, parameter-efficient fine-tuning | Code Available | 1 |