| Title | Date | Tasks | Code | Count |
| --- | --- | --- | --- | --- |
| DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration | Mar 15, 2024 | Attribute, Blind Face Restoration | Unverified | 0 |
| AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models | Mar 13, 2024 | Hallucination | Code Available | 0 |
| Detecting Hallucination and Coverage Errors in Retrieval Augmented Generation for Controversial Topics | Mar 13, 2024 | Hallucination, Retrieval | Unverified | 0 |
| Investigating the performance of Retrieval-Augmented Generation and fine-tuning for the development of AI-driven knowledge-based systems | Mar 12, 2024 | Domain Adaptation, Hallucination | Code Available | 0 |
| TRAWL: External Knowledge-Enhanced Recommendation with LLM Assistance | Mar 11, 2024 | Contrastive Learning, Denoising | Unverified | 0 |
| Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos | Mar 11, 2024 | Hallucination, Translation | Unverified | 0 |
| Guiding Clinical Reasoning with Large Language Models via Knowledge Seeds | Mar 11, 2024 | Hallucination | Unverified | 0 |
| On the Benefits of Fine-Grained Loss Truncation: A Case Study on Factuality in Summarization | Mar 9, 2024 | Hallucination, Text Summarization | Code Available | 0 |
| Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach | Mar 8, 2024 | Decision Making, Hallucination | Unverified | 0 |
| ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models | Mar 8, 2024 | Attribute, Hallucination | Code Available | 0 |
| Can Large Language Models Play Games? A Case Study of A Self-Play Approach | Mar 8, 2024 | Decision Making, Hallucination | Unverified | 0 |
| Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation | Mar 8, 2024 | Articles, Hallucination | Unverified | 0 |
| ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues | Mar 8, 2024 | Hallucination, Question Answering | Unverified | 0 |
| Effectiveness Assessment of Recent Large Vision-Language Models | Mar 7, 2024 | Anomaly Detection, Attribute | Unverified | 0 |
| Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification | Mar 7, 2024 | Fact Checking, Hallucination | Unverified | 0 |
| HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild | Mar 7, 2024 | Hallucination, Question Answering | Code Available | 0 |
| Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Mar 6, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset | Mar 6, 2024 | Hallucination, In-Context Learning | Code Available | 0 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | Mar 4, 2024 | 1 Image, 2*2 Stitching, Arithmetic Reasoning | Unverified | 0 |
| Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models | Mar 3, 2024 | Hallucination | Unverified | 0 |
| Right for Right Reasons: Large Language Models for Verifiable Commonsense Knowledge Graph Question Answering | Mar 3, 2024 | Claim Verification, Graph Question Answering | Unverified | 0 |
| Self-Consistent Decoding for More Factual Open Responses | Mar 1, 2024 | Hallucination, Response Generation | Code Available | 0 |
| MALTO at SemEval-2024 Task 6: Leveraging Synthetic Data for LLM Hallucination Detection | Mar 1, 2024 | Data Augmentation, Hallucination | Unverified | 0 |
| Crimson: Empowering Strategic Reasoning in Cybersecurity through Large Language Models | Mar 1, 2024 | Hallucination, Retrieval | Unverified | 0 |
| Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models | Feb 29, 2024 | Hallucination | Unverified | 0 |
| Navigating Hallucinations for Reasoning of Unintentional Activities | Feb 29, 2024 | Hallucination, Navigate | Unverified | 0 |
| Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models | Feb 28, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| Collaborative decoding of critical tokens for boosting factuality of large language models | Feb 28, 2024 | Hallucination, Instruction Following | Unverified | 0 |
| Multi-FAct: Assessing Factuality of Multilingual LLMs using FActScore | Feb 28, 2024 | Diversity, Form | Code Available | 0 |
| Securing Reliability: A Brief Overview on Enhancing In-Context Learning for Foundation Models | Feb 27, 2024 | Hallucination, In-Context Learning | Unverified | 0 |
| Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses | Feb 27, 2024 | Hallucination | Code Available | 0 |
| GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | Feb 26, 2024 | Causal Language Modeling, Generalized Referring Expression Segmentation | Unverified | 0 |
| Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models | Feb 26, 2024 | Decision Making, Hallucination | Unverified | 0 |
| AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation | Feb 25, 2024 | Face Generation, Hallucination | Unverified | 0 |
| HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Feb 25, 2024 | Benchmarking, Chatbot | Code Available | 0 |
| Rethinking Software Engineering in the Foundation Model Era: A Curated Catalogue of Challenges in the Development of Trustworthy FMware | Feb 25, 2024 | Hallucination | Unverified | 0 |
| Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | Feb 24, 2024 | Hallucination, Hallucination Evaluation | Unverified | 0 |
| CARBD-Ko: A Contextually Annotated Review Benchmark Dataset for Aspect-Level Sentiment Classification in Korean | Feb 23, 2024 | Classification, Hallucination | Unverified | 0 |
| UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models | Feb 22, 2024 | Hallucination, Retrieval | Code Available | 0 |
| Does the Generator Mind its Contexts? An Analysis of Generative Model Faithfulness under Context Transfer | Feb 22, 2024 | Generative Question Answering, Hallucination | Unverified | 0 |
| DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models | Feb 22, 2024 | Hallucination | Code Available | 0 |
| Science Checker Reloaded: A Bidirectional Paradigm for Transparency and Logical Reasoning | Feb 21, 2024 | Hallucination, Information Retrieval | Code Available | 0 |
| OPDAI at SemEval-2024 Task 6: Small LLMs can Accelerate Hallucination Detection with Weakly Supervised Data | Feb 20, 2024 | Few-Shot Learning, Hallucination | Unverified | 0 |
| Enhanced Hallucination Detection in Neural Machine Translation through Simple Detector Aggregation | Feb 20, 2024 | Hallucination, Machine Translation | Unverified | 0 |
| Emergence and dynamics of delusions and hallucinations across stages in early psychosis | Feb 20, 2024 | Hallucination | Unverified | 0 |
| GOOD: Towards Domain Generalized Orientated Object Detection | Feb 20, 2024 | Hallucination, Object | Unverified | 0 |
| OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification | Feb 20, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Structured Chain-of-Thought Prompting for Few-Shot Generation of Content-Grounded QA Conversations | Feb 19, 2024 | Hallucination, Language Modeling | Unverified | 0 |
| Enabling Weak LLMs to Judge Response Reliability via Meta Ranking | Feb 19, 2024 | Hallucination, In-Context Learning | Unverified | 0 |
| M2K-VDG: Model-Adaptive Multimodal Knowledge Anchor Enhanced Video-grounded Dialogue Generation | Feb 19, 2024 | Counterfactual, Dialogue Generation | Unverified | 0 |