| Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification | Mar 7, 2024 | Fact Checking, Hallucination | Unverified | 0 |
| Effectiveness Assessment of Recent Large Vision-Language Models | Mar 7, 2024 | Anomaly Detection, Attribute | Unverified | 0 |
| Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Mar 6, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset | Mar 6, 2024 | Hallucination, In-Context Learning | Code Available | 0 |
| KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents | Mar 5, 2024 | Hallucination, Self-Learning | Code Available | 3 |
| InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers | Mar 5, 2024 | Hallucination | Code Available | 1 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | Mar 4, 2024 | 1 Image, 2*2 Stitching, Arithmetic Reasoning | Unverified | 0 |
| Right for Right Reasons: Large Language Models for Verifiable Commonsense Knowledge Graph Question Answering | Mar 3, 2024 | Claim Verification, Graph Question Answering | Unverified | 0 |
| Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models | Mar 3, 2024 | Hallucination | Unverified | 0 |
| CR-LT-KGQA: A Knowledge Graph Question Answering Dataset Requiring Commonsense Reasoning and Long-Tail Knowledge | Mar 3, 2024 | Claim Verification, Graph Question Answering | Code Available | 1 |
| In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation | Mar 3, 2024 | Hallucination, TruthfulQA | Code Available | 2 |
| MALTO at SemEval-2024 Task 6: Leveraging Synthetic Data for LLM Hallucination Detection | Mar 1, 2024 | Data Augmentation, Hallucination | Unverified | 0 |
| DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models | Mar 1, 2024 | Hallucination, Hallucination Evaluation | Code Available | 1 |
| HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding | Mar 1, 2024 | Hallucination, Object | Code Available | 2 |
| Crimson: Empowering Strategic Reasoning in Cybersecurity through Large Language Models | Mar 1, 2024 | Hallucination, Retrieval | Unverified | 0 |
| Self-Consistent Decoding for More Factual Open Responses | Mar 1, 2024 | Hallucination, Response Generation | Code Available | 0 |
| Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models | Feb 29, 2024 | Hallucination | Unverified | 0 |
| The All-Seeing Project V2: Towards General Relation Comprehension of the Open World | Feb 29, 2024 | All, Hallucination | Code Available | 4 |
| Navigating Hallucinations for Reasoning of Unintentional Activities | Feb 29, 2024 | Hallucination, Navigate | Unverified | 0 |
| Multi-FAct: Assessing Factuality of Multilingual LLMs using FActScore | Feb 28, 2024 | Diversity, Form | Code Available | 0 |
| Collaborative decoding of critical tokens for boosting factuality of large language models | Feb 28, 2024 | Hallucination, Instruction Following | Unverified | 0 |
| All in an Aggregated Image for In-Image Learning | Feb 28, 2024 | All, Hallucination | Code Available | 1 |
| Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models | Feb 28, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| Securing Reliability: A Brief Overview on Enhancing In-Context Learning for Foundation Models | Feb 27, 2024 | Hallucination, In-Context Learning | Unverified | 0 |
| TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | Feb 27, 2024 | Contrastive Learning, Hallucination | Code Available | 2 |
| Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses | Feb 27, 2024 | Hallucination | Code Available | 0 |
| Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models | Feb 26, 2024 | Decision Making, Hallucination | Unverified | 0 |
| GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | Feb 26, 2024 | Causal Language Modeling, Generalized Referring Expression Segmentation | Unverified | 0 |
| HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Feb 25, 2024 | Benchmarking, Chatbot | Code Available | 0 |
| Rethinking Software Engineering in the Foundation Model Era: A Curated Catalogue of Challenges in the Development of Trustworthy FMware | Feb 25, 2024 | Hallucination | Unverified | 0 |
| AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation | Feb 25, 2024 | Face Generation, Hallucination | Unverified | 0 |
| Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy | Feb 25, 2024 | Hallucination, Sentence | Code Available | 1 |
| Citation-Enhanced Generation for LLM-based Chatbots | Feb 25, 2024 | Chatbot, Citation Prediction | Code Available | 1 |
| Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | Feb 24, 2024 | Hallucination, Hallucination Evaluation | Unverified | 0 |
| A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models | Feb 23, 2024 | Hallucination | Code Available | 1 |
| CARBD-Ko: A Contextually Annotated Review Benchmark Dataset for Aspect-Level Sentiment Classification in Korean | Feb 23, 2024 | Classification, Hallucination | Unverified | 0 |
| Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding | Feb 23, 2024 | Hallucination, Object | Code Available | 1 |
| UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models | Feb 22, 2024 | Hallucination, Retrieval | Code Available | 0 |
| DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models | Feb 22, 2024 | Hallucination | Code Available | 0 |
| Visual Hallucinations of Multi-modal Large Language Models | Feb 22, 2024 | Diversity, Hallucination | Code Available | 1 |
| Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective | Feb 22, 2024 | Hallucination, Sentence | Code Available | 2 |
| Does the Generator Mind its Contexts? An Analysis of Generative Model Faithfulness under Context Transfer | Feb 22, 2024 | Generative Question Answering, Hallucination | Unverified | 0 |
| Science Checker Reloaded: A Bidirectional Paradigm for Transparency and Logical Reasoning | Feb 21, 2024 | Hallucination, Information Retrieval | Code Available | 0 |
| Emergence and dynamics of delusions and hallucinations across stages in early psychosis | Feb 20, 2024 | Hallucination | Unverified | 0 |
| Enhanced Hallucination Detection in Neural Machine Translation through Simple Detector Aggregation | Feb 20, 2024 | Hallucination, Machine Translation | Unverified | 0 |
| OPDAI at SemEval-2024 Task 6: Small LLMs can Accelerate Hallucination Detection with Weakly Supervised Data | Feb 20, 2024 | Few-Shot Learning, Hallucination | Unverified | 0 |
| OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification | Feb 20, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| GOOD: Towards Domain Generalized Orientated Object Detection | Feb 20, 2024 | Hallucination, Object | Unverified | 0 |
| TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization | Feb 20, 2024 | Hallucination, News Summarization | Code Available | 1 |
| Structured Chain-of-Thought Prompting for Few-Shot Generation of Content-Grounded QA Conversations | Feb 19, 2024 | Hallucination, Language Modeling | Unverified | 0 |