| Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions | Jun 7, 2024 | Hallucination, Mathematical Reasoning | Unverified | 0 |
| 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination | Jun 7, 2024 | Hallucination | Code Available | 2 |
| Chaos with Keywords: Exposing Large Language Models Sycophantic Hallucination to Misleading Keywords and Evaluating Defense Strategies | Jun 6, 2024 | Hallucination, Knowledge Probing | Unverified | 0 |
| ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints | Jun 6, 2024 | Diagnostic, Hallucination | Unverified | 0 |
| Confabulation: The Surprising Value of Large Language Model Hallucinations | Jun 6, 2024 | Hallucination, Language Modeling | Unverified | 0 |
| Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework | Jun 5, 2024 | Fact Checking, Hallucination | Unverified | 0 |
| Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends | Jun 5, 2024 | Hallucination | Unverified | 0 |
| OTTAWA: Optimal TransporT Adaptive Word Aligner for Hallucination and Omission Translation Errors Detection | Jun 4, 2024 | Hallucination, Machine Translation | Code Available | 0 |
| Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs | Jun 4, 2024 | Benchmarking, Fairness | Unverified | 0 |
| CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models | Jun 4, 2024 | Hallucination, Informativeness | Unverified | 0 |
| How to Explore with Belief: State Entropy Maximization in POMDPs | Jun 4, 2024 | Hallucination | Unverified | 0 |
| Ask-EDA: A Design Assistant Empowered by LLM, Hybrid RAG and Abbreviation De-hallucination | Jun 3, 2024 | Hallucination, Question Answering | Unverified | 0 |
| Large Language Model Assisted Optimal Bidding of BESS in FCAS Market: An AI-agent based Approach | Jun 3, 2024 | AI Agent, Deep Reinforcement Learning | Unverified | 0 |
| Decompose, Enrich, and Extract! Schema-aware Event Extraction using LLMs | Jun 3, 2024 | Decision Making, Event Argument Extraction | Unverified | 0 |
| Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost | Jun 3, 2024 | Hallucination, Language Modeling | Unverified | 0 |
| Comprehensive Evaluation of Large Language Models for Topic Modeling | Jun 2, 2024 | Hallucination, Topic Models | Unverified | 0 |
| DAFNet: Dynamic Auxiliary Fusion for Sequential Model Editing in Large Language Models | May 31, 2024 | Hallucination, Model Editing | Code Available | 0 |
| Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training | May 31, 2024 | Hallucination, Multi-Task Learning | Code Available | 1 |
| Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts | May 30, 2024 | All, Hallucination | Unverified | 0 |
| Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools | May 30, 2024 | Hallucination, RAG | Unverified | 0 |
| ANAH: Analytical Annotation of Hallucinations in Large Language Models | May 30, 2024 | Generative Question Answering, Hallucination | Code Available | 2 |
| NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models | May 30, 2024 | Hallucination | Code Available | 0 |
| MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification | May 29, 2024 | Hallucination, Image Captioning | Unverified | 0 |
| MASSIVE Multilingual Abstract Meaning Representation: A Dataset and Baselines for Hallucination Detection | May 29, 2024 | Abstract Meaning Representation, Hallucination | Unverified | 0 |
| Two-Layer Retrieval-Augmented Generation Framework for Low-Resource Medical Question Answering Using Reddit Data: Proof-of-Concept Study | May 29, 2024 | Answer Generation, Hallucination | Unverified | 0 |
| Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization | May 28, 2024 | Hallucination | Code Available | 1 |
| LLMs and Memorization: On Quality and Specificity of Copyright Compliance | May 28, 2024 | Hallucination, Memorization | Code Available | 0 |
| Data-augmented phrase-level alignment for mitigating object hallucination | May 28, 2024 | Data Augmentation, Hallucination | Unverified | 0 |
| RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models | May 28, 2024 | Hallucination, MME | Unverified | 0 |
| Conv-CoA: Improving Open-domain Question Answering in Large Language Models via Conversational Chain-of-Action | May 28, 2024 | Conversational Question Answering, Hallucination | Unverified | 0 |
| TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | May 28, 2024 | Hallucination | Code Available | 1 |
| RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness | May 27, 2024 | Hallucination, Image Captioning | Code Available | 11 |
| Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings | May 27, 2024 | Domain Adaptation, GPU | Unverified | 0 |
| Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks | May 27, 2024 | Hallucination, Object Hallucination | Code Available | 0 |
| GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases | May 25, 2024 | Benchmarking, Hallucination | Unverified | 0 |
| Large Language Model Pruning | May 24, 2024 | Hallucination, Language Modeling | Unverified | 0 |
| Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement | May 24, 2024 | Hallucination, Image Comprehension | Code Available | 2 |
| CHARP: Conversation History AwaReness Probing for Knowledge-grounded Dialogue Systems | May 24, 2024 | Diagnostic, Hallucination | Unverified | 0 |
| Scaling Laws for Discriminative Classification in Large Language Models | May 24, 2024 | Hallucination, Language Modeling | Unverified | 0 |
| DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception | May 24, 2024 | Hallucination | Code Available | 1 |
| Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization | May 24, 2024 | Hallucination | Code Available | 0 |
| Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs | May 24, 2024 | Hallucination, Response Generation | Code Available | 1 |
| Calibrated Self-Rewarding Vision Language Models | May 23, 2024 | Hallucination, Language Modeling | Code Available | 2 |
| RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models | May 23, 2024 | Hallucination, Sentence | Code Available | 3 |
| WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models | May 23, 2024 | Hallucination, Model Editing | Unverified | 0 |
| Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation | May 22, 2024 | Caption Generation, Hallucination | Unverified | 0 |
| Gradient Projection For Continual Parameter-Efficient Tuning | May 22, 2024 | Continual Learning, Hallucination | Unverified | 0 |
| CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models | May 22, 2024 | Benchmarking, Hallucination | Unverified | 0 |
| GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games | May 22, 2024 | Code Generation, Decision Making | Unverified | 0 |
| Presentations are not always linear! GNN meets LLM for Document-to-Presentation Transformation with Attribution | May 21, 2024 | Graph Neural Network, Hallucination | Unverified | 0 |