
Hallucination Evaluation

Evaluate the ability of LLMs to generate text free of hallucinations, or assess their capability to recognize hallucinations.

Papers

Showing 1–49 of 49 papers

| Title | Status | Hype |
|---|---|---|
| HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation | | 0 |
| KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Code | 1 |
| MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations | Code | 0 |
| Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards | Code | 1 |
| Mitigating Image Captioning Hallucinations in Vision-Language Models | | 0 |
| Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs | | 0 |
| Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best? | | 0 |
| Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs | | 0 |
| Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | Code | 1 |
| Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization | Code | 0 |
| TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Code | 0 |
| Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization | | 0 |
| DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine | Code | 0 |
| DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark | Code | 0 |
| Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models | Code | 0 |
| A Survey of Hallucination in Large Visual Language Models | | 0 |
| LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Code | 0 |
| TLDR: Token-Level Detective Reward Model for Large Vision Language Models | | 0 |
| Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization | Code | 0 |
| FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs | | 0 |
| Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering | Code | 1 |
| Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models | Code | 1 |
| Enhancing LLM's Cognition via Structurization | Code | 1 |
| GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework | | 0 |
| Lynx: An Open Source Hallucination Evaluation Model | | 0 |
| AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | Code | 3 |
| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Code | 0 |
| HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation | Code | 0 |
| CHARP: Conversation History AwaReness Probing for Knowledge-grounded Dialogue Systems | | 0 |
| TextSquare: Scaling up Text-Centric Visual Instruction Tuning | | 0 |
| Can We Catch the Elephant? A Survey of the Evolvement of Hallucination Evaluation on Natural Language Generation | | 0 |
| Exploring and Evaluating Hallucinations in LLM-Powered Code Generation | | 0 |
| PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset | Code | 1 |
| DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models | Code | 1 |
| TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | Code | 2 |
| Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | | 0 |
| Do Androids Know They're Only Dreaming of Electric Sheep? | | 0 |
| Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Code | 1 |
| Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites | Code | 1 |
| UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation | Code | 1 |
| HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data | Code | 1 |
| Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization | Code | 1 |
| AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Code | 1 |
| HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models | Code | 2 |
| ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks | | 0 |
| Analyzing and Mitigating Object Hallucination in Large Vision-Language Models | Code | 1 |
| Evaluation and Analysis of Hallucination in Large Vision-Language Models | Code | 1 |
| MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models | Code | 2 |
| HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models | Code | 2 |

No leaderboard results yet.