Hallucination Evaluation

Evaluate the ability of LLM to generate non-hallucination text or assess the capability of LLM to recognize hallucinations.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 26–49 of 49 papers

Title	Date	Tasks	Status	Hype
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models	Jun 16, 2024	HallucinationHallucination Evaluation	CodeCode Available	3
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation	Jun 13, 2024	BenchmarkingHallucination	CodeCode Available	0
HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation	Jun 11, 2024	HallucinationHallucination Evaluation	CodeCode Available	0
CHARP: Conversation History AwaReness Probing for Knowledge-grounded Dialogue Systems	May 24, 2024	DiagnosticHallucination	—Unverified	0
TextSquare: Scaling up Text-Centric Visual Instruction Tuning	Apr 19, 2024	HallucinationHallucination Evaluation	—Unverified	0
Can We Catch the Elephant? A Survey of the Evolvement of Hallucination Evaluation on Natural Language Generation	Apr 18, 2024	HallucinationHallucination Evaluation	—Unverified	0
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation	Apr 1, 2024	Code GenerationHallucination	—Unverified	0
PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset	Mar 17, 2024	AttributeCommon Sense Reasoning	CodeCode Available	1
DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models	Mar 1, 2024	HallucinationHallucination Evaluation	CodeCode Available	1
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space	Feb 27, 2024	Contrastive LearningHallucination	CodeCode Available	2
Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models	Feb 24, 2024	HallucinationHallucination Evaluation	—Unverified	0
Do Androids Know They're Only Dreaming of Electric Sheep?	Dec 28, 2023	HallucinationHallucination Evaluation	—Unverified	0
Alleviating Hallucinations of Large Language Models through Induced Hallucinations	Dec 25, 2023	HallucinationHallucination Evaluation	CodeCode Available	1
Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites	Dec 4, 2023	HallucinationHallucination Evaluation	CodeCode Available	1
UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation	Nov 26, 2023	BenchmarkingHallucination	CodeCode Available	1
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data	Nov 22, 2023	Attributecounterfactual	CodeCode Available	1
Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization	Nov 15, 2023	Abstractive Text SummarizationHallucination	CodeCode Available	1
AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation	Nov 13, 2023	AttributeHallucination	CodeCode Available	1
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models	Oct 23, 2023	DiagnosticHallucination	CodeCode Available	2
ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks	Oct 19, 2023	HallucinationHallucination Evaluation	—Unverified	0
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models	Oct 1, 2023	HallucinationHallucination Evaluation	CodeCode Available	1
Evaluation and Analysis of Hallucination in Large Vision-Language Models	Aug 29, 2023	HallucinationHallucination Evaluation	CodeCode Available	1
MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models	Aug 17, 2023	Decision MakingHallucination	CodeCode Available	2
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models	May 19, 2023	HallucinationHallucination Evaluation	CodeCode Available	2

Show:10 25 50

← PrevPage 2 of 2Next →

No leaderboard results yet.