
Hallucination Evaluation

Evaluate the ability of LLMs to generate text free of hallucinations, or assess their capability to recognize hallucinated content.
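As an illustration of the second setting, here is a minimal, hypothetical sketch of a recognition-style evaluation: a judge function (standing in for an LLM call) labels statements, and accuracy is computed against gold hallucination labels. The function name, toy data, and stub judge are assumptions for illustration, not the API of any benchmark listed on this page.

```python
# Hypothetical sketch of a hallucination-recognition evaluation.
# Nothing here corresponds to a specific benchmark below; a real
# benchmark would supply the labeled examples and the LLM judge.

from typing import Callable, List, Tuple

def hallucination_recognition_accuracy(
    judge_fn: Callable[[str], bool],
    examples: List[Tuple[str, bool]],
) -> float:
    """Fraction of statements whose hallucination label the judge gets right.

    judge_fn: wraps an LLM; returns True if the model flags the text as hallucinated.
    examples: (statement, is_hallucinated) pairs with gold labels.
    """
    correct = sum(judge_fn(text) == label for text, label in examples)
    return correct / len(examples)

if __name__ == "__main__":
    # Toy gold-labeled data for demonstration only.
    data = [
        ("The Eiffel Tower is in Paris.", False),
        ("The Eiffel Tower was built in 1776.", True),
    ]
    # Stub judge that flags any statement mentioning "1776"; a real
    # evaluation would call the model under test here instead.
    print(hallucination_recognition_accuracy(lambda t: "1776" in t, data))
```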

Papers

Showing 21–30 of 49 papers

| Title | Status | Hype |
| --- | --- | --- |
| UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation | Code | 1 |
| DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine | Code | 0 |
| DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark | Code | 0 |
| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Code | 0 |
| LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Code | 0 |
| TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Code | 0 |
| Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization | Code | 0 |
| MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations | Code | 0 |
| Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization | Code | 0 |
| HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation | Code | 0 |

Leaderboard

No leaderboard results yet.