SOTAVerified

Hallucination Evaluation

Evaluate the ability of LLMs to generate non-hallucinated text, or assess their capability to recognize hallucinations.

Papers

Showing 1–25 of 49 papers

| Title | Status | Hype |
| --- | --- | --- |
| AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | Code | 3 |
| MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models | Code | 2 |
| HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models | Code | 2 |
| TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | Code | 2 |
| HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models | Code | 2 |
| Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites | Code | 1 |
| Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Code | 1 |
| Analyzing and Mitigating Object Hallucination in Large Vision-Language Models | Code | 1 |
| AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Code | 1 |
| Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards | Code | 1 |
| DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models | Code | 1 |
| Enhancing LLM's Cognition via Structurization | Code | 1 |
| Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering | Code | 1 |
| Evaluation and Analysis of Hallucination in Large Vision-Language Models | Code | 1 |
| Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | Code | 1 |
| HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data | Code | 1 |
| KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Code | 1 |
| Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization | Code | 1 |
| PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset | Code | 1 |
| Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models | Code | 1 |
| UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation | Code | 1 |
| DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine | Code | 0 |
| DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark | Code | 0 |
| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Code | 0 |
| LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Code | 0 |

No leaderboard results yet.