
Hallucination Evaluation

Evaluate the ability of LLMs to generate text free of hallucinations, or assess their capability to recognize hallucinated content.
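As an illustration of the second setting, here is a minimal, hypothetical sketch of a recognition-style evaluation: a judge function (standing in for an LLM call) labels statements, and accuracy is computed against gold hallucination labels. The function name, toy data, and stub judge are assumptions for illustration, not the API of any benchmark listed on this page.

```python
# Hypothetical sketch of a hallucination-recognition evaluation.
# Nothing here corresponds to a specific benchmark below; a real
# benchmark would supply the labeled examples and the LLM judge.

from typing import Callable, List, Tuple

def hallucination_recognition_accuracy(
    judge_fn: Callable[[str], bool],
    examples: List[Tuple[str, bool]],
) -> float:
    """Fraction of statements whose hallucination label the judge gets right.

    judge_fn: wraps an LLM; returns True if the model flags the text as hallucinated.
    examples: (statement, is_hallucinated) pairs with gold labels.
    """
    correct = sum(judge_fn(text) == label for text, label in examples)
    return correct / len(examples)

if __name__ == "__main__":
    # Toy gold-labeled data for demonstration only.
    data = [
        ("The Eiffel Tower is in Paris.", False),
        ("The Eiffel Tower was built in 1776.", True),
    ]
    # Stub judge that flags any statement mentioning "1776"; a real
    # evaluation would call the model under test here instead.
    print(hallucination_recognition_accuracy(lambda t: "1776" in t, data))
```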

Papers

Showing 21–30 of 49 papers

| Title | Status | Hype |
| --- | --- | --- |
| UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation | Code | 1 |
| DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine | Code | 0 |
| DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark | Code | 0 |
| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Code | 0 |
| LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Code | 0 |
| TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Code | 0 |
| Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization | Code | 0 |
| MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations | Code | 0 |
| Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization | Code | 0 |
| HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation | Code | 0 |

Leaderboard

No leaderboard results yet.