SOTAVerified

Hallucination

Papers

Showing 1401–1425 of 1816 papers

Title | Status | Hype
RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling | Code | 1
Metric Ensembles For Hallucination Detection | | 0
Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers | Code | 1
Assessing the Reliability of Large Language Model Knowledge | Code | 0
Configuration Validation with Large Language Models | | 0
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters | Code | 1
Improving Large Language Models in Event Relation Logical Prediction | Code | 1
KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection | Code | 1
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models | Code | 2
GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language Models | Code | 0
GameGPT: Multi-agent Collaborative Framework for Game Development | | 0
Enhancing Text-based Knowledge Graph Completion with Zero-Shot Large Language Models: A Focus on Semantic Enhancement | Code | 1
Ferret: Refer and Ground Anything Anywhere at Any Granularity | Code | 5
OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models | Code | 1
A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection | Code | 0
Teaching Language Models to Hallucinate Less with Synthetic Tasks | | 0
Towards Mitigating Hallucination in Large Language Models via Self-Reflection | | 0
Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models | | 0
The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations | | 0
Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning | | 0
Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations | Code | 1
Evaluating Hallucinations in Chinese Large Language Models | Code | 3
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation | Code | 2
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation | Code | 2
AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation | Code | 1

No leaderboard results yet.