SOTAVerified

Hallucination

Papers

Showing 401-425 of 1816 papers

Title | Status | Hype
Phare: A Safety Probe for Large Language Models | Code | 1
A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs | Code | 1
EmbodiedAgent: A Scalable Hierarchical Approach to Overcome Practical Challenge in Multi-Robot Control | Code | 1
Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity | Code | 1
Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow | Code | 1
MMRel: A Relation Understanding Benchmark in the MLLM Era | Code | 1
Deficiency-Aware Masked Transformer for Video Inpainting | Code | 1
Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering | Code | 1
DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation | Code | 1
Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation | Code | 1
MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory | Code | 1
JDocQA: Japanese Document Question Answering Dataset for Generative Language Models | Code | 1
MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context | Code | 1
Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models | Code | 1
MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models | Code | 1
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Code | 1
Med-HALT: Medical Domain Hallucination Test for Large Language Models | Code | 1
Doc2Query--: When Less is More | Code | 1
Detecting and Preventing Hallucinations in Large Vision Language Models | Code | 1
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards | Code | 1
AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation | Code | 1
Detecting Hallucinated Content in Conditional Neural Sequence Generation | Code | 1
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark | Code | 1
Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites | Code | 1
DiffFuSR: Super-Resolution of all Sentinel-2 Multispectral Bands using Diffusion Models | Code | 1

No leaderboard results yet.