SOTAVerified

Hallucination

Papers

Showing 201-225 of 1816 papers

Title | Status | Hype
Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models | Code | 1
How well can a large language model explain business processes as perceived by users? | Code | 1
DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation | Code | 1
How Language Model Hallucinations Can Snowball | Code | 1
HyperPocket: Generative Point Cloud Completion | Code | 1
High-resolution Face Swapping via Latent Semantics Disentanglement | Code | 1
Phare: A Safety Probe for Large Language Models | Code | 1
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Code | 1
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards | Code | 1
Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges | Code | 1
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity | Code | 1
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models | Code | 1
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model | Code | 1
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization | Code | 1
Distinguishing Ignorance from Error in LLM Hallucinations | Code | 1
Balanced Classification: A Unified Framework for Long-Tailed Object Detection | Code | 1
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | Code | 1
Enhancing LLM's Cognition via Structurization | Code | 1
BachGAN: High-Resolution Image Synthesis from Salient Object Layout | Code | 1
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark | Code | 1
Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling | Code | 1
DiffFuSR: Super-Resolution of all Sentinel-2 Multispectral Bands using Diffusion Models | Code | 1
DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models | Code | 1
Doc2Query--: When Less is More | Code | 1
Improving Simultaneous Machine Translation with Monolingual Data | Code | 1

No leaderboard results yet.