SOTAVerified

Hallucination

Papers

Showing 151-200 of 1816 papers

Title | Status | Hype
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models | Code | 2
Lawyer LLaMA Technical Report | Code | 2
Enabling Large Language Models to Generate Text with Citations | Code | 2
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models | Code | 2
HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation | Code | 2
Evaluating Object Hallucination in Large Vision-Language Models | Code | 2
Exploring Human-Like Translation Strategy with Large Language Models | Code | 2
GPT-NER: Named Entity Recognition via Large Language Models | Code | 2
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models | Code | 2
MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization | Code | 2
Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions | Code | 2
Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression | Code | 2
PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision | Code | 2
Mitigating Object Hallucinations via Sentence-Level Early Intervention | Code | 1
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Code | 1
DiffFuSR: Super-Resolution of all Sentinel-2 Multispectral Bands using Diffusion Models | Code | 1
Revisit What You See: Disclose Language Prior in Vision Tokens for Efficient Guided Decoding of LVLMs | Code | 1
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | Code | 1
MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models | Code | 1
Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models | Code | 1
Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification | Code | 1
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis | Code | 1
FlySearch: Exploring how vision-language models explore | Code | 1
The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models | Code | 1
CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models | Code | 1
R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning | Code | 1
Removal of Hallucination on Hallucination: Debate-Augmented RAG | Code | 1
Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression | Code | 1
Know Or Not: a library for evaluating out-of-knowledge base robustness | Code | 1
Phare: A Safety Probe for Large Language Models | Code | 1
Finetune-RAG: Fine-Tuning Language Models to Resist Hallucination in Retrieval-Augmented Generation | Code | 1
A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs | Code | 1
Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models | Code | 1
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards | Code | 1
Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering | Code | 1
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding | Code | 1
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception | Code | 1
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations | Code | 1
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models | Code | 1
EmbodiedAgent: A Scalable Hierarchical Approach to Overcome Practical Challenge in Multi-Robot Control | Code | 1
The Mirage of Performance Gains: Why Contrastive Decoding Fails to Address Multimodal Hallucination | Code | 1
Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation | Code | 1
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | Code | 1
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text | Code | 1
CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning | Code | 1
GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks | Code | 1
ProDehaze: Prompting Diffusion Models Toward Faithful Image Dehazing | Code | 1
Grounded Chain-of-Thought for Multimodal Large Language Models | Code | 1
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention | Code | 1
Towards General Visual-Linguistic Face Forgery Detection(V2) | Code | 1

No leaderboard results yet.