SOTAVerified

Hallucination

Papers

Showing 251–300 of 1816 papers

| Title | Status | Hype |
| --- | --- | --- |
| Automated Review Generation Method Based on Large Language Models | Code | 1 |
| Enhancing LLM's Cognition via Structurization | Code | 1 |
| HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning | Code | 1 |
| Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks | Code | 1 |
| Multi-Object Hallucination in Vision-Language Models | Code | 1 |
| MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? | Code | 1 |
| MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context | Code | 1 |
| FineSurE: Fine-grained Summarization Evaluation using LLMs | Code | 1 |
| Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models | Code | 1 |
| GraphArena: Benchmarking Large Language Models on Graph Computational Problems | Code | 1 |
| ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models | Code | 1 |
| Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models | Code | 1 |
| Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models | Code | 1 |
| Knowledge Graph-Enhanced Large Language Models via Path Selection | Code | 1 |
| Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding | Code | 1 |
| Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector | Code | 1 |
| MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts | Code | 1 |
| MMRel: A Relation Understanding Benchmark in the MLLM Era | Code | 1 |
| We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs | Code | 1 |
| REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy | Code | 1 |
| DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation | Code | 1 |
| An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models | Code | 1 |
| Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training | Code | 1 |
| TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | Code | 1 |
| Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization | Code | 1 |
| DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception | Code | 1 |
| Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs | Code | 1 |
| The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG) | Code | 1 |
| Automated Multi-level Preference for MLLMs | Code | 1 |
| Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling | Code | 1 |
| THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models | Code | 1 |
| CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification | Code | 1 |
| LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation | Code | 1 |
| VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models | Code | 1 |
| Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback | Code | 1 |
| Exploring the Transferability of Visual Prompting for Multimodal Large Language Models | Code | 1 |
| MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory | Code | 1 |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Code | 1 |
| Harnessing GPT-4V(ision) for Insurance: A Preliminary Exploration | Code | 1 |
| Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs | Code | 1 |
| CuriousLLM: Elevating Multi-Document QA with Reasoning-Infused Knowledge Graph Prompting | Code | 1 |
| Tackling Structural Hallucination in Image Translation with Local Diffusion | Code | 1 |
| Learning From Correctness Without Prompting Makes LLM Efficient Reasoner | Code | 1 |
| Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering | Code | 1 |
| JDocQA: Japanese Document Question Answering Dataset for Generative Language Models | Code | 1 |
| UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Socioeconomic Indicator Prediction | Code | 1 |
| Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination | Code | 1 |
| What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models | Code | 1 |
| PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset | Code | 1 |
| Circuit Transformer: A Transformer That Preserves Logical Equivalence | Code | 1 |