SOTAVerified

Hallucination

Papers

Showing 26-50 of 1816 papers

Title | Status | Hype
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration | Code | 4
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Code | 4
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering | Code | 4
Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese | Code | 4
Retrieval-Augmented Generation for Large Language Models: A Survey | Code | 4
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Code | 4
Retrieval Head Mechanistically Explains Long-Context Factuality | Code | 3
RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models | Code | 3
ResumeFlow: An LLM-facilitated Pipeline for Personalized Resume Generation and Refinement | Code | 3
EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models | Code | 3
Automated Hypothesis Validation with Agentic Sequential Falsifications | Code | 3
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing | Code | 3
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Code | 3
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | Code | 3
PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models | Code | 3
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Code | 3
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Code | 3
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models | Code | 3
PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models | Code | 3
Learning Dynamics of LLM Finetuning | Code | 3
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Code | 3
CRAG -- Comprehensive RAG Benchmark | Code | 3
PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models | Code | 3
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models | Code | 3
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | Code | 3

No leaderboard results yet.