Hallucination

Papers

Showing 26–50 of 1816 papers

Title | Status | Hype
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling | Code | 4
Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese | Code | 4
Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models | Code | 4
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration | Code | 4
Multimodal Chain-of-Thought Reasoning in Language Models | Code | 4
ReAct: Synergizing Reasoning and Acting in Language Models | Code | 4
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models | Code | 3
Verdict: A Library for Scaling Judge-Time Compute | Code | 3
Automated Hypothesis Validation with Agentic Sequential Falsifications | Code | 3
VideoRoPE: What Makes for Good Video Rotary Position Embedding? | Code | 3
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Code | 3
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | Code | 3
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems | Code | 3
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models | Code | 3
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio | Code | 3
Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models | Code | 3
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Code | 3
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Code | 3
Graph Retrieval-Augmented Generation: A Survey | Code | 3
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Code | 3
Learning Dynamics of LLM Finetuning | Code | 3
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | Code | 3
CRAG -- Comprehensive RAG Benchmark | Code | 3
RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models | Code | 3
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing | Code | 3
Page 2 of 73

No leaderboard results yet.