SOTAVerified

Hallucination Papers

Showing 1–50 of 1816 papers

Title | Status | Hype
MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Code | 12
Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models | Code | 11
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness | Code | 11
SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning | Code | 11
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | Code | 7
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | Code | 7
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | Code | 6
Gorilla: Large Language Model Connected with Massive APIs | Code | 6
Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean | Code | 5
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning | Code | 5
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers | Code | 5
Weakly Supervised Detection of Hallucinations in LLM Activations | Code | 5
Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Code | 5
Ferret: Refer and Ground Anything Anywhere at Any Granularity | Code | 5
UQLM: A Python Package for Uncertainty Quantification in Large Language Models | Code | 5
Retrieval-Augmented Generation for Large Language Models: A Survey | Code | 4
Hallucination of Multimodal Large Language Models: A Survey | Code | 4
LettuceDetect: A Hallucination Detection Framework for RAG Applications | Code | 4
ReAct: Synergizing Reasoning and Acting in Language Models | Code | 4
Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models | Code | 4
LLM-Enhanced Data Management | Code | 4
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling | Code | 4
Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese | Code | 4
Multimodal Chain-of-Thought Reasoning in Language Models | Code | 4
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Code | 4
Halu-J: Critique-Based Hallucination Judge | Code | 4
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World | Code | 4
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration | Code | 4
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering | Code | 4
A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Code | 4
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Code | 4
Retrieval Head Mechanistically Explains Long-Context Factuality | Code | 3
RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models | Code | 3
EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models | Code | 3
ResumeFlow: An LLM-facilitated Pipeline for Personalized Resume Generation and Refinement | Code | 3
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing | Code | 3
PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models | Code | 3
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Code | 3
Evaluating Hallucinations in Chinese Large Language Models | Code | 3
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models | Code | 3
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Code | 3
PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models | Code | 3
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models | Code | 3
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models | Code | 3
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | Code | 3
Learning Dynamics of LLM Finetuning | Code | 3
Automated Hypothesis Validation with Agentic Sequential Falsifications | Code | 3
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Code | 3
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Code | 3
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | Code | 3
Page 1 of 37
