| GraphArena: Benchmarking Large Language Models on Graph Computational Problems | Jun 29, 2024 | BenchmarkingHallucination | CodeCode Available | 1 |
| Grounded Chain-of-Thought for Multimodal Large Language Models | Mar 17, 2025 | HallucinationSpatial Reasoning | CodeCode Available | 1 |
| Phare: A Safety Probe for Large Language Models | May 16, 2025 | DiagnosticHallucination | CodeCode Available | 1 |
| GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks | Mar 23, 2025 | BenchmarkingHallucination | CodeCode Available | 1 |
| A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity | Feb 8, 2023 | Code GenerationHallucination | CodeCode Available | 1 |
| BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models | Sep 23, 2023 | Code CompletionHallucination | CodeCode Available | 1 |
| Balanced Classification: A Unified Framework for Long-Tailed Object Detection | Aug 4, 2023 | HallucinationLong-tailed Object Detection | CodeCode Available | 1 |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Apr 15, 2024 | BenchmarkingBias Detection | CodeCode Available | 1 |
| Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards | May 7, 2025 | BenchmarkingHallucination | CodeCode Available | 1 |
| BachGAN: High-Resolution Image Synthesis from Salient Object Layout | Mar 26, 2020 | Generative Adversarial NetworkHallucination | CodeCode Available | 1 |
| Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations | Apr 18, 2025 | Hallucination | CodeCode Available | 1 |
| Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations | Feb 10, 2024 | DiagnosticHallucination | CodeCode Available | 1 |
| Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model | Aug 2, 2023 | HallucinationImage Captioning | CodeCode Available | 1 |
| Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization | Nov 28, 2023 | HallucinationMME | CodeCode Available | 1 |
| FlySearch: Exploring how vision-language models explore | Jun 3, 2025 | HallucinationTask Planning | CodeCode Available | 1 |
| Can LLMs be Good Graph Judge for Knowledge Graph Construction? | Nov 26, 2024 | Denoisinggraph construction | CodeCode Available | 1 |
| PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model | Jan 21, 2025 | HallucinationImage Captioning | CodeCode Available | 1 |
| Generating Natural Language Proofs with Verifier-Guided Search | May 25, 2022 | Hallucinationvalid | CodeCode Available | 1 |
| An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models | Jun 7, 2024 | Hallucinationparameter-efficient fine-tuning | CodeCode Available | 1 |
| Harnessing GPT-4V(ision) for Insurance: A Preliminary Exploration | Apr 15, 2024 | Hallucination | CodeCode Available | 1 |
| BIGPrior: Towards Decoupling Learned Prior Hallucination and Data Fidelity in Image Restoration | Nov 3, 2020 | ColorizationDenoising | CodeCode Available | 1 |
| KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Jun 24, 2025 | HallucinationHallucination Evaluation | CodeCode Available | 1 |
| Federated Recommendation via Hybrid Retrieval Augmented Generation | Mar 7, 2024 | HallucinationPrivacy Preserving | CodeCode Available | 1 |
| Automatic Curriculum Expert Iteration for Reliable LLM Reasoning | Oct 10, 2024 | HallucinationLogical Reasoning | CodeCode Available | 1 |
| AdaPlanner: Adaptive Planning from Feedback with Language Models | May 26, 2023 | Decision MakingHallucination | CodeCode Available | 1 |