| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | May 22, 2023 | Benchmarking, Hallucination | Code Available | 1 | 5 |
| Distinguishing Ignorance from Error in LLM Hallucinations | Oct 29, 2024 | Hallucination, Question Answering | Code Available | 1 | 5 |
| Mitigating Multilingual Hallucination in Large Vision-Language Models | Aug 1, 2024 | Hallucination | Code Available | 1 | 5 |
| ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark | Jan 9, 2025 | Fairness, Hallucination | Code Available | 1 | 5 |
| KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection | Oct 13, 2023 | Abstractive Text Summarization, Hallucination | Code Available | 1 | 5 |
| K-QA: A Real-World Medical Q&A Benchmark | Jan 25, 2024 | Hallucination, In-Context Learning | Code Available | 1 | 5 |
| Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Dec 25, 2023 | Hallucination, Hallucination Evaluation | Code Available | 1 | 5 |
| Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation | May 12, 2023 | Hallucination, In-Context Learning | Code Available | 1 | 5 |
| DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models | Mar 1, 2024 | Hallucination, Hallucination Evaluation | Code Available | 1 | 5 |
| DiffFuSR: Super-Resolution of all Sentinel-2 Multispectral Bands using Diffusion Models | Jun 13, 2025 | All, Hallucination | Code Available | 1 | 5 |