| MedScore: Factuality Evaluation of Free-Form Medical Answers | May 24, 2025 | Form, Hallucination | Code Available | 0 | 5 |
| On the Universal Truthfulness Hyperplane Inside LLMs | Jul 11, 2024 | Diversity, Domain Generalization | Code Available | 0 | 5 |
| MAVEN-Fact: A Large-scale Event Factuality Detection Dataset | Jul 22, 2024 | Hallucination | Code Available | 0 | 5 |
| Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling | May 1, 2024 | Hallucination, Topic Classification | Code Available | 0 | 5 |
| MCiteBench: A Multimodal Benchmark for Generating Text with Citations | Mar 4, 2025 | Hallucination, Text Generation | Code Available | 0 | 5 |
| MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback | Oct 17, 2024 | Fact Verification, Hallucination | Code Available | 0 | 5 |
| Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews | May 19, 2023 | Decision Making, Hallucination | Code Available | 0 | 5 |
| Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models | Jul 23, 2024 | Hallucination, Machine Translation | Code Available | 0 | 5 |
| OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models | Jan 22, 2025 | Hallucination | Code Available | 0 | 5 |
| LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression | Mar 6, 2025 | Benchmarking, Common Sense Reasoning | Code Available | 0 | 5 |