| Med-HALT: Medical Domain Hallucination Test for Large Language Models | Jul 28, 2023 | Hallucination, Information Retrieval | Code Available |
| Doc2Query--: When Less is More | Jan 9, 2023 | Hallucination, Retrieval | Code Available |
| Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Jun 16, 2023 | Benchmarking, Evidence Selection | Code Available |
| IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking | Oct 9, 2024 | ARC, Code Generation | Code Available |
| Distinguishing Ignorance from Error in LLM Hallucinations | Oct 29, 2024 | Hallucination, Question Answering | Code Available |
| AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation | Oct 4, 2023 | Hallucination, Text Generation | Code Available |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Apr 15, 2024 | Benchmarking, Bias Detection | Code Available |
| Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback | Apr 22, 2024 | Attribute, Hallucination | Code Available |
| Detecting and Preventing Hallucinations in Large Vision Language Models | Aug 11, 2023 | 16k, Hallucination | Code Available |
| Lyra: Orchestrating Dual Correction in Automated Theorem Proving | Sep 27, 2023 | Automated Theorem Proving, Hallucination | Code Available |