| Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers | Apr 27, 2025 | Hallucination, Question Answering | Code Available | 5 |
| Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean | Apr 18, 2024 | Automated Theorem Proving, Hallucination | Code Available | 5 |
| Weakly Supervised Detection of Hallucinations in LLM Activations | Dec 5, 2023 | Hallucination, Language Modeling | Code Available | 5 |
| Ferret: Refer and Ground Anything Anywhere at Any Granularity | Oct 11, 2023 | Hallucination, Language Modeling | Code Available | 5 |
| Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Jun 28, 2023 | Hallucination, Knowledge Graphs | Code Available | 5 |
| LettuceDetect: A Hallucination Detection Framework for RAG Applications | Feb 24, 2025 | 8k, GPU | Code Available | 4 |
| Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Jan 14, 2025 | Embodied Question Answering, Hallucination | Code Available | 4 |
| A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Jan 4, 2025 | Fairness, Hallucination | Code Available | 4 |
| Halu-J: Critique-Based Hallucination Judge | Jul 17, 2024 | Evidence Selection, Hallucination | Code Available | 4 |
| Hallucination of Multimodal Large Language Models: A Survey | Apr 29, 2024 | Hallucination, Survey | Code Available | 4 |