| Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers | Apr 27, 2025 | HallucinationQuestion Answering | CodeCode Available | 5 | 5 |
| Weakly Supervised Detection of Hallucinations in LLM Activations | Dec 5, 2023 | HallucinationLanguage Modeling | CodeCode Available | 5 | 5 |
| DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning | May 20, 2025 | HallucinationMathematical Reasoning | CodeCode Available | 5 | 5 |
| Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Jun 28, 2023 | HallucinationKnowledge Graphs | CodeCode Available | 5 | 5 |
| Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean | Apr 18, 2024 | Automated Theorem ProvingHallucination | CodeCode Available | 5 | 5 |
| A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Jan 4, 2025 | FairnessHallucination | CodeCode Available | 4 | 5 |
| Multimodal Chain-of-Thought Reasoning in Language Models | Feb 2, 2023 | HallucinationLanguage Modelling | CodeCode Available | 4 | 5 |
| Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Feb 12, 2024 | HallucinationObject Localization | CodeCode Available | 4 | 5 |
| Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese | Sep 8, 2023 | Domain AdaptationHallucination | CodeCode Available | 4 | 5 |
| Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models | Jul 30, 2023 | HallucinationPrompt Engineering | CodeCode Available | 4 | 5 |