| ReAct: Synergizing Reasoning and Acting in Language Models | Oct 6, 2022 | Decision MakingFact Verification | CodeCode Available | 4 |
| AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models | May 22, 2025 | BenchmarkingFairness | CodeCode Available | 3 |
| Verdict: A Library for Scaling Judge-Time Compute | Feb 25, 2025 | Fact CheckingHallucination | CodeCode Available | 3 |
| Automated Hypothesis Validation with Agentic Sequential Falsifications | Feb 14, 2025 | Decision MakingHallucination | CodeCode Available | 3 |
| VideoRoPE: What Makes for Good Video Rotary Position Embedding? | Feb 7, 2025 | HallucinationPosition | CodeCode Available | 3 |
| Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Dec 5, 2024 | Contrastive LearningHallucination | CodeCode Available | 3 |
| HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems | Nov 5, 2024 | HallucinationRAG | CodeCode Available | 3 |
| Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | Nov 5, 2024 | BenchmarkingHallucination | CodeCode Available | 3 |
| The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio | Oct 16, 2024 | Hallucination | CodeCode Available | 3 |
| MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models | Oct 16, 2024 | DiagnosticHallucination | CodeCode Available | 3 |