| RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models | Nov 1, 2024 | Hallucination, Language Modeling | Unverified | 0 |
| Improbable Bigrams Expose Vulnerabilities of Incomplete Tokens in Byte-Level Tokenizers | Oct 31, 2024 | Hallucination | Unverified | 0 |
| Exploring the Knowledge Mismatch Hypothesis: Hallucination Propensity in Small Models Fine-tuned on Data from Larger Models | Oct 31, 2024 | Hallucination, Misinformation | Unverified | 0 |
| EF-LLM: Energy Forecasting LLM with AI-assisted Automation, Enhanced Sparse Prediction, Hallucination Detection | Oct 30, 2024 | Continual Learning, Hallucination | Unverified | 0 |
| Beyond Ontology in Dialogue State Tracking for Goal-Oriented Chatbot | Oct 30, 2024 | Chatbot, Dialogue State Tracking | Code Available | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | Unverified | 0 |
| Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models | Oct 30, 2024 | Hallucination, Hallucination Evaluation | Code Available | 0 |
| Distinguishing Ignorance from Error in LLM Hallucinations | Oct 29, 2024 | Hallucination, Question Answering | Code Available | 1 |
| MARCO: Multi-Agent Real-time Chat Orchestration | Oct 29, 2024 | Hallucination, Language Modeling | Unverified | 0 |
| FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation | Oct 29, 2024 | Hallucination, Language Modeling | Unverified | 0 |