| Title | Date | Tags | Code | Citations |
| --- | --- | --- | --- | --- |
| Exploring the Knowledge Mismatch Hypothesis: Hallucination Propensity in Small Models Fine-tuned on Data from Larger Models | Oct 31, 2024 | Hallucination, Misinformation | Unverified | 0 |
| Improbable Bigrams Expose Vulnerabilities of Incomplete Tokens in Byte-Level Tokenizers | Oct 31, 2024 | Hallucination | Unverified | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | Unverified | 0 |
| EF-LLM: Energy Forecasting LLM with AI-assisted Automation, Enhanced Sparse Prediction, Hallucination Detection | Oct 30, 2024 | Continual Learning, Hallucination | Unverified | 0 |
| Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models | Oct 30, 2024 | Hallucination, Hallucination Evaluation | Code Available | 0 |
| Beyond Ontology in Dialogue State Tracking for Goal-Oriented Chatbot | Oct 30, 2024 | Chatbot, Dialogue State Tracking | Code Available | 0 |
| FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation | Oct 29, 2024 | Hallucination, Language Modeling | Unverified | 0 |
| MARCO: Multi-Agent Real-time Chat Orchestration | Oct 29, 2024 | Hallucination, Language Modeling | Unverified | 0 |
| A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges | Oct 28, 2024 | Drug Discovery, Hallucination | Unverified | 0 |
| A Debate-Driven Experiment on LLM Hallucinations and Accuracy | Oct 25, 2024 | Fact Checking, Hallucination | Unverified | 0 |