| Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification | Mar 7, 2024 | Fact Checking; Hallucination | Unverified | 0 |
| Effectiveness Assessment of Recent Large Vision-Language Models | Mar 7, 2024 | Anomaly Detection; Attribute | Unverified | 0 |
| Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Mar 6, 2024 | Benchmarking; Hallucination | Code Available | 0 |
| German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset | Mar 6, 2024 | Hallucination; In-Context Learning | Code Available | 0 |
| KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents | Mar 5, 2024 | Hallucination; Self-Learning | Code Available | 3 |
| InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers | Mar 5, 2024 | Hallucination | Code Available | 1 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | Mar 4, 2024 | 1 Image, 2*2 Stitching; Arithmetic Reasoning | Unverified | 0 |
| Right for Right Reasons: Large Language Models for Verifiable Commonsense Knowledge Graph Question Answering | Mar 3, 2024 | Claim Verification; Graph Question Answering | Unverified | 0 |
| Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models | Mar 3, 2024 | Hallucination | Unverified | 0 |
| CR-LT-KGQA: A Knowledge Graph Question Answering Dataset Requiring Commonsense Reasoning and Long-Tail Knowledge | Mar 3, 2024 | Claim Verification; Graph Question Answering | Code Available | 1 |