| Title | Date | Tasks | Code | # |
|---|---|---|---|---|
| How Language Model Hallucinations Can Snowball | May 22, 2023 | Hallucination, Language Modeling | Code Available | 1 |
| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | May 22, 2023 | Benchmarking, Hallucination | Code Available | 1 |
| Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources | May 22, 2023 | Hallucination, Language Modeling | Code Available | 1 |
| Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination | May 20, 2023 | Hallucination, Machine Translation | Code Available | 1 |
| HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models | May 19, 2023 | Hallucination, Hallucination Evaluation | Code Available | 2 |
| HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation | May 19, 2023 | Hallucination, Machine Translation | Code Available | 2 |
| RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought | May 19, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews | May 19, 2023 | Decision Making, Hallucination | Code Available | 0 |
| Evaluating Object Hallucination in Large Vision-Language Models | May 17, 2023 | Hallucination, Object | Code Available | 2 |
| Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation | May 12, 2023 | Hallucination, In-Context Learning | Code Available | 1 |