| Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions | Jun 7, 2024 | Hallucination, Mathematical Reasoning | Unverified | 0 |
| 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination | Jun 7, 2024 | Hallucination | Code Available | 2 |
| Chaos with Keywords: Exposing Large Language Models Sycophantic Hallucination to Misleading Keywords and Evaluating Defense Strategies | Jun 6, 2024 | Hallucination, Knowledge Probing | Unverified | 0 |
| ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints | Jun 6, 2024 | Diagnostic, Hallucination | Unverified | 0 |
| Confabulation: The Surprising Value of Large Language Model Hallucinations | Jun 6, 2024 | Hallucination, Language Modeling | Unverified | 0 |
| Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework | Jun 5, 2024 | Fact Checking, Hallucination | Unverified | 0 |
| Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends | Jun 5, 2024 | Hallucination | Unverified | 0 |
| OTTAWA: Optimal TransporT Adaptive Word Aligner for Hallucination and Omission Translation Errors Detection | Jun 4, 2024 | Hallucination, Machine Translation | Code Available | 0 |
| Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs | Jun 4, 2024 | Benchmarking, Fairness | Unverified | 0 |
| CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models | Jun 4, 2024 | Hallucination, Informativeness | Unverified | 0 |