| DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation | Jun 9, 2024 | Common Sense Reasoning, Denoising | Code Available | 1 | 5 |
| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | May 22, 2023 | Benchmarking, Hallucination | Code Available | 1 | 5 |
| Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation | Mar 16, 2022 | Hallucination, Machine Translation | Code Available | 1 | 5 |
| 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior | Mar 31, 2020 | 3D Semantic Scene Completion, 3D Semantic Scene Completion from a single RGB image | Code Available | 1 | 5 |
| Advancing TTP Analysis: Harnessing the Power of Large Language Models with Retrieval Augmented Generation | Dec 30, 2023 | Decoder, Hallucination | Code Available | 1 | 5 |
| BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models | Oct 2, 2023 | Hallucination, Retrieval | Code Available | 1 | 5 |
| AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Nov 13, 2023 | Attribute, Hallucination | Code Available | 1 | 5 |
| CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning | Mar 25, 2025 | Hallucination, Language Modeling | Code Available | 1 | 5 |
| Doc2Query--: When Less is More | Jan 9, 2023 | Hallucination, Retrieval | Code Available | 1 | 5 |
| ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark | Jan 9, 2025 | Fairness, Hallucination | Code Available | 1 | 5 |