| CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy | Dec 3, 2024 | Hallucination, Key Information Extraction | Unverified | 0 |
| AI Benchmarks and Datasets for LLM Evaluation | Dec 2, 2024 | Benchmarking, Distributed Computing | Unverified | 0 |
| Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment | Dec 1, 2024 | Action Detection, Activity Detection | Code Available | 0 |
| Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs | Nov 28, 2024 | Attribute, Hallucination | Unverified | 0 |
| OPCap: Object-aware Prompting Captioning | Nov 27, 2024 | Attribute, Decoder | Unverified | 0 |
| DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models | Nov 27, 2024 | Attribute, Hallucination | Unverified | 0 |
| Can LLMs be Good Graph Judge for Knowledge Graph Construction? | Nov 26, 2024 | Denoising, graph construction | Code Available | 1 |
| Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach | Nov 26, 2024 | Hallucination | Unverified | 0 |
| VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models | Nov 26, 2024 | Hallucination | Unverified | 0 |
| Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning | Nov 26, 2024 | Hallucination, Logical Reasoning | Unverified | 0 |