| Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs | Apr 1, 2024 | Common Sense Reasoning, Object | —Unverified | 0 |
| Designing a Robust Radiology Report Generation System | Nov 2, 2024 | Decision Making, Diagnostic | —Unverified | 0 |
| Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs | Nov 28, 2024 | Attribute, Hallucination | —Unverified | 0 |
| Achieving Human Parity on Visual Question Answering | Nov 17, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning | Jan 28, 2024 | Data Augmentation, Question Answering | —Unverified | 0 |
| Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions | Apr 6, 2023 | In-Context Learning, Question Answering | —Unverified | 0 |
| Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning | May 19, 2024 | Multimodal Reasoning, Question Answering | —Unverified | 0 |
| InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model | Jan 29, 2024 | Form, Language Modeling | —Unverified | 0 |
| Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | May 1, 2024 | Image Captioning, Question Answering | —Unverified | 0 |
| Advancing Medical Imaging with Language Models: A Journey from N-grams to ChatGPT | Apr 11, 2023 | Diagnostic, Image Captioning | —Unverified | 0 |
| DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs | Jun 6, 2024 | Language Modelling, Large Language Model | —Unverified | 0 |
| Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning | Oct 8, 2024 | Image Retrieval, Math | —Unverified | 0 |
| An experimental study of the vision-bottleneck in VQA | Feb 14, 2022 | Object, Question Answering | —Unverified | 0 |
| Improved Baselines for Data-efficient Perceptual Augmentation of LLMs | Mar 20, 2024 | Audio captioning, Image Captioning | —Unverified | 0 |
| Improved Bilinear Pooling with CNNs | Jul 21, 2017 | GPU, Question Answering | —Unverified | 0 |
| An Evaluation of GPT-4V and Gemini in Online VQA | Dec 17, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Deep learning evaluation using deep linguistic processing | Jun 5, 2017 | Deep Learning, Multimodal Deep Learning | —Unverified | 0 |
| Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models | Jul 23, 2024 | Computational Efficiency, Image Captioning | —Unverified | 0 |
| Deep Exemplar Networks for VQA and VQG | Dec 19, 2019 | Decoder, Question Answering | —Unverified | 0 |
| Deep Bayesian Active Learning for Multiple Correct Outputs | Dec 2, 2019 | Active Learning, Answer Generation | —Unverified | 0 |
| BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering | Dec 13, 2023 | Medical Visual Question Answering, Question Answering | —Unverified | 0 |
| Deep Attention Neural Tensor Network for Visual Question Answering | Sep 1, 2018 | Deep Attention, Question Answering | —Unverified | 0 |
| Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering | Sep 4, 2019 | Image Captioning, Object | —Unverified | 0 |
| Benchmarking Vision Language Models for Cultural Understanding | Jul 15, 2024 | Benchmarking, Question Answering | —Unverified | 0 |
| Decouple Before Interact: Multi-Modal Prompt Learning for Continual Visual Question Answering | Jan 1, 2023 | Continual Learning, Language Modelling | —Unverified | 0 |