| Interactive Visual Task Learning for Robots | Dec 20, 2023 | Continual Learning, Novel Concepts | Unverified | 0 |
| Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering | Dec 20, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering | Dec 19, 2023 | Image Retrieval, Question Answering | Code Available | 0 |
| OsmLocator: locating overlapping scatter marks with a non-training generative perspective | Dec 18, 2023 | Clustering, Combinatorial Optimization | Code Available | 0 |
| CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update | Dec 18, 2023 | Continual Learning, Question Answering | Unverified | 0 |
| Silkie: Preference Distillation for Large Visual Language Models | Dec 17, 2023 | Hallucination, MME | Unverified | 0 |
| p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models | Dec 17, 2023 | Image Captioning, Question Answering | Code Available | 0 |
| An Evaluation of GPT-4V and Gemini in Online VQA | Dec 17, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| Advancing Surgical VQA with Scene Graph Knowledge | Dec 15, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering | Dec 13, 2023 | Medical Visual Question Answering, Question Answering | Unverified | 0 |
| Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | Dec 12, 2023 | image-classification, Image Classification | Unverified | 0 |
| Image Content Generation with Causal Reasoning | Dec 12, 2023 | Image Generation, Question Answering | Code Available | 0 |
| Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models | Dec 9, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| CLAMP: Contrastive LAnguage Model Prompt-tuning | Dec 4, 2023 | Contrastive Learning, Image Captioning | Unverified | 0 |
| MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation | Dec 4, 2023 | Instruction Following, Language Modeling | Unverified | 0 |
| Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario | Dec 4, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Merlin: Empowering Multimodal LLMs with Foresight Minds | Nov 30, 2023 | Visual Question Answering | Unverified | 0 |
| Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering | Nov 29, 2023 | Common Sense Reasoning, Question Answering | Unverified | 0 |
| DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback | Nov 29, 2023 | Image Generation, Question Answering | Unverified | 0 |
| The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation | Nov 28, 2023 | Diversity, Question Answering | Unverified | 0 |
| Fully Authentic Visual Question Answering Dataset from Online Communities | Nov 27, 2023 | Question Answering, Visual Question Answering | Code Available | 0 |
| Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models | Nov 23, 2023 | Language Modelling, Large Language Model | Unverified | 0 |
| ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | Nov 21, 2023 | Descriptive, MME | Code Available | 0 |
| Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions | Nov 20, 2023 | Question Answering, Visual Question Answering | Code Available | 0 |
| Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts | Nov 15, 2023 | Question Answering, Sentence | Code Available | 0 |