| Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA | Feb 24, 2024 | 3D Question Answering (3D-QA), Question Answering | Code Available | 1 |
| Uncertainty-Aware Evaluation for Vision-Language Models | Feb 22, 2024 | Conformal Prediction, Language Modeling | Code Available | 1 |
| Visual Hallucinations of Multi-modal Large Language Models | Feb 22, 2024 | Diversity, Hallucination | Code Available | 1 |
| Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment | Feb 21, 2024 | Language Modelling, Question Answering | Code Available | 1 |
| Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models | Feb 16, 2024 | Diversity, Instruction Following | Code Available | 1 |
| Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy | Feb 11, 2024 | Language Modeling, Open Vocabulary Attribute Detection | Code Available | 1 |
| Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations | Feb 10, 2024 | Diagnostic, Hallucination | Code Available | 1 |
| Text-Guided Image Clustering | Feb 5, 2024 | Clustering, Image Captioning | Code Available | 1 |
| Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge | Jan 19, 2024 | Question Answering, Question Generation | Code Available | 1 |
| Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation | Jan 18, 2024 | Contrastive Learning, Prompt Engineering | Code Available | 1 |