Binding Touch to Everything: Learning Unified Multimodal Tactile Representations | Jan 31, 2024 | Question Answering Visual Question Answering (VQA) | Unverified | 0
Common Sense Reasoning for Deepfake Detection | Jan 31, 2024 | Binary Classification Common Sense Reasoning | Code Available | 3
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis | Jan 31, 2024 | Multi-Task Learning Question Answering | Code Available | 0
LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering | Jan 29, 2024 | Language Modeling Language Modelling | Unverified | 0
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | Jan 29, 2024 | Benchmarking Image Comprehension | Unverified | 0
Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning | Jan 28, 2024 | Data Augmentation Question Answering | Unverified | 0
Free Form Medical Visual Question Answering in Radiology | Jan 23, 2024 | Diagnostic Form | Unverified | 0
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities | Jan 22, 2024 | Question Answering Spatial Reasoning | Unverified | 0
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge | Jan 19, 2024 | Question Answering Question Generation | Code Available | 1
Veagle: Advancements in Multimodal Representation Learning | Jan 18, 2024 | Image Captioning Language Modelling | Code Available | 1
Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation | Jan 18, 2024 | Contrastive Learning Prompt Engineering | Code Available | 1
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | Jan 18, 2024 | Caption Generation Language Modeling | Unverified | 0
COCO is "ALL" You Need for Visual Instruction Fine-tuning | Jan 17, 2024 | All Image Captioning | Unverified | 0
Video Quality Assessment Based on Swin TransformerV2 and Coarse to Fine Strategy | Jan 16, 2024 | Image Quality Assessment Video Quality Assessment | Unverified | 0
Uncovering the Full Potential of Visual Grounding Methods in VQA | Jan 15, 2024 | Question Answering Visual Grounding | Code Available | 0
BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining | Jan 12, 2024 | Question Answering Visual Question Answering | Unverified | 0
Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model | Jan 12, 2024 | Language Modeling Language Modelling | Code Available | 0
Cross-modal Retrieval for Knowledge-based Visual Question Answering | Jan 11, 2024 | Cross-Modal Retrieval Question Answering | Code Available | 1
Hallucination Benchmark in Medical Visual Question Answering | Jan 11, 2024 | Hallucination Medical Visual Question Answering | Code Available | 0
MISS: A Generative Pretraining and Finetuning Approach for Med-VQA | Jan 10, 2024 | Medical Visual Question Answering Multi-Task Learning | Code Available | 1
GRAM: Global Reasoning for Multi-Page VQA | Jan 7, 2024 | Question Answering Visual Question Answering | Unverified | 0
3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding | Jan 6, 2024 | Scene Understanding Visual Question Answering (VQA) | Code Available | 1
PeFoMed: Parameter Efficient Fine-tuning of Multimodal Large Language Models for Medical Imaging | Jan 5, 2024 | Medical Report Generation Medical Visual Question Answering | Code Available | 2
Subjective and Objective Analysis of Indian Social Media Video Quality | Jan 5, 2024 | Mixture-of-Experts Visual Question Answering (VQA) | Code Available | 0
ArtQuest: Countering Hidden Language Biases in ArtVQA | Jan 4, 2024 | Question Answering Visual Question Answering | Code Available | 0
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training | Jan 4, 2024 | Descriptive Image Captioning | Code Available | 1
Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles | Jan 1, 2024 | Question Answering Visual Question Answering | Unverified | 0
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action | Jan 1, 2024 | Image Generation Instruction Following | Unverified | 0
Synthesize Step-by-Step: Tools Templates and LLMs as Data Generators for Reasoning-Based Chart VQA | Jan 1, 2024 | Chart Question Answering Data Augmentation | Unverified | 0
DIEM: Decomposition-Integration Enhancing Multimodal Insights | Jan 1, 2024 | MM-Vet Question Answering | Unverified | 0
Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problems | Jan 1, 2024 | Question Answering Visual Question Answering | Unverified | 0
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels | Dec 28, 2023 | Aesthetics Quality Assessment Image Quality Assessment | Code Available | 2
Multi-Prompts Learning with Cross-Modal Alignment for Attribute-based Person Re-Identification | Dec 28, 2023 | Attribute cross-modal alignment | Unverified | 0
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | Dec 28, 2023 | Computational Efficiency Image Captioning | Code Available | 3
Gemini Pro Defeated by GPT-4V: Evidence from Education | Dec 27, 2023 | image-classification Image Classification | Unverified | 0
Knowledge Guided Semi-Supervised Learning for Quality Assessment of User Generated Videos | Dec 24, 2023 | Representation Learning Transfer Learning | Code Available | 0
Q-Boost: On Visual Quality Assessment Ability of Low-level Multi-Modality Foundation Models | Dec 23, 2023 | Image Quality Assessment Video Quality Assessment | Unverified | 0
Towards a Unified Multimodal Reasoning Framework | Dec 22, 2023 | Multimodal Reasoning Multiple-choice | Code Available | 0
DriveLM: Driving with Graph Visual Question Answering | Dec 21, 2023 | Autonomous Driving Question Answering | Code Available | 3
LLM4VG: Large Language Models Evaluation for Video Grounding | Dec 21, 2023 | Image Captioning Video Grounding | Unverified | 0
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Dec 21, 2023 | Image Retrieval Image-to-Text Retrieval | Code Available | 1
Reducing Hallucinations: Enhancing VQA for Flood Disaster Damage Assessment with Visual Contexts | Dec 21, 2023 | Hallucination Question Answering | Unverified | 0
Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA | Dec 21, 2023 | Contrastive Learning counterfactual | Code Available | 1
Object Attribute Matters in Visual Question Answering | Dec 20, 2023 | Attribute Graph Neural Network | Code Available | 0
Interactive Visual Task Learning for Robots | Dec 20, 2023 | Continual Learning Novel Concepts | Unverified | 0
Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering | Dec 20, 2023 | Question Answering Visual Question Answering | Unverified | 0
BloomVQA: Assessing Hierarchical Multi-modal Comprehension | Dec 20, 2023 | Data Augmentation Memorization | Unverified | 0
Full-reference Video Quality Assessment for User Generated Content Transcoding | Dec 19, 2023 | Video Quality Assessment Visual Question Answering (VQA) | Unverified | 0
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering | Dec 19, 2023 | Image Retrieval Question Answering | Code Available | 0
EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering | Dec 19, 2023 | Object Object Counting | Code Available | 1