Knowledge Condensation and Reasoning for Knowledge-based VQA | Mar 15, 2024 | Question Answering, Reading Comprehension | Unverified
Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning | Mar 15, 2024 | Hallucination, Instruction Following | Unverified
Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models | Mar 15, 2024 | Few-Shot Image Classification, image-classification | Unverified
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering | Mar 14, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available
UniCode: Learning a Unified Codebook for Multimodal Large Language Models | Mar 14, 2024 | Quantization, Visual Question Answering (VQA) | Unverified
OmniCount: Multi-label Object Counting with Semantic-Geometric Priors | Mar 8, 2024 | Object, Object Counting | Unverified
SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM | Mar 7, 2024 | Question Answering, Retrieval | Unverified
CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments | Mar 5, 2024 | Language Modelling, Large Language Model | Unverified
Enhancing Generalization in Medical Visual Question Answering Tasks via Gradient-Guided Model Perturbation | Mar 5, 2024 | Data Augmentation, Medical Visual Question Answering | Unverified
ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks | Feb 27, 2024 | Domain Generalization, Image Captioning | Unverified
LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery | Feb 26, 2024 | Continual Learning, Exemplar-Free | Code Available
CommVQA: Situating Visual Question Answering in Communicative Contexts | Feb 22, 2024 | Question Answering, Visual Question Answering | Code Available
A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models | Feb 17, 2024 | Diagnostic, Visual Question Answering (VQA) | Unverified
VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models | Feb 16, 2024 | Adversarial Robustness, Language Modelling | Unverified
II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering | Feb 16, 2024 | Question Answering, Triplet | Code Available
PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter | Feb 16, 2024 | Language Modeling, Language Modelling | Unverified
Prompt-based Personalized Federated Learning for Medical Visual Question Answering | Feb 15, 2024 | Federated Learning, Medical Visual Question Answering | Unverified
LAPDoc: Layout-Aware Prompting for Documents | Feb 15, 2024 | document understanding, Key Information Extraction | Unverified
Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays | Feb 14, 2024 | Language Modeling, Language Modelling | Code Available
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | Feb 13, 2024 | Language Modeling, Language Modelling | Unverified
CIC: A Framework for Culturally-Aware Image Captioning | Feb 8, 2024 | Descriptive, Image Captioning | Unverified
Convincing Rationales for Visual Question Answering Reasoning | Feb 6, 2024 | Question Answering, Visual Question Answering | Code Available
Curriculum reinforcement learning for quantum architecture search under hardware errors | Feb 5, 2024 | 3D Architecture, Computational Efficiency | Unverified
Knowledge Generation for Zero-shot Knowledge-based VQA | Feb 4, 2024 | Question Answering, Visual Question Answering | Code Available
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis | Jan 31, 2024 | Multi-Task Learning, Question Answering | Code Available
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations | Jan 31, 2024 | Question Answering, Visual Question Answering (VQA) | Unverified
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | Jan 29, 2024 | Benchmarking, Image Comprehension | Unverified
LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering | Jan 29, 2024 | Language Modeling, Language Modelling | Unverified
Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning | Jan 28, 2024 | Data Augmentation, Question Answering | Unverified
Free Form Medical Visual Question Answering in Radiology | Jan 23, 2024 | Diagnostic, Form | Unverified
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities | Jan 22, 2024 | Question Answering, Spatial Reasoning | Unverified
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | Jan 18, 2024 | Caption Generation, Language Modeling | Unverified
COCO is "ALL" You Need for Visual Instruction Fine-tuning | Jan 17, 2024 | All, Image Captioning | Unverified
Video Quality Assessment Based on Swin TransformerV2 and Coarse to Fine Strategy | Jan 16, 2024 | Image Quality Assessment, Video Quality Assessment | Unverified
Uncovering the Full Potential of Visual Grounding Methods in VQA | Jan 15, 2024 | Question Answering, Visual Grounding | Code Available
BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining | Jan 12, 2024 | Question Answering, Visual Question Answering | Unverified
Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model | Jan 12, 2024 | Language Modeling, Language Modelling | Code Available
Hallucination Benchmark in Medical Visual Question Answering | Jan 11, 2024 | Hallucination, Medical Visual Question Answering | Code Available
GRAM: Global Reasoning for Multi-Page VQA | Jan 7, 2024 | Question Answering, Visual Question Answering | Unverified
Subjective and Objective Analysis of Indian Social Media Video Quality | Jan 5, 2024 | Mixture-of-Experts, Visual Question Answering (VQA) | Code Available
ArtQuest: Countering Hidden Language Biases in ArtVQA | Jan 4, 2024 | Question Answering, Visual Question Answering | Code Available
Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problems | Jan 1, 2024 | Question Answering, Visual Question Answering | Unverified
DIEM: Decomposition-Integration Enhancing Multimodal Insights | Jan 1, 2024 | MM-Vet, Question Answering | Unverified
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action | Jan 1, 2024 | Image Generation, Instruction Following | Unverified
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA | Jan 1, 2024 | Chart Question Answering, Data Augmentation | Unverified
Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles | Jan 1, 2024 | Question Answering, Visual Question Answering | Unverified
Multi-Prompts Learning with Cross-Modal Alignment for Attribute-based Person Re-Identification | Dec 28, 2023 | Attribute, cross-modal alignment | Unverified
Gemini Pro Defeated by GPT-4V: Evidence from Education | Dec 27, 2023 | image-classification, Image Classification | Unverified
Knowledge Guided Semi-Supervised Learning for Quality Assessment of User Generated Videos | Dec 24, 2023 | Representation Learning, Transfer Learning | Code Available
Q-Boost: On Visual Quality Assessment Ability of Low-level Multi-Modality Foundation Models | Dec 23, 2023 | Image Quality Assessment, Video Quality Assessment | Unverified