Surgical-VQLA: Transformer with Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery | May 19, 2023 | Answer Generation; object-detection | Code Available 1
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | May 19, 2023 | Dense Captioning; Image Captioning | Code Available 1
Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature | May 18, 2023 | Question Answering; Visual Question Answering | Unverified 0
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | May 18, 2023 | 1 Image, 2*2 Stitchi; Action Classification | Code Available 3
MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts | May 18, 2023 | Medical Visual Question Answering; Question Answering | Code Available 1
An Empirical Study on the Language Modal in Visual Question Answering | May 17, 2023 | Question Answering; Visual Question Answering | Unverified 0
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering | May 17, 2023 | Benchmarking; Diagnostic | Code Available 1
TG-VQA: Ternary Game of Video Question Answering | May 17, 2023 | Contrastive Learning; Question Answering | Unverified 0
A Novel Stochastic LSTM Model Inspired by Quantum Machine Learning | May 17, 2023 | Quantum Machine Learning; Visual Question Answering (VQA) | Unverified 0
Light-VQA: A Multi-Dimensional Quality Assessment Model for Low-Light Video Enhancement | May 16, 2023 | Video Enhancement; Video Quality Assessment | Code Available 1
SB-VQA: A Stack-Based Video Quality Assessment Framework for Video Enhancement | May 15, 2023 | Video Enhancement; Video Quality Assessment | Unverified 0
OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models | May 13, 2023 | Key Information Extraction; Nutrition | Code Available 2
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | May 11, 2023 | 1 Image, 2*2 Stitching; Diversity | Code Available 2
Combo of Thinking and Observing for Outside-Knowledge VQA | May 10, 2023 | Decoder; Question Answering | Code Available 1
OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese | May 7, 2023 | Information Retrieval; Question Answering | Code Available 0
Adaptive loose optimization for robust question answering | May 6, 2023 | Extractive Question-Answering; Machine Reading Comprehension | Code Available 0
Otter: A Multi-Modal Model with In-Context Instruction Tuning | May 5, 2023 | GPU; In-Context Learning | Code Available 4
Analysis of Visual Question Answering Algorithms with attention model | May 4, 2023 | Question Answering; Visual Question Answering | Unverified 0
GAMIVAL: Video Quality Prediction on Mobile Cloud Gaming Content | May 3, 2023 | Video Quality Assessment; Visual Question Answering (VQA) | Code Available 0
Visual Reasoning: from State to Transformation | May 2, 2023 | Visual Question Answering (VQA); Visual Reasoning | Code Available 1
An Empirical Study of Multimodal Model Merging | Apr 28, 2023 | model; Retrieval | Code Available 1
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | Apr 28, 2023 | Instruction Following; model | Code Available 5
Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment | Apr 28, 2023 | Video Quality Assessment; Visual Question Answering (VQA) | Code Available 1
An Empirical Comparison of Optimizers for Quantum Machine Learning with SPSA-based Gradients | Apr 27, 2023 | Quantum Machine Learning; Visual Question Answering (VQA) | Unverified 0
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | Apr 27, 2023 | Visual Question Answering (VQA); Zero-Shot Video Question Answer | Code Available 4
A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering | Apr 26, 2023 | Decoder; Knowledge Distillation | Code Available 1
Making Video Quality Assessment Models Robust to Bit Depth | Apr 25, 2023 | Video Quality Assessment; Visual Question Answering (VQA) | Unverified 0
HDR-ChipQA: No-Reference Quality Assessment on High Dynamic Range Videos | Apr 25, 2023 | Video Quality Assessment; Visual Question Answering (VQA) | Unverified 0
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Apr 20, 2023 | Image Description; Language Modelling | Code Available 7
SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery | Apr 19, 2023 | Question Answering; Scene Segmentation | Code Available 1
Learning Situation Hyper-Graphs for Video Question Answering | Apr 18, 2023 | Decoder; Question Answering | Code Available 1
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Apr 17, 2023 | Audio captioning; Audio-Video Question Answering (AVQA) | Code Available 2
Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method | Apr 14, 2023 | Video Compression; Video Quality Assessment | Code Available 1
PDFVQA: A New Dataset for Real-World VQA on PDF Documents | Apr 13, 2023 | document understanding; Key Information Extraction | Unverified 0
Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment | Apr 13, 2023 | Video Quality Assessment; Visual Question Answering (VQA) | Code Available 1
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | Apr 10, 2023 | Image Retrieval; Phrase Grounding | Unverified 0
Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions | Apr 6, 2023 | In-Context Learning; Question Answering | Unverified 0
Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder | Apr 4, 2023 | Classification; Decoder | Unverified 0
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA | Apr 4, 2023 | Answer Generation; Language Modelling | Unverified 0
SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering | Apr 4, 2023 | counterfactual; Metric Learning | Unverified 0
Instance-Level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space | Apr 2, 2023 | Question Answering; Visual Question Answering | Unverified 0
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | Mar 29, 2023 | Cross-Modal Retrieval; Decoder | Code Available 0
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention | Mar 28, 2023 | Instruction Following; Language Modelling | Code Available 5
Unmasked Teacher: Towards Training-Efficient Video Foundation Models | Mar 28, 2023 | Action Classification; Action Recognition | Code Available 0
Curriculum Learning for Compositional Visual Reasoning | Mar 27, 2023 | Question Answering; Visual Question Answering | Unverified 0
MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos | Mar 27, 2023 | Video Quality Assessment; Visual Question Answering (VQA) | Code Available 1
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning | Mar 25, 2023 | Contrastive Learning; Question Answering | Code Available 1
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | Mar 23, 2023 | Auxiliary Learning; Multimodal Sentiment Analysis | Code Available 1
Top-Down Visual Attention from Analysis by Synthesis | Mar 23, 2023 | Retrieval; Semantic Segmentation | Code Available 1
CoBIT: A Contrastive Bi-directional Image-Text Generation Model | Mar 23, 2023 | Decoder; Image Generation | Unverified 0