Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks | Jul 30, 2024 | Tasks: Visual Question Answering (VQA) | Code available (0)
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering | Jul 30, 2024 | Tasks: Code Generation, Question Answering | Unverified (0)
Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy | Jul 30, 2024 | Tasks: 4k, Video Quality Assessment | Unverified (0)
Take A Step Back: Rethinking the Two Stages in Visual Reasoning | Jul 29, 2024 | Tasks: Logical Reasoning, Question Answering | Unverified (0)
Multi-label Cluster Discrimination for Visual Representation Learning | Jul 24, 2024 | Tasks: Contrastive Learning, Image-text Retrieval | Code available (4)
Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos | Jul 23, 2024 | Tasks: Image Generation, Point Tracking | Code available (2)
Improved Few-Shot Image Classification Through Multiple-Choice Questions | Jul 23, 2024 | Tasks: Articles, Few-Shot Image Classification | Unverified (0)
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models | Jul 22, 2024 | Tasks: Question Answering, Representation Learning | Unverified (0)
QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View | Jul 18, 2024 | Tasks: Action Anticipation, Action Recognition | Code available (0)
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark | Jul 18, 2024 | Tasks: GPU, Image Retrieval | Code available (1)
Multimodal Reranking for Knowledge-Intensive Visual Question Answering | Jul 17, 2024 | Tasks: Answer Generation, Question Answering | Unverified (0)
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data | Jul 17, 2024 | Tasks: Question Answering, Visual Question Answering | Unverified (0)
EchoSight: Advancing Visual-Language Models with Wiki Knowledge | Jul 17, 2024 | Tasks: Articles, Question Answering | Unverified (0)
ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment | Jul 16, 2024 | Tasks: Optical Flow Estimation, Video Compression | Code available (1)
TM-PATHVQA:90000+ Textless Multilingual Questions for Medical Visual Question Answering | Jul 16, 2024 | Tasks: Medical Visual Question Answering, Question Answering | Unverified (0)
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers | Jul 12, 2024 | Tasks: Articles, Question Answering | Code available (2)
Segmentation-guided Attention for Visual Question Answering from Remote Sensing Images | Jul 11, 2024 | Tasks: Question Answering, Segmentation | Unverified (0)
Extracting Training Data from Document-Based VQA Models | Jul 11, 2024 | Tasks: Memorization, Question Answering | Unverified (0)
VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving | Jul 9, 2024 | Tasks: Autonomous Driving, Image to 3D | Unverified (0)
Large Language Models Understand Layout | Jul 8, 2024 | Tasks: Question Answering, Visual Question Answering | Code available (0)
WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering | Jul 8, 2024 | Tasks: Diagnostic, Generative Visual Question Answering | Code available (2)
CLIPVQA:Video Quality Assessment via CLIP | Jul 6, 2024 | Tasks: Video Quality Assessment, Visual Question Answering (VQA) | Code available (0)
FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding | Jul 6, 2024 | Tasks: Optical Character Recognition (OCR), Visual Question Answering (VQA) | Code available (1)
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models | Jul 6, 2024 | Tasks: Medical Diagnosis, RAG | Code available (2)
Black-box Model Ensembling for Textual and Visual Question Answering via Information Fusion | Jul 4, 2024 | Tasks: Question Answering, Visual Question Answering | Code available (0)
MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis | Jul 4, 2024 | Tasks: Diagnostic, Language Modeling | Code available (2)
Visual Robustness Benchmark for Visual Question Answering (VQA) | Jul 3, 2024 | Tasks: Visual Question Answering, Visual Question Answering (VQA) | Code available (0)
MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis | Jul 3, 2024 | Tasks: Position, Question Answering | Unverified (0)
BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs | Jul 3, 2024 | Tasks: Image Captioning, Image Generation | Unverified (0)
D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions | Jul 2, 2024 | Tasks: Diagnostic, Instruction Following | Unverified (0)
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding | Jul 2, 2024 | Tasks: document understanding, Key Information Extraction | Code available (2)
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness | Jul 2, 2024 | Tasks: Image Captioning, Question Answering | Unverified (0)
https://arxiv.org/abs/2407.00634 | Jul 2, 2024 | Tasks: Video Captioning, Video Description | Code available (0)
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding | Jul 1, 2024 | Tasks: Cell Detection, Classification | Code available (0)
Hierarchical Memory for Long Video QA | Jun 30, 2024 | Tasks: GPU, Question Answering | Unverified (0)
Tarsier: Recipes for Training and Evaluating Large Video Description Models | Jun 30, 2024 | Tasks: Video Captioning, Video Description | Code available (4)
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis | Jun 28, 2024 | Tasks: Visual Question Answering (VQA), Visual Reasoning | Code available (1)
STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering | Jun 28, 2024 | Tasks: Medical Diagnosis, Medical Question Answering | Code available (1)
SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs | Jun 28, 2024 | Tasks: RAG, Retrieval-augmented Generation | Unverified (0)
Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA | Jun 27, 2024 | Tasks: General Knowledge, Question Answering | Unverified (0)
Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation | Jun 27, 2024 | Tasks: Continual Learning, Question Answering | Code available (0)
RAVEN: Multitask Retrieval Augmented Vision-Language Learning | Jun 27, 2024 | Tasks: Image Captioning, RAG | Unverified (0)
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale | Jun 27, 2024 | Tasks: Visual Question Answering (VQA) | Code available (3)
On the Role of Visual Grounding in VQA | Jun 26, 2024 | Tasks: Visual Grounding, Visual Question Answering (VQA) | Unverified (0)
MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs | Jun 24, 2024 | Tasks: Question Answering, Visual Question Answering | Unverified (0)
Long Context Transfer from Language to Vision | Jun 24, 2024 | Tasks: Language Modeling, Language Modelling | Code available (4)
Priorformer: A UGC-VQA Method with content and distortion priors | Jun 24, 2024 | Tasks: Video Quality Assessment, Visual Question Answering (VQA) | Unverified (0)
Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts | Jun 24, 2024 | Tasks: Mathematical Reasoning, Visual Question Answering (VQA) | Unverified (0)
Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis | Jun 21, 2024 | Tasks: Attribute, Medical Visual Question Answering | Unverified (0)
VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning | Jun 20, 2024 | Tasks: Image Comprehension, Question Answering | Code available (0)