Fine-Grained Spatial and Verbal Losses for 3D Visual Grounding (Nov 5, 2024). Tasks: 3D visual grounding, Visual Grounding. Code: unverified.
Phrase Decoupling Cross-Modal Hierarchical Matching and Progressive Position Correction for Visual Grounding (Oct 31, 2024). Tasks: Object, Position. Code: available.
Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding (Oct 31, 2024). Tasks: parameter-efficient fine-tuning, Visual Grounding. Code: unverified.
Few-Shot Multimodal Explanation for Visual Question Answering (Oct 28, 2024). Tasks: Explainable artificial intelligence, Explainable Artificial Intelligence (XAI). Code: available.
Joint Top-Down and Bottom-Up Frameworks for 3D Visual Grounding (Oct 21, 2024). Tasks: 3D visual grounding, Object. Code: unverified.
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models (Oct 21, 2024). Tasks: Instruction Following, object-detection. Code: unverified.
Context-Infused Visual Grounding for Art (Oct 16, 2024). Tasks: object-detection, Object Detection. Code: available.
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs (Oct 16, 2024). Tasks: Visual Grounding. Code: available.
Learning to Ground VLMs without Forgetting (Oct 14, 2024). Tasks: Decoder, Language Modelling. Code: unverified.
Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics (Oct 10, 2024). Tasks: Visual Grounding. Code: unverified.
GRAPPA: Generalizing and Adapting Robot Policies via Online Agentic Guidance (Oct 9, 2024). Tasks: Visual Grounding. Code: unverified.
Context-Aware Command Understanding for Tabletop Scenarios (Oct 8, 2024). Tasks: Decision Making, Visual Grounding. Code: unverified.
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks (Oct 7, 2024). Tasks: Information Retrieval, Language Modeling. Code: unverified.
Adaptive Masking Enhances Visual Grounding (Oct 4, 2024). Tasks: Few-Shot Learning, Visual Grounding. Code: available.
World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering (Sep 30, 2024). Tasks: Optical Character Recognition (OCR), Question Answering. Code: available.
Individuation in Neural Models with and without Visual Grounding (Sep 27, 2024). Tasks: Visual Grounding. Code: unverified.
ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue (Sep 26, 2024). Tasks: Medical Visual Question Answering, Question Answering. Code: unverified.
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models (Sep 16, 2024). Tasks: Attribute, Decoder. Code: available.
Bayesian Self-Training for Semi-Supervised 3D Segmentation (Sep 12, 2024). Tasks: 3D Instance Segmentation, 3D Semantic Segmentation. Code: unverified.
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling (Sep 9, 2024). Tasks: Language Modeling, Language Modelling. Code: available.
Visual Prompting in Multimodal Large Language Models: A Survey (Sep 5, 2024). Tasks: In-Context Learning, Prompt Learning. Code: unverified.
NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar (Aug 30, 2024). Tasks: Autonomous Driving, Visual Grounding. Code: unverified.
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding (Aug 29, 2024). Tasks: Data Augmentation, Image Generation. Code: available.
M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation (Aug 29, 2024). Tasks: Instruction Following, Medical Report Generation. Code: unverified.
MMR: Evaluating Reading Ability of Large Multimodal Models (Aug 26, 2024). Tasks: Font Recognition, MMR total. Code: unverified.
Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models (Aug 15, 2024). Tasks: Pose Estimation, Visual Grounding. Code: unverified.
Task-oriented Sequential Grounding in 3D Scenes (Aug 7, 2024). Tasks: 3D visual grounding, Visual Grounding. Code: unverified.
UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models (Jul 25, 2024). Tasks: Computational Efficiency, Question Answering. Code: unverified.
Unveiling and Mitigating Bias in Audio Visual Segmentation (Jul 23, 2024). Tasks: Attribute, Visual Grounding. Code: unverified.
PD-APE: A Parallel Decoding Framework with Adaptive Position Encoding for 3D Visual Grounding (Jul 19, 2024). Tasks: 3D visual grounding, Attribute. Code: unverified.
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models (Jul 18, 2024). Tasks: 3D Semantic Segmentation, Semantic Segmentation. Code: unverified.
Learning Visual Grounding from Generative Vision and Language Model (Jul 18, 2024). Tasks: Attribute, Language Modeling. Code: unverified.
VIMI: Grounding Video Generation through Multi-modal Instruction (Jul 8, 2024). Tasks: Text-to-Video Generation, Video Generation. Code: unverified.
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model (Jul 7, 2024). Tasks: Segmentation, Sentence. Code: available.
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition (Jul 5, 2024). Tasks: Visual Grounding, Visual Storytelling. Code: available.
Second Place Solution of WSDM2023 Toloka Visual Question Answering Challenge (Jul 5, 2024). Tasks: Cross-Modal Retrieval, Question Answering. Code: unverified.
Smart Vision-Language Reasoners (Jul 5, 2024). Tasks: Math, Mathematical Reasoning. Code: available.
ACTRESS: Active Retraining for Semi-supervised Visual Grounding (Jul 3, 2024). Tasks: Binary Classification, Visual Grounding. Code: unverified.
Visual Grounding with Attention-Driven Constraint Balancing (Jul 3, 2024). Tasks: Object, object-detection. Code: unverified.
The Solution for the ICCV 2023 Perception Test Challenge 2023 -- Task 6 -- Grounded videoQA (Jul 2, 2024). Tasks: Grounded Video Question Answering, Object Tracking. Code: unverified.
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities (Jul 1, 2024). Tasks: 3D visual grounding, Language Modeling. Code: unverified.
From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models (Jun 28, 2024). Tasks: Diversity, Retrieval. Code: unverified.
FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts (Jun 27, 2024). Tasks: Decision Making, Logical Reasoning. Code: unverified.
Towards Open-World Grasping with Large Vision-Language Models (Jun 26, 2024). Tasks: Robotic Grasping, Visual Grounding. Code: unverified.
On the Role of Visual Grounding in VQA (Jun 26, 2024). Tasks: Visual Grounding, Visual Question Answering (VQA). Code: unverified.
Visually Consistent Hierarchical Image Classification (Jun 17, 2024). Tasks: Classification, image-classification. Code: unverified.
Learning Language Structures through Grounding (Jun 14, 2024). Tasks: Automatic Speech Recognition, Dependency Parsing. Code: unverified.
Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding (Jun 13, 2024). Tasks: 3D visual grounding, Attribute. Code: unverified.
HPE-CogVLM: Advancing Vision Language Models with a Head Pose Grounding Task (Jun 4, 2024). Tasks: Head Pose Estimation, Language Modelling. Code: unverified.
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model (Jun 1, 2024). Tasks: Action Recognition, Activity Recognition. Code: unverified.