Paper | Date | Tasks | Code status | Count
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding | May 19, 2023 | Sentence, Visual Grounding | Unverified | 0
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding | May 18, 2023 | Contrastive Learning, Object | Unverified | 0
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding | May 15, 2023 | Diversity, Transfer Learning | Code Available | 1
Sample-Specific Debiasing for Better Image-Text Models | Apr 25, 2023 | Contrastive Learning, Cross-Modal Retrieval | Unverified | 0
Movie Box Office Prediction With Self-Supervised and Visually Grounded Pretraining | Apr 20, 2023 | Visual Grounding | Unverified | 0
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language | Apr 12, 2023 | 3D visual grounding, Autonomous Driving | Code Available | 0
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance | Mar 29, 2023 | 3D visual grounding, Visual Grounding | Code Available | 1
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding | Mar 23, 2023 | 3D visual grounding, Visual Grounding | Code Available | 0
Joint Visual Grounding and Tracking with Natural Language Specification | Mar 21, 2023 | Visual Grounding, Visual Tracking | Code Available | 1
Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment | Mar 14, 2023 | Medical Image Analysis, Phrase Grounding | Unverified | 0
Parallel Vertex Diffusion for Unified Visual Grounding | Mar 13, 2023 | Visual Grounding | Unverified | 0
Focusing On Targets For Improving Weakly Supervised Visual Grounding | Feb 22, 2023 | Dependency Parsing, Object | Unverified | 0
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Feb 1, 2023 | Action Classification, Image Classification | Code Available | 4
Champion Solution for the WSDM2023 Toloka VQA Challenge | Jan 22, 2023 | Question Answering, Visual Grounding | Code Available | 3
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks | Jan 12, 2023 | Cross-Modal Retrieval, Open-Ended Question Answering | Code Available | 0
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding | Jan 1, 2023 | 3D visual grounding, Visual Grounding | Unverified | 0
CoSign: Exploring Co-occurrence Signals in Skeleton-based Continuous Sign Language Recognition | Jan 1, 2023 | Sign Language Recognition, Visual Grounding | Unverified | 0
Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding | Jan 1, 2023 | Descriptive, Object | Code Available | 1
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Jan 1, 2023 | 3D dense captioning, 3D visual grounding | Code Available | 1
Dynamic Inference With Grounding Based Vision and Language Models | Jan 1, 2023 | Language Modelling, Referring Expression | Unverified | 0
GAFNet: A Global Fourier Self Attention Based Novel Network for multi-modal downstream tasks | Jan 1, 2023 | Image Generation, Image-text Retrieval | Unverified | 0
Position-guided Text Prompt for Vision-Language Pre-training | Dec 19, 2022 | Cross-Modal Retrieval, Image Captioning | Code Available | 1
Using Multiple Instance Learning to Build Multimodal Representations | Dec 11, 2022 | Contrastive Learning, Cross-Modal Retrieval | Unverified | 0
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding | Dec 1, 2022 | 3D dense captioning, 3D visual grounding | Unverified | 0
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding | Nov 28, 2022 | object-detection, Object Detection | Code Available | 1
MNER-QG: An End-to-End MRC framework for Multimodal Named Entity Recognition with Query Grounding | Nov 27, 2022 | named-entity-recognition, Named Entity Recognition | Unverified | 0
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding | Nov 25, 2022 | 3D visual grounding, Knowledge Distillation | Code Available | 1
X^2-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | Nov 22, 2022 | All, Cross-Modal Retrieval | Code Available | 2
A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation, Factual Visual Question Answering | Unverified | 0
YORO -- Lightweight End to End Visual Grounding | Nov 15, 2022 | Natural Language Queries, Visual Grounding | Code Available | 1
Visually Grounded VQA by Lattice-based Retrieval | Nov 15, 2022 | Information Retrieval, Question Answering | Code Available | 0
Are Current Decoding Strategies Capable of Facing the Challenges of Visual Dialogue? | Oct 24, 2022 | Informativeness, Text Generation | Unverified | 0
Instruction-Following Agents with Multimodal Transformer | Oct 24, 2022 | Instruction Following, Visual Grounding | Code Available | 1
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | Oct 23, 2022 | Image Captioning, Image-text Retrieval | Unverified | 0
A Visual Tour Of Current Challenges In Multimodal Language Models | Oct 22, 2022 | Image Generation, Text to Image Generation | Unverified | 0
Learning Point-Language Hierarchical Alignment for 3D Visual Grounding | Oct 22, 2022 | 3D visual grounding, Sentence | Code Available | 1
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends | Oct 17, 2022 | Few-Shot Learning, Image Captioning | Code Available | 3
Like a bilingual baby: The advantage of visually grounding a bilingual language model | Oct 11, 2022 | Language Modeling, Language Modelling | Unverified | 0
YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding | Oct 10, 2022 | Visual Grounding | Unverified | 0
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning | Oct 9, 2022 | Image-text Retrieval, multimodal interaction | Unverified | 0
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach | Oct 3, 2022 | Referring Expression, Robot Manipulation | Code Available | 0
Cost-Effective Language Driven Image Editing with LX-DRIM | Oct 1, 2022 | Visual Grounding | Code Available | 0
GRAVL-BERT: Graphical Visual-Linguistic Representations for Multimodal Coreference Resolution | Oct 1, 2022 | coreference-resolution, Coreference Resolution | Code Available | 1
Differentiable Parsing and Visual Grounding of Natural Language Instructions for Object Placement | Oct 1, 2022 | Graph Neural Network, Object | Unverified | 0
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding | Sep 29, 2022 | 3D visual grounding, Object | Code Available | 1
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding | Sep 28, 2022 | Decoder, Visual Grounding | Unverified | 0
Introspective Learning : A Two-Stage Approach for Inference in Neural Networks | Sep 17, 2022 | Active Learning, Decision Making | Code Available | 0
Visual Grounding of Inter-lingual Word-Embeddings | Sep 8, 2022 | Visual Grounding, Word Embeddings | Unverified | 0
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Aug 29, 2022 | cross-modal alignment, Image-text Retrieval | Code Available | 1
VLMAE: Vision-Language Masked Autoencoder | Aug 19, 2022 | Image-text Retrieval, Language Modeling | Unverified | 0