Fine-Grained Semantically Aligned Vision-Language Pre-Training Aug 4, 2022 cross-modal alignment object-detection
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations Jun 30, 2022 Language Modeling Language Modelling
MixGen: A New Multi-Modal Data Augmentation Jun 16, 2022 Data Augmentation Image-text Retrieval
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer Jun 14, 2022 Visual Grounding
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections May 24, 2022 Computational Efficiency cross-modal alignment
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning Apr 30, 2022 Attribute Decoder
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection Apr 13, 2022 3D visual grounding Visual Grounding
Multi-View Transformer for 3D Visual Grounding Apr 5, 2022 3D visual grounding Visual Grounding
SeqTR: A Simple yet Universal Network for Visual Grounding Mar 30, 2022 Decoder Referring Expression
Collaborative Transformers for Grounded Situation Recognition Mar 30, 2022 Grounded Situation Recognition Image Classification
TubeDETR: Spatio-Temporal Video Grounding with Transformers Mar 30, 2022 Decoder Language-Based Temporal Localization
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding Mar 29, 2022 Multimodal Reasoning Visual Grounding
Word Discovery in Visually Grounded, Self-Supervised Speech Models Mar 28, 2022 Clustering Segmentation
Local-Global Context Aware Transformer for Language-Guided Video Segmentation Mar 18, 2022 Referring Expression Segmentation Referring Video Object Segmentation
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding Mar 16, 2022 Language Modelling Natural Language Queries
REX: Reasoning-aware and Grounded Explanation Mar 11, 2022 Decision Making Explanation Generation
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling Feb 7, 2022 Language Modeling Language Modelling
Multi-Modal Dynamic Graph Transformer for Visual Grounding Jan 1, 2022 Visual Grounding
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision Dec 14, 2021 Contrastive Learning Representation Learning
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling Nov 23, 2021 Image Captioning Image Description
Grounded Situation Recognition with Transformers Nov 19, 2021 Decoder Grounded Situation Recognition
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts Nov 16, 2021 Cross-Modal Retrieval Image Captioning
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models Sep 24, 2021 Visual Grounding
Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation Sep 17, 2021 Dialogue Generation Visual Grounding
Panoptic Narrative Grounding Sep 10, 2021 Natural Language Visual Grounding Panoptic Segmentation
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer Jul 6, 2021 Image Retrieval Knowledge Distillation
Referring Transformer: A One-step Approach to Multi-task Visual Grounding Jun 6, 2021 Decoder Referring Expression
SAT: 2D Semantics Assisted Training for 3D Visual Grounding May 24, 2021 3D visual grounding Object
Connecting What to Say With Where to Look by Modeling Human Attention Traces May 12, 2021 Caption Generation Image Captioning
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding Apr 26, 2021 Generalized Referring Expression Comprehension Phrase Grounding
TransVG: End-to-End Visual Grounding with Transformers Apr 17, 2021 Referring Expression Comprehension Visual Grounding
Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding Apr 9, 2021 Descriptive Object
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation Apr 5, 2021 Object Visual Grounding
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding Mar 24, 2021 Object Relation
Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images Mar 14, 2021 3D visual grounding Object
OCID-Ref: A 3D Robotic Dataset with Embodied Language for Clutter Scene Grounding Mar 13, 2021 Referring Expression Referring Expression Segmentation
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring Mar 1, 2021 3D visual grounding Attribute
Panoptic Narrative Grounding Jan 1, 2021 Natural Language Visual Grounding Panoptic Segmentation
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units Dec 31, 2020 Image Captioning Speech Synthesis
Text-to-Image Generation Grounded by Fine-Grained User Attention Nov 7, 2020 Image Generation Position
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers Sep 23, 2020 Image Captioning Image Generation
Improving One-stage Visual Grounding by Recursive Sub-query Construction Aug 3, 2020 Sentence Sentence Embedding
Spatially Aware Multimodal Transformers for TextVQA Jul 23, 2020 Optical Character Recognition (OCR) Spatial Reasoning
Visual Relation Grounding in Videos Jul 17, 2020 Question Answering Relation
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation Jul 3, 2020 Contrastive Learning Knowledge Distillation
Visual Grounding of Learned Physical Models Apr 28, 2020 Visual Grounding
Deep Multimodal Neural Architecture Search Apr 25, 2020 Decoder Image-text matching
Visual Grounding Methods for VQA are Working for the Wrong Reasons! Apr 12, 2020 Question Answering Visual Grounding
Visual Grounding in Video for Unsupervised Word Translation Mar 11, 2020 Translation Visual Grounding
Guessing State Tracking for Visual Dialogue Feb 24, 2020 Visual Grounding