Fine-Grained Semantically Aligned Vision-Language Pre-Training | Aug 4, 2022 | cross-modal alignment object-detection | Code Available | 1
SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding | Jul 27, 2022 | Visual Grounding | Code Available | 0
Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases | Jul 5, 2022 | Object Representation Learning | Unverified | 0
RoViST: Learning Robust Metrics for Visual Storytelling | Jul 1, 2022 | Sentence Text Generation | Code Available | 0
How direct is the link between words and images? | Jun 30, 2022 | Visual Grounding Word Embeddings | Unverified | 0
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Jun 30, 2022 | Language Modeling Language Modelling | Code Available | 1
Tell Me the Evidence? Dual Visual-Linguistic Interaction for Answer Grounding | Jun 21, 2022 | Decoder Question Answering | Unverified | 0
Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution | Jun 18, 2022 | Visual Grounding | Unverified | 0
Language with Vision: a Study on Grounded Word and Sentence Embeddings | Jun 17, 2022 | Sentence Sentence Embeddings | Code Available | 0
MixGen: A New Multi-Modal Data Augmentation | Jun 16, 2022 | Data Augmentation Image-text Retrieval | Code Available | 1
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer | Jun 14, 2022 | Visual Grounding | Code Available | 1
Referring Image Matting | Jun 10, 2022 | Domain Generalization Image Matting | Code Available | 2
Guiding Visual Question Answering with Attention Priors | May 25, 2022 | Question Answering Visual Grounding | Unverified | 0
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | May 24, 2022 | Computational Efficiency cross-modal alignment | Code Available | 1
Sim-To-Real Transfer of Visual Grounding for Human-Aided Ambiguity Resolution | May 24, 2022 | Domain Adaptation Visual Grounding | Unverified | 0
Weakly-supervised segmentation of referring expressions | May 10, 2022 | Image Segmentation Referring Expression | Unverified | 0
RoViST: Learning Robust Metrics for Visual Storytelling | May 8, 2022 | Sentence Text Generation | Code Available | 0
Flexible Visual Grounding | May 1, 2022 | Articles Visual Grounding | Code Available | 0
To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo | May 1, 2022 | Benchmarking Person-centric Visual Grounding | Code Available | 0
Attention as Grounding: Exploring Textual and Cross-Modal Attention on Entities and Relations in Language-and-Vision Transformer | May 1, 2022 | Text Generation Visual Grounding | Code Available | 0
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning | Apr 30, 2022 | Attribute Decoder | Code Available | 1
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection | Apr 13, 2022 | 3D visual grounding Visual Grounding | Code Available | 1
Multi-View Transformer for 3D Visual Grounding | Apr 5, 2022 | 3D visual grounding Visual Grounding | Code Available | 1
FindIt: Generalized Localization with Natural Language Queries | Mar 31, 2022 | Natural Language Queries Object | Unverified | 0
To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo | Mar 30, 2022 | Benchmarking Person-centric Visual Grounding | Code Available | 0
Collaborative Transformers for Grounded Situation Recognition | Mar 30, 2022 | Grounded Situation Recognition Image Classification | Code Available | 1
TubeDETR: Spatio-Temporal Video Grounding with Transformers | Mar 30, 2022 | Decoder Language-Based Temporal Localization | Code Available | 1
SeqTR: A Simple yet Universal Network for Visual Grounding | Mar 30, 2022 | Decoder Referring Expression | Code Available | 1
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding | Mar 29, 2022 | Multimodal Reasoning Visual Grounding | Code Available | 1
Word Discovery in Visually Grounded, Self-Supervised Speech Models | Mar 28, 2022 | Clustering Segmentation | Code Available | 1
Local-Global Context Aware Transformer for Language-Guided Video Segmentation | Mar 18, 2022 | Referring Expression Segmentation Referring Video Object Segmentation | Code Available | 1
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding | Mar 16, 2022 | Language Modelling Natural Language Queries | Code Available | 1
REX: Reasoning-aware and Grounded Explanation | Mar 11, 2022 | Decision Making Explanation Generation | Code Available | 1
Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding | Mar 10, 2022 | Object Visual Grounding | Unverified | 0
Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge | Feb 21, 2022 | Grounded language learning Image Retrieval | Unverified | 0
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling | Feb 7, 2022 | Language Modeling Language Modelling | Code Available | 1
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Feb 7, 2022 | Image Captioning image-classification | Code Available | 0
Multi-Modal Dynamic Graph Transformer for Visual Grounding | Jan 1, 2022 | Visual Grounding | Code Available | 1
3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds | Jan 1, 2022 | 3D dense captioning Attribute | Unverified | 0
Deconfounded Visual Grounding | Dec 31, 2021 | Referring Expression Visual Grounding | Code Available | 0
RoViST: Learning Robust Metrics for Visual Storytelling | Dec 17, 2021 | Sentence Text Generation | Unverified | 0
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision | Dec 14, 2021 | Contrastive Learning Representation Learning | Code Available | 1
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | Dec 2, 2021 | 3D dense captioning 3D visual grounding | Unverified | 0
Less is More: Generating Grounded Navigation Instructions from Landmarks | Nov 25, 2021 | Decoder Instruction Following | Unverified | 0
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling | Nov 23, 2021 | Image Captioning Image Description | Code Available | 1
Grounded Situation Recognition with Transformers | Nov 19, 2021 | Decoder Grounded Situation Recognition | Code Available | 1
Zero-Shot Visual Grounding of Referring Utterances in Dialogue | Nov 16, 2021 | Descriptive Visual Grounding | Unverified | 0
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | Nov 16, 2021 | Cross-Modal Retrieval Image Captioning | Code Available | 1
Attention as Grounding: Exploring Textual and Cross-Modal Attention on Entities and Relations in Language-and-Vision Transformer | Oct 16, 2021 | Text Generation Visual Grounding | Unverified | 0
Efficient Multi-Modal Embeddings from Structured Data | Oct 6, 2021 | Semantic Similarity Semantic Textual Similarity | Unverified | 0