UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | Dec 25, 2023 | Image Segmentation, Object | Code Available | 2
Mask Grounding for Referring Image Segmentation | Dec 19, 2023 | cross-modal alignment, Image Segmentation | Code Available | 1
GSVA: Generalized Segmentation via Multimodal Large Language Models | Dec 15, 2023 | Decoder, Generalized Referring Expression Segmentation | Code Available | 1
General Object Foundation Model for Images and Videos at Scale | Dec 14, 2023 | Instance Segmentation, Long-tail Video Object Segmentation | Code Available | 3
EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment | Dec 13, 2023 | Decoder, Depth Estimation | Code Available | 1
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Dec 13, 2023 | Descriptive, Object | Code Available | 1
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | Dec 8, 2023 | Image Captioning, object-detection | Unverified | 0
Universal Segmentation at Arbitrary Granularity with Language Instruction | Dec 4, 2023 | Referring Expression Segmentation, Segmentation | Code Available | 2
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation | Nov 30, 2023 | Image Captioning, Referring Expression | Code Available | 0
NExT-Chat: An LMM for Chat, Detection and Segmentation | Nov 8, 2023 | Referring Expression, Referring Expression Segmentation | Code Available | 2
GLaMM: Pixel Grounding Large Multimodal Model | Nov 6, 2023 | Conversational Question Answering, Image Captioning | Code Available | 2
Towards Omni-supervised Referring Expression Segmentation | Nov 1, 2023 | Referring Expression, Referring Expression Segmentation | Code Available | 0
CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation | Sep 17, 2023 | Decoder, Referring Expression | Unverified | 0
Tracking Anything with Decoupled Video Segmentation | Sep 7, 2023 | Open-Vocabulary Video Segmentation, Open-World Video Segmentation | Code Available | 3
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation | Aug 31, 2023 | Navigate, Referring Expression | Code Available | 1
Referring Image Segmentation Using Text Supervision | Aug 28, 2023 | Image Segmentation, Object Localization | Code Available | 1
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Aug 24, 2023 | Chart Question Answering, FS-MEVQA | Code Available | 5
EAVL: Explicitly Align Vision and Language for Referring Image Segmentation | Aug 18, 2023 | Image Segmentation, Referring Expression Segmentation | Unverified | 0
Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation | Aug 8, 2023 | Contrastive Learning, Object | Code Available | 0
Spectrum-guided Multi-granularity Referring Video Object Segmentation | Jul 25, 2023 | Object, Referring Expression Segmentation | Code Available | 1
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation | Jul 21, 2023 | Decoder, Image Segmentation | Code Available | 1
OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation | Jul 18, 2023 | Referring Expression Segmentation, Referring Video Object Segmentation | Code Available | 1
Hierarchical Open-vocabulary Universal Image Segmentation | Jul 3, 2023 | Image Comprehension, Image Segmentation | Code Available | 2
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic | Jun 27, 2023 | Image Captioning, Referring Expression Segmentation | Code Available | 2
WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation | Jun 19, 2023 | cross-modal alignment, Image Segmentation | Unverified | 0
Extending CLIP's Image-Text Alignment to Referring Image Segmentation | Jun 14, 2023 | Image Segmentation, Referring Expression Segmentation | Unverified | 0
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation | Jun 14, 2023 | Referring Expression Segmentation, Referring Video Object Segmentation | Code Available | 1
GRES: Generalized Referring Expression Segmentation | Jun 1, 2023 | Generalized Referring Expression Segmentation, Referring Expression | Code Available | 2
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | May 26, 2023 | cross-modal alignment, Object | Code Available | 1
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation | May 25, 2023 | Object, Referring Expression Segmentation | Code Available | 1
Advancing Referring Expression Segmentation Beyond Single Image | May 21, 2023 | Co-Salient Object Detection, Object | Code Available | 1
Meta Compositional Referring Expression Segmentation | Apr 10, 2023 | Meta-Learning, Referring Expression | Unverified | 0
Zero-shot Referring Image Segmentation with Global-Local Context Features | Mar 31, 2023 | Image Segmentation, Referring Expression | Code Available | 1
Universal Instance Perception as Object Discovery and Retrieval | Mar 12, 2023 | Described Object Detection, Generalized Referring Expression Comprehension | Code Available | 3
Unleashing Text-to-Image Diffusion Models for Visual Perception | Mar 3, 2023 | Denoising, Depth Estimation | Code Available | 2
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | Feb 14, 2023 | Decoder, Image Segmentation | Code Available | 1
Segment Every Reference Object in Spatial and Temporal Spaces | Jan 1, 2023 | Image Segmentation, Object | Unverified | 0
Learning To Segment Every Referring Object Point by Point | Jan 1, 2023 | Object, Referring Expression | Code Available | 0
Generalized Decoding for Pixel, Image, and Language | Dec 21, 2022 | Decoder, Image Segmentation | Code Available | 3
Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning | Dec 17, 2022 | Position, Referring Expression | Unverified | 0
A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation | Nov 15, 2022 | Reference Expression Generation, Referring Expression | Unverified | 0
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | Oct 28, 2022 | Referring Expression Segmentation, Referring Video Object Segmentation | Code Available | 2
Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Sep 21, 2022 | Action Detection, Action Recognition | Code Available | 0
Multi-Attention Network for Compressed Video Referring Object Segmentation | Jul 26, 2022 | Object, Referring Expression Segmentation | Code Available | 1
Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus | Jul 4, 2022 | Referring Expression Segmentation, Referring Video Object Segmentation | Code Available | 1
GLIPv2: Unifying Localization and Vision-Language Understanding | Jun 12, 2022 | 2D Object Detection, Contrastive Learning | Code Available | 4
Weakly-supervised segmentation of referring expressions | May 10, 2022 | Image Segmentation, Referring Expression | Unverified | 0
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation | Apr 6, 2022 | Optical Flow Estimation, Referring Expression Segmentation | Code Available | 1
ReSTR: Convolution-free Referring Image Segmentation Using Transformers | Mar 31, 2022 | Image Segmentation, Referring Expression Segmentation | Unverified | 0
SeqTR: A Simple yet Universal Network for Visual Grounding | Mar 30, 2022 | Decoder, Referring Expression | Code Available | 1