| Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | Dec 4, 2023 | Image to textobject-detection | —Unverified | 0 |
| Continual Referring Expression Comprehension via Dual Modular Memorization | Nov 25, 2023 | MemorizationReferring Expression | CodeCode Available | 0 |
| Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models | Nov 24, 2023 | AllReferring Expression | —Unverified | 0 |
| Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Nov 21, 2023 | Image SegmentationLanguage Modelling | CodeCode Available | 0 |
| CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | Nov 6, 2023 | CoLAQuestion Answering | —Unverified | 0 |
| Video Referring Expression Comprehension via Transformer with Content-conditioned Query | Oct 25, 2023 | cross-modal alignmentReferring Expression | —Unverified | 0 |
| Collecting Visually-Grounded Dialogue with A Game Of Sorts | Sep 10, 2023 | Coreference ResolutionImage Retrieval | CodeCode Available | 0 |
| HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks | Aug 24, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 0 |
| Whether you can locate or not? Interactive Referring Expression Generation | Aug 19, 2023 | Referring ExpressionReferring Expression Comprehension | CodeCode Available | 0 |
| Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks | Jul 14, 2023 | ObjectReferring Expression | —Unverified | 0 |
| Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | Jun 25, 2023 | DiversityImage-text Retrieval | —Unverified | 0 |
| Language Adaptive Weight Generation for Multi-task Visual Grounding | Jun 6, 2023 | Referring ExpressionReferring Expression Comprehension | CodeCode Available | 0 |
| Referring Expression Comprehension Using Language Adaptive Inference | Jun 6, 2023 | object-detectionObject Detection | CodeCode Available | 0 |
| Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | May 25, 2023 | 3D Object DetectionAutonomous Driving | —Unverified | 0 |
| CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension | Feb 17, 2023 | Referring ExpressionReferring Expression Comprehension | CodeCode Available | 0 |
| RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension | Jan 1, 2023 | Referring ExpressionReferring Expression Comprehension | —Unverified | 0 |
| Dynamic Inference With Grounding Based Vision and Language Models | Jan 1, 2023 | Language ModellingReferring Expression | —Unverified | 0 |
| RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension | Jan 1, 2023 | Imitation LearningPseudo Label | —Unverified | 0 |
| Scene-Text Oriented Reffering Expression Comprehension | Nov 4, 2022 | Object LocalizationReferring Expression | CodeCode Available | 0 |
| Video Referring Expression Comprehension via Transformer with Content-aware Query | Oct 6, 2022 | cross-modal alignmentReferring Expression | —Unverified | 0 |
| Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Sep 21, 2022 | Action DetectionAction Recognition | CodeCode Available | 0 |
| One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning | Jul 31, 2022 | AllReferring Expression | —Unverified | 0 |
| Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks | Jun 17, 2022 | Depth EstimationImage Generation | —Unverified | 0 |
| RefCrowd: Grounding the Target in Crowd with Referring Expressions | Jun 16, 2022 | AttributeReferring Expression | —Unverified | 0 |
| Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension | Apr 21, 2022 | DiversityInformativeness | —Unverified | 0 |