| Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | Jun 25, 2023 | Diversity, Image-text Retrieval | Unverified | 0 |
| Language Adaptive Weight Generation for Multi-task Visual Grounding | Jun 6, 2023 | Referring Expression, Referring Expression Comprehension | Code Available | 0 |
| Referring Expression Comprehension Using Language Adaptive Inference | Jun 6, 2023 | object-detection, Object Detection | Code Available | 0 |
| Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | May 25, 2023 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension | Feb 17, 2023 | Referring Expression, Referring Expression Comprehension | Code Available | 0 |
| RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension | Jan 1, 2023 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| Dynamic Inference With Grounding Based Vision and Language Models | Jan 1, 2023 | Language Modelling, Referring Expression | Unverified | 0 |
| RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension | Jan 1, 2023 | Imitation Learning, Pseudo Label | Unverified | 0 |
| Scene-Text Oriented Referring Expression Comprehension | Nov 4, 2022 | Object Localization, Referring Expression | Code Available | 0 |
| Video Referring Expression Comprehension via Transformer with Content-aware Query | Oct 6, 2022 | cross-modal alignment, Referring Expression | Unverified | 0 |