| Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Mar 9, 2023 | Decoder, Object Detection | Code Available | 5 |
| CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension | Feb 17, 2023 | Referring Expression, Referring Expression Comprehension | Code Available | 0 |
| Learning To Segment Every Referring Object Point by Point | Jan 1, 2023 | Object, Referring Expression | Code Available | 0 |
| RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension | Jan 1, 2023 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension | Jan 1, 2023 | Imitation Learning, Pseudo Label | Unverified | 0 |
| Dynamic Inference With Grounding Based Vision and Language Models | Jan 1, 2023 | Language Modelling, Referring Expression | Unverified | 0 |
| Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning | Dec 17, 2022 | Position, Referring Expression | Unverified | 0 |
| Layout-aware Dreamer for Embodied Referring Expression Grounding | Nov 30, 2022 | Common Sense Reasoning, Navigate | Code Available | 1 |
| A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation | Nov 15, 2022 | Reference Expression Generation, Referring Expression | Unverified | 0 |
| Scene-Text Oriented Referring Expression Comprehension | Nov 4, 2022 | Object Localization, Referring Expression | Code Available | 0 |
| TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation | Oct 19, 2022 | Instance Segmentation, Referring Expression | Code Available | 1 |
| SQA3D: Situated Question Answering in 3D Scenes | Oct 14, 2022 | Question Answering, Referring Expression | Code Available | 1 |
| Assessing Neural Referential Form Selectors on a Realistic Multilingual Dataset | Oct 10, 2022 | Form, Referring Expression | Unverified | 0 |
| VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment | Oct 9, 2022 | Object Detection | Code Available | 1 |
| Video Referring Expression Comprehension via Transformer with Content-aware Query | Oct 6, 2022 | cross-modal alignment, Referring Expression | Unverified | 0 |
| Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach | Oct 3, 2022 | Referring Expression, Robot Manipulation | Code Available | 0 |
| Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Sep 21, 2022 | Action Detection, Action Recognition | Code Available | 0 |
| Learning to Evaluate Performance of Multi-modal Semantic Localization | Sep 14, 2022 | Cross-Modal Retrieval, Referring Expression | Code Available | 1 |
| One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning | Jul 31, 2022 | All, Referring Expression | Unverified | 0 |
| Correspondence Matters for Video Referring Expression Comprehension | Jul 21, 2022 | Contrastive Learning, Referring Expression | Code Available | 1 |
| Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding | Jul 18, 2022 | Attribute, Referring Expression | Code Available | 0 |
| Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Jun 30, 2022 | Language Modelling | Code Available | 1 |
| Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks | Jun 17, 2022 | Depth Estimation, Image Generation | Unverified | 0 |
| RefCrowd: Grounding the Target in Crowd with Referring Expressions | Jun 16, 2022 | Attribute, Referring Expression | Unverified | 0 |
| Constructing Distributions of Variation in Referring Expression Type from Corpora for Model Evaluation | Jun 1, 2022 | Referring Expression | Unverified | 0 |
| PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | May 23, 2022 | Language Modelling | Code Available | 1 |
| Referring Expressions with Rational Speech Act Framework: A Probabilistic Approach | May 16, 2022 | Deep Learning, Referring Expression | Unverified | 0 |
| Weakly-supervised segmentation of referring expressions | May 10, 2022 | Image Segmentation, Referring Expression | Unverified | 0 |
| HOLM: Hallucinating Objects with Language Models for Referring Expression Recognition in Partially-Observed Scenes | May 1, 2022 | Referring Expression | Unverified | 0 |
| GRIT: General Robust Image Task Benchmark | Apr 28, 2022 | Instance Segmentation, Keypoint Detection | Code Available | 1 |
| Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension | Apr 21, 2022 | Diversity, Informativeness | Unverified | 0 |
| A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension | Apr 17, 2022 | Data Augmentation, Referring Expression | Code Available | 1 |
| The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts | Apr 12, 2022 | Referring Expression | Code Available | 1 |
| ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Apr 12, 2022 | Image Classification | Code Available | 1 |
| FindIt: Generalized Localization with Natural Language Queries | Mar 31, 2022 | Natural Language Queries, Object | Unverified | 0 |
| SeqTR: A Simple yet Universal Network for Visual Grounding | Mar 30, 2022 | Decoder, Referring Expression | Code Available | 1 |
| Single-Stream Multi-Level Alignment for Vision-Language Pretraining | Mar 27, 2022 | Image-text Retrieval, Question Answering | Code Available | 0 |
| Non-neural Models Matter: A Re-evaluation of Neural Referring Expression Generation Systems | Mar 15, 2022 | BIG-bench Machine Learning, Referring Expression | Unverified | 0 |
| Differentiated Relevances Embedding for Group-based Referring Expression Comprehension | Mar 12, 2022 | Attribute, Object | Unverified | 0 |
| OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Feb 7, 2022 | Image Captioning, Image Classification | Code Available | 0 |
| Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching | Jan 18, 2022 | Image-text matching, Referring Expression | Unverified | 0 |
| Lite-MDETR: A Lightweight Multi-Modal Detector | Jan 1, 2022 | Object Detection | Unverified | 0 |
| Deconfounded Visual Grounding | Dec 31, 2021 | Referring Expression, Visual Grounding | Code Available | 0 |
| Image Segmentation Using Text and Image Prompts | Dec 18, 2021 | Decoder, Image Segmentation | Code Available | 1 |
| LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | Dec 4, 2021 | Decoder, Generalized Referring Expression Segmentation | Code Available | 1 |
| Using Referring Expression Generation to Model Literary Style | Dec 1, 2021 | model, Referring Expression | Unverified | 0 |
| Robust Visual Reasoning via Language Guided Neural Module Networks | Dec 1, 2021 | Question Answering, Referring Expression | Unverified | 0 |
| ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Nov 16, 2021 | Image Classification | Unverified | 0 |
| The Pipeline Model for Resolution of Anaphoric Reference and Resolution of Entity Reference | Nov 1, 2021 | Coreference Resolution | Unverified | 0 |
| Evaluating and Improving Interactions with Hazy Oracles | Oct 19, 2021 | Object Tracking, Referring Expression | Unverified | 0 |