
Referring Expression Comprehension

Papers

Showing 101–150 of 167 papers

Title | Status | Hype
----- | ------ | ----
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Code | 1
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks | | 0
RefCrowd: Grounding the Target in Crowd with Referring Expressions | | 0
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Code | 1
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | Code | 1
Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension | | 0
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension | Code | 1
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Code | 1
FindIt: Generalized Localization with Natural Language Queries | | 0
SeqTR: A Simple yet Universal Network for Visual Grounding | Code | 1
Differentiated Relevances Embedding for Group-based Referring Expression Comprehension | | 0
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Code | 0
Webly Supervised Concept Expansion for General Purpose Vision Models | | 0
Lite-MDETR: A Lightweight Multi-Modal Detector | | 0
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds | Code | 1
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | | 0
Evaluating and Improving Interactions with Hazy Oracles | | 0
Towards Language-guided Visual Recognition via Dynamic Convolutions | Code | 0
Referring Transformer: A One-step Approach to Multi-task Visual Grounding | Code | 1
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation | Code | 0
Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention | | 0
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | Code | 1
Playing Lottery Tickets with Vision and Language | | 0
Understanding Synonymous Referring Expressions via Contrastive Features | Code | 0
TransVG: End-to-End Visual Grounding with Transformers | Code | 1
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos | | 0
Unifying Vision-and-Language Tasks via Text Generation | Code | 1
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | Code | 2
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering | Code | 1
Language-Mediated, Object-Centric Representation Learning | | 0
PPGN: Phrase-Guided Proposal Generation Network For Referring Expression Comprehension | | 0
Modular Graph Attention Network for Complex Visual Relational Reasoning | | 0
ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments | | 0
Language-Conditioned Feature Pyramids for Visual Selection Tasks | Code | 0
Commands 4 Autonomous Vehicles (C4AV) Workshop Summary | | 0
Cosine meets Softmax: A tough-to-beat baseline for visual grounding | Code | 0
AttnGrounder: Talking to Cars with Attention | Code | 0
Referring Expression Comprehension: A Survey of Methods and Datasets | | 0
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | | 0
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1
Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge | Code | 0
Leveraging Non-Specialists for Accurate and Time Efficient AMR Annotation | | 0
Giving Commands to a Self-driving Car: A Multimodal Reasoner for Visual Grounding | | 0
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | Code | 1
MUTATT: Visual-Textual Mutual Guidance for Referring Expression Comprehension | | 0
Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension | | 0
A Real-time Global Inference Network for One-stage Referring Expression Comprehension | Code | 0
UNITER: Learning UNiversal Image-TExt Representations | | 0
UNITER: UNiversal Image-TExt Representation Learning | Code | 1
Talk2Car: Taking Control of Your Self-Driving Car | Code | 1
Page 3 of 4

No leaderboard results yet.