SOTAVerified

Referring Expression Comprehension

Papers

Showing 51-75 of 167 papers

Title | Status | Hype
Multi-branch Collaborative Learning Network for 3D Visual Grounding | Code | 1
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | Code | 1
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | Code | 1
Referring Transformer: A One-step Approach to Multi-task Visual Grounding | Code | 1
Learning to Evaluate Performance of Multi-modal Semantic Localization | Code | 1
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Code | 1
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Code | 1
InstructDET: Diversifying Referring Object Detection with Generalized Instructions | Code | 1
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Code | 1
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment | Code | 1
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding | Code | 1
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions | Code | 1
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Code | 1
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Code | 1
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds | Code | 1
RefDrone: A Challenging Benchmark for Referring Expression Comprehension in Drone Scenes | Code | 1
Talk2Car: Taking Control of Your Self-Driving Car | Code | 1
TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation | Code | 1
Tune-An-Ellipse: CLIP Has Potential to Find What You Want | Code | 1
Language-Conditioned Graph Networks for Relational Reasoning | Code | 0
Language-Conditioned Feature Pyramids for Visual Selection Tasks | Code | 0
Language Adaptive Weight Generation for Multi-task Visual Grounding | Code | 0
Collecting Visually-Grounded Dialogue with A Game Of Sorts | Code | 0
Scene-Text Oriented Referring Expression Comprehension | Code | 0

No leaderboard results yet.