SOTAVerified

Referring Expression Comprehension

Papers

Showing 51-75 of 167 papers

| Title | Status | Hype |
| --- | --- | --- |
| Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Code | 1 |
| Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | Code | 1 |
| Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | Code | 1 |
| TransVG: End-to-End Visual Grounding with Transformers | Code | 1 |
| Described Object Detection: Liberating Object Detection with Flexible Expressions | Code | 1 |
| Learning to Evaluate Performance of Multi-modal Semantic Localization | Code | 1 |
| Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Code | 1 |
| InstructDET: Diversifying Referring Object Detection with Generalized Instructions | Code | 1 |
| MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Code | 1 |
| MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Code | 1 |
| DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding | Code | 1 |
| Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions | Code | 1 |
| Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1 |
| LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Code | 1 |
| An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Code | 1 |
| Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds | Code | 1 |
| RefDrone: A Challenging Benchmark for Referring Expression Comprehension in Drone Scenes | Code | 1 |
| TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation | Code | 1 |
| TRAR: Routing the Attention Spans in Transformer for Visual Question Answering | Code | 1 |
| UNITER: UNiversal Image-TExt Representation Learning | Code | 1 |
| Language-Conditioned Graph Networks for Relational Reasoning | Code | 0 |
| Language-Conditioned Feature Pyramids for Visual Selection Tasks | Code | 0 |
| Language Adaptive Weight Generation for Multi-task Visual Grounding | Code | 0 |
| Collecting Visually-Grounded Dialogue with A Game Of Sorts | Code | 0 |
| HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks | Code | 0 |

No leaderboard results yet.