SOTAVerified

Referring Expression Comprehension

Papers

Showing 51-100 of 167 papers

Title | Status | Hype
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | Code | 1
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | Code | 1
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Code | 1
RefDrone: A Challenging Benchmark for Referring Expression Comprehension in Drone Scenes | Code | 1
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D | Code | 1
Referring Transformer: A One-step Approach to Multi-task Visual Grounding | Code | 1
Talk2Car: Taking Control of Your Self-Driving Car | Code | 1
Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension | Code | 1
TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation | Code | 1
TransVG: End-to-End Visual Grounding with Transformers | Code | 1
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering | Code | 1
Tune-An-Ellipse: CLIP Has Potential to Find What You Want | Code | 1
Unifying Vision-and-Language Tasks via Text Generation | Code | 1
Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Code | 1
UNITER: UNiversal Image-TExt Representation Learning | Code | 1
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | Code | 1
VL-BERT: Pre-training of Generic Visual-Linguistic Representations | Code | 1
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment | Code | 1
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension | Code | 1
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions | Code | 1
Towards Language-guided Visual Recognition via Dynamic Convolutions | Code | 0
Continual Referring Expression Comprehension via Dual Modular Memorization | Code | 0
Revisiting Counterfactual Problems in Referring Expression Comprehension | Code | 0
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Code | 0
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions | Code | 0
Collecting Visually-Grounded Dialogue with A Game Of Sorts | Code | 0
Scene-Text Oriented Referring Expression Comprehension | Code | 0
Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge | Code | 0
MAttNet: Modular Attention Network for Referring Expression Comprehension | Code | 0
Cosine meets Softmax: A tough-to-beat baseline for visual grounding | Code | 0
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Code | 0
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models | Code | 0
Adversarial Robustness for Visual Grounding of Multimodal Large Language Models | Code | 0
CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension | Code | 0
A Lightweight Modular Framework for Low-Cost Open-Vocabulary Object Detection Training | Code | 0
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks | Code | 0
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions | Code | 0
Whether you can locate or not? Interactive Referring Expression Generation | Code | 0
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Code | 0
A Real-time Global Inference Network for One-stage Referring Expression Comprehension | Code | 0
Language Adaptive Weight Generation for Multi-task Visual Grounding | Code | 0
Language-Conditioned Feature Pyramids for Visual Selection Tasks | Code | 0
Language-Conditioned Graph Networks for Relational Reasoning | Code | 0
Understanding Synonymous Referring Expressions via Contrastive Features | Code | 0
Referring Expression Comprehension Using Language Adaptive Inference | Code | 0
WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation | Code | 0
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation | Code | 0
Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Code | 0
AttnGrounder: Talking to Cars with Attention | Code | 0
Natural Language Object Retrieval | Code | 0

No leaderboard results yet.