SOTAVerified

Image to text

Papers

Showing 101-125 of 246 papers

| Title | Status | Hype |
| --- | --- | --- |
| Zero-shot Nuclei Detection via Visual-Language Pre-trained Models | Code | 0 |
| GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | Code | 0 |
| Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Code | 0 |
| Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions | Code | 0 |
| Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags | | 0 |
| Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation | | 0 |
| Retrieval-Augmented Multimodal Language Modeling | | 0 |
| Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning | | 0 |
| Revisiting DETR Pre-training for Object Detection | | 0 |
| Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | | 0 |
| Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization | | 0 |
| Robustifying Vision-Language Models via Dynamic Token Reweighting | | 0 |
| See then Tell: Enhancing Key Information Extraction with Vision Grounding | | 0 |
| SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | | 0 |
| Sequential Semantic Generative Communication for Progressive Text-to-Image Generation | | 0 |
| SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing | | 0 |
| SLAN: Self-Locator Aided Network for Cross-Modal Understanding | | 0 |
| SLAN: Self-Locator Aided Network for Vision-Language Understanding | | 0 |
| SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification | | 0 |
| SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution | | 0 |
| Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval | | 0 |
| SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | | 0 |
| Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image | | 0 |
| Synthesizing Novel Pairs of Image and Text | | 0 |
| Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models | | 0 |

No leaderboard results yet.