SOTAVerified

Image to text

Papers

Showing 201-225 of 246 papers

Title | Status | Hype
Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations | | 0
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval | | 0
Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment | Code | 0
Two-stream Hierarchical Similarity Reasoning for Image-text Matching | | 0
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering | | 0
EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval | | 0
Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering | | 0
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | Code | 1
Distilled Dual-Encoder Model for Vision-Language Understanding | Code | 1
Self-Supervised Image-to-Text and Text-to-Image Synthesis | Code | 0
Exploration into Translation-Equivariant Image Quantization | Code | 0
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic | Code | 1
Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages | | 0
L-Verse: Bidirectional Generation Between Image and Text | Code | 1
Unifying Multimodal Transformer for Bi-directional Image and Text Generation | Code | 1
Contrastive Learning of Visual-Semantic Embeddings | | 0
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval | | 0
Concadia: Towards Image-Based Text Generation with a Purpose | Code | 1
Knowledge driven Description Synthesis for Floor Plan Interpretation | | 0
Progressive Transformer-Based Generation of Radiology Reports | Code | 1
Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation | Code | 1
Hierarchical Gumbel Attention Network for Text-based Person Search | | 0
Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation | | 0
Development of a New Image-to-text Conversion System for Pashto, Farsi and Traditional Chinese | | 0
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications | | 0

No leaderboard results yet.