SOTAVerified

Image to text

Papers

Showing 161–170 of 246 papers

Title | Status | Hype
Is Cross-modal Information Retrieval Possible without Training? | | 0
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models | | 0
CoBIT: A Contrastive Bi-directional Image-Text Generation Model | | 0
MAGVLT: Masked Generative Vision-and-Language Transformer | Code | 1
Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling | | 0
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale | Code | 3
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation | Code | 1
An End-to-End Neural Network for Image-to-Audio Transformation | | 0
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts | Code | 1
VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval | | 0
Page 17 of 25

No leaderboard results yet.