SOTAVerified

Image to text

Papers

Showing 51-100 of 246 papers

Title | Status | Hype
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports | Code | 1
Multimodal Procedural Planning via Dual Text-Image Prompting | Code | 1
Vision-Language Dataset Distillation | Code | 1
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning | Code | 1
ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes | Code | 1
Progressive Transformer-Based Generation of Radiology Reports | Code | 1
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text | Code | 1
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Code | 1
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training | Code | 1
L-Verse: Bidirectional Generation Between Image and Text | Code | 1
Brain Captioning: Decoding human brain activity into images and text | Code | 1
Distilled Dual-Encoder Model for Vision-Language Understanding | Code | 1
Language-Oriented Semantic Latent Representation for Image Transmission | Code | 1
Can MLLMs Perform Text-to-Image In-Context Learning? | Code | 1
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment | Code | 1
Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation | Code | 1
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Code | 1
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models | Code | 1
Linearly Mapping from Image to Text Space | Code | 1
Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning | - | 0
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding | - | 0
DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models | - | 0
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models | - | 0
DiffusionSTR: Diffusion Model for Scene Text Recognition | - | 0
Development of a New Image-to-text Conversion System for Pashto, Farsi and Traditional Chinese | - | 0
Deductron -- A Recurrent Neural Network | - | 0
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation | - | 0
An Online Learning Approach to Prompt-based Selection of Generative Models | - | 0
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training | - | 0
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | - | 0
Cross-modal Contrastive Attention Model for Medical Report Generation | - | 0
BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval | - | 0
Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation | - | 0
Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval | - | 0
BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification | - | 0
An End-to-End Neural Network for Image-to-Audio Transformation | - | 0
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval | - | 0
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training | - | 0
Contrastive Learning of Visual-Semantic Embeddings | - | 0
GPC: Generative and General Pathology Image Classifier | - | 0
Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation | - | 0
ABC: Achieving Better Control of Multimodal Embeddings using VLMs | - | 0
Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation | - | 0
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration | - | 0
Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics | - | 0
From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing | - | 0
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution | - | 0
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks | - | 0
Image Semantic Relation Generation | - | 0
From Image to Text in Sentiment Analysis via Regression and Deep Learning | - | 0

No leaderboard results yet.