SOTAVerified

Image to text

Papers

Showing 151-200 of 246 papers

| Title | Status | Hype |
| --- | --- | --- |
| Everything is a Video: Unifying Modalities through Next-Frame Prediction | | 0 |
| Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | | 0 |
| Faithful Chart Summarization with ChaTS-Pi | | 0 |
| Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval | | 0 |
| From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings | | 0 |
| From Image to Text in Sentiment Analysis via Regression and Deep Learning | | 0 |
| From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing | | 0 |
| GPC: Generative and General Pathology Image Classifier | | 0 |
| GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks | | 0 |
| GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training | | 0 |
| Hierarchical Gumbel Attention Network for Text-based Person Search | | 0 |
| HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels | | 0 |
| I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation | | 0 |
| Illegible Text to Readable Text: An Image-to-Image Transformation using Conditional Sliced Wasserstein Adversarial Networks | | 0 |
| Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models | | 0 |
| Image Captioners Sometimes Tell More Than Images They See | | 0 |
| Image Semantic Relation Generation | | 0 |
| Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module | | 0 |
| Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags | | 0 |
| Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation | | 0 |
| Retrieval-Augmented Multimodal Language Modeling | | 0 |
| Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning | | 0 |
| Revisiting DETR Pre-training for Object Detection | | 0 |
| Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | | 0 |
| Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization | | 0 |
| Robustifying Vision-Language Models via Dynamic Token Reweighting | | 0 |
| See then Tell: Enhancing Key Information Extraction with Vision Grounding | | 0 |
| SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | | 0 |
| Sequential Semantic Generative Communication for Progressive Text-to-Image Generation | | 0 |
| SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing | | 0 |
| SLAN: Self-Locator Aided Network for Cross-Modal Understanding | | 0 |
| SLAN: Self-Locator Aided Network for Vision-Language Understanding | | 0 |
| SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification | | 0 |
| SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution | | 0 |
| Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval | | 0 |
| SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | | 0 |
| Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image | | 0 |
| Synthesizing Novel Pairs of Image and Text | | 0 |
| Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models | | 0 |
| TMCIR: Token Merge Benefits Composed Image Retrieval | | 0 |
| TNG-CLIP:Training-Time Negation Data Generation for Negation Awareness of CLIP | | 0 |
| Towards a Visual-Language Foundation Model for Computational Pathology | | 0 |
| Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering | | 0 |
| TrojVLM: Backdoor Attack Against Vision Language Models | | 0 |
| Turbo Learning for Captionbot and Drawingbot | | 0 |
| Two-stream Hierarchical Similarity Reasoning for Image-text Matching | | 0 |
| Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations | | 0 |
| Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning | | 0 |
| UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation | | 0 |
| Using Inter-Sentence Diverse Beam Search to Reduce Redundancy in Visual Storytelling | | 0 |

No leaderboard results yet.