
Image to text

Papers

Showing 101–150 of 246 papers

| Title | Status | Hype |
|---|---|---|
| GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | Code | 0 |
| Adaptively Clustering Neighbor Elements for Image-Text Generation | Code | 0 |
| Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Code | 0 |
| Survey on Abstractive Text Summarization: Dataset, Models, and Metrics | Code | 0 |
| Multi-LLM Collaborative Caption Generation in Scientific Documents | Code | 0 |
| UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings | Code | 0 |
| BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval | Code | 0 |
| Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization | | 0 |
| Robustifying Vision-Language Models via Dynamic Token Reweighting | | 0 |
| See then Tell: Enhancing Key Information Extraction with Vision Grounding | | 0 |
| SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | | 0 |
| Sequential Semantic Generative Communication for Progressive Text-to-Image Generation | | 0 |
| SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing | | 0 |
| SLAN: Self-Locator Aided Network for Cross-Modal Understanding | | 0 |
| SLAN: Self-Locator Aided Network for Vision-Language Understanding | | 0 |
| SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification | | 0 |
| SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution | | 0 |
| Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval | | 0 |
| SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | | 0 |
| Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image | | 0 |
| Synthesizing Novel Pairs of Image and Text | | 0 |
| Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models | | 0 |
| TMCIR: Token Merge Benefits Composed Image Retrieval | | 0 |
| TNG-CLIP:Training-Time Negation Data Generation for Negation Awareness of CLIP | | 0 |
| Towards a Visual-Language Foundation Model for Computational Pathology | | 0 |
| Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering | | 0 |
| TrojVLM: Backdoor Attack Against Vision Language Models | | 0 |
| Turbo Learning for Captionbot and Drawingbot | | 0 |
| Two-stream Hierarchical Similarity Reasoning for Image-text Matching | | 0 |
| Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations | | 0 |
| Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning | | 0 |
| UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation | | 0 |
| Using Inter-Sentence Diverse Beam Search to Reduce Redundancy in Visual Storytelling | | 0 |
| Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages | | 0 |
| Vision-Braille: An End-to-End Tool for Chinese Braille Image-to-Text Translation | | 0 |
| Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | | 0 |
| When are Lemons Purple? The Concept Association Bias of Vision-Language Models | | 0 |
| X-Fusion: Introducing New Modality to Frozen Large Language Models | | 0 |
| 15M Multimodal Facial Image-Text Dataset | | 0 |
| Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | | 0 |
| Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution | | 0 |
| ABC: Achieving Better Control of Multimodal Embeddings using VLMs | | 0 |
| Accept the Modality Gap: An Exploration in the Hyperbolic Space | | 0 |
| Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training | | 0 |
| AICoderEval: Improving AI Domain Code Generation of Large Language Models | | 0 |
| AI Recommendation System for Enhanced Customer Experience: A Novel Image-to-Text Method | | 0 |
| An End-to-End Neural Network for Image-to-Audio Transformation | | 0 |
| An Online Learning Approach to Prompt-based Selection of Generative Models | | 0 |
| Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models | | 0 |
| A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering | | 0 |

No leaderboard results yet.