SOTAVerified

Image to text

Papers

Showing 101–125 of 246 papers

| Title | Status | Hype |
| --- | --- | --- |
| Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution | | 0 |
| ABC: Achieving Better Control of Multimodal Embeddings using VLMs | | 0 |
| Accept the Modality Gap: An Exploration in the Hyperbolic Space | | 0 |
| Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training | | 0 |
| AICoderEval: Improving AI Domain Code Generation of Large Language Models | | 0 |
| AI Recommendation System for Enhanced Customer Experience: A Novel Image-to-Text Method | | 0 |
| An End-to-End Neural Network for Image-to-Audio Transformation | | 0 |
| An Online Learning Approach to Prompt-based Selection of Generative Models | | 0 |
| Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models | | 0 |
| A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering | | 0 |
| Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition | | 0 |
| A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models | | 0 |
| Backdooring Vision-Language Models with Out-Of-Distribution Data | | 0 |
| Better Text Understanding Through Image-To-Text Transfer | | 0 |
| Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics | | 0 |
| Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation | | 0 |
| BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification | | 0 |
| BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval | | 0 |
| BRIT: Bidirectional Retrieval over Unified Image-Text Graph | | 0 |
| Canonical Correlation Analysis for Misaligned Satellite Image Change Detection | | 0 |
| CapText: Large Language Model-based Caption Generation From Image Context and Description | | 0 |
| Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models | | 0 |
| ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | | 0 |
| VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval | | 0 |
| CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? | | 0 |

No leaderboard results yet.