SOTAVerified

Image to text

Papers

Showing 1–25 of 246 papers

Title | Status | Hype
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration | | 0
ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | | 0
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Code | 1
TNG-CLIP:Training-Time Negation Data Generation for Negation Awareness of CLIP | | 0
BRIT: Bidirectional Retrieval over Unified Image-Text Graph | | 0
Robustifying Vision-Language Models via Dynamic Token Reweighting | | 0
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings | Code | 0
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution | | 0
X-Fusion: Introducing New Modality to Frozen Large Language Models | | 0
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | | 0
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation | | 0
TMCIR: Token Merge Benefits Composed Image Retrieval | | 0
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs | Code | 1
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text | Code | 1
Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module | | 0
Natural Language Generation | | 0
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Code | 0
Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data | Code | 0
MFP-CLIP: Exploring the Efficacy of Multi-Form Prompts for Zero-Shot Industrial Anomaly Detection | | 0
FlowTok: Flowing Seamlessly Across Text and Image Tokens | Code | 5
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Code | 1
ABC: Achieving Better Control of Multimodal Embeddings using VLMs | | 0
On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation | | 0
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Code | 0
Natural Language Generation from Visual Sequences: Challenges and Future Directions | | 0

No leaderboard results yet.