SOTAVerified

Image to text

Papers

Showing 126-150 of 246 papers

Title | Status | Hype
Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval | | 0
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning | Code | 0
AICoderEval: Improving AI Domain Code Generation of Large Language Models | | 0
Faithful Chart Summarization with ChaTS-Pi | | 0
Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning | | 0
Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation | Code | 0
DOCCI: Descriptions of Connected and Contrasting Images | | 0
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | | 0
Leveraging AI to Generate Audio for User-generated Content in Video Games | | 0
VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations | Code | 0
Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection | | 0
OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation | | 0
BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval | | 0
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | | 0
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? | | 0
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant | | 0
Enhancing Vision-Language Pre-training with Rich Supervisions | | 0
Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition | | 0
Probing Multimodal Large Language Models for Global and Local Semantic Representations | Code | 0
A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models | | 0
Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models | | 0
Dynamic Traceback Learning for Medical Report Generation | | 0
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | | 0
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | | 0
Accept the Modality Gap: An Exploration in the Hyperbolic Space | | 0

No leaderboard results yet.