Image to text

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 101–150 of 246 papers

Title	Date	Tasks	Status
From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing	Nov 5, 2024	Change DetectionContrastive Learning	—Unverified
Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization	Oct 30, 2024	Image to textImage-to-Text Retrieval	—Unverified
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)	Oct 25, 2024	AttributeImage to text	CodeCode Available
Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics	Oct 24, 2024	Image to textImage-Variation	—Unverified
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image	Oct 20, 2024	Image to text	—Unverified
An Online Learning Approach to Prompt-based Selection of Generative Models	Oct 17, 2024	Image to text	—Unverified
Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models	Oct 7, 2024	Image to text	—Unverified
Backdooring Vision-Language Models with Out-Of-Distribution Data	Oct 2, 2024	Image CaptioningImage to text	—Unverified
See then Tell: Enhancing Key Information Extraction with Vision Grounding	Sep 29, 2024	Image to textKey Information Extraction	—Unverified
TrojVLM: Backdoor Attack Against Vision Language Models	Sep 28, 2024	Backdoor AttackImage Captioning	—Unverified
Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization	Sep 26, 2024	Image to textImage-to-Text Retrieval	—Unverified
Evaluating authenticity and quality of image captions via sentiment and semantic analyses	Sep 14, 2024	Image CaptioningImage to text	—Unverified
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models	Aug 16, 2024	Image to text	—Unverified
Instruction Tuning-free Visual Token Complement for Multimodal LLMs	Aug 9, 2024	Image GenerationImage to text	—Unverified
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models	Jul 30, 2024	Image to textImage-to-Text Retrieval	CodeCode Available
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic	Jul 25, 2024	Image to textLanguage Modeling	—Unverified
GPC: Generative and General Pathology Image Classifier	Jul 12, 2024	Classificationimage-classification	—Unverified
15M Multimodal Facial Image-Text Dataset	Jul 11, 2024	Image to text	—Unverified
Towards a text-based quantitative and explainable histopathology image analysis	Jul 10, 2024	image-classificationImage Classification	CodeCode Available
Vision-Braille: An End-to-End Tool for Chinese Braille Image-to-Text Translation	Jul 8, 2024	Image to textLifelong learning	—Unverified
HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels	Jul 8, 2024	Contrastive LearningImage Retrieval	—Unverified
Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything	Jul 1, 2024	Image to textLanguage Modeling	—Unverified
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning	Jun 20, 2024	DiagnosticImage to text	CodeCode Available
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags	Jun 16, 2024	Image to textInstruction Following	—Unverified
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval	Jun 14, 2024	Image RetrievalImage to text	CodeCode Available
Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval	Jun 11, 2024	Image RetrievalImage to text	—Unverified
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning	Jun 11, 2024	BenchmarkingContrastive Learning	CodeCode Available
AICoderEval: Improving AI Domain Code Generation of Large Language Models	Jun 7, 2024	Code GenerationImage to text	—Unverified
Faithful Chart Summarization with ChaTS-Pi	May 29, 2024	Image to textSentence	—Unverified
Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning	May 26, 2024	Image to textImage-to-Text Retrieval	—Unverified
Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation	May 23, 2024	Image to textSentence	CodeCode Available
DOCCI: Descriptions of Connected and Contrasting Images	Apr 30, 2024	Image GenerationImage to text	—Unverified
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation	Apr 30, 2024	Caption GenerationHallucination	—Unverified
Leveraging AI to Generate Audio for User-generated Content in Video Games	Apr 25, 2024	Audio GenerationGame Design	—Unverified
VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations	Apr 25, 2024	Image to textSensitivity	CodeCode Available
Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection	Apr 15, 2024	Anomaly DetectionAnomaly Localization	—Unverified
OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation	Apr 1, 2024	Image SegmentationImage to text	—Unverified
BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval	Mar 24, 2024	DiagnosticImage Retrieval	—Unverified
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation	Mar 14, 2024	Image to textOptical Character Recognition (OCR)	—Unverified
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?	Mar 7, 2024	Image to textImage-to-Text Retrieval	—Unverified
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant	Mar 7, 2024	Clinical KnowledgeImage to text	—Unverified
Enhancing Vision-Language Pre-training with Rich Supervisions	Mar 5, 2024	Image to textTable Detection	—Unverified
Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition	Mar 4, 2024	Image to text	—Unverified
Probing Multimodal Large Language Models for Global and Local Semantic Representations	Feb 27, 2024	Image to textobject-detection	CodeCode Available
A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models	Feb 21, 2024	BenchmarkingImage to text	—Unverified
Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models	Feb 13, 2024	Image CaptioningImage to text	—Unverified
Dynamic Traceback Learning for Medical Report Generation	Jan 24, 2024	Image to textMedical Report Generation	—Unverified
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs	Jan 5, 2024	Image ComprehensionImage to text	—Unverified
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment	Jan 4, 2024	Image Captioningimage-classification	—Unverified
Accept the Modality Gap: An Exploration in the Hyperbolic Space	Jan 1, 2024	Image to textImage-to-Text Retrieval	—Unverified

Show:10 25 50

← PrevPage 3 of 5Next →

No leaderboard results yet.