Image to text

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–100 of 246 papers

Title	Date	Tasks	Status	Hype
TrojVLM: Backdoor Attack Against Vision Language Models	Sep 28, 2024	Backdoor AttackImage Captioning	—Unverified	0
Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization	Sep 26, 2024	Image to textImage-to-Text Retrieval	—Unverified	0
Evaluating authenticity and quality of image captions via sentiment and semantic analyses	Sep 14, 2024	Image CaptioningImage to text	—Unverified	0
See or Guess: Counterfactually Regularized Image Captioning	Aug 29, 2024	Causal Inferencecounterfactual	CodeCode Available	1
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation	Aug 21, 2024	Image GenerationImage Retrieval	CodeCode Available	1
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models	Aug 16, 2024	Image to text	—Unverified	0
Instruction Tuning-free Visual Token Complement for Multimodal LLMs	Aug 9, 2024	Image GenerationImage to text	—Unverified	0
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation	Aug 9, 2024	Image to textObject	CodeCode Available	2
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models	Jul 30, 2024	Image to textImage-to-Text Retrieval	CodeCode Available	0
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities	Jul 29, 2024	Contrastive LearningDeepFake Detection	CodeCode Available	2
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic	Jul 25, 2024	Image to textLanguage Modeling	—Unverified	0
GPC: Generative and General Pathology Image Classifier	Jul 12, 2024	Classificationimage-classification	—Unverified	0
LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval	Jul 11, 2024	Image RetrievalImage to text	CodeCode Available	2
15M Multimodal Facial Image-Text Dataset	Jul 11, 2024	Image to text	—Unverified	0
Towards a text-based quantitative and explainable histopathology image analysis	Jul 10, 2024	image-classificationImage Classification	CodeCode Available	0
HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels	Jul 8, 2024	Contrastive LearningImage Retrieval	—Unverified	0
Vision-Braille: An End-to-End Tool for Chinese Braille Image-to-Text Translation	Jul 8, 2024	Image to textLifelong learning	—Unverified	0
Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything	Jul 1, 2024	Image to textLanguage Modeling	—Unverified	0
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning	Jun 20, 2024	DiagnosticImage to text	CodeCode Available	0
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags	Jun 16, 2024	Image to textInstruction Following	—Unverified	0
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval	Jun 14, 2024	Image RetrievalImage to text	CodeCode Available	0
CMC-Bench: Towards a New Paradigm of Visual Signal Compression	Jun 13, 2024	Image CompressionImage to text	CodeCode Available	1
Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval	Jun 11, 2024	Image RetrievalImage to text	—Unverified	0
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning	Jun 11, 2024	BenchmarkingContrastive Learning	CodeCode Available	0
AICoderEval: Improving AI Domain Code Generation of Large Language Models	Jun 7, 2024	Code GenerationImage to text	—Unverified	0
Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design	May 29, 2024	Dataset GenerationImage to text	CodeCode Available	1
Faithful Chart Summarization with ChaTS-Pi	May 29, 2024	Image to textSentence	—Unverified	0
Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning	May 26, 2024	Image to textImage-to-Text Retrieval	—Unverified	0
Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation	May 23, 2024	Image to textSentence	CodeCode Available	0
Libra: Building Decoupled Vision System on Large Language Models	May 16, 2024	Image to textLanguage Modeling	CodeCode Available	2
Language-Oriented Semantic Latent Representation for Image Transmission	May 16, 2024	Image to textSemantic Communication	CodeCode Available	1
DOCCI: Descriptions of Connected and Contrasting Images	Apr 30, 2024	Image GenerationImage to text	—Unverified	0
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation	Apr 30, 2024	Caption GenerationHallucination	—Unverified	0
Leveraging AI to Generate Audio for User-generated Content in Video Games	Apr 25, 2024	Audio GenerationGame Design	—Unverified	0
VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations	Apr 25, 2024	Image to textSensitivity	CodeCode Available	0
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?	Apr 16, 2024	Image CaptioningImage Generation	CodeCode Available	1
Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection	Apr 15, 2024	Anomaly DetectionAnomaly Localization	—Unverified	0
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching	Apr 4, 2024	AttributeImage Captioning	CodeCode Available	2
OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation	Apr 1, 2024	Image SegmentationImage to text	—Unverified	0
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models	Apr 1, 2024	Graph GenerationImage to text	CodeCode Available	2
Evaluating Text-to-Visual Generation with Image-to-Text Generation	Apr 1, 2024	Image to textQuestion Answering	CodeCode Available	3
BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval	Mar 24, 2024	DiagnosticImage Retrieval	—Unverified	0
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation	Mar 14, 2024	Image to textOptical Character Recognition (OCR)	—Unverified	0
ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes	Mar 7, 2024	Image to textObject	CodeCode Available	1
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant	Mar 7, 2024	Clinical KnowledgeImage to text	—Unverified	0
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?	Mar 7, 2024	Image to textImage-to-Text Retrieval	—Unverified	0
Enhancing Vision-Language Pre-training with Rich Supervisions	Mar 5, 2024	Image to textTable Detection	—Unverified	0
Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition	Mar 4, 2024	Image to text	—Unverified	0
Probing Multimodal Large Language Models for Global and Local Semantic Representations	Feb 27, 2024	Image to textobject-detection	CodeCode Available	0
A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models	Feb 21, 2024	BenchmarkingImage to text	—Unverified	0

Show:10 25 50

← PrevPage 2 of 5Next →

No leaderboard results yet.