Image to text

Papers

Showing 76–100 of 246 papers

Title	Date	Tasks	Status	Hype
Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design	May 29, 2024	Dataset Generation, Image to text	Code Available	1
Faithful Chart Summarization with ChaTS-Pi	May 29, 2024	Image to text, Sentence	Unverified	0
Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning	May 26, 2024	Image to text, Image-to-Text Retrieval	Unverified	0
Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation	May 23, 2024	Image to text, Sentence	Code Available	0
Libra: Building Decoupled Vision System on Large Language Models	May 16, 2024	Image to text, Language Modeling	Code Available	2
Language-Oriented Semantic Latent Representation for Image Transmission	May 16, 2024	Image to text, Semantic Communication	Code Available	1
DOCCI: Descriptions of Connected and Contrasting Images	Apr 30, 2024	Image Generation, Image to text	Unverified	0
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation	Apr 30, 2024	Caption Generation, Hallucination	Unverified	0
Leveraging AI to Generate Audio for User-generated Content in Video Games	Apr 25, 2024	Audio Generation, Game Design	Unverified	0
VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations	Apr 25, 2024	Image to text, Sensitivity	Code Available	0
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?	Apr 16, 2024	Image Captioning, Image Generation	Code Available	1
Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection	Apr 15, 2024	Anomaly Detection, Anomaly Localization	Unverified	0
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching	Apr 4, 2024	Attribute, Image Captioning	Code Available	2
OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation	Apr 1, 2024	Image Segmentation, Image to text	Unverified	0
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models	Apr 1, 2024	Graph Generation, Image to text	Code Available	2
Evaluating Text-to-Visual Generation with Image-to-Text Generation	Apr 1, 2024	Image to text, Question Answering	Code Available	3
BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval	Mar 24, 2024	Diagnostic, Image Retrieval	Unverified	0
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation	Mar 14, 2024	Image to text, Optical Character Recognition (OCR)	Unverified	0
ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes	Mar 7, 2024	Image to text, Object	Code Available	1
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant	Mar 7, 2024	Clinical Knowledge, Image to text	Unverified	0
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?	Mar 7, 2024	Image to text, Image-to-Text Retrieval	Unverified	0
Enhancing Vision-Language Pre-training with Rich Supervisions	Mar 5, 2024	Image to text, Table Detection	Unverified	0
Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition	Mar 4, 2024	Image to text	Unverified	0
Probing Multimodal Large Language Models for Global and Local Semantic Representations	Feb 27, 2024	Image to text, object-detection	Code Available	0
A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models	Feb 21, 2024	Benchmarking, Image to text	Unverified	0

No leaderboard results yet.