Image to text

Papers

Showing 126–150 of 246 papers

Title	Date	Tasks	Status	Hype
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering	Sep 29, 2023	Image to text, Passage Retrieval	Code Available	2
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search	Sep 28, 2023	cross-modal alignment, Cross-Modal Retrieval	Code Available	0
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution	Sep 25, 2023	Image to text	Unverified	0
Offline Detection of Misspelled Handwritten Words by Convolving Recognition Model Features with Text Labels	Sep 18, 2023	Generative Adversarial Network, Handwriting Recognition	Unverified	0
CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval	Sep 18, 2023	Image to text, Person Retrieval	Code Available	0
BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification	Sep 9, 2023	Image to text, Language Modeling	Unverified	0
Sequential Semantic Generative Communication for Progressive Text-to-Image Generation	Sep 8, 2023	Image Generation, Image to text	Unverified	0
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning	Sep 5, 2023	Decoder, Image Generation	Code Available	2
Multimodal Foundation Models For Echocardiogram Interpretation	Aug 29, 2023	Cross-Modal Retrieval, Diagnostic	Code Available	1
Beyond One-to-One: Rethinking the Referring Image Segmentation	Aug 26, 2023	Decoder, Image Segmentation	Code Available	1
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages	Aug 23, 2023	Image Generation, Image to text	Code Available	6
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training	Aug 22, 2023	image-classification, Image Classification	Unverified	0
Vision-Language Dataset Distillation	Aug 15, 2023	Dataset Distillation, image-classification	Code Available	1
Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval	Aug 8, 2023	Cross-Modal Retrieval, Image Retrieval	Code Available	1
Multimodal Neurons in Pretrained Text-Only Transformers	Aug 3, 2023	Image Captioning, Image to text	Unverified	0
Revisiting DETR Pre-training for Object Detection	Aug 2, 2023	Image to text, Object	Unverified	0
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning	Jul 31, 2023	Caption Generation, Hallucination	Code Available	1
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports	Jul 24, 2023	Contrastive Learning, Image to text	Code Available	1
Towards a Visual-Language Foundation Model for Computational Pathology	Jul 24, 2023	Contrastive Learning, image-classification	Unverified	0
Planting a SEED of Vision in Large Language Model	Jul 16, 2023	Image Generation, Image to text	Code Available	2
PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting	Jul 14, 2023	Cross-Modal Retrieval, Image to text	Unverified	0
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training	Jul 13, 2023	Image to text	Code Available	1
Emu: Generative Pretraining in Multimodality	Jul 11, 2023	Image Captioning, Image Generation	Code Available	3
MultiQG-TI: Towards Question Generation from Multi-modal Sources	Jul 7, 2023	Image to text, Optical Character Recognition	Code Available	0
Zero-shot Nuclei Detection via Visual-Language Pre-trained Models	Jun 30, 2023	Image to text, object-detection	Code Available	0

