Image Caption Generation for Low-Resource Assamese Language Nov 1, 2022 Caption Generation Decoder
— Unverified 0 Text-Only Training for Image Captioning using Noise-Injected CLIP Nov 1, 2022 Decoder Image Captioning
Code Available 2 DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention Oct 28, 2022 Image Captioning Language Modeling
— Unverified 0 FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning Oct 26, 2022 Cross-Modal Retrieval Decoder
— Unverified 0 Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks Oct 26, 2022 Image Captioning Language Modeling
— Unverified 0 RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data Oct 23, 2022 Image Captioning Image-text Retrieval
— Unverified 0 PoseScript: Linking 3D Human Poses and Natural Language Oct 21, 2022 Cross-Modal Retrieval Image Captioning
Code Available 2 Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation Oct 20, 2022 Decoder Image Captioning
Code Available 1 Image-Text Retrieval with Binary and Continuous Label Supervision Oct 20, 2022 Image Captioning Image-text Retrieval
— Unverified 0 Prophet Attention: Predicting Attention with Future Attention for Image Captioning Oct 19, 2022 Image Captioning
— Unverified 0 Aligning MAGMA by Few-Shot Learning and Finetuning Oct 18, 2022 Few-Shot Learning Image Captioning
— Unverified 0 Probing Cross-modal Semantics Alignment Capability from the Textual Perspective Oct 18, 2022 Image Captioning Sentence
— Unverified 0 Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training Oct 17, 2022 Image Captioning Network Interpretation
Code Available 0 Vision-Language Pre-training: Basics, Recent Advances, and Future Trends Oct 17, 2022 Few-Shot Learning Image Captioning
Code Available 3 MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting Oct 13, 2022 Image Captioning Question Answering
Code Available 1 Visual Language Maps for Robot Navigation Oct 11, 2022 3D Reconstruction Image Captioning
Code Available 2 MMT: Image-guided Story Ending Generation with Multimodal Memory Transformer Oct 10, 2022 Decoder Image Captioning
Code Available 0 Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis Oct 10, 2022 All Image Captioning
Code Available 1 Generating image captions with external encyclopedic knowledge Oct 10, 2022 Caption Generation Image Captioning
— Unverified 0 CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning Oct 10, 2022 Decoder Denoising
Code Available 1 Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP Oct 9, 2022 Image Captioning Open Vocabulary Semantic Segmentation
Code Available 2 Towards Multi-Modal Sarcasm Detection via Hierarchical Congruity Modeling with Knowledge Enhancement Oct 7, 2022 Image Captioning Sarcasm Detection
Code Available 1 Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding Oct 7, 2022 Chart Question Answering Diversity
Code Available 2 Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning Oct 4, 2022 Image Captioning Sentence
Code Available 0 Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption Similarity Oct 3, 2022 Audio captioning Image Captioning
— Unverified 0 On the Effects of Video Grounding on Language Models Oct 1, 2022 Image Captioning Question Answering
— Unverified 0 DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis Oct 1, 2022 COVID-19 Diagnosis Decoder
— Unverified 0 JPG - Jointly Learn to Align: Automated Disease Prediction and Radiology Report Generation Oct 1, 2022 cross-modal alignment Disease Prediction
— Unverified 0 Multi-view and Cross-view Brain Decoding Oct 1, 2022 Brain Decoding Image Captioning
— Unverified 0 SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation Sep 30, 2022 Decoder Image Captioning
Code Available 1 Linearly Mapping from Image to Text Space Sep 30, 2022 Image Captioning Image to text
Code Available 1 Medical Image Captioning via Generative Pretrained Transformers Sep 28, 2022 Caption Generation Descriptive
— Unverified 0 Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text Sep 28, 2022 Image Captioning Image Retrieval
Code Available 1 DRAMA: Joint Risk Localization and Captioning in Driving Sep 22, 2022 Image Captioning
— Unverified 0 Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia Sep 21, 2022 Articles Image Captioning
— Unverified 0 Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering Sep 21, 2022 Image Captioning Optical Character Recognition (OCR)
— Unverified 0 Learning Distinct and Representative Styles for Image Captioning Sep 17, 2022 Diversity Image Captioning
Code Available 1 Belief Revision based Caption Re-ranker with Visual Semantic Information Sep 16, 2022 Caption Generation Image Captioning
Code Available 1 LAVIS: A Library for Language-Vision Intelligence Sep 15, 2022 Benchmarking Image Captioning
— Unverified 0 OmniVL:One Foundation Model for Image-Language and Video-Language Tasks Sep 15, 2022 Action Classification Action Recognition
— Unverified 0 M^4I: Multi-modal Models Membership Inference Sep 15, 2022 Image Captioning Inference Attack
Code Available 1 PaLI: A Jointly-Scaled Multilingual Language-Image Model Sep 14, 2022 Decoder Few-Shot Image Classification
— Unverified 0 PreSTU: Pre-Training for Scene-Text Understanding Sep 12, 2022 Decoder Image Captioning
— Unverified 0 Every picture tells a story: Image-grounded controllable stylistic story generation Sep 4, 2022 Image Captioning Image to text
— Unverified 0 vieCap4H-VLSP 2021: Vietnamese Image Captioning for Healthcare Domain using Swin Transformer and Attention-based LSTM Sep 3, 2022 Decoder Image Captioning
— Unverified 0 Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks Aug 22, 2022 All Cross-Modal Retrieval
Code Available 0 A Medical Semantic-Assisted Transformer for Radiographic Report Generation Aug 22, 2022 Image Captioning Medical Report Generation
— Unverified 0 Target-oriented Sentiment Classification with Sequential Cross-modal Semantic Graph Aug 19, 2022 Decoder Image Captioning
Code Available 0 VAuLT: Augmenting the Vision-and-Language Transformer for Sentiment Classification on Social Media Aug 18, 2022 Descriptive Diversity
Code Available 1 GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement Aug 18, 2022 Grounded Situation Recognition Image Captioning
Code Available 0