Enhancing image captioning with depth information using a Transformer-based framework | Jul 24, 2023 | Image Captioning, Image Paragraph Captioning | Unverified | 0
OxfordTVG-HIC: Can Machine Make Humorous Captions from Images? | Jul 21, 2023 | Diversity, Image Captioning | Unverified | 0
Improving Multimodal Datasets with Image Captioning | Jul 19, 2023 | Image Captioning | Unverified | 0
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning | Jul 19, 2023 | Image Captioning | Unverified | 0
Image Captions are Natural Prompts for Text-to-Image Models | Jul 17, 2023 | Image Captioning, Image Generation | Code Available | 1
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes | Jul 14, 2023 | Attribute, Caption Generation | Unverified | 0
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs | Jul 13, 2023 | Image Captioning | Code Available | 0
Reading Radiology Imaging Like The Radiologist | Jul 12, 2023 | Image Captioning, Retrieval | Unverified | 0
Emu: Generative Pretraining in Multimodality | Jul 11, 2023 | Image Captioning, Image Generation | Code Available | 3
Linear Alignment of Vision-language Models for Image Captioning | Jul 10, 2023 | Image Captioning, Language Modelling | Code Available | 1
SVIT: Scaling up Visual Instruction Tuning | Jul 9, 2023 | Diversity, Image Captioning | Code Available | 3
Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints | Jul 7, 2023 | Image Captioning, Image Retrieval | Code Available | 1
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels | Jul 5, 2023 | Image Captioning, Prompt Learning | Unverified | 0
JourneyDB: A Benchmark for Generative Image Understanding | Jul 3, 2023 | Image Captioning, Image Comprehension | Code Available | 2
More for Less: Compact Convolutional Transformers Enable Robust Medical Image Classification with Limited Data | Jul 1, 2023 | Image Captioning, image-classification | Unverified | 0
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding | Jun 29, 2023 | 16k, Image Captioning | Code Available | 2
Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023 | Jun 28, 2023 | Action Anticipation, Image Captioning | Code Available | 1
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity | Jun 28, 2023 | Benchmarking, Image Captioning | Unverified | 0
Semi-supervised Multimodal Representation Learning through a Global Workspace | Jun 27, 2023 | Image Captioning, Image Generation | Code Available | 0
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic | Jun 27, 2023 | Image Captioning, Referring Expression Segmentation | Code Available | 2
What Makes ImageNet Look Unlike LAION | Jun 27, 2023 | counterfactual, Image Captioning | Code Available | 1
Self-Supervised Image Captioning with CLIP | Jun 26, 2023 | Image Captioning, Informativeness | Unverified | 0
Kosmos-2: Grounding Multimodal Large Language Models to the World | Jun 26, 2023 | Image Captioning, In-Context Learning | Code Available | 1
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards | Jun 25, 2023 | Benchmarking, Contrastive Learning | Unverified | 0
Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion | Jun 20, 2023 | Image Captioning, Language Modelling | Unverified | 0
Generation of Radiology Findings in Chest X-Ray by Leveraging Collaborative Knowledge | Jun 18, 2023 | Image Captioning, Language Modelling | Unverified | 0
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering | Jun 16, 2023 | Image Captioning, Question Answering | Code Available | 1
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models | Jun 15, 2023 | Hallucination, Image Captioning | Code Available | 2
Image Captioners Are Scalable Vision Learners Too | Jun 13, 2023 | Decoder, Image Captioning | Unverified | 0
Top-Down Framework for Weakly-supervised Grounded Image Captioning | Jun 13, 2023 | Image Captioning, Multi-Label Classification | Code Available | 0
Scalable 3D Captioning with Pretrained Models | Jun 12, 2023 | Descriptive, Image Captioning | Code Available | 2
A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation | Jun 12, 2023 | Image Captioning, Machine Translation | Unverified | 0
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards | Jun 7, 2023 | Diversity, Image Captioning | Code Available | 1
Putting Humans in the Image Captioning Loop | Jun 6, 2023 | Image Captioning | Unverified | 0
Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory | Jun 6, 2023 | Continual Learning, Data Augmentation | Unverified | 0
SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning | Jun 6, 2023 | Caption Generation, Image Captioning | Code Available | 0
Composition and Deformance: Measuring Imageability with a Text-to-Image Model | Jun 5, 2023 | Image Captioning, Image Generation | Code Available | 0
Cheap-fake Detection with LLM using Prompt Engineering | Jun 5, 2023 | Image Captioning, Image Generation | Unverified | 0
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning | Jun 1, 2023 | Image Captioning, Keyword Extraction | Unverified | 0
Understanding and Mitigating Copying in Diffusion Models | May 31, 2023 | Image Captioning, Memorization | Code Available | 1
LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting | May 31, 2023 | Decoder, Image Captioning | Code Available | 0
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | May 29, 2023 | Audio captioning, Audio-Visual Captioning | Code Available | 2
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models | May 29, 2023 | Image Captioning, Image Classification | Code Available | 1
Image Captioning with Multi-Context Synthetic Data | May 29, 2023 | Image Captioning, Language Modelling | Unverified | 0
Contextual Object Detection with Multimodal Large Language Models | May 29, 2023 | Cloze Test, Decoder | Code Available | 2
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions | May 28, 2023 | Attribute, Image Captioning | Code Available | 1
FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing | May 27, 2023 | Graph Similarity, Human Judgment Correlation | Code Available | 1
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers | May 27, 2023 | Image Captioning, Image Retrieval | Code Available | 1
Green Runner: A tool for efficient model selection from model repositories | May 26, 2023 | Deep Learning, Image Captioning | Unverified | 0
BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks | May 26, 2023 | Image Captioning, Medical Visual Question Answering | Code Available | 2