Towards Models that Can See and Read | Jan 18, 2023 | Decoder, Image Captioning | Unverified | 0
Embodied Agents for Efficient Exploration and Smart Scene Description | Jan 17, 2023 | Efficient Exploration, Image Captioning | Unverified | 0
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning | Jan 12, 2023 | Few-Shot Learning, Image Captioning | Code Available | 1
An Image captioning algorithm based on the Hybrid Deep Learning Technique (CNN+GRU) | Jan 6, 2023 | Decoder, Image Captioning | Unverified | 0
Adaptively Clustering Neighbor Elements for Image-Text Generation | Jan 5, 2023 | Clustering, Decoder | Code Available | 0
An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation | Jan 3, 2023 | Image Captioning, Machine Translation | Unverified | 0
PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3 | Jan 1, 2023 | Image Captioning, Question Answering | Unverified | 0
Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks | Jan 1, 2023 | Cross-Modal Retrieval, Image Captioning | Unverified | 0
Crossing the Gap: Domain Generalization for Image Captioning | Jan 1, 2023 | Domain Generalization, Image Captioning | Unverified | 0
On the Interpretability of Attention Networks | Dec 30, 2022 | Image Captioning | Code Available | 0
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning | Dec 27, 2022 | Image Captioning, Image Retrieval | Code Available | 1
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective | Dec 24, 2022 | Decision Making, Image Captioning | Code Available | 1
Do DALL-E and Flamingo Understand Each Other? | Dec 23, 2022 | Image Captioning, Image Generation | Unverified | 0
Transferring General Multimodal Pretrained Models to Text Recognition | Dec 19, 2022 | Image Captioning, Optical Character Recognition (OCR) | Unverified | 0
Position-guided Text Prompt for Vision-Language Pre-training | Dec 19, 2022 | Cross-Modal Retrieval, Image Captioning | Code Available | 1
Efficient Image Captioning for Edge Devices | Dec 18, 2022 | CPU, Image Captioning | Unverified | 0
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Dec 15, 2022 | Benchmarking, Image Captioning | Code Available | 1
Cross-Modal Similarity-Based Curriculum Learning for Image Captioning | Dec 14, 2022 | Image Captioning, Language Modeling | Unverified | 0
NLIP: Noise-robust Language-Image Pre-training | Dec 14, 2022 | Image Captioning, Image-text Retrieval | Unverified | 0
Cap2Aug: Caption guided Image to Image data Augmentation | Dec 11, 2022 | Classification, Cross-Domain Few-Shot | Unverified | 0
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory | Dec 10, 2022 | Image Captioning, Language Modeling | Code Available | 0
ParsVQA-Caps: A Benchmark for Visual Question Answering and Image Captioning in Persian | Dec 7, 2022 | Image Captioning, Question Answering | Unverified | 0
Dataset vs Reality: Understanding Model Performance from the Perspective of Information Need | Dec 6, 2022 | Image Captioning, Information Retrieval | Unverified | 0
Semantic-Conditional Diffusion Networks for Image Captioning | Dec 6, 2022 | Cross-Modal Retrieval, Decoder | Code Available | 2
Adaptive Testing of Computer Vision Models | Dec 6, 2022 | Image Captioning, object-detection | Code Available | 0
Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning | Dec 6, 2022 | Image Captioning, reinforcement-learning | Code Available | 0
Controllable Image Captioning via Prompting | Dec 4, 2022 | controllable image captioning, Image Captioning | Unverified | 0
Weakly Supervised Annotations for Multi-modal Greeting Cards Dataset | Dec 1, 2022 | Image Captioning, Image Generation | Unverified | 0
Focus! Relevant and Sufficient Context Selection for News Image Captioning | Dec 1, 2022 | Image Captioning, Relation Extraction | Unverified | 0
Uncertainty-Aware Image Captioning | Nov 30, 2022 | Caption Generation, Image Captioning | Unverified | 0
CLID: Controlled-Length Image Descriptions with Limited Data | Nov 27, 2022 | controllable image captioning, Image Captioning | Code Available | 0
Predictive linguistic cues for fake news: a societal artificial intelligence problem | Nov 26, 2022 | Attribute, Image Captioning | Unverified | 0
Aesthetically Relevant Image Captioning | Nov 25, 2022 | Image Captioning, Sentence | Code Available | 1
Can Machines Imitate Humans? Integrative Turing Tests for Vision and Language Demonstrate a Narrowing Gap | Nov 23, 2022 | Image Captioning, object-detection | Unverified | 0
Retrieval-Augmented Multimodal Language Modeling | Nov 22, 2022 | Caption Generation, Image Captioning | Unverified | 0
X^2-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | Nov 22, 2022 | All, Cross-Modal Retrieval | Code Available | 2
Exploring Discrete Diffusion Models for Image Captioning | Nov 21, 2022 | Image Captioning, Image Generation | Code Available | 1
A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation, Factual Visual Question Answering | Unverified | 0
An Enhanced Object Detection Model for Scene Graph Generation | Nov 18, 2022 | Graph Generation, Image Captioning | Unverified | 0
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision | Nov 17, 2022 | Image Captioning, Question Answering | Code Available | 1
Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired | Nov 17, 2022 | Image Captioning | Unverified | 0
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning | Nov 17, 2022 | Image Captioning | Code Available | 1
PromptCap: Prompt-Guided Task-Aware Image Captioning | Nov 15, 2022 | Image Captioning, Language Modelling | Code Available | 1
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model | Nov 15, 2022 | All, Disentanglement | Code Available | 6
Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment | Nov 14, 2022 | Computational Efficiency, Image Captioning | Unverified | 0
Large-Scale Bidirectional Training for Zero-Shot Image Captioning | Nov 13, 2022 | Image Captioning, Keyword Extraction | Code Available | 1
DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis | Nov 12, 2022 | COVID-19 Diagnosis, Decoder | Code Available | 1
Investigations in Audio Captioning: Addressing Vocabulary Imbalance and Evaluating Suitability of Language-Centric Performance Metrics | Nov 12, 2022 | Audio captioning, Image Captioning | Unverified | 0
VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning | Nov 10, 2022 | Image Captioning, Vietnamese Image Captioning | Unverified | 0
Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions | Nov 9, 2022 | Image Captioning, Language Modeling | Unverified | 0