GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text Aug 14, 2023 Drug Discovery Image Captioning
GeneAnnotator: A Semi-automatic Annotation Tool for Visual Scene Graph Sep 6, 2021 Graph Generation Graph Learning
Neural Architecture Search using Deep Neural Networks and Monte Carlo Tree Search May 18, 2018 GPU Image Captioning
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis Feb 13, 2025 Cross-Modal Retrieval Image Captioning
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? Jan 5, 2025 Image Captioning Image to text
CLIPScore: A Reference-free Evaluation Metric for Image Captioning Apr 18, 2021 Hallucination Pair-wise Detection (1-ref) Hallucination Pair-wise Detection (4-ref)
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning Oct 10, 2022 Decoder Denoising
FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context Mar 4, 2022 Decoder Image Captioning
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping Apr 26, 2023 Decoder Image Captioning
CNN+CNN: Convolutional Decoders for Image Captioning May 23, 2018 Image Captioning Sentence
Compact Bidirectional Transformer for Image Captioning Jan 6, 2022 Decoder Image Captioning
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions May 28, 2023 Attribute Image Captioning
ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models Feb 17, 2024 Earth Observation Image Captioning
CgT-GAN: CLIP-guided Text GAN for Image Captioning Aug 23, 2023 Image Captioning
FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation Nov 23, 2024 Anatomy Image Captioning
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks Mar 4, 2023 Cross-Modal Retrieval Image Captioning
A Survey on Efficient Vision-Language Models Apr 13, 2025 Image Captioning Question Answering
FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing May 27, 2023 Graph Similarity Human Judgment Correlation
Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards Aug 6, 2020 Attribute Image Captioning
PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model Jan 21, 2025 Hallucination Image Captioning
Can Audio Captions Be Evaluated with Image Caption Metrics? Oct 10, 2021 AudioCaps Audio captioning
Exploring Discrete Diffusion Models for Image Captioning Nov 21, 2022 Image Captioning Image Generation
Exchanging-based Multimodal Fusion with Transformer Sep 5, 2023 Image Captioning Image Generation
Chart-to-Text: A Large-Scale Benchmark for Chart Summarization Mar 12, 2022 Data-to-Text Generation Image Captioning
Adapting Grad-CAM for Embedding Networks Jan 17, 2020 Image Captioning image-classification
Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning Aug 13, 2022 Image Captioning
Exploring Diverse In-Context Configurations for Image Captioning May 24, 2023 Image Captioning In-Context Learning
CIDEr: Consensus-based Image Description Evaluation Nov 20, 2014 Action Recognition Attribute
CaMEL: Mean Teacher Learning for Image Captioning Feb 21, 2022 Image Captioning Knowledge Distillation
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation Aug 29, 2023 Image Captioning Machine Translation
Evaluating Multimodal Representations on Visual Semantic Textual Similarity Apr 4, 2020 Benchmarking Image Captioning
Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints Jul 7, 2023 Image Captioning Image Retrieval
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models May 15, 2023 3D Object Detection Image Captioning
COBRA: Contrastive Bi-Modal Representation Algorithm May 7, 2020 Cross-Modal Retrieval Image Captioning
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone Jun 15, 2022 Described Object Detection Image Captioning
Gated Hierarchical Attention for Image Captioning Oct 30, 2018 Decoder Image Captioning
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues Jul 29, 2024 Image Captioning
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages Oct 20, 2023 Diversity GPU
Evolving Deep Neural Networks Mar 1, 2017 Deep Learning Image Captioning
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator Dec 11, 2023 Image Captioning Question Answering
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning May 31, 2022 Common Sense Reasoning Graph Generation
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model Jun 10, 2024 Image Captioning
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models Oct 7, 2021 Decoder Image Captioning
Comprehensive Image Captioning via Scene Graph Decomposition Jul 23, 2020 Diversity Image Captioning
A large annotated corpus for learning natural language inference Aug 21, 2015 Image Captioning Natural Language Inference
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps Dec 7, 2020 Image Captioning Optical Character Recognition
Connecting What to Say With Where to Look by Modeling Human Attention Traces May 12, 2021 Caption Generation Image Captioning
GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features Jul 20, 2022 Image Captioning
Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning Aug 8, 2022 Image Captioning Image Generation
End-to-End Transformer Based Model for Image Captioning Mar 29, 2022 Decoder Image Captioning