Zero-shot audio captioning with audio-language model guidance and audio context keywords | Nov 14, 2023 | Audio captioning, Descriptive | Code Available | 1
InfMLLM: A Unified Framework for Visual-Language Tasks | Nov 12, 2023 | GPU, Image Captioning | Code Available | 1
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models | Nov 11, 2023 | Image Captioning, MMR total | Code Available | 3
Holistic Evaluation of GPT-4V for Biomedical Imaging | Nov 10, 2023 | Anatomy, Diagnostic | Unverified | 0
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model | Nov 10, 2023 | Image Captioning, Language Modeling | Unverified | 0
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language | Nov 8, 2023 | Image Captioning, Language Modeling | Code Available | 0
DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding | Nov 7, 2023 | 3D Reconstruction, Benchmarking | Code Available | 0
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models | Nov 7, 2023 | Image Captioning | Code Available | 0
GLaMM: Pixel Grounding Large Multimodal Model | Nov 6, 2023 | Conversational Question Answering, Image Captioning | Code Available | 2
NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment | Nov 5, 2023 | Caption Generation, Common Sense Reasoning | Code Available | 1
Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning | Nov 2, 2023 | Diagnostic, Image Captioning | Code Available | 1
Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning | Nov 2, 2023 | Caption Generation, Efficient Exploration | Unverified | 0
Enhanced Knowledge Injection for Radiology Report Generation | Nov 1, 2023 | Image Captioning, Retrieval | Unverified | 0
What a Whole Slide Image Can Tell? Subtype-guided Masked Transformer for Pathological Image Captioning | Oct 31, 2023 | Image Captioning, Sentence | Unverified | 0
Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts | Oct 31, 2023 | Image Captioning, Language Modeling | Code Available | 1
Improving Medical Visual Representations via Radiology Report Generation | Oct 30, 2023 | Contrastive Learning, Decoder | Unverified | 0
Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection | Oct 29, 2023 | Anomaly Detection, Image Captioning | Code Available | 1
Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender | Oct 29, 2023 | Image Captioning | Code Available | 0
CropCap: Embedding Visual Cross-Partition Dependency for Image Captioning | Oct 27, 2023 | Image Captioning | Unverified | 0
Impressions: Understanding Visual Semiotics and Aesthetic Impact | Oct 27, 2023 | Image Captioning, Image Description | Unverified | 0
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts | Oct 25, 2023 | Image Captioning, Multimodal Reasoning | Code Available | 0
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation | Oct 25, 2023 | Image Captioning, Image Generation | Unverified | 0
Semantic and Expressive Variation in Image Captions Across Languages | Oct 22, 2023 | Diversity, Graph Embedding | Unverified | 0
Visual Grounding Helps Learn Word Meanings in Low-Data Regimes | Oct 20, 2023 | Image Captioning, Language Acquisition | Code Available | 1
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages | Oct 20, 2023 | Diversity, GPU | Code Available | 1
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering | Oct 19, 2023 | Image Captioning, Question Answering | Code Available | 0
Lost in Translation: When GPT-4V(ision) Can't See Eye to Eye with Text. A Vision-Language-Consistency Analysis of VLLMs and Beyond | Oct 19, 2023 | Image Captioning, Language Modeling | Unverified | 0
CLAIR: Evaluating Image Captions with Large Language Models | Oct 19, 2023 | Diversity, Image Captioning | Unverified | 0
ICU: Conquering Language Barriers in Vision-and-Language Modeling by Dividing the Tasks into Image Captioning and Language Understanding | Oct 19, 2023 | Image Captioning, Language Modeling | Code Available | 0
Evaluating the Fairness of Discriminative Foundation Models in Computer Vision | Oct 18, 2023 | Fairness, Image Captioning | Code Available | 0
Towards Automatic Satellite Images Captions Generation Using Large Language Models | Oct 17, 2023 | Image Captioning, Management | Unverified | 0
Bounding and Filling: A Fast and Flexible Framework for Image Captioning | Oct 15, 2023 | Image Captioning, Image Description | Code Available | 0
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models | Oct 13, 2023 | Hallucination, Image Captioning | Code Available | 2
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | Oct 12, 2023 | Image Captioning, Image-text Retrieval | Unverified | 0
LangNav: Language as a Perceptual Representation for Navigation | Oct 11, 2023 | Image Captioning, Language Modeling | Unverified | 0
A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation | Oct 11, 2023 | Caption Generation, Decoder | Unverified | 0
Improving mitosis detection on histopathology images using large vision-language models | Oct 11, 2023 | Domain Generalization, Image Captioning | Unverified | 0
The Solution for the CVPR2023 NICE Image Captioning Challenge | Oct 10, 2023 | Contrastive Learning, Image Captioning | Unverified | 0
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models | Oct 9, 2023 | Image Captioning, Visual Commonsense Reasoning | Unverified | 0
Lightweight In-Context Tuning for Multimodal Unified Models | Oct 8, 2023 | Image Captioning, In-Context Learning | Unverified | 0
Module-wise Adaptive Distillation for Multimodality Foundation Models | Oct 6, 2023 | Image Captioning, Thompson Sampling | Unverified | 0
IcoCap: Improving Video Captioning by Compounding Images | Oct 5, 2023 | Image Captioning, Video Captioning | Unverified | 0
On the Performance of Multimodal Language Models | Oct 4, 2023 | Benchmarking, Binary Classification | Unverified | 0
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Oct 3, 2023 | Chatbot, Image Captioning | Code Available | 2
Language Models as Knowledge Bases for Visual Word Sense Disambiguation | Oct 3, 2023 | Image Captioning, Multiple-choice | Code Available | 0
Sieve: Multimodal Dataset Pruning Using Image Captioning Models | Oct 3, 2023 | Diversity, Image Captioning | Code Available | 1
Self-Supervised Open-Ended Classification with Small Visual Language Models | Sep 30, 2023 | Few-Shot Learning, Image Captioning | Unverified | 0
YOLOR-Based Multi-Task Learning | Sep 29, 2023 | Image Captioning, Instance Segmentation | Code Available | 5
ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens | Sep 28, 2023 | Cross-Modal Retrieval, GPU | Code Available | 0
Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness | Sep 27, 2023 | Data Augmentation, Image Captioning | Unverified | 0