Evaluation of Multilingual Image Captioning: How far can we get with CLIP models? (Feb 10, 2025). Tasks: Image Captioning, Semantic correspondence. Status: Code Available.
Generative Distribution Prediction: A Unified Approach to Multimodal Learning (Feb 10, 2025). Tasks: Domain Adaptation, Image Captioning. Status: Unverified.
Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents (Feb 6, 2025). Tasks: Image Captioning, Optical Character Recognition. Status: Unverified.
Efficient Few-Shot Continual Learning in Vision-Language Models (Feb 6, 2025). Tasks: Continual Learning, Image Captioning. Status: Unverified.
TexLiDAR: Automated Text Understanding for Panoramic LiDAR Data (Feb 5, 2025). Tasks: Image Captioning, object-detection. Status: Code Available.
COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation (Feb 4, 2025). Tasks: Image Captioning, Panoptic Segmentation. Status: Unverified.
Exploring Spatial Language Grounding Through Referring Expressions (Feb 4, 2025). Tasks: Image Captioning, Negation. Status: Unverified.
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding (Jan 30, 2025). Tasks: Benchmarking, Decision Making. Status: Unverified.
Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes (Jan 23, 2025). Tasks: Emotion Classification, Image Captioning. Status: Code Available.
An Ensemble Model with Attention Based Mechanism for Image Captioning (Jan 22, 2025). Tasks: Ensemble Learning, Image Captioning. Status: Unverified.
Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis (Jan 16, 2025). Tasks: Decoder, Image Captioning. Status: Code Available.
Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness (Jan 16, 2025). Tasks: Adversarial Defense, Adversarial Robustness. Status: Unverified.
VCRScore: Image captioning metric based on V&L Transformers, CLIP, and precision-recall (Jan 15, 2025). Tasks: Image Captioning. Status: Unverified.
GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing (Jan 12, 2025). Tasks: Image Captioning, Language Modeling. Status: Unverified.
Improving Image Captioning by Mimicking Human Reformulation Feedback at Inference-time (Jan 8, 2025). Tasks: Image Captioning, Style Transfer. Status: Unverified.
Evaluating Image Caption via Cycle-consistent Text-to-Image Generation (Jan 7, 2025). Tasks: Contrastive Learning, Diversity. Status: Unverified.
Decoding fMRI Data into Captions using Prefix Language Modeling (Jan 5, 2025). Tasks: Brain Decoding, Image Captioning. Status: Code Available.
MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning (Jan 3, 2025). Tasks: Diagnostic, General Knowledge. Status: Unverified.
Variance-Based Membership Inference Attacks Against Large-Scale Image Captioning Models (Jan 1, 2025). Tasks: Image Captioning, Memorization. Status: Unverified.
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception (Jan 1, 2025). Tasks: Image Captioning, Image Generation. Status: Unverified.
AdaDARE-gamma: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation (Jan 1, 2025). Tasks: Image Captioning, Question Answering. Status: Unverified.
Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution (Jan 1, 2025). Tasks: Depth Estimation, Image Captioning. Status: Unverified.
Semantic and Expressive Variations in Image Captions Across Languages (Jan 1, 2025). Tasks: Descriptive, Image Captioning. Status: Unverified.
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning (Dec 31, 2024). Tasks: Caption Generation, Decoder. Status: Unverified.
Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering (Dec 30, 2024). Tasks: Image Captioning, Object Recognition. Status: Unverified.
ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers (Dec 27, 2024). Tasks: Image Captioning, Question Answering. Status: Unverified.
ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning (Dec 26, 2024). Tasks: Image Captioning, Retrieval. Status: Code Available.
GCS-M3VLT: Guided Context Self-Attention based Multi-modal Medical Vision Language Transformer for Retinal Image Captioning (Dec 23, 2024). Tasks: Image Captioning, Language Modeling. Status: Unverified.
Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy (Dec 23, 2024). Tasks: Image Captioning, Question Answering. Status: Unverified.
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization (Dec 21, 2024). Tasks: Image Captioning, Multimodal Reasoning. Status: Code Available.
Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution (Dec 20, 2024). Tasks: Answer Generation, Image Captioning. Status: Code Available.
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage (Dec 20, 2024). Tasks: Attribute, Benchmarking. Status: Unverified.
Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation (Dec 20, 2024). Tasks: Image Captioning. Status: Code Available.
A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation (Dec 20, 2024). Tasks: Image Captioning. Status: Code Available.
Dataset Augmentation by Mixing Visual Concepts (Dec 19, 2024). Tasks: Image Captioning. Status: Unverified.
Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models (Dec 19, 2024). Tasks: Autonomous Driving, Image Captioning. Status: Code Available.
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution (Dec 19, 2024). Tasks: Depth Estimation, Image Captioning. Status: Unverified.
Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception (Dec 18, 2024). Tasks: Descriptive, Human-Object Interaction Detection. Status: Code Available.
Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval (Dec 18, 2024). Tasks: Cross-Modal Retrieval, Image Captioning. Status: Unverified.
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts (Dec 18, 2024). Tasks: Action Detection, Descriptive. Status: Code Available.
UnMA-CapSumT: Unified and Multi-Head Attention-driven Caption Summarization Transformer (Dec 16, 2024). Tasks: Image Captioning. Status: Unverified.
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension (Dec 16, 2024). Tasks: Benchmarking, Image Captioning. Status: Unverified.
Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track (Dec 15, 2024). Tasks: Image Captioning, Medical Question Answering. Status: Unverified.
From Simple to Professional: A Combinatorial Controllable Image Captioning Agent (Dec 15, 2024). Tasks: Caption Generation, controllable image captioning. Status: Code Available.
Optimizing Vision-Language Interactions Through Decoder-Only Models (Dec 14, 2024). Tasks: Decoder, Image Captioning. Status: Unverified.
Automated Image Captioning with CNNs and Transformers (Dec 13, 2024). Tasks: Descriptive, Hyperparameter Optimization. Status: Code Available.
Vision-Language Models Represent Darker-Skinned Black Individuals as More Homogeneous than Lighter-Skinned Black Individuals (Dec 12, 2024). Tasks: Image Captioning, Image Generation. Status: Unverified.
How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey (Dec 11, 2024). Tasks: Image Captioning, Question Answering. Status: Unverified.
Seeing Syntax: Uncovering Syntactic Learning Limitations in Vision-Language Models (Dec 11, 2024). Tasks: Image Captioning, Image Generation. Status: Unverified.
3D Spatial Understanding in MLLMs: Disambiguation and Evaluation (Dec 9, 2024). Tasks: 3D dense captioning, 3D visual grounding. Status: Unverified.