A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation | Dec 20, 2024 | Image Captioning | Code Available | 0
Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution | Dec 20, 2024 | Answer Generation Image Captioning | Code Available | 0
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage | Dec 20, 2024 | Attribute Benchmarking | Unverified | 0
Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation | Dec 20, 2024 | Image Captioning | Code Available | 0
Dataset Augmentation by Mixing Visual Concepts | Dec 19, 2024 | Image Captioning | Unverified | 0
Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models | Dec 19, 2024 | Autonomous Driving Image Captioning | Code Available | 0
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution | Dec 19, 2024 | Depth Estimation Image Captioning | Unverified | 0
Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception | Dec 18, 2024 | Descriptive Human-Object Interaction Detection | Code Available | 0
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o | Dec 18, 2024 | Image Captioning Video Captioning | Code Available | 1
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts | Dec 18, 2024 | Action Detection Descriptive | Code Available | 0
Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models | Dec 18, 2024 | document understanding Image Captioning | Code Available | 1
Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval | Dec 18, 2024 | Cross-Modal Retrieval Image Captioning | Unverified | 0
MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants | Dec 17, 2024 | Image Captioning Question Answering | Code Available | 1
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension | Dec 16, 2024 | Benchmarking Image Captioning | Unverified | 0
UnMA-CapSumT: Unified and Multi-Head Attention-driven Caption Summarization Transformer | Dec 16, 2024 | Image Captioning | Unverified | 0
Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track | Dec 15, 2024 | Image Captioning Medical Question Answering | Unverified | 0
From Simple to Professional: A Combinatorial Controllable Image Captioning Agent | Dec 15, 2024 | Caption Generation controllable image captioning | Code Available | 0
Optimizing Vision-Language Interactions Through Decoder-Only Models | Dec 14, 2024 | Decoder Image Captioning | Unverified | 0
Automated Image Captioning with CNNs and Transformers | Dec 13, 2024 | Descriptive Hyperparameter Optimization | Code Available | 0
Vision-Language Models Represent Darker-Skinned Black Individuals as More Homogeneous than Lighter-Skinned Black Individuals | Dec 12, 2024 | Image Captioning Image Generation | Unverified | 0
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Dec 11, 2024 | Attribute Benchmarking | Code Available | 1
Seeing Syntax: Uncovering Syntactic Learning Limitations in Vision-Language Models | Dec 11, 2024 | Image Captioning Image Generation | Unverified | 0
How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey | Dec 11, 2024 | Image Captioning Question Answering | Unverified | 0
3D Spatial Understanding in MLLMs: Disambiguation and Evaluation | Dec 9, 2024 | 3D dense captioning 3D visual grounding | Unverified | 0
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models | Dec 8, 2024 | Image Captioning | Code Available | 0
HMGIE: Hierarchical and Multi-Grained Inconsistency Evaluation for Vision-Language Data Cleansing | Dec 7, 2024 | Answer Generation Graph Generation | Unverified | 0
Automated Medical Report Generation for ECG Data: Bridging Medical Text and Signal Processing with Deep Learning | Dec 5, 2024 | Comment Generation Decoder | Code Available | 0
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Dec 5, 2024 | Contrastive Learning Hallucination | Code Available | 3
Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis | Dec 4, 2024 | Image Captioning Image Description | Unverified | 0
Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey | Dec 3, 2024 | Change Detection Descriptive | Code Available | 3
Progress-Aware Video Frame Captioning | Dec 3, 2024 | Image Captioning Video Captioning | Unverified | 0
CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs | Dec 3, 2024 | Image Captioning Quantization | Unverified | 0
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding | Dec 2, 2024 | Caption Generation Domain Generalization | Unverified | 0
Improving Multimodal LLMs Ability In Geometry Problem Solving, Reasoning, And Multistep Scoring | Dec 1, 2024 | Automated Theorem Proving Geometry Problem Solving | Unverified | 0
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers | Nov 28, 2024 | Image Captioning image-classification | Unverified | 0
OPCap:Object-aware Prompting Captioning | Nov 27, 2024 | Attribute Decoder | Unverified | 0
Active Data Curation Effectively Distills Large-Scale Multimodal Models | Nov 27, 2024 | Decoder Image Captioning | Unverified | 0
Efficient Multi-modal Large Language Models via Visual Token Grouping | Nov 26, 2024 | Image Captioning Question Answering | Unverified | 0
LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation | Nov 25, 2024 | Image Captioning RAG | Code Available | 1
Debiasing Classifiers by Amplifying Bias with Latent Diffusion and Large Language Models | Nov 25, 2024 | Attribute Computational Efficiency | Unverified | 0
Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks | Nov 24, 2024 | Image Captioning Natural Language Understanding | Unverified | 0
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity | Nov 23, 2024 | Attribute Cross-Modal Retrieval | Unverified | 0
FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation | Nov 23, 2024 | Anatomy Image Captioning | Code Available | 1
Uterine Ultrasound Image Captioning Using Deep Learning Techniques | Nov 21, 2024 | Deep Learning Descriptive | Unverified | 0
LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Learned Image Compression | Nov 20, 2024 | Image Captioning Image Compression | Code Available | 1
Mitigating Perception Bias: A Training-Free Approach to Enhance LMM for Image Quality Assessment | Nov 19, 2024 | Image Captioning Image Quality Assessment | Unverified | 0
AI Flow at the Network Edge | Nov 19, 2024 | Image Captioning | Unverified | 0
The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning | Nov 18, 2024 | Image Captioning | Code Available | 0
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Nov 17, 2024 | Image Captioning Language Modeling | Code Available | 0
MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild | Nov 17, 2024 | Active Learning Image Captioning | Unverified | 0