Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation | Jul 10, 2024 | Image Captioning, Image Segmentation | Code Available | 1
AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning | Jul 10, 2024 | Audio-Visual Captioning, Image Captioning | Code Available | 1
Leveraging image captions for selective whole slide image annotation | Jul 8, 2024 | Diversity, Image Captioning | Code Available | 0
Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes | Jul 4, 2024 | Image Captioning, image-classification | Unverified | 0
BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs | Jul 3, 2024 | Image Captioning, Image Generation | Unverified | 0
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness | Jul 2, 2024 | Image Captioning, Question Answering | Unverified | 0
Explainable Image Captioning using CNN-CNN architecture and Hierarchical Attention | Jun 28, 2024 | Caption Generation, Decoder | Unverified | 0
Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review | Jun 28, 2024 | Active Learning, Image Captioning | Unverified | 0
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language | Jun 28, 2024 | Image Captioning | Unverified | 0
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment | Jun 28, 2024 | Answer Generation, Image Captioning | Code Available | 1
Towards Temporal Change Explanations from Bi-Temporal Satellite Images | Jun 27, 2024 | Image Captioning | Unverified | 0
RAVEN: Multitask Retrieval Augmented Vision-Language Learning | Jun 27, 2024 | Image Captioning, RAG | Unverified | 0
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data | Jun 26, 2024 | Decoder, GPU | Unverified | 0
Enhancing Scientific Figure Captioning Through Cross-modal Learning | Jun 24, 2024 | Diversity, Image Captioning | Unverified | 0
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | Jun 20, 2024 | Caption Generation, Hallucination | Unverified | 0
Reinforcing Pre-trained Models Using Counterfactual Images | Jun 19, 2024 | Classification, counterfactual | Unverified | 0
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding | Jun 18, 2024 | Image Captioning, Question Answering | Code Available | 2
Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning? | Jun 18, 2024 | Attribute, Hallucination | Unverified | 0
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models | Jun 17, 2024 | Benchmarking, Fact Checking | Code Available | 1
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning | Jun 17, 2024 | Image Captioning, Question Answering | Unverified | 0
From Pixels to Prose: A Large Dataset of Dense Image Captions | Jun 14, 2024 | Image Captioning | Unverified | 0
OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst | Jun 14, 2024 | Image Captioning, Language Modeling | Unverified | 0
ImageNet3D: Towards General-Purpose Object-Level 3D Understanding | Jun 13, 2024 | Image Captioning, Linear Probing, Object-Level 3D Awareness | Code Available | 1
Towards Vision-Language Geo-Foundation Model: A Survey | Jun 13, 2024 | Earth Observation, Image Captioning | Code Available | 2
Yo'LLaVA: Your Personalized Language and Vision Assistant | Jun 13, 2024 | Image Captioning, Question Answering | Code Available | 2
Translating speech with just images | Jun 11, 2024 | Image Captioning, Translation | Code Available | 0
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model | Jun 10, 2024 | Image Captioning | Code Available | 1
From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks | Jun 4, 2024 | Image Captioning, Language Modelling | Code Available | 2
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning | Jun 4, 2024 | Image Captioning, Retrieval | Code Available | 0
Dragonfly: Multi-Resolution Zoom-In Encoding Enhances Vision-Language Models | Jun 3, 2024 | Image Captioning, Language Modelling | Code Available | 2
Image Captioning via Dynamic Path Customization | Jun 1, 2024 | Diversity, Image Captioning | Code Available | 0
DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration | Jun 1, 2024 | Caption Generation, Image Captioning | Unverified | 0
Image captioning in different languages | May 31, 2024 | Image Captioning, Position | Unverified | 0
RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection | May 30, 2024 | Image Captioning, Image Inpainting | Code Available | 1
OpenDAS: Open-Vocabulary Domain Adaptation for 2D and 3D Segmentation | May 30, 2024 | 3D Instance Segmentation, 3D Open-Vocabulary Instance Segmentation | Unverified | 0
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification | May 29, 2024 | Hallucination, Image Captioning | Unverified | 0
Multi-Modal Generative Embedding Model | May 29, 2024 | Caption Generation, Cross-Modal Retrieval | Unverified | 0
Benchmarking and Improving Detail Image Caption | May 29, 2024 | Benchmarking, Image Captioning | Code Available | 2
Text-only Synthesis for Image Captioning | May 28, 2024 | Image Captioning, Language Modelling | Unverified | 0
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness | May 27, 2024 | Hallucination, Image Captioning | Code Available | 11
How Culturally Aware are Vision-Language Models? | May 24, 2024 | Image Captioning | Unverified | 0
LG-VQ: Language-Guided Codebook Learning | May 23, 2024 | Image Captioning, Image Generation | Unverified | 0
A Survey on Vision-Language-Action Models for Embodied AI | May 23, 2024 | Image Captioning, Instruction Following | Code Available | 4
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models | May 22, 2024 | Benchmarking, Hallucination | Unverified | 0
Class-Conditional self-reward mechanism for improved Text-to-Image models | May 22, 2024 | Image Captioning, object-detection | Code Available | 0
Towards Retrieval-Augmented Architectures for Image Captioning | May 21, 2024 | Image Captioning, Language Modeling | Unverified | 0
UniRAG: Universal Retrieval Augmentation for Large Vision Language Models | May 16, 2024 | Image Captioning, Image Generation | Code Available | 1
Chameleon: Mixed-Modal Early-Fusion Foundation Models | May 16, 2024 | Image Captioning, Image Generation | Code Available | 7
Contextual Emotion Recognition using Large Vision Language Models | May 14, 2024 | Decision Making, Emotion Recognition | Unverified | 0
Boostlet.js: Image processing plugins for the web via JavaScript injection | May 13, 2024 | Data Visualization, Image Captioning | Code Available | 1