Title | Date | Tasks | Status
EVLM: An Efficient Vision-Language Model for Visual Understanding | Jul 19, 2024 | Image Captioning, Language Modeling | Unverified
LookupViT: Compressing visual information to a limited number of tokens | Jul 17, 2024 | Image Captioning, image-classification | Unverified
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights | Jul 16, 2024 | Image Captioning, Multimodal Reasoning | Code Available
CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation | Jul 16, 2024 | controllable image captioning, Data Augmentation | Code Available
Leveraging image captions for selective whole slide image annotation | Jul 8, 2024 | Diversity, Image Captioning | Code Available
Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes | Jul 4, 2024 | Image Captioning, image-classification | Unverified
BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs | Jul 3, 2024 | Image Captioning, Image Generation | Unverified
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness | Jul 2, 2024 | Image Captioning, Question Answering | Unverified
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language | Jun 28, 2024 | Image Captioning | Unverified
Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review | Jun 28, 2024 | Active Learning, Image Captioning | Unverified
Explainable Image Captioning using CNN-CNN architecture and Hierarchical Attention | Jun 28, 2024 | Caption Generation, Decoder | Unverified
RAVEN: Multitask Retrieval Augmented Vision-Language Learning | Jun 27, 2024 | Image Captioning, RAG | Unverified
Towards Temporal Change Explanations from Bi-Temporal Satellite Images | Jun 27, 2024 | Image Captioning | Unverified
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data | Jun 26, 2024 | Decoder, GPU | Unverified
Enhancing Scientific Figure Captioning Through Cross-modal Learning | Jun 24, 2024 | Diversity, Image Captioning | Unverified
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | Jun 20, 2024 | Caption Generation, Hallucination | Unverified
Reinforcing Pre-trained Models Using Counterfactual Images | Jun 19, 2024 | Classification, counterfactual | Unverified
Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning? | Jun 18, 2024 | Attribute, Hallucination | Unverified
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning | Jun 17, 2024 | Image Captioning, Question Answering | Unverified
OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst | Jun 14, 2024 | Image Captioning, Language Modeling | Unverified
From Pixels to Prose: A Large Dataset of Dense Image Captions | Jun 14, 2024 | Image Captioning | Unverified
Translating speech with just images | Jun 11, 2024 | Image Captioning, Translation | Code Available
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning | Jun 4, 2024 | Image Captioning, Retrieval | Code Available
Image Captioning via Dynamic Path Customization | Jun 1, 2024 | Diversity, Image Captioning | Code Available
DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration | Jun 1, 2024 | Caption Generation, Image Captioning | Unverified
Image captioning in different languages | May 31, 2024 | Image Captioning, Position | Unverified
OpenDAS: Open-Vocabulary Domain Adaptation for 2D and 3D Segmentation | May 30, 2024 | 3D Instance Segmentation, 3D Open-Vocabulary Instance Segmentation | Unverified
Multi-Modal Generative Embedding Model | May 29, 2024 | Caption Generation, Cross-Modal Retrieval | Unverified
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification | May 29, 2024 | Hallucination, Image Captioning | Unverified
Text-only Synthesis for Image Captioning | May 28, 2024 | Image Captioning, Language Modelling | Unverified
How Culturally Aware are Vision-Language Models? | May 24, 2024 | Image Captioning | Unverified
LG-VQ: Language-Guided Codebook Learning | May 23, 2024 | Image Captioning, Image Generation | Unverified
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models | May 22, 2024 | Benchmarking, Hallucination | Unverified
Class-Conditional self-reward mechanism for improved Text-to-Image models | May 22, 2024 | Image Captioning, object-detection | Code Available
Towards Retrieval-Augmented Architectures for Image Captioning | May 21, 2024 | Image Captioning, Language Modeling | Unverified
Contextual Emotion Recognition using Large Vision Language Models | May 14, 2024 | Decision Making, Emotion Recognition | Unverified
Using Machine Translation to Augment Multilingual Classification | May 9, 2024 | Classification, Image Captioning | Unverified
LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model | May 3, 2024 | Image Captioning, Instruction Following | Code Available
Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores | May 2, 2024 | Image Captioning, Re-Ranking | Code Available
A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection) | May 2, 2024 | Acoustic Scene Classification, Event Detection | Unverified
Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | May 1, 2024 | Image Captioning, Question Answering | Unverified
What Makes for Good Image Captions? | May 1, 2024 | Hallucination, Image Captioning | Unverified
Semi-supervised Text-based Person Search | Apr 28, 2024 | Image Captioning, Person Search | Unverified
Compressed Image Captioning using CNN-based Encoder-Decoder Framework | Apr 28, 2024 | Decoder, Image Captioning | Unverified
Learning text-to-video retrieval from image captioning | Apr 26, 2024 | Image Captioning, Image Retrieval | Unverified
Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers | Apr 21, 2024 | Diagnostic, Image Captioning | Code Available
MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering | Apr 19, 2024 | Chatbot, Domain Adaptation | Unverified
The Solution for the CVPR2024 NICE Image Captioning Challenge | Apr 19, 2024 | Image Captioning, Retrieval | Unverified
ANCHOR: LLM-driven News Subject Conditioning for Text-to-Image Synthesis | Apr 15, 2024 | Descriptive, Image Captioning | Code Available
Bridging Vision and Language Spaces with Assignment Prediction | Apr 15, 2024 | Cross-Modal Retrieval, Image Captioning | Code Available