Jewelry Recognition via Encoder-Decoder Models | Jan 15, 2024 | Decoder, Image Captioning | Unverified | 0
What Else Would I Like? A User Simulator using Alternatives for Improved Evaluation of Fashion Conversational Recommendation Systems | Jan 11, 2024 | Conversational Recommendation, Image Captioning | Unverified | 0
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding | Jan 9, 2024 | Image Captioning, image-classification | Unverified | 0
MAMI: Multi-Attentional Mutual-Information for Long Sequence Neuron Captioning | Jan 5, 2024 | Decoder, Image Captioning | Unverified | 0
Hyperparameter-Free Approach for Faster Minimum Bayes Risk Decoding | Jan 5, 2024 | Image Captioning, Machine Translation | Code available | 0
Object-oriented backdoor attack against image captioning | Jan 5, 2024 | Backdoor Attack, Image Captioning | Unverified | 0
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | Jan 4, 2024 | Image Captioning, image-classification | Unverified | 0
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training | Jan 4, 2024 | Descriptive, Image Captioning | Code available | 1
Social Media Ready Caption Generation for Brands | Jan 3, 2024 | Caption Generation, Image Captioning | Unverified | 0
GPT-4V(ision) is a Generalist Web Agent, if Grounded | Jan 3, 2024 | Image Captioning, Question Answering | Code available | 4
Learning Vision from Models Rivals Learning Vision from Data | Dec 28, 2023 | Contrastive Learning, Image Captioning | Code available | 2
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | Dec 28, 2023 | Computational Efficiency, Image Captioning | Code available | 3
Cycle-Consistency Learning for Captioning and Grounding | Dec 23, 2023 | Image Captioning, Visual Grounding | Unverified | 0
LLM4VG: Large Language Models Evaluation for Video Grounding | Dec 21, 2023 | Image Captioning, Video Grounding | Unverified | 0
VCoder: Versatile Vision Encoders for Multimodal Large Language Models | Dec 21, 2023 | Image Captioning, Image Generation | Code available | 2
p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models | Dec 17, 2023 | Image Captioning, Question Answering | Code available | 0
Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models | Dec 15, 2023 | Image Captioning, In-Context Learning | Code available | 1
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning | Dec 15, 2023 | Factual Inconsistency Detection in Chart Captioning, Image Captioning | Code available | 1
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation | Dec 14, 2023 | Image Captioning, Image Generation | Code available | 1
Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis | Dec 14, 2023 | Image Captioning, Scene Understanding | Unverified | 0
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning | Dec 14, 2023 | cross-modal alignment, Decoder | Unverified | 0
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions | Dec 14, 2023 | Image Captioning | Code available | 1
Synocene, Beyond the Anthropocene: De-Anthropocentralising Human-Nature-AI Interaction | Dec 13, 2023 | Chatbot, Image Captioning | Unverified | 0
Filter & Align: Leveraging Human Knowledge to Curate Image-Text Data | Dec 11, 2023 | Image Captioning, Image-text Retrieval | Unverified | 0
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator | Dec 11, 2023 | Image Captioning, Question Answering | Code available | 1
Unifying Text, Tables, and Images for Multimodal Question Answering | Dec 10, 2023 | Image Captioning, Question Answering | Code available | 0
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | Dec 8, 2023 | Image Captioning, object-detection | Unverified | 0
User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning | Dec 8, 2023 | Image Captioning, Language Modeling | Unverified | 0
PixLore: A Dataset-driven Approach to Rich Image Captioning | Dec 8, 2023 | GPU, Image Captioning | Code available | 0
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos | Dec 7, 2023 | Diagnostic, Image Captioning | Code available | 1
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks | Dec 6, 2023 | Image Captioning, image-classification | Unverified | 0
Mitigating Open-Vocabulary Caption Hallucinations | Dec 6, 2023 | Diversity, Hallucination | Code available | 1
Towards More Unified In-context Visual Understanding | Dec 5, 2023 | Decoder, Image Captioning | Unverified | 0
CLAMP: Contrastive LAnguage Model Prompt-tuning | Dec 4, 2023 | Contrastive Learning, Image Captioning | Unverified | 0
Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT | Dec 3, 2023 | Caption Generation, Decoder | Code available | 0
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning | Dec 2, 2023 | Causal Language Modeling, Contrastive Learning | Code available | 1
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | Dec 1, 2023 | Hallucination, Image Captioning | Code available | 6
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | Dec 1, 2023 | Chart Question Answering, Document AI | Unverified | 0
Video Summarization: Towards Entity-Aware Captions | Dec 1, 2023 | Image Captioning, Video Captioning | Code available | 0
Enhancing Image Captioning with Neural Models | Dec 1, 2023 | Caption Generation, Image Captioning | Unverified | 0
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation | Nov 30, 2023 | Image Captioning, Referring Expression | Code available | 0
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner | Nov 29, 2023 | Contrastive Learning, Image Captioning | Code available | 1
A natural language processing-based approach: mapping human perception by understanding deep semantic features in street view images | Nov 29, 2023 | Image Captioning, Language Modelling | Unverified | 0
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models | Nov 28, 2023 | Image Captioning, Image-text matching | Code available | 1
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models | Nov 28, 2023 | Image Captioning, Question Answering | Code available | 2
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training | Nov 28, 2023 | Image Captioning, Transfer Learning | Unverified | 0
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension | Nov 27, 2023 | Image Captioning, Object | Unverified | 0
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism | Nov 25, 2023 | Caption Generation, Denoising | Unverified | 0
Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder | Nov 15, 2023 | Decoder, Image Captioning | Unverified | 0
Improving Image Captioning via Predicting Structured Concepts | Nov 14, 2023 | Image Captioning | Unverified | 0