Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition | Mar 19, 2024 | Dense Captioning, Image Captioning | Unverified | 0
As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks? | Mar 19, 2024 | Adversarial Attack, Image Captioning | Unverified | 0
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory | Mar 19, 2024 | Adversarial Text, Diversity | Code Available | 1
TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling | Mar 18, 2024 | Image Captioning, Visual Storytelling | Unverified | 0
Few-Shot VQA with Frozen LLMs: A Tale of Two Approaches | Mar 17, 2024 | Image Captioning, Question Answering | Unverified | 0
Does the Performance of Text-to-Image Retrieval Models Generalize Beyond Captions-as-a-Query? | Mar 15, 2024 | Descriptive, Image Captioning | Code Available | 0
Can We Talk Models Into Seeing the World Differently? | Mar 14, 2024 | Image Captioning, Image Classification | Code Available | 1
Leveraging LLMs for On-the-Fly Instruction Guided Image Editing | Mar 12, 2024 | Image Captioning | Code Available | 0
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension | Mar 12, 2024 | Deblurring, Decoder | Code Available | 2
Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings | Mar 12, 2024 | Image Captioning, Image Generation | Unverified | 0
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes | Mar 12, 2024 | 3D dense captioning, Dense Captioning | Unverified | 0
Transformer based Multitask Learning for Image Captioning and Object Detection | Mar 10, 2024 | Autonomous Navigation, Image Captioning | Unverified | 0
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | Mar 7, 2024 | 4k, Image Captioning | Code Available | 5
MeaCap: Memory-Augmented Zero-shot Image Captioning | Mar 6, 2024 | Caption Generation, Image Captioning | Code Available | 2
The Case for Evaluating Multimodal Translation Models on Text Datasets | Mar 5, 2024 | Descriptive, Image Captioning | Unverified | 0
Differentially Private Representation Learning via Image Captioning | Mar 4, 2024 | Image Captioning, Representation Learning | Code Available | 1
VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT | Mar 4, 2024 | Image Captioning, Zero-shot Moment Retrieval | Code Available | 2
What Is Missing in Multilingual Visual Reasoning and How to Fix It | Mar 3, 2024 | Image Captioning, Visual Reasoning | Code Available | 0
Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset | Mar 1, 2024 | Image Captioning, Image Generation | Code Available | 0
EAMA: Entity-Aware Multimodal Alignment Based Approach for News Image Captioning | Feb 29, 2024 | Image Captioning, Sentence | Unverified | 0
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning | Feb 28, 2024 | Contrastive Learning, Image Captioning | Code Available | 1
Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction | Feb 28, 2024 | Image Captioning, Language Modeling | Unverified | 0
ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks | Feb 27, 2024 | Domain Generalization, Image Captioning | Unverified | 0
Fine-tuning CLIP Text Encoders with Two-step Paraphrasing | Feb 23, 2024 | Image Captioning, Image Retrieval | Unverified | 0
Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning | Feb 21, 2024 | Cross-Modal Retrieval, Image Captioning | Code Available | 1
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions | Feb 20, 2024 | Image Captioning, Question Answering | Unverified | 0
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models | Feb 19, 2024 | Image Captioning, Question Answering | Unverified | 0
AICAttack: Adversarial Image Captioning Attack with Attention-Based Optimization | Feb 19, 2024 | Adversarial Attack, Image Captioning | Code Available | 0
IRR: Image Review Ranking Framework for Evaluating Vision-Language Models | Feb 19, 2024 | Diversity, Image Captioning | Unverified | 0
Cobra Effect in Reference-Free Image Captioning Metrics | Feb 18, 2024 | Image Captioning | Code Available | 0
ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models | Feb 17, 2024 | Earth Observation, Image Captioning | Code Available | 1
Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models | Feb 13, 2024 | Code Generation, HumanEval | Unverified | 0
Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models | Feb 13, 2024 | Image Captioning, Image to text | Unverified | 0
Multimodal Learned Sparse Retrieval for Image Suggestion | Feb 12, 2024 | Image Captioning, Retrieval | Unverified | 0
Consistency Model is an Effective Posterior Sample Approximation for Diffusion Inverse Solvers | Feb 9, 2024 | Image Captioning, Semantic Segmentation | Unverified | 0
Large Language Models for Captioning and Retrieving Remote Sensing Images | Feb 9, 2024 | Cross-Modal Retrieval, Decoder | Unverified | 0
Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images | Feb 8, 2024 | Image Captioning, Question Answering | Code Available | 0
CIC: A Framework for Culturally-Aware Image Captioning | Feb 8, 2024 | Descriptive, Image Captioning | Unverified | 0
Exploring Visual Culture Awareness in GPT-4V: A Comprehensive Probing | Feb 8, 2024 | Image Captioning, TAG | Unverified | 0
GPTs Are Multilingual Annotators for Sequence Generation Tasks | Feb 8, 2024 | Image Captioning | Code Available | 0
Text or Image? What is More Important in Cross-Domain Generalization Capabilities of Hate Meme Detection Models? | Feb 7, 2024 | Domain Generalization, Image Captioning | Unverified | 0
Image captioning for Brazilian Portuguese using GRIT model | Feb 7, 2024 | Image Captioning, model | Unverified | 0
Text-Guided Image Clustering | Feb 5, 2024 | Clustering, Image Captioning | Code Available | 1
PICS: Pipeline for Image Captioning and Search | Feb 1, 2024 | Asset Management, Image Captioning | Unverified | 0
SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling | Feb 1, 2024 | Diversity, Image Captioning | Unverified | 0
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Jan 31, 2024 | Benchmarking, Change Detection | Code Available | 0
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval | Jan 24, 2024 | Benchmarking, Image Captioning | Code Available | 1
Veagle: Advancements in Multimodal Representation Learning | Jan 18, 2024 | Image Captioning, Language Modelling | Code Available | 1
COCO is "ALL" You Need for Visual Instruction Fine-tuning | Jan 17, 2024 | All, Image Captioning | Unverified | 0
KTVIC: A Vietnamese Image Captioning Dataset on the Life Domain | Jan 16, 2024 | Image Captioning, Vietnamese Image Captioning | Unverified | 0