CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | May 9, 2024 | Image Captioning, Instruction Following | Code available | 2
Using Machine Translation to Augment Multilingual Classification | May 9, 2024 | Classification, Image Captioning | Unverified | 0
LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model | May 3, 2024 | Image Captioning, Instruction Following | Code available | 0
A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection) | May 2, 2024 | Acoustic Scene Classification, Event Detection | Unverified | 0
Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores | May 2, 2024 | Image Captioning, Re-Ranking | Code available | 0
Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | May 1, 2024 | Image Captioning, Question Answering | Unverified | 0
What Makes for Good Image Captions? | May 1, 2024 | Hallucination, Image Captioning | Unverified | 0
Compressed Image Captioning using CNN-based Encoder-Decoder Framework | Apr 28, 2024 | Decoder, Image Captioning | Unverified | 0
Semi-supervised Text-based Person Search | Apr 28, 2024 | Image Captioning, Person Search | Unverified | 0
Learning text-to-video retrieval from image captioning | Apr 26, 2024 | Image Captioning, Image Retrieval | Unverified | 0
OmniSearchSage: Multi-Task Multi-Entity Embeddings for Pinterest Search | Apr 25, 2024 | Entity Embeddings, Image Captioning | Code available | 2
Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers | Apr 21, 2024 | Diagnostic, Image Captioning | Code available | 0
The Solution for the CVPR2024 NICE Image Captioning Challenge | Apr 19, 2024 | Image Captioning, Retrieval | Unverified | 0
MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering | Apr 19, 2024 | Chatbot, Domain Adaptation | Unverified | 0
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Apr 16, 2024 | Image Captioning, Image Generation | Code available | 1
ANCHOR: LLM-driven News Subject Conditioning for Text-to-Image Synthesis | Apr 15, 2024 | Descriptive, Image Captioning | Code available | 0
Bridging Vision and Language Spaces with Assignment Prediction | Apr 15, 2024 | Cross-Modal Retrieval, Image Captioning | Code available | 0
On Speculative Decoding for Multimodal Large Language Models | Apr 13, 2024 | Image Captioning, Language Modeling | Unverified | 0
FLoRA: Enhancing Vision-Language Models with Parameter-Efficient Federated Learning | Apr 12, 2024 | Federated Learning, Image Captioning | Code available | 0
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts | Apr 12, 2024 | Image Captioning, Question Answering | Code available | 1
View Selection for 3D Captioning via Diffusion Ranking | Apr 11, 2024 | 3D Object Captioning, Hallucination | Code available | 3
Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation | Apr 6, 2024 | Image Captioning, Instance Segmentation | Unverified | 0
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching | Apr 4, 2024 | Attribute, Image Captioning | Code available | 2
Would Deep Generative Models Amplify Bias in Future Models? | Apr 4, 2024 | Image Captioning, Image Generation | Unverified | 0
Jump Self-attention: Capturing High-order Statistics in Transformers | Apr 3, 2024 | Image Captioning, Natural Language Understanding | Unverified | 0
Harnessing the Power of Large Vision Language Models for Synthetic Image Detection | Apr 3, 2024 | Image Captioning, Synthetic Image Detection | Code available | 1
Disentangled Pre-training for Human-Object Interaction Detection | Apr 2, 2024 | Action Recognition, Decoder | Code available | 1
Bi-LORA: A Vision-Language Approach for Synthetic Image Detection | Apr 2, 2024 | Binary Classification, Image Captioning | Code available | 1
VLRM: Vision-Language Models act as Reward Models for Image Captioning | Apr 2, 2024 | Image Captioning, reinforcement-learning | Unverified | 0
Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning | Apr 1, 2024 | Image Captioning, Instruction Following | Code available | 0
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction | Apr 1, 2024 | Image Captioning, Instruction Following | Unverified | 0
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis | Mar 29, 2024 | Hallucination, Image Captioning | Code available | 2
A Review of Multi-Modal Large Language and Vision Models | Mar 28, 2024 | Image Captioning, Prompt Engineering | Unverified | 0
LocCa: Visual Pretraining with Location-aware Captioners | Mar 28, 2024 | Decoder, Image Captioning | Code available | 0
Text Data-Centric Image Captioning with Interactive Prompts | Mar 28, 2024 | Image Captioning | Unverified | 0
Semantic Map-based Generation of Navigation Instructions | Mar 28, 2024 | Image Captioning | Code available | 0
A Survey on Large Language Models from Concept to Implementation | Mar 27, 2024 | Chatbot, Image Captioning | Unverified | 0
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction | Mar 27, 2024 | Image Captioning, Language Modeling | Code available | 2
Automated Report Generation for Lung Cytological Images Using a CNN Vision Classifier and Multiple-Transformer Text Decoders: Preliminary Study | Mar 26, 2024 | Decoder, Image Captioning | Unverified | 0
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching | Mar 26, 2024 | Data Augmentation, Graph Matching | Unverified | 0
Visual Hallucination: Definition, Quantification, and Prescriptive Remediations | Mar 26, 2024 | Hallucination, Image Captioning | Unverified | 0
The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge | Mar 26, 2024 | Caption Generation, Image Captioning | Unverified | 0
Image Captioning in news report scenario | Mar 24, 2024 | Image Captioning, Recommendation Systems | Unverified | 0
Cognitive resilience: Unraveling the proficiency of image-captioning models to interpret masked visual content | Mar 23, 2024 | Descriptive, Image Captioning | Code available | 0
A Multimodal Approach for Cross-Domain Image Retrieval | Mar 22, 2024 | Image Captioning, Image Retrieval | Unverified | 0
MyVLM: Personalizing VLMs for User-Specific Queries | Mar 21, 2024 | Image Captioning, Language Modelling | Unverified | 0
Inserting Faces inside Captions: Image Captioning with Attention Guided Merging | Mar 20, 2024 | Image Captioning, Retrieval | Unverified | 0
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs | Mar 20, 2024 | Audio captioning, Image Captioning | Unverified | 0
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning | Mar 19, 2024 | Benchmarking, Image Captioning | Code available | 2
Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition | Mar 19, 2024 | Dense Captioning, Image Captioning | Unverified | 0