SOTAVerified

Caption Generation

Papers

Showing 51-75 of 310 papers

Title | Status | Hype
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | Code | 1
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | - | 0
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | - | 0
Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Code | 0
See It All: Contextualized Late Aggregation for 3D Dense Captioning | - | 0
Bi-directional Contextual Attention for 3D Dense Captioning | - | 0
Dual-path Collaborative Generation Network for Emotional Video Captioning | Code | 0
SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | Code | 2
XMeCap: Meme Caption Generation with Sub-Image Adaptability | - | 0
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images | Code | 0
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Code | 1
Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention | - | 0
HCQA @ Ego4D EgoSchema Challenge 2024 | Code | 1
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | - | 0
Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens | Code | 0
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning | Code | 0
DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration | - | 0
Multi-Modal Generative Embedding Model | - | 0
Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation | - | 0
MICap: A Unified Model for Identity-aware Movie Descriptions | - | 0
SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset | Code | 1
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | - | 0
BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes | Code | 1
The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge | - | 0
LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival | - | 0

No leaderboard results yet.