SOTAVerified

Caption Generation

Papers

Showing 91-100 of 310 papers

Title | Status | Hype
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding | - | 0
Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains | - | 0
Everything is a Video: Unifying Modalities through Next-Frame Prediction | - | 0
Grounded Video Caption Generation | - | 0
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | - | 0
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning | - | 0
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | - | 0
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | - | 0
Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Code | 0
See It All: Contextualized Late Aggregation for 3D Dense Captioning | - | 0

No leaderboard results yet.