SOTAVerified

Caption Generation

Papers

Showing 51-60 of 310 papers

Title | Status | Hype
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | Code | 1
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | - | 0
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | - | 0
Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Code | 0
See It All: Contextualized Late Aggregation for 3D Dense Captioning | - | 0
Bi-directional Contextual Attention for 3D Dense Captioning | - | 0
Dual-path Collaborative Generation Network for Emotional Video Captioning | Code | 0
SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | Code | 2
XMeCap: Meme Caption Generation with Sub-Image Adaptability | - | 0
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images | Code | 0
Page 6 of 31

No leaderboard results yet.