SOTAVerified

Caption Generation

Papers

Showing 41–50 of 310 papers

Title | Status | Hype
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Code | 2
Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains | | 0
Everything is a Video: Unifying Modalities through Next-Frame Prediction | | 0
Grounded Video Caption Generation | | 0
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Code | 2
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension | Code | 1
MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations | Code | 1
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | Code | 0
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning | | 0
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training | Code | 2

No leaderboard results yet.