SOTAVerified

Caption Generation

Papers

Showing 11–20 of 310 papers

Title | Status | Hype
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning | Code | 2
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions | Code | 2
Fine-grained Image Captioning with CLIP Reward | Code | 2
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | Code | 1
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1
VideoMultiAgents: A Multi-Agent Framework for Video Question Answering | Code | 1
Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition | Code | 1
Large-scale Pre-training for Grounded Video Caption Generation | Code | 1
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension | Code | 1
MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations | Code | 1
Page 2 of 31

No leaderboard results yet.