SOTAVerified

Caption Generation

Papers

Showing 11–20 of 310 papers

Title | Status | Hype
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1
Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives | - | 0
VideoMultiAgents: A Multi-Agent Framework for Video Question Answering | Code | 1
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | - | 0
Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training | - | 0
3D CoCa: Contrastive Learners are 3D Captioners | Code | 0
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention | - | 0
Identifying Multi-modal Knowledge Neurons in Pretrained Transformers via Two-stage Filtering | - | 0
LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images | - | 0
Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition | Code | 1

No leaderboard results yet.