SOTAVerified

Caption Generation

Papers

Showing 1-10 of 310 papers

Title | Status | Hype
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models | Code | 4
DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World | Code | 2
SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning | Code | 2
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion | Code | 2
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Code | 2
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Code | 2
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training | Code | 2
SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | Code | 2
MeaCap: Memory-Augmented Zero-shot Image Captioning | Code | 2
Segment and Caption Anything | Code | 2

No leaderboard results yet.