SOTAVerified

Caption Generation

Papers

Showing 1-10 of 310 papers

Title | Status | Hype
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models | Code | 4
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training | Code | 2
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Code | 2
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion | Code | 2
MeaCap: Memory-Augmented Zero-shot Image Captioning | Code | 2
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning | Code | 2
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions | Code | 2
DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World | Code | 2
Fine-grained Image Captioning with CLIP Reward | Code | 2
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Code | 2

No leaderboard results yet.