SOTAVerified

Caption Generation

Papers

Showing 1-25 of 310 papers

Title | Status | Hype
--- | --- | ---
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models | Code | 4
MeaCap: Memory-Augmented Zero-shot Image Captioning | Code | 2
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning | Code | 2
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions | Code | 2
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Code | 2
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training | Code | 2
Segment and Caption Anything | Code | 2
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Code | 2
SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning | Code | 2
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion | Code | 2
DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World | Code | 2
Fine-grained Image Captioning with CLIP Reward | Code | 2
SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | Code | 2
Human-like Controllable Image Captioning with Verb-specific Semantic Roles | Code | 1
HCQA @ Ego4D EgoSchema Challenge 2024 | Code | 1
Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network | Code | 1
GL-RG: Global-Local Representation Granularity for Video Captioning | Code | 1
BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes | Code | 1
Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks | Code | 1
Improving Image Captioning with Better Use of Captions | Code | 1
EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning | Code | 1
End-to-End Dense Video Captioning with Parallel Decoding | Code | 1
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Code | 1
Belief Revision based Caption Re-ranker with Visual Semantic Information | Code | 1
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Code | 1

No leaderboard results yet.