SOTAVerified

Caption Generation

Papers

Showing 101-125 of 310 papers

Title | Status | Hype
--- | --- | ---
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response | Code | 1
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning | Code | 1
ViCo: Engaging Video Comment Generation with Human Preference Rewards | | 0
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning | Code | 2
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions | Code | 2
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning | Code | 1
FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback | Code | 0
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes | | 0
Multi-Similarity Contrastive Learning | | 0
Knowledge Distillation for Efficient Audio-Visual Video Captioning | | 0
SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning | Code | 0
CapText: Large Language Model-based Caption Generation From Image Context and Description | | 0
RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment | | 0
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning | | 0
DiffCap: Exploring Continuous Diffusion on Image Captioning | | 0
Efficient Audio Captioning Transformer with Patchout and Text Guidance | | 0
Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models | | 0
Multi-modal reward for visual relationships-based image captioning | | 0
GNNFormer: A Graph-based Framework for Cytopathology Report Generation | | 0
Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization | Code | 0
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning | | 0
Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning | Code | 0
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Code | 1
Uncertainty-Aware Image Captioning | | 0
Retrieval-Augmented Multimodal Language Modeling | | 0

No leaderboard results yet.