SOTAVerified

Caption Generation

Papers

Showing 101-150 of 310 papers

Title | Status | Hype
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response | Code | 1
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning | Code | 1
ViCo: Engaging Video Comment Generation with Human Preference Rewards | - | 0
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning | Code | 2
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions | Code | 2
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning | Code | 1
FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback | Code | 0
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes | - | 0
Multi-Similarity Contrastive Learning | - | 0
Knowledge Distillation for Efficient Audio-Visual Video Captioning | - | 0
SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning | Code | 0
CapText: Large Language Model-based Caption Generation From Image Context and Description | - | 0
RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment | - | 0
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning | - | 0
DiffCap: Exploring Continuous Diffusion on Image Captioning | - | 0
Efficient Audio Captioning Transformer with Patchout and Text Guidance | - | 0
Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models | - | 0
Multi-modal reward for visual relationships-based image captioning | - | 0
GNNFormer: A Graph-based Framework for Cytopathology Report Generation | - | 0
Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization | Code | 0
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning | - | 0
Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning | Code | 0
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Code | 1
Uncertainty-Aware Image Captioning | - | 0
Retrieval-Augmented Multimodal Language Modeling | - | 0
Visual Commonsense-aware Representation Network for Video Captioning | Code | 1
Event and Entity Extraction from Generated Video Captions | Code | 0
Image Caption Generation for Low-Resource Assamese Language | - | 0
EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning | Code | 1
Generating image captions with external encyclopedic knowledge | - | 0
REST: REtrieve & Self-Train for generative action recognition | - | 0
Medical Image Captioning via Generative Pretrained Transformers | - | 0
Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned | - | 0
Belief Revision based Caption Re-ranker with Visual Semantic Information | Code | 1
Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches | Code | 1
Examining the Effects of Language-and-Vision Data Augmentation for Generation of Descriptions of Human Faces | - | 0
Aligning Images and Text with Semantic Role Labels for Fine-Grained Cross-Modal Understanding | - | 0
Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset | - | 0
Fine-grained Image Captioning with CLIP Reward | Code | 2
GL-RG: Global-Local Representation Granularity for Video Captioning | Code | 1
Automated Audio Captioning: An Overview of Recent Progress and New Challenges | - | 0
Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds | Code | 1
Guiding Attention using Partial-Order Relationships for Image Captioning | - | 0
NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models | Code | 0
NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge | - | 0
A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism | - | 0
Deep Learning Approaches on Image Captioning: A Review | - | 0
Local Information Assisted Attention-free Decoder for Audio Captioning | Code | 0
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning | - | 0
Injecting Semantic Concepts into End-to-End Image Captioning | Code | 1
Page 3 of 7

No leaderboard results yet.