SOTAVerified

Caption Generation

Papers

Showing 201-250 of 310 papers

Title | Status | Hype
Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation | | 0
Structural and Functional Decomposition for Personality Image Captioning in a Communication Game | | 0
StyleNet: Generating Attractive Visual Captions With Styles | | 0
Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models | | 0
Temporal Knowledge-Aware Image Captioning | | 0
Temporal Object Captioning for Street Scene Videos from LiDAR Tracks | | 0
THE DCASE 2021 CHALLENGE TASK 6 SYSTEM: AUTOMATED AUDIO CAPTIONING WITH WEAKLY SUPERVISED PRE-TRAING AND WORD SELECTION METHODS | | 0
The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation | | 0
The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge | | 0
The Use of Object Labels and Spatial Prepositions as Keywords in a Web-Retrieval-Based Image Caption Generation System | | 0
Time Series Language Model for Descriptive Caption Generation | | 0
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | | 0
Topic Scene Graph Generation by Attention Distillation From Caption | | 0
TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning | | 0
Uncertainty-Aware Image Captioning | | 0
Understanding How Paper Writers Use AI-Generated Captions in Figure Caption Writing | | 0
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning | | 0
Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards | | 0
UNISON: Unpaired Cross-lingual Image Captioning | | 0
ViCo: Engaging Video Comment Generation with Human Preference Rewards | | 0
Video Caption Dataset for Describing Human Actions in Japanese | | 0
Video Captioning in Compressed Video | | 0
Video Captioning with Guidance of Multimodal Latent Topics | | 0
Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives | | 0
Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning | | 0
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | | 0
WAT2019: English-Hindi Translation on Hindi Visual Genome Dataset | | 0
Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models | | 0
Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching | | 0
What is not where: the challenge of integrating spatial representations into deep learning architectures | | 0
Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned | | 0
XMeCap: Meme Caption Generation with Sub-Image Adaptability | | 0
LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation | | 0
LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models | | 0
Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training | | 0
LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival | | 0
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning | | 0
MAMS: Model-Agnostic Module Selection Framework for Video Captioning | | 0
MAT: A Multimodal Attentive Translator for Image Captioning | | 0
Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing | | 0
Medical Image Captioning via Generative Pretrained Transformers | | 0
MICap: A Unified Model for Identity-aware Movie Descriptions | | 0
Mind's Eye: A Recurrent Visual Representation for Image Caption Generation | | 0
Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset | | 0
Multi-modal Dependency Tree for Video Captioning | | 0
Multi-Modal Generative Embedding Model | | 0
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications | | 0
Multi-modal reward for visual relationships-based image captioning | | 0
Multi-Similarity Contrastive Learning | | 0
Multi-task Sequence to Sequence Learning | | 0

No leaderboard results yet.