SOTAVerified

Caption Generation

Papers

Showing 101-125 of 310 papers

Title | Status | Hype
Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback | | 0
FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images | | 0
Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech | | 0
Fast Image Caption Generation with Position Alignment | | 0
A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation | | 0
Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation | | 0
Hierarchical LSTMs with Adaptive Attention for Visual Captioning | | 0
FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning | | 0
Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning | | 0
Fine-Grained Video Captioning through Scene Graph Consolidation | | 0
End to End Recognition System for Recognizing Offline Unconstrained Vietnamese Handwriting | | 0
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning | | 0
Identifying Multi-modal Knowledge Neurons in Pretrained Transformers via Two-stage Filtering | | 0
Fusion Models for Improved Visual Captioning | | 0
GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | | 0
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning | | 0
Generating captions without looking beyond objects | | 0
Generating Image Captions in Arabic using Root-Word Based Recurrent Neural Networks and Deep Neural Networks | | 0
Generating image captions with external encyclopedic knowledge | | 0
Generating Video Description using Sequence-to-sequence Model with Temporal Attention | | 0
Empirical Analysis of Image Caption Generation using Deep Learning | | 0
Geometry-Entangled Visual Semantic Transformer for Image Captioning | | 0
Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models | | 0
E-MMAD: Multimodal Advertising Caption Generation Based on Structured Information | | 0
Aligning Images and Text with Semantic Role Labels for Fine-Grained Cross-Modal Understanding | | 0

No leaderboard results yet.