SOTAVerified

Caption Generation

Papers

Showing 251-300 of 310 papers

| Title | Status | Hype |
| --- | --- | --- |
| Everything is a Video: Unifying Modalities through Next-Frame Prediction | | 0 |
| Examining the Effects of Language-and-Vision Data Augmentation for Generation of Descriptions of Human Faces | | 0 |
| Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention | | 0 |
| EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | | 0 |
| FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images | | 0 |
| Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech | | 0 |
| Fast Image Caption Generation with Position Alignment | | 0 |
| Feature Fusion Effects of Tensor Product Representation on (De)Compositional Network for Caption Generation for Images | | 0 |
| Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation | | 0 |
| FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning | | 0 |
| Fine-Grained Video Captioning through Scene Graph Consolidation | | 0 |
| Fusion Models for Improved Visual Captioning | | 0 |
| GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | | 0 |
| GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning | | 0 |
| Generating captions without looking beyond objects | | 0 |
| Generating Image Captions in Arabic using Root-Word Based Recurrent Neural Networks and Deep Neural Networks | | 0 |
| Generating image captions with external encyclopedic knowledge | | 0 |
| Generating Video Description using Sequence-to-sequence Model with Temporal Attention | | 0 |
| Geo-Aware Image Caption Generation | | 0 |
| Geometry-Entangled Visual Semantic Transformer for Image Captioning | | 0 |
| GNNFormer: A Graph-based Framework for Cytopathology Report Generation | | 0 |
| GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | | 0 |
| Goal-driven text descriptions for images | | 0 |
| Grounded Video Caption Generation | | 0 |
| Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention | | 0 |
| Guiding Attention using Partial-Order Relationships for Image Captioning | | 0 |
| Guiding the Long-Short Term Memory Model for Image Caption Generation | | 0 |
| HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning | | 0 |
| Hierarchical LSTMs with Adaptive Attention for Visual Captioning | | 0 |
| Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning | | 0 |
| I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation | | 0 |
| IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification | | 0 |
| Identifying Multi-modal Knowledge Neurons in Pretrained Transformers via Two-stage Filtering | | 0 |
| IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers | | 0 |
| Image Caption Generation for Low-Resource Assamese Language | | 0 |
| Image Caption Generation Framework for Assamese News using Attention Mechanism | | 0 |
| Image Captioning using Facial Expression and Attention | | 0 |
| Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding | | 0 |
| Image Captioning with Unseen Objects | | 0 |
| Image Position Prediction in Multimodal Documents | | 0 |
| Image Representations and New Domains in Neural Image Captioning | | 0 |
| Image to Bengali Caption Generation Using Deep CNN and Bidirectional Gated Recurrent Unit | | 0 |
| Improving Image Captioning with Better Use of Caption | | 0 |
| Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models | | 0 |
| Knowledge Distillation for Efficient Audio-Visual Video Captioning | | 0 |
| Knowledge driven Description Synthesis for Floor Plan Interpretation | | 0 |
| Language Production Dynamics with Recurrent Neural Networks | | 0 |
| LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images | | 0 |
| Learning a Recurrent Visual Representation for Image Caption Generation | | 0 |
| Learning from Massive Human Videos for Universal Humanoid Pose Control | | 0 |
Page 6 of 7

No leaderboard results yet.