SOTAVerified

Caption Generation

Papers

Showing 101-150 of 310 papers

| Title | Status | Hype |
| --- | --- | --- |
| A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation | | 0 |
| FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images | | 0 |
| Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech | | 0 |
| Fast Image Caption Generation with Position Alignment | | 0 |
| Learning a Recurrent Visual Representation for Image Caption Generation | | 0 |
| Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation | | 0 |
| Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning | | 0 |
| FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning | | 0 |
| End to End Recognition System for Recognizing Offline Unconstrained Vietnamese Handwriting | | 0 |
| Fine-Grained Video Captioning through Scene Graph Consolidation | | 0 |
| Learning from Massive Human Videos for Universal Humanoid Pose Control | | 0 |
| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | | 0 |
| LLMs in Political Science: Heralding a New Era of Visual Analysis | | 0 |
| Fusion Models for Improved Visual Captioning | | 0 |
| GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | | 0 |
| GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning | | 0 |
| Generating captions without looking beyond objects | | 0 |
| Generating Image Captions in Arabic using Root-Word Based Recurrent Neural Networks and Deep Neural Networks | | 0 |
| Empirical Analysis of Image Caption Generation using Deep Learning | | 0 |
| Generating Video Description using Sequence-to-sequence Model with Temporal Attention | | 0 |
| E-MMAD: Multimodal Advertising Caption Generation Based on Structured Information | | 0 |
| Geometry-Entangled Visual Semantic Transformer for Image Captioning | | 0 |
| Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models | | 0 |
| Aligning Images and Text with Semantic Role Labels for Fine-Grained Cross-Modal Understanding | | 0 |
| GNNFormer: A Graph-based Framework for Cytopathology Report Generation | | 0 |
| GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | | 0 |
| LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images | | 0 |
| Automated Audio Captioning: An Overview of Recent Progress and New Challenges | | 0 |
| Knowledge driven Description Synthesis for Floor Plan Interpretation | | 0 |
| Efficient Audio Captioning Transformer with Patchout and Text Guidance | | 0 |
| EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits | | 0 |
| Common Subspace for Model and Similarity: Phrase Learning for Caption Generation From Images | | 0 |
| Language Production Dynamics with Recurrent Neural Networks | | 0 |
| LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation | | 0 |
| Clue: Cross-modal Coherence Modeling for Caption Generation | | 0 |
| DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration | | 0 |
| Domain Adaptation for Neural Networks by Parameter Augmentation | | 0 |
| Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SCICAP Challenge 2023 | | 0 |
| Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | | 0 |
| Image Captioning using Facial Expression and Attention | | 0 |
| Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation | | 0 |
| Image Caption Generation Framework for Assamese News using Attention Mechanism | | 0 |
| Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning | | 0 |
| Image Caption Generation for Low-Resource Assamese Language | | 0 |
| IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers | | 0 |
| Chittron: An Automatic Bangla Image Captioning System | | 0 |
| Image to Bengali Caption Generation Using Deep CNN and Bidirectional Gated Recurrent Unit | | 0 |
| Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space | | 0 |
| Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding | | 0 |
| Improving Image Captioning with Better Use of Caption | | 0 |

No leaderboard results yet.