SOTAVerified

Caption Generation

Papers

Showing 51-100 of 310 papers

Title | Status | Hype
Large-scale Pre-training for Grounded Video Caption Generation | Code | 1
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1
Improving Image Captioning with Better Use of Captions | Code | 1
SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset | Code | 1
Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks | Code | 1
HCQA @ Ego4D EgoSchema Challenge 2024 | Code | 1
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Code | 1
Injecting Semantic Concepts into End-to-End Image Captioning | Code | 1
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response | Code | 1
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | Code | 1
Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models | - | 0
Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models | - | 0
End-to-End Video Captioning | - | 0
Deep Learning Approaches on Image Captioning: A Review | - | 0
VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools | - | 0
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | - | 0
Bi-directional Contextual Attention for 3D Dense Captioning | - | 0
Deep Bayesian Natural Language Processing | - | 0
An encoder-decoder based framework for hindi image caption generation | - | 0
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism | - | 0
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | - | 0
BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving | - | 0
GNNFormer: A Graph-based Framework for Cytopathology Report Generation | - | 0
Cross-modal Coherence Modeling for Caption Generation | - | 0
Cross-Lingual Image Caption Generation | - | 0
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | - | 0
Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains | - | 0
A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism | - | 0
Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention | - | 0
Analysis of Convolutional Decoder for Image Caption Generation | - | 0
Controlled Caption Generation for Images Through Adversarial Attacks | - | 0
Evaluation of Automatic Video Captioning Using Direct Assessment | - | 0
3G structure for image caption generation | - | 0
Geo-Aware Image Caption Generation | - | 0
Geometry-Entangled Visual Semantic Transformer for Image Captioning | - | 0
GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | - | 0
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning | - | 0
Entity-aware Image Caption Generation | - | 0
Enhancing Image Captioning with Neural Models | - | 0
Generating captions without looking beyond objects | - | 0
Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback | - | 0
A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation | - | 0
Error Causal inference for Multi-Fusion models | - | 0
GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | - | 0
Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning | - | 0
Everything is a Video: Unifying Modalities through Next-Frame Prediction | - | 0
Examining the Effects of Language-and-Vision Data Augmentation for Generation of Descriptions of Human Faces | - | 0
Cortico-cerebellar networks as decoupled neural interfaces | - | 0
End to End Recognition System for Recognizing Offline Unconstrained Vietnamese Handwriting | - | 0
Fusion Models for Improved Visual Captioning | - | 0

No leaderboard results yet.