SOTAVerified

Caption Generation

Papers

Showing 101–150 of 310 papers

Title | Status | Hype
Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present | Code | 0
Twin Networks: Matching the Future for Sequence Generation | Code | 0
R^3Net: Relation-embedded Representation Reconstruction Network for Change Captioning | Code | 0
Discriminability objective for training descriptive captions | Code | 0
Pre-gen metrics: Predicting caption quality metrics without generating captions | Code | 0
Rˆ3Net: Relation-embedded Representation Reconstruction Network for Change Captioning | Code | 0
Multi-source weak supervision for saliency detection | Code | 0
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning | Code | 0
Multimodal Preference Data Synthetic Alignment with Reward Model | Code | 0
Recurrent Neural Network Regularization | Code | 0
Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Code | 0
Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning | Code | 0
NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models | Code | 0
Local Information Assisted Attention-free Decoder for Audio Captioning | Code | 0
Guiding Long-Short Term Memory for Image Caption Generation | Code | 0
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation | Code | 0
3D CoCa: Contrastive Learners are 3D Captioners | Code | 0
Journalistic Guidelines Aware News Image Captioning | Code | 0
Memeify: A Large-Scale Meme Generation System | Code | 0
Event and Entity Extraction from Generated Video Captions | Code | 0
GNNFormer: A Graph-based Framework for Cytopathology Report Generation | - | 0
Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models | - | 0
Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models | - | 0
Geometry-Entangled Visual Semantic Transformer for Image Captioning | - | 0
Geo-Aware Image Caption Generation | - | 0
GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | - | 0
Generating Video Description using Sequence-to-sequence Model with Temporal Attention | - | 0
Generating image captions with external encyclopedic knowledge | - | 0
Deep Learning Approaches on Image Captioning: A Review | - | 0
VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools | - | 0
End-to-End Video Captioning | - | 0
Generating Image Captions in Arabic using Root-Word Based Recurrent Neural Networks and Deep Neural Networks | - | 0
Generating captions without looking beyond objects | - | 0
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning | - | 0
GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | - | 0
Deep Bayesian Natural Language Processing | - | 0
Bi-directional Contextual Attention for 3D Dense Captioning | - | 0
Fusion Models for Improved Visual Captioning | - | 0
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism | - | 0
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | - | 0
BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving | - | 0
An encoder-decoder based framework for Hindi image caption generation | - | 0
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | - | 0
Fine-Grained Video Captioning through Scene Graph Consolidation | - | 0
Cross-modal Coherence Modeling for Caption Generation | - | 0
FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning | - | 0
Cross-Lingual Image Caption Generation | - | 0
Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation | - | 0
Feature Fusion Effects of Tensor Product Representation on (De)Compositional Network for Caption Generation for Images | - | 0
Fast Image Caption Generation with Position Alignment | - | 0

No leaderboard results yet.