SOTAVerified

Caption Generation

Papers

Showing 101–150 of 310 papers

Title | Status | Hype
Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present | Code | 0
R^3Net: Relation-embedded Representation Reconstruction Network for Change Captioning | Code | 0
Discriminability objective for training descriptive captions | Code | 0
Pre-gen metrics: Predicting caption quality metrics without generating captions | Code | 0
R^3Net: Relation-embedded Representation Reconstruction Network for Change Captioning | Code | 0
Multi-source weak supervision for saliency detection | Code | 0
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning | Code | 0
FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback | Code | 0
Multimodal Preference Data Synthetic Alignment with Reward Model | Code | 0
Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Code | 0
Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning | Code | 0
NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models | Code | 0
Local Information Assisted Attention-free Decoder for Audio Captioning | Code | 0
Guiding Long-Short Term Memory for Image Caption Generation | Code | 0
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation | Code | 0
3D CoCa: Contrastive Learners are 3D Captioners | Code | 0
Journalistic Guidelines Aware News Image Captioning | Code | 0
Memeify: A Large-Scale Meme Generation System | Code | 0
Event and Entity Extraction from Generated Video Captions | Code | 0
Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models | | 0
Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models | | 0
Geometry-Entangled Visual Semantic Transformer for Image Captioning | | 0
Geo-Aware Image Caption Generation | | 0
Generating Video Description using Sequence-to-sequence Model with Temporal Attention | | 0
GNNFormer: A Graph-based Framework for Cytopathology Report Generation | | 0
GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | | 0
Generating image captions with external encyclopedic knowledge | | 0
Deep Learning Approaches on Image Captioning: A Review | | 0
VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools | | 0
End-to-End Video Captioning | | 0
Generating Image Captions in Arabic using Root-Word Based Recurrent Neural Networks and Deep Neural Networks | | 0
Generating captions without looking beyond objects | | 0
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning | | 0
GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | | 0
Deep Bayesian Natural Language Processing | | 0
Bi-directional Contextual Attention for 3D Dense Captioning | | 0
Fusion Models for Improved Visual Captioning | | 0
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism | | 0
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | | 0
BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving | | 0
An encoder-decoder based framework for hindi image caption generation | | 0
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | | 0
Fine-Grained Video Captioning through Scene Graph Consolidation | | 0
Cross-modal Coherence Modeling for Caption Generation | | 0
FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning | | 0
Cross-Lingual Image Caption Generation | | 0
Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation | | 0
Feature Fusion Effects of Tensor Product Representation on (De)Compositional Network for Caption Generation for Images | | 0
Fast Image Caption Generation with Position Alignment | | 0
Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech | | 0

No leaderboard results yet.