SOTAVerified

Caption Generation

Papers

Showing 151–200 of 310 papers

| Title | Status | Hype |
|---|---|---|
| Describing Multimedia Content using Attention-based Encoder--Decoder Networks | | 0 |
| Describing Natural Images Containing Novel Objects with Knowledge Guided Assitance | | 0 |
| Caption Generation on Scenes with Seen and Unseen Object Categories | | 0 |
| DiffCap: Exploring Continuous Diffusion on Image Captioning | | 0 |
| DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding | | 0 |
| Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space | | 0 |
| Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | | 0 |
| Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SCICAP Challenge 2023 | | 0 |
| Domain Adaptation for Neural Networks by Parameter Augmentation | | 0 |
| DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration | | 0 |
| EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits | | 0 |
| Efficient Audio Captioning Transformer with Patchout and Text Guidance | | 0 |
| E-MMAD: Multimodal Advertising Caption Generation Based on Structured Information | | 0 |
| Empirical Analysis of Image Caption Generation using Deep Learning | | 0 |
| End to End Recognition System for Recognizing Offline Unconstrained Vietnamese Handwriting | | 0 |
| Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning | | 0 |
| Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback | | 0 |
| Enhancing Image Captioning with Neural Models | | 0 |
| Entity-aware Image Caption Generation | | 0 |
| Error Causal inference for Multi-Fusion models | | 0 |
| Evaluation of Automatic Video Captioning Using Direct Assessment | | 0 |
| Everything is a Video: Unifying Modalities through Next-Frame Prediction | | 0 |
| Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection | | 0 |
| Neural Caption Generation for News Images | | 0 |
| NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID | | 0 |
| NLPHut’s Participation at WAT2021 | | 0 |
| NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge | | 0 |
| O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning | | 0 |
| OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts | | 0 |
| PathM3: A Multimodal Multi-Task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning | | 0 |
| Predicting the Mumble of Wireless Channel with Sequence-to-Sequence Models | | 0 |
| Relationship-based Neural Baby Talk | | 0 |
| REST: REtrieve & Self-Train for generative action recognition | | 0 |
| Rethinking the Form of Latent States in Image Captioning | | 0 |
| Retrieval-Augmented Multimodal Language Modeling | | 0 |
| Review Networks for Caption Generation | | 0 |
| RUC+CMU: System Report for Dense Captioning Events in Videos | | 0 |
| Scene-based Factored Attention for Image Captioning | | 0 |
| Scene Graph Generation for Better Image Captioning? | | 0 |
| Scene Understanding for Autonomous Manipulation with Deep Learning | | 0 |
| See It All: Contextualized Late Aggregation for 3D Dense Captioning | | 0 |
| Seq2Mol: Automatic design of de novo molecules conditioned by the target protein sequences through deep neural networks | | 0 |
| Sequence to Sequence - Video to Text | | 0 |
| Set Prediction Guided by Semantic Concepts for Diverse Video Captioning | | 0 |
| Simultaneous Segmentation and Recognition: Towards more accurate Ego Gesture Recognition | | 0 |
| Skip-Gram − Zipf + Uniform = Vector Additivity | | 0 |
| Social Media Ready Caption Generation for Brands | | 0 |
| Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection | | 0 |
| Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning | | 0 |

No leaderboard results yet.