| Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches | Jun 30, 2022 | Caption Generation, Video Captioning | Code Available | 1 | 5 |
| Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension | Oct 18, 2024 | Caption Generation | Code Available | 1 | 5 |
| Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Jan 2, 2023 | Caption Generation, Instance Segmentation | Code Available | 1 | 5 |
| VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | May 29, 2025 | Caption Generation, Language Modeling | Code Available | 1 | 5 |
| HCQA @ Ego4D EgoSchema Challenge 2024 | Jun 22, 2024 | Caption Generation | Code Available | 1 | 5 |
| Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | Oct 2, 2024 | Audio Classification, Caption Generation | Code Available | 1 | 5 |
| TAP: Text-Aware Pre-training for Text-VQA and Text-Caption | Dec 8, 2020 | Caption Generation, Language Modeling | Code Available | 1 | 5 |
| Transferable Decoding with Visual Entities for Zero-Shot Image Captioning | Jul 31, 2023 | Caption Generation, Hallucination | Code Available | 1 | 5 |
| RECAP: Retrieval-Augmented Audio Captioning | Sep 18, 2023 | AudioCaps, Audio captioning | Code Available | 1 | 5 |
| Deep Reinforcement Learning For Sequence to Sequence Models | May 24, 2018 | Abstractive Text Summarization, Caption Generation | Code Available | 1 | 5 |
| Bivariate Beta-LSTM | May 25, 2019 | Caption Generation, Density Estimation | Code Available | 0 | 5 |
| SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | Oct 12, 2024 | AudioCaps, Audio captioning | Code Available | 0 | 5 |
| DeepDiary: Automatic Caption Generation for Lifelogging Image Streams | Aug 12, 2016 | Caption Generation, Image Captioning | Code Available | 0 | 5 |
| Event and Entity Extraction from Generated Video Captions | Nov 5, 2022 | Caption Generation, Dense Video Captioning | Code Available | 0 | 5 |
| SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning | Jun 6, 2023 | Caption Generation, Image Captioning | Code Available | 0 | 5 |
| Scalable Bayesian Optimization Using Deep Neural Networks | Feb 19, 2015 | Bayesian Optimization, Caption Generation | Code Available | 0 | 5 |
| Sequence to Sequence -- Video to Text | May 3, 2015 | Caption Generation, Language Modeling | Code Available | 0 | 5 |
| Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization | Feb 23, 2023 | Abstractive Text Summarization, Caption Generation | Code Available | 0 | 5 |
| Referring Expression Object Segmentation with Caption-Aware Consistency | Oct 10, 2019 | Caption Generation, Object | Code Available | 0 | 5 |
| Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present | Mar 30, 2018 | Caption Generation, Decoder | Code Available | 0 | 5 |
| An Empirical Study of Language CNN for Image Captioning | Dec 21, 2016 | Caption Generation, Image Captioning | Code Available | 0 | 5 |
| Recurrent Neural Network Regularization | Sep 8, 2014 | Caption Generation, Image Captioning | Code Available | 0 | 5 |
| Cortico-cerebellar networks as decoupling neural interfaces | Oct 21, 2021 | Caption Generation | Code Available | 0 | 5 |
| R^3Net:Relation-embedded Representation Reconstruction Network for Change Captioning | Nov 1, 2021 | Caption Generation, Relation | Code Available | 0 | 5 |
| Expertized Caption Auto-Enhancement for Video-Text Retrieval | Feb 5, 2025 | Caption Generation, Retrieval | Code Available | 0 | 5 |
| Dual-path Collaborative Generation Network for Emotional Video Captioning | Aug 6, 2024 | Caption Generation, Video Captioning | Code Available | 0 | 5 |
| Pre-gen metrics: Predicting caption quality metrics without generating captions | Oct 12, 2018 | Caption Generation | Code Available | 0 | 5 |
| R^3Net:Relation-embedded Representation Reconstruction Network for Change Captioning | Oct 20, 2021 | Caption Generation, Relation | Code Available | 0 | 5 |
| Multi-source weak supervision for saliency detection | Apr 1, 2019 | Caption Generation, Saliency Detection | Code Available | 0 | 5 |
| Bangla Image Caption Generation through CNN-Transformer based Encoder-Decoder Network | Oct 24, 2021 | Caption Generation, Decoder | Code Available | 0 | 5 |
| Multimodal Preference Data Synthetic Alignment with Reward Model | Dec 23, 2024 | 2k, Caption Generation | Code Available | 0 | 5 |
| Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens | Jun 19, 2024 | Caption Generation, image-classification | Code Available | 0 | 5 |
| Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT | Dec 3, 2023 | Caption Generation, Decoder | Code Available | 0 | 5 |
| An Actor-Critic Algorithm for Sequence Prediction | Jul 24, 2016 | Caption Generation, Machine Translation | Code Available | 0 | 5 |
| Multi-LLM Collaborative Caption Generation in Scientific Documents | Jan 5, 2025 | Caption Generation, Image to text | Code Available | 0 | 5 |
| AUTOMATED AUDIO CAPTIONING BY FINE-TUNING BART WITH AUDIOSET TAGS | Nov 15, 2021 | AudioCaps, Audio captioning | Code Available | 0 | 5 |
| Compositional Generalization in Image Captioning | Sep 10, 2019 | Caption Generation, Image Captioning | Code Available | 0 | 5 |
| Memeify: A Large-Scale Meme Generation System | Oct 27, 2019 | Caption Generation, Decoder | Code Available | 0 | 5 |
| Efficient Urdu Caption Generation using Attention based LSTM | Aug 2, 2020 | Caption Generation | Code Available | 0 | 5 |
| Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images | Jul 19, 2024 | Caption Generation, Continual Learning | Code Available | 0 | 5 |
| Comparative evaluation of CNN architectures for Image Caption Generation | Feb 23, 2021 | Caption Generation, Object Recognition | Code Available | 0 | 5 |
| Local Information Assisted Attention-free Decoder for Audio Captioning | Jan 10, 2022 | Audio captioning, Caption Generation | Code Available | 0 | 5 |
| Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network | Aug 27, 2019 | Caption Generation, Decoder | Code Available | 0 | 5 |
| Evaluating and interpreting caption prediction for histopathology images | Jul 8, 2020 | Caption Generation, Image Captioning | Code Available | 0 | 5 |
| Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Aug 15, 2024 | Caption Generation, Decoder | Code Available | 0 | 5 |
| DSD: Dense-Sparse-Dense Training for Deep Neural Networks | Jul 15, 2016 | 8k, Caption Generation | Code Available | 0 | 5 |
| CNN Fixations: An unraveling approach to visualize the discriminative image regions | Aug 22, 2017 | Caption Generation, Image Captioning | Code Available | 0 | 5 |
| Journalistic Guidelines Aware News Image Captioning | Sep 7, 2021 | Caption Generation, Descriptive | Code Available | 0 | 5 |
| Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning | Jun 15, 2024 | Caption Generation | Code Available | 0 | 5 |
| CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter | Nov 30, 2021 | Caption Generation, Representation Learning | Code Available | 0 | 5 |