| Evaluation of Automatic Video Captioning Using Direct Assessment | Oct 29, 2017 | Caption Generation, Machine Translation | —Unverified | 0 |
| Everything is a Video: Unifying Modalities through Next-Frame Prediction | Nov 15, 2024 | Caption Generation, Cross-Modal Retrieval | —Unverified | 0 |
| NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID | May 26, 2025 | Attribute, Caption Generation | —Unverified | 0 |
| NLPHut’s Participation at WAT2021 | Aug 1, 2021 | Caption Generation, Image Captioning | —Unverified | 0 |
| NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge | Mar 28, 2022 | Caption Generation, Object | —Unverified | 0 |
| O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning | Aug 5, 2021 | Attribute, Caption Generation | —Unverified | 0 |
| OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts | Jul 22, 2017 | Caption Generation, Descriptive | —Unverified | 0 |
| PathM3: A Multimodal Multi-Task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning | Mar 13, 2024 | Caption Generation, Diagnostic | —Unverified | 0 |
| Predicting the Mumble of Wireless Channel with Sequence-to-Sequence Models | Jan 14, 2019 | Caption Generation, Language Modeling | —Unverified | 0 |
| Relationship-based Neural Baby Talk | Mar 8, 2021 | Caption Generation, Graph Attention | —Unverified | 0 |