| Dual-Forecaster: A Multimodal Time Series Model Integrating Descriptive and Predictive Texts | May 2, 2025 | Descriptive, Time Series | Unverified | 0 |
| A Comprehensive Analysis of Real-World Image Captioning and Scene Identification | Aug 5, 2023 | Descriptive, Image Captioning | Unverified | 0 |
| A Thorough Review on Recent Deep Learning Methodologies for Image Captioning | Jul 28, 2021 | Caption Generation, Descriptive | Unverified | 0 |
| Dual-Level Decoupled Transformer for Video Captioning | May 6, 2022 | Descriptive, Sentence | Unverified | 0 |
| A Descriptive Study of Metaphors and Frames in the Multilingual Shared Annotation Task | Jul 1, 2022 | Descriptive | Unverified | 0 |
| DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension | Jan 1, 2025 | Descriptive, Referring Expression | Unverified | 0 |
| Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model | Apr 30, 2024 | Descriptive, Gesture Generation | Unverified | 0 |
| Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields | Oct 13, 2017 | Descriptive, Dynamic Texture Recognition | Unverified | 0 |
| DynaMiTe: A Dynamic Local Motion Model with Temporal Constraints for Robust Real-Time Feature Matching | Jul 31, 2020 | Camera Pose Estimation, Descriptive | Unverified | 0 |
| End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding | Mar 15, 2022 | Descriptive, Representation Learning | Unverified | 0 |