| Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Mar 11, 2024 | 2D Human Pose Estimation, Action Recognition | Unverified | 0 |
| OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog | Feb 20, 2024 | Object, Object Tracking | Unverified | 0 |
| Semi-supervised Active Learning for Video Action Detection | Dec 12, 2023 | Action Detection, Active Learning | Code Available | 0 |
| Deep-Learning-Assisted Analysis of Cataract Surgery Videos | Dec 10, 2023 | Decision Making, Deep Learning | Unverified | 0 |
| TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Dec 4, 2023 | Dense Captioning, Highlight Detection | Code Available | 2 |
| Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer -- Current Trends and Research Perspectives | Sep 21, 2023 | Action Localization, Action Recognition | Unverified | 0 |
| Cross-Video Contextual Knowledge Exploration and Exploitation for Ambiguity Reduction in Weakly Supervised Temporal Action Localization | Aug 24, 2023 | Action Localization, Contrastive Learning | Unverified | 0 |
| UnLoc: A Unified Framework for Video Localization Tasks | Aug 21, 2023 | Action Segmentation, Moment Retrieval | Code Available | 0 |
| VideoGLUE: Video General Understanding Evaluation of Foundation Models | Jul 6, 2023 | Action Recognition, Temporal Localization | Code Available | 0 |
| Dense Video Object Captioning from Disjoint Supervision | Jun 20, 2023 | Object, Sentence | Code Available | 0 |