| OmniVec: Learning robust representations with cross modal sharing | Nov 7, 2023 | 3D Point Cloud Classification, Action Classification | —Unverified | 0 | 0 |
| OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | Sep 15, 2022 | Action Classification, Action Recognition | —Unverified | 0 | 0 |
| Enhancing Video Transformers for Action Understanding with VLM-aided Training | Mar 24, 2024 | Action Classification, Action Recognition | —Unverified | 0 | 0 |
| End-to-end Generative Pretraining for Multimodal Video Captioning | Jan 20, 2022 | Action Classification, Decoder | —Unverified | 0 | 0 |
| End-to-End Fine-Grained Action Segmentation and Recognition Using Conditional Random Field Models and Discriminative Sparse Coding | Jan 29, 2018 | Action Classification, Action Segmentation | —Unverified | 0 | 0 |
| Open Vocabulary Multi-Label Video Classification | Jul 12, 2024 | Action Classification, Classification | —Unverified | 0 | 0 |
| Egocentric Audio-Visual Noise Suppression | Nov 7, 2022 | Action Classification, Event Detection | —Unverified | 0 | 0 |
| Optimizing Average Precision using Weakly Supervised Data | Jun 1, 2014 | Action Classification, Binary Classification | —Unverified | 0 | 0 |
| OwlSight: A Robust Illumination Adaptation Framework for Dark Video Human Action Recognition | Mar 30, 2025 | Action Classification, Action Recognition | —Unverified | 0 | 0 |
| Efficient Two-Stream Motion and Appearance 3D CNNs for Video Classification | Aug 31, 2016 | 3D Architecture, Action Classification | —Unverified | 0 | 0 |