| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action ClassificationAction Recognition | CodeCode Available | 7 | 5 |
| VideoMamba: State Space Model for Efficient Video Understanding | Mar 11, 2024 | Action ClassificationMamba | CodeCode Available | 5 | 5 |
| mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Feb 1, 2023 | Action ClassificationImage Classification | CodeCode Available | 4 | 5 |
| InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Dec 6, 2022 | Action ClassificationAction Recognition | CodeCode Available | 4 | 5 |
| Towards Universal Soccer Video Understanding | Dec 2, 2024 | Action ClassificationSports Understanding | CodeCode Available | 3 | 5 |
| VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | Mar 23, 2022 | 4kAction Classification | CodeCode Available | 3 | 5 |
| Expanding Language-Image Pretrained Models for General Video Recognition | Aug 4, 2022 | Action ClassificationAction Recognition | CodeCode Available | 3 | 5 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | May 18, 2023 | 1 Image, 2*2 StitchiAction Classification | CodeCode Available | 3 | 5 |
| MARLIN: Masked Autoencoder for facial video Representation LearnINg | Nov 12, 2022 | Action ClassificationAttribute | CodeCode Available | 2 | 5 |
| Learning Video Representations from Large Language Models | Dec 8, 2022 | Action ClassificationAction Recognition | CodeCode Available | 2 | 5 |