| VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | Mar 23, 2022 | 4k, Action Classification | Code Available | 3 | 5 |
| VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | Mar 29, 2023 | Action Classification, Action Recognition | Code Available | 2 | 5 |
| Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning | Oct 29, 2020 | Contrastive Learning, Data Augmentation | Code Available | 1 | 5 |
| Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | Jun 18, 2021 | Action Recognition, Action Recognition In Videos | Code Available | 1 | 5 |
| Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework | Aug 6, 2020 | Action Recognition In Videos, Contrastive Learning | Code Available | 1 | 5 |
| Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Recognition | Dec 7, 2021 | Action Recognition, Contrastive Learning | Code Available | 1 | 5 |
| Contrastive Multiview Coding | Jun 13, 2019 | Contrastive Learning, Self-Supervised Action Recognition | Code Available | 1 | 5 |
| EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens | Nov 19, 2022 | Action Recognition, Object State Change Classification | Code Available | 1 | 5 |
| Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning | Jan 2, 2020 | Action Recognition, Representation Learning | Code Available | 1 | 5 |
| Temporally Coherent Embeddings for Self-Supervised Video Representation Learning | Mar 21, 2020 | Action Recognition, Metric Learning | Code Available | 1 | 5 |
| TCLR: Temporal Contrastive Learning for Video Representation | Jan 20, 2021 | Action Classification, Action Recognition | Code Available | 1 | 5 |
| XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning | Nov 25, 2022 | Action Classification, Classification | Code Available | 1 | 5 |
| SpeedNet: Learning the Speediness in Videos | Apr 13, 2020 | Action Recognition, Binary Classification | Code Available | 1 | 5 |
| Learning the Predictability of the Future | Jun 19, 2021 | Representation Learning, Self-Supervised Action Recognition | Code Available | 1 | 5 |
| Masked Motion Encoding for Self-Supervised Video Representation Learning | Oct 12, 2022 | MME, Optical Flow Estimation | Code Available | 1 | 5 |
| Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning | Dec 8, 2022 | Action Classification, Action Recognition | Code Available | 1 | 5 |
| Part Aware Contrastive Learning for Self-Supervised Action Recognition | May 1, 2023 | Action Recognition, Contrastive Learning | Code Available | 1 | 5 |
| RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning | Oct 27, 2020 | Action Recognition, Representation Learning | Code Available | 1 | 5 |
| Self-supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences | Feb 17, 2023 | Action Recognition, Contrastive Learning | Code Available | 1 | 5 |
| Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity | Nov 9, 2021 | Audio Classification, Retrieval | Code Available | 1 | 5 |
| Self-supervised Co-training for Video Representation Learning | Oct 19, 2020 | Action Recognition, Contrastive Learning | Code Available | 1 | 5 |
| Audio-Visual Instance Discrimination with Cross-Modal Agreement | Apr 27, 2020 | Action Recognition, Audio Classification | Code Available | 1 | 5 |
| Spatiotemporal Contrastive Video Representation Learning | Aug 9, 2020 | Action Recognition, Contrastive Learning | Code Available | 1 | 5 |
| Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning | Jun 1, 2020 | Action Recognition, Decoder | Code Available | 1 | 5 |
| Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics | Apr 7, 2019 | Action Recognition, General Classification | Code Available | 1 | 5 |
| SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos | Jun 25, 2022 | Action Classification, Clustering | Code Available | 1 | 5 |
| Broaden Your Views for Self-Supervised Video Learning | Mar 30, 2021 | Audio Classification, Optical Flow Estimation | Code Available | 1 | 5 |
| Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning | Dec 21, 2022 | Contrastive Learning, Linear evaluation | Code Available | 1 | 5 |
| Unsupervised Representation Learning by Sorting Sequences | Aug 3, 2017 | Action Recognition, image-classification | Code Available | 0 | 5 |
| Cross-Model Cross-Stream Learning for Self-Supervised Human Action Recognition | Jul 15, 2023 | Action Recognition, Contrastive Learning | Code Available | 0 | 5 |
| Self-Supervised Learning by Cross-Modal Audio-Video Clustering | Nov 28, 2019 | Action Recognition, Audio Classification | Code Available | 0 | 5 |
| Self-Supervised MultiModal Versatile Networks | Jun 29, 2020 | Action Recognition In Videos, Audio Classification | Code Available | 0 | 5 |
| Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video | Mar 5, 2020 | Action Recognition, Representation Learning | Code Available | 0 | 5 |
| A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning | Apr 29, 2021 | Representation Learning, Self-Supervised Action Recognition | Code Available | 0 | 5 |
| Video Representation Learning by Dense Predictive Coding | Sep 10, 2019 | Action Recognition, Representation Learning | Code Available | 0 | 5 |
| Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles | Nov 24, 2018 | Action Recognition, Colorization | Unverified | 0 | 0 |
| Shuffle and Learn: Unsupervised Learning using Temporal Order Verification | Mar 28, 2016 | Action Recognition, Pose Estimation | Unverified | 0 | 0 |
| Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction | Jun 1, 2019 | Action Recognition, Retrieval | Unverified | 0 | 0 |
| Skip-Clip: Self-Supervised Spatiotemporal Representation Learning by Future Clip Order Ranking | Oct 28, 2019 | Action Recognition, Future prediction | Unverified | 0 | 0 |
| Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction | Nov 28, 2018 | Action Recognition, Prediction | Unverified | 0 | 0 |
| Self-Supervised Learning via multi-Transformation Classification for Action Recognition | Feb 20, 2021 | Action Recognition, Classification | Unverified | 0 | 0 |
| Learning and Using the Arrow of Time | Jun 1, 2018 | Action Recognition, Self-Supervised Action Recognition | Unverified | 0 | 0 |
| Human-Centered Prior-Guided and Task-Dependent Multi-Task Representation Learning for Action Recognition Pre-Training | Apr 27, 2022 | Action Recognition, Contrastive Learning | Unverified | 0 | 0 |
| Generating Videos with Scene Dynamics | Sep 8, 2016 | Action Classification, Future prediction | Unverified | 0 | 0 |
| Self-supervised Contrastive Learning for Audio-Visual Action Recognition | Apr 28, 2022 | Action Recognition, Contrastive Learning | Unverified | 0 | 0 |
| Feature Hallucination for Self-supervised Action Recognition | Jun 25, 2025 | Action Recognition, Hallucination | Unverified | 0 | 0 |
| Evolving Losses for Unsupervised Video Representation Learning | Feb 26, 2020 | Action Recognition, Few-Shot Learning | Unverified | 0 | 0 |
| Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization | Jun 30, 2018 | Action Recognition, Audio Classification | Unverified | 0 | 0 |
| A Large-Scale Analysis on Self-Supervised Video Representation Learning | Jun 9, 2023 | Benchmarking, Representation Learning | Unverified | 0 | 0 |
| Self-Supervised Video Representation Learning With Odd-One-Out Networks | Nov 21, 2016 | Action Classification, Action Recognition | Unverified | 0 | 0 |