| Whether and When does Endoscopy Domain Pretraining Make Sense? | Mar 30, 2023 | Action Triplet Detection, Surgical phase recognition | Code Available | 1 |
| Streaming Video Model | Mar 30, 2023 | Action Recognition, Decoder | Code Available | 1 |
| TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition | Mar 28, 2023 | Action Recognition, Optical Flow Estimation | Code Available | 1 |
| Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos | Mar 22, 2023 | Representation Learning, Sentence | Code Available | 1 |
| Dual-path Adaptation from Image to Video Transformers | Mar 17, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization | Mar 16, 2023 | Action Localization, Temporal Action Localization | Code Available | 1 |
| Localizing Moments in Long Video Via Multimodal Guidance | Feb 26, 2023 | Natural Language Moment Retrieval, Natural Language Visual Grounding | Code Available | 1 |
| Test of Time: Instilling Video-Language Models with a Sense of Time | Jan 5, 2023 | Video-Text Retrieval, Video Understanding | Code Available | 1 |
| Boosting Single Image Super-Resolution via Partial Channel Shifting | Jan 1, 2023 | Diversity, Image Super-Resolution | Code Available | 1 |
| Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning | Jan 1, 2023 | Contrastive Learning, Representation Learning | Code Available | 1 |
| Towards Smooth Video Composition | Dec 14, 2022 | Image Generation, single-image-generation | Code Available | 1 |
| MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing | Nov 28, 2022 | Activity Recognition, Few Shot Action Recognition | Code Available | 1 |
| Contrastive Masked Autoencoders for Self-Supervised Video Hashing | Nov 21, 2022 | Decoder, Retrieval | Code Available | 1 |
| EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens | Nov 19, 2022 | Action Recognition, Object State Change Classification | Code Available | 1 |
| InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges | Nov 17, 2022 | Future Hand Prediction, Moment Queries | Code Available | 1 |
| VTC: Improving Video-Text Retrieval with User Comments | Oct 19, 2022 | Representation Learning, Retrieval | Code Available | 1 |
| EgoTaskQA: Understanding Human Tasks in Egocentric Videos | Oct 8, 2022 | Action Localization, counterfactual | Code Available | 1 |
| SoccerNet 2022 Challenges Results | Oct 5, 2022 | Action Spotting, Camera Calibration | Code Available | 1 |
| Learning Transferable Spatiotemporal Representations from Natural Script Knowledge | Sep 30, 2022 | Descriptive, Representation Learning | Code Available | 1 |
| Streaming Video Temporal Action Segmentation In Real Time | Sep 28, 2022 | Action Segmentation, Language Modelling | Code Available | 1 |
| Panoramic Vision Transformer for Saliency Detection in 360° Videos | Sep 19, 2022 | Saliency Detection, Saliency Prediction | Code Available | 1 |
| EchoCoTr: Estimation of the Left Ventricular Ejection Fraction from Spatiotemporal Echocardiography | Sep 9, 2022 | Video Understanding | Code Available | 1 |
| DeepSportradar-v1: Computer Vision Dataset for Sports Understanding with High Quality Annotations | Aug 17, 2022 | Camera Calibration, Instance Segmentation | Code Available | 1 |
| Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding | Jul 30, 2022 | point cloud video understanding, Video Understanding | Code Available | 1 |
| Static and Dynamic Concepts for Self-supervised Video Representation Learning | Jul 26, 2022 | Diversity, Representation Learning | Code Available | 1 |