| LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Oct 3, 2023 | Audio ClassificationContrastive Learning | CodeCode Available | 4 |
| Expanding Language-Image Pretrained Models for General Video Recognition | Aug 4, 2022 | Action ClassificationAction Recognition | CodeCode Available | 3 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Dec 31, 2022 | Action ClassificationAction Recognition | CodeCode Available | 2 |
| Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Jul 4, 2022 | Action ClassificationAction Recognition | CodeCode Available | 2 |
| Leveraging Temporal Contextualization for Video Action Recognition | Apr 15, 2024 | Action RecognitionTemporal Action Localization | CodeCode Available | 2 |
| EVA-CLIP: Improved Training Techniques for CLIP at Scale | Mar 27, 2023 | Image ClassificationRepresentation Learning | CodeCode Available | 1 |
| Elaborative Rehearsal for Zero-shot Action Recognition | Aug 5, 2021 | Action RecognitionFew-Shot Learning | CodeCode Available | 1 |
| A CLIP-Hitchhiker's Guide to Long Video Retrieval | May 17, 2022 | RetrievalVideo Retrieval | CodeCode Available | 1 |
| Bridging Video-text Retrieval with Multiple Choice Questions | Jan 13, 2022 | Action RecognitionLinear evaluation | CodeCode Available | 1 |
| ActionCLIP: A New Paradigm for Video Action Recognition | Sep 17, 2021 | Action ClassificationAction Recognition | CodeCode Available | 1 |