| LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Oct 3, 2023 | Audio ClassificationContrastive Learning | CodeCode Available | 4 |
| Expanding Language-Image Pretrained Models for General Video Recognition | Aug 4, 2022 | Action ClassificationAction Recognition | CodeCode Available | 3 |
| Leveraging Temporal Contextualization for Video Action Recognition | Apr 15, 2024 | Action RecognitionTemporal Action Localization | CodeCode Available | 2 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Dec 31, 2022 | Action ClassificationAction Recognition | CodeCode Available | 2 |
| Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Jul 4, 2022 | Action ClassificationAction Recognition | CodeCode Available | 2 |
| Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP | Dec 13, 2024 | Action RecognitionText Augmentation | CodeCode Available | 1 |
| TDSM: Triplet Diffusion for Skeleton-Text Matching in Zero-Shot Action Recognition | Nov 16, 2024 | Action RecognitionSkeleton Based Action Recognition | CodeCode Available | 1 |
| Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition | Jun 19, 2024 | Action RecognitionSkeleton Based Action Recognition | CodeCode Available | 1 |
| EZ-CLIP: Efficient Zeroshot Video Action Recognition | Dec 13, 2023 | Action RecognitionGPU | CodeCode Available | 1 |
| OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | Nov 30, 2023 | DescriptiveLanguage Modelling | CodeCode Available | 1 |