| LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Oct 3, 2023 | Audio Classification, Contrastive Learning | Code Available | 4 | 5 |
| Expanding Language-Image Pretrained Models for General Video Recognition | Aug 4, 2022 | Action Classification, Action Recognition | Code Available | 3 | 5 |
| Leveraging Temporal Contextualization for Video Action Recognition | Apr 15, 2024 | Action Recognition, Temporal Action Localization | Code Available | 2 | 5 |
| Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Jul 4, 2022 | Action Classification, Action Recognition | Code Available | 2 | 5 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Dec 31, 2022 | Action Classification, Action Recognition | Code Available | 2 | 5 |
| Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications | Mar 3, 2020 | Benchmarking, General Classification | Code Available | 1 | 5 |
| Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification | Mar 29, 2022 | Representation Learning, Video Classification | Code Available | 1 | 5 |
| Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition | Jun 19, 2024 | Action Recognition, Skeleton Based Action Recognition | Code Available | 1 | 5 |
| OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | Nov 30, 2023 | Descriptive, Language Modelling | Code Available | 1 | 5 |
| Actor-agnostic Multi-label Action Recognition with Multi-modal Query | Jul 20, 2023 | Action Classification, Action Recognition | Code Available | 1 | 5 |
| Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting | Apr 6, 2023 | Action Recognition, Prompt Learning | Code Available | 1 | 5 |
| MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Apr 26, 2022 | Action Recognition, Retrieval | Code Available | 1 | 5 |
| Bridging Video-text Retrieval with Multiple Choice Questions | Jan 13, 2022 | Action Recognition, Linear evaluation | Code Available | 1 | 5 |
| Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP | Dec 13, 2024 | Action Recognition, Text Augmentation | Code Available | 1 | 5 |
| MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge | Mar 15, 2023 | Action Recognition, Few-Shot action recognition | Code Available | 1 | 5 |
| Learning Spatiotemporal Features via Video and Text Pair Discrimination | Jan 16, 2020 | Action Classification, Action Recognition | Code Available | 1 | 5 |
| A CLIP-Hitchhiker's Guide to Long Video Retrieval | May 17, 2022 | Retrieval, Video Retrieval | Code Available | 1 | 5 |
| Elaborative Rehearsal for Zero-shot Action Recognition | Aug 5, 2021 | Action Recognition, Few-Shot Learning | Code Available | 1 | 5 |
| EZ-CLIP: Efficient Zeroshot Video Action Recognition | Dec 13, 2023 | Action Recognition, GPU | Code Available | 1 | 5 |
| EVA-CLIP: Improved Training Techniques for CLIP at Scale | Mar 27, 2023 | Image Classification, Representation Learning | Code Available | 1 | 5 |
| ActionCLIP: A New Paradigm for Video Action Recognition | Sep 17, 2021 | Action Classification, Action Recognition | Code Available | 1 | 5 |
| Tell me what you see: A zero-shot action recognition method based on natural language descriptions | Dec 18, 2021 | Action Recognition, Descriptive | Code Available | 1 | 5 |
| TDSM: Triplet Diffusion for Skeleton-Text Matching in Zero-Shot Action Recognition | Nov 16, 2024 | Action Recognition, Skeleton Based Action Recognition | Code Available | 1 | 5 |
| Cross-Modal and Hierarchical Modeling of Video and Text | Oct 16, 2018 | Action Recognition, Retrieval | Code Available | 0 | 5 |
| An embarrassingly simple approach to zero-shot learning | Jul 6, 2015 | Domain Adaptation, Zero-Shot Action Recognition | Code Available | 0 | 5 |
| A New Split for Evaluating True Zero-Shot Action Recognition | Jul 27, 2021 | Action Recognition, Few-Shot action recognition | Code Available | 0 | 5 |
| Evaluation of Output Embeddings for Fine-Grained Image Classification | Sep 30, 2014 | Classification, Few-Shot Image Classification | Code Available | 0 | 5 |
| FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks | Mar 24, 2022 | Action Recognition, Retrieval | Code Available | 0 | 5 |
| Global Semantic Descriptors for Zero-Shot Action Recognition | Sep 24, 2022 | Action Classification, Action Recognition | Code Available | 0 | 5 |
| I Know the Relationships: Zero-Shot Action Recognition via Two-Stream Graph Convolutional Networks and Knowledge Graphs | Jul 17, 2019 | Action Recognition, Attribute | Code Available | 0 | 5 |
| InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | Jul 13, 2023 | Action Recognition, Contrastive Learning | Code Available | 0 | 5 |
| Label-Embedding for Image Classification | Mar 30, 2015 | Attribute, Classification | Code Available | 0 | 5 |
| Learning a Deep Embedding Model for Zero-Shot Learning | Nov 15, 2016 | Image Captioning, Sentence | Code Available | 0 | 5 |
| LoCATe-GAT: Modeling Multi-Scale Local Context and Action Relationships for Zero-Shot Action Recognition | Nov 27, 2024 | Action Recognition, Graph Attention | Code Available | 0 | 5 |
| Orthogonal Temporal Interpolation for Zero-Shot Video Recognition | Aug 14, 2023 | Video Recognition, Zero-Shot Action Recognition | Code Available | 0 | 5 |
| Out-of-Distribution Detection for Generalized Zero-Shot Action Recognition | Apr 18, 2019 | Action Recognition, Action Recognition In Videos | Code Available | 0 | 5 |
| Rethinking Zero-shot Action Recognition: Learning from Latent Atomic Actions | Mar 28, 2022 | Action Recognition, Zero-Shot Action Recognition | Code Available | 0 | 5 |
| Telling Stories for Common Sense Zero-Shot Action Recognition | Sep 29, 2023 | Action Recognition, Articles | Code Available | 0 | 5 |
| Zero-Shot Action Recognition from Diverse Object-Scene Compositions | Oct 26, 2021 | Action Recognition, Object | Code Available | 0 | 5 |
| End-to-End Semantic Video Transformer for Zero-Shot Action Recognition | Mar 10, 2022 | Action Recognition, Temporal Action Localization | Code Available | 0 | 5 |
| Learning Using Privileged Information for Zero-Shot Action Recognition | Jun 17, 2022 | Action Recognition, Hallucination | —Unverified | 0 | 0 |
| CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition | Jan 18, 2021 | Action Recognition, Clustering | —Unverified | 0 | 0 |
| A Cross-Dataset Study for Text-based 3D Human Motion Retrieval | May 27, 2024 | Action Recognition, Retrieval | —Unverified | 0 | 0 |
| Can masking background and object reduce static bias for zero-shot action recognition? | Jan 22, 2025 | Action Recognition, Zero-Shot Action Recognition | —Unverified | 0 | 0 |
| An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition | Jun 2, 2024 | Action Recognition, Ensemble Learning | —Unverified | 0 | 0 |
| Alternative Semantic Representations for Zero-Shot Human Action Recognition | Jun 28, 2017 | Action Recognition, Temporal Action Localization | —Unverified | 0 | 0 |
| Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models | Jul 15, 2022 | Optical Flow Estimation, Video Classification | —Unverified | 0 | 0 |
| Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation | Nov 26, 2016 | Action Recognition, Data Augmentation | —Unverified | 0 | 0 |
| Natural Language Descriptions for Human Activities in Video Streams | Sep 1, 2017 | Action Recognition, Language Modeling | —Unverified | 0 | 0 |
| Objects2action: Classifying and localizing actions without any video example | Oct 23, 2015 | Attribute, Object | —Unverified | 0 | 0 |