| Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting | Apr 6, 2023 | Action Recognition, Prompt Learning | Code Available | 1 |
| Synthetic Sample Selection for Generalized Zero-Shot Learning | Apr 6, 2023 | feature selection, Generalized Zero-Shot Learning | Unverified | 0 |
| VicTR: Video-conditioned Text Representations for Activity Recognition | Apr 5, 2023 | Action Classification, Activity Recognition | Unverified | 0 |
| EVA-CLIP: Improved Training Techniques for CLIP at Scale | Mar 27, 2023 | Image Classification, Representation Learning | Code Available | 1 |
| MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge | Mar 15, 2023 | Action Recognition, Few-Shot action recognition | Code Available | 1 |
| Improving Zero-Shot Action Recognition using Human Instruction with Text Description | Jan 21, 2023 | Action Recognition, Sentence | Unverified | 0 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Dec 31, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | Dec 9, 2022 | Question Answering, Retrieval | Unverified | 0 |
| REST: REtrieve & Self-Train for generative action recognition | Sep 29, 2022 | Action Recognition, Caption Generation | Unverified | 0 |
| Global Semantic Descriptors for Zero-Shot Action Recognition | Sep 24, 2022 | Action Classification, Action Recognition | Code Available | 0 |
| Expanding Language-Image Pretrained Models for General Video Recognition | Aug 4, 2022 | Action Classification, Action Recognition | Code Available | 3 |
| Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models | Jul 15, 2022 | Optical Flow Estimation, Video Classification | Unverified | 0 |
| Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Jul 4, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| Learning Using Privileged Information for Zero-Shot Action Recognition | Jun 17, 2022 | Action Recognition, Hallucination | Unverified | 0 |
| A CLIP-Hitchhiker's Guide to Long Video Retrieval | May 17, 2022 | Retrieval, Video Retrieval | Code Available | 1 |
| Cross-modal Representation Learning for Zero-shot Action Recognition | May 3, 2022 | Action Recognition, Representation Learning | Unverified | 0 |
| MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Apr 26, 2022 | Action Recognition, Retrieval | Code Available | 1 |
| Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification | Mar 29, 2022 | Representation Learning, Video Classification | Code Available | 1 |
| Rethinking Zero-shot Action Recognition: Learning from Latent Atomic Actions | Mar 28, 2022 | Action Recognition, Zero-Shot Action Recognition | Code Available | 0 |
| FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks | Mar 24, 2022 | Action Recognition, Retrieval | Code Available | 0 |
| End-to-End Semantic Video Transformer for Zero-Shot Action Recognition | Mar 10, 2022 | Action Recognition, Temporal Action Localization | Code Available | 0 |
| Universal Prototype Transport for Zero-Shot Action Recognition and Localization | Mar 8, 2022 | Action Recognition, Object | Unverified | 0 |
| Bridging Video-text Retrieval with Multiple Choice Questions | Jan 13, 2022 | Action Recognition, Linear evaluation | Code Available | 1 |
| Tell me what you see: A zero-shot action recognition method based on natural language descriptions | Dec 18, 2021 | Action Recognition, Descriptive | Code Available | 1 |
| Reformulating Zero-shot Action Recognition for Multi-label Actions | Dec 1, 2021 | Action Classification, Action Detection | Unverified | 0 |