SOTAVerified

Zero-Shot Action Recognition

Papers

Showing 1-25 of 83 papers

Title | Status | Hype
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Code | 4
Expanding Language-Image Pretrained Models for General Video Recognition | Code | 3
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Code | 2
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Code | 2
Leveraging Temporal Contextualization for Video Action Recognition | Code | 2
Learning Spatiotemporal Features via Video and Text Pair Discrimination | Code | 1
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting | Code | 1
EVA-CLIP: Improved Training Techniques for CLIP at Scale | Code | 1
Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition | Code | 1
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Code | 1
Tell me what you see: A zero-shot action recognition method based on natural language descriptions | Code | 1
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications | Code | 1
EZ-CLIP: Efficient Zeroshot Video Action Recognition | Code | 1
Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP | Code | 1
Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1
Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification | Code | 1
ActionCLIP: A New Paradigm for Video Action Recognition | Code | 1
TDSM: Triplet Diffusion for Skeleton-Text Matching in Zero-Shot Action Recognition | Code | 1
A CLIP-Hitchhiker's Guide to Long Video Retrieval | Code | 1
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge | Code | 1
Actor-agnostic Multi-label Action Recognition with Multi-modal Query | Code | 1
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | Code | 1
Elaborative Rehearsal for Zero-shot Action Recognition | Code | 1
A New Split for Evaluating True Zero-Shot Action Recognition | Code | 0
An embarrassingly simple approach to zero-shot learning | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OTI (ViT-L/14) | Top-1 Accuracy | 92.8 | - | Unverified
2 | IMP-MoE-L | Top-1 Accuracy | 91.5 | - | Unverified
3 | MOV (ViT-L/14) | Top-1 Accuracy | 87.1 | - | Unverified
4 | VideoCoCa | Top-1 Accuracy | 86.6 | - | Unverified
5 | BIKE | Top-1 Accuracy | 86.6 | - | Unverified
6 | Text4Vis | Top-1 Accuracy | 85.8 | - | Unverified
7 | TC-CLIP | Top-1 Accuracy | 85.4 | - | Unverified
8 | EVA-CLIP-E/14+ | Top-1 Accuracy | 83.1 | - | Unverified
9 | MOV (ViT-B/16) | Top-1 Accuracy | 82.6 | - | Unverified
10 | OST | Top-1 Accuracy | 79.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MOV (ViT-L/14) | Top-1 Accuracy | 64.7 | - | Unverified
2 | OTI (ViT-L/14) | Top-1 Accuracy | 64 | - | Unverified
3 | BIKE | Top-1 Accuracy | 61.4 | - | Unverified
4 | MOV (ViT-B/16) | Top-1 Accuracy | 60.8 | - | Unverified
5 | IMP-MoE-L | Top-1 Accuracy | 59.1 | - | Unverified
6 | VideoCoCa | Top-1 Accuracy | 58.7 | - | Unverified
7 | Text4Vis | Top-1 Accuracy | 58.4 | - | Unverified
8 | TC-CLIP | Top-1 Accuracy | 56 | - | Unverified
9 | OST | Top-1 Accuracy | 55.9 | - | Unverified
10 | MAXI | Top-1 Accuracy | 52.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TC-CLIP | Top-1 Accuracy | 78.1 | - | Unverified
2 | IMP-MoE-L | Top-1 Accuracy | 76.8 | - | Unverified
3 | OST | Top-1 Accuracy | 75.1 | - | Unverified
4 | MAXI | Top-1 Accuracy | 71.6 | - | Unverified
5 | OTI (ViT-L/14) | Top-1 Accuracy | 70.6 | - | Unverified
6 | VideoCoCa | Top-1 Accuracy | 70.1 | - | Unverified
7 | Text4Vis | Top-1 Accuracy | 68.9 | - | Unverified
8 | BIKE | Top-1 Accuracy | 68.5 | - | Unverified
9 | X-CLIP | Top-1 Accuracy | 65.2 | - | Unverified
10 | LanguageBind | Top-1 Accuracy | 64.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SPOT | Top-1 Accuracy | 68.7 | - | Unverified
2 | CLASTER | Top-1 Accuracy | 68.4 | - | Unverified
3 | ER-ZSAR | Top-1 Accuracy | 60.2 | - | Unverified
4 | ZSECOC | Top-1 Accuracy | 59.8 | - | Unverified
5 | TS-GCN | Top-1 Accuracy | 56.5 | - | Unverified
6 | SJE (Attribute) | Top-1 Accuracy | 47.5 | - | Unverified
7 | MTE | Top-1 Accuracy | 44.3 | - | Unverified
8 | ESZSL | Top-1 Accuracy | 39.6 | - | Unverified
9 | SJE (Word Embedding) | Top-1 Accuracy | 28.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | BIKE | Top-1 Accuracy | 86.2 | - | Unverified
2 | Text4Vis | Top-1 Accuracy | 84.6 | - | Unverified
3 | LoCATe-GAT | Top-1 Accuracy | 73.8 | - | Unverified
4 | ResT | Top-1 Accuracy | 32.5 | - | Unverified
5 | E2E | Top-1 Accuracy | 26.6 | - | Unverified
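
Most of the result tables above report Top-1 Accuracy. This is the percentage of test videos whose highest-scoring class is the ground-truth class. A common embedding-based zero-shot setup produces those scores by comparing a video embedding with text embeddings of the unseen class names. The Python sketch below is a generic illustration under that assumption. It is not the method of any listed model. The function names and the toy two-dimensional embeddings are hypothetical, and the learned video and text encoders that real systems use are not shown.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def zero_shot_predict(video_emb: Sequence[float],
                      class_text_embs: List[Sequence[float]]) -> int:
    """Index of the class-name embedding most similar to the video.

    Ties go to the lowest index, because max() returns the first maximum.
    """
    scores = [cosine(video_emb, t) for t in class_text_embs]
    return max(range(len(scores)), key=lambda i: scores[i])


def top1_accuracy(video_embs: List[Sequence[float]],
                  labels: List[int],
                  class_text_embs: List[Sequence[float]]) -> float:
    """Percentage of videos whose top-scoring class equals the label."""
    correct = sum(
        zero_shot_predict(v, class_text_embs) == y
        for v, y in zip(video_embs, labels)
    )
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Toy, hypothetical embeddings: two unseen classes, three test videos.
    class_text_embs = [[1.0, 0.0], [0.0, 1.0]]
    video_embs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    labels = [0, 1, 1]
    print(top1_accuracy(video_embs, labels, class_text_embs))
```

In this toy run the third video lies closer to class 0 than to its labelled class 1. Two of the three predictions are therefore correct, and the script prints roughly 66.7.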

# | Model | Metric | Claimed | Verified | Status
1 | MSQNet | mAP | 35.59 | - | Unverified
2 | VideoCoCa | mAP | 25.8 | - | Unverified
3 | MAXI | mAP | 23.8 | - | Unverified
4 | CLIP-Hitchhiker (ViT-B/16, 32 frames) | mAP | 21.1 | - | Unverified
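
The multi-label results above are reported as mAP (mean average precision). The metric works in three steps:

1. For each class, rank all videos by that class's score.
2. Average the precision measured at the rank of each positive video.
3. Average the resulting per-class values.

The sketch below assumes one common non-interpolated definition. It skips classes with no positives and breaks score ties by input order. Individual benchmarks may ship their own evaluation scripts, so its output is not guaranteed to reproduce the claimed values. All names and data are hypothetical.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], targets: Sequence[int]) -> float:
    """Non-interpolated AP for one class (assumed definition)."""
    # sorted() is stable even with reverse=True, so tied scores keep input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if targets[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(score_matrix: List[Sequence[float]],
                           target_matrix: List[Sequence[int]]) -> float:
    """mAP in percent over classes that have at least one positive video."""
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        col_targets = [row[c] for row in target_matrix]
        if not any(col_targets):
            continue  # AP is undefined without positives, so skip the class
        col_scores = [row[c] for row in score_matrix]
        aps.append(average_precision(col_scores, col_targets))
    if not aps:
        raise ValueError("no class has a positive example")
    return 100.0 * sum(aps) / len(aps)


if __name__ == "__main__":
    # Hypothetical scores for 3 videos x 2 classes, with multi-hot labels.
    scores = [[0.9, 0.1], [0.8, 0.7], [0.3, 0.6]]
    targets = [[1, 0], [0, 1], [1, 1]]
    print(mean_average_precision(scores, targets))
```

Here class 0 has AP = (1 + 2/3) / 2 = 5/6 and class 1 has AP = 1. The script therefore prints an mAP of about 91.67.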

# | Model | Metric | Claimed | Verified | Status
1 | MSQNet | Accuracy | 75.33 | - | Unverified