Action Recognition in Videos
Action recognition in videos is a computer vision and pattern recognition task whose goal is to identify and categorize the human actions performed in a video sequence. It involves analyzing the spatiotemporal dynamics of each action and mapping it to one of a predefined set of action classes, such as running, jumping, or swimming.
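To make the "map a clip to a predefined class" step concrete, here is a minimal sketch. It assumes some per-frame model (not shown, and not any of the models listed on this page) has already produced one score per class for each frame. The label list, the `classify_clip` helper, and the scores are all hypothetical. Averaging over frames is the simplest possible temporal aggregation and ignores frame order, which real architectures generally do not.

```python
from typing import Sequence

# Hypothetical label set; each real benchmark defines its own classes.
ACTION_CLASSES = ["running", "jumping", "swimming"]


def classify_clip(frame_scores: Sequence[Sequence[float]]) -> str:
    """Average per-frame class scores over time and return the top class.

    frame_scores[t][c] is an assumed score for class c at frame t.
    """
    if not frame_scores:
        raise ValueError("clip has no frames")
    num_classes = len(ACTION_CLASSES)
    totals = [0.0] * num_classes
    for scores in frame_scores:
        if len(scores) != num_classes:
            raise ValueError("each frame needs one score per class")
        for c, score in enumerate(scores):
            totals[c] += score
    means = [total / len(frame_scores) for total in totals]
    best = max(range(num_classes), key=lambda c: means[c])
    return ACTION_CLASSES[best]


# Made-up scores for a three-frame clip.
clip = [
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
    [0.6, 0.3, 0.1],
]
prediction = classify_clip(clip)
```

The last frame favors "running", but the temporal average still favors "jumping", which illustrates why the decision is made per clip and not per frame.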
Datasets: Jester (Gesture Recognition), PKU-MMD, UCF101, Something-Something V2, Kinetics 400, AVA v2.2, FS-Something-Something V2-Full, FS-Something-Something V2-Small, Sports-1M, THUMOS14, ActivityNet, AVA v2.1
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CPNet Res34, 5 CP | Val | 96.7 | — | Unverified |
| 2 | STM (ResNet-50, 16 frames) | Val | 96.7 | — | Unverified |
| 3 | MFNet | Val | 96.68 | — | Unverified |
| 4 | DIN | Val | 95.31 | — | Unverified |
| 5 | MultiScale TRN | Val | 95.31 | — | Unverified |
| 6 | convSTAR | Val | 92.7 | — | Unverified |
| 7 | 3D-SqueezeNet | Val | 90.77 | — | Unverified |
| 8 | 3D-ShuffleNetV2 0.25x | Val | 86.91 | — | Unverified |
| 9 | 3D-MobileNetV2 0.2x | Val | 86.43 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DSCNet (RGB + Pose) | X-Sub | 97.4 | — | Unverified |
| 2 | MMNet | X-Sub | 97.4 | — | Unverified |
| 3 | EPAM-Net | X-Sub | 96.2 | — | Unverified |
| 4 | DVANet (RGB only) | X-Sub | 95.8 | — | Unverified |
| 5 | TSMF | X-Sub | 95.8 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STM (ImageNet+Kinetics pretrain) | 3-fold Accuracy | 96.2 | — | Unverified |
| 2 | 3D-SqueezeNet | 3-fold Accuracy | 74.94 | — | Unverified |
| 3 | 3D-ShuffleNetV2 0.25x | 3-fold Accuracy | 56.52 | — | Unverified |
| 4 | 3D-MobileNetV2 0.2x | 3-fold Accuracy | 55.56 | — | Unverified |
| 5 | Baseline UCF101 | 3-fold Accuracy | 43.9 | — | Unverified |

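As a rough illustration of the "3-fold Accuracy" metric in the table above, the sketch below assumes it denotes the plain mean of the accuracies measured on three separate train/test splits. The page does not state this definition, and the `three_fold_accuracy` helper and its inputs are hypothetical.

```python
from statistics import mean
from typing import Sequence


def three_fold_accuracy(split_accuracies: Sequence[float]) -> float:
    """Return the mean of exactly three per-split accuracies (in percent)."""
    if len(split_accuracies) != 3:
        raise ValueError("expected one accuracy per split, three in total")
    return mean(split_accuracies)


# Made-up per-split accuracies, not taken from any row in the table.
example = three_fold_accuracy([80.0, 82.0, 84.0])
```

Reporting the mean over splits reduces the influence of any single favorable or unfavorable partition of the data.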
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STM (16 frames, ImageNet pretraining) | Top-1 Accuracy | 64.2 | — | Unverified |
| 2 | CPNet Res34, 5 CP | Top-1 Accuracy | 57.65 | — | Unverified |
| 3 | 2-Stream TRN | Top-1 Accuracy | 55.52 | — | Unverified |
| 4 | DIN | Top-1 Accuracy | 34.11 | — | Unverified |
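For readers unfamiliar with the metric, Top-1 accuracy is conventionally the percentage of clips whose highest-scoring class equals the ground-truth label. The following is a minimal sketch of that computation. The `top1_accuracy` helper and the scores are illustrative only and are not tied to any model in the table.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Percentage of clips whose highest-scoring class equals the label."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("need one label per clip and at least one clip")
    correct = 0
    for clip_scores, label in zip(scores, labels):
        predicted = max(range(len(clip_scores)), key=lambda c: clip_scores[c])
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Made-up scores for four clips over three classes.
scores = [
    [0.1, 0.8, 0.1],  # predicts class 1
    [0.7, 0.2, 0.1],  # predicts class 0
    [0.2, 0.3, 0.5],  # predicts class 2
    [0.4, 0.5, 0.1],  # predicts class 1
]
labels = [1, 0, 2, 0]
accuracy = top1_accuracy(scores, labels)
```

Three of the four made-up clips are classified correctly, so `accuracy` comes out as 75.0.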

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Florence | Top-1 Accuracy | 86.5 | — | Unverified |
| 2 | ActionCLIP (ViT-B/16) | Top-1 Accuracy | 83.8 | — | Unverified |
| 3 | Frozen Backbone, SwinV2-G-ext22K (Video-Swin) | Top-1 Accuracy | 81.7 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | YOWO+LFB* | mAP (Val) | 20.2 | — | Unverified |
| 2 | VideoMAE V2 | mAP (Val) | 18.24 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | G-Blend | Video hit@1 | 74.8 | — | Unverified |
| 2 | LSTM + Pretrained on YT-8M | Video hit@1 | 65.7 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Single-stream R-C3D (two-way buffer) | mAP@0.1 | 54.5 | — | Unverified |
| 2 | Single-stream R-C3D (one-way buffer) | mAP@0.1 | 51.6 | — | Unverified |
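The "mAP@0.1" metric above is not defined on this page. The sketch below assumes the usual temporal action detection convention, in which a predicted segment counts as a match when its temporal intersection-over-union (tIoU) with a ground-truth segment is at least 0.1. The `temporal_iou` and `is_match` helpers are hypothetical names. The rest of a mAP computation, which ranks detections by confidence and averages precision per class, is deliberately omitted.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start, end) in seconds, with start <= end


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred: Segment, gt: Segment, threshold: float = 0.1) -> bool:
    """True when the prediction overlaps the ground truth enough to count."""
    return temporal_iou(pred, gt) >= threshold


# Made-up segments: 5 s of overlap out of a 15 s union.
overlap = temporal_iou((0.0, 10.0), (5.0, 15.0))
matched = is_match((0.0, 10.0), (5.0, 15.0))
```

A threshold as low as 0.1 is lenient: a prediction only needs a small overlap with the true segment to be counted as correct.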

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LSTM + Pretrained on YT-8M | mAP | 75.6 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | YOWO+LFB* | mAP (Val) | 19.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STM (ImageNet+Kinetics pretrain) | Average accuracy of 3 splits | 72.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Florence | Top-1 Accuracy | 87.8 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | G-Blend | Clip Hit@1 | 49.7 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | 2D-3D-Softargmax (RGB only) | Accuracy (CS) | 85.5 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STM (16 frames, ImageNet pretraining) | Top-1 Accuracy | 50.7 | — | Unverified |