Action Detection
Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets: per-frame action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
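To make the tubelet idea concrete, the following is a minimal Python sketch of a tubelet as a mapping from frame index to bounding box, together with one common way to score a predicted tubelet against a ground-truth one: temporal IoU over frame indices multiplied by the mean spatial IoU on the shared frames. The names `Tubelet`, `box_iou`, and `tube_iou` are hypothetical and chosen for illustration. Individual benchmarks may define the overlap differently, so treat this as an assumption rather than any dataset's official evaluation code.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates (assumed convention).
Box = Tuple[float, float, float, float]


@dataclass
class Tubelet:
    """Hypothetical container: one action label plus one box per frame."""
    label: str
    boxes: Dict[int, Box]  # frame index -> bounding box


def box_iou(a: Box, b: Box) -> float:
    """Spatial intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(pred: Tubelet, gt: Tubelet) -> float:
    """Spatio-temporal IoU: temporal IoU times mean spatial IoU on shared frames."""
    shared = set(pred.boxes) & set(gt.boxes)
    if not shared:
        return 0.0
    all_frames = set(pred.boxes) | set(gt.boxes)
    temporal = len(shared) / len(all_frames)
    spatial = sum(box_iou(pred.boxes[f], gt.boxes[f]) for f in shared) / len(shared)
    return temporal * spatial


# Toy example: same box in every frame, but the two tubes only overlap on frames 2 and 3.
pred = Tubelet("run", {f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 4)})
gt = Tubelet("run", {f: (0.0, 0.0, 10.0, 10.0) for f in range(2, 6)})
score = tube_iou(pred, gt)
```

In this toy case the boxes agree perfectly on the two shared frames, yet the score is only 2/6 because the tubes cover six distinct frames in total. Under this definition a prediction with the correct label counts as a match only if the score clears the chosen threshold, which is why a spatially accurate but temporally misaligned tube can still be rejected.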
Datasets: UCF101-24, J-HMDB, Charades, Multi-THUMOS, UCF Sports, THUMOS'14, MultiSports, TSU, TTStroke-21 ME21, TTStroke-21 ME22, MultiTHUMOS
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STAR/L | Frame-mAP 0.5 | 90.3 | — | Unverified |
| 2 | SiA | Frame-mAP 0.5 | 88.5 | — | Unverified |
| 3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 | — | Unverified |
| 4 | HIT | Frame-mAP 0.5 | 84.8 | — | Unverified |
| 5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 | — | Unverified |
| 6 | YOWO | Frame-mAP 0.5 | 80.4 | — | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 | — | Unverified |
| 8 | MOC | Frame-mAP 0.5 | 77.8 | — | Unverified |
| 9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 | — | Unverified |
| 10 | Two-in-one | Video-mAP 0.2 | 75.48 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SiA | Frame-mAP 0.5 | 88.5 | — | Unverified |
| 2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 | — | Unverified |
| 3 | HIT | Frame-mAP 0.5 | 83.8 | — | Unverified |
| 4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 | — | Unverified |
| 5 | DTS | Video-mAP 0.2 | 76.1 | — | Unverified |
| 6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 | — | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 | — | Unverified |
| 8 | YOWO | Frame-mAP 0.5 | 74.4 | — | Unverified |
| 9 | MOC | Frame-mAP 0.5 | 74 | — | Unverified |
| 10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TTM | mAP | 28.79 | — | Unverified |
| 2 | CTRN | mAP | 27.8 | — | Unverified |
| 3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 | — | Unverified |
| 4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 | — | Unverified |
| 5 | PDAN (RGB+Flow) | mAP | 26.5 | — | Unverified |
| 6 | PAT | mAP | 26.5 | — | Unverified |
| 7 | MS-TCT (RGB only) | mAP | 25.4 | — | Unverified |
| 8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 | — | Unverified |
| 9 | Coarse-Fine Networks | mAP | 25.1 | — | Unverified |
| 10 | MLAD (RGB + Flow) | mAP | 23.7 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MLAD | mAP | 51.5 | — | Unverified |
| 2 | CTRN | mAP | 51.2 | — | Unverified |
| 3 | PDAN | mAP | 47.6 | — | Unverified |
| 4 | TGM | mAP | 46.4 | — | Unverified |
| 5 | MS-TCT (RGB only) | mAP | 43.1 | — | Unverified |
| 6 | I3D + our super-event | mAP | 36.4 | — | Unverified |
| 7 | Two-stream + LSTM | mAP | 28.1 | — | Unverified |
| 8 | Two-stream | mAP | 27.6 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | — | Unverified |
| 2 | DTS | Video-mAP 0.2 | 94.3 | — | Unverified |
| 3 | Two-in-one | Video-mAP 0.5 | 92.74 | — | Unverified |
| 4 | T-CNN | Frame-mAP 0.5 | 86.7 | — | Unverified |
| 5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 | — | Unverified |
| 6 | TS R-CNN | Frame-mAP 0.5 | 82.3 | — | Unverified |
| 7 | Action Tubes | Frame-mAP 0.5 | 68.1 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MAT (Ours) Trans | mAP | 71.6 | — | Unverified |
| 2 | TadML-two stream | mAP | 59.7 | — | Unverified |
| 3 | MAT (ours) | mAP | 58.2 | — | Unverified |
| 4 | TadML-rgb | mAP | 53.46 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MS-TCT | Frame-mAP | 33.7 | — | Unverified |
| 2 | PDAN | Frame-mAP | 32.7 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STCNN | IoU | 0.14 | — | Unverified |
| 2 | Two Stream Network | IoU | 0.07 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STCNN-V2 (Vote decision) | IoU | 0.52 | — | Unverified |
| 2 | RGB and PRGB | IoU | 0.35 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PAT | mAP | 44.6 | — | Unverified |