Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
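
As a rough illustration of how per-frame boxes can be linked across time, the following Python sketch greedily extends each tubelet with the best-overlapping box in the next frame. The names `box_iou` and `link_tubelets`, the `(x1, y1, x2, y2)` box format, and the 0.5 linking threshold are assumptions made for this example, not taken from any listed paper; published detectors usually also use class scores and more robust linking.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tubelets(frames: List[List[Box]], iou_thresh: float = 0.5) -> List[Dict[int, Box]]:
    """Greedily link per-frame boxes into tubelets of the form {frame_index: box}."""
    finished: List[Dict[int, Box]] = []
    active: List[Dict[int, Box]] = []
    for t, boxes in enumerate(frames):
        unmatched = list(boxes)
        still_active: List[Dict[int, Box]] = []
        for tube in active:
            last_box = tube[t - 1]  # every active tubelet has a box in the previous frame
            best = max(unmatched, key=lambda b: box_iou(last_box, b), default=None)
            if best is not None and box_iou(last_box, best) >= iou_thresh:
                tube[t] = best
                unmatched.remove(best)
                still_active.append(tube)
            else:
                finished.append(tube)  # no overlapping box: the tubelet ends here
        # Boxes that extended no existing tubelet start new ones.
        still_active.extend({t: b} for b in unmatched)
        active = still_active
    return finished + active


# Toy input: one box per frame; the third box is far away from the first two.
frames = [[(0, 0, 10, 10)], [(1, 1, 11, 11)], [(50, 50, 60, 60)]]
tubes = link_tubelets(frames)
for tube in tubes:
    print("frames", min(tube), "to", max(tube))
```

The first and last frame index of each tubelet give its temporal extent (the "when"), while the boxes give the spatial location (the "where").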

Papers

Showing 101–150 of 817 papers

Title	Date	Tasks	Status	Hype
ROAD: The ROad event Awareness Dataset for Autonomous Driving	Feb 23, 2021	Action Detection, Activity Detection	Code Available	1
Semi-Supervised Temporal Action Detection with Proposal-Free Masking	Jul 14, 2022	Action Detection, General Classification	Code Available	1
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation	Oct 24, 2022	Action Detection, Activity Detection	Code Available	1
SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos	Apr 12, 2018	Action Classification, Action Detection	Code Available	1
A semi-supervised methodology for fishing activity detection using the geometry behind the trajectory of multiple vessels	Jul 12, 2022	Action Detection, Activity Detection	Code Available	1
Spotting Temporally Precise, Fine-Grained Events in Video	Jul 20, 2022	Action Detection, Action Spotting	Code Available	1
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation	Jun 8, 2018	Action Detection, Temporal Action Localization	Code Available	1
DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer	May 9, 2025	Action Detection, Decoder	Code Available	1
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation	Oct 21, 2021	Action Detection, Temporal Action Proposal Generation	Code Available	1
SVIP: Sequence VerIfication for Procedures in Videos	Dec 13, 2021	Action Detection, Action Recognition	Code Available	1
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos	May 2, 2020	Action Detection, Form	Code Available	1
Continuous control with deep reinforcement learning	Sep 9, 2015	Action Detection, continuous-control	Code Available	1
Asynchronous Interaction Aggregation for Action Detection	Apr 16, 2020	Action Detection, Video Action Detection	Code Available	1
Continual Transformers: Redundancy-Free Attention for Online Inference	Jan 17, 2022	Action Detection, Audio Classification	Code Available	1
Coupling Intent and Action for Pedestrian Crossing Behavior Prediction	May 10, 2021	Action Detection, Autonomous Vehicles	Code Available	1
Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models	Jan 23, 2025	Action Detection, Pseudo Label	Code Available	1
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers	Sep 3, 2023	Action Detection, Action Spotting	Code Available	1
TubeR: Tubelet Transformer for Video Action Detection	Apr 2, 2021	Action Classification, Action Detection	Code Available	1
AVASpeech-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-Occurrence	Nov 2, 2021	Action Detection, Activity Detection	Code Available	1
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions	May 23, 2017	Actin Detection, Action Detection	Code Available	1
VoxLingua107: a Dataset for Spoken Language Recognition	Nov 25, 2020	Action Detection, Activity Detection	Code Available	1
WASE: Learning When to Attend for Speaker Extraction in Cocktail Party Environments	Jun 13, 2021	Action Detection, Activity Detection	Code Available	1
A Hybrid CNN-BiLSTM Voice Activity Detector	Mar 5, 2021	Action Detection, Activity Detection	Code Available	1
Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks	Aug 1, 2020	3D Action Recognition, 3D Classification	Code Available	1
Context-Aware RCNN: A Baseline for Action Detection in Videos	Jul 20, 2020	Action Detection, Action Recognition	Code Available	1
YOWO-Plus: An Incremental Improvement	Oct 20, 2022	Action Detection, GPU	Code Available	1
DCAN: Improving Temporal Action Detection via Dual Context Aggregation	Dec 7, 2021	Action Detection, Temporal Action Localization	Code Available	1
ETAD: Training Action Detection End to End on a Laptop	May 14, 2022	Action Detection, GPU	Code Available	1
AViD Dataset: Anonymized Videos from Diverse Countries	Jul 10, 2020	Action Classification, Action Detection	Code Available	1
AV Taris: Online Audio-Visual Speech Recognition	Dec 14, 2020	Action Detection, Activity Detection	Code Available	1
ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos	Jul 17, 2024	Action Detection, Action Localization	Code Available	1
Context-Enhanced Memory-Refined Transformer for Online Action Detection	Mar 24, 2025	Action Detection, Decoder	Code Available	1
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection	May 5, 2022	Action Detection, object-detection	Code Available	1
A Multigrid Method for Efficiently Training Video Models	Dec 2, 2019	Action Detection, Action Recognition	Code Available	1
Long Short-Term Transformer for Online Action Detection	Jul 7, 2021	Action Detection, Decoder	Code Available	1
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions	Mar 29, 2024	Action Detection, Benchmarking	Code Available	1
BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications	Oct 12, 2021	Action Detection, Activity Detection	Code Available	1
A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions	Apr 21, 2022	Action Detection, Video Understanding	Code Available	1
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection	Jul 3, 2024	Action Detection, Dynamic neural networks	Code Available	1
E2E-LOAD: End-to-End Long-form Online Action Detection	Jun 13, 2023	Action Detection, Form	Code Available	1
Proposal Relation Network for Temporal Action Detection	Jun 20, 2021	Action Classification, Action Detection	Code Available	1
End-to-End Semi-Supervised Learning for Video Action Detection	Mar 8, 2022	Action Detection, Classification Consistency	Code Available	1
Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection	Sep 28, 2022	Action Detection, Domain Adaptation	Code Available	1
Exploiting Temporal Side Information in Massive IoT Connectivity	Jan 5, 2022	Action Detection, Activity Detection	Code Available	1
Multi-Granularity Hand Action Detection	Jun 19, 2023	Action Detection, Action Localization	Code Available	1
BMN: Boundary-Matching Network for Temporal Action Proposal Generation	Jul 23, 2019	Action Detection, Action Recognition	Code Available	1
A Hybrid Graph Network for Complex Activity Detection in Video	Oct 26, 2023	Action Detection, Activity Detection	Unverified	0
Automatic Speech Recognition for Hindi	Jun 26, 2024	Action Detection, Activity Detection	Unverified	0
Class Semantics-based Attention for Action Detection	Sep 6, 2021	Action Detection, Action Localization	Unverified	0
Automated speech tools for helping communities process restricted-access corpora for language revival efforts	Apr 15, 2022	Action Detection, Activity Detection	Unverified	0


Datasets: UCF101-24, J-HMDB, Charades, Multi-THUMOS, UCF Sports, THUMOS'14, MultiSports, TSU, TTStroke-21 ME21, TTStroke-21 ME22, MultiTHUMOS

Benchmark Results
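
The Metric column in the tables below mixes frame-level and video-level scores. By the usual convention, the number after the metric name (0.5 or 0.2) is the IoU threshold a detection must reach to count as a true positive. The Python sketch below shows one commonly used spatio-temporal IoU between two tubelets: temporal IoU multiplied by the mean spatial IoU over shared frames. The names `box_iou` and `tube_iou` and the `{frame_index: box}` tubelet format are assumptions for illustration; individual benchmarks may define the overlap differently.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(pred: Dict[int, Box], gt: Dict[int, Box]) -> float:
    """Temporal IoU times mean spatial IoU over the frames both tubelets cover."""
    shared = set(pred) & set(gt)
    if not shared:
        return 0.0
    temporal = len(shared) / len(set(pred) | set(gt))
    spatial = sum(box_iou(pred[t], gt[t]) for t in shared) / len(shared)
    return temporal * spatial


# Toy tubelets that share one of three frames, with identical boxes.
box = (0.0, 0.0, 10.0, 10.0)
pred = {0: box, 1: box}
gt = {1: box, 2: box}
overlap = tube_iou(pred, gt)
print(overlap >= 0.2, overlap >= 0.5)  # match at threshold 0.2, no match at 0.5
```

A frame-level score applies `box_iou` with the threshold to each frame independently, while a video-level score applies a tubelet overlap such as `tube_iou` to whole detections, so the two kinds of score are not directly comparable.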

UCF101-24
#	Model	Metric	Claimed	Verified	Status
1	STAR/L	Frame-mAP 0.5	90.3	—	Unverified
2	SiA	Frame-mAP 0.5	88.5	—	Unverified
3	YOWO + LFB	Frame-mAP 0.5	87.3	—	Unverified
4	HIT	Frame-mAP 0.5	84.8	—	Unverified
5	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	82.3	—	Unverified
6	YOWO	Frame-mAP 0.5	80.4	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.2	78.48	—	Unverified
8	MOC	Frame-mAP 0.5	77.8	—	Unverified
9	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	76.3	—	Unverified
10	Two-in-one	Video-mAP 0.2	75.48	—	Unverified

J-HMDB
#	Model	Metric	Claimed	Verified	Status
1	SiA	Frame-mAP 0.5	88.5	—	Unverified
2	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	87.59	—	Unverified
3	HIT	Frame-mAP 0.5	83.8	—	Unverified
4	HISAN (VGG-16)	Frame-mAP 0.5	76.72	—	Unverified
5	DTS	Video-mAP 0.2	76.1	—	Unverified
6	YOWO + LFB	Frame-mAP 0.5	75.7	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.5	74.74	—	Unverified
8	YOWO	Frame-mAP 0.5	74.4	—	Unverified
9	MOC	Frame-mAP 0.5	74	—	Unverified
10	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	73.3	—	Unverified

Charades
#	Model	Metric	Claimed	Verified	Status
1	TTM	mAP	28.79	—	Unverified
2	CTRN	mAP	27.8	—	Unverified
3	Coarse-Fine Networks (w/ self-supervised detection pretraining)	mAP	26.95	—	Unverified
4	UniMD+Sync. (RGB+Flow)	mAP	26.53	—	Unverified
5	PDAN (RGB+Flow)	mAP	26.5	—	Unverified
6	PAT	mAP	26.5	—	Unverified
7	MS-TCT (RGB only)	mAP	25.4	—	Unverified
8	3D ResNet-50 + super-events pretrained on AViD	mAP	25.2	—	Unverified
9	Coarse-Fine Networks	mAP	25.1	—	Unverified
10	MLAD (RGB + Flow)	mAP	23.7	—	Unverified

Multi-THUMOS
#	Model	Metric	Claimed	Verified	Status
1	MLAD	mAP	51.5	—	Unverified
2	CTRN	mAP	51.2	—	Unverified
3	PDAN	mAP	47.6	—	Unverified
4	TGM	mAP	46.4	—	Unverified
5	MS-TCT (RGB only)	mAP	43.1	—	Unverified
6	I3D + our super-event	mAP	36.4	—	Unverified
7	Two-stream + LSTM	mAP	28.1	—	Unverified
8	Two-stream	mAP	27.6	—	Unverified

UCF Sports
#	Model	Metric	Claimed	Verified	Status
1	Two-in-one Two Stream	Video-mAP 0.5	96.52	—	Unverified
2	DTS	Video-mAP 0.2	94.3	—	Unverified
3	Two-in-one	Video-mAP 0.5	92.74	—	Unverified
4	T-CNN	Frame-mAP 0.5	86.7	—	Unverified
5	MR-TS R-CNN	Frame-mAP 0.5	84.52	—	Unverified
6	TS R-CNN	Frame-mAP 0.5	82.3	—	Unverified
7	Action Tubes	Frame-mAP 0.5	68.1	—	Unverified

THUMOS'14
#	Model	Metric	Claimed	Verified	Status
1	MAT (Ours) Trans	mAP	71.6	—	Unverified
2	TadML-two stream	mAP	59.7	—	Unverified
3	MAT (ours)	mAP	58.2	—	Unverified
4	TadML-rgb	mAP	53.46	—	Unverified

MultiSports
#	Model	Metric	Claimed	Verified	Status
1	HIT	Frame-mAP 0.5	33.3	—	Unverified
2	SiA	Frame-mAP 0.5	28.8	—	Unverified

TSU
#	Model	Metric	Claimed	Verified	Status
1	MS-TCT	Frame-mAP	33.7	—	Unverified
2	PDAN	Frame-mAP	32.7	—	Unverified

TTStroke-21 ME21
#	Model	Metric	Claimed	Verified	Status
1	STCNN	IoU	0.14	—	Unverified
2	Two Stream Network	IoU	0.07	—	Unverified

TTStroke-21 ME22
#	Model	Metric	Claimed	Verified	Status
1	STCNN-V2 (Vote decision)	IoU	0.52	—	Unverified
2	RGB and PRGB	IoU	0.35	—	Unverified

MultiTHUMOS
#	Model	Metric	Claimed	Verified	Status
1	PAT	mAP	44.6	—	Unverified