Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
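
To make the idea of a tubelet concrete, here is a minimal sketch of linking per-frame bounding boxes across time. It assumes boxes are (x1, y1, x2, y2) tuples and uses a simple greedy overlap rule. The function names and the 0.3 overlap threshold are hypothetical choices for illustration, not taken from any paper listed here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed box format


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tubelet(frames: List[List[Box]], start: Box, min_iou: float = 0.3) -> List[Box]:
    """Greedily extend a tubelet from `start` (a box in frames[0]) through later frames.

    In each later frame, pick the detection that overlaps most with the previous
    box; stop as soon as no detection reaches `min_iou` (a hypothetical threshold).
    """
    tube = [start]
    for detections in frames[1:]:
        best = max(detections, key=lambda d: box_iou(tube[-1], d), default=None)
        if best is None or box_iou(tube[-1], best) < min_iou:
            break  # the action is considered to end at the previous frame
        tube.append(best)
    return tube


# Per-frame detections for a toy four-frame clip.
frames = [
    [(0, 0, 10, 10)],
    [(1, 0, 11, 10), (50, 50, 60, 60)],
    [(2, 0, 12, 10)],
    [(80, 80, 90, 90)],
]
tube = link_tubelet(frames, frames[0][0])
```

The length of the returned list gives the temporal extent of the action (when), and the boxes give its spatial extent in each frame (where). Real systems generally also use classification scores when linking, which this sketch leaves out.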

Papers


Showing 251–300 of 817 papers

Title	Date	Tasks	Status	Score
Learning Latent Super-Events to Detect Multiple Activities in Videos	Dec 5, 2017	Action Detection, Activity Detection	Code Available	5
Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos	Sep 21, 2022	Action Detection, Action Recognition	Code Available	5
Learning Spatio-Temporal Representation with Local and Global Diffusion	Jun 13, 2019	Action Classification, Action Detection	Code Available	5
Emotion Action Detection and Emotion Inference: the Task and Dataset	Mar 16, 2019	Action Detection, Emotion Classification	Code Available	5
Long-term Conversation Analysis: Exploring Utility and Privacy	Jun 28, 2023	Action Detection, Activity Detection	Code Available	5
Incremental Tube Construction for Human Action Detection	Apr 5, 2017	Action Detection	Code Available	5
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations	Aug 14, 2023	Action Detection, Activity Detection	Code Available	5
Identifying Visible Actions in Lifestyle Vlogs	Jun 10, 2019	Action Detection	Code Available	5
Estimation of Reliable Proposal Quality for Temporal Action Detection	Apr 25, 2022	Action Detection	Code Available	5
Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation	Jun 25, 2024	Action Detection, Benchmarking	Code Available	5
A Comprehensive Study on Temporal Modeling for Online Action Detection	Jan 21, 2020	Action Detection, Online Action Detection	Code Available	5
Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms	Feb 6, 2023	Action Classification, Action Detection	Code Available	5
Handwashing Action Detection System for an Autonomous Social Robot	Oct 27, 2022	Action Detection, Action Recognition	Code Available	5
Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation	Feb 22, 2020	3D Human Pose Estimation, Action Detection	Code Available	5
End-to-end Learning of Action Detection from Frame Glimpses in Videos	Nov 22, 2015	Action Detection, Temporal Action Localization	Code Available	5
Am I Done? Predicting Action Progress in Videos	May 4, 2017	Action Detection, Temporal Localization	Code Available	5
Graph Distillation for Action Detection with Privileged Modalities	Nov 30, 2017	Action Classification, Action Detection	Code Available	5
Fine-Grained Classroom Activity Detection from Audio with Neural Networks	Jul 29, 2021	Action Detection, Activity Detection	Code Available	5
Gan-Based Joint Activity Detection and Channel Estimation For Grant-free Random Access	Apr 4, 2022	Action Detection, Activity Detection	Code Available	5
Fine-grained Activity Recognition in Baseball Videos	Apr 9, 2018	Action Detection, Activity Detection	Code Available	5
Human Activity Recognition: A Spatio-temporal Image Encoding of 3D Skeleton Data for Online Action Detection	Feb 8, 2020	Action Detection, Activity Recognition	Code Available	5
MARINE: A Computer Vision Model for Detecting Rare Predator-Prey Interactions in Animal Videos	Jul 25, 2024	Action Detection, Action Recognition	Code Available	5
Personal VAD: Speaker-Conditioned Voice Activity Detection	Aug 12, 2019	Action Detection, Activity Detection	Code Available	5
Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022	Jan 31, 2023	Action Detection, Benchmarking	Code Available	5
EMO&LY (EMOtion and AnomaLY): A new corpus for anomaly detection in an audiovisual stream with emotional context.	May 1, 2018	Action Detection, Anomaly Detection	Unverified	0
EML Online Speech Activity Detection for the Fearless Steps Challenge Phase-III	Jun 21, 2021	Action Detection, Activity Detection	Unverified	0
EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos with Procedural Texts	Oct 7, 2024	Action Detection, Mistake Detection	Unverified	0
Automatic Speech Recognition for Hindi	Jun 26, 2024	Action Detection, Activity Detection	Unverified	0
A Hybrid Graph Network for Complex Activity Detection in Video	Oct 26, 2023	Action Detection, Activity Detection	Unverified	0
Ego-Only: Egocentric Action Detection without Exocentric Transferring	Jan 3, 2023	Action Detection, Action Localization	Unverified	0
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization	Jan 6, 2022	Action Detection, Active Speaker Detection	Unverified	0
Efficient Activity Detection in Untrimmed Video with Max-Subgraph Search	Jul 11, 2016	Action Detection, Activity Detection	Unverified	0
Automated speech tools for helping communities process restricted-access corpora for language revival efforts	Apr 15, 2022	Action Detection, Activity Detection	Unverified	0
ActionSpotter: Deep Reinforcement Learning Framework for Temporal Action Spotting in Videos	Apr 15, 2020	Action Detection, Action Spotting	Unverified	0
Efficient Action Detection in Untrimmed Videos via Multi-Task Learning	Dec 22, 2016	Action Detection, Action Localization	Unverified	0
Effective Abnormal Activity Detection on Multivariate Time Series Healthcare Data	Sep 11, 2023	Action Detection, Activity Detection	Unverified	0
A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments	Oct 6, 2020	Action Detection, Activity Detection	Unverified	0
Early Detection of In-Memory Malicious Activity based on Run-time Environmental Features	Mar 30, 2021	Action Detection, Activity Detection	Unverified	0
Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation	Mar 30, 2021	Action Detection, Temporal Action Proposal Generation	Unverified	0
A Grammatical Compositional Model for Video Action Detection	Oct 4, 2023	Action Detection, Human Dynamics	Unverified	0
Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection	Aug 7, 2018	Action Detection, Activity Detection	Unverified	0
Dual DETRs for Multi-Label Temporal Action Detection	Mar 31, 2024	Action Detection, object-detection	Unverified	0
Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion	Jun 2, 2025	Action Detection, Activity Detection	Unverified	0
Aggressive actions and anger detection from multiple modalities using Kinect	Jul 5, 2016	Action Detection	Unverified	0
A Comprehensive Methodological Survey of Human Activity Recognition Across Divers Data Modalities	Sep 15, 2024	Action Detection, Activity Detection	Unverified	0
DT4ECG: A Dual-Task Learning Framework for ECG-Based Human Identity Recognition and Human Activity Detection	Feb 16, 2025	Action Detection, Activity Detection	Unverified	0
Attention Filtering for Multi-person Spatiotemporal Action Detection on Deep Two-Stream CNN Architectures	Jul 21, 2019	Action Detection, General Classification	Unverified	0
Double-Sided Information Aided Temporal-Correlated Massive Access	May 16, 2022	Action Detection, Activity Detection	Unverified	0
DOAD: Decoupled One Stage Action Detection Network	Apr 1, 2023	Action Detection, Action Recognition	Unverified	0
ATTACH Dataset: Annotated Two-Handed Assembly Actions for Human Action Understanding	Apr 17, 2023	Action Detection, Action Recognition	Unverified	0



Datasets: UCF101-24, J-HMDB, Charades, Multi-THUMOS, UCF Sports, THUMOS'14, MultiSports, TSU, TTStroke-21 ME21, TTStroke-21 ME22, MultiTHUMOS

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	STAR/L	Frame-mAP 0.5	90.3	—	Unverified
2	SiA	Frame-mAP 0.5	88.5	—	Unverified
3	YOWO + LFB	Frame-mAP 0.5	87.3	—	Unverified
4	HIT	Frame-mAP 0.5	84.8	—	Unverified
5	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	82.3	—	Unverified
6	YOWO	Frame-mAP 0.5	80.4	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.2	78.48	—	Unverified
8	MOC	Frame-mAP 0.5	77.8	—	Unverified
9	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	76.3	—	Unverified
10	Two-in-one	Video-mAP 0.2	75.48	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SiA	Frame-mAP 0.5	88.5	—	Unverified
2	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	87.59	—	Unverified
3	HIT	Frame-mAP 0.5	83.8	—	Unverified
4	HISAN (VGG-16)	Frame-mAP 0.5	76.72	—	Unverified
5	DTS	Video-mAP 0.2	76.1	—	Unverified
6	YOWO + LFB	Frame-mAP 0.5	75.7	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.5	74.74	—	Unverified
8	YOWO	Frame-mAP 0.5	74.4	—	Unverified
9	MOC	Frame-mAP 0.5	74	—	Unverified
10	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	73.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TTM	mAP	28.79	—	Unverified
2	CTRN	mAP	27.8	—	Unverified
3	Coarse-Fine Networks (w/ self-supervised detection pretraining)	mAP	26.95	—	Unverified
4	UniMD+Sync. (RGB+Flow)	mAP	26.53	—	Unverified
5	PDAN (RGB+Flow)	mAP	26.5	—	Unverified
6	PAT	mAP	26.5	—	Unverified
7	MS-TCT (RGB only)	mAP	25.4	—	Unverified
8	3D ResNet-50 + super-events pretrained on AViD	mAP	25.2	—	Unverified
9	Coarse-Fine Networks	mAP	25.1	—	Unverified
10	MLAD (RGB + Flow)	mAP	23.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MLAD	mAP	51.5	—	Unverified
2	CTRN	mAP	51.2	—	Unverified
3	PDAN	mAP	47.6	—	Unverified
4	TGM	mAP	46.4	—	Unverified
5	MS-TCT (RGB only)	mAP	43.1	—	Unverified
6	I3D + our super-event	mAP	36.4	—	Unverified
7	Two-stream + LSTM	mAP	28.1	—	Unverified
8	Two-stream	mAP	27.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Two-in-one Two Stream	Video-mAP 0.5	96.52	—	Unverified
2	DTS	Video-mAP 0.2	94.3	—	Unverified
3	Two-in-one	Video-mAP 0.5	92.74	—	Unverified
4	T-CNN	Frame-mAP 0.5	86.7	—	Unverified
5	MR-TS R-CNN	Frame-mAP 0.5	84.52	—	Unverified
6	TS R-CNN	Frame-mAP 0.5	82.3	—	Unverified
7	Action Tubes	Frame-mAP 0.5	68.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MAT (Ours) Trans	mAP	71.6	—	Unverified
2	TadML-two stream	mAP	59.7	—	Unverified
3	MAT (ours)	mAP	58.2	—	Unverified
4	TadML-rgb	mAP	53.46	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HIT	Frame-mAP 0.5	33.3	—	Unverified
2	SiA	Frame-mAP 0.5	28.8	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MS-TCT	Frame-mAP	33.7	—	Unverified
2	PDAN	Frame-mAP	32.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	STCNN	IoU	0.14	—	Unverified
2	Two Stream Network	IoU	0.07	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	STCNN-V2 (Vote decision)	IoU	0.52	—	Unverified
2	RGB and PRGB	IoU	0.35	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PAT	mAP	44.6	—	Unverified