Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
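As a rough illustration of the tubelet idea, the sketch below stores one action instance as a class label plus one box per frame, and scores the overlap between two tubelets. The names `Tubelet`, `box_iou`, `temporal_iou`, and `tubelet_iou` are hypothetical, and the spatio-temporal overlap used here (temporal IoU times the mean box IoU over shared frames) is one common convention, not necessarily the one used by any particular benchmark on this page.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Tubelet:
    """One action instance: a class label plus one box per frame."""
    label: str
    boxes: Dict[int, Box]  # frame index -> bounding box

    @property
    def start(self) -> int:
        return min(self.boxes)

    @property
    def end(self) -> int:
        return max(self.boxes)


def box_iou(a: Box, b: Box) -> float:
    """Spatial intersection-over-union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def temporal_iou(a: Tubelet, b: Tubelet) -> float:
    """IoU of the inclusive frame ranges [start, end].

    Assumes each tubelet covers its range without gaps.
    """
    inter = min(a.end, b.end) - max(a.start, b.start) + 1
    if inter <= 0:
        return 0.0
    union = (a.end - a.start + 1) + (b.end - b.start + 1) - inter
    return inter / union


def tubelet_iou(a: Tubelet, b: Tubelet) -> float:
    """Temporal IoU multiplied by the mean box IoU over shared frames."""
    shared = sorted(set(a.boxes) & set(b.boxes))
    if not shared:
        return 0.0
    mean_spatial = sum(box_iou(a.boxes[f], b.boxes[f]) for f in shared) / len(shared)
    return temporal_iou(a, b) * mean_spatial


# Toy example: ground truth spans frames 0-9, prediction spans frames 5-14.
gt = Tubelet("jump", {f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 10)})
pred = Tubelet("jump", {f: (0.0, 0.0, 10.0, 5.0) for f in range(5, 15)})

print(round(temporal_iou(gt, pred), 3))  # 0.333 (5 shared frames out of 15)
print(round(tubelet_iou(gt, pred), 3))   # 0.167 (0.333 temporal x 0.5 spatial)
```

Temporal localization corresponds to the `start` and `end` frames alone, while action recognition would keep only `label`. Under this reading, a threshold such as 0.2 or 0.5 on a tubelet-level overlap decides whether a predicted tubelet with the correct label counts as a match.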

Papers

Showing 451–500 of 817 papers

Title	Date	Tasks	Status
Temporal Action Detection by Joint Identification-Verification	Oct 19, 2018	Action Detection	Unverified
Temporal Action Detection Model Compression by Progressive Block Drop	Mar 21, 2025	Action Detection, Autonomous Driving	Unverified
Temporal Action Detection with Multi-level Supervision	Nov 24, 2020	Action Detection, Semi-Supervised Action Detection	Unverified
Temporal Action Localization by Structured Maximal Sums	Apr 15, 2017	Action Detection, Action Localization	Unverified
Temporal Divide-and-Conquer Anomaly Actions Localization in Semi-Supervised Videos with Hierarchical Transformer	Aug 24, 2024	Action Detection, Anomaly Detection	Unverified
Spatio-Temporal Event Segmentation and Localization for Wildlife Extended Videos	May 5, 2020	Action Detection, Activity Detection	Unverified
Temporal-Needle: A view and appearance invariant video descriptor	Dec 14, 2016	Action Detection, Clustering	Unverified
Temporal Structure Mining for Weakly Supervised Action Detection	Oct 1, 2019	Action Detection, Weakly Supervised Action Localization	Unverified
Temporarily-Aware Context Modelling using Generative Adversarial Networks for Speech Activity Detection	Apr 2, 2020	Action Detection, Activity Detection	Unverified
Tensor vs Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations	Oct 15, 2015	Action Detection, Activity Detection	Unverified
P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos	Jul 26, 2022	Action Detection, Action Localization	Unverified
The AFRL IWSLT 2020 Systems: Work-From-Home Edition	Jul 1, 2020	Action Detection, Activity Detection	Unverified
The Cohort and Speechify Libraries for Rapid Construction of Speech Enabled Applications for Android	Sep 1, 2015	Action Detection, Speech Recognition	Unverified
The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge	Feb 4, 2022	Action Detection, Activity Detection	Unverified
The DKU-DukeECE Diarization System for the VoxCeleb Speaker Recognition Challenge 2022	Oct 4, 2022	Action Detection, Activity Detection	Unverified
The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge	Sep 5, 2021	Action Detection, Activity Detection	Unverified
The DKU-MSXF Diarization System for the VoxCeleb Speaker Recognition Challenge 2023	Aug 15, 2023	Action Detection, Activity Detection	Unverified
The HUAWEI Speaker Diarisation System for the VoxCeleb Speaker Diarisation Challenge	Oct 22, 2020	Action Detection, Activity Detection	Unverified
The Impact of Silence on Speech Anti-Spoofing	Sep 21, 2023	Action Detection, Activity Detection	Unverified
The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge	Jun 14, 2020	Action Detection, Activity Detection	Unverified
The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022	Sep 23, 2022	Action Detection, Activity Detection	Unverified
The Newsbridge -Telecom SudParis VoxCeleb Speaker Recognition Challenge 2022 System Description	Jan 17, 2023	Action Detection, Activity Detection	Unverified
The RATS Collection: Supporting HLT Research with Degraded Audio Data	May 1, 2014	Action Detection, Activity Detection	Unverified
The SAFE-T Corpus: A New Resource for Simulated Public Safety Communications	May 1, 2020	Action Detection, Activity Detection	Unverified
The SARAS Endoscopic Surgeon Action Detection (ESAD) dataset: Challenges and methods	Apr 7, 2021	Action Detection	Unverified
The "Sound of Silence" in EEG -- Cognitive voice activity detection	Oct 12, 2020	Action Detection, Activity Detection	Unverified
The Speed Submission to DIHARD II: Contributions & Lessons Learned	Nov 6, 2019	Action Detection, Activity Detection	Unverified
The Stackelberg Equilibrium for One-sided Zero-sum Partially Observable Stochastic Games	Sep 17, 2021	Action Detection	Unverified
The Use of Video Captioning for Fostering Physical Activity	Apr 7, 2021	Action Detection, object-detection	Unverified
The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge	Feb 10, 2022	Action Detection, Activity Detection	Unverified
The VVAD-LRS3 Dataset for Visual Voice Activity Detection	Sep 28, 2021	Action Detection, Activity Detection	Unverified
"This is Houston. Say again, please". The Behavox system for the Apollo-11 Fearless Steps Challenge (phase II)	Aug 4, 2020	Action Detection, Activity Detection	Unverified
Three Birds with One Stone: Multi-Task Temporal Action Detection via Recycling Temporal Annotations	Jun 19, 2021	Action Detection, Action Localization	Unverified
Time and Frequency Network for Human Action Detection in Videos	Mar 8, 2021	Action Detection	Unverified
Token Turing Machines	Nov 16, 2022	Action Detection, Activity Detection	Unverified
TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal	Oct 1, 2017	Action Detection, regression	Unverified
Towards Efficient Coarse-to-Fine Networks for Action and Gesture Recognition	Aug 1, 2020	3D Action Recognition, Action Classification	Unverified
Towards end-2-end learning for predicting behavior codes from spoken utterances in psychotherapy conversations	Jul 1, 2020	Action Detection, Activity Detection	Unverified
Towards More Practical Group Activity Detection: A New Benchmark and Model	Dec 5, 2023	Action Detection, Activity Detection	Unverified
Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM	May 29, 2025	Action Detection, Activity Detection	Unverified
Trajectory-User Linking Is Easier Than You Think	Dec 14, 2022	Action Detection, Activity Detection	Unverified
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR	Oct 7, 2021	Action Detection, Activity Detection	Unverified
Transferable Adversarial Attacks against ASR	Nov 14, 2024	Action Detection, Activity Detection	Unverified
Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition and Detection	Jul 10, 2016	Action Detection, Action Recognition	Unverified
TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval	Sep 21, 2020	Action Detection, Activity Detection	Unverified
Tri-axial Self-Attention for Concurrent Activity Recognition	Dec 6, 2018	Action Detection, Activity Detection	Unverified
TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge	Oct 26, 2022	Action Detection, Activity Detection	Unverified
Two-stream Multi-dimensional Convolutional Network for Real-time Violence Detection	Nov 8, 2022	Action Detection, Activity Detection	Unverified
Two Stream Network for Stroke Detection in Table Tennis	Dec 16, 2021	Action Detection, Vocal Bursts Valence Prediction	Unverified
Two-Stream Region Convolutional 3D Network for Temporal Activity Detection	Jun 5, 2019	Action Detection, Action Recognition	Unverified

Datasets: UCF101-24, J-HMDB, Charades, Multi-THUMOS, UCF Sports, THUMOS'14, MultiSports, TSU, TTStroke-21 ME21, TTStroke-21 ME22, MultiTHUMOS

Benchmark Results

UCF101-24

#	Model	Metric	Claimed	Verified	Status
1	STAR/L	Frame-mAP 0.5	90.3	—	Unverified
2	SiA	Frame-mAP 0.5	88.5	—	Unverified
3	YOWO + LFB	Frame-mAP 0.5	87.3	—	Unverified
4	HIT	Frame-mAP 0.5	84.8	—	Unverified
5	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	82.3	—	Unverified
6	YOWO	Frame-mAP 0.5	80.4	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.2	78.48	—	Unverified
8	MOC	Frame-mAP 0.5	77.8	—	Unverified
9	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	76.3	—	Unverified
10	Two-in-one	Video-mAP 0.2	75.48	—	Unverified

J-HMDB

#	Model	Metric	Claimed	Verified	Status
1	SiA	Frame-mAP 0.5	88.5	—	Unverified
2	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	87.59	—	Unverified
3	HIT	Frame-mAP 0.5	83.8	—	Unverified
4	HISAN (VGG-16)	Frame-mAP 0.5	76.72	—	Unverified
5	DTS	Video-mAP 0.2	76.1	—	Unverified
6	YOWO + LFB	Frame-mAP 0.5	75.7	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.5	74.74	—	Unverified
8	YOWO	Frame-mAP 0.5	74.4	—	Unverified
9	MOC	Frame-mAP 0.5	74	—	Unverified
10	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	73.3	—	Unverified

Charades

#	Model	Metric	Claimed	Verified	Status
1	TTM	mAP	28.79	—	Unverified
2	CTRN	mAP	27.8	—	Unverified
3	Coarse-Fine Networks (w/ self-supervised detection pretraining)	mAP	26.95	—	Unverified
4	UniMD+Sync. (RGB+Flow)	mAP	26.53	—	Unverified
5	PDAN (RGB+Flow)	mAP	26.5	—	Unverified
6	PAT	mAP	26.5	—	Unverified
7	MS-TCT (RGB only)	mAP	25.4	—	Unverified
8	3D ResNet-50 + super-events pretrained on AViD	mAP	25.2	—	Unverified
9	Coarse-Fine Networks	mAP	25.1	—	Unverified
10	MLAD (RGB + Flow)	mAP	23.7	—	Unverified

Multi-THUMOS

#	Model	Metric	Claimed	Verified	Status
1	MLAD	mAP	51.5	—	Unverified
2	CTRN	mAP	51.2	—	Unverified
3	PDAN	mAP	47.6	—	Unverified
4	TGM	mAP	46.4	—	Unverified
5	MS-TCT (RGB only)	mAP	43.1	—	Unverified
6	I3D + our super-event	mAP	36.4	—	Unverified
7	Two-stream + LSTM	mAP	28.1	—	Unverified
8	Two-stream	mAP	27.6	—	Unverified

UCF Sports

#	Model	Metric	Claimed	Verified	Status
1	Two-in-one Two Stream	Video-mAP 0.5	96.52	—	Unverified
2	DTS	Video-mAP 0.2	94.3	—	Unverified
3	Two-in-one	Video-mAP 0.5	92.74	—	Unverified
4	T-CNN	Frame-mAP 0.5	86.7	—	Unverified
5	MR-TS R-CNN	Frame-mAP 0.5	84.52	—	Unverified
6	TS R-CNN	Frame-mAP 0.5	82.3	—	Unverified
7	Action Tubes	Frame-mAP 0.5	68.1	—	Unverified

THUMOS'14

#	Model	Metric	Claimed	Verified	Status
1	MAT (Ours) Trans	mAP	71.6	—	Unverified
2	TadML-two stream	mAP	59.7	—	Unverified
3	MAT (ours)	mAP	58.2	—	Unverified
4	TadML-rgb	mAP	53.46	—	Unverified

MultiSports

#	Model	Metric	Claimed	Verified	Status
1	HIT	Frame-mAP 0.5	33.3	—	Unverified
2	SiA	Frame-mAP 0.5	28.8	—	Unverified

TSU

#	Model	Metric	Claimed	Verified	Status
1	MS-TCT	Frame-mAP	33.7	—	Unverified
2	PDAN	Frame-mAP	32.7	—	Unverified

TTStroke-21 ME21

#	Model	Metric	Claimed	Verified	Status
1	STCNN	IoU	0.14	—	Unverified
2	Two Stream Network	IoU	0.07	—	Unverified

TTStroke-21 ME22

#	Model	Metric	Claimed	Verified	Status
1	STCNN-V2 (Vote decision)	IoU	0.52	—	Unverified
2	RGB and PRGB	IoU	0.35	—	Unverified

MultiTHUMOS

#	Model	Metric	Claimed	Verified	Status
1	PAT	mAP	44.6	—	Unverified