Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify the action taking place. Results are typically given as action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
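To make the tubelet and temporal-localization ideas concrete, here is a minimal Python sketch. It assumes a tubelet is stored as a dict mapping frame index to an (x1, y1, x2, y2) box over contiguous frames. It computes temporal IoU between (start, end) frame segments and a spatio-temporal tube IoU, defined as temporal IoU multiplied by the mean per-frame box IoU on shared frames. That definition is commonly used when matching tubelets for video-level mAP. All names (`Tubelet`, `box_iou`, `temporal_iou`, `tube_iou`) are hypothetical and not taken from any benchmark toolkit. Official evaluation scripts may differ in details such as how gaps in a tube are handled.

```python
from typing import Dict, Tuple

# (x1, y1, x2, y2) corners; assumption: x2 > x1 and y2 > y1
Box = Tuple[float, float, float, float]
# Hypothetical tubelet layout: frame index -> box, frames assumed contiguous
Tubelet = Dict[int, Box]


def box_iou(a: Box, b: Box) -> float:
    """Spatial IoU of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def temporal_iou(seg_a: Tuple[int, int], seg_b: Tuple[int, int]) -> float:
    """IoU of two inclusive (start_frame, end_frame) segments."""
    inter = max(0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]) + 1)
    union = (seg_a[1] - seg_a[0] + 1) + (seg_b[1] - seg_b[0] + 1) - inter
    return inter / union if union > 0 else 0.0


def tube_iou(a: Tubelet, b: Tubelet) -> float:
    """Spatio-temporal IoU: temporal IoU of the two frame spans multiplied by
    the mean per-frame box IoU over the frames both tubelets cover."""
    if not a or not b:
        return 0.0
    t_iou = temporal_iou((min(a), max(a)), (min(b), max(b)))
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    s_iou = sum(box_iou(a[f], b[f]) for f in shared) / len(shared)
    return t_iou * s_iou


# Usage: identical boxes, but the prediction starts 5 frames late
gt = {f: (10.0, 10.0, 50.0, 50.0) for f in range(0, 10)}    # frames 0-9
pred = {f: (10.0, 10.0, 50.0, 50.0) for f in range(5, 15)}  # frames 5-14
print(round(tube_iou(gt, pred), 3))  # 0.333
```

Here the boxes agree perfectly on the 5 shared frames, so the score is limited entirely by the temporal overlap (5 of 15 frames).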

Papers

Showing 451–500 of 817 papers

Title	Date	Tasks	Status
Cross-modal Supervision for Learning Active Speaker Detection in Video	Mar 29, 2016	Action Detection, Active Speaker Detection	Unverified
CTRN: Class-Temporal Relational Network for Action Detection	Oct 26, 2021	Action Detection	Unverified
Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model	Jan 29, 2024	Action Detection, Action Localization	Unverified
Data-aided Active User Detection with a User Activity Extraction Network for Grant-free SCMA Systems	May 22, 2022	Action Detection, Activity Detection	Unverified
Dataset for Real-World Human Action Detection Using FMCW mmWave Radar	Dec 23, 2024	Action Detection, Privacy Preserving	Unverified
Dealing with training and test segmentation mismatch: FBK@IWSLT2021	Jun 23, 2021	Action Detection, Activity Detection	Unverified
Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection	Mar 30, 2023	Action Detection, Action Localization	Unverified
Deconstruct Complexity (DeComplex): A Novel Perspective on Tackling Dense Action Detection	Jan 30, 2025	Action Detection, Contrastive Learning	Unverified
Deep Learning-Assisted Parallel Interference Cancellation for Grant-Free NOMA in Machine-Type Communication	Mar 12, 2024	Action Detection, Activity Detection	Unverified
Deep Learning-based Action Detection in Untrimmed Videos: A Survey	Sep 30, 2021	Action Detection, Action Recognition	Unverified
Deep learning-based approaches for human motion decoding in smart walkers for rehabilitation	Jan 13, 2023	Action Detection, Action Recognition	Unverified
Spatial-Temporal Alignment Network for Action Recognition and Detection	Dec 4, 2020	Action Detection, Action Recognition	Unverified
Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation	Jul 31, 2017	Action Detection, Region Proposal	Unverified
Spatio-Temporal Action Detection with Multi-Object Interaction	Apr 1, 2020	Action Detection, Human Detection	Unverified
Spatio-Temporal Action Localization in a Weakly Supervised Setting	May 6, 2019	Action Detection, Action Localization	Unverified
Spatio-temporal Action Recognition: A Survey	Jan 27, 2019	Action Detection, Action Localization	Unverified
Spatio-Temporal Context for Action Detection	Jun 29, 2021	Action Detection, Video Understanding	Unverified
Spatio-Temporal Context Prompting for Zero-Shot Action Detection	Aug 28, 2024	Action Detection, Zero-Shot Action Detection	Unverified
Spatiotemporal Deformable Scene Graphs for Complex Activity Detection	Apr 16, 2021	Action Detection, Activity Detection	Unverified
Spatiotemporal Deformable Part Models for Action Detection	Jun 1, 2013	Action Detection, object-detection	Unverified
Spatiotemporal Event Graphs for Dynamic Scene Understanding	Dec 11, 2023	Action Detection, Activity Detection	Unverified
Speaker and Posture Classification using Instantaneous Intraspeech Breathing Features	May 25, 2020	Action Detection, Activity Detection	Unverified
Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios	Mar 18, 2022	Action Detection, Activity Detection	Unverified
Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization	May 15, 2024	Action Detection, Activity Detection	Unverified
Speaker Independent Continuous Speech to Text Converter for Mobile Application	Jul 19, 2013	Action Detection, Activity Detection	Unverified
Speech enhancement aided end-to-end multi-task learning for voice activity detection	Oct 23, 2020	Action Detection, Activity Detection	Unverified
Spiking neural networks trained with backpropagation for low power neuromorphic implementation of voice activity detection	Oct 22, 2019	Action Detection, Activity Detection	Unverified
SPIRE-SIES: A Spontaneous Indian English Speech Corpus	Dec 1, 2023	Action Detection, Activity Detection	Unverified
SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal Action Detection	Jun 29, 2021	Action Detection	Unverified
SRG: Snippet Relatedness-based Temporal Action Proposal Generator	Nov 26, 2019	Action Detection, Temporal Action Proposal Generation	Unverified
Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments	Jul 28, 2020	Action Detection, Activity Detection	Unverified
Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector	Jul 9, 2018	Action Detection, Temporal Localization	Unverified
STMixer: A One-Stage Sparse Action Detector	Apr 15, 2024	Action Detection	Unverified
Supporting More Active Users for Massive Access via Data-assisted Activity Detection	Feb 17, 2021	Action Detection, Activity Detection	Unverified
sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks	Mar 9, 2024	Action Detection, Activity Detection	Unverified
SVVAD: Personal Voice Activity Detection for Speaker Verification	May 31, 2023	Action Detection, Activity Detection	Unverified
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks	Dec 14, 2022	Action Detection, Activity Detection	Unverified
TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection	May 31, 2019	Action Detection	Unverified
TAEN: Temporal Aware Embedding Network for Few-Shot Action Recognition	Apr 21, 2020	3D Face Reconstruction, Action Detection	Unverified
Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription	Jul 8, 2022	Action Detection, Activity Detection	Unverified
Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario	May 14, 2020	Action Detection, Activity Detection	Unverified
Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction	Oct 28, 2022	Action Detection, Activity Detection	Unverified
Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker	Aug 7, 2021	Action Detection, Activity Detection	Unverified
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization	Aug 27, 2022	Action Detection, Activity Detection	Unverified
TCG CREST System Description for the Second DISPLACE Challenge	Sep 16, 2024	Action Detection, Activity Detection	Unverified
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks	Sep 27, 2024	Action Detection, Action Segmentation	Unverified
Temporal Action Detection by Joint Identification-Verification	Oct 19, 2018	Action Detection	Unverified
Temporal Action Detection Model Compression by Progressive Block Drop	Mar 21, 2025	Action Detection, Autonomous Driving	Unverified
Temporal Action Detection with Multi-level Supervision	Nov 24, 2020	Action Detection, Semi-Supervised Action Detection	Unverified
Temporal Action Localization by Structured Maximal Sums	Apr 15, 2017	Action Detection, Action Localization	Unverified

Benchmark Results

Results are reported on the following datasets: UCF101-24, J-HMDB, Charades, Multi-THUMOS, UCF Sports, THUMOS'14, MultiSports, TSU, TTStroke-21 ME21, TTStroke-21 ME22, and MultiTHUMOS.

UCF101-24
#	Model	Metric	Claimed	Verified	Status
1	STAR/L	Frame-mAP 0.5	90.3	—	Unverified
2	SiA	Frame-mAP 0.5	88.5	—	Unverified
3	YOWO + LFB	Frame-mAP 0.5	87.3	—	Unverified
4	HIT	Frame-mAP 0.5	84.8	—	Unverified
5	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	82.3	—	Unverified
6	YOWO	Frame-mAP 0.5	80.4	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.2	78.48	—	Unverified
8	MOC	Frame-mAP 0.5	77.8	—	Unverified
9	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	76.3	—	Unverified
10	Two-in-one	Video-mAP 0.2	75.48	—	Unverified

J-HMDB
#	Model	Metric	Claimed	Verified	Status
1	SiA	Frame-mAP 0.5	88.5	—	Unverified
2	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	87.59	—	Unverified
3	HIT	Frame-mAP 0.5	83.8	—	Unverified
4	HISAN (VGG-16)	Frame-mAP 0.5	76.72	—	Unverified
5	DTS	Video-mAP 0.2	76.1	—	Unverified
6	YOWO + LFB	Frame-mAP 0.5	75.7	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.5	74.74	—	Unverified
8	YOWO	Frame-mAP 0.5	74.4	—	Unverified
9	MOC	Frame-mAP 0.5	74	—	Unverified
10	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	73.3	—	Unverified

Charades
#	Model	Metric	Claimed	Verified	Status
1	TTM	mAP	28.79	—	Unverified
2	CTRN	mAP	27.8	—	Unverified
3	Coarse-Fine Networks (w/ self-supervised detection pretraining)	mAP	26.95	—	Unverified
4	UniMD+Sync. (RGB+Flow)	mAP	26.53	—	Unverified
5	PDAN (RGB+Flow)	mAP	26.5	—	Unverified
6	PAT	mAP	26.5	—	Unverified
7	MS-TCT (RGB only)	mAP	25.4	—	Unverified
8	3D ResNet-50 + super-events pretrained on AViD	mAP	25.2	—	Unverified
9	Coarse-Fine Networks	mAP	25.1	—	Unverified
10	MLAD (RGB + Flow)	mAP	23.7	—	Unverified

Multi-THUMOS
#	Model	Metric	Claimed	Verified	Status
1	MLAD	mAP	51.5	—	Unverified
2	CTRN	mAP	51.2	—	Unverified
3	PDAN	mAP	47.6	—	Unverified
4	TGM	mAP	46.4	—	Unverified
5	MS-TCT (RGB only)	mAP	43.1	—	Unverified
6	I3D + super-events	mAP	36.4	—	Unverified
7	Two-stream + LSTM	mAP	28.1	—	Unverified
8	Two-stream	mAP	27.6	—	Unverified

UCF Sports
#	Model	Metric	Claimed	Verified	Status
1	Two-in-one Two Stream	Video-mAP 0.5	96.52	—	Unverified
2	DTS	Video-mAP 0.2	94.3	—	Unverified
3	Two-in-one	Video-mAP 0.5	92.74	—	Unverified
4	T-CNN	Frame-mAP 0.5	86.7	—	Unverified
5	MR-TS R-CNN	Frame-mAP 0.5	84.52	—	Unverified
6	TS R-CNN	Frame-mAP 0.5	82.3	—	Unverified
7	Action Tubes	Frame-mAP 0.5	68.1	—	Unverified

THUMOS'14
#	Model	Metric	Claimed	Verified	Status
1	MAT Trans	mAP	71.6	—	Unverified
2	TadML-two stream	mAP	59.7	—	Unverified
3	MAT	mAP	58.2	—	Unverified
4	TadML-rgb	mAP	53.46	—	Unverified

MultiSports
#	Model	Metric	Claimed	Verified	Status
1	HIT	Frame-mAP 0.5	33.3	—	Unverified
2	SiA	Frame-mAP 0.5	28.8	—	Unverified

TSU
#	Model	Metric	Claimed	Verified	Status
1	MS-TCT	Frame-mAP	33.7	—	Unverified
2	PDAN	Frame-mAP	32.7	—	Unverified

TTStroke-21 ME21
#	Model	Metric	Claimed	Verified	Status
1	STCNN	IoU	0.14	—	Unverified
2	Two Stream Network	IoU	0.07	—	Unverified

TTStroke-21 ME22
#	Model	Metric	Claimed	Verified	Status
1	STCNN-V2 (Vote decision)	IoU	0.52	—	Unverified
2	RGB and PRGB	IoU	0.35	—	Unverified

MultiTHUMOS
#	Model	Metric	Claimed	Verified	Status
1	PAT	mAP	44.6	—	Unverified
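Many of the leaderboards above rank models by Frame-mAP at an IoU threshold of 0.5, while others use Video-mAP, plain mAP, or IoU. The following minimal Python sketch shows how a per-frame mAP of this kind is typically computed. It assumes detections are given per class as (frame_id, score, box) tuples and ground truth as per-frame lists of boxes. The matching rule (greedy in descending score order, with duplicate hits counted as false positives) and the all-point interpolated AP are assumptions in the style of PASCAL VOC. Each benchmark's official evaluation script may differ, so outputs of this sketch are illustrative and not comparable to the claimed results. All function and type names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[int, float, Box]       # (frame_id, score, box)
FrameGT = Dict[int, List[Box]]           # frame_id -> ground-truth boxes


def box_iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Detection], gt: FrameGT, iou_thr: float = 0.5) -> float:
    """AP for one class: greedy matching in descending score order and
    all-point interpolated precision (VOC-style; an assumption here)."""
    n_gt = sum(len(boxes) for boxes in gt.values())
    if n_gt == 0:
        return 0.0
    matched = {f: [False] * len(boxes) for f, boxes in gt.items()}
    tp = fp = 0
    recalls, precisions = [], []
    for frame, _score, box in sorted(dets, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gt.get(frame, [])):
            iou = box_iou(box, g)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr and not matched[frame][best_j]:
            matched[frame][best_j] = True  # true positive: first hit on this GT box
            tp += 1
        else:
            fp += 1  # overlap below threshold, or a duplicate of a matched box
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing in recall, then sum the area under the curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def frame_map(dets: Dict[str, List[Detection]], gt: Dict[str, FrameGT],
              iou_thr: float = 0.5) -> float:
    """Mean of per-class AP over the classes that have ground truth."""
    aps = [average_precision(dets.get(c, []), gt[c], iou_thr) for c in gt]
    return sum(aps) / len(aps) if aps else 0.0


# Usage: one class, two annotated frames, one false positive ranked in between
gt = {"jump": {0: [(0, 0, 10, 10)], 1: [(0, 0, 10, 10)]}}
dets = {"jump": [(0, 0.9, (0, 0, 10, 10)),
                 (1, 0.8, (20, 20, 30, 30)),
                 (1, 0.7, (0, 0, 10, 10))]}
print(round(frame_map(dets, gt), 4))  # 0.8333
```

Video-mAP follows the same AP recipe but matches whole tubelets using a spatio-temporal overlap instead of per-frame box IoU, which is why the two metrics in the same table are not directly comparable.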