Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
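To make the tubelet idea concrete, here is a minimal Python sketch of how a tubelet might be represented and compared against a ground-truth tubelet. The names `Tubelet`, `box_iou`, and `tubelet_iou` are hypothetical and do not come from any particular benchmark toolkit. The spatio-temporal overlap used here (temporal IoU multiplied by the mean box IoU over shared frames) is one common convention, assumed for illustration; individual benchmarks may define it differently.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Tubelet:
    """One action instance: a class label plus one box per frame index."""
    label: str
    boxes: Dict[int, Box]  # frame index -> bounding box

    @property
    def span(self) -> Tuple[int, int]:
        """First and last frame (inclusive): the temporal extent of the action."""
        return min(self.boxes), max(self.boxes)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tubelet_iou(pred: Tubelet, gt: Tubelet) -> float:
    """Assumed convention: temporal IoU times mean box IoU over shared frames."""
    shared = pred.boxes.keys() & gt.boxes.keys()
    if not shared:
        return 0.0
    all_frames = pred.boxes.keys() | gt.boxes.keys()
    temporal = len(shared) / len(all_frames)
    spatial = sum(box_iou(pred.boxes[f], gt.boxes[f]) for f in shared) / len(shared)
    return temporal * spatial


# Toy example: identical boxes, but the prediction starts two frames early.
pred = Tubelet("run", {f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 4)})
gt = Tubelet("run", {f: (0.0, 0.0, 10.0, 10.0) for f in range(2, 6)})
print(pred.span, gt.span, round(tubelet_iou(pred, gt), 3))
```

In this toy case the boxes match perfectly on the two shared frames, so the score is driven entirely by the temporal mismatch: 2 shared frames out of 6 in the union. A detection would count as correct only if this overlap clears the chosen threshold and the predicted label matches.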

Papers

Showing 101–150 of 817 papers

Title	Date	Tasks	Status	Hype	Score
PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points	Oct 20, 2022	Action Detection, Temporal Action Localization	Code Available	1	5
TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression	Apr 3, 2024	Action Detection, object-detection	Code Available	1	5
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation	Oct 24, 2022	Action Detection, Activity Detection	Code Available	1	5
From Recognition to Prediction: Analysis of Human Action and Trajectory Prediction in Video	Nov 20, 2020	Action Detection, Autonomous Driving	Code Available	1	5
A semi-supervised methodology for fishing activity detection using the geometry behind the trajectory of multiple vessels	Jul 12, 2022	Action Detection, Activity Detection	Code Available	1	5
ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development	Jul 17, 2023	Action Detection, Activity Detection	Code Available	1	5
HAKE: Human Activity Knowledge Engine	Apr 13, 2019	Action Detection, Human-Object Interaction Detection	Code Available	1	5
TubeR: Tubelet Transformer for Video Action Detection	Apr 2, 2021	Action Classification, Action Detection	Code Available	1	5
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation	Oct 21, 2021	Action Detection, Temporal Action Proposal Generation	Code Available	1	5
Learning spectro-temporal representations of complex sounds with parameterized neural networks	Mar 12, 2021	Action Detection, Activity Detection	Code Available	1	5
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos	May 2, 2020	Action Detection, Form	Code Available	1	5
Long Short-Term Transformer for Online Action Detection	Jul 7, 2021	Action Detection, Decoder	Code Available	1	5
Asynchronous Interaction Aggregation for Action Detection	Apr 16, 2020	Action Detection, Video Action Detection	Code Available	1	5
MMAD: Multi-label Micro-Action Detection in Videos	Jul 7, 2024	Action Analysis, Action Detection	Code Available	1	5
Actions as Moving Points	Jan 14, 2020	Action Detection, Action Recognition	Code Available	1	5
Memory-and-Anticipation Transformer for Online Action Understanding	Aug 15, 2023	Action Detection, Action Understanding	Code Available	1	5
MiniROAD: Minimal RNN Framework for Online Action Detection	Jan 1, 2023	Action Detection, Online Action Detection	Code Available	1	5
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector	Jun 7, 2022	Action Classification, Action Detection	Code Available	1	5
NAS-VAD: Neural Architecture Search for Voice Activity Detection	Jan 22, 2022	Action Detection, Activity Detection	Code Available	1	5
AV Taris: Online Audio-Visual Speech Recognition	Dec 14, 2020	Action Detection, Activity Detection	Code Available	1	5
Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization	Nov 12, 2022	Action Detection, Activity Detection	Code Available	1	5
Modeling Multi-Label Action Dependencies for Temporal Action Localization	Mar 4, 2021	Action Detection, Action Localization	Code Available	1	5
A Hybrid CNN-BiLSTM Voice Activity Detector	Mar 5, 2021	Action Detection, Activity Detection	Code Available	1	5
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection	Dec 7, 2021	Action Detection, Temporal Action Localization	Code Available	1	5
AViD Dataset: Anonymized Videos from Diverse Countries	Jul 10, 2020	Action Classification, Action Detection	Code Available	1	5
Multi-Modal Few-Shot Temporal Action Detection	Nov 27, 2022	Action Detection, Few-Shot Object Detection	Code Available	1	5
CBR-Net: Cascade Boundary Refinement Network for Action Detection: Submission to ActivityNet Challenge 2020 (Task 1)	Jun 13, 2020	Action Detection, Action Localization	Code Available	1	5
AVASpeech-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-Occurrence	Nov 2, 2021	Action Detection, Activity Detection	Code Available	1	5
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions	May 23, 2017	Action Detection	Code Available	1	5
Context-Enhanced Memory-Refined Transformer for Online Action Detection	Mar 24, 2025	Action Detection, Decoder	Code Available	1	5
ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos	Jul 17, 2024	Action Detection, Action Localization	Code Available	1	5
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation	Jun 8, 2018	Action Detection, Temporal Action Localization	Code Available	1	5
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection	May 5, 2022	Action Detection, object-detection	Code Available	1	5
A Multigrid Method for Efficiently Training Video Models	Dec 2, 2019	Action Detection, Action Recognition	Code Available	1	5
Context-Aware RCNN: A Baseline for Action Detection in Videos	Jul 20, 2020	Action Detection, Action Recognition	Code Available	1	5
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions	Mar 29, 2024	Action Detection, Benchmarking	Code Available	1	5
BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications	Oct 12, 2021	Action Detection, Activity Detection	Code Available	1	5
A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions	Apr 21, 2022	Action Detection, Video Understanding	Code Available	1	5
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding	May 14, 2024	Action Detection, GPU	Code Available	1	5
CholecTriplet2021: A benchmark challenge for surgical action triplet recognition	Apr 10, 2022	Action Detection, Action Triplet Recognition	Code Available	1	5
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers	Sep 3, 2023	Action Detection, Action Spotting	Code Available	1	5
Post-Processing Temporal Action Detection	Nov 27, 2022	Action Classification, Action Detection	Code Available	1	5
Continual Transformers: Redundancy-Free Attention for Online Inference	Jan 17, 2022	Action Detection, Audio Classification	Code Available	1	5
Semi-Supervised Temporal Action Detection with Proposal-Free Masking	Jul 14, 2022	Action Detection, General Classification	Code Available	1	5
Continuous control with deep reinforcement learning	Sep 9, 2015	Action Detection, continuous-control	Code Available	1	5
BMN: Boundary-Matching Network for Temporal Action Proposal Generation	Jul 23, 2019	Action Detection, Action Recognition	Code Available	1	5
Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment	Dec 1, 2024	Action Detection, Activity Detection	Code Available	0	5
Automatic detection and prediction of nAMD activity change in retinal OCT using Siamese networks and Wasserstein Distance for ordinality	Jan 24, 2025	Action Detection, Activity Detection	Code Available	0	5
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding	Nov 1, 2019	Action Detection, Action Recognition	Code Available	0	5
Modality Distillation with Multiple Stream Networks for Action Recognition	Jun 19, 2018	Action Classification, Action Detection	Code Available	0	5

Datasets: UCF101-24, J-HMDB, Charades, Multi-THUMOS, UCF Sports, THUMOS'14, MultiSports, TSU, TTStroke-21 ME21, TTStroke-21 ME22, MultiTHUMOS

Benchmark Results

UCF101-24

#	Model	Metric	Claimed	Verified	Status
1	STAR/L	Frame-mAP 0.5	90.3	—	Unverified
2	SiA	Frame-mAP 0.5	88.5	—	Unverified
3	YOWO + LFB	Frame-mAP 0.5	87.3	—	Unverified
4	HIT	Frame-mAP 0.5	84.8	—	Unverified
5	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	82.3	—	Unverified
6	YOWO	Frame-mAP 0.5	80.4	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.2	78.48	—	Unverified
8	MOC	Frame-mAP 0.5	77.8	—	Unverified
9	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	76.3	—	Unverified
10	Two-in-one	Video-mAP 0.2	75.48	—	Unverified

J-HMDB

#	Model	Metric	Claimed	Verified	Status
1	SiA	Frame-mAP 0.5	88.5	—	Unverified
2	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	87.59	—	Unverified
3	HIT	Frame-mAP 0.5	83.8	—	Unverified
4	HISAN (VGG-16)	Frame-mAP 0.5	76.72	—	Unverified
5	DTS	Video-mAP 0.2	76.1	—	Unverified
6	YOWO + LFB	Frame-mAP 0.5	75.7	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.5	74.74	—	Unverified
8	YOWO	Frame-mAP 0.5	74.4	—	Unverified
9	MOC	Frame-mAP 0.5	74	—	Unverified
10	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	73.3	—	Unverified

Charades

#	Model	Metric	Claimed	Verified	Status
1	TTM	mAP	28.79	—	Unverified
2	CTRN	mAP	27.8	—	Unverified
3	Coarse-Fine Networks (w/ self-supervised detection pretraining)	mAP	26.95	—	Unverified
4	UniMD+Sync. (RGB+Flow)	mAP	26.53	—	Unverified
5	PDAN (RGB+Flow)	mAP	26.5	—	Unverified
6	PAT	mAP	26.5	—	Unverified
7	MS-TCT (RGB only)	mAP	25.4	—	Unverified
8	3D ResNet-50 + super-events pretrained on AViD	mAP	25.2	—	Unverified
9	Coarse-Fine Networks	mAP	25.1	—	Unverified
10	MLAD (RGB + Flow)	mAP	23.7	—	Unverified

Multi-THUMOS

#	Model	Metric	Claimed	Verified	Status
1	MLAD	mAP	51.5	—	Unverified
2	CTRN	mAP	51.2	—	Unverified
3	PDAN	mAP	47.6	—	Unverified
4	TGM	mAP	46.4	—	Unverified
5	MS-TCT (RGB only)	mAP	43.1	—	Unverified
6	I3D + our super-event	mAP	36.4	—	Unverified
7	Two-stream + LSTM	mAP	28.1	—	Unverified
8	Two-stream	mAP	27.6	—	Unverified

UCF Sports

#	Model	Metric	Claimed	Verified	Status
1	Two-in-one Two Stream	Video-mAP 0.5	96.52	—	Unverified
2	DTS	Video-mAP 0.2	94.3	—	Unverified
3	Two-in-one	Video-mAP 0.5	92.74	—	Unverified
4	T-CNN	Frame-mAP 0.5	86.7	—	Unverified
5	MR-TS R-CNN	Frame-mAP 0.5	84.52	—	Unverified
6	TS R-CNN	Frame-mAP 0.5	82.3	—	Unverified
7	Action Tubes	Frame-mAP 0.5	68.1	—	Unverified

THUMOS'14

#	Model	Metric	Claimed	Verified	Status
1	MAT (Ours) Trans	mAP	71.6	—	Unverified
2	TadML-two stream	mAP	59.7	—	Unverified
3	MAT (ours)	mAP	58.2	—	Unverified
4	TadML-rgb	mAP	53.46	—	Unverified

MultiSports

#	Model	Metric	Claimed	Verified	Status
1	HIT	Frame-mAP 0.5	33.3	—	Unverified
2	SiA	Frame-mAP 0.5	28.8	—	Unverified

TSU

#	Model	Metric	Claimed	Verified	Status
1	MS-TCT	Frame-mAP	33.7	—	Unverified
2	PDAN	Frame-mAP	32.7	—	Unverified

TTStroke-21 ME21

#	Model	Metric	Claimed	Verified	Status
1	STCNN	IoU	0.14	—	Unverified
2	Two Stream Network	IoU	0.07	—	Unverified

TTStroke-21 ME22

#	Model	Metric	Claimed	Verified	Status
1	STCNN-V2 (Vote decision)	IoU	0.52	—	Unverified
2	RGB and PRGB	IoU	0.35	—	Unverified

MultiTHUMOS

#	Model	Metric	Claimed	Verified	Status
1	PAT	mAP	44.6	—	Unverified