Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets: action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify the action and typically assumes a trimmed video.
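
To make the tubelet idea concrete, here is a minimal Python sketch. It assumes a tubelet is stored as a mapping from frame index to one bounding box, and it scores the overlap of two tubelets as temporal IoU multiplied by the mean spatial IoU over the shared frames. That product is one common convention, not the only one, and the names `Tubelet`, `box_iou`, and `tubelet_iou` are hypothetical, chosen for illustration.

```python
from typing import Dict, Tuple

# Assumed representation: a box is (x1, y1, x2, y2) and a tubelet maps
# frame index -> box. These type names are illustrative only.
Box = Tuple[float, float, float, float]
Tubelet = Dict[int, Box]


def box_iou(a: Box, b: Box) -> float:
    """Spatial intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tubelet_iou(pred: Tubelet, gt: Tubelet) -> float:
    """Temporal IoU times mean spatial IoU over the frames both tubelets cover."""
    shared = pred.keys() & gt.keys()
    if not shared:
        return 0.0
    temporal_iou = len(shared) / len(pred.keys() | gt.keys())
    spatial_iou = sum(box_iou(pred[f], gt[f]) for f in shared) / len(shared)
    return temporal_iou * spatial_iou


# Ground truth spans frames 0-3, the prediction spans frames 2-5,
# and the boxes coincide wherever both exist.
gt = {f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 4)}
pred = {f: (0.0, 0.0, 10.0, 10.0) for f in range(2, 6)}
score = tubelet_iou(pred, gt)  # 2 shared frames out of 6 covered in total
```

In this example the boxes match perfectly in space, so the score is driven entirely by the temporal overlap. This is why a detector must localize an action in time as well as in space to score well.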

Papers

Showing 151–200 of 817 papers

Title	Date	Tasks	Status	Score
Scaling Open-Vocabulary Action Detection	Apr 4, 2025	Action Detection, Multiple Action Detection	Code Available	5
rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method	Jun 9, 2019	Action Detection, Activity Detection	Code Available	5
S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Networks	Jul 21, 2018	Action Detection, Activity Detection	Code Available	5
Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints	Jun 2, 2017	Action Detection, Action Segmentation	Code Available	5
SADA: Semantic adversarial unsupervised domain adaptation for Temporal Action Localization	Dec 20, 2023	Action Detection, Action Localization	Code Available	5
Rethinking Online Action Detection in Untrimmed Videos: A Novel Online Evaluation Protocol	Mar 26, 2020	Action Detection, Online Action Detection	Code Available	5
A Framework for Adapting Human-Robot Interaction to Diverse User Groups	Oct 15, 2024	Action Detection, Activity Detection	Code Available	5
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification	Dec 13, 2017	Action Classification, Action Detection	Code Available	5
RespVAD: Voice Activity Detection via Video-Extracted Respiration Patterns	Aug 21, 2020	Action Detection, Activity Detection	Code Available	5
Review of Action Recognition and Detection Methods	Oct 21, 2016	Action Detection, Action Recognition	Code Available	5
A flexible model for training action localization with varying levels of supervision	Jun 29, 2018	Action Detection, Action Localization	Code Available	5
Refining Action Boundaries for One-stage Detection	Oct 25, 2022	Action Detection	Code Available	5
Skeleton-OOD: An End-to-End Skeleton-Based Model for Robust Out-of-Distribution Human Action Detection	May 31, 2024	Action Detection, Action Recognition	Code Available	5
ShuttleSet: A Human-Annotated Stroke-Level Singles Dataset for Badminton Tactical Analysis	Jun 8, 2023	Action Detection, Sports Analytics	Code Available	5
Pyramid Region-based Slot Attention Network for Temporal Action Proposal Generation	Jun 21, 2022	Action Detection, Temporal Action Proposal Generation	Code Available	5
RALACs: Action Recognition in Autonomous Vehicles using Interaction Encoding and Optical Flow	Sep 28, 2022	Action Classification, Action Detection	Code Available	5
Protest Activity Detection and Perceived Violence Estimation from Social Media Images	Sep 18, 2017	Action Detection, Activity Detection	Code Available	5
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection	Mar 22, 2017	Action Detection, Action Recognition In Videos	Code Available	5
Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay	Jul 4, 2022	Action Detection, Activity Detection	Code Available	5
ACDnet: An action detection network for real-time edge computing based on flow-guided feature approximation and memory aggregation	Feb 26, 2021	Action Detection, Edge-computing	Code Available	5
A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning	Jun 22, 2017	Action Detection, Position	Code Available	5
Pre-Equalization Aided Grant-Free Massive Access in Massive MIMO System	Feb 10, 2025	Action Detection, Activity Detection	Code Available	5
Progression-Guided Temporal Action Detection in Videos	Aug 18, 2023	Action Classification, Action Detection	Code Available	5
Personal VAD: Speaker-Conditioned Voice Activity Detection	Aug 12, 2019	Action Detection, Activity Detection	Code Available	5
Personalized Activity Recognition with Deep Triplet Embeddings	Jan 15, 2020	Action Detection, Activity Detection	Code Available	5
PLSM: A Parallelized Liquid State Machine for Unintentional Action Detection	May 6, 2021	Action Detection, GPU	Code Available	5
Multi-Stage Speaker Diarization for Noisy Classrooms	May 16, 2025	Action Detection, Activity Detection	Code Available	5
Optimizing Large Language Models for ESG Activity Detection in Financial Texts	Feb 28, 2025	Action Detection, Activity Detection	Code Available	5
A Pursuit of Temporal Accuracy in General Activity Detection	Mar 8, 2017	Action Detection, Activity Detection	Code Available	5
On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes	Oct 25, 2024	Action Detection, Data Augmentation	Code Available	5
One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features	Apr 30, 2024	Action Detection, Open-vocab Temporal Action Detection	Code Available	5
Contextual Explainable Video Representation: Human Perception-based Understanding	Dec 12, 2022	Action Detection, Action Recognition	Code Available	5
Online Human Action Detection using Joint Classification-Regression Recurrent Neural Networks	Apr 19, 2016	Action Detection, Action Recognition	Code Available	5
Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN	Oct 10, 2017	Action Detection, Action Recognition	Code Available	5
Simple yet efficient real-time pose-based action recognition	Apr 19, 2019	Action Detection, Action Recognition	Code Available	5
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding	Nov 1, 2019	Action Detection, Action Recognition	Code Available	5
Modality Distillation with Multiple Stream Networks for Action Recognition	Jun 19, 2018	Action Classification, Action Detection	Code Available	5
MaCLR: Motion-aware Contrastive Learning of Representations for Videos	Jun 17, 2021	Action Detection, Action Recognition	Code Available	5
Actor-identified Spatiotemporal Action Detection --- Detecting Who Is Doing What in Videos	Aug 27, 2022	Action Classification, Action Detection	Code Available	5
SoccerDB: A Large-Scale Database for Comprehensive Video Understanding	Dec 10, 2019	Action Classification, Action Detection	Code Available	5
MARINE: A Computer Vision Model for Detecting Rare Predator-Prey Interactions in Animal Videos	Jul 25, 2024	Action Detection, Action Recognition	Code Available	5
MINOTAUR: Multi-task Video Grounding From Multimodal Queries	Feb 16, 2023	Action Detection, Sentence	Code Available	5
Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection	Nov 21, 2018	Action Detection, Fine-Grained Action Detection	Code Available	5
Actor Conditioned Attention Maps for Video Action Detection	Dec 30, 2018	Action Detection, Video Action Detection	Code Available	5
Long-term Conversation Analysis: Exploring Utility and Privacy	Jun 28, 2023	Action Detection, Activity Detection	Code Available	5
Learning Latent Super-Events to Detect Multiple Activities in Videos	Dec 5, 2017	Action Detection, Activity Detection	Code Available	5
Coarse-Fine Networks for Temporal Activity Detection in Videos	Mar 1, 2021	Action Detection, Activity Detection	Code Available	5
Online Spatiotemporal Action Detection and Prediction via Causal Representations	Aug 31, 2020	Action Detection, Action Recognition	Code Available	5
Learning Spatio-Temporal Representation with Local and Global Diffusion	Jun 13, 2019	Action Classification, Action Detection	Code Available	5
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts	Dec 18, 2024	Action Detection, Descriptive	Code Available	5


Datasets: UCF101-24, J-HMDB, Charades, Multi-THUMOS, UCF Sports, THUMOS'14, MultiSports, TSU, TTStroke-21 ME21, TTStroke-21 ME22, MultiTHUMOS. The benchmark tables under Benchmark Results appear in this order, one per dataset.

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	STAR/L	Frame-mAP 0.5	90.3	—	Unverified
2	SiA	Frame-mAP 0.5	88.5	—	Unverified
3	YOWO + LFB	Frame-mAP 0.5	87.3	—	Unverified
4	HIT	Frame-mAP 0.5	84.8	—	Unverified
5	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	82.3	—	Unverified
6	YOWO	Frame-mAP 0.5	80.4	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.2	78.48	—	Unverified
8	MOC	Frame-mAP 0.5	77.8	—	Unverified
9	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	76.3	—	Unverified
10	Two-in-one	Video-mAP 0.2	75.48	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SiA	Frame-mAP 0.5	88.5	—	Unverified
2	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	87.59	—	Unverified
3	HIT	Frame-mAP 0.5	83.8	—	Unverified
4	HISAN (VGG-16)	Frame-mAP 0.5	76.72	—	Unverified
5	DTS	Video-mAP 0.2	76.1	—	Unverified
6	YOWO + LFB	Frame-mAP 0.5	75.7	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.5	74.74	—	Unverified
8	YOWO	Frame-mAP 0.5	74.4	—	Unverified
9	MOC	Frame-mAP 0.5	74	—	Unverified
10	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	73.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TTM	mAP	28.79	—	Unverified
2	CTRN	mAP	27.8	—	Unverified
3	Coarse-Fine Networks (w/ self-supervised detection pretraining)	mAP	26.95	—	Unverified
4	UniMD+Sync. (RGB+Flow)	mAP	26.53	—	Unverified
5	PDAN (RGB+Flow)	mAP	26.5	—	Unverified
6	PAT	mAP	26.5	—	Unverified
7	MS-TCT (RGB only)	mAP	25.4	—	Unverified
8	3D ResNet-50 + super-events pretrained on AViD	mAP	25.2	—	Unverified
9	Coarse-Fine Networks	mAP	25.1	—	Unverified
10	MLAD (RGB + Flow)	mAP	23.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MLAD	mAP	51.5	—	Unverified
2	CTRN	mAP	51.2	—	Unverified
3	PDAN	mAP	47.6	—	Unverified
4	TGM	mAP	46.4	—	Unverified
5	MS-TCT (RGB only)	mAP	43.1	—	Unverified
6	I3D + our super-event	mAP	36.4	—	Unverified
7	Two-stream + LSTM	mAP	28.1	—	Unverified
8	Two-stream	mAP	27.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Two-in-one Two Stream	Video-mAP 0.5	96.52	—	Unverified
2	DTS	Video-mAP 0.2	94.3	—	Unverified
3	Two-in-one	Video-mAP 0.5	92.74	—	Unverified
4	T-CNN	Frame-mAP 0.5	86.7	—	Unverified
5	MR-TS R-CNN	Frame-mAP 0.5	84.52	—	Unverified
6	TS R-CNN	Frame-mAP 0.5	82.3	—	Unverified
7	Action Tubes	Frame-mAP 0.5	68.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MAT (Ours) Trans	mAP	71.6	—	Unverified
2	TadML-two stream	mAP	59.7	—	Unverified
3	MAT (ours)	mAP	58.2	—	Unverified
4	TadML-rgb	mAP	53.46	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HIT	Frame-mAP 0.5	33.3	—	Unverified
2	SiA	Frame-mAP 0.5	28.8	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MS-TCT	Frame-mAP	33.7	—	Unverified
2	PDAN	Frame-mAP	32.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	STCNN	IoU	0.14	—	Unverified
2	Two Stream Network	IoU	0.07	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	STCNN-V2 (Vote decision)	IoU	0.52	—	Unverified
2	RGB and PRGB	IoU	0.35	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PAT	mAP	44.6	—	Unverified
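
Several tables above report "Frame-mAP 0.5", that is, mean average precision over per-frame detections, where a detection counts as correct when its box overlaps a ground-truth box with IoU of at least 0.5. The Python sketch below shows one simplified way to compute average precision for a single class under that rule. The function names, the input layout, and the greedy matching are assumptions for illustration, and individual benchmarks may differ in details such as how duplicate detections are handled. Mean AP is then the average of this value over all action classes.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Spatial intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    dets: List[Tuple[int, float, Box]],  # (frame index, confidence, box)
    gts: List[Tuple[int, Box]],          # (frame index, box), one class only
    iou_thr: float = 0.5,
) -> float:
    """Simplified single-class AP: greedy matching, area under the PR curve."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_flags = []
    # Visit detections from most to least confident.
    for frame, _score, box in sorted(dets, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, (g_frame, g_box) in enumerate(gts):
            if g_frame != frame or matched[j]:
                continue
            iou = box_iou(box, g_box)
            if iou > best_iou:
                best_iou, best_j = iou, j
        is_tp = best_j >= 0 and best_iou >= iou_thr
        if is_tp:
            matched[best_j] = True  # each ground-truth box is used at most once
        tp_flags.append(is_tp)

    # Running precision and recall after each detection.
    precisions, recalls, tp = [], [], 0
    for i, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / i)
        recalls.append(tp / len(gts))
    # Make precision non-increasing from right to left (PR envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision times each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


gts = [(0, (0.0, 0.0, 10.0, 10.0)), (1, (0.0, 0.0, 10.0, 10.0))]
dets = [
    (0, 0.9, (0.0, 0.0, 10.0, 10.0)),    # exact match on frame 0
    (0, 0.8, (50.0, 50.0, 60.0, 60.0)),  # no overlap: false positive
    (1, 0.7, (1.0, 0.0, 11.0, 10.0)),    # IoU of about 0.82 on frame 1
]
ap = average_precision(dets, gts)
```

The "Video-mAP" rows follow the same pattern, but match whole tubes rather than single-frame boxes, using a spatio-temporal overlap at the stated threshold (0.2 or 0.5).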