Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify the action and typically assumes a trimmed video.
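To make the tubelet idea concrete, here is a minimal Python sketch of one way to represent it. The names `Tubelet`, `box_iou`, and `link_boxes`, the `(x1, y1, x2, y2)` box layout, and the 0.5 overlap threshold are all illustrative assumptions, not taken from any particular paper or library. The sketch only shows how a single structure can answer what (label), where (boxes), and when (first and last frame).

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Tubelet:
    """Hypothetical container: one action instance as boxes linked over frames."""
    label: str                                            # what
    boxes: Dict[int, Box] = field(default_factory=dict)   # where, keyed by frame index

    def temporal_extent(self) -> Tuple[int, int]:
        """When: first and last frame index covered by the tubelet."""
        frames = sorted(self.boxes)
        return frames[0], frames[-1]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_boxes(label: str, detections: Dict[int, Box], min_iou: float = 0.5) -> Tubelet:
    """Greedily link per-frame boxes in frame order.

    Linking stops at the first box that overlaps the previous one by less
    than min_iou (an assumed threshold, chosen for illustration only).
    """
    tube = Tubelet(label)
    prev = None
    for frame in sorted(detections):
        box = detections[frame]
        if prev is not None and box_iou(prev, box) < min_iou:
            break
        tube.boxes[frame] = box
        prev = box
    return tube


if __name__ == "__main__":
    # Made-up detections: three overlapping boxes, then an unrelated one.
    detections = {
        10: (0.0, 0.0, 10.0, 10.0),
        11: (1.0, 0.0, 11.0, 10.0),
        12: (2.0, 0.0, 12.0, 10.0),
        13: (50.0, 50.0, 60.0, 60.0),
    }
    tube = link_boxes("wave", detections)
    print(tube.label, tube.temporal_extent(), len(tube.boxes))
```

Under this reading, temporal localization would keep only the output of `temporal_extent()`, and action recognition would keep only `label`. Published systems link boxes with far more elaborate strategies than this greedy pass.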

Papers


Showing 1–50 of 817 papers

Title	Date	Tasks	Status	Hype
Moshi: a speech-text foundation model for real-time dialogue	Sep 17, 2024	Action Detection, Activity Detection	Code Available	9
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection	Feb 27, 2025	Action Detection, Benchmarking	Code Available	3
Harnessing Temporal Causality for Advanced Temporal Action Detection	Jul 25, 2024	Action Detection, Action Recognition	Code Available	3
Efficient Video Action Detection with Token Dropout and Context Refinement	Apr 17, 2023	Action Detection, Decoder	Code Available	3
pyannote.audio: neural building blocks for speaker diarization	Nov 4, 2019	Action Detection, Activity Detection	Code Available	3
YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition	Aug 5, 2024	Action Detection	Code Available	2
TIM: A Time Interval Machine for Audio-Visual Action Recognition	Apr 8, 2024	Action Detection, Action Recognition	Code Available	2
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection	Apr 7, 2024	Action Detection, Moment Queries	Code Available	2
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames	Nov 28, 2023	Action Detection, Temporal Action Localization	Code Available	2
Temporal Action Localization with Enhanced Instant Discriminability	Sep 11, 2023	Action Detection, Action Localization	Code Available	2
Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation	Jun 30, 2023	Action Detection, Pose Prediction	Code Available	2
TriDet: Temporal Action Detection with Relative Boundary Modeling	Mar 13, 2023	Action Detection, Temporal Action Localization	Code Available	2
YOWOv2: A Stronger yet Efficient Multi-level Detection Framework for Real-time Spatio-temporal Action Detection	Feb 14, 2023	Action Detection	Code Available	2
Structured Attention Composition for Temporal Action Localization	May 20, 2022	Action Detection, Action Localization	Code Available	2
Colar: Effective and Efficient Online Action Detection by Consulting Exemplars	Mar 2, 2022	Action Detection, Online Action Detection	Code Available	2
audino: A Modern Annotation Tool for Audio and Speech	Jun 9, 2020	Action Detection, Activity Detection	Code Available	2
Temporal Action Detection with Structured Segment Networks	Apr 20, 2017	Action Detection, Action Recognition	Code Available	2
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm	Jun 3, 2025	Action Detection, Activity Detection	Code Available	1
DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer	May 9, 2025	Action Detection, Decoder	Code Available	1
Context-Enhanced Memory-Refined Transformer for Online Action Detection	Mar 24, 2025	Action Detection, Decoder	Code Available	1
VANPY: Voice Analysis Framework	Feb 17, 2025	Action Detection, Activity Detection	Code Available	1
Preventing Rogue Agents Improves Multi-Agent Collaboration	Feb 9, 2025	Action Detection	Code Available	1
Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models	Jan 23, 2025	Action Detection, Pseudo Label	Code Available	1
MS-Temba: Multi-Scale Temporal Mamba for Efficient Temporal Action Detection	Jan 10, 2025	Action Detection, GPU	Code Available	1
WiFi CSI Based Temporal Activity Detection via Dual Pyramid Network	Dec 19, 2024	Action Detection, Action Recognition	Code Available	1
USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation	Dec 12, 2024	Action Detection, Action Recognition	Code Available	1
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection	Nov 17, 2024	Action Detection, Open Vocabulary Action Detection	Code Available	1
Towards Student Actions in Classroom Scenes: New Dataset and Baseline	Sep 2, 2024	Action Detection, Benchmarking	Code Available	1
ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos	Jul 17, 2024	Action Detection, Action Localization	Code Available	1
MMAD: Multi-label Micro-Action Detection in Videos	Jul 7, 2024	Action Analysis, Action Detection	Code Available	1
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection	Jul 3, 2024	Action Detection, Dynamic neural networks	Code Available	1
InaGVAD: a Challenging French TV and Radio Corpus Annotated for Speech Activity Detection and Speaker Gender Segmentation	Jun 6, 2024	Action Detection, Activity Detection	Code Available	1
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding	May 14, 2024	Action Detection, GPU	Code Available	1
TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression	Apr 3, 2024	Action Detection, object-detection	Code Available	1
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions	Mar 29, 2024	Action Detection, Benchmarking	Code Available	1
Online speaker diarization of meetings guided by speech separation	Jan 30, 2024	Action Detection, Activity Detection	Code Available	1
Glance and Focus: Memory Prompting for Multi-Event Video Question Answering	Jan 3, 2024	Action Detection, Human-Object Interaction Detection	Code Available	1
Generative Model-based Feature Knowledge Distillation for Action Recognition	Dec 14, 2023	Action Detection, Action Recognition	Code Available	1
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos	Dec 4, 2023	Action Detection, Video Recognition	Code Available	1
ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors	Oct 25, 2023	Action Detection, Pose Estimation	Code Available	1
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers	Sep 3, 2023	Action Detection, Action Spotting	Code Available	1
Memory-and-Anticipation Transformer for Online Action Understanding	Aug 15, 2023	Action Detection, Action Understanding	Code Available	1
ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development	Jul 17, 2023	Action Detection, Activity Detection	Code Available	1
Multi-Granularity Hand Action Detection	Jun 19, 2023	Action Detection, Action Localization	Code Available	1
E2E-LOAD: End-to-End Long-form Online Action Detection	Jun 13, 2023	Action Detection, Form	Code Available	1
WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition	Apr 11, 2023	Action Detection, Action Localization	Code Available	1
Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection	Apr 10, 2023	Action Detection, Language Modeling	Code Available	1
DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion	Mar 27, 2023	Action Detection, Decoder	Code Available	1
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings	Mar 7, 2023	Action Detection, Activity Detection	Code Available	1
MiniROAD: Minimal RNN Framework for Online Action Detection	Jan 1, 2023	Action Detection, Online Action Detection	Code Available	1



Datasets, in the order of the benchmark tables below: UCF101-24, J-HMDB, Charades, Multi-THUMOS, UCF Sports, THUMOS'14, MultiSports, TSU, TTStroke-21 ME21, TTStroke-21 ME22, MultiTHUMOS

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	STAR/L	Frame-mAP 0.5	90.3	—	Unverified
2	SiA	Frame-mAP 0.5	88.5	—	Unverified
3	YOWO + LFB	Frame-mAP 0.5	87.3	—	Unverified
4	HIT	Frame-mAP 0.5	84.8	—	Unverified
5	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	82.3	—	Unverified
6	YOWO	Frame-mAP 0.5	80.4	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.2	78.48	—	Unverified
8	MOC	Frame-mAP 0.5	77.8	—	Unverified
9	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	76.3	—	Unverified
10	Two-in-one	Video-mAP 0.2	75.48	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SiA	Frame-mAP 0.5	88.5	—	Unverified
2	HISAN (ResNet-101 + FPN)	Video-mAP 0.2	87.59	—	Unverified
3	HIT	Frame-mAP 0.5	83.8	—	Unverified
4	HISAN (VGG-16)	Frame-mAP 0.5	76.72	—	Unverified
5	DTS	Video-mAP 0.2	76.1	—	Unverified
6	YOWO + LFB	Frame-mAP 0.5	75.7	—	Unverified
7	Two-in-one Two Stream	Video-mAP 0.5	74.74	—	Unverified
8	YOWO	Frame-mAP 0.5	74.4	—	Unverified
9	MOC	Frame-mAP 0.5	74	—	Unverified
10	Faster-RCNN + two-stream I3D conv	Frame-mAP 0.5	73.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TTM	mAP	28.79	—	Unverified
2	CTRN	mAP	27.8	—	Unverified
3	Coarse-Fine Networks (w/ self-supervised detection pretraining)	mAP	26.95	—	Unverified
4	UniMD+Sync. (RGB+Flow)	mAP	26.53	—	Unverified
5	PDAN (RGB+Flow)	mAP	26.5	—	Unverified
6	PAT	mAP	26.5	—	Unverified
7	MS-TCT (RGB only)	mAP	25.4	—	Unverified
8	3D ResNet-50 + super-events pretrained on AViD	mAP	25.2	—	Unverified
9	Coarse-Fine Networks	mAP	25.1	—	Unverified
10	MLAD (RGB + Flow)	mAP	23.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MLAD	mAP	51.5	—	Unverified
2	CTRN	mAP	51.2	—	Unverified
3	PDAN	mAP	47.6	—	Unverified
4	TGM	mAP	46.4	—	Unverified
5	MS-TCT (RGB only)	mAP	43.1	—	Unverified
6	I3D + our super-event	mAP	36.4	—	Unverified
7	Two-stream + LSTM	mAP	28.1	—	Unverified
8	Two-stream	mAP	27.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Two-in-one Two Stream	Video-mAP 0.5	96.52	—	Unverified
2	DTS	Video-mAP 0.2	94.3	—	Unverified
3	Two-in-one	Video-mAP 0.5	92.74	—	Unverified
4	T-CNN	Frame-mAP 0.5	86.7	—	Unverified
5	MR-TS R-CNN	Frame-mAP 0.5	84.52	—	Unverified
6	TS R-CNN	Frame-mAP 0.5	82.3	—	Unverified
7	Action Tubes	Frame-mAP 0.5	68.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MAT (Ours) Trans	mAP	71.6	—	Unverified
2	TadML-two stream	mAP	59.7	—	Unverified
3	MAT (ours)	mAP	58.2	—	Unverified
4	TadML-rgb	mAP	53.46	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HIT	Frame-mAP 0.5	33.3	—	Unverified
2	SiA	Frame-mAP 0.5	28.8	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MS-TCT	Frame-mAP	33.7	—	Unverified
2	PDAN	Frame-mAP	32.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	STCNN	IoU	0.14	—	Unverified
2	Two Stream Network	IoU	0.07	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	STCNN-V2 (Vote decision)	IoU	0.52	—	Unverified
2	RGB and PRGB	IoU	0.35	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PAT	mAP	44.6	—	Unverified