Temporal Action Localization

Temporal Action Localization detects activities in a video stream and outputs the start and end timestamps of each detected activity. It is closely related to Temporal Action Proposal Generation.
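As a rough illustration of the task's output, each detection can be represented as a labeled time span with a confidence score. The sketch below is a minimal example and not tied to any specific model or dataset: the `ActionSegment` class, its field names, and the sample labels and timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSegment:
    """One localized action (hypothetical container, for illustration only)."""
    label: str    # action class name
    start: float  # start timestamp in seconds
    end: float    # end timestamp in seconds
    score: float  # detector confidence in [0, 1]

    @property
    def duration(self) -> float:
        # Length of the segment in seconds.
        return self.end - self.start


# Made-up detections for a single video, sorted by start time.
detections = sorted(
    [
        ActionSegment("long jump", 41.0, 46.5, 0.77),
        ActionSegment("high jump", 12.5, 18.0, 0.91),
    ],
    key=lambda seg: seg.start,
)

for seg in detections:
    print(f"{seg.label}: {seg.start:.1f}s to {seg.end:.1f}s (score {seg.score:.2f})")
```

Keeping timestamps in seconds rather than frame indices makes the segments independent of the frame rate used during feature extraction.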

Papers


Showing 226–250 of 1477 papers

Title	Date	Tasks	Status	Score
Learning deep representations for video-based intake gesture detection	Sep 24, 2019	Action Recognition, Temporal Action Localization	Code Available	5
Large-scale weakly-supervised pre-training for video action recognition	May 2, 2019	Action Classification, Action Recognition	Code Available	5
Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations	Nov 20, 2018	Temporal Action Localization	Code Available	5
Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Localization	Jan 1, 2023	Action Localization, Pseudo Label	Code Available	5
Let's Dance: Learning From Online Dance Videos	Jan 23, 2018	Action Recognition, Optical Flow Estimation	Code Available	5
Adversarial Augmentation Training Makes Action Recognition Models More Robust to Realistic Video Distribution Shifts	Jan 21, 2024	Action Recognition, Scheduling	Code Available	5
Boosting Positive Segments for Weakly-Supervised Audio-Visual Video Parsing	Jan 1, 2023	Action Localization, Multiple Instance Learning	Code Available	5
Action Recognition Based on Optimal Joint Selection and Discriminative Depth Descriptor	Nov 27, 2016	Action Recognition, Dynamic Time Warping	Code Available	5
Joint Discovery of Object States and Manipulation Actions	Feb 9, 2017	Action Recognition, Clustering	Code Available	5
KORSAL: Key-point Detection based Online Real-Time Spatio-Temporal Action Localization	Nov 5, 2021	Action Localization, Optical Flow Estimation	Code Available	5
Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping	Feb 6, 2024	Action Recognition, Image Classification	Code Available	5
Advancing Compressed Video Action Recognition through Progressive Knowledge Distillation	Jul 2, 2024	Action Recognition, Knowledge Distillation	Code Available	5
Iterative Projection and Matching: Finding Structure-preserving Representatives and Its Application to Computer Vision	Nov 29, 2018	Action Recognition, Active Learning	Code Available	5
Cross-Model Cross-Stream Learning for Self-Supervised Human Action Recognition	Jul 15, 2023	Action Recognition, Contrastive Learning	Code Available	5
Language Model Guided Interpretable Video Action Reasoning	Apr 2, 2024	Action Recognition, Decision Making	Code Available	5
Skeleton-OOD: An End-to-End Skeleton-Based Model for Robust Out-of-Distribution Human Action Detection	May 31, 2024	Action Detection, Action Recognition	Code Available	5
Interpretable 3D Human Action Analysis with Temporal Convolutional Networks	Apr 14, 2017	3D Action Recognition, Action Analysis	Code Available	5
AENet: Learning Deep Audio Features for Video Analysis	Jan 3, 2017	Action Recognition, Data Augmentation	Code Available	5
Beyond the Self: Using Grounded Affordances to Interpret and Describe Others' Actions	Feb 26, 2019	Action Recognition, Temporal Action Localization	Code Available	5
Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?	May 15, 2025	Action Recognition, Temporal Action Localization	Code Available	5
Learning Gating ConvNet for Two-Stream based Methods in Action Recognition	Sep 12, 2017	Action Classification, Action Recognition	Code Available	5
Im2Flow: Motion Hallucination from Static Images for Action Recognition	Dec 12, 2017	Action Recognition, Activity Recognition	Code Available	5
Investigation of Different Skeleton Features for CNN-based 3D Action Recognition	May 2, 2017	3D Action Recognition, Action Analysis	Code Available	5
Large-scale Robustness Analysis of Video Action Recognition Models	Jul 4, 2022	Action Recognition, Temporal Action Localization	Code Available	5
Hierarchical Explanations for Video Action Recognition	Jan 1, 2023	Action Classification, Action Recognition	Code Available	5

Datasets: THUMOS14, ActivityNet-1.3, HACS, FineAction, MultiTHUMOS, CrossTask, EPIC-KITCHENS-100, MUSES, ActivityNet-1.2, Ego4D MQ test, Ego4D MQ val, MEXaction2

Benchmark Results

THUMOS14

#	Model	Metric	Claimed	Verified	Status
1	AdaTAD (VideoMAEv2-giant)	Avg mAP (0.3:0.7)	76.9	—	Unverified
2	RDFA-S6 (InternVideo2-6B)	Avg mAP (0.3:0.7)	74.2	—	Unverified
3	ActionMamba (InternVideo2-6B)	Avg mAP (0.3:0.7)	72.72	—	Unverified
4	GCM	mAP IOU@0.1	72.5	—	Unverified
5	AGT (Ours)	mAP IOU@0.1	72.1	—	Unverified
6	InternVideo2-6B	Avg mAP (0.3:0.7)	72	—	Unverified
7	ActionFormer (InternVideo features)	Avg mAP (0.3:0.7)	71.58	—	Unverified
8	TriDet (VideoMAE v2-g feature)	Avg mAP (0.3:0.7)	70.1	—	Unverified
9	InternVideo2-1B	Avg mAP (0.3:0.7)	69.8	—	Unverified
10	ActionFormer (VideoMAE V2-g features)	Avg mAP (0.3:0.7)	69.6	—	Unverified
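The metrics in these tables count a predicted segment as correct when its temporal IoU (tIoU) with a ground-truth segment reaches a threshold; "Avg mAP (0.3:0.7)" averages the result over tIoU thresholds 0.3, 0.4, 0.5, 0.6, and 0.7. The sketch below illustrates the idea for a single action class in a single video, using a simplified, non-interpolated average precision. The function names and the toy segments are hypothetical, and official evaluation toolkits differ in details (interpolation, averaging over classes and videos), so this is an illustration rather than a reference implementation.

```python
def temporal_iou(a, b):
    """tIoU of two (start, end) segments given in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, thr):
    """Simplified AP for one class.

    preds: list of (start, end, score); gts: list of (start, end).
    Each ground-truth segment can be matched by at most one prediction.
    """
    if not gts:
        return 0.0
    matched = set()
    ap, true_pos = 0.0, 0
    ranked = sorted(preds, key=lambda p: p[2], reverse=True)
    for rank, (start, end, _score) in enumerate(ranked, start=1):
        # Find the best still-unmatched ground-truth segment.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            iou = temporal_iou((start, end), gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= thr:
            matched.add(best_j)
            true_pos += 1
            ap += true_pos / rank  # precision at this true positive
    return ap / len(gts)


def average_ap_over_thresholds(preds, gts, thresholds=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Mean of the per-threshold AP values."""
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


# Toy example: one prediction covering half of one ground-truth segment (tIoU = 0.5).
preds = [(0.0, 10.0, 0.9)]
gts = [(0.0, 20.0)]
print(average_ap_over_thresholds(preds, gts))
```

In the toy example the prediction counts as correct at thresholds 0.3, 0.4, and 0.5 but not at 0.6 or 0.7, which is why stricter threshold ranges reward tighter boundaries.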

ActivityNet-1.3

#	Model	Metric	Claimed	Verified	Status
1	UnLoc-L	mAP IOU@0.5	59.3	—	Unverified
2	RDFA-S6 (InternVideo2-6B)	mAP	42.9	—	Unverified
3	ActionMamba (InternVideo2-6B)	mAP	42.02	—	Unverified
4	PRN+BMN (ensemble)	mAP	42	—	Unverified
5	AdaTAD (VideoMAEv2-giant)	mAP	41.93	—	Unverified
6	InternVideo2-6B	mAP	41.2	—	Unverified
7	InternVideo2-1B	mAP	40.4	—	Unverified
8	UniMD+Sync.	mAP	39.83	—	Unverified
9	PRN (CSN)	mAP	39.4	—	Unverified
10	InternVideo	mAP	39	—	Unverified

HACS

#	Model	Metric	Claimed	Verified	Status
1	RDFA-S6 (InternVideo2-6B)	Average-mAP	45.8	—	Unverified
2	ActionMamba (InternVideo2-6B)	Average-mAP	44.56	—	Unverified
3	DyFADet (VideoMAEv2)	Average-mAP	44.3	—	Unverified
4	InternVideo2-6B	Average-mAP	43.3	—	Unverified
5	TriDet (VideoMAEv2)	Average-mAP	43.1	—	Unverified
6	InternVideo2-1B	Average-mAP	42.4	—	Unverified
7	InternVideo	Average-mAP	41.55	—	Unverified
8	TriDet (SlowFast)	Average-mAP	38.6	—	Unverified
9	TriDet (I3D RGB)	Average-mAP	36.8	—	Unverified
10	TadTr (I3D RGB)	Average-mAP	32.09	—	Unverified

FineAction

#	Model	Metric	Claimed	Verified	Status
1	RDFA-S6 (InternVideo2-6B)	mAP	29.6	—	Unverified
2	ActionMamba (InternVideo2-6B)	mAP	29.04	—	Unverified
3	InternVideo2-6B	mAP	27.7	—	Unverified
4	DyFADet (VideoMAE v2-g)	mAP	23.8	—	Unverified
5	VideoMAE V2-g	mAP	18.24	—	Unverified
6	InternVideo	mAP	17.57	—	Unverified
7	BMN (i3d feature)	mAP	9.25	—	Unverified
8	G-TAD (i3d feature)	mAP	9.06	—	Unverified
9	DBG (i3d feature)	mAP	6.75	—	Unverified

MultiTHUMOS

#	Model	Metric	Claimed	Verified	Status
1	TriDet (VideoMAEv2)	Average mAP	37.5	—	Unverified
2	DualDETR (I3D-rgb)	Average mAP	32.64	—	Unverified
3	TriDet (I3D-rgb)	Average mAP	30.7	—	Unverified
4	TemporalMaxer	Average mAP	29.9	—	Unverified
5	PointTAD	Average mAP	23.5	—	Unverified
6	PDAN	Average mAP	17.3	—	Unverified
7	MS-TCT	Average mAP	16.2	—	Unverified
8	MLAD	Average mAP	14.2	—	Unverified

CrossTask

#	Model	Metric	Claimed	Verified	Status
1	VideoCLIP	Recall	47.3	—	Unverified
2	VLM	Recall	46.5	—	Unverified
3	TACo	Recall	42.5	—	Unverified
4	Text-Video Embedding	Recall	33.6	—	Unverified
5	Fully-supervised upper-bound	Recall	31.6	—	Unverified
6	Zhukov	Recall	22.4	—	Unverified
7	Alayrac	Recall	13.3	—	Unverified

EPIC-KITCHENS-100

#	Model	Metric	Claimed	Verified	Status
1	AdaTAD (verb, VideoMAE-L)	Avg mAP (0.1-0.5)	29.3	—	Unverified
2	TriDet (verb)	Avg mAP (0.1-0.5)	25.4	—	Unverified
3	TemporalMaxer (verb)	Avg mAP (0.1-0.5)	24.5	—	Unverified
4	ActionFormer (verb)	Avg mAP (0.1-0.5)	23.5	—	Unverified
5	G-TAD (verb)	Avg mAP (0.1-0.5)	9.4	—	Unverified
6	BMN (verb)	Avg mAP (0.1-0.5)	8.4	—	Unverified

MUSES

#	Model	Metric	Claimed	Verified	Status
1	TemporalMaxer	mAP	27.2	—	Unverified
2	MUSES	mAP	18.6	—	Unverified

ActivityNet-1.2

#	Model	Metric	Claimed	Verified	Status
1	DeepMetricLearner	mAP IOU@0.5	35.2	—	Unverified

Ego4D MQ test

#	Model	Metric	Claimed	Verified	Status
1	ActionFormer (SlowFast+Omnivore+EgoVLP)	Average mAP	21.76	—	Unverified

Ego4D MQ val

#	Model	Metric	Claimed	Verified	Status
1	ActionFormer (SlowFast+Omnivore+EgoVLP)	Average mAP	21.4	—	Unverified

MEXaction2

#	Model	Metric	Claimed	Verified	Status
1	S-CNN	mAP	7.4	—	Unverified