Video Classification

Video classification is the task of producing a label relevant to a video given its frames. A good video-level classifier not only labels individual frames accurately but also best describes the video as a whole, given the features and annotations of its frames. For example, a video might show a tree in some frame, while the label central to the video is something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video, and assigning one or more labels to each frame inside the video.
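
The frame-versus-video distinction above can be made concrete with a small sketch. It assumes each frame has already been scored per class by some frame-level model (not shown) and uses one simple aggregation: average the frame scores per class, then keep every class whose pooled score clears a threshold. Mean pooling and the 0.5 threshold are illustrative assumptions, not the method of the source paper; the function names and the tree/hiking scores are hypothetical.

```python
from typing import List, Sequence

def video_scores(frame_scores: Sequence[Sequence[float]]) -> List[float]:
    """Mean-pool per-frame class scores into one video-level score per class."""
    if not frame_scores:
        raise ValueError("need at least one frame")
    num_classes = len(frame_scores[0])
    totals = [0.0] * num_classes
    for frame in frame_scores:
        if len(frame) != num_classes:
            raise ValueError("every frame must score the same set of classes")
        for c, s in enumerate(frame):
            totals[c] += s
    return [t / len(frame_scores) for t in totals]

def video_labels(frame_scores: Sequence[Sequence[float]],
                 class_names: Sequence[str],
                 threshold: float = 0.5) -> List[str]:
    """Multi-label decision: keep every class whose pooled score reaches the threshold."""
    pooled = video_scores(frame_scores)
    return [name for name, s in zip(class_names, pooled) if s >= threshold]

# Hypothetical scores for 3 frames over the classes below.
classes = ["tree", "hiking", "cooking"]
frames = [
    [0.9, 0.6, 0.0],  # a tree dominates this frame
    [0.1, 0.8, 0.1],
    [0.0, 0.7, 0.0],
]
print(video_labels(frames, classes))  # → ['hiking']
```

Although one frame scores "tree" at 0.9, its pooled score is only about 0.33, so "hiking" is the sole video-level label. Max pooling would keep "tree" as well; which aggregation fits depends on whether a label should describe the whole video or any moment in it.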

Source: Efficient Large Scale Video Classification

Papers

Showing 426–450 of 455 papers

Title	Date	Tasks	Status
RANP: Resource Aware Neuron Pruning at Initialization for 3D CNNs	Feb 9, 2021	3D Semantic Segmentation, Semantic Segmentation	Code Available
Rate-Accuracy Trade-Off In Video Classification With Deep Convolutional Neural Networks	Sep 27, 2018	Action Recognition, Classification	Code Available
Read My Ears! Horse Ear Movement Detection for Equine Affective State Assessment	May 6, 2025	Optical Flow Estimation, Video Classification	Code Available
The YouTube-8M Kaggle Competition: Challenges and Methods	Jun 28, 2017	General Classification, Video Classification	Code Available
Adversarial Perturbations Against Real-Time Video Classification Systems	Jul 2, 2018	Classification, General Classification	Code Available
Exploring Audio Cues for Enhanced Test-Time Video Model Adaptation	Jun 14, 2025	Test-time Adaptation, Video Classification	Code Available
Evaluation of Explanation Methods of AI -- CNNs in Image Classification Tasks with Reference-based and No-reference Metrics	Dec 2, 2022	image-classification, Image Classification	Code Available
Representation Flow for Action Recognition	Oct 2, 2018	Action Classification, Action Recognition	Code Available
ReSpike: Residual Frames-based Hybrid Spiking Neural Networks for Efficient Action Recognition	Sep 3, 2024	Action Recognition, image-classification	Code Available
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification	Dec 13, 2017	Action Classification, Action Detection	Code Available
Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset	Jun 24, 2017	Classification, General Classification	Code Available
Efficient Video Classification Using Fewer Frames	Feb 27, 2019	Classification, Clustering	Code Available
Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video	Jun 5, 2015	Gesture Recognition, Image Captioning	Code Available
ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding	Jun 1, 2015	Action Detection, Action Recognition	Code Available
Efficient Lung Ultrasound Severity Scoring Using Dedicated Feature Extractor	Jan 21, 2025	Diagnostic, Knowledge Distillation	Code Available
Approaches Toward Physical and General Video Anomaly Detection	Dec 14, 2021	Anomaly Detection, Density Estimation	Code Available
Movie Genre Classification by Language Augmentation and Shot Sampling	Mar 24, 2022	Action Recognition, Boundary Detection	Code Available
TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification	Jun 21, 2021	Action Classification, Classification	Code Available
Beyond Short Snippets: Deep Networks for Video Classification	Mar 31, 2015	Action Recognition, Classification	Code Available
Robust Real-Time Violence Detection in Video Using CNN And LSTM	Mar 27, 2019	Action Recognition In Videos, Video Classification	Code Available
Towards a Robust Framework for Multimodal Hate Detection: A Study on Video vs. Image-based Content	Feb 11, 2025	Hate Speech Detection, Video Classification	Code Available
Untrimmed Video Classification for Activity Detection: submission to ActivityNet Challenge	Jul 7, 2016	Action Detection, Activity Detection	Code Available
Saliency Tubes: Visual Explanations for Spatio-Temporal Convolutions	Feb 4, 2019	Action Classification, General Classification	Code Available
Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space	Sep 9, 2024	Video Classification	Code Available
UTS submission to Google YouTube-8M Challenge 2017	Jul 13, 2017	Classification, General Classification	Code Available

Datasets: Breakfast, COIN, MoB, YouTube-8M, Hockey Fight Detection Dataset, Charades, Home Action Genome, Kinetics, Multimodal PISA, Something-Something V1, Something-Something V2, SRI-APPROVE Fine-Grained Video Classification

Benchmark Results

Breakfast
#	Model	Metric	Claimed	Verified	Status
1	HERMES	Accuracy (%)	95.2	—	Unverified
2	MA-LMM	Accuracy (%)	93	—	Unverified
3	S5	Accuracy (%)	90.7	—	Unverified
4	TranS4mer	Accuracy (%)	90.27	—	Unverified
5	D-Sprv.	Accuracy (%)	89.9	—	Unverified
6	ViS4mer	Accuracy (%)	88.2	—	Unverified
7	GHRM	Accuracy (%)	75.5	—	Unverified
8	Timeception	Accuracy (%)	71.3	—	Unverified
9	VideoGraph	Accuracy (%)	69.5	—	Unverified

COIN
#	Model	Metric	Claimed	Verified	Status
1	HERMES	Accuracy (%)	93.5	—	Unverified
2	MA-LMM	Accuracy (%)	93.2	—	Unverified
3	S5	Accuracy (%)	90.8	—	Unverified
4	D-Sprv.	Accuracy (%)	90	—	Unverified
5	TranS4mer	Accuracy (%)	89.3	—	Unverified
6	ViS4mer	Accuracy (%)	88.4	—	Unverified
7	TSN	Accuracy (%)	73.4	—	Unverified

MoB
#	Model	Metric	Claimed	Verified	Status
1	VTN	Accuracy	77.85	—	Unverified
2	I3D	Accuracy	72.11	—	Unverified
3	ConvLSTM	Accuracy	69.71	—	Unverified

YouTube-8M
#	Model	Metric	Claimed	Verified	Status
1	DCGN (self-attention graph pooling)	Hit@1	87.7	—	Unverified
2	Hierarchical LSTM with MoE	Hit@1	86.8	—	Unverified
3	Mixture-of-2-Experts	Hit@1	70.1	—	Unverified
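
Hit@1, the metric in the table above, is commonly defined for YouTube-8M as the fraction of videos whose highest-scoring predicted label is one of the video's ground-truth labels; Hit@k relaxes this to the top k predictions. The sketch below assumes scores arrive as one list of per-class scores per video and ground truth as a set of class indices. The function name and example data are hypothetical, and this is not the official YouTube-8M evaluation code.

```python
from typing import Sequence, Set

def hit_at_k(scores: Sequence[Sequence[float]], truth: Sequence[Set[int]], k: int = 1) -> float:
    """Fraction of videos whose top-k predicted classes include at least one true label."""
    if len(scores) != len(truth) or not scores:
        raise ValueError("need one ground-truth set per scored video, and at least one video")
    hits = 0
    for video_scores, labels in zip(scores, truth):
        # Class indices ordered from highest to lowest score; keep the first k.
        ranked = sorted(range(len(video_scores)), key=lambda c: video_scores[c], reverse=True)
        if any(c in labels for c in ranked[:k]):
            hits += 1
    return hits / len(scores)

# Hypothetical scores for 3 videos over 3 classes, with multi-label ground truth.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.3, 0.1, 0.6]]
truth = [{1}, {1, 2}, {0}]
print(hit_at_k(scores, truth, k=1))  # → 0.3333333333333333
print(hit_at_k(scores, truth, k=2))  # → 1.0
```

Only the first video's top class is correct, so Hit@1 is 1/3; widening to the top two classes recovers a true label for every video.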

Hockey Fight Detection Dataset
#	Model	Metric	Claimed	Verified	Status
1	Structured Keypoint Pooling	Accuracy	99.5	—	Unverified
2	CNN+LSTM	1:1 Accuracy	98	—	Unverified

Charades
#	Model	Metric	Claimed	Verified	Status
1	Multigrid	mAP	38.2	—	Unverified
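
mAP, the metric in the table above, averages a per-class average precision (AP) over classes, which suits multi-label benchmarks where one video can carry several labels at once. The sketch below uses the common non-interpolated AP (the mean of the precision values at the rank of each positive video) and skips classes with no positives. Official evaluation scripts may differ in details such as tie handling or interpolation; all names and numbers here are hypothetical.

```python
from typing import Sequence

def average_precision(scores: Sequence[float], positives: Sequence[int]) -> float:
    """Non-interpolated AP for one class: mean precision at each positive's rank."""
    num_pos = sum(1 for p in positives if p)
    if num_pos == 0:
        raise ValueError("AP is undefined for a class with no positive videos")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if positives[i]:
            hits += 1
            precision_sum += hits / rank  # precision of the top-`rank` list
    return precision_sum / num_pos

def mean_average_precision(score_matrix: Sequence[Sequence[float]],
                           label_matrix: Sequence[Sequence[int]]) -> float:
    """Average AP over classes; rows are videos, columns are classes."""
    aps = []
    for c in range(len(score_matrix[0])):
        col_scores = [row[c] for row in score_matrix]
        col_labels = [row[c] for row in label_matrix]
        if any(col_labels):  # skip classes absent from this evaluation set
            aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / len(aps)

# Hypothetical: 4 videos, 2 classes.
S = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
Y = [[1, 0], [0, 1], [1, 0], [0, 0]]
print(mean_average_precision(S, Y))  # about 0.917, i.e. (5/6 + 1) / 2
```

For class 0 the positives sit at ranks 1 and 3, giving AP = (1/1 + 2/3) / 2 = 5/6; class 1's single positive is ranked first, giving AP = 1.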

Home Action Genome
#	Model	Metric	Claimed	Verified	Status
1	Cooperative Ours (3rd-person)	Accuracy (%)	24.7	—	Unverified

Kinetics
#	Model	Metric	Claimed	Verified	Status
1	Multigrid	Top-1	77.6	—	Unverified

Multimodal PISA
#	Model	Metric	Claimed	Verified	Status
1	Video	Accuracy (%)	73.95	—	Unverified

Something-Something V1
#	Model	Metric	Claimed	Verified	Status
1	MSNet-R50En (ours)	Top-5 Accuracy	84	—	Unverified

Something-Something V2
#	Model	Metric	Claimed	Verified	Status
1	MSNet-R50En (ours)	Top-5 Accuracy	91	—	Unverified
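
Top-1 and Top-5 accuracy, reported in the single-label tables above, count a video as correct when its one true class is among the k highest-scoring classes. One way to compute this, sketched below, is to find the rank of the true class (one plus the number of classes scored strictly higher) and compare it with k. Resolving ties in the model's favor this way is a simplifying assumption of the sketch; the function names and data are hypothetical.

```python
from typing import Sequence

def true_class_rank(scores: Sequence[float], label: int) -> int:
    """1-based rank of the true class; classes tied with it do not push it down."""
    target = scores[label]
    return 1 + sum(1 for s in scores if s > target)

def top_k_accuracy(all_scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of videos whose true class ranks within the top k."""
    if len(all_scores) != len(labels) or not labels:
        raise ValueError("need one label per scored video, and at least one video")
    correct = sum(1 for s, y in zip(all_scores, labels) if true_class_rank(s, y) <= k)
    return correct / len(labels)

# Hypothetical: 3 videos, 4 classes, one true class each.
scores = [[0.1, 0.6, 0.2, 0.1], [0.5, 0.2, 0.25, 0.05], [0.1, 0.2, 0.3, 0.4]]
labels = [1, 2, 0]
for k in (1, 2):
    print(k, top_k_accuracy(scores, labels, k))
```

The true classes rank 1st, 2nd, and 4th, so top-1 accuracy is 1/3 and top-2 accuracy is 2/3. Counting strictly higher scores avoids sorting every class, which is convenient when the label space is large.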

SRI-APPROVE Fine-Grained Video Classification
#	Model	Metric	Claimed	Verified	Status
1	Multi-Label Prototypes Contrastive Learning	AUPR	88.4	—	Unverified