Video Classification

Video Classification is the task of producing a label that is relevant to a video, given its frames. A good video-level classifier not only provides accurate frame labels but also best describes the entire video, given the features and annotations of its individual frames. For example, a video might contain a tree in some frame, while the label that is central to the video is something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the whole video, and assigning one or more labels to each frame within the video.
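The difference between frame-level and video-level labels can be illustrated with a minimal Python sketch. It assumes a model has already produced an independent score per label for every frame; the label names, scores, 0.5 threshold, and the choice of mean or max pooling are illustrative assumptions, not the method of the cited source. Mean pooling favors labels that persist across the video, so a tree that dominates a single frame does not become a video-level label, whereas max pooling would promote it.

```python
from typing import Dict, List, Sequence

FrameScores = Dict[str, float]  # label -> score (e.g. a sigmoid output) for one frame


def video_scores(frames: Sequence[FrameScores], pooling: str = "mean") -> Dict[str, float]:
    """Pool per-frame label scores into one video-level score per label."""
    if not frames:
        raise ValueError("need at least one frame")
    labels = sorted({label for frame in frames for label in frame})
    pooled: Dict[str, float] = {}
    for label in labels:
        # A label missing from a frame is treated as score 0.0 for that frame.
        values = [frame.get(label, 0.0) for frame in frames]
        if pooling == "mean":
            pooled[label] = sum(values) / len(values)
        elif pooling == "max":
            pooled[label] = max(values)
        else:
            raise ValueError(f"unknown pooling: {pooling!r}")
    return pooled


def frame_labels(frames: Sequence[FrameScores], threshold: float = 0.5) -> List[List[str]]:
    """Frame-level task: every label whose score clears the threshold, per frame."""
    return [sorted(l for l, s in frame.items() if s >= threshold) for frame in frames]


def video_labels(frames: Sequence[FrameScores], threshold: float = 0.5,
                 pooling: str = "mean") -> List[str]:
    """Video-level task: labels whose pooled score clears the threshold."""
    return sorted(l for l, s in video_scores(frames, pooling).items() if s >= threshold)


# Hypothetical scores for a hiking video in which a tree dominates only one frame.
frames = [
    {"hiking": 0.8, "tree": 0.1},
    {"hiking": 0.7, "tree": 0.9},
    {"hiking": 0.9, "tree": 0.2},
]
print(frame_labels(frames))                  # [['hiking'], ['hiking', 'tree'], ['hiking']]
print(video_labels(frames))                  # ['hiking']
print(video_labels(frames, pooling="max"))   # ['hiking', 'tree']
```

Fixed pooling like this ignores temporal order; many of the models on this page (recurrent, graph-pooling, and temporal-convolution architectures, for instance) instead learn how to aggregate frame features.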

Source: Efficient Large Scale Video Classification

Papers

Showing 401–425 of 455 papers

Title	Date	Tasks	Status
Hallucinating Optical Flow Features for Video Classification	May 28, 2019	Classification, General Classification	Code Available
Adversarial Framing for Image and Video Classification	Dec 11, 2018	Classification, General Classification	Code Available
Temporal Feature Weaving for Neonatal Echocardiographic Viewpoint Video Classification	Jan 7, 2025	Classification, image-classification	Code Available
Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints	May 12, 2019	General Classification, image-classification	Code Available
A Multimodal Handover Failure Detection Dataset and Baselines	Feb 28, 2024	Action Segmentation, Object	Code Available
Group Normalization	Mar 22, 2018	Object, object-detection	Code Available
GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning	Jul 20, 2022	Action Recognition, Clustering	Code Available
Tensor-Train Recurrent Neural Networks for Video Classification	Jul 6, 2017	Classification, General Classification	Code Available
Video Representation Learning and Latent Concept Mining for Large-scale Multi-label Video Classification	Jul 5, 2017	Attribute, General Classification	Code Available
Gated Channel Transformation for Visual Recognition	Sep 25, 2019	General Classification, image-classification	Code Available
Fine-grained Activity Recognition in Baseball Videos	Apr 9, 2018	Action Detection, Activity Detection	Code Available
Few-Shot Classification of Interactive Activities of Daily Living (InteractADL)	Jun 3, 2024	Few Shot Action Recognition, Fine-Grained Image Classification	Code Available
Fast Non-Local Neural Networks with Spectral Residual Learning	Oct 15, 2019	Pose Estimation, Video Classification	Code Available
Attention Bottlenecks for Multimodal Fusion	Jun 30, 2021	Action Classification, Action Recognition	Code Available
FakeClaim: A Multiple Platform-driven Dataset for Identification of Fake News on 2023 Israel-Hamas War	Jan 29, 2024	Fact Checking, Language Modeling	Code Available
Text-to-feature diffusion for audio-visual few-shot learning	Sep 7, 2023	Classification, Few-Shot Learning	Code Available
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures	May 30, 2019	Action Classification, Action Recognition	Code Available
BRIDLE: Generalized Self-supervised Learning with Quantization	Feb 4, 2025	image-classification, Image Classification	Code Available
VPAI_Lab at MedVidQA 2022: A Two-Stage Cross-modal Fusion Method for Medical Instructional Video Classification	May 1, 2022	Video Classification	Code Available
The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge	Jun 16, 2017	General Classification, Video Classification	Code Available
Pushing the boundaries of event subsampling in event-based video classification using CNNs	Sep 13, 2024	Event data classification, Sensitivity	Code Available
Extending Information Bottleneck Attribution to Video Sequences	Jan 28, 2025	DeepFake Detection, Face Swapping	Code Available
Exploring Temporal Information for Improved Video Understanding	May 25, 2019	Action Recognition, Optical Flow Estimation	Code Available
VideoDG: Generalizing Temporal Relations in Videos to Novel Domains	Dec 8, 2019	Action Recognition, Data Augmentation	Code Available
RANP: Resource Aware Neuron Pruning at Initialization for 3D CNNs	Oct 6, 2020	3D Semantic Segmentation, Semantic Segmentation	Code Available

All datasets Breakfast COIN MoB YouTube-8M Hockey Fight Detection Dataset Charades Home Action Genome Kinetics Multimodal PISA Something-Something V1 Something-Something V2 SRI-APPROVE Fine-Grained Video Classification

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	HERMES	Accuracy (%)	95.2	—	Unverified
2	MA-LMM	Accuracy (%)	93	—	Unverified
3	S5	Accuracy (%)	90.7	—	Unverified
4	TranS4mer	Accuracy (%)	90.27	—	Unverified
5	D-Sprv.	Accuracy (%)	89.9	—	Unverified
6	ViS4mer	Accuracy (%)	88.2	—	Unverified
7	GHRM	Accuracy (%)	75.5	—	Unverified
8	Timeception	Accuracy (%)	71.3	—	Unverified
9	VideoGraph	Accuracy (%)	69.5	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HERMES	Accuracy (%)	93.5	—	Unverified
2	MA-LMM	Accuracy (%)	93.2	—	Unverified
3	S5	Accuracy (%)	90.8	—	Unverified
4	D-Sprv.	Accuracy (%)	90	—	Unverified
5	TranS4mer	Accuracy (%)	89.3	—	Unverified
6	ViS4mer	Accuracy (%)	88.4	—	Unverified
7	TSN	Accuracy (%)	73.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	VTN	Accuracy	77.85	—	Unverified
2	I3D	Accuracy	72.11	—	Unverified
3	ConvLSTM	Accuracy	69.71	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	DCGN (self-attention graph pooling)	Hit@1	87.7	—	Unverified
2	Hierarchical LSTM with MoE	Hit@1	86.8	—	Unverified
3	Mixture-of-2-Experts	Hit@1	70.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Structured Keypoint Pooling	Accuracy	99.5	—	Unverified
2	CNN+LSTM	1:1 Accuracy	98	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Multigrid	mAP	38.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Cooperative Ours (3rd-person)	Accuracy (%)	24.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Multigrid	Top-1	77.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Video	Accuracy (%)	73.95	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MSNet-R50En (ours)	Top-5 Accuracy	84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MSNet-R50En (ours)	Top-5 Accuracy	91	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Multi-Label Prototypes Contrastive Learning	AUPR	88.4	—	Unverified
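The benchmark tables report several different metrics. The following is a minimal Python sketch of their common textbook definitions, assuming each video comes with a list of per-class scores: top-k accuracy for single-label benchmarks, Hit@1 for multi-label benchmarks (the single top-scoring class must be one of the video's ground-truth labels), and non-interpolated average precision, averaged over classes to give mAP. Official evaluation scripts may differ in tie-breaking, interpolation, and the handling of classes with no positives, so this is meant for intuition rather than for reproducing the numbers above. AUPR is often estimated with the same average-precision computation, though implementations vary.

```python
from typing import List, Sequence, Set

Scores = Sequence[Sequence[float]]  # scores[video][class]


def _ranked(row: Sequence[float]) -> List[int]:
    # Class indices by descending score; ties keep index order (sorted is stable).
    return sorted(range(len(row)), key=lambda c: row[c], reverse=True)


def top_k_accuracy(scores: Scores, targets: Sequence[int], k: int = 1) -> float:
    """Single-label: fraction of videos whose true class is among the k best scores."""
    hits = sum(t in _ranked(row)[:k] for row, t in zip(scores, targets))
    return hits / len(targets)


def hit_at_1(scores: Scores, targets: Sequence[Set[int]]) -> float:
    """Multi-label: fraction of videos whose top-scoring class is a true label."""
    hits = sum(_ranked(row)[0] in truth for row, truth in zip(scores, targets))
    return hits / len(targets)


def average_precision(class_scores: Sequence[float], positives: Sequence[bool]) -> float:
    """Non-interpolated AP for one class: mean precision at the rank of each positive."""
    n_pos = sum(positives)
    if n_pos == 0:
        raise ValueError("AP is undefined for a class with no positives")
    true_pos, total = 0, 0.0
    for rank, i in enumerate(_ranked(class_scores), start=1):
        if positives[i]:
            true_pos += 1
            total += true_pos / rank  # precision at this positive's rank
    return total / n_pos


def mean_average_precision(scores: Scores, targets: Sequence[Set[int]]) -> float:
    """mAP over the classes that have at least one positive video."""
    aps = []
    for c in range(len(scores[0])):
        positives = [c in truth for truth in targets]
        if any(positives):
            aps.append(average_precision([row[c] for row in scores], positives))
    if not aps:
        raise ValueError("no class has a positive video")
    return sum(aps) / len(aps)
```

Skipping classes without positives, as done here, is one common convention; other scripts count them as zero or exclude them differently, which changes the reported mAP.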