Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 601–650 of 1149 papers

Title	Date	Tasks	Status
Can Temporal Information Help with Contrastive Self-Supervised Learning?	Nov 25, 2020	Data Augmentation, Representation Learning	Unverified
Can't Fool Me: Adversarially Robust Transformer for Video Understanding	Oct 26, 2021	Image Classification	Unverified
CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning	May 1, 2020	Diagnostic, Object	Unverified
Causal Reasoning Meets Visual Representation Learning: A Prospective Study	Apr 26, 2022	Benchmarking, Out-of-Distribution Generalization	Unverified
CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs	Jul 1, 2025	Text Generation, Video Understanding	Unverified
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Dec 16, 2024	Hallucination, Multiple-choice	Unverified
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis	May 14, 2024	4k, GPU	Unverified
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos	Apr 25, 2018	General Classification, Video Classification	Unverified
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System	Apr 27, 2023	Video Understanding	Unverified
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI	Jul 14, 2025	Large Language Model, Multimodal Large Language Model	Unverified
CinePile: A Long Video Question Answering Dataset and Benchmark	May 14, 2024	Form, Human-Object Interaction Detection	Unverified
Clapper: Compact Learning and Video Representation in VLMs	May 21, 2025	Video Understanding	Unverified
ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation	Mar 19, 2021	Object, Referring Expression Segmentation	Unverified
CLIP4Caption: CLIP for Video Caption	Oct 13, 2021	Decoder, Sentence	Unverified
Co-attentional Transformers for Story-Based Video Understanding	Oct 27, 2020	Question Answering, Video Question Answering	Unverified
COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework	Dec 11, 2024	GPU, Language Modeling	Unverified
CogME: A Cognition-Inspired Multi-Dimensional Evaluation Metric for Story Understanding	Jul 21, 2021	Question Answering, Sentence	Unverified
Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization	Mar 22, 2025	Saliency Detection, Sentence	Unverified
How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs	May 6, 2024	Autonomous Vehicles, Video Understanding	Unverified
Comprehensive Video Understanding: Video summarization with content-based video recommender design	Oct 30, 2019	Action Recognition, Data Augmentation	Unverified
Compressed Vision for Efficient Video Understanding	Oct 6, 2022	Video Compression, Video Understanding	Unverified
Concept Graph Neural Networks for Surgical Video Understanding	Feb 27, 2022	Video Understanding	Unverified
Constructing Hierarchical Q&A Datasets for Video Story Understanding	Apr 1, 2019	Video Understanding	Unverified
ContextDet: Temporal Action Detection with Adaptive Context Aggregation	Oct 20, 2024	Action Detection, Video Understanding	Unverified
Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries	Apr 3, 2020	Referring Expression Segmentation, Video Segmentation	Unverified
Contrastive Language-Action Pre-training for Temporal Localization	Apr 26, 2022	Action Localization, Contrastive Learning	Unverified
Contrastive Language Video Time Pre-training	Jun 4, 2024	Action Recognition, Contrastive Learning	Unverified
CoS: Chain-of-Shot Prompting for Long Video Understanding	Feb 10, 2025	Video Understanding	Unverified
CRCL: Causal Representation Consistency Learning for Anomaly Detection in Surveillance Videos	Mar 24, 2025	Anomaly Detection, Anomaly Detection In Surveillance Videos	Unverified
Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization	Jan 1, 2021	Action Localization, Video Understanding	Unverified
Cross-Class Relevance Learning for Temporal Concept Localization	Nov 19, 2019	Feature Engineering, Video Understanding	Unverified
CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding	Jan 17, 2024	Contrastive Learning, Point Cloud Video Understanding	Unverified
CTM: Collaborative Temporal Modeling for Action Recognition	Feb 8, 2020	Action Recognition, Video Understanding	Unverified
Cultivating DNN Diversity for Large Scale Video Labelling	Jul 13, 2017	Diversity, Video Understanding	Unverified
Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data	Jan 17, 2020	Graph Learning, Video Understanding	Unverified
Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model	Jan 29, 2024	Action Detection, Action Localization	Unverified
Cycle-Contrast for Self-Supervised Video Representation Learning	Oct 28, 2020	Action Recognition, Contrastive Learning	Unverified
DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description	Mar 31, 2025	Video Description, Video Understanding	Unverified
Deep learning for action spotting in association football videos	Oct 2, 2024	Action Spotting, Benchmarking	Unverified
Deep Spatio-Temporal Random Fields for Efficient Video Segmentation	Jul 3, 2018	Instance Segmentation, Semantic Segmentation	Unverified
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding	May 23, 2025	Form, Question Answering	Unverified
DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding	May 19, 2018	Action Recognition In Videos, Gesture Recognition	Unverified
Detection and Localization of Robotic Tools in Robot-Assisted Surgery Videos Using Deep Neural Networks for Region Proposal and Detection	Jul 29, 2020	Object Detection	Unverified
Development of a MultiModal Annotation Framework and Dataset for Deep Video Understanding	Jun 1, 2022	Knowledge Graphs, Video Understanding	Unverified
Discerning Generic Event Boundaries in Long-Form Wild Videos	Jun 18, 2021	Boundary Detection, Form	Unverified
Discrete neural representations for explainable anomaly detection	Dec 10, 2021	Anomaly Detection, Object	Unverified
Disentangle and denoise: Tackling context misalignment for video moment retrieval	Aug 14, 2024	Denoising, Disentanglement	Unverified
Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding	Oct 31, 2021	Action Recognition, Text Detection	Unverified
DLM-VMTL: A Double Layer Mapper for heterogeneous data video Multi-task prompt learning	Aug 29, 2024	Multi-Task Learning, Prompt Learning	Unverified
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition	Jan 11, 2019	Action Classification, Action Recognition	Unverified

