
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 601–625 of 1149 papers

Title	Date	Tasks	Status
Can Temporal Information Help with Contrastive Self-Supervised Learning?	Nov 25, 2020	Data Augmentation, Representation Learning	Unverified
Can't Fool Me: Adversarially Robust Transformer for Video Understanding	Oct 26, 2021	image-classification, Image Classification	Unverified
CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning	May 1, 2020	Diagnostic, Object	Unverified
Causal Reasoning Meets Visual Representation Learning: A Prospective Study	Apr 26, 2022	Benchmarking, Out-of-Distribution Generalization	Unverified
CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs	Jul 1, 2025	Text Generation, Video Understanding	Unverified
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Dec 16, 2024	Hallucination, Multiple-choice	Unverified
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis	May 14, 2024	4k, GPU	Unverified
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos	Apr 25, 2018	General Classification, Video Classification	Unverified
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System	Apr 27, 2023	Video Understanding	Unverified
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI	Jul 14, 2025	Large Language Model, Multimodal Large Language Model	Unverified
CinePile: A Long Video Question Answering Dataset and Benchmark	May 14, 2024	Form, Human-Object Interaction Detection	Unverified
Clapper: Compact Learning and Video Representation in VLMs	May 21, 2025	Video Understanding	Unverified
ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation	Mar 19, 2021	Object, Referring Expression Segmentation	Unverified
CLIP4Caption: CLIP for Video Caption	Oct 13, 2021	Decoder, Sentence	Unverified
Co-attentional Transformers for Story-Based Video Understanding	Oct 27, 2020	Question Answering, Video Question Answering	Unverified
COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework	Dec 11, 2024	GPU, Language Modeling	Unverified
CogME: A Cognition-Inspired Multi-Dimensional Evaluation Metric for Story Understanding	Jul 21, 2021	Question Answering, Sentence	Unverified
Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization	Mar 22, 2025	Saliency Detection, Sentence	Unverified
How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs	May 6, 2024	Autonomous Vehicles, Video Understanding	Unverified
Comprehensive Video Understanding: Video summarization with content-based video recommender design	Oct 30, 2019	Action Recognition, Data Augmentation	Unverified
Compressed Vision for Efficient Video Understanding	Oct 6, 2022	Video Compression, Video Understanding	Unverified
Concept Graph Neural Networks for Surgical Video Understanding	Feb 27, 2022	Video Understanding	Unverified
Constructing Hierarchical Q&A Datasets for Video Story Understanding	Apr 1, 2019	Video Understanding	Unverified
ContextDet: Temporal Action Detection with Adaptive Context Aggregation	Oct 20, 2024	Action Detection, Video Understanding	Unverified
Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries	Apr 3, 2020	Referring Expression Segmentation, Video Segmentation	Unverified


Page 25 of 46

No leaderboard results yet.