SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 1051-1100 of 1149 papers

Title | Status | Hype
Temporally smooth online action detection using cycle-consistent future anticipation | Code | 0
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding | Code | 0
Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube | Code | 0
Temporal Action Proposal Generation With Action Frequency Adaptive Network | Code | 0
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot | Code | 0
Telling Stories for Common Sense Zero-Shot Action Recognition | Code | 0
Technical Report for CVPR 2022 LOVEU AQTC Challenge | Code | 0
Tiny Video Networks | Code | 0
Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning | Code | 0
Task-Aware KV Compression For Cost-Effective Long Video Understanding | Code | 0
TAda! Temporally-Adaptive Convolutions for Video Understanding | Code | 0
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning | Code | 0
Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach | Code | 0
Streaming Detection of Queried Event Start | Code | 0
Hallucination Mitigation Prompts Long-term Video Understanding | Code | 0
Gaussian Temporal Awareness Networks for Action Localization | Code | 0
FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos | Code | 0
Video action detection by learning graph-based spatio-temporal interactions | Code | 0
FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks | Code | 0
Spatio-Temporal Perturbations for Video Attribution | Code | 0
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Code | 0
SoccerNet 2024 Challenges Results | Code | 0
Few-Shot Referring Relationships in Videos | Code | 0
Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality | Code | 0
SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding | Code | 0
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | Code | 0
Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation | Code | 0
Features Understanding in 3D CNNs for Actions Recognition in Video | Code | 0
Situational Scene Graph for Structured Human-centric Situation Understanding | Code | 0
Exploring Temporal Information for Improved Video Understanding | Code | 0
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding | Code | 0
ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding | Code | 0
Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs | Code | 0
Screencast Tutorial Video Understanding | Code | 0
Video Object Segmentation using Supervoxel-Based Gerrymandering | Code | 0
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding | Code | 0
Representation Flow for Action Recognition | Code | 0
TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition | Code | 0
Relation-aware Hierarchical Attention Framework for Video Question Answering | Code | 0
Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action Recognition | Code | 0
Recurrent Space-time Graph Neural Networks | Code | 0
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos | Code | 0
ACVUBench: Audio-Centric Video Understanding Benchmark | Code | 0
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | Code | 0
Win-Fail Action Recognition | Code | 0
VideoQA in the Era of LLMs: An Empirical Study | Code | 0
UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark | Code | 0
ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment | Code | 0
EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization | Code | 0
Enhancing Temporal Modeling of Video LLMs via Time Gating | Code | 0
Page 22 of 23

No leaderboard results yet.