
Video Understanding

A crucial task in Video Understanding is to recognise and localise, in space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 451-500 of 1149 papers

Title | Status | Hype
--- | --- | ---
Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning | Code | 0
Task-Aware KV Compression For Cost-Effective Long Video Understanding | Code | 0
TAda! Temporally-Adaptive Convolutions for Video Understanding | Code | 0
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding | Code | 0
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning | Code | 0
Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach | Code | 0
Hallucination Mitigation Prompts Long-term Video Understanding | Code | 0
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | Code | 0
Video action detection by learning graph-based spatio-temporal interactions | Code | 0
Streaming Detection of Queried Event Start | Code | 0
SoccerNet 2024 Challenges Results | Code | 0
SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding | Code | 0
Creative Flow+ Dataset | Code | 0
Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation | Code | 0
Situational Scene Graph for Structured Human-centric Situation Understanding | Code | 0
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding | Code | 0
Gaussian Temporal Awareness Networks for Action Localization | Code | 0
DramaQA: Character-Centered Video Story Understanding with Hierarchical QA | Code | 0
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing | Code | 0
DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture | Code | 0
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding | Code | 0
Screencast Tutorial Video Understanding | Code | 0
Representation Flow for Action Recognition | Code | 0
Contextual Explainable Video Representation: Human Perception-based Understanding | Code | 0
ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding | Code | 0
Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action Recognition | Code | 0
Recurrent Space-time Graph Neural Networks | Code | 0
FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos | Code | 0
Constrained-size Tensorflow Models for YouTube-8M Video Understanding Challenge | Code | 0
VideoDG: Generalizing Temporal Relations in Videos to Novel Domains | Code | 0
Relation-aware Hierarchical Attention Framework for Video Question Answering | Code | 0
SoccerDB: A Large-Scale Database for Comprehensive Video Understanding | Code | 0
Pooled Motion Features for First-Person Videos | Code | 0
A Coding Framework and Benchmark towards Low-Bitrate Video Understanding | Code | 0
FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks | Code | 0
ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment | Code | 0
Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark | Code | 0
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Code | 0
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding | Code | 0
On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis | Code | 0
Few-Shot Referring Relationships in Videos | Code | 0
Features Understanding in 3D CNNs for Actions Recognition in Video | Code | 0
A Context-Aware Loss Function for Action Spotting in Soccer Videos | Code | 0
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions | Code | 0
Spatio-Temporal Perturbations for Video Attribution | Code | 0
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | Code | 0
NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification | Code | 0
Exploring Temporal Information for Improved Video Understanding | Code | 0
Multimodal Dialogue State Tracking | Code | 0
Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs | Code | 0
Page 10 of 23

No leaderboard results yet.