SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 801-850 of 1149 papers

Title | Status | Hype
Egocentric and Exocentric Methods: A Short Survey | | 0
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | | 0
Exploiting Spatial-Temporal Modelling and Multi-Modal Fusion for Human Action Recognition | | 0
Exploring Anchor-based Detection for Ego4D Natural Language Query | | 0
Exploring Missing Modality in Multimodal Egocentric Datasets | | 0
Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022 | | 0
Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding | | 0
Extending Video Masked Autoencoders to 128 frames | | 0
Extensible Hierarchical Method of Detecting Interactive Actions for Video Understanding | | 0
Real-Time Segmentation Networks should be Latency Aware | | 0
Fast Retinomorphic Event Stream for Video Recognition and Reinforcement Learning | | 0
FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models | | 0
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding | | 0
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models | | 0
Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | | 0
Fine-Grain Annotation of Cricket Videos | | 0
Fine-Grained Video Captioning through Scene Graph Consolidation | | 0
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | | 0
First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge | | 0
Flatten: Video Action Recognition is an Image Classification task | | 0
Flexible Frame Selection for Efficient Video Reasoning | | 0
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding | | 0
FocusChat: Text-guided Long Video Understanding via Spatiotemporal Information Filtering | | 0
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions | | 0
Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles | | 0
Frame-Voyager: Learning to Query Frames for Video Large Language Models | | 0
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | | 0
From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction | | 0
From Image to Video, what do we need in multimodal LLMs? | | 0
From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations | | 0
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment | | 0
Fully Automated Hand Hygiene Monitoring in Operating Room using 3D Convolutional Neural Network | | 0
Future semantic segmentation of time-lapsed videos with large temporal displacement | | 0
Gameplay Highlights Generation | | 0
Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention | | 0
Generating the Future With Adversarial Transformers | | 0
Generating Videos with Scene Dynamics | | 0
Generative Frame Sampler for Long Video Understanding | | 0
Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning | | 0
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning | | 0
Global Motion Understanding in Large-Scale Video Object Segmentation | | 0
Global Self-Attention Networks | | 0
Global Self-Attention Networks for Image Recognition | | 0
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | | 0
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation | | 0
Gradient Frequency Modulation for Visually Explaining Video Understanding Models | | 0
GraphVid: It Only Takes a Few Nodes to Understand a Video | | 0
Grounded Objects and Interactions for Video Captioning | | 0
Grounded Video Situation Recognition | | 0
Grounding Action Descriptions in Videos | | 0

No leaderboard results yet.