SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 401–450 of 1149 papers

| Title | Status | Hype |
| --- | --- | --- |
| Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey | | 0 |
| An Effective Way to Improve YouTube-8M Classification Accuracy in Google Cloud Platform | | 0 |
| AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | | 0 |
| EAGLE: Egocentric AGgregated Language-video Engine | | 0 |
| DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding | | 0 |
| BioVL-QR: Egocentric Biochemical Vision-and-Language Dataset Using Micro QR Codes | | 0 |
| An Attempt towards Interpretable Audio-Visual Video Captioning | | 0 |
| DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding | | 0 |
| Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering | | 0 |
| Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding | | 0 |
| Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition | | 0 |
| Dynamic Appearance: A Video Representation for Action Recognition with Joint Training | | 0 |
| Beyond the Camera: Neural Networks in World Coordinates | | 0 |
| Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | | 0 |
| DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs | | 0 |
| DualX-VSR: Dual Axial Spatial×Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation | | 0 |
| Beyond still images: Temporal features and input variance resilience | | 0 |
| DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM | | 0 |
| Abductive Ego-View Accident Video Understanding for Safe Driving Perception | | 0 |
| An Analysis of Data Transformation Effects on Segment Anything 2 | | 0 |
| Learning text-to-video retrieval from image captioning | | 0 |
| Dilated Temporal Relational Adversarial Network for Generic Video Summarization | | 0 |
| DrVideo: Document Retrieval Based Long Video Understanding | | 0 |
| DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model | | 0 |
| A Multimodal Sentiment Dataset for Video Recommendation | | 0 |
| A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives | | 0 |
| Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection | | 0 |
| Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning | | 0 |
| Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation | | 0 |
| DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation | | 0 |
| BERT for Large-scale Video Segment Classification with Test-time Augmentation | | 0 |
| AMEGO: Active Memory from long EGOcentric videos | | 0 |
| Domain Adaptation of VLM for Soccer Video Understanding | | 0 |
| Actor-Action Semantic Segmentation with Grouping Process Models | | 0 |
| BEARCUBS: A benchmark for computer-using web agents | | 0 |
| DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering | | 0 |
| InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling | | 0 |
| DOAD: Decoupled One Stage Action Detection Network | | 0 |
| BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation | | 0 |
| ALLVB: All-in-One Long Video Understanding Benchmark | | 0 |
| Learning reusable concepts across different egocentric video understanding tasks | | 0 |
| Learning Space-Time Semantic Correspondences | | 0 |
| InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | | 0 |
| InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model | | 0 |
| DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition | | 0 |
| InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | | 0 |
| DLM-VMTL: A Double Layer Mapper for heterogeneous data video Multi-task prompt learning | | 0 |
| Integrated Object Detection and Tracking with Tracklet-Conditioned Detection | | 0 |
| Instrument-tissue Interaction Detection Framework for Surgical Video Understanding | | 0 |
| InstructionBench: An Instructional Video Understanding Benchmark | | 0 |

No leaderboard results yet.