Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 401–425 of 1149 papers

Title	Date	Tasks	Status
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding	Jul 17, 2025	Video GroundingVideo Understanding	—Unverified
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI	Jul 14, 2025	Large Language ModelMultimodal Large Language Model	—Unverified
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments	Jul 14, 2025	Scene UnderstandingSpatial Reasoning	—Unverified
Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation	Jul 8, 2025	Depth EstimationDepth Prediction	—Unverified
Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models	Jul 8, 2025	Future predictionLarge Language Model	—Unverified
Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges	Jul 2, 2025	Video Understanding	—Unverified
CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs	Jul 1, 2025	Text GenerationVideo Understanding	—Unverified
ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment	Jun 28, 2025	Dynamic Time WarpingLarge Language Model	CodeCode Available
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs	Jun 27, 2025	MMEVideo MME	—Unverified
IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes	Jun 26, 2025	AttributeQuestion Answering	—Unverified
Task-Aware KV Compression For Cost-Effective Long Video Understanding	Jun 26, 2025	Video Understanding	CodeCode Available
PEVLM: Parallel Encoding for Vision-Language Models	Jun 24, 2025	Autonomous DrivingVideo Understanding	—Unverified
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning	Jun 19, 2025	Multimodal Reasoningreinforcement-learning	—Unverified
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding	Jun 18, 2025	GPUStreaming video understanding	—Unverified
EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization	Jun 17, 2025	Multi-Instance RetrievalRetrieval	CodeCode Available
MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models	Jun 16, 2025	Video Understanding	—Unverified
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding	Jun 16, 2025	Optical Character Recognition (OCR)RAG	CodeCode Available
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios	Jun 11, 2025	Action RecognitionAction Segmentation	CodeCode Available
MLVTG: Mamba-Based Feature Alignment and LLM-Driven Purification for Multi-Modal Video Temporal Grounding	Jun 10, 2025	Language ModelingLanguage Modelling	—Unverified
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks	Jun 10, 2025	Multiple-choiceOpen-Ended Question Answering	—Unverified
SceneRAG: Scene-level Retrieval-Augmented Generation for Video Understanding	Jun 9, 2025	RAGRetrieval	—Unverified
SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis	Jun 9, 2025	Action ClassificationBenchmarking	—Unverified
Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding	Jun 9, 2025	Contrastive LearningVideo Editing	—Unverified
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision	Jun 6, 2025	Video Understanding	CodeCode Available
Bridging Audio and Vision: Zero-Shot Audiovisual Segmentation by Connecting Pretrained Models	Jun 6, 2025	SegmentationVideo Understanding	—Unverified

Show:10 25 50

← PrevPage 17 of 46Next →

No leaderboard results yet.