Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 451–500 of 1149 papers

Title	Date	Tasks	Status	Hype
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model	Jul 9, 2024	Video Understanding	Code Available	0
Rethinking Image-to-Video Adaptation: An Object-centric Perspective	Jul 9, 2024	Action Recognition, Object	Unverified	0
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision	Jul 8, 2024	Action Quality Assessment, Descriptive	Code Available	2
MMAD: Multi-label Micro-Action Detection in Videos	Jul 7, 2024	Action Analysis, Action Detection	Code Available	1
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding	Jul 6, 2024	Video Understanding	Unverified	0
KeyVideoLLM: Towards Large-scale Video Keyframe Selection	Jul 3, 2024	Data Compression, Management	Unverified	0
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output	Jul 3, 2024	Articles, Image Comprehension	Unverified	0
Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs	Jul 2, 2024	Video Understanding	Unverified	0
https://arxiv.org/abs/2407.00634	Jul 2, 2024	Video Captioning, Video Description	Code Available	0
Tarsier: Recipes for Training and Evaluating Large Video Description Models	Jun 30, 2024	Video Captioning, Video Description	Code Available	4
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding	Jun 28, 2024	Multiple-choice, Video Understanding	Code Available	1
Snakes and Ladders: Two Steps Up for VideoMamba	Jun 27, 2024	Action Recognition, Mamba	Code Available	1
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding	Jun 27, 2024	Decoder, Segmentation	Code Available	5
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads	Jun 27, 2024	Diversity, image-classification	Code Available	1
Zero-Shot Long-Form Video Understanding through Screenplay	Jun 25, 2024	Form, Question Answering	Unverified	0
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results	Jun 24, 2024	Segmentation, Semantic Segmentation	Code Available	4
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models	Jun 24, 2024	Hallucination, Video Understanding	Unverified	0
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer	Jun 24, 2024	AI Agent, Large Language Model	Code Available	2
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models	Jun 22, 2024	Diversity, Language Modeling	Code Available	0
Towards Event-oriented Long Video Understanding	Jun 20, 2024	Video Understanding	Code Available	1
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding	Jun 20, 2024	Form, Video Understanding	Unverified	0
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset	Jun 19, 2024	Language Modeling, Language Modelling	Unverified	0
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding	Jun 19, 2024	Question Answering, Spatial Reasoning	Code Available	1
GVT2RPM: An Empirical Study for General Video Transformer Adaptation to Remote Physiological Measurement	Jun 19, 2024	Video Understanding	Unverified	0
DrVideo: Document Retrieval Based Long Video Understanding	Jun 18, 2024	document understanding, EgoSchema	Unverified	0
Slot State Space Models	Jun 18, 2024	Mamba, State Space Models	Code Available	1
Hallucination Mitigation Prompts Long-term Video Understanding	Jun 17, 2024	Answer Generation, Hallucination	Code Available	0
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning	Jun 17, 2024	Anomaly Detection, Logical Reasoning	Code Available	1
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment	Jun 16, 2024	Action Understanding, Benchmarking	Unverified	0
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model	Jun 15, 2024	Question Answering, Video Understanding	Code Available	0
Localizing Events in Videos with Multimodal Queries	Jun 14, 2024	Natural Language Queries, Video Understanding	Unverified	0
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding	Jun 14, 2024	Activity Recognition, MMR total	Unverified	0
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living	Jun 13, 2024	Benchmarking, Human-Object Interaction Detection	Unverified	0
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs	Jun 13, 2024	Benchmarking, Question Answering	Code Available	2
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	Jun 13, 2024	Dense Video Captioning, MVBench	Code Available	3
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams	Jun 12, 2024	cross-modal alignment, Language Modelling	Code Available	3
LVBench: An Extreme Long Video Understanding Benchmark	Jun 12, 2024	Decision Making, Video Understanding	Code Available	2
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos	Jun 12, 2024	counterfactual, Future prediction	Code Available	1
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models	Jun 12, 2024	Video Understanding	Unverified	0
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD	Jun 11, 2024	Video Recognition, Video Understanding	Unverified	0
Vript: A Video Is Worth Thousands of Words	Jun 10, 2024	Video Captioning, Video Understanding	Code Available	2
1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation	Jun 8, 2024	Benchmarking, Instance Segmentation	Unverified	0
Semantic Segmentation on VSPW Dataset through Masked Video Consistency	Jun 7, 2024	Semantic Segmentation, Video Understanding	Unverified	0
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions	Jun 6, 2024	Video Captioning, Video Generation	Code Available	5
3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation	Jun 6, 2024	Panoptic Segmentation, Segmentation	Unverified	0
MLVU: Benchmarking Multi-task Long Video Understanding	Jun 6, 2024	Benchmarking, Video Understanding	Code Available	3
Contrastive Language Video Time Pre-training	Jun 4, 2024	Action Recognition, Contrastive Learning	Unverified	0
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos	Jun 3, 2024	Mistake Detection, Online Mistake Detection	Code Available	1
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model	Jun 1, 2024	Action Recognition, Activity Recognition	Unverified	0
2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation	Jun 1, 2024	Autonomous Driving, Panoptic Segmentation	Unverified	0

