Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 751–800 of 1149 papers

Title	Date	Tasks	Status
Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention	Apr 10, 2024	Action Anticipation, Graph Neural Network	Unverified
Koala: Key frame-conditioned long video-LLM	Apr 5, 2024	Action Recognition, Question Answering	Unverified
BioVL-QR: Egocentric Biochemical Vision-and-Language Dataset Using Micro QR Codes	Apr 4, 2024	Object, Video Understanding	Unverified
OW-VISCapTor: Abstractors for Open-World Video Instance Segmentation and Captioning	Apr 4, 2024	Descriptive, Diversity	Unverified
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding	Apr 2, 2024	Highlight Detection, Moment Retrieval	Unverified
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding	Mar 31, 2024	Highlight Detection, Moment Retrieval	Unverified
Instrument-tissue Interaction Detection Framework for Surgical Video Understanding	Mar 30, 2024	Video Understanding	Unverified
A Unified Framework for Human-centric Point Cloud Video Understanding	Mar 29, 2024	3D Pose Estimation, Action Recognition	Unverified
Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality	Mar 28, 2024	Data Augmentation, Diversity	Code available
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding	Mar 24, 2024	Dense Video Captioning, Temporal Localization	Unverified
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding	Mar 21, 2024	Pose Estimation, Video Understanding	Code available
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding	Mar 18, 2024	EgoSchema, Video Understanding	Unverified
Don't Judge by the Look: Towards Motion Coherent Video Representation	Mar 14, 2024	Data Augmentation, Object Recognition	Code available
Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions	Mar 11, 2024	counterfactual, Video Editing	Unverified
A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives	Mar 5, 2024	Video Understanding	Unverified
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies	Mar 3, 2024	Text Generation, Video Understanding	Unverified
Abductive Ego-View Accident Video Understanding for Safe Driving Perception	Mar 1, 2024	Object, object-detection	Unverified
TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning	Feb 29, 2024	Question Answering, Video Understanding	Unverified
LLMs Meet Long Video: Advancing Long Video Question Answering with An Interactive Visual Adapter in LLMs	Feb 21, 2024	Question Answering, Video Question Answering	Unverified
Slot-VLM: SlowFast Slots for Video-Language Modeling	Feb 20, 2024	Language Modeling, Language Modelling	Unverified
VideoPrism: A Foundational Visual Encoder for Video Understanding	Feb 20, 2024	Question Answering, Video Question Answering	Unverified
Dynamics Based Neural Encoding with Inter-Intra Region Connectivity	Feb 19, 2024	Video Understanding	Unverified
Are you Struggling? Dataset and Baselines for Struggle Determination in Assembly Videos	Feb 16, 2024	Decision Making, Video Understanding	Code available
Memory Consolidation Enables Long-Context Video Understanding	Feb 8, 2024	EgoSchema, Video Understanding	Unverified
A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming	Jan 30, 2024	Video Generation, Video Understanding	Unverified
Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model	Jan 29, 2024	Action Detection, Action Localization	Unverified
Exploring Missing Modality in Multimodal Egocentric Datasets	Jan 21, 2024	Action Recognition, Video Understanding	Unverified
Learning to Visually Connect Actions and their Effects	Jan 19, 2024	Object Tracking, Task Planning	Unverified
CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding	Jan 17, 2024	Contrastive Learning, point cloud video understanding	Unverified
Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization	Jan 16, 2024	Decoder, Denoising	Unverified
Dr^2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning	Jan 8, 2024	object-detection, Object Detection	Code available
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding	Jan 1, 2024	Spatio-Temporal Video Grounding, Video Grounding	Unverified
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning	Jan 1, 2024	object-detection, Object Detection	Unverified
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action	Jan 1, 2024	Image Generation, Instruction Following	Unverified
Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning	Jan 1, 2024	Transfer Learning, Video Understanding	Unverified
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding	Dec 31, 2023	Spatio-Temporal Video Grounding, Video Grounding	Unverified
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision	Dec 20, 2023	Action Classification, Attribute	Unverified
Text-Conditioned Resampler For Long Form Video Understanding	Dec 19, 2023	EgoSchema, Form	Unverified
Learning Object State Changes in Videos: An Open-World Perspective	Dec 19, 2023	Video Understanding	Unverified
Artificial intelligence optical hardware empowers high-resolution hyperspectral video understanding at 1.2 Tb/s	Dec 17, 2023	Semantic Segmentation, Video Semantic Segmentation	Unverified
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer	Dec 12, 2023	Action Recognition, Action Segmentation	Code available
Audio-Visual LLM for Video Understanding	Dec 11, 2023	AudioCaps, Language Modeling	Unverified
MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding	Dec 8, 2023	Form, Question Answering	Unverified
Retrieval-based Video Language Model for Efficient Long Video Question Answering	Dec 8, 2023	Language Modeling, Language Modelling	Unverified
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding	Dec 5, 2023	Diversity, Graph Generation	Unverified
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding	Dec 4, 2023	Language Modeling, Language Modelling	Unverified
Zero-Shot Video Question Answering with Procedural Programs	Dec 1, 2023	Code Generation, Language Modeling	Unverified
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding	Nov 30, 2023	Form, Video Retrieval	Unverified
Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation	Nov 30, 2023	Contrastive Learning, Domain Adaptation	Unverified
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation	Nov 25, 2023	Instruction Following, Language Modeling	Unverified

No leaderboard results yet.