Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 376–400 of 1149 papers

Title	Date	Tasks	Status	Hype
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI	Oct 15, 2024	Question Answering, Video Question Answering	Code Available	2
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models	Oct 14, 2024	2k, Benchmarking	Code Available	1
Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs	Oct 14, 2024	Computational Efficiency, Question Answering	Code Available	2
ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification	Oct 13, 2024	Contrastive Learning, Person Re-Identification	Unverified	0
Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering	Oct 12, 2024	Question Answering, Video Question Answering	Unverified	0
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding	Oct 11, 2024	Hallucination, Moment Retrieval	Code Available	1
TVBench: Redesigning Video-Language Evaluation	Oct 10, 2024	Multiple-choice, Open-Ended Question Answering	Unverified	0
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization	Oct 9, 2024	Audio captioning, Large Language Model	Unverified	0
MM-Ego: Towards Building Egocentric Multimodal LLMs	Oct 9, 2024	Video Understanding	Unverified	0
Enhancing Temporal Modeling of Video LLMs via Time Gating	Oct 8, 2024	MVBench, Question Answering	Code Available	0
TRACE: Temporal Grounding Video LLM via Causal Event Modeling	Oct 8, 2024	Text Generation, Video Understanding	Code Available	2
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference	Oct 6, 2024	Language Modeling, Language Modelling	Code Available	3
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark	Oct 4, 2024	Image Captioning, Video Understanding	Unverified	0
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models	Oct 4, 2024	Dense Video Captioning, Sentence	Code Available	2
Frame-Voyager: Learning to Query Frames for Video Large Language Models	Oct 4, 2024	Question Answering, Video Question Answering	Unverified	0
AirLetters: An Open Video Dataset of Characters Drawn in the Air	Oct 3, 2024	Video Understanding	Unverified	0
DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM	Oct 3, 2024	Object Tracking, Video Understanding	Unverified	0
Deep learning for action spotting in association football videos	Oct 2, 2024	Action Spotting, Benchmarking	Unverified	0
UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark	Oct 2, 2024	Unusual Activity Localization, Video Understanding	Code Available	0
ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding	Oct 1, 2024	Contrastive Learning, Hallucination	Code Available	0
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs	Sep 30, 2024	Benchmarking, Multiple-choice	Unverified	0
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning	Sep 30, 2024	Mixture-of-Experts, Optical Character Recognition (OCR)	Unverified	0
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs	Sep 30, 2024	EgoSchema, Language Modelling	Code Available	1
Visual Context Window Extension: A New Perspective for Long Video Understanding	Sep 30, 2024	Video Understanding	Unverified	0
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks	Sep 27, 2024	Action Detection, Action Segmentation	Unverified	0

