SOTAVerified|Agents Browse Leaderboard About

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 651–675 of 1149 papers

Title	Date	Tasks	Status
ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification	Oct 13, 2024	Contrastive LearningPerson Re-Identification	—Unverified
Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering	Oct 12, 2024	Question AnsweringVideo Question Answering	—Unverified
TVBench: Redesigning Video-Language Evaluation	Oct 10, 2024	Multiple-choiceOpen-Ended Question Answering	—Unverified
MM-Ego: Towards Building Egocentric Multimodal LLMs	Oct 9, 2024	Video Understanding	—Unverified
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization	Oct 9, 2024	Audio captioningLarge Language Model	—Unverified
Enhancing Temporal Modeling of Video LLMs via Time Gating	Oct 8, 2024	MVBenchQuestion Answering	CodeCode Available
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark	Oct 4, 2024	Image CaptioningVideo Understanding	—Unverified
Frame-Voyager: Learning to Query Frames for Video Large Language Models	Oct 4, 2024	Question AnsweringVideo Question Answering	—Unverified
DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM	Oct 3, 2024	Object TrackingVideo Understanding	—Unverified
AirLetters: An Open Video Dataset of Characters Drawn in the Air	Oct 3, 2024	Video Understanding	—Unverified
Deep learning for action spotting in association football videos	Oct 2, 2024	Action SpottingBenchmarking	—Unverified
UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark	Oct 2, 2024	Unusual Activity LocalizationVideo Understanding	CodeCode Available
ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding	Oct 1, 2024	Contrastive LearningHallucination	CodeCode Available
Visual Context Window Extension: A New Perspective for Long Video Understanding	Sep 30, 2024	Video Understanding	—Unverified
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning	Sep 30, 2024	Mixture-of-ExpertsOptical Character Recognition (OCR)	—Unverified
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs	Sep 30, 2024	BenchmarkingMultiple-choice	—Unverified
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks	Sep 27, 2024	Action DetectionAction Segmentation	—Unverified
EAGLE: Egocentric AGgregated Language-video Engine	Sep 26, 2024	Action RecognitionActivity Recognition	—Unverified
LLM4Brain: Training a Large Language Model for Brain Video Understanding	Sep 26, 2024	Domain AdaptationLanguage Modeling	—Unverified
Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP	Sep 23, 2024	Image GenerationQuestion Answering	—Unverified
First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge	Sep 20, 2024	Multiple-choiceQuestion Answering	—Unverified
Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder	Sep 20, 2024	Activity RecognitionDiagnostic	—Unverified
Interpretable Action Recognition on Hard to Classify Actions	Sep 19, 2024	Action RecognitionDepth Estimation	—Unverified
AMEGO: Active Memory from long EGOcentric videos	Sep 17, 2024	Video Understanding	—Unverified
HAVANA: Hierarchical stochastic neighbor embedding for Accelerated Video ANnotAtions	Sep 16, 2024	Dimensionality ReductionVideo Understanding	—Unverified

Show:10 25 50

← PrevPage 27 of 46Next →

No leaderboard results yet.