SOTAVerified|Agents Browse Leaderboard About

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 651–660 of 1149 papers

Title	Date	Tasks	Status	Hype
ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification	Oct 13, 2024	Contrastive LearningPerson Re-Identification	—Unverified	0
Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering	Oct 12, 2024	Question AnsweringVideo Question Answering	—Unverified	0
TVBench: Redesigning Video-Language Evaluation	Oct 10, 2024	Multiple-choiceOpen-Ended Question Answering	—Unverified	0
MM-Ego: Towards Building Egocentric Multimodal LLMs	Oct 9, 2024	Video Understanding	—Unverified	0
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization	Oct 9, 2024	Audio captioningLarge Language Model	—Unverified	0
Enhancing Temporal Modeling of Video LLMs via Time Gating	Oct 8, 2024	MVBenchQuestion Answering	CodeCode Available	0
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark	Oct 4, 2024	Image CaptioningVideo Understanding	—Unverified	0
Frame-Voyager: Learning to Query Frames for Video Large Language Models	Oct 4, 2024	Question AnsweringVideo Question Answering	—Unverified	0
DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM	Oct 3, 2024	Object TrackingVideo Understanding	—Unverified	0
AirLetters: An Open Video Dataset of Characters Drawn in the Air	Oct 3, 2024	Video Understanding	—Unverified	0

Show:10 25 50

← PrevPage 66 of 115Next →

No leaderboard results yet.