SOTAVerified

Action Detection

Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets: action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
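To make the idea of a tubelet concrete, here is a minimal sketch of how one might be represented and compared against a ground-truth annotation. The `Tubelet` class and the `box_iou` and `tube_iou` helpers are hypothetical names introduced for illustration. The overlap definition used here (temporal IoU of the frame sets multiplied by the mean box IoU over shared frames) is one common convention; individual benchmarks may define it differently.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed box format: (x1, y1, x2, y2) corner coordinates in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Tubelet:
    """One action instance: a class label plus one box per frame index."""
    label: str
    boxes: Dict[int, Box]  # frame index -> bounding box


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(pred: Tubelet, gt: Tubelet) -> float:
    """Spatio-temporal overlap (assumed convention, see lead-in):
    temporal IoU of the frame sets times mean box IoU over shared frames."""
    shared = pred.boxes.keys() & gt.boxes.keys()
    if not shared:
        return 0.0
    all_frames = pred.boxes.keys() | gt.boxes.keys()
    temporal = len(shared) / len(all_frames)
    spatial = sum(box_iou(pred.boxes[f], gt.boxes[f]) for f in shared) / len(shared)
    return temporal * spatial


# Illustrative data: identical boxes, but the prediction starts 4 frames early
# and ends 4 frames early, so only frames 4..7 are shared out of 0..11.
pred = Tubelet("run", {f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 8)})
gt = Tubelet("run", {f: (0.0, 0.0, 10.0, 10.0) for f in range(4, 12)})
overlap = tube_iou(pred, gt)
```

Under this convention, a predicted tubelet would count as correct at a given threshold only if its label matches the ground truth and its overlap meets that threshold. Temporal localization corresponds to scoring the frame sets alone, and action recognition to scoring the label alone.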

Papers

Showing 1-10 of 817 papers

Title | Status | Hype
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans | - | 0
CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment | - | 0
Distributed Activity Detection for Cell-Free Hybrid Near-Far Field Communications | - | 0
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm | Code | 1
Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion | - | 0
Joint Activity Detection and Channel Estimation for Massive Connectivity: Where Message Passing Meets Score-Based Generative Priors | - | 0
Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM | - | 0
Robust Activity Detection for Massive Random Access | - | 0
Improving endpoint detection in end-to-end streaming ASR for conversational speech | - | 0
Multi-Stage Speaker Diarization for Noisy Classrooms | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | - | Unverified
2 | DTS | Video-mAP 0.2 | 94.3 | - | Unverified
3 | Two-in-one | Video-mAP 0.5 | 92.74 | - | Unverified
4 | T-CNN | Frame-mAP 0.5 | 86.7 | - | Unverified
5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 | - | Unverified
6 | TS R-CNN | Frame-mAP 0.5 | 82.3 | - | Unverified
7 | Action Tubes | Frame-mAP 0.5 | 68.1 | - | Unverified