SOTAVerified

Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets: action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
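To make the idea of a tubelet concrete, the following is a minimal Python sketch of how per-frame boxes could be linked across time. Everything in it is an assumption made for illustration: the `Tubelet` class, the `link_detections` function, the `(x1, y1, x2, y2)` box layout, and the greedy IoU-based linking rule are hypothetical and are not taken from any of the papers or benchmarks listed on this page.

```python
from dataclasses import dataclass, field

# Hypothetical box layout: (x1, y1, x2, y2) in pixels.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tubelet:
    """One action instance: a label plus one box per consecutive frame."""
    label: str
    start_frame: int
    boxes: list[Box] = field(default_factory=list)

    @property
    def end_frame(self) -> int:
        return self.start_frame + len(self.boxes) - 1


def link_detections(frames, min_iou=0.5):
    """Greedily link per-frame (label, box) detections into tubelets.

    A detection extends a tubelet if it has the same label and overlaps
    the tubelet's last box with IoU >= min_iou. A tubelet with no match
    in the current frame is closed. This is an illustrative rule only.
    """
    active, finished = [], []
    for t, detections in enumerate(frames):
        unmatched = list(detections)
        still_active = []
        for tube in active:
            best, best_iou = None, min_iou
            for det in unmatched:
                label, box = det
                score = iou(tube.boxes[-1], box)
                if label == tube.label and score >= best_iou:
                    best, best_iou = det, score
            if best is None:
                finished.append(tube)
            else:
                tube.boxes.append(best[1])
                unmatched.remove(best)
                still_active.append(tube)
        # Every detection that extended nothing starts a new tubelet.
        for label, box in unmatched:
            still_active.append(Tubelet(label, t, [box]))
        active = still_active
    return finished + active


# Toy input: four frames, one list of detections per frame.
frames = [
    [("run", (0, 0, 10, 10))],
    [("run", (1, 0, 11, 10))],
    [],
    [("jump", (50, 50, 60, 60))],
]
tubes = link_detections(frames)
for tube in tubes:
    print(tube.label, tube.start_frame, tube.end_frame)
```

In this sketch the spatial part of the answer ("where") is the box sequence, the temporal part ("when") is the `start_frame` to `end_frame` range, and the class ("what") is the label. Temporal localization would keep only the frame range and label, and action recognition only the label.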

Papers

Showing 151-200 of 817 papers

| Title | Status | Hype |
| --- | --- | --- |
| Joint Activity Detection and Channel Estimation for Massive Connectivity: Where Message Passing Meets Score-Based Generative Priors | | 0 |
| Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM | | 0 |
| Robust Activity Detection for Massive Random Access | | 0 |
| Improving endpoint detection in end-to-end streaming ASR for conversational speech | | 0 |
| Multi-Stage Speaker Diarization for Noisy Classrooms | Code | 0 |
| Beyond Pixels: Leveraging the Language of Soccer to Improve Spatio-Temporal Action Detection in Broadcast Videos | | 0 |
| Sensing Framework Design and Performance Optimization with Action Detection for ISCC | | 0 |
| Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection | | 0 |
| MicroNAS: An Automated Framework for Developing a Fall Detection System | | 0 |
| Scaling Open-Vocabulary Action Detection | Code | 0 |
| FDDet: Frequency-Decoupling for Boundary Refinement in Temporal Action Detection | | 0 |
| Temporal Action Detection Model Compression by Progressive Block Drop | | 0 |
| Fast MLE and MAPE-Based Device Activity Detection for Grant-Free Access via PSCA and PSCA-Net | | 0 |
| ACT360: An Efficient 360-Degree Action Detection and Summarization Framework for Mission-Critical Training and Debriefing | | 0 |
| Lightweight Learning for Grant-Free Activity Detection in Cell-Free Massive MIMO Networks | | 0 |
| Federated Learning for Secure and Efficient Device Activity Detection in mMTC Networks | | 0 |
| Robust Learning-Based Sparse Recovery for Device Activity Detection in Grant-Free Random Access Cell-Free Massive MIMO: Enhancing Resilience to Impairments | | 0 |
| CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors | | 0 |
| Optimizing Large Language Models for ESG Activity Detection in Financial Texts | Code | 0 |
| Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems | | 0 |
| Unveiling ECC Vulnerabilities: LSTM Networks for Operation Recognition in Side-Channel Attacks | | 0 |
| Game State and Spatio-temporal Action Detection in Soccer using Graph Neural Networks and 3D Convolutional Networks | | 0 |
| LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems | | 0 |
| FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems | | 0 |
| Unveiling the Power of Complex-Valued Transformers in Wireless Communications | | 0 |
| DT4ECG: A Dual-Task Learning Framework for ECG-Based Human Identity Recognition and Human Activity Detection | | 0 |
| Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge | | 0 |
| When do they StOP?: A First Step Towards Automatically Identifying Team Communication in the Operating Room | Code | 0 |
| Pre-Equalization Aided Grant-Free Massive Access in Massive MIMO System | Code | 0 |
| An Automated Machine Learning Framework for Surgical Suturing Action Detection under Class Imbalance | | 0 |
| Deconstruct Complexity (DeComplex): A Novel Perspective on Tackling Dense Action Detection | | 0 |
| Automatic detection and prediction of nAMD activity change in retinal OCT using Siamese networks and Wasserstein Distance for ordinality | Code | 0 |
| Text-driven Online Action Detection | Code | 0 |
| Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection | | 0 |
| Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining | | 0 |
| Fotheidil: an Automatic Transcription System for the Irish Language | | 0 |
| Action-Agnostic Point-Level Supervision for Temporal Action Detection | Code | 0 |
| Dataset for Real-World Human Action Detection Using FMCW mmWave Radar | | 0 |
| JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts | Code | 0 |
| Stable Mean Teacher for Semi-supervised Video Action Detection | Code | 0 |
| Comparative Analysis of Deep Learning Approaches for Harmful Brain Activity Detection Using EEG | | 0 |
| Asynchronous Random Access in Massive MIMO Systems Facilitated by the Delay-Angle Domain | | 0 |
| Continual Low-Rank Scaled Dot-product Attention | | 0 |
| Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment | Code | 0 |
| Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation | | 0 |
| Transferable Adversarial Attacks against ASR | | 0 |
| A Flexible Framework for Grant-Free Random Access in Cell-Free Massive MIMO Systems | | 0 |
| On the Detection of Non-Cooperative RISs: Scan B-Testing via Deep Support Vector Data Description | | 0 |
| Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization | | 0 |
| Intelligent Video Recording Optimization using Activity Detection for Surveillance Systems | | 0 |

Benchmark Results

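| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | STAR/L | Frame-mAP 0.5 | 90.3 | | Unverified |
| 2 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified |
| 3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 | | Unverified |
| 4 | HIT | Frame-mAP 0.5 | 84.8 | | Unverified |
| 5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 | | Unverified |
| 6 | YOWO | Frame-mAP 0.5 | 80.4 | | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 | | Unverified |
| 8 | MOC | Frame-mAP 0.5 | 77.8 | | Unverified |
| 9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 | | Unverified |
| 10 | Two-in-one | Video-mAP 0.2 | 75.48 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified |
| 2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 | | Unverified |
| 3 | HIT | Frame-mAP 0.5 | 83.8 | | Unverified |
| 4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 | | Unverified |
| 5 | DTS | Video-mAP 0.2 | 76.1 | | Unverified |
| 6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 | | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 | | Unverified |
| 8 | YOWO | Frame-mAP 0.5 | 74.4 | | Unverified |
| 9 | MOC | Frame-mAP 0.5 | 74 | | Unverified |
| 10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TTM | mAP | 28.79 | | Unverified |
| 2 | CTRN | mAP | 27.8 | | Unverified |
| 3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 | | Unverified |
| 4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 | | Unverified |
| 5 | PDAN (RGB+Flow) | mAP | 26.5 | | Unverified |
| 6 | PAT | mAP | 26.5 | | Unverified |
| 7 | MS-TCT (RGB only) | mAP | 25.4 | | Unverified |
| 8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 | | Unverified |
| 9 | Coarse-Fine Networks | mAP | 25.1 | | Unverified |
| 10 | MLAD (RGB + Flow) | mAP | 23.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MLAD | mAP | 51.5 | | Unverified |
| 2 | CTRN | mAP | 51.2 | | Unverified |
| 3 | PDAN | mAP | 47.6 | | Unverified |
| 4 | TGM | mAP | 46.4 | | Unverified |
| 5 | MS-TCT (RGB only) | mAP | 43.1 | | Unverified |
| 6 | I3D + our super-event | mAP | 36.4 | | Unverified |
| 7 | Two-stream + LSTM | mAP | 28.1 | | Unverified |
| 8 | Two-stream | mAP | 27.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | | Unverified |
| 2 | DTS | Video-mAP 0.2 | 94.3 | | Unverified |
| 3 | Two-in-one | Video-mAP 0.5 | 92.74 | | Unverified |
| 4 | T-CNN | Frame-mAP 0.5 | 86.7 | | Unverified |
| 5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 | | Unverified |
| 6 | TS R-CNN | Frame-mAP 0.5 | 82.3 | | Unverified |
| 7 | Action Tubes | Frame-mAP 0.5 | 68.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MAT (Ours) Trans | mAP | 71.6 | | Unverified |
| 2 | TadML-two stream | mAP | 59.7 | | Unverified |
| 3 | MAT (ours) | mAP | 58.2 | | Unverified |
| 4 | TadML-rgb | mAP | 53.46 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | HIT | Frame-mAP 0.5 | 33.3 | | Unverified |
| 2 | SiA | Frame-mAP 0.5 | 28.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MS-TCT | Frame-mAP | 33.7 | | Unverified |
| 2 | PDAN | Frame-mAP | 32.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | STCNN | IoU | 0.14 | | Unverified |
| 2 | Two Stream Network | IoU | 0.07 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | STCNN-V2 (Vote decision) | IoU | 0.52 | | Unverified |
| 2 | RGB and PRGB | IoU | 0.35 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PAT | mAP | 44.6 | | Unverified |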