Action Detection

Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given in the form of action tubelets, which are sequences of action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
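
To make the notion of an action tubelet concrete, here is a minimal Python sketch (assuming Python 3.9+) that greedily links per-frame detections of the same class into tubelets by box IoU. The `Tubelet` class, the `link_detections` function, the `(label, box)` input format and the 0.5 IoU threshold are all illustrative assumptions, not the linking procedure of any paper listed on this page; published detectors typically use more elaborate linking that also weighs detection scores. A tubelet's first and last frames are exactly the start and end frames that temporal localization asks for.

```python
from dataclasses import dataclass, field

# A box is (x1, y1, x2, y2) in pixel coordinates (assumed convention).
Box = tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tubelet:
    """Hypothetical container: one action label plus one box per frame it spans."""
    label: str
    boxes: dict[int, Box] = field(default_factory=dict)  # frame index -> box

    @property
    def start(self) -> int:  # first frame of the action
        return min(self.boxes)

    @property
    def end(self) -> int:  # last frame of the action
        return max(self.boxes)


def link_detections(frames: list[list[tuple[str, Box]]],
                    iou_thresh: float = 0.5) -> list[Tubelet]:
    """Greedily link per-frame (label, box) detections into tubelets.

    A detection extends an active tubelet of the same label if it overlaps that
    tubelet's box in the previous frame with IoU >= iou_thresh (the best-overlapping
    unused detection wins). Leftover detections start new tubelets, and a
    tubelet with no matching detection ends.
    """
    finished: list[Tubelet] = []
    active: list[Tubelet] = []
    for t, dets in enumerate(frames):
        unmatched = list(dets)
        next_active: list[Tubelet] = []
        for tube in active:
            prev_box = tube.boxes[t - 1]
            candidates = [d for d in unmatched
                          if d[0] == tube.label and box_iou(prev_box, d[1]) >= iou_thresh]
            if not candidates:
                finished.append(tube)  # no continuation: the tubelet ended at frame t - 1
                continue
            best = max(candidates, key=lambda d: box_iou(prev_box, d[1]))
            tube.boxes[t] = best[1]
            unmatched.remove(best)
            next_active.append(tube)
        # Every detection left over starts a new tubelet at frame t.
        next_active.extend(Tubelet(label, {t: box}) for label, box in unmatched)
        active = next_active
    return finished + active


# Usage: a "run" box drifting right over three frames, plus a "jump" in frame 2.
frames = [
    [("run", (0, 0, 10, 10))],
    [("run", (1, 0, 11, 10))],
    [("run", (2, 0, 12, 10)), ("jump", (50, 50, 60, 60))],
]
for tube in link_detections(frames):
    print(tube.label, tube.start, tube.end)  # prints "run 0 2", then "jump 2 2"
```

Greedy frame-to-frame matching keeps the sketch online (each frame is processed once, in order), which mirrors the streaming setting of the online action detection papers listed here, but it cannot recover from a single missed detection: a one-frame gap splits one action into two tubelets.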

Papers

Showing 101-150 of 817 papers

Title | Status | Hype
Harvesting Ambient RF for Presence Detection Through Deep Learning | Code | 1
HGCN: Harmonic gated compensation network for speech enhancement | Code | 1
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation | Code | 1
Generic Event Boundary Detection: A Benchmark for Event Segmentation | Code | 1
A semi-supervised methodology for fishing activity detection using the geometry behind the trajectory of multiple vessels | Code | 1
InaGVAD : a Challenging French TV and Radio Corpus Annotated for Speech Activity Detection and Speaker Gender Segmentation | Code | 1
Multi-Modal Few-Shot Temporal Action Detection | Code | 1
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings | Code | 1
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation | Code | 1
ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development | Code | 1
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos | Code | 1
Learning spectro-temporal representations of complex sounds with parameterized neural networks | Code | 1
Asynchronous Interaction Aggregation for Action Detection | Code | 1
VANPY: Voice Analysis Framework | Code | 1
Actions as Moving Points | Code | 1
Low-Latency Speech Separation Guided Diarization for Telephone Conversations | Code | 1
WASE: Learning When to Attend for Speaker Extraction in Cocktail Party Environments | Code | 1
Long Short-Term Transformer for Online Action Detection | Code | 1
Memory-and-Anticipation Transformer for Online Action Understanding | Code | 1
AViD Dataset: Anonymized Videos from Diverse Countries | Code | 1
CBR-Net: Cascade Boundary Refinement Network for Action Detection: Submission to ActivityNet Challenge 2020 (Task 1) | Code | 1
MiniROAD: Minimal RNN Framework for Online Action Detection | Code | 1
A Hybrid CNN-BiLSTM Voice Activity Detector | Code | 1
MM-ALT: A Multimodal Automatic Lyric Transcription System | Code | 1
AVASpeech-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-Occurrence | Code | 1
YOWO-Plus: An Incremental Improvement | Code | 1
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions | Code | 1
Context-Enhanced Memory-Refined Transformer for Online Action Detection | Code | 1
Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization | Code | 1
AV Taris: Online Audio-Visual Speech Recognition | Code | 1
ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos | Code | 1
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation | Code | 1
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection | Code | 1
A Multigrid Method for Efficiently Training Video Models | Code | 1
Context-Aware RCNN: A Baseline for Action Detection in Videos | Code | 1
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions | Code | 1
BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications | Code | 1
A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions | Code | 1
ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors | Code | 1
CholecTriplet2021: A benchmark challenge for surgical action triplet recognition | Code | 1
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers | Code | 1
Rescaling Egocentric Vision | Code | 1
Continual Transformers: Redundancy-Free Attention for Online Inference | Code | 1
SVIP: Sequence VerIfication for Procedures in Videos | Code | 1
Continuous control with deep reinforcement learning | Code | 1
BMN: Boundary-Matching Network for Temporal Action Proposal Generation | Code | 1
Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment | Code | 0
Modality Distillation with Multiple Stream Networks for Action Recognition | Code | 0
Automatic detection and prediction of nAMD activity change in retinal OCT using Siamese networks and Wasserstein Distance for ordinality | Code | 0
MaCLR: Motion-aware Contrastive Learning of Representations for Videos | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | STAR/L | Frame-mAP 0.5 | 90.3 | | Unverified
2 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified
3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 | | Unverified
4 | HIT | Frame-mAP 0.5 | 84.8 | | Unverified
5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 | | Unverified
6 | YOWO | Frame-mAP 0.5 | 80.4 | | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 | | Unverified
8 | MOC | Frame-mAP 0.5 | 77.8 | | Unverified
9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 | | Unverified
10 | Two-in-one | Video-mAP 0.2 | 75.48 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified
2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 | | Unverified
3 | HIT | Frame-mAP 0.5 | 83.8 | | Unverified
4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 | | Unverified
5 | DTS | Video-mAP 0.2 | 76.1 | | Unverified
6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 | | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 | | Unverified
8 | YOWO | Frame-mAP 0.5 | 74.4 | | Unverified
9 | MOC | Frame-mAP 0.5 | 74 | | Unverified
10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TTM | mAP | 28.79 | | Unverified
2 | CTRN | mAP | 27.8 | | Unverified
3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 | | Unverified
4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 | | Unverified
5 | PDAN (RGB+Flow) | mAP | 26.5 | | Unverified
6 | PAT | mAP | 26.5 | | Unverified
7 | MS-TCT (RGB only) | mAP | 25.4 | | Unverified
8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 | | Unverified
9 | Coarse-Fine Networks | mAP | 25.1 | | Unverified
10 | I3D + biGRU + VS-ST-MPNN | mAP | 23.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MLAD | mAP | 51.5 | | Unverified
2 | CTRN | mAP | 51.2 | | Unverified
3 | PDAN | mAP | 47.6 | | Unverified
4 | TGM | mAP | 46.4 | | Unverified
5 | MS-TCT (RGB only) | mAP | 43.1 | | Unverified
6 | I3D + our super-event | mAP | 36.4 | | Unverified
7 | Two-stream + LSTM | mAP | 28.1 | | Unverified
8 | Two-stream | mAP | 27.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | | Unverified
2 | DTS | Video-mAP 0.2 | 94.3 | | Unverified
3 | Two-in-one | Video-mAP 0.5 | 92.74 | | Unverified
4 | T-CNN | Frame-mAP 0.5 | 86.7 | | Unverified
5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 | | Unverified
6 | TS R-CNN | Frame-mAP 0.5 | 82.3 | | Unverified
7 | Action Tubes | Frame-mAP 0.5 | 68.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAT (Ours) Trans | mAP | 71.6 | | Unverified
2 | TadML-two stream | mAP | 59.7 | | Unverified
3 | MAT (ours) | mAP | 58.2 | | Unverified
4 | TadML-rgb | mAP | 53.46 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HIT | Frame-mAP 0.5 | 33.3 | | Unverified
2 | SiA | Frame-mAP 0.5 | 28.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MS-TCT | Frame-mAP | 33.7 | | Unverified
2 | PDAN | Frame-mAP | 32.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN | IoU | 0.14 | | Unverified
2 | Two Stream Network | IoU | 0.07 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN-V2 (Vote decision) | IoU | 0.52 | | Unverified
2 | RGB and PRGB | IoU | 0.35 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PAT | mAP | 44.6 | | Unverified
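
Several of the leaderboards above report Frame-mAP 0.5 and Video-mAP 0.2, where the number is the overlap threshold a detection must reach to count as correct. The Python sketch below (assuming Python 3.9+) shows one common way to compute them: Frame-mAP scores individual per-frame boxes against spatial IoU, while Video-mAP scores whole tubes against a spatio-temporal overlap, taken here as the temporal IoU of the two frame ranges multiplied by the mean spatial IoU over the shared frames. Per-class AP is the area under the precision/recall curve with all-point interpolation, and mAP is the mean of the per-class APs. The function names, input formats and greedy matching rule are assumptions for illustration; each benchmark's official evaluation code may differ in matching and interpolation details, so these numbers are not guaranteed to reproduce the claimed results.

```python
# A box is (x1, y1, x2, y2); a tube maps frame index -> box (assumed formats).
Box = tuple[float, float, float, float]
Tube = dict[int, Box]


def box_iou(a: Box, b: Box) -> float:
    """Spatial intersection-over-union of two boxes (the Frame-mAP overlap)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tube_overlap(pred: Tube, gt: Tube) -> float:
    """Spatio-temporal overlap (the Video-mAP overlap assumed here): temporal IoU
    of the two frame ranges times the mean spatial IoU over their shared frames."""
    p0, p1, g0, g1 = min(pred), max(pred), min(gt), max(gt)
    t0, t1 = max(p0, g0), min(p1, g1)
    if t1 < t0:
        return 0.0  # the tubes never coexist in time
    temporal_iou = (t1 - t0 + 1) / (max(p1, g1) - min(p0, g0) + 1)
    # A frame missing from either tube (a gap) contributes zero spatial overlap.
    spatial = [box_iou(pred[f], gt[f]) if f in pred and f in gt else 0.0
               for f in range(t0, t1 + 1)]
    return temporal_iou * sum(spatial) / len(spatial)


def average_precision(scored, gts, overlap, thresh):
    """All-point interpolated AP for one class.

    scored: list of (score, key, detection); gts: dict key -> list of ground truths,
    where key is e.g. (video, frame) for Frame-mAP or a video id for Video-mAP.
    In descending score order, each detection claims the unmatched ground truth
    with the highest overlap >= thresh (true positive), else it is a false positive.
    """
    n_gt = sum(len(v) for v in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {k: [False] * len(v) for k, v in gts.items()}
    tp = fp = 0
    precision, recall = [], []
    for _, key, det in sorted(scored, key=lambda s: s[0], reverse=True):
        best_j, best_o = -1, thresh
        for j, g in enumerate(gts.get(key, [])):
            o = overlap(det, g)
            if not matched[key][j] and o >= best_o:
                best_j, best_o = j, o
        if best_j >= 0:
            matched[key][best_j] = True
            tp += 1
        else:
            fp += 1
        precision.append(tp / (tp + fp))
        recall.append(tp / n_gt)
    # Area under the PR curve, using at each recall step the highest precision
    # reached at that recall or beyond (all-point interpolation).
    ap, prev_r = 0.0, 0.0
    for i, r in enumerate(recall):
        ap += (r - prev_r) * max(precision[i:])
        prev_r = r
    return ap


# Usage (Video-mAP style): one ground-truth tube, one exact hit, one stray tube.
gt_tubes = {"v1": [{0: (0, 0, 10, 10), 1: (0, 0, 10, 10)}]}
dets = [(0.9, "v1", {0: (0, 0, 10, 10), 1: (0, 0, 10, 10)}),
        (0.5, "v1", {5: (0, 0, 10, 10)})]
print(average_precision(dets, gt_tubes, tube_overlap, thresh=0.2))  # prints 1.0
```

Passing the overlap function and threshold as parameters lets the same AP routine serve both protocols: `box_iou` with 0.5 for frame-level evaluation, `tube_overlap` with 0.2 (or 0.5) for video-level evaluation.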