SOTAVerified

Activity Detection

Detecting activities in extended videos.
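Detecting activities in long recordings usually means producing time-stamped segments, not single labels. The sketch below illustrates one common post-processing step: thresholding per-frame activity scores and merging consecutive active frames into (start, end) segments. It makes several assumptions. The detector that produces the scores is not specified, and the function name `scores_to_segments` and its `threshold`, `frame_rate` and `min_frames` parameters are hypothetical. It is not the method of any paper listed below.

```python
from typing import List, Optional, Sequence, Tuple


def scores_to_segments(
    scores: Sequence[float],
    threshold: float = 0.5,
    frame_rate: float = 1.0,
    min_frames: int = 1,
) -> List[Tuple[float, float]]:
    """Hypothetical helper: merge consecutive frames whose score is >= threshold
    into (start_seconds, end_seconds) segments; end is exclusive.

    Segments shorter than `min_frames` frames are discarded.
    """
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # index of the first frame of the open segment
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # activity begins
        elif score < threshold and start is not None:
            if i - start >= min_frames:
                segments.append((start / frame_rate, i / frame_rate))
            start = None  # activity ends
    # Close a segment that is still open at the end of the recording.
    if start is not None and len(scores) - start >= min_frames:
        segments.append((start / frame_rate, len(scores) / frame_rate))
    return segments


if __name__ == "__main__":
    frame_scores = [0.1, 0.7, 0.8, 0.2, 0.9, 0.1]
    print(scores_to_segments(frame_scores))  # [(1.0, 3.0), (4.0, 5.0)]
```

Raising `min_frames` suppresses very short detections, which trades recall for fewer spurious segments.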

Papers

Showing 1–50 of 380 papers

Title | Status | Hype
Moshi: a speech-text foundation model for real-time dialogue | Code | 9
pyannote.audio: neural building blocks for speaker diarization | Code | 3
audino: A Modern Annotation Tool for Audio and Speech | Code | 2
AV Taris: Online Audio-Visual Speech Recognition | Code | 1
Low-Latency Speech Separation Guided Diarization for Telephone Conversations | Code | 1
BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications | Code | 1
ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development | Code | 1
VANPY: Voice Analysis Framework | Code | 1
X-Vector based voice activity detection for multi-genre broadcast speech-to-text | Code | 1
MM-ALT: A Multimodal Automatic Lyric Transcription System | Code | 1
InaGVAD : a Challenging French TV and Radio Corpus Annotated for Speech Activity Detection and Speaker Gender Segmentation | Code | 1
WASE: Learning When to Attend for Speaker Extraction in Cocktail Party Environments | Code | 1
Classification of Abnormal Hand Movement for Aiding in Autism Detection: Machine Learning Study | Code | 1
End-to-end speaker segmentation for overlap-aware resegmentation | Code | 1
SG-VAD: Stochastic Gates Based Speech Activity Detection | Code | 1
A semi-supervised methodology for fishing activity detection using the geometry behind the trajectory of multiple vessels | Code | 1
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation | Code | 1
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering | Code | 1
Harvesting Ambient RF for Presence Detection Through Deep Learning | Code | 1
WiFi CSI Based Temporal Activity Detection via Dual Pyramid Network | Code | 1
Learning spectro-temporal representations of complex sounds with parameterized neural networks | Code | 1
Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0 | Code | 1
Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization | Code | 1
NAS-VAD: Neural Architecture Search for Voice Activity Detection | Code | 1
Online speaker diarization of meetings guided by speech separation | Code | 1
HGCN: Harmonic gated compensation network for speech enhancement | Code | 1
A Hybrid CNN-BiLSTM Voice Activity Detector | Code | 1
Exploiting Temporal Side Information in Massive IoT Connectivity | Code | 1
AVASpeech-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-Occurrence | Code | 1
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings | Code | 1
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm | Code | 1
VoxLingua107: a Dataset for Spoken Language Recognition | Code | 1
ROAD: The ROad event Awareness Dataset for Autonomous Driving | Code | 1
An End-to-End Architecture for Keyword Spotting and Voice Activity Detection | Code | 1
Protest Activity Detection and Perceived Violence Estimation from Social Media Images | Code | 0
Activity Detection for Massive Connectivity in Cell-free Networks with Unknown Large-scale Fading, Channel Statistics, Noise Variance, and Activity Probability: A Bayesian Approach | Code | 0
Personal VAD: Speaker-Conditioned Voice Activity Detection | Code | 0
Pre-Equalization Aided Grant-Free Massive Access in Massive MIMO System | Code | 0
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection | Code | 0
A Framework for Adapting Human-Robot Interaction to Diverse User Groups | Code | 0
Optimizing Large Language Models for ESG Activity Detection in Financial Texts | Code | 0
Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay | Code | 0
Personalized Activity Recognition with Deep Triplet Embeddings | Code | 0
RespVAD: Voice Activity Detection via Video-Extracted Respiration Patterns | Code | 0
A Pursuit of Temporal Accuracy in General Activity Detection | Code | 0
Learning Latent Super-Events to Detect Multiple Activities in Videos | Code | 0
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations | Code | 0
Fine-Grained Classroom Activity Detection from Audio with Neural Networks | Code | 0
A Convolutional Neural Network Smartphone App for Real-Time Voice Activity Detection | Code | 0
Gan-Based Joint Activity Detection and Channel Estimation For Grant-free Random Access | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CNN-BiLSTM_best | ROC-AUC | 95.14 | - | Unverified
2 | CNN-BiLSTM_small | ROC-AUC | 95.13 | - | Unverified
3 | SG-VAD (ours) | ROC-AUC | 94.3 | - | Unverified
4 | ADA-VAD | ROC-AUC | 79.1 | - | Unverified
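The benchmark above ranks models by ROC-AUC, and the claimed values appear to be reported as percentages. The following is a minimal sketch of computing ROC-AUC for a frame-level voice activity detector. It is not the evaluation code used by any listed paper. It assumes binary per-frame labels (1 = speech, 0 = non-speech) and real-valued per-frame scores. The function name `roc_auc` is hypothetical. The sketch uses the rank-sum (Mann-Whitney) formulation: ROC-AUC equals the probability that a randomly chosen positive frame scores higher than a randomly chosen negative frame, with ties counted as one half.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: ROC-AUC via the rank-sum (Mann-Whitney U) statistic.

    labels: 1 for speech frames, 0 for non-speech frames (assumed binary).
    scores: detector outputs; higher means "more likely speech".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n = len(scores)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative frame")

    # Assign 1-based ranks in ascending score order; tied scores share
    # the average of the ranks they span.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # U statistic for the positive class, normalised by the number of pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)


if __name__ == "__main__":
    frame_labels = [0, 0, 1, 1]
    frame_scores = [0.1, 0.4, 0.35, 0.8]
    # Reported as a percentage, matching the style of the table above.
    print(f"ROC-AUC: {100 * roc_auc(frame_labels, frame_scores):.2f}")  # ROC-AUC: 75.00
```

The rank-based form runs in O(n log n), unlike comparing every positive-negative pair. Because it depends only on the ordering of scores, it does not require choosing a decision threshold.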