SOTAVerified

Activity Detection

Detecting activities in extended videos.

Papers

Showing 150 of 380 papers

Title | Status | Hype
Moshi: a speech-text foundation model for real-time dialogue | Code | 9
pyannote.audio: neural building blocks for speaker diarization | Code | 3
audino: A Modern Annotation Tool for Audio and Speech | Code | 2
AV Taris: Online Audio-Visual Speech Recognition | Code | 1
WASE: Learning When to Attend for Speaker Extraction in Cocktail Party Environments | Code | 1
BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications | Code | 1
X-Vector based voice activity detection for multi-genre broadcast speech-to-text | Code | 1
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering | Code | 1
Learning spectro-temporal representations of complex sounds with parameterized neural networks | Code | 1
MM-ALT: A Multimodal Automatic Lyric Transcription System | Code | 1
HGCN: Harmonic gated compensation network for speech enhancement | Code | 1
Harvesting Ambient RF for Presence Detection Through Deep Learning | Code | 1
Exploiting Temporal Side Information in Massive IoT Connectivity | Code | 1
End-to-end speaker segmentation for overlap-aware resegmentation | Code | 1
ROAD: The ROad event Awareness Dataset for Autonomous Driving | Code | 1
AVASpeech-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-Occurrence | Code | 1
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation | Code | 1
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings | Code | 1
Classification of Abnormal Hand Movement for Aiding in Autism Detection: Machine Learning Study | Code | 1
WiFi CSI Based Temporal Activity Detection via Dual Pyramid Network | Code | 1
InaGVAD : a Challenging French TV and Radio Corpus Annotated for Speech Activity Detection and Speaker Gender Segmentation | Code | 1
Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization | Code | 1
Online speaker diarization of meetings guided by speech separation | Code | 1
SG-VAD: Stochastic Gates Based Speech Activity Detection | Code | 1
Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0 | Code | 1
NAS-VAD: Neural Architecture Search for Voice Activity Detection | Code | 1
A Hybrid CNN-BiLSTM Voice Activity Detector | Code | 1
VoxLingua107: a Dataset for Spoken Language Recognition | Code | 1
VANPY: Voice Analysis Framework | Code | 1
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm | Code | 1
ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development | Code | 1
A semi-supervised methodology for fishing activity detection using the geometry behind the trajectory of multiple vessels | Code | 1
Low-Latency Speech Separation Guided Diarization for Telephone Conversations | Code | 1
An End-to-End Architecture for Keyword Spotting and Voice Activity Detection | Code | 1
Personal VAD: Speaker-Conditioned Voice Activity Detection | Code | 0
Activity Detection for Massive Connectivity in Cell-free Networks with Unknown Large-scale Fading, Channel Statistics, Noise Variance, and Activity Probability: A Bayesian Approach | Code | 0
Pre-Equalization Aided Grant-Free Massive Access in Massive MIMO System | Code | 0
Optimizing Large Language Models for ESG Activity Detection in Financial Texts | Code | 0
Personalized Activity Recognition with Deep Triplet Embeddings | Code | 0
Protest Activity Detection and Perceived Violence Estimation from Social Media Images | Code | 0
A Framework for Adapting Human-Robot Interaction to Diverse User Groups | Code | 0
Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay | Code | 0
Long-term Conversation Analysis: Exploring Utility and Privacy | Code | 0
Argus: Efficient Activity Detection System for Extended Video Analysis | Code | 0
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations | Code | 0
A Pursuit of Temporal Accuracy in General Activity Detection | Code | 0
Learning Latent Super-Events to Detect Multiple Activities in Videos | Code | 0
Fine-Grained Classroom Activity Detection from Audio with Neural Networks | Code | 0
A Convolutional Neural Network Smartphone App for Real-Time Voice Activity Detection | Code | 0
FunASR: A Fundamental End-to-End Speech Recognition Toolkit | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CNN-BiLSTM_best | ROC-AUC | 95.14 | | Unverified
2 | CNN-BiLSTM_small | ROC-AUC | 95.13 | | Unverified
3 | SG-VAD (ours) | ROC-AUC | 94.3 | | Unverified
4 | ADA-VAD | ROC-AUC | 79.1 | | Unverified