SOTAVerified

Activity Detection

Detecting activities in extended videos.

Papers

Showing 1-50 of 380 papers

Title | Status | Hype
CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment |  | 0
Distributed Activity Detection for Cell-Free Hybrid Near-Far Field Communications |  | 0
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm | Code | 1
Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion |  | 0
Joint Activity Detection and Channel Estimation for Massive Connectivity: Where Message Passing Meets Score-Based Generative Priors |  | 0
Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM |  | 0
Robust Activity Detection for Massive Random Access |  | 0
Improving endpoint detection in end-to-end streaming ASR for conversational speech |  | 0
Multi-Stage Speaker Diarization for Noisy Classrooms | Code | 0
MicroNAS: An Automated Framework for Developing a Fall Detection System |  | 0
Fast MLE and MAPE-Based Device Activity Detection for Grant-Free Access via PSCA and PSCA-Net |  | 0
Federated Learning for Secure and Efficient Device Activity Detection in mMTC Networks |  | 0
Lightweight Learning for Grant-Free Activity Detection in Cell-Free Massive MIMO Networks |  | 0
Robust Learning-Based Sparse Recovery for Device Activity Detection in Grant-Free Random Access Cell-Free Massive MIMO: Enhancing Resilience to Impairments |  | 0
CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors |  | 0
Optimizing Large Language Models for ESG Activity Detection in Financial Texts | Code | 0
Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems |  | 0
Unveiling ECC Vulnerabilities: LSTM Networks for Operation Recognition in Side-Channel Attacks |  | 0
LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems |  | 0
FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems |  | 0
VANPY: Voice Analysis Framework | Code | 1
Unveiling the Power of Complex-Valued Transformers in Wireless Communications |  | 0
DT4ECG: A Dual-Task Learning Framework for ECG-Based Human Identity Recognition and Human Activity Detection |  | 0
Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge |  | 0
When do they StOP?: A First Step Towards Automatically Identifying Team Communication in the Operating Room | Code | 0
Pre-Equalization Aided Grant-Free Massive Access in Massive MIMO System | Code | 0
Automatic detection and prediction of nAMD activity change in retinal OCT using Siamese networks and Wasserstein Distance for ordinality | Code | 0
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection |  | 0
Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining |  | 0
Fotheidil: an Automatic Transcription System for the Irish Language |  | 0
WiFi CSI Based Temporal Activity Detection via Dual Pyramid Network | Code | 1
Comparative Analysis of Deep Learning Approaches for Harmful Brain Activity Detection Using EEG |  | 0
Asynchronous Random Access in Massive MIMO Systems Facilitated by the Delay-Angle Domain |  | 0
Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment | Code | 0
Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation |  | 0
A Flexible Framework for Grant-Free Random Access in Cell-Free Massive MIMO Systems |  | 0
Transferable Adversarial Attacks against ASR |  | 0
On the Detection of Non-Cooperative RISs: Scan B-Testing via Deep Support Vector Data Description |  | 0
User Activity Detection with Delay-Calibration for Asynchronous Massive Random Access |  | 0
Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization |  | 0
Intelligent Video Recording Optimization using Activity Detection for Surveillance Systems |  | 0
CLIP-VAD: Exploiting Vision-Language Models for Voice Activity Detection |  | 0
A Framework for Adapting Human-Robot Interaction to Diverse User Groups | Code | 0
Investigation of Speaker Representation for Target-Speaker Speech Processing |  | 0
Raising the Bar(ometer): Identifying a User's Stair and Lift Usage Through Wearable Sensor Data Analysis |  | 0
Moshi: a speech-text foundation model for real-time dialogue | Code | 9
M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses |  | 0
TCG CREST System Description for the Second DISPLACE Challenge |  | 0
A Comprehensive Methodological Survey of Human Activity Recognition Across Divers Data Modalities |  | 0
NTT Multi-Speaker ASR System for the DASR Task of CHiME-8 Challenge |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CNN-BiLSTM_best | ROC-AUC | 95.14 |  | Unverified
2 | CNN-BiLSTM_small | ROC-AUC | 95.13 |  | Unverified
3 | SG-VAD (ours) | ROC-AUC | 94.3 |  | Unverified
4 | ADA-VAD | ROC-AUC | 79.1 |  | Unverified