SOTAVerified

Audio Tagging

Audio tagging is the task of predicting the tags that apply to an audio clip. It covers music tagging, acoustic scene classification, audio event classification, and related tasks.

Papers

Showing 1-10 of 81 papers

| Title | Status | Hype |
|---|---|---|
| Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4 | | 0 |
| USAD: Universal Speech and Audio Representation via Distillation | | 0 |
| Enhancing Speech Emotion Recognition with Graph-Based Multimodal Fusion and Prosodic Features for the Speech Emotion Recognition in Naturalistic Conditions Challenge at Interspeech 2025 | | 0 |
| M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | Code | 0 |
| Baseline Systems and Evaluation Metrics for Spatial Semantic Segmentation of Sound Scenes | Code | 1 |
| Hierarchical Label Propagation: A Model-Size-Dependent Performance Booster for AudioSet Tagging | Code | 0 |
| Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context | Code | 0 |
| Exploring Performance-Complexity Trade-Offs in Sound Event Detection Models | Code | 1 |
| Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning | Code | 1 |
| Knowledge Distillation for Real-Time Classification of Early Media in Voice Communications | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CAV-MAE (Audio-Visual) | mean average precision | 0.51 | | Unverified |
| 2 | mn40_as (Ensemble) | mean average precision | 0.5 | | Unverified |
| 3 | PaSST | mean average precision | 0.5 | | Unverified |
| 4 | DyMN-L (Audio-Only, Single) | mean average precision | 0.49 | | Unverified |
| 5 | Audio Spectrogram Transformer | mean average precision | 0.49 | | Unverified |
| 6 | mn40_as (Single) | mean average precision | 0.48 | | Unverified |
| 7 | PSLA | mean average precision | 0.47 | | Unverified |
| 8 | ST-SED | mean average precision | 0.47 | | Unverified |
| 9 | CAV-MAE (Audio-Only) | mean average precision | 0.47 | | Unverified |
| 10 | ERANN-1-6 | mean average precision | 0.45 | | Unverified |
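
The benchmark above ranks models by mean average precision (mAP). This page does not state the exact evaluation protocol, so the following is only a minimal sketch of the common macro-averaged definition for multi-label tagging: compute average precision per tag from the ranked clip scores, then average over tags. The function names and the toy labels and scores are hypothetical and are not taken from any listed paper.

```python
from typing import List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one tag: mean of the precision at each positive clip's rank."""
    # Rank clips from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("tag has no positive clips")
    return sum(precisions) / len(precisions)


def mean_average_precision(
    y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]
) -> float:
    """Macro-average of per-tag AP; rows are clips, columns are tags."""
    num_tags = len(y_true[0])
    aps = []
    for t in range(num_tags):
        labels = [row[t] for row in y_true]
        scores = [row[t] for row in y_score]
        aps.append(average_precision(labels, scores))
    return sum(aps) / len(aps)


# Hypothetical toy data: 4 clips, 3 tags (assumption, for illustration only).
y_true = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
y_score = [
    [0.9, 0.2, 0.4],
    [0.3, 0.8, 0.1],
    [0.6, 0.4, 0.7],
    [0.1, 0.5, 0.9],
]
print(round(mean_average_precision(y_true, y_score), 4))  # 0.8889
```

In this toy case the first tag is ranked perfectly (AP of 1.0), while the other two each have one negative clip ranked above a positive one (AP of about 0.83), which gives the macro average of about 0.89.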