InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action Classification; Action Recognition | Code Available 75
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Oct 3, 2023 | Audio Classification; Contrastive Learning | Code Available 45
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | Jan 7, 2024 | Audio Classification; Self-Supervised Learning | Code Available 35
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | Nov 14, 2023 | Acoustic Scene Classification; Audio captioning | Code Available 35
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | May 18, 2023 | 1 Image, 2*2 Stitchi; Action Classification | Code Available 35
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification | Mar 13, 2022 | Audio Classification; Knowledge Distillation | Code Available 35
Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics | Apr 25, 2024 | Audio Classification; Transfer Learning | Code Available 35
Contrastive Audio-Visual Masked Autoencoder | Oct 2, 2022 | Audio Classification; Audio Tagging | Code Available 25
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | May 20, 2024 | Audio Classification; GPU | Code Available 25
AST: Audio Spectrogram Transformer | Apr 5, 2021 | Audio Classification; Audio Tagging | Code Available 25
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation | Nov 9, 2022 | Audio Classification; Audio Tagging | Code Available 25
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics | Mar 15, 2024 | Audio Classification; Classification | Code Available 25
Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition | Jan 4, 2024 | Attribute; Audio Classification | Code Available 25
Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio | Feb 14, 2024 | Audio Classification; Decoder | Code Available 25
Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models | Oct 24, 2023 | Audio Classification; Audio Tagging | Code Available 25
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning | Jun 5, 2024 | Audio Classification; Classification | Code Available 25
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models | Jan 16, 2023 | Audio Classification; Few-Shot Learning | Code Available 25
SSAST: Self-Supervised Audio Spectrogram Transformer | Oct 19, 2021 | Audio Classification; Classification | Code Available 25
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | Feb 2, 2022 | Audio Classification; Event Detection | Code Available 25
Benchmarking Representations for Speech, Music, and Acoustic Events | May 2, 2024 | Audio Classification; Benchmarking | Code Available 25
Global birdsong embeddings enable superior transfer learning for bioacoustic classification | Jul 12, 2023 | Audio Classification; Decision Making | Code Available 25
Federated Self-Training for Semi-Supervised Audio Recognition | Jul 14, 2021 | Audio Classification; Federated Learning | Code Available 15
CRNNs for Urban Sound Tagging with spatiotemporal context | Aug 24, 2020 | Audio Classification; Audio Tagging | Code Available 15
Few-shot Class-incremental Audio Classification Using Stochastic Classifier | Jun 3, 2023 | Audio Classification; Classification | Code Available 15
Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks | Jan 6, 2022 | Audio Classification; Classification | Code Available 15
Adaptive Differential Denoising for Respiratory Sounds Classification | Jun 3, 2025 | Audio Classification; Classification | Code Available 15
Continual Transformers: Redundancy-Free Attention for Online Inference | Jan 17, 2022 | Action Detection; Audio Classification | Code Available 15
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer | Dec 16, 2021 | Audio Classification; Audio Tagging | Code Available 15
animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics | Jun 3, 2024 | Audio Classification; Benchmarking | Code Available 15
CNN Architectures for Large-Scale Audio Classification | Sep 29, 2016 | Audio Classification; Event Detection | Code Available 15
AUCO ResNet: an end-to-end network for Covid-19 pre-screening from cough and breath | Mar 15, 2022 | 8k; Audio Classification | Code Available 15
CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition | Oct 18, 2023 | Audio Classification; Contrastive Learning | Code Available 15
Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions | Jun 23, 2024 | Audio Classification; Parkinson Detection from Speech | Code Available 15
Fluctuation-driven initialization for spiking neural network training | Jun 21, 2022 | Audio Classification | Code Available 15
ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions | Jul 11, 2024 | All; Audio Classification | Code Available 15
EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use | Jul 12, 2022 | Audio Classification; Classification | Code Available 15
End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | Apr 25, 2022 | Audio Classification; Classification | Code Available 15
Acoustic Prompt Tuning: Empowering Large Language Models with Audition Capabilities | Nov 30, 2023 | Audio Classification; Few-Shot Audio Classification | Code Available 15
Effective Audio Classification Network Based on Paired Inverse Pyramid Structure and Dense MLP Block | Nov 5, 2022 | Audio Classification; Classification | Code Available 15
Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance | Nov 11, 2023 | Audio Classification; Sound Classification | Code Available 15
Environmental Sound Classification on the Edge: A Pipeline for Deep Acoustic Networks on Extremely Resource-Constrained Devices | Mar 5, 2021 | Audio Classification; Environmental Sound Classification | Code Available 15
A surrogate gradient spiking baseline for speech command recognition | Aug 22, 2022 | Audio Classification; Time Series | Code Available 15
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models | Apr 9, 2024 | Audio Classification; Generalized Zero-Shot Learning | Code Available 15
ATST: Audio Representation Learning with Teacher-Student Transformer | Apr 26, 2022 | Audio Classification; Instrument Recognition | Code Available 15
BEATs: Audio Pre-Training with Acoustic Tokenizers | Dec 18, 2022 | Audio Classification; Self-Supervised Learning | Code Available 15
Efficient Training of Audio Transformers with Patchout | Oct 11, 2021 | Acoustic Scene Classification; Audio Classification | Code Available 15
Device-Robust Acoustic Scene Classification via Impulse Response Augmentation | May 12, 2023 | Acoustic Scene Classification; Audio Classification | Code Available 15
BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification | Jun 10, 2024 | Audio Classification; Sound Classification | Code Available 15
A Spatio-temporal Deep Learning Approach for Underwater Acoustic Signals Classification | Nov 24, 2022 | Audio Classification; Classification | Code Available 15
Audio Tagging on an Embedded Hardware Platform | Jun 15, 2023 | Audio Classification; Audio Tagging | Code Available 15