InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action Classification; Action Recognition | Code available (7)
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Oct 3, 2023 | Audio Classification; Contrastive Learning | Code available (4)
Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics | Apr 25, 2024 | Audio Classification; Transfer Learning | Code available (3)
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | Jan 7, 2024 | Audio Classification; Self-Supervised Learning | Code available (3)
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | Nov 14, 2023 | Acoustic Scene Classification; Audio captioning | Code available (3)
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | May 18, 2023 | 1 Image, 2*2 Stitchi; Action Classification | Code available (3)
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification | Mar 13, 2022 | Audio Classification; Knowledge Distillation | Code available (3)
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning | Jun 5, 2024 | Audio Classification; Classification | Code available (2)
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | May 20, 2024 | Audio Classification; GPU | Code available (2)
Benchmarking Representations for Speech, Music, and Acoustic Events | May 2, 2024 | Audio Classification; Benchmarking | Code available (2)
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics | Mar 15, 2024 | Audio Classification; Classification | Code available (2)
Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio | Feb 14, 2024 | Audio Classification; Decoder | Code available (2)
Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition | Jan 4, 2024 | Attribute; Audio Classification | Code available (2)
Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models | Oct 24, 2023 | Audio Classification; Audio Tagging | Code available (2)
Global birdsong embeddings enable superior transfer learning for bioacoustic classification | Jul 12, 2023 | Audio Classification; Decision Making | Code available (2)
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models | Jan 16, 2023 | Audio Classification; Few-Shot Learning | Code available (2)
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation | Nov 9, 2022 | Audio Classification; Audio Tagging | Code available (2)
Contrastive Audio-Visual Masked Autoencoder | Oct 2, 2022 | Audio Classification; Audio Tagging | Code available (2)
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | Feb 2, 2022 | Audio Classification; Event Detection | Code available (2)
SSAST: Self-Supervised Audio Spectrogram Transformer | Oct 19, 2021 | Audio Classification; Classification | Code available (2)
AST: Audio Spectrogram Transformer | Apr 5, 2021 | Audio Classification; Audio Tagging | Code available (2)
Adaptive Differential Denoising for Respiratory Sounds Classification | Jun 3, 2025 | Audio Classification; Classification | Code available (1)
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning | Feb 17, 2025 | Audio Classification; Audio Tagging | Code available (1)
CycleGuardian: A Framework for Automatic RespiratorySound classification Based on Improved Deep clustering and Contrastive Learning | Feb 2, 2025 | Audio Classification; Clustering | Code available (1)
TaxaBind: A Unified Embedding Space for Ecological Applications | Nov 1, 2024 | Audio Classification; Cross-Modal Retrieval | Code available (1)
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | Oct 2, 2024 | Audio Classification; Caption Generation | Code available (1)
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation | Sep 27, 2024 | Audio Classification; Audio Generation | Code available (1)
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds | Sep 13, 2024 | Audio Classification; Descriptive | Code available (1)
ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions | Jul 11, 2024 | All; Audio Classification | Code available (1)
DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners | Jul 4, 2024 | Audio Classification; Audio Tagging | Code available (1)
Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions | Jun 23, 2024 | Audio Classification; Parkinson Detection from Speech | Code available (1)
BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification | Jun 10, 2024 | Audio Classification; Sound Classification | Code available (1)
animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics | Jun 3, 2024 | Audio Classification; Benchmarking | Code available (1)
Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning | May 14, 2024 | Audio Classification; Representation Learning | Code available (1)
Scalable Event-by-event Processing of Neuromorphic Sensory Signals With Deep State-Space Models | Apr 29, 2024 | Audio Classification; Gesture Recognition | Code available (1)
MAX-AST: COMBINING CONVOLUTION, LOCAL AND GLOBAL SELF-ATTENTIONS FOR AUDIO EVENT CLASSIFICATION | Apr 14, 2024 | Audio Classification | Code available (1)
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models | Apr 9, 2024 | Audio Classification; Generalized Zero-Shot Learning | Code available (1)
DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification | Mar 24, 2024 | Audio Classification; Information Retrieval | Code available (1)
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning | Mar 14, 2024 | Audio Classification; audio-visual learning | Code available (1)
On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification | Feb 2, 2024 | Audio Classification; Few-Shot Audio Classification | Code available (1)
Stethoscope-guided Supervised Contrastive Learning for Cross-domain Adaptation on Respiratory Sound Classification | Dec 15, 2023 | Audio Classification; Contrastive Learning | Code available (1)
Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers | Dec 6, 2023 | Audio Classification; Few-Shot Learning | Code available (1)
Acoustic Prompt Tuning: Empowering Large Language Models with Audition Capabilities | Nov 30, 2023 | Audio Classification; Few-Shot Audio Classification | Code available (1)
Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance | Nov 11, 2023 | Audio Classification; Sound Classification | Code available (1)
CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition | Oct 18, 2023 | Audio Classification; Contrastive Learning | Code available (1)
Audio classification with Dilated Convolution with Learnable Spacings | Sep 25, 2023 | Audio Classification; Audio Tagging | Code available (1)
RDLINet: A Novel Lightweight Inception Network for Respiratory Disease Classification Using Lung Sounds | Jul 6, 2023 | Audio Classification; Classification | Code available (1)
Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings | Jun 30, 2023 | Audio Classification; speech-recognition | Code available (1)
Audio Tagging on an Embedded Hardware Platform | Jun 15, 2023 | Audio Classification; Audio Tagging | Code available (1)
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks | Jun 7, 2023 | Audio Classification; Audio Tagging | Code available (1)