Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities | Feb 2, 2024 | Acoustic Scene Classification, Audio captioning | Code Available | 55
Improving Text-To-Audio Models with Synthetic Captions | Jun 18, 2024 | AudioCaps, Audio captioning | Code Available | 55
LLMs can see and hear without any training | Jan 30, 2025 | Audio captioning, Image Generation | Code Available | 35
SALMONN: Towards Generic Hearing Abilities for Large Language Models | Oct 20, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 35
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | Nov 14, 2023 | Acoustic Scene Classification, Audio captioning | Code Available | 35
Mellow: a small audio language model for reasoning | Mar 11, 2025 | Audio captioning, Language Modeling | Code Available | 25
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Apr 17, 2023 | Audio captioning, Audio-Video Question Answering (AVQA) | Code Available | 25
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Nov 28, 2024 | Audio captioning, Audio to Text Retrieval | Code Available | 25
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research | Mar 30, 2023 | Audio captioning, Event Detection | Code Available | 25
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | May 29, 2023 | Audio captioning, Audio-Visual Captioning | Code Available | 25
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Jun 18, 2025 | Audio captioning, Large Language Model | Code Available | 25
Pengi: An Audio Language Model for Audio Tasks | May 19, 2023 | Audio captioning, Audio Question Answering | Code Available | 25
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 25
ETTA: Elucidating the Design Space of Text-to-Audio Models | Dec 26, 2024 | AudioCaps, Audio captioning | Code Available | 25
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance | Sep 2, 2024 | AudioCaps, Audio captioning | Code Available | 25
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning | Jan 31, 2024 | AudioCaps, Audio captioning | Code Available | 25
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion | Jun 1, 2025 | Audio captioning, Caption Generation | Code Available | 25
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Jun 12, 2024 | Audio captioning, Hallucination | Code Available | 25
Taming Data and Transformers for Audio Generation | Jun 27, 2024 | Audio captioning, Audio Generation | Code Available | 25
Multimodal Knowledge Alignment with Reinforcement Learning | May 25, 2022 | Audio captioning, Language Modeling | Code Available | 15
ADIFF: Explaining audio difference using natural language | Feb 6, 2025 | AudioCaps, Audio captioning | Code Available | 15
Audio Captioning Transformer | Jul 21, 2021 | AudioCaps, Audio captioning | Code Available | 15
Prefix tuning for automated audio captioning | Mar 30, 2023 | AudioCaps, Audio captioning | Code Available | 15
Audio Retrieval with WavText5K and CLAP Training | Sep 28, 2022 | AudioCaps, Audio captioning | Code Available | 15
Tell What You Hear From What You See -- Video to Audio Generation Through Text | Nov 8, 2024 | Audio captioning, Audio Generation | Code Available | 15
An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning | Aug 5, 2021 | Audio captioning, Decoder | Code Available | 15
Audio Retrieval with Natural Language Queries: A Benchmark Study | Dec 17, 2021 | AudioCaps, Audio captioning | Code Available | 15
RECAP: Retrieval-Augmented Audio Captioning | Sep 18, 2023 | AudioCaps, Audio captioning | Code Available | 15
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention | Oct 28, 2022 | AudioCaps, Audio captioning | Code Available | 15
WaveTransformer: A Novel Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information | Oct 21, 2020 | Audio captioning, Decoder | Code Available | 15
Clotho: An Audio Captioning Dataset | Oct 21, 2019 | Audio captioning, Diversity | Code Available | 15
Zero-shot audio captioning with audio-language model guidance and audio context keywords | Nov 14, 2023 | Audio captioning, Descriptive | Code Available | 15
MusCaps: Generating Captions for Music Audio | Apr 24, 2021 | Audio captioning, Classification | Code Available | 15
CL4AC: A Contrastive Loss for Audio Captioning | Jul 21, 2021 | Audio captioning, Decoder | Code Available | 15
LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport | Jan 16, 2025 | AudioCaps, Audio captioning | Code Available | 15
THE SJTU SYSTEM FOR DCASE2021 CHALLENGE TASK 6: AUDIO CAPTIONING BASED ON ENCODER PRE-TRAINING AND REINFORCEMENT LEARNING | Jul 6, 2021 | Audio captioning, Audio Tagging | Code Available | 15
Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates | Nov 14, 2022 | AudioCaps, Audio captioning | Code Available | 15
Can Audio Captions Be Evaluated with Image Caption Metrics? | Oct 10, 2021 | AudioCaps, Audio captioning | Code Available | 15
A Whisper transformer for audio captioning trained with synthetic captions and transfer learning | May 15, 2023 | Audio captioning, Speech-to-Text | Code Available | 15
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding | Jun 19, 2024 | Audio captioning, Decoder | Code Available | 15
Training Audio Captioning Models without Audio | Sep 14, 2023 | Audio captioning, Decoder | Code Available | 15
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | Oct 12, 2024 | AudioCaps, Audio captioning | Code Available | 05
Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context | Mar 19, 2025 | Audio captioning, Audio Question Answering | Code Available | 05
AUTOMATED AUDIO CAPTIONING BY FINE-TUNING BART WITH AUDIOSET TAGS | Nov 15, 2021 | AudioCaps, Audio captioning | Code Available | 05
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment | Oct 8, 2024 | Audio captioning, Contrastive Learning | Code Available | 05
Automated Audio Captioning and Language-Based Audio Retrieval | Jul 8, 2022 | Audio captioning, Retrieval | Code Available | 05
OpenSep: Leveraging Large Language Models with Textual Inversion for Open World Audio Separation | Sep 28, 2024 | Audio captioning | Code Available | 05
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning | Oct 12, 2024 | Audio captioning, Large Language Model | Code Available | 05
AudioLog: LLMs-Powered Long Audio Logging with Hybrid Token-Semantic Contrastive Learning | Nov 21, 2023 | Acoustic Scene Classification, Audio captioning | Code Available | 05
Language-based Audio Retrieval Task in DCASE 2022 Challenge | Jun 13, 2022 | Audio captioning, Retrieval | Code Available | 05