Improving Text-To-Audio Models with Synthetic Captions | Jun 18, 2024 | AudioCaps, Audio captioning | Code Available | 5
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities | Feb 2, 2024 | Acoustic Scene Classification, Audio captioning | Code Available | 5
LLMs can see and hear without any training | Jan 30, 2025 | Audio captioning, Image Generation | Code Available | 3
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | Nov 14, 2023 | Acoustic Scene Classification, Audio captioning | Code Available | 3
SALMONN: Towards Generic Hearing Abilities for Large Language Models | Oct 20, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 3
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Jun 18, 2025 | Audio captioning, Large Language Model | Code Available | 2
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion | Jun 1, 2025 | Audio captioning, Caption Generation | Code Available | 2
Mellow: a small audio language model for reasoning | Mar 11, 2025 | Audio captioning, Language Modeling | Code Available | 2
ETTA: Elucidating the Design Space of Text-to-Audio Models | Dec 26, 2024 | AudioCaps, Audio captioning | Code Available | 2
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Nov 28, 2024 | Audio captioning, Audio to Text Retrieval | Code Available | 2
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance | Sep 2, 2024 | AudioCaps, Audio captioning | Code Available | 2
Taming Data and Transformers for Audio Generation | Jun 27, 2024 | Audio captioning, Audio Generation | Code Available | 2
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Jun 12, 2024 | Audio captioning, Hallucination | Code Available | 2
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning | Jan 31, 2024 | AudioCaps, Audio captioning | Code Available | 2
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 2
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | May 29, 2023 | Audio captioning, Audio-Visual Captioning | Code Available | 2
Pengi: An Audio Language Model for Audio Tasks | May 19, 2023 | Audio captioning, Audio Question Answering | Code Available | 2
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Apr 17, 2023 | Audio captioning, Audio-Video Question Answering (AVQA) | Code Available | 2
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research | Mar 30, 2023 | Audio captioning, Event Detection | Code Available | 2
ADIFF: Explaining audio difference using natural language | Feb 6, 2025 | AudioCaps, Audio captioning | Code Available | 1
LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport | Jan 16, 2025 | AudioCaps, Audio captioning | Code Available | 1
Tell What You Hear From What You See -- Video to Audio Generation Through Text | Nov 8, 2024 | Audio captioning, Audio Generation | Code Available | 1
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding | Jun 19, 2024 | Audio captioning, Decoder | Code Available | 1
Zero-shot audio captioning with audio-language model guidance and audio context keywords | Nov 14, 2023 | Audio captioning, Descriptive | Code Available | 1
RECAP: Retrieval-Augmented Audio Captioning | Sep 18, 2023 | AudioCaps, Audio captioning | Code Available | 1
Training Audio Captioning Models without Audio | Sep 14, 2023 | Audio captioning, Decoder | Code Available | 1
A Whisper transformer for audio captioning trained with synthetic captions and transfer learning | May 15, 2023 | Audio captioning, Speech-to-Text | Code Available | 1
Prefix tuning for automated audio captioning | Mar 30, 2023 | AudioCaps, Audio captioning | Code Available | 1
Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates | Nov 14, 2022 | AudioCaps, Audio captioning | Code Available | 1
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention | Oct 28, 2022 | AudioCaps, Audio captioning | Code Available | 1
Audio Retrieval with WavText5K and CLAP Training | Sep 28, 2022 | AudioCaps, Audio captioning | Code Available | 1
Multimodal Knowledge Alignment with Reinforcement Learning | May 25, 2022 | Audio captioning, Language Modeling | Code Available | 1
Audio Retrieval with Natural Language Queries: A Benchmark Study | Dec 17, 2021 | AudioCaps, Audio captioning | Code Available | 1
Can Audio Captions Be Evaluated with Image Caption Metrics? | Oct 10, 2021 | AudioCaps, Audio captioning | Code Available | 1
An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning | Aug 5, 2021 | Audio captioning, Decoder | Code Available | 1
Audio Captioning Transformer | Jul 21, 2021 | AudioCaps, Audio captioning | Code Available | 1
CL4AC: A Contrastive Loss for Audio Captioning | Jul 21, 2021 | Audio captioning, Decoder | Code Available | 1
THE SJTU SYSTEM FOR DCASE2021 CHALLENGE TASK 6: AUDIO CAPTIONING BASED ON ENCODER PRE-TRAINING AND REINFORCEMENT LEARNING | Jul 6, 2021 | Audio captioning, Audio Tagging | Code Available | 1
MusCaps: Generating Captions for Music Audio | Apr 24, 2021 | Audio captioning, Classification | Code Available | 1
WaveTransformer: A Novel Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information | Oct 21, 2020 | Audio captioning, Decoder | Code Available | 1
Clotho: An Audio Captioning Dataset | Oct 21, 2019 | Audio captioning, Diversity | Code Available | 1
AC/DC: LLM-based Audio Comprehension via Dialogue Continuation | Jun 12, 2025 | AudioCaps, Audio captioning | Unverified | 0
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer | Jun 1, 2025 | Audio captioning, Language Modeling | Unverified | 0
Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning | May 28, 2025 | AudioCaps, Audio captioning | Unverified | 0
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining | May 12, 2025 | Audio captioning, Audio Generation | Unverified | 0
M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | Mar 28, 2025 | Audio captioning, Audio Classification | Code Available | 0
Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context | Mar 19, 2025 | Audio captioning, Audio Question Answering | Code Available | 0
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities | Mar 6, 2025 | Audio captioning, Language Modeling | Unverified | 0
Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders | Feb 21, 2025 | Audio captioning, Automatic Speech Recognition | Unverified | 0
Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning | Feb 8, 2025 | AudioCaps, Audio captioning | Unverified | 0