video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Jun 18, 2025 | Audio captioning, Large Language Model | Code Available | 2
AC/DC: LLM-based Audio Comprehension via Dialogue Continuation | Jun 12, 2025 | AudioCaps, Audio captioning | Unverified | 0
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer | Jun 1, 2025 | Audio captioning, Language Modeling | Unverified | 0
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion | Jun 1, 2025 | Audio captioning, Caption Generation | Code Available | 2
Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning | May 28, 2025 | AudioCaps, Audio captioning | Unverified | 0
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining | May 12, 2025 | Audio captioning, Audio Generation | Unverified | 0
M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | Mar 28, 2025 | Audio captioning, Audio Classification | Code Available | 0
Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context | Mar 19, 2025 | Audio captioning, Audio Question Answering | Code Available | 0
Mellow: a small audio language model for reasoning | Mar 11, 2025 | Audio captioning, Language Modeling | Code Available | 2
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities | Mar 6, 2025 | Audio captioning, Language Modeling | Unverified | 0
Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders | Feb 21, 2025 | Audio captioning, Automatic Speech Recognition | Unverified | 0
Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning | Feb 8, 2025 | AudioCaps, Audio captioning | Unverified | 0
ADIFF: Explaining audio difference using natural language | Feb 6, 2025 | AudioCaps, Audio captioning | Code Available | 1
LLMs can see and hear without any training | Jan 30, 2025 | Audio captioning, Image Generation | Code Available | 3
CosyAudio: Improving Audio Generation with Confidence Scores and Synthetic Captions | Jan 28, 2025 | Audio captioning, Audio Generation | Unverified | 0
LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport | Jan 16, 2025 | AudioCaps, Audio captioning | Code Available | 1
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model | Jan 13, 2025 | Audio captioning, Instruction Following | Unverified | 0
Classifier-Guided Captioning Across Modalities | Jan 3, 2025 | Audio captioning, Video Captioning | Unverified | 0
ETTA: Elucidating the Design Space of Text-to-Audio Models | Dec 26, 2024 | AudioCaps, Audio captioning | Code Available | 2
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Nov 28, 2024 | Audio captioning, Audio to Text Retrieval | Code Available | 2
Tell What You Hear From What You See -- Video to Audio Generation Through Text | Nov 8, 2024 | Audio captioning, Audio Generation | Code Available | 1
EmotionCaps: Enhancing Audio Captioning Through Emotion-Augmented Data Generation | Oct 15, 2024 | Audio captioning, Emotion Recognition | Unverified | 0
Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning | Oct 14, 2024 | AudioCaps, Audio captioning | Unverified | 0
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | Oct 12, 2024 | AudioCaps, Audio captioning | Code Available | 0
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning | Oct 12, 2024 | Audio captioning, Large Language Model | Code Available | 0
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Oct 9, 2024 | Audio captioning, Large Language Model | Unverified | 0
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment | Oct 8, 2024 | Audio captioning, Contrastive Learning | Code Available | 0
OpenSep: Leveraging Large Language Models with Textual Inversion for Open World Audio Separation | Sep 28, 2024 | Audio captioning | Code Available | 0
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions | Sep 19, 2024 | Audio captioning, Language Modeling | Code Available | 0
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models | Sep 17, 2024 | Audio captioning, Instruction Following | Unverified | 0
Towards Diverse and Efficient Audio Captioning via Diffusion Models | Sep 14, 2024 | Audio captioning, Diversity | Unverified | 0
Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models | Sep 10, 2024 | Audio captioning, Audio Question Answering | Unverified | 0
Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning | Sep 2, 2024 | Audio captioning, Reranking | Unverified | 0
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance | Sep 2, 2024 | AudioCaps, Audio captioning | Code Available | 2
Taming Data and Transformers for Audio Generation | Jun 27, 2024 | Audio captioning, Audio Generation | Code Available | 2
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding | Jun 19, 2024 | Audio captioning, Decoder | Code Available | 1
Improving Text-To-Audio Models with Synthetic Captions | Jun 18, 2024 | AudioCaps, Audio captioning | Code Available | 5
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Jun 12, 2024 | Audio captioning, Hallucination | Code Available | 2
Audio Dialogues: Dialogues dataset for audio and music understanding | Apr 11, 2024 | Audio captioning, Audio Question Answering | Unverified | 0
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs | Mar 20, 2024 | Audio captioning, Image Captioning | Unverified | 0
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities | Feb 2, 2024 | Acoustic Scene Classification, Audio captioning | Code Available | 5
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning | Jan 31, 2024 | AudioCaps, Audio captioning | Code Available | 2
Learning Audio Concepts from Counterfactual Natural Language | Jan 10, 2024 | Audio captioning, Audio Classification | Code Available | 0
AudioLog: LLMs-Powered Long Audio Logging with Hybrid Token-Semantic Contrastive Learning | Nov 21, 2023 | Acoustic Scene Classification, Audio captioning | Code Available | 0
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | Nov 14, 2023 | Acoustic Scene Classification, Audio captioning | Code Available | 3
Zero-shot audio captioning with audio-language model guidance and audio context keywords | Nov 14, 2023 | Audio captioning, Descriptive | Code Available | 1
SALMONN: Towards Generic Hearing Abilities for Large Language Models | Oct 20, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 3
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 2
Weakly-supervised Automated Audio Captioning via text only training | Sep 21, 2023 | AudioCaps, Audio captioning | Code Available | 0
Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning | Sep 20, 2023 | Audio captioning, Caption Generation | Unverified | 0