CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training. May 23, 2025. Tasks: Automatic Speech Recognition; Emotion Recognition. Code available (11).
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens. Jul 7, 2024. Tasks: Language Modelling; Large Language Model. Code available (11).
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs. Jul 4, 2024. Tasks: Emotion Recognition; Event Detection. Code available (11).
Moonshine: Speech Recognition for Live Transcription and Voice Commands. Oct 21, 2024. Tasks: Decoder; Position. Code available (9).
Moshi: a speech-text foundation model for real-time dialogue. Sep 17, 2024. Tasks: Action Detection; Activity Detection. Code available (9).
Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition. Jul 17, 2023. Tasks: Decoder; Language Modeling. Code available (8).
Robust Speech Recognition via Large-Scale Weak Supervision. Dec 6, 2022. Tasks: Robust Speech Recognition; speech-recognition. Code available (8).
Speechless: Speech Instruction Training Without Speech for Low Resource Languages. May 23, 2025. Tasks: speech-recognition; Speech Recognition. Code available (7).
Kimi-Audio Technical Report. Apr 25, 2025. Tasks: Audio Question Answering; Question Answering. Code available (7).
Qwen2.5-Omni Technical Report. Mar 26, 2025. Tasks: Automatic Speech Recognition (ASR); GSM8K. Code available (7).
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot. Dec 3, 2024. Tasks: Automatic Speech Recognition; Automatic Speech Recognition (ASR). Code available (7).
Scaling Speech-Text Pre-training with Synthetic Interleaved Data. Nov 26, 2024. Tasks: Automatic Speech Recognition; Automatic Speech Recognition (ASR). Code available (7).
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant. Oct 20, 2024. Tasks: Question Answering; speech-recognition. Code available (7).
OxfordVGG Submission to the EGO4D AV Transcription Challenge. Jul 18, 2023. Tasks: Automatic Speech Recognition; speech-recognition. Code available (6).
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit. May 20, 2022. Tasks: All; Automatic Speech Recognition (ASR). Code available (6).
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration. Jan 24, 2025. Tasks: Automatic Speech Recognition; Automatic Speech Recognition (ASR). Code available (5).
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning. Jun 5, 2024. Tasks: Automatic Speech Recognition (ASR); de-en. Code available (5).
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit. Mar 29, 2022. Tasks: Decoder; Language Modelling. Code available (5).
GigaAM: Efficient Self-Supervised Learner for Speech Recognition. Jun 1, 2025. Tasks: Automatic Speech Recognition; Language Modeling. Code available (4).
Multi-head Temporal Latent Attention. May 19, 2025. Tasks: GPU; speech-recognition. Code available (4).
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model. May 6, 2025. Tasks: Automatic Speech Recognition; Automatic Speech Recognition (ASR). Code available (4).
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages. Mar 26, 2025. Tasks: Automatic Speech Recognition; Automatic Speech Recognition (ASR). Code available (4).
CrisperWhisper: Accurate Timestamps on Verbatim Speech Transcriptions. Aug 29, 2024. Tasks: Dynamic Time Warping; speech-recognition. Code available (4).
The Llama 3 Herd of Models. Jul 31, 2024. Tasks: answerability prediction; Language Modeling. Code available (4).
A Survey on Vision-Language-Action Models for Embodied AI. May 23, 2024. Tasks: Image Captioning; Instruction Following. Code available (4).
Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System. May 17, 2024. Tasks: Data Augmentation; Speech Dereverberation. Code available (4).
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation. Mar 13, 2024. Tasks: Automatic Speech Recognition; Automatic Speech Recognition (ASR). Code available (4).
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling. Nov 1, 2023. Tasks: Hallucination; Knowledge Distillation. Code available (4).
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch. Oct 27, 2023. Tasks: Self-Supervised Learning; Speech Enhancement. Code available (4).
Turning Whisper into Real-Time Transcription System. Jul 27, 2023. Tasks: speech-recognition; Speech Recognition. Code available (4).
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play. May 5, 2025. Tasks: AI Agent; Automatic Speech Recognition. Code available (3).
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition. Feb 3, 2025. Tasks: Audio-Visual Speech Recognition; Decoder. Code available (3).
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia. Jan 23, 2025. Tasks: Emotion Recognition; Event Detection. Code available (3).
WavChat: A Survey of Spoken Dialogue Models. Nov 15, 2024. Tasks: speech-recognition; Speech Recognition. Code available (3).
VoiceBench: Benchmarking LLM-Based Voice Assistants. Oct 22, 2024. Tasks: Automatic Speech Recognition; Automatic Speech Recognition (ASR). Code available (3).
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction. Sep 24, 2024. Tasks: Management; speech-recognition. Code available (3).
WhisperNER: Unified Open Named Entity and Speech Recognition. Sep 12, 2024. Tasks: Automatic Speech Recognition; Automatic Speech Recognition (ASR). Code available (3).
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads. Aug 9, 2024. Tasks: Automatic Speech Recognition; Automatic Speech Recognition (ASR). Code available (3).
Sentiment Reasoning for Healthcare. Jul 24, 2024. Tasks: Automatic Speech Recognition; Automatic Speech Recognition (ASR). Code available (3).
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement. Jun 17, 2024. Tasks: speech-recognition; Speech Recognition. Code available (3).
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation. Jun 14, 2024. Tasks: Audio-Visual Speech Recognition; Automatic Speech Recognition (ASR). Code available (3).
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models. May 23, 2024. Tasks: Automatic Speech Recognition; Automatic Speech Recognition (ASR). Code available (3).
PhoWhisper: Automatic Speech Recognition for Vietnamese. Mar 27, 2024. Tasks: Automatic Speech Recognition; speech-recognition. Code available (3).
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing. Feb 23, 2024. Tasks: Lipreading; Lip Reading. Code available (3).
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models. Jan 7, 2024. Tasks: Automatic Speech Recognition; Automatic Speech Recognition (ASR). Code available (3).
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models. Nov 14, 2023. Tasks: Acoustic Scene Classification; Audio captioning. Code available (3).
SALMONN: Towards Generic Hearing Abilities for Large Language Models. Oct 20, 2023. Tasks: Audio captioning; Automatic Speech Recognition. Code available (3).
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities. May 18, 2023. Tasks: Language Modeling; Language Modelling. Code available (3).
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages. May 7, 2023. Tasks: Attribute; Instruction Following. Code available (3).
Delay-penalized transducer for low-latency streaming ASR. Oct 31, 2022. Tasks: Automatic Speech Recognition; Automatic Speech Recognition (ASR). Code available (3).