CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | May 23, 2025 | Automatic Speech Recognition, Emotion Recognition | Code available | 11
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Jul 7, 2024 | Language Modelling, Large Language Model | Code available | 11
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Jul 4, 2024 | Emotion Recognition, Event Detection | Code available | 11
Moonshine: Speech Recognition for Live Transcription and Voice Commands | Oct 21, 2024 | Decoder, Position | Code available | 9
Moshi: a speech-text foundation model for real-time dialogue | Sep 17, 2024 | Action Detection, Activity Detection | Code available | 9
Robust Speech Recognition via Large-Scale Weak Supervision | Dec 6, 2022 | Robust Speech Recognition, speech-recognition | Code available | 8
Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition | Jul 17, 2023 | Decoder, Language Modeling | Code available | 8
Speechless: Speech Instruction Training Without Speech for Low Resource Languages | May 23, 2025 | speech-recognition, Speech Recognition | Code available | 7
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant | Oct 20, 2024 | Question Answering, speech-recognition | Code available | 7
Qwen2.5-Omni Technical Report | Mar 26, 2025 | Automatic Speech Recognition (ASR), GSM8K | Code available | 7
Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Nov 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 7
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Dec 3, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 7
Kimi-Audio Technical Report | Apr 25, 2025 | Audio Question Answering, Question Answering | Code available | 7
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | May 20, 2022 | All, Automatic Speech Recognition (ASR) | Code available | 6
OxfordVGG Submission to the EGO4D AV Transcription Challenge | Jul 18, 2023 | Automatic Speech Recognition, speech-recognition | Code available | 6
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 5, 2024 | Automatic Speech Recognition (ASR), de-en | Code available | 5
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration | Jan 24, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 5
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit | Mar 29, 2022 | Decoder, Language Modelling | Code available | 5
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | May 6, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 4
The Llama 3 Herd of Models | Jul 31, 2024 | answerability prediction, Language Modeling | Code available | 4
Turning Whisper into Real-Time Transcription System | Jul 27, 2023 | speech-recognition, Speech Recognition | Code available | 4
GigaAM: Efficient Self-Supervised Learner for Speech Recognition | Jun 1, 2025 | Automatic Speech Recognition, Language Modeling | Code available | 4
CrisperWhisper: Accurate Timestamps on Verbatim Speech Transcriptions | Aug 29, 2024 | Dynamic Time Warping, speech-recognition | Code available | 4
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages | Mar 26, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 4
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation | Mar 13, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 4
Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System | May 17, 2024 | Data Augmentation, Speech Dereverberation | Code available | 4
Multi-head Temporal Latent Attention | May 19, 2025 | GPU, speech-recognition | Code available | 4
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch | Oct 27, 2023 | Self-Supervised Learning, Speech Enhancement | Code available | 4
A Survey on Vision-Language-Action Models for Embodied AI | May 23, 2024 | Image Captioning, Instruction Following | Code available | 4
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling | Nov 1, 2023 | Hallucination, Knowledge Distillation | Code available | 4
VoiceBench: Benchmarking LLM-Based Voice Assistants | Oct 22, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 3
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement | Jun 17, 2024 | speech-recognition, Speech Recognition | Code available | 3
A Parallelizable Lattice Rescoring Strategy with Neural Language Models | Mar 8, 2021 | ARC, Automatic Speech Recognition | Code available | 3
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates | Sep 27, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 3
TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation | May 12, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 3
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | May 5, 2025 | AI Agent, Automatic Speech Recognition | Code available | 3
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 3
Delay-penalized transducer for low-latency streaming ASR | Oct 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 3
Semi-Supervised Speech Recognition via Local Prior Matching | Feb 24, 2020 | Knowledge Distillation, Language Modeling | Code available | 3
Datasets: A Community Library for Natural Language Processing | Sep 7, 2021 | Image Classification, Object Recognition | Code available | 3
Sentiment Reasoning for Healthcare | Jul 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 3
SALMONN: Towards Generic Hearing Abilities for Large Language Models | Oct 20, 2023 | Audio captioning, Automatic Speech Recognition | Code available | 3
PhoWhisper: Automatic Speech Recognition for Vietnamese | Mar 27, 2024 | Automatic Speech Recognition, speech-recognition | Code available | 3
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models | Jan 7, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 3
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia | Jan 23, 2025 | Emotion Recognition, Event Detection | Code available | 3
Conformer: Convolution-augmented Transformer for Speech Recognition | May 16, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 3
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads | Aug 9, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 3
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition | Feb 3, 2025 | Audio-Visual Speech Recognition, Decoder | Code available | 3
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | Nov 14, 2023 | Acoustic Scene Classification, Audio captioning | Code available | 3
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities | May 18, 2023 | Language Modeling, Language Modelling | Code available | 3