GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Dec 3, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 75
Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Nov 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 75
Qwen2.5-Omni Technical Report | Mar 26, 2025 | Automatic Speech Recognition (ASR), GSM8K | Code Available 75
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | May 20, 2022 | All, Automatic Speech Recognition (ASR) | Code Available 65
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 5, 2024 | Automatic Speech Recognition (ASR), de-en | Code Available 55
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration | Jan 24, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 55
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation | Mar 13, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 45
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | May 6, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 45
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages | Mar 26, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 45
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | Nov 14, 2023 | Acoustic Scene Classification, Audio captioning | Code Available 35
TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation | May 12, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 35
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 35
Conformer: Convolution-augmented Transformer for Speech Recognition | May 16, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 35
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models | Jan 7, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 35
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads | Aug 9, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 35
Sentiment Reasoning for Healthcare | Jul 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 35
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates | Sep 27, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 35
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Jun 14, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 35
WhisperNER: Unified Open Named Entity and Speech Recognition | Sep 12, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 35
VoiceBench: Benchmarking LLM-Based Voice Assistants | Oct 22, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 35
Delay-penalized transducer for low-latency streaming ASR | Oct 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 35
A Parallelizable Lattice Rescoring Strategy with Neural Language Models | Mar 8, 2021 | ARC, Automatic Speech Recognition | Code Available 35
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | May 5, 2025 | AI Agent, Automatic Speech Recognition | Code Available 35
Towards A Unified Conformer Structure: from ASR to ASV Task | Nov 14, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency | Dec 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation | Feb 8, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning | Jun 16, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity | Feb 13, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Feb 12, 2024 | 2k, Automatic Speech Recognition | Code Available 25
TEVR: Improving Speech Recognition by Token Entropy Variance Reduction | Jun 25, 2022 | Automatic Speech Recognition (ASR), Language Modeling | Code Available 25
Recent Advances in Speech Language Models: A Survey | Oct 1, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
Robust Self-Supervised Audio-Visual Speech Recognition | Jan 5, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available 25
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | Jun 2, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
NusaCrowd: Open Source Initiative for Indonesian NLP Resources | Dec 19, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation | Feb 27, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings | Mar 4, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | Jan 5, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions | Sep 13, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units | Jul 5, 2024 | Acoustic Unit Discovery, Automatic Speech Recognition | Code Available 25
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition | Jan 19, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
Large Language Models are Strong Audio-Visual Speech Recognition Learners | Sep 18, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available 25
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | Mar 25, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available 25
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models | Oct 4, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography | Oct 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
Dialectal Coverage And Generalization in Arabic Speech Recognition | Nov 7, 2024 | Arabic Speech Recognition, Automatic Speech Recognition | Code Available 25
DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition | Dec 30, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
Fast Transformers with Clustered Attention | Jul 9, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
4-bit Conformer with Native Quantization Aware Training for Speech Recognition | Mar 29, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25
CMGAN: Conformer-based Metric GAN for Speech Enhancement | Mar 28, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 25