Qwen2.5-Omni Technical Report | Mar 26, 2025 | Automatic Speech Recognition (ASR); GSM8K | Code Available 7
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Dec 3, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 7
Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Nov 26, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 7
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | May 20, 2022 | All; Automatic Speech Recognition (ASR) | Code Available 6
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration | Jan 24, 2025 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 5
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 5, 2024 | Automatic Speech Recognition (ASR); de-en | Code Available 5
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | May 6, 2025 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 4
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages | Mar 26, 2025 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 4
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation | Mar 13, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 4
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | May 5, 2025 | AI Agent; Automatic Speech Recognition | Code Available 3
VoiceBench: Benchmarking LLM-Based Voice Assistants | Oct 22, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 3
WhisperNER: Unified Open Named Entity and Speech Recognition | Sep 12, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 3
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads | Aug 9, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 3
Sentiment Reasoning for Healthcare | Jul 24, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 3
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Jun 14, 2024 | Audio-Visual Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 3
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models | May 23, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 3
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models | Jan 7, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 3
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | Nov 14, 2023 | Acoustic Scene Classification; Audio captioning | Code Available 3
Delay-penalized transducer for low-latency streaming ASR | Oct 31, 2022 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 3
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates | Sep 27, 2021 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 3
A Parallelizable Lattice Rescoring Strategy with Neural Language Models | Mar 8, 2021 | ARC; Automatic Speech Recognition | Code Available 3
Conformer: Convolution-augmented Transformer for Speech Recognition | May 16, 2020 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 3
TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation | May 12, 2018 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 3
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation | Feb 27, 2025 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR | Feb 27, 2025 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition | Dec 30, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency | Dec 17, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
Dialectal Coverage And Generalization in Arabic Speech Recognition | Nov 7, 2024 | Arabic Speech Recognition; Automatic Speech Recognition | Code Available 2
emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography | Oct 26, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
Recent Advances in Speech Language Models: A Survey | Oct 1, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
Large Language Models are Strong Audio-Visual Speech Recognition Learners | Sep 18, 2024 | Audio-Visual Speech Recognition; Automatic Speech Recognition | Code Available 2
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions | Sep 13, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech | Aug 8, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units | Jul 5, 2024 | Acoustic Unit Discovery; Automatic Speech Recognition | Code Available 2
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition | May 23, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings | Mar 4, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity | Feb 13, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Feb 12, 2024 | 2k; Automatic Speech Recognition | Code Available 2
Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation | Feb 8, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition | Jan 19, 2024 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition | Oct 10, 2023 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models | Oct 4, 2023 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | Mar 25, 2023 | Audio-Visual Speech Recognition; Automatic Speech Recognition | Code Available 2
NusaCrowd: Open Source Initiative for Indonesian NLP Resources | Dec 19, 2022 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric | Dec 16, 2022 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
Towards A Unified Conformer Structure: from ASR to ASV Task | Nov 14, 2022 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement | Sep 22, 2022 | Audio Super-Resolution; Automatic Speech Recognition | Code Available 2
TEVR: Improving Speech Recognition by Token Entropy Variance Reduction | Jun 25, 2022 | Automatic Speech Recognition (ASR); Language Modeling | Code Available 2
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning | Jun 16, 2022 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | Jun 2, 2022 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Code Available 2