SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text, often in real time, so that people can interact with computers, mobile devices, and other technology by voice. The goal is to transcribe speech accurately despite variation in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade speech quality.

Papers

Showing 651–700 of 3012 papers

| Title | Status | Hype |
| --- | --- | --- |
| An efficient text augmentation approach for contextualized Mandarin speech recognition | | 0 |
| Optimizing Byte-level Representation for End-to-end ASR | | 0 |
| Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised Learning with Targeted Fine-Tuning and Data Augmentation | | 0 |
| ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR | | 0 |
| Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn't | Code | 0 |
| Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment | | 0 |
| The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments | | 0 |
| LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks | Code | 0 |
| Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition | | 0 |
| Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques | Code | 0 |
| DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion | | 0 |
| Transformer-based Model for ASR N-Best Rescoring and Rewriting | | 0 |
| Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data | | 0 |
| Towards Unsupervised Speech Recognition Without Pronunciation Models | Code | 0 |
| ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets | | 0 |
| Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation | Code | 0 |
| PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding | | 0 |
| Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter | | 0 |
| AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection | | 0 |
| Reading Miscue Detection in Primary School through Automatic Speech Recognition | | 0 |
| mHuBERT-147: A Compact Multilingual HuBERT Model | Code | 0 |
| ASTRA: Aligning Speech and Text Representations for Asr without Sampling | | 0 |
| MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations | | 0 |
| Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis | | 0 |
| LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR | | 0 |
| Flexible Multichannel Speech Enhancement for Noise-Robust Frontend | | 0 |
| Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores | | 0 |
| Hypernetworks for Personalizing ASR to Atypical Speech | | 0 |
| To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation | Code | 0 |
| Enhancing CTC-based speech recognition with diverse modeling units | | 0 |
| Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition | | 0 |
| Text Injection for Neural Contextual Biasing | | 0 |
| Keyword-Guided Adaptation of Automatic Speech Recognition | | 0 |
| Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping | | 0 |
| Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach | | 0 |
| Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning | | 0 |
| Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities | | 0 |
| Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation | | 0 |
| Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition | | 0 |
| Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding | Code | 0 |
| Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation | | 0 |
| You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish | | 0 |
| FairLENS: Assessing Fairness in Law Enforcement Speech Recognition | | 0 |
| Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models | | 0 |
| Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings | | 0 |
| Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer | | 0 |
| Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech | | 0 |
| Open Implementation and Study of BEST-RQ for Speech Processing | | 0 |
| MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition | | 0 |
| Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition | | 0 |

Benchmark Results

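| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TM-CTC | Test WER | 10.1 | | Unverified |
| 2 | TM-seq2seq | Test WER | 9.7 | | Unverified |
| 3 | CTC/attention | Test WER | 8.2 | | Unverified |
| 4 | LF-MMI TDNN | Test WER | 6.7 | | Unverified |
| 5 | Whisper-LLaMA | Test WER | 6.6 | | Unverified |
| 6 | End2end Conformer | Test WER | 3.9 | | Unverified |
| 7 | End2end Conformer | Test WER | 3.7 | | Unverified |
| 8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | | Unverified |
| 9 | CTC/Attention | Test WER | 1.5 | | Unverified |
| 10 | Whisper | Test WER | 1.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SpatialNet | CER | 14.5 | | Unverified |
| 2 | CleanMel-L-mask | CER | 14.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer | Test WER | 15.32 | | Unverified |
| 2 | Whisper-largev3-finetuned | Test WER | 10.82 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer | WER (%) | 1.89 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DistillAV | WER | 1.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer | WER (%) | 4.28 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer | WER (%) | 8.04 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer | WER (%) | 3.36 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer (German) | WER (%) | 8.98 | | Unverified |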