SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text. It lets people interact with computers, mobile devices, and other technology by voice, often in real time. The goal is to transcribe speech accurately despite variation in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade speech quality.

Papers

Showing 251-275 of 3012 papers

Title | Status | Hype
End-to-End Speech Recognition and Disfluency Removal | Code | 1
KoSpeech: Open-Source Toolkit for End-to-End Korean Speech Recognition | Code | 1
Sum-Product Networks for Robust Automatic Speaker Identification | Code | 1
Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings | Code | 1
Distilling the Knowledge of BERT for Sequence-to-Sequence ASR | Code | 1
Word Error Rate Estimation Without ASR Output: e-WER2 | Code | 1
Pretraining Techniques for Sequence-to-Sequence Voice Conversion | Code | 1
Automatic Speech Recognition Benchmark for Air-Traffic Communications | Code | 1
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos | Code | 1
Learning to Count Words in Fluent Speech enables Online Speech Recognition | Code | 1
On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition | Code | 1
Adapting End-to-End Speech Recognition for Readable Subtitles | Code | 1
End-to-end Named Entity Recognition from English Speech | Code | 1
PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR | Code | 1
Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition | Code | 1
Improved Noisy Student Training for Automatic Speech Recognition | Code | 1
Enhancing Monotonic Multihead Attention for Streaming ASR | Code | 1
CTC-synchronous Training for Monotonic Attention Model | Code | 1
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | Code | 1
A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications | Code | 1
ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers | Code | 1
Transformer based Grapheme-to-Phoneme Conversion | Code | 1
Multi-modal Dense Video Captioning | Code | 1
Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning | Code | 1
Natural Language Processing Advancements By Deep Learning: A Survey | Code | 1
Page 11 of 121

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | TM-CTC | Test WER | 10.1 | - | Unverified
2 | TM-seq2seq | Test WER | 9.7 | - | Unverified
3 | CTC/attention | Test WER | 8.2 | - | Unverified
4 | LF-MMI TDNN | Test WER | 6.7 | - | Unverified
5 | Whisper-LLaMA | Test WER | 6.6 | - | Unverified
6 | End2end Conformer | Test WER | 3.9 | - | Unverified
7 | End2end Conformer | Test WER | 3.7 | - | Unverified
8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | - | Unverified
9 | CTC/Attention | Test WER | 1.5 | - | Unverified
10 | Whisper | Test WER | 1.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SpatialNet | CER | 14.5 | - | Unverified
2 | CleanMel-L-mask | CER | 14.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer | Test WER | 15.32 | - | Unverified
2 | Whisper-largev3-finetuned | Test WER | 10.82 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 1.89 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DistillAV | WER | 1.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 4.28 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 8.04 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 3.36 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer (German) | WER (%) | 8.98 | - | Unverified
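
The benchmark entries above report word error rate (WER) and character error rate (CER). As a rough illustration of what those numbers measure, the sketch below computes both as an edit distance (substitutions, deletions, insertions) divided by the reference length, expressed as a percentage. It is a minimal sketch, not the scoring procedure used by any listed paper: the function names `edit_distance`, `wer`, and `cer` are hypothetical, text normalization (casing, punctuation) is omitted, and stripping spaces before computing CER is an assumption that differs between evaluation setups.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra token in hypothesis)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, using whitespace tokenization (assumption)."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference, hypothesis):
    """Character error rate in percent, with spaces removed (assumption)."""
    ref = list(reference.replace(" ", ""))
    if not ref:
        raise ValueError("reference must contain at least one character")
    hyp = list(hypothesis.replace(" ", ""))
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) against 6 reference words.
    print(round(wer("the cat sat on the mat", "the cat sit on mat"), 2))  # 33.33
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference, so scores are not bounded like an accuracy figure.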