SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) is the task of converting spoken language into written text. ASR systems transcribe speech either in real time or from recorded audio, letting people interact with computers, mobile devices, and other technology by voice. The goal is to transcribe speech accurately despite variation in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade audio quality.

Papers

Showing 2826-2850 of 3012 papers

Title | Status | Hype
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution | Code | 0
Syllable Subword Tokens for Open Vocabulary Speech Recognition in Malayalam | Code | 0
A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion | Code | 0
Enhancing Quantised End-to-End ASR Models via Personalisation | Code | 0
Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding | Code | 0
Whose Emotion Matters? Speaking Activity Localisation without Prior Knowledge | Code | 0
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study | Code | 0
Enhanced ASR Robustness to Packet Loss with a Front-End Adaptation Network | Code | 0
Hybrid phonetic-neural model for correction in speech recognition systems | Code | 0
Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM | Code | 0
Self-supervised Speech Representations Still Struggle with African American Vernacular English | Code | 0
NeMo Inverse Text Normalization: From Development To Production | Code | 0
End-to-End Speech Recognition With Joint Dereverberation Of Sub-Band Autoregressive Envelopes | Code | 0
Conditional independence for pretext task selection in Self-supervised speech representation learning | Code | 0
Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks | Code | 0
HYBRIDFORMER: improving SqueezeFormer with hybrid attention and NSR mechanism | Code | 0
ADIMA: Abuse Detection In Multilingual Audio | Code | 0
Semantically Corrected Amharic Automatic Speech Recognition | Code | 0
Semantically Meaningful Metrics for Norwegian ASR Systems | Code | 0
Towards Temporally Explainable Dysarthric Speech Clarity Assessment | Code | 0
Hybrid ASR for Resource-Constrained Robots: HMM - Deep Learning Fusion | Code | 0
Light Gated Recurrent Units for Speech Recognition | Code | 0
Human Transcription Quality Improvement | Code | 0
A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2 for improving the child speech recognition | Code | 0
HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition | Code | 0
Page 114 of 121

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | TM-CTC | Test WER | 10.1 | - | Unverified
2 | TM-seq2seq | Test WER | 9.7 | - | Unverified
3 | CTC/attention | Test WER | 8.2 | - | Unverified
4 | LF-MMI TDNN | Test WER | 6.7 | - | Unverified
5 | Whisper-LLaMA | Test WER | 6.6 | - | Unverified
6 | End2end Conformer | Test WER | 3.9 | - | Unverified
7 | End2end Conformer | Test WER | 3.7 | - | Unverified
8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | - | Unverified
9 | CTC/Attention | Test WER | 1.5 | - | Unverified
10 | Whisper | Test WER | 1.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SpatialNet | CER | 14.5 | - | Unverified
2 | CleanMel-L-mask | CER | 14.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer | Test WER | 15.32 | - | Unverified
2 | Whisper-largev3-finetuned | Test WER | 10.82 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 1.89 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DistillAV | WER | 1.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 4.28 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 8.04 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 3.36 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer (German) | WER (%) | 8.98 | - | Unverified
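The benchmark tables report word error rate (WER) and character error rate (CER). WER is conventionally defined as WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in the minimum edit (Levenshtein) alignment between hypothesis and reference, and N is the number of words in the reference; CER is the same quantity computed over characters. The sketch below is a minimal illustration under stated assumptions: it uses plain whitespace tokenization and applies no text normalization, whereas real benchmarks usually lowercase, strip punctuation, or otherwise normalize transcripts before scoring, which changes the numbers. The helper names edit_distance, wer, and cer are hypothetical and do not come from any particular toolkit or from the leaderboard above.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: the minimum number of substitutions,
    deletions, and insertions needed to turn `ref` into `hyp`."""
    # Row 0: turning an empty reference into hyp[:j] takes j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # curr[j] will hold the distance between ref[:i] and hyp[:j];
        # curr[0] = i because ref[:i] -> empty needs i deletions.
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, assuming whitespace tokenization only."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (spaces count as characters)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(wer("the cat sat on the mat", "the cat sat on mat"))
    # One substituted character out of five.
    print(cer("hello", "hallo"))
```

Because insertions are counted but N only counts reference words, WER is not capped at 100%: for example, wer("a", "a b c") returns 200.0.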