SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text, often in real time, so that people can interact with computers, mobile devices, and other technology by voice. The goal is accurate transcription that stays robust to variation in accent, pronunciation, and speaking style, as well as to background noise and other factors that degrade speech quality.

Papers

Showing 2076–2100 of 3012 papers

| Title | Status | Hype |
|---|---|---|
| End-to-End Speech Recognition and Disfluency Removal | Code | 1 |
| EasyASR: A Distributed Machine Learning Platform for End-to-end Automatic Speech Recognition | | 0 |
| Multi-modal embeddings using multi-task learning for emotion recognition | | 0 |
| Unmanned Aerial Vehicle Control Through Domain-based Automatic Speech Recognition | | 0 |
| KoSpeech: Open-Source Toolkit for End-to-End Korean Speech Recognition | Code | 1 |
| Robust Spoken Language Understanding with RL-based Value Error Recovery | | 0 |
| Silent Speech Interfaces for Speech Restoration: A Review | | 0 |
| Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer | | 0 |
| Convolutional Speech Recognition with Pitch and Voice Quality Features | | 0 |
| Multi-view Attention-based Speech Enhancement Model for Noise-robust Automatic Speech Recognition | | 0 |
| Data augmentation using prosody and false starts to recognize non-native children's speech | Code | 0 |
| Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition | | 0 |
| Aphasic Speech Recognition using a Mixture of Speech Intelligibility Experts | | 0 |
| Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus | | 0 |
| Cross-Utterance Language Models with Acoustic Error Sampling | | 0 |
| Are Neural Open-Domain Dialog Systems Robust to Speech Recognition Errors in the Dialog History? An Empirical Study | Code | 0 |
| Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces | | 0 |
| Sum-Product Networks for Robust Automatic Speaker Identification | Code | 1 |
| Large-scale Transfer Learning for Low-resource Spoken Language Understanding | | 0 |
| MASRI-HEADSET: A Maltese Corpus for Speech Recognition | | 0 |
| Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition | | 0 |
| Transfer Learning Approaches for Streaming End-to-End Speech Recognition System | | 0 |
| Online Automatic Speech Recognition with Listen, Attend and Spell Model | | 0 |
| Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings | Code | 1 |
| Transformer with Bidirectional Decoder for Speech Recognition | | 0 |

Benchmark Results

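| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TM-CTC | Test WER | 10.1 | | Unverified |
| 2 | TM-seq2seq | Test WER | 9.7 | | Unverified |
| 3 | CTC/attention | Test WER | 8.2 | | Unverified |
| 4 | LF-MMI TDNN | Test WER | 6.7 | | Unverified |
| 5 | Whisper-LLaMA | Test WER | 6.6 | | Unverified |
| 6 | End2end Conformer | Test WER | 3.9 | | Unverified |
| 7 | End2end Conformer | Test WER | 3.7 | | Unverified |
| 8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | | Unverified |
| 9 | CTC/Attention | Test WER | 1.5 | | Unverified |
| 10 | Whisper | Test WER | 1.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SpatialNet | CER | 14.5 | | Unverified |
| 2 | CleanMel-L-mask | CER | 14.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer | Test WER | 15.32 | | Unverified |
| 2 | Whisper-largev3-finetuned | Test WER | 10.82 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 1.89 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DistillAV | WER | 1.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 4.28 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 8.04 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 3.36 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer (German) | WER (%) | 8.98 | | Unverified |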
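
The benchmark entries above report word error rate (WER) and character error rate (CER). As a rough illustration only, the sketch below computes both under the common definition: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, expressed as a percentage. The function names (`edit_distance`, `wer`, `cer`) are hypothetical, and the sketch assumes text has already been normalized (casing, punctuation) and that CER counts spaces as characters. Individual benchmarks may apply different normalization and scoring rules, so this is not the scoring code behind any listed result.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the first i-1 ref tokens
    # and the first j hyp tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (ref token missing in hyp)
                curr[j - 1] + 1,     # insertion (extra token in hyp)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, splitting on whitespace (assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, counting spaces (assumption)."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(round(wer("the cat sat on the mat", "the cat sat on mat"), 2))
    # One substituted character out of four reference characters.
    print(round(cer("abcd", "abed"), 2))
```

Because the denominator is the reference length, inserted tokens can push either rate above 100%. Lower values are better for both metrics.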