SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text, often in real time, so that people can interact with computers, mobile devices, and other technology by voice. The goal is to transcribe speech accurately despite variation in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade speech quality.

Papers

Showing 2901–2925 of 3012 papers

Title | Status | Hype
LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring | Code | 0
Attentively Embracing Noise for Robust Latent Representation in BERT | Code | 0
Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition | Code | 0
Attention-based Multi-hypothesis Fusion for Speech Summarization | Code | 0
On-Device Neural Language Model Based Word Prediction | Code | 0
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data | Code | 0
Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition | Code | 0
Spoken English Intelligibility Remediation with PocketSphinx Alignment and Feature Extraction Improves Substantially over the State of the Art | Code | 0
Deep Learning for Audio Signal Processing | Code | 0
Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation | Code | 0
AdaCS: Adaptive Normalization for Enhanced Code-Switching ASR | Code | 0
Spoken Language Intent Detection using Confusion2Vec | Code | 0
Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator | Code | 0
Assessing the Use of Prosody in Constituency Parsing of Imperfect Transcripts | Code | 0
Sequential Randomized Smoothing for Adversarially Robust Speech Recognition | Code | 0
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices | Code | 0
BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition | Code | 0
Comparing Self-Supervised Learning Models Pre-Trained on Human Speech and Animal Vocalizations for Bioacoustics Processing | Code | 0
Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization | Code | 0
MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible | Code | 0
Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task | Code | 0
Textless Dependency Parsing by Labeled Sequence Prediction | Code | 0
Massively Multilingual Neural Grapheme-to-Phoneme Conversion | Code | 0
On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors | Code | 0
Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | TM-CTC | Test WER | 10.1 | | Unverified
2 | TM-seq2seq | Test WER | 9.7 | | Unverified
3 | CTC/attention | Test WER | 8.2 | | Unverified
4 | LF-MMI TDNN | Test WER | 6.7 | | Unverified
5 | Whisper-LLaMA | Test WER | 6.6 | | Unverified
6 | End2end Conformer | Test WER | 3.9 | | Unverified
7 | End2end Conformer | Test WER | 3.7 | | Unverified
8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | | Unverified
9 | CTC/Attention | Test WER | 1.5 | | Unverified
10 | Whisper | Test WER | 1.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SpatialNet | CER | 14.5 | | Unverified
2 | CleanMel-L-mask | CER | 14.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer | Test WER | 15.32 | | Unverified
2 | Whisper-largev3-finetuned | Test WER | 10.82 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 1.89 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DistillAV | WER | 1.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 4.28 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 8.04 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 3.36 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer (German) | WER (%) | 8.98 | | Unverified