SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text, often in real time, so that people can interact with computers, mobile devices, and other technology by voice. The goal is to transcribe speech accurately despite variation in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade speech quality.

Papers

Showing 2901-2950 of 3012 papers

| Title | Status | Hype |
|---|---|---|
| LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring | Code | 0 |
| Attentively Embracing Noise for Robust Latent Representation in BERT | Code | 0 |
| Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition | Code | 0 |
| Attention-based Multi-hypothesis Fusion for Speech Summarization | Code | 0 |
| On-Device Neural Language Model Based Word Prediction | Code | 0 |
| Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data | Code | 0 |
| Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition | Code | 0 |
| Spoken English Intelligibility Remediation with PocketSphinx Alignment and Feature Extraction Improves Substantially over the State of the Art | Code | 0 |
| Deep Learning for Audio Signal Processing | Code | 0 |
| Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation | Code | 0 |
| AdaCS: Adaptive Normalization for Enhanced Code-Switching ASR | Code | 0 |
| Spoken Language Intent Detection using Confusion2Vec | Code | 0 |
| Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator | Code | 0 |
| Assessing the Use of Prosody in Constituency Parsing of Imperfect Transcripts | Code | 0 |
| Sequential Randomized Smoothing for Adversarially Robust Speech Recognition | Code | 0 |
| Training dynamic models using early exits for automatic speech recognition on resource-constrained devices | Code | 0 |
| BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition | Code | 0 |
| Comparing Self-Supervised Learning Models Pre-Trained on Human Speech and Animal Vocalizations for Bioacoustics Processing | Code | 0 |
| Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization | Code | 0 |
| MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible | Code | 0 |
| Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task | Code | 0 |
| Textless Dependency Parsing by Labeled Sequence Prediction | Code | 0 |
| Massively Multilingual Neural Grapheme-to-Phoneme Conversion | Code | 0 |
| On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors | Code | 0 |
| Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof | Code | 0 |
| RED-ACE: Robust Error Detection for ASR using Confidence Embeddings | Code | 0 |
| Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition | Code | 0 |
| Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems | Code | 0 |
| Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks | Code | 0 |
| Measuring the Accuracy of Automatic Speech Recognition Solutions | Code | 0 |
| Written Term Detection Improves Spoken Term Detection | Code | 0 |
| Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization | Code | 0 |
| SSR7000: A Synchronized Corpus of Ultrasound Tongue Imaging for End-to-End Silent Speech Recognition | Code | 0 |
| Two-stage Textual Knowledge Distillation for End-to-End Spoken Language Understanding | Code | 0 |
| Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition | Code | 0 |
| A Simplified Fully Quantized Transformer for End-to-end Speech Recognition | Code | 0 |
| On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR | Code | 0 |
| On the Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering | Code | 0 |
| Thai Wav2Vec2.0 with CommonVoice V8 | Code | 0 |
| Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain | Code | 0 |
| Unsupervised Online Continual Learning for Automatic Speech Recognition | Code | 0 |
| FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech | Code | 0 |
| Rehearsal-Free Online Continual Learning for Automatic Speech Recognition | Code | 0 |
| A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI | Code | 0 |
| Data augmentation using prosody and false starts to recognize non-native children's speech | Code | 0 |
| Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models | Code | 0 |
| Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems | Code | 0 |
| mHuBERT-147: A Compact Multilingual HuBERT Model | Code | 0 |
| Star Temporal Classification: Sequence Classification with Partially Labeled Data | Code | 0 |
| BehancePR: A Punctuation Restoration Dataset for Livestreaming Video Transcript | Code | 0 |

Benchmark Results

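| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TM-CTC | Test WER | 10.1 | | Unverified |
| 2 | TM-seq2seq | Test WER | 9.7 | | Unverified |
| 3 | CTC/attention | Test WER | 8.2 | | Unverified |
| 4 | LF-MMI TDNN | Test WER | 6.7 | | Unverified |
| 5 | Whisper-LLaMA | Test WER | 6.6 | | Unverified |
| 6 | End2end Conformer | Test WER | 3.9 | | Unverified |
| 7 | End2end Conformer | Test WER | 3.7 | | Unverified |
| 8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | | Unverified |
| 9 | CTC/Attention | Test WER | 1.5 | | Unverified |
| 10 | Whisper | Test WER | 1.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SpatialNet | CER | 14.5 | | Unverified |
| 2 | CleanMel-L-mask | CER | 14.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer | Test WER | 15.32 | | Unverified |
| 2 | Whisper-largev3-finetuned | Test WER | 10.82 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 1.89 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DistillAV | WER | 1.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 4.28 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 8.04 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 3.36 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer (German) | WER (%) | 8.98 | | Unverified |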
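
The benchmark results report word error rate (WER) and character error rate (CER). WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the reference transcript into the hypothesis, divided by the number of reference words; CER applies the same idea to characters. The following is a minimal sketch of that computation, not the scoring script behind any listed result. The function names `edit_distance`, `wer`, and `cer` are illustrative, and the sketch assumes whitespace tokenization, no text normalization (casing, punctuation), and that spaces count as characters for CER; individual benchmarks may differ on all three.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning i reference tokens to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing
                curr[j - 1] + 1,     # insertion: extra hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (assumes whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    errors = edit_distance(ref_words, hypothesis.split())
    return 100.0 * errors / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (assumes spaces count as characters)."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    errors = edit_distance(list(reference), list(hypothesis))
    return 100.0 * errors / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER: {wer(ref, hyp):.2f}%")
    print(f"CER: {cer(ref, hyp):.2f}%")
```

Because insertions are counted as errors while the denominator is the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.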