SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text. Often run in real time, it lets people interact with computers, mobile devices, and other technology by voice. The goal is to transcribe speech accurately despite variation in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade audio quality.

Papers

Showing 2926-2950 of 3012 papers

| Title | Status | Hype |
|---|---|---|
| RED-ACE: Robust Error Detection for ASR using Confidence Embeddings | Code | 0 |
| Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition | Code | 0 |
| Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems | Code | 0 |
| Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks | Code | 0 |
| Measuring the Accuracy of Automatic Speech Recognition Solutions | Code | 0 |
| Written Term Detection Improves Spoken Term Detection | Code | 0 |
| Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization | Code | 0 |
| SSR7000: A Synchronized Corpus of Ultrasound Tongue Imaging for End-to-End Silent Speech Recognition | Code | 0 |
| Two-stage Textual Knowledge Distillation for End-to-End Spoken Language Understanding | Code | 0 |
| Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition | Code | 0 |
| A Simplified Fully Quantized Transformer for End-to-end Speech Recognition | Code | 0 |
| On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR | Code | 0 |
| On the Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering | Code | 0 |
| Thai Wav2Vec2.0 with CommonVoice V8 | Code | 0 |
| Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain | Code | 0 |
| Unsupervised Online Continual Learning for Automatic Speech Recognition | Code | 0 |
| FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech | Code | 0 |
| Rehearsal-Free Online Continual Learning for Automatic Speech Recognition | Code | 0 |
| A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI | Code | 0 |
| Data augmentation using prosody and false starts to recognize non-native children's speech | Code | 0 |
| Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models | Code | 0 |
| Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems | Code | 0 |
| mHuBERT-147: A Compact Multilingual HuBERT Model | Code | 0 |
| Star Temporal Classification: Sequence Classification with Partially Labeled Data | Code | 0 |
| BehancePR: A Punctuation Restoration Dataset for Livestreaming Video Transcript | Code | 0 |

Benchmark Results

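| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TM-CTC | Test WER | 10.1 | | Unverified |
| 2 | TM-seq2seq | Test WER | 9.7 | | Unverified |
| 3 | CTC/attention | Test WER | 8.2 | | Unverified |
| 4 | LF-MMI TDNN | Test WER | 6.7 | | Unverified |
| 5 | Whisper-LLaMA | Test WER | 6.6 | | Unverified |
| 6 | End2end Conformer | Test WER | 3.9 | | Unverified |
| 7 | End2end Conformer | Test WER | 3.7 | | Unverified |
| 8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | | Unverified |
| 9 | CTC/Attention | Test WER | 1.5 | | Unverified |
| 10 | Whisper | Test WER | 1.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SpatialNet | CER | 14.5 | | Unverified |
| 2 | CleanMel-L-mask | CER | 14.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer | Test WER | 15.32 | | Unverified |
| 2 | Whisper-largev3-finetuned | Test WER | 10.82 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 1.89 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DistillAV | WER | 1.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 4.28 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 8.04 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 3.36 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer (German) | WER (%) | 8.98 | | Unverified |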
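
The WER and CER values reported above are conventionally defined as the edit distance between a reference transcript and the system output, divided by the reference length and expressed as a percentage. The sketch below illustrates that standard definition only. It is not the scoring script behind any result listed here, and the function names (`edit_distance`, `wer`, `cer`) are hypothetical. It also assumes whitespace tokenization for words and ignores spaces when scoring characters; real evaluations usually apply text normalization first, and conventions differ between benchmarks.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; each substitution,
    deletion, and insertion costs 1."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to hyp[:j]
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference token
                curr[j - 1] + 1,     # insert a hypothesis token
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (assumes whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (assumption: spaces are ignored)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors over 6 words.
    print(round(wer("the cat sat on the mat", "the cat sit on mat"), 2))
```

Because insertions count as errors, WER computed this way can exceed 100 when the hypothesis is much longer than the reference.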