SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text, often in real time, so that people can interact with computers, mobile devices, and other technology by voice. The goal is accurate transcription despite variation in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade speech quality.

Papers

Showing 901-925 of 3012 papers

Title | Status | Hype
Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation | | 0
MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech Recognition | | 0
UML: A Universal Monolingual Output Layer for Multilingual ASR | | 0
Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition | Code | 1
Connecting Humanities and Social Sciences: Applying Language and Speech Technology to Online Panel Surveys | | 0
An ASR-free Fluency Scoring Approach with Self-Supervised Learning | | 0
Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition | | 0
A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One | Code | 1
Speaker and Language Change Detection using Wav2vec2 and Whisper | | 0
Massively Multilingual Shallow Fusion with Large Language Models | | 0
Stabilising and accelerating light gated recurrent units for automatic speech recognition | | 0
Adaptive Axonal Delays in feedforward spiking neural networks for accurate spoken word recognition | | 0
Speaker Change Detection for Transformer Transducer ASR | | 0
Adaptable End-to-End ASR Models using Replaceable Internal LMs and Residual Softmax | | 0
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems | Code | 0
ASR Bundestag: A Large-Scale political debate dataset in German | | 0
ASDF: A Differential Testing Framework for Automatic Speech Recognition Systems | Code | 0
PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction | | 0
Leveraging supplementary text data to kick-start automatic speech recognition system development with limited transcriptions | | 0
MAC: A unified framework boosting low resource automatic speech recognition | | 0
Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition | Code | 1
Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition | | 0
Fillers in Spoken Language Understanding: Computational and Psycholinguistic Perspectives | | 0
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study | Code | 0
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | TM-CTC | Test WER | 10.1 | | Unverified
2 | TM-seq2seq | Test WER | 9.7 | | Unverified
3 | CTC/attention | Test WER | 8.2 | | Unverified
4 | LF-MMI TDNN | Test WER | 6.7 | | Unverified
5 | Whisper-LLaMA | Test WER | 6.6 | | Unverified
6 | End2end Conformer | Test WER | 3.9 | | Unverified
7 | End2end Conformer | Test WER | 3.7 | | Unverified
8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | | Unverified
9 | CTC/Attention | Test WER | 1.5 | | Unverified
10 | Whisper | Test WER | 1.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SpatialNet | CER | 14.5 | | Unverified
2 | CleanMel-L-mask | CER | 14.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer | Test WER | 15.32 | | Unverified
2 | Whisper-largev3-finetuned | Test WER | 10.82 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 1.89 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DistillAV | WER | 1.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 4.28 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 8.04 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 3.36 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer (German) | WER (%) | 8.98 | | Unverified
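
The benchmark tables report word error rate (WER) and character error rate (CER). As a rough illustration of how these metrics are commonly defined, the sketch below computes both as the Levenshtein edit distance between a reference transcript and a system hypothesis, divided by the reference length and expressed as a percentage. The function names `edit_distance`, `wer`, and `cer` are illustrative choices and not taken from this page. The listed papers may apply their own text normalization (casing, punctuation, tokenization) before scoring, so this sketch is not expected to reproduce the exact numbers above.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: ref token missing from hyp
                curr[j - 1] + 1,     # insertion: extra token in hyp
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, using whitespace tokenization (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent, counting spaces as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "turn on the lights"
    hyp = "turn on the light"
    print(f"WER: {wer(ref, hyp):.2f}%")
    print(f"CER: {cer(ref, hyp):.2f}%")
```

Because insertions count as errors, both metrics can exceed 100% when the hypothesis is much longer than the reference.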