SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) involves converting spoken language into written text. It is designed to transcribe spoken words into text in real-time, allowing people to communicate with computers, mobile devices, and other technology using their voice. The goal of Automatic Speech Recognition is to accurately transcribe speech, taking into account variations in accent, pronunciation, and speaking style, as well as background noise and other factors that can affect speech quality.

Papers

Showing 426450 of 3012 papers

TitleStatusHype
To Distill or Not to Distill? On the Robustness of Robust Knowledge DistillationCode0
Hypernetworks for Personalizing ASR to Atypical Speech0
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech RecognitionCode1
Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores0
Text Injection for Neural Contextual Biasing0
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition0
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task LearningCode5
Enhancing CTC-based speech recognition with diverse modeling units0
Keyword-Guided Adaptation of Automatic Speech Recognition0
Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping0
Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach0
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning0
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities0
Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation0
A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and RecognitionCode1
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition0
Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language UnderstandingCode0
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text RecognitionCode2
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation ModelsCode3
You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish0
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation0
FairLENS: Assessing Fairness in Law Enforcement Speech Recognition0
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models0
Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings0
Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer0
Show:102550
← PrevPage 18 of 121Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1TM-CTCTest WER10.1Unverified
2TM-seq2seqTest WER9.7Unverified
3CTC/attentionTest WER8.2Unverified
4LF-MMI TDNNTest WER6.7Unverified
5Whisper-LLaMATest WER6.6Unverified
6End2end ConformerTest WER3.9Unverified
7End2end ConformerTest WER3.7Unverified
8MoCo + wav2vec (w/o extLM)Test WER2.7Unverified
9CTC/AttentionTest WER1.5Unverified
10WhisperTest WER1.3Unverified
#ModelMetricClaimedVerifiedStatus
1SpatialNetCER14.5Unverified
2CleanMel-L-maskCER14.4Unverified
#ModelMetricClaimedVerifiedStatus
1ConformerTest WER15.32Unverified
2Whisper-largev3-finetunedTest WER10.82Unverified
#ModelMetricClaimedVerifiedStatus
1Conformer TransducerWER (%)1.89Unverified
#ModelMetricClaimedVerifiedStatus
1DistillAVWER1.4Unverified
#ModelMetricClaimedVerifiedStatus
1Conformer TransducerWER (%)4.28Unverified
#ModelMetricClaimedVerifiedStatus
1Conformer TransducerWER (%)8.04Unverified
#ModelMetricClaimedVerifiedStatus
1Conformer TransducerWER (%)3.36Unverified
#ModelMetricClaimedVerifiedStatus
1Conformer Transducer (German)WER (%)8.98Unverified