SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text, often in real time, so that people can interact with computers, mobile devices, and other technology by voice. The goal is to transcribe speech accurately despite variation in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade speech quality.

Papers

Showing 876–900 of 3012 papers

| Title | Status | Hype |
| --- | --- | --- |
| Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study | Code | 0 |
| Transcription free filler word detection with Neural semi-CRFs | Code | 0 |
| MIXPGD: Hybrid Adversarial Training for Speech Recognition Systems | | 0 |
| Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings | | 0 |
| wav2vec and its current potential to Automatic Speech Recognition in German for the usage in Digital History: A comparative assessment of available ASR-technologies for the use in cultural heritage contexts | | 0 |
| End-to-End Speech Recognition: A Survey | | 0 |
| Leveraging Large Text Corpora for End-to-End Speech Summarization | | 0 |
| Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages | | 0 |
| Leveraging Redundancy in Multiple Audio Signals for Far-Field Speech Recognition | | 0 |
| N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space | | 0 |
| Language-Universal Adapter Learning with Knowledge Distillation for End-to-End Multilingual Speech Recognition | Code | 0 |
| Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video | | 0 |
| Diacritic Recognition Performance in Arabic ASR | | 0 |
| MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition | | 0 |
| Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model | | 0 |
| A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit | | 0 |
| Multimodal Speech Recognition for Language-Guided Embodied Agents | Code | 0 |
| A low latency attention module for streaming self-supervised speech representation learning | Code | 0 |
| Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator | | 0 |
| Speech Corpora Divergence Based Unsupervised Data Selection for ASR | | 0 |
| Efficient Ensemble for Multimodal Punctuation Restoration using Time-Delay Neural Network | Code | 0 |
| Factual Consistency Oriented Speech Recognition | | 0 |
| Ensemble knowledge distillation of self-supervised speech models | | 0 |
| Improving Massively Multilingual ASR With Auxiliary CTC Objectives | | 0 |
| Evaluating Automatic Speech Recognition in an Incremental Setting | | 0 |

Benchmark Results

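| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TM-CTC | Test WER | 10.1 | | Unverified |
| 2 | TM-seq2seq | Test WER | 9.7 | | Unverified |
| 3 | CTC/attention | Test WER | 8.2 | | Unverified |
| 4 | LF-MMI TDNN | Test WER | 6.7 | | Unverified |
| 5 | Whisper-LLaMA | Test WER | 6.6 | | Unverified |
| 6 | End2end Conformer | Test WER | 3.9 | | Unverified |
| 7 | End2end Conformer | Test WER | 3.7 | | Unverified |
| 8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | | Unverified |
| 9 | CTC/Attention | Test WER | 1.5 | | Unverified |
| 10 | Whisper | Test WER | 1.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SpatialNet | CER | 14.5 | | Unverified |
| 2 | CleanMel-L-mask | CER | 14.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer | Test WER | 15.32 | | Unverified |
| 2 | Whisper-largev3-finetuned | Test WER | 10.82 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer | WER (%) | 1.89 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DistillAV | WER | 1.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer | WER (%) | 4.28 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer | WER (%) | 8.04 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer | WER (%) | 3.36 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer (German) | WER (%) | 8.98 | | Unverified |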
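
The benchmark results above are reported as WER (word error rate) and CER (character error rate). As a rough illustration of what these metrics measure, the sketch below computes both as the Levenshtein edit distance between a reference transcript and a hypothesis, divided by the reference length and expressed as a percentage. The function names `edit_distance`, `wer`, and `cer` are illustrative choices, not taken from this page, and the sketch assumes plain whitespace tokenization with no text normalization (casing, punctuation), which individual benchmarks may handle differently.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent (assumes whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (spaces count as characters here)."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(round(wer("the cat sat on the mat", "the cat sat on mat"), 2))  # 16.67
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.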