SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text. ASR systems transcribe speech, often in real time, so that people can interact with computers, mobile devices, and other technology by voice. The goal is accurate transcription despite variations in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade speech quality.

Papers

Showing 101-125 of 3012 papers

Title | Status | Hype
Boosting the Transferability of Audio Adversarial Examples with Acoustic Representation Optimization | - | 0
Whispering in Amharic: Fine-tuning Whisper for Low-resource Language | - | 0
Your voice is your voice: Supporting Self-expression through Speech Generation and LLMs in Augmented and Alternative Communication | - | 0
Evaluating ASR Confidence Scores for Automated Error Detection in User-Assisted Correction Interfaces | - | 0
ValSub: Subsampling Validation Data to Mitigate Forgetting during ASR Personalization | - | 0
Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment | - | 0
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR | - | 0
Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling | - | 0
Building English ASR model with regional language support | - | 0
From Voice to Safety: Language AI Powered Pilot-ATC Communication Understanding for Airport Surface Movement Collision Risk Assessment | - | 0
Qieemo: Speech Is All You Need in the Emotion Recognition in Conversations | - | 0
Direct Speech to Speech Translation: A Review | - | 0
Fine-Tuning Whisper for Inclusive Prosodic Stress Analysis | - | 0
Unveiling Biases while Embracing Sustainability: Assessing the Dual Challenges of Automatic Speech Recognition Systems | - | 0
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation | Code | 2
Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications | - | 0
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR | Code | 2
Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision | - | 0
CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition | - | 0
Exploring Gender Disparities in Automatic Speech Recognition Technology | - | 0
Improving the Inclusivity of Dutch Speech Recognition by Fine-tuning Whisper on the JASMIN-CGN Corpus | Code | 0
Understanding Zero-shot Rare Word Recognition Improvements Through LLM Integration | - | 0
The Esethu Framework: Reimagining Sustainable Dataset Governance and Curation for Low-Resource Languages | - | 0
Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders | - | 0
Adopting Whisper for Confidence Estimation | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | TM-CTC | Test WER | 10.1 | - | Unverified
2 | TM-seq2seq | Test WER | 9.7 | - | Unverified
3 | CTC/attention | Test WER | 8.2 | - | Unverified
4 | LF-MMI TDNN | Test WER | 6.7 | - | Unverified
5 | Whisper-LLaMA | Test WER | 6.6 | - | Unverified
6 | End2end Conformer | Test WER | 3.9 | - | Unverified
7 | End2end Conformer | Test WER | 3.7 | - | Unverified
8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | - | Unverified
9 | CTC/Attention | Test WER | 1.5 | - | Unverified
10 | Whisper | Test WER | 1.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SpatialNet | CER | 14.5 | - | Unverified
2 | CleanMel-L-mask | CER | 14.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer | Test WER | 15.32 | - | Unverified
2 | Whisper-largev3-finetuned | Test WER | 10.82 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 1.89 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DistillAV | WER | 1.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 4.28 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 8.04 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 3.36 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer (German) | WER (%) | 8.98 | - | Unverified
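
The benchmark tables report word error rate (WER) and character error rate (CER). As a rough illustration of how such figures are commonly computed, the sketch below divides the Levenshtein edit distance between a reference transcript and a hypothesis by the reference length. The function names are hypothetical, and the choices made here (whitespace tokenization, ignoring spaces for CER, no text normalization) are assumptions; the listed papers may score their systems differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from reference prefix of length i to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent; assumes whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent; assumes whitespace is ignored."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(round(wer("the cat sat on the mat", "the cat sat on mat"), 2))
```

Because insertions count as errors, WER computed this way can exceed 100 when the hypothesis is much longer than the reference.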