SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text, often in real time, so that people can interact with computers, mobile devices, and other technology by voice. The goal is accurate transcription despite variation in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade speech quality.

Papers

Showing 2101-2150 of 3012 papers

Title | Status | Hype
VenoMave: Targeted Poisoning Against Speech Recognition | Code | 0
Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin | Code | 0
Cascaded Models With Cyclic Feedback For Direct Speech Translation | - | 0
A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks | - | 0
Knowledge Distillation for Improved Accuracy in Spoken Question Answering | - | 0
FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization | Code | 0
Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction | - | 0
Knowledge Transfer for Efficient On-device False Trigger Mitigation | - | 0
Ensemble Chinese End-to-End Spoken Language Understanding for Abnormal Event Detection from audio stream | - | 0
Towards Data Distillation for End-to-end Spoken Conversational Question Answering | - | 0
Studying the Similarity of COVID-19 Sounds based on Correlation Analysis of MFCC | - | 0
Non-intrusive speech intelligibility prediction using automatic speech recognition derived measures | - | 0
Multimodal Speech Recognition with Unstructured Audio Masking | - | 0
Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions | - | 0
Exploiting Spectral Augmentation for Code-Switched Spoken Language Identification | - | 0
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling | - | 0
Improving Low Resource Code-switched ASR using Augmented Code-switched TTS | - | 0
WER we are and WER we think we are | - | 0
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS | Code | 0
Swiss Parliaments Corpus, an Automatically Aligned Swiss German Speech to Standard German Text Corpus | Code | 0
Fine-Grained Grounding for Multimodal Speech Recognition | Code | 0
A Study on Lip Localization Techniques used for Lip reading from a Video | - | 0
FluentNet: End-to-End Detection of Speech Disfluency with Deep Learning | - | 0
End-to-End Learning of Speech 2D Feature-Trajectory for Prosthetic Hands | Code | 0
EasyASR: A Distributed Machine Learning Platform for End-to-end Automatic Speech Recognition | - | 0
Multi-modal embeddings using multi-task learning for emotion recognition | - | 0
Unmanned Aerial Vehicle Control Through Domain-based Automatic Speech Recognition | - | 0
Robust Spoken Language Understanding with RL-based Value Error Recovery | - | 0
Silent Speech Interfaces for Speech Restoration: A Review | - | 0
Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer | - | 0
Convolutional Speech Recognition with Pitch and Voice Quality Features | - | 0
Multi-view Attention-based Speech Enhancement Model for Noise-robust Automatic Speech Recognition | - | 0
Data augmentation using prosody and false starts to recognize non-native children's speech | Code | 0
Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition | - | 0
Aphasic Speech Recognition using a Mixture of Speech Intelligibility Experts | - | 0
Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus | - | 0
Cross-Utterance Language Models with Acoustic Error Sampling | - | 0
Are Neural Open-Domain Dialog Systems Robust to Speech Recognition Errors in the Dialog History? An Empirical Study | Code | 0
Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces | - | 0
Large-scale Transfer Learning for Low-resource Spoken Language Understanding | - | 0
MASRI-HEADSET: A Maltese Corpus for Speech Recognition | - | 0
Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition | - | 0
Transfer Learning Approaches for Streaming End-to-End Speech Recognition System | - | 0
Online Automatic Speech Recognition with Listen, Attend and Spell Model | - | 0
Transformer with Bidirectional Decoder for Speech Recognition | - | 0
Subword Regularization: An Analysis of Scalability and Generalization for End-to-End Automatic Speech Recognition | - | 0
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition | - | 0
Investigation of Speaker-adaptation methods in Transformer based ASR | - | 0
Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition | - | 0
Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions | - | 0
Page 43 of 61

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | TM-CTC | Test WER | 10.1 | - | Unverified
2 | TM-seq2seq | Test WER | 9.7 | - | Unverified
3 | CTC/attention | Test WER | 8.2 | - | Unverified
4 | LF-MMI TDNN | Test WER | 6.7 | - | Unverified
5 | Whisper-LLaMA | Test WER | 6.6 | - | Unverified
6 | End2end Conformer | Test WER | 3.9 | - | Unverified
7 | End2end Conformer | Test WER | 3.7 | - | Unverified
8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | - | Unverified
9 | CTC/Attention | Test WER | 1.5 | - | Unverified
10 | Whisper | Test WER | 1.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SpatialNet | CER | 14.5 | - | Unverified
2 | CleanMel-L-mask | CER | 14.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer | Test WER | 15.32 | - | Unverified
2 | Whisper-largev3-finetuned | Test WER | 10.82 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 1.89 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DistillAV | WER | 1.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 4.28 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 8.04 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 3.36 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer (German) | WER (%) | 8.98 | - | Unverified
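
The benchmark entries above report word error rate (WER) and character error rate (CER). As a rough illustration of how such figures are commonly computed, the following is a minimal sketch, assuming whitespace tokenization for WER, raw characters (spaces included) for CER, and no text normalization. The function names `edit_distance`, `wer`, and `cer` are hypothetical, and the listed papers may apply different normalization or scoring tools.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra token in hypothesis)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent; assumes whitespace tokenization."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def cer(reference, hypothesis):
    """Character error rate in percent; assumes spaces count as characters."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) against six reference words.
    print(round(wer("the cat sat on the mat", "the cat sit on mat"), 2))
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many insertions, which is one reason scores are only comparable when computed with the same normalization and reference transcripts.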