SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text, often in real time, so that people can interact with computers, mobile devices, and other technology by voice. The goal is accurate transcription despite variation in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade speech quality.

Papers

Showing 1751-1800 of 3012 papers

| Title | Status | Hype |
|---|---|---|
| Transformer ASR with Contextual Block Processing | | 0 |
| Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation | | 0 |
| Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project | | 0 |
| Transformer-based Model for ASR N-Best Rescoring and Rewriting | | 0 |
| Transformer-based Online CTC/attention End-to-End Speech Recognition Architecture | | 0 |
| Transformer-based Streaming ASR with Cumulative Attention | | 0 |
| Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video | | 0 |
| Transformer-Transducers for Code-Switched Speech Recognition | | 0 |
| Transformer with Bidirectional Decoder for Speech Recognition | | 0 |
| Transforming NLU with Babylon: A Case Study in Development of Real-time, Edge-Efficient, Multi-Intent Translation System for Automated Drive-Thru Ordering | | 0 |
| Transliteration Better than Translation? Answering Code-mixed Questions over a Knowledge Base | | 0 |
| TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition | | 0 |
| Tree-constrained Pointer Generator for End-to-end Contextual Speech Recognition | | 0 |
| Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition | | 0 |
| Tropical Modeling of Weighted Transducer Algorithms on Graphs | | 0 |
| TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection | | 0 |
| t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability | | 0 |
| TTS Skins: Speaker Conversion via ASR | | 0 |
| TUKE-BNews-SK: Slovak Broadcast News Corpus Construction and Evaluation | | 0 |
| Tutorial Proposal: End-to-End Speech Translation | | 0 |
| Two Front-Ends, One Model : Fusing Heterogeneous Speech Features for Low Resource ASR with Multilingual Pre-Training | | 0 |
| Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems | | 0 |
| Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR | | 0 |
| Two-Staged Acoustic Modeling Adaption for Robust Speech Recognition by the Example of German Oral History Interviews | | 0 |
| A Multi-level Acoustic Feature Extraction Framework for Transformer Based End-to-End Speech Recognition | | 0 |
| U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF | | 0 |
| UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction | | 0 |
| UFO2: A unified pre-training framework for online and offline speech recognition | | 0 |
| UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition | | 0 |
| UML: A Universal Monolingual Output Layer for Multilingual ASR | | 0 |
| Understanding Semantics from Speech Through Pre-training | | 0 |
| Understanding Shared Speech-Text Representations | | 0 |
| Understanding the Role of Self Attention for Efficient Speech Recognition | | 0 |
| Understanding Zero-shot Rare Word Recognition Improvements Through LLM Integration | | 0 |
| Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation | | 0 |
| Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems | | 0 |
| Unified Modeling of Multi-Domain Multi-Device ASR Systems | | 0 |
| Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator | | 0 |
| Unifying Streaming and Non-streaming Zipformer-based ASR | | 0 |
| Unintended Memorization in Large ASR Models, and How to Mitigate It | | 0 |
| Universal-2-TF: Robust All-Neural Text Formatting for ASR | | 0 |
| Universal Adversarial Perturbations for Speech Recognition Systems | | 0 |
| Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling | | 0 |
| UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions | | 0 |
| Unmanned Aerial Vehicle Control Through Domain-based Automatic Speech Recognition | | 0 |
| Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition | | 0 |
| Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition | | 0 |
| Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition | | 0 |
| Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR | | 0 |
| Unsupervised ASR via Cross-Lingual Pseudo-Labeling | | 0 |

Benchmark Results

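| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TM-CTC | Test WER | 10.1 | | Unverified |
| 2 | TM-seq2seq | Test WER | 9.7 | | Unverified |
| 3 | CTC/attention | Test WER | 8.2 | | Unverified |
| 4 | LF-MMI TDNN | Test WER | 6.7 | | Unverified |
| 5 | Whisper-LLaMA | Test WER | 6.6 | | Unverified |
| 6 | End2end Conformer | Test WER | 3.9 | | Unverified |
| 7 | End2end Conformer | Test WER | 3.7 | | Unverified |
| 8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | | Unverified |
| 9 | CTC/Attention | Test WER | 1.5 | | Unverified |
| 10 | Whisper | Test WER | 1.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SpatialNet | CER | 14.5 | | Unverified |
| 2 | CleanMel-L-mask | CER | 14.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer | Test WER | 15.32 | | Unverified |
| 2 | Whisper-largev3-finetuned | Test WER | 10.82 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 1.89 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DistillAV | WER | 1.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 4.28 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 8.04 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer | WER (%) | 3.36 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Conformer Transducer (German) | WER (%) | 8.98 | | Unverified |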
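
The benchmark results above are reported as WER (word error rate) and CER (character error rate). As a rough illustration of what those numbers measure, the following minimal Python sketch computes both from a reference transcript and a hypothesis using the standard Levenshtein edit distance. The function names (`edit_distance`, `wer`, `cer`) are hypothetical, the output is assumed to be a percentage, and no text normalization is applied; the listed papers may tokenize, normalize, or score differently, so this is not a reproduction of any entry's evaluation.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: substitutions, deletions, insertions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning i reference tokens to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis token)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent (assumes whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (spaces count as characters here)."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER: {wer(ref, hyp):.2f}%")
    print(f"CER: {cer(ref, hyp):.2f}%")
```

Because the error count is divided by the reference length, WER can exceed 100% when the hypothesis contains many insertions; lower values are better for both metrics.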