SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) is the task of converting spoken language into written text. ASR systems transcribe speech either as it is spoken (streaming) or from recordings (offline), which lets people interact with computers, mobile devices, and other technology by voice. The goal is to transcribe speech accurately despite variations in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade speech quality.

Papers

Showing 101–150 of 3012 papers

Title | Status | Hype
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models | Code | 1
Memory-augmented conformer for improved end-to-end long-form ASR | Code | 1
HypR: A comprehensive study for ASR hypothesis revising with a reference corpus | Code | 1
EnCodecMAE: Leveraging neural codecs for universal audio representation learning | Code | 1
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation | Code | 1
ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus | Code | 1
Adaptation of Whisper models to child speech recognition | Code | 1
NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning | Code | 1
A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision | Code | 1
SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization | Code | 1
Can Contextual Biasing Remain Effective with Whisper and GPT-2? | Code | 1
CopyNE: Better Contextual ASR by Copying Named Entities | Code | 1
Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | Code | 1
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition | Code | 1
Back Translation for Speech-to-text Translation Without Transcripts | Code | 1
Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition | Code | 1
A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One | Code | 1
Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition | Code | 1
Audio-Visual Efficient Conformer for Robust Speech Recognition | Code | 1
Towards Voice Reconstruction from EEG during Imagined Speech | Code | 1
Skit-S2I: An Indian Accented Speech to Intent dataset | Code | 1
BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm | Code | 1
SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels | Code | 1
A Persian ASR-based SER: Modification of Sharif Emotional Speech Database and Investigation of Persian Text Corpora | Code | 1
ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications | Code | 1
Towards Improved Room Impulse Response Estimation for Speech Recognition | Code | 1
Multi-blank Transducers for Speech Recognition | Code | 1
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing | Code | 1
data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup | Code | 1
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning | Code | 1
Automatic Severity Classification of Dysarthric speech by using Self-supervised Model with Multi-task Learning | Code | 1
There is more than one kind of robustness: Fooling Whisper with adversarial examples | Code | 1
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition | Code | 1
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation | Code | 1
Towards Relation Extraction From Speech | Code | 1
Can we use Common Voice to train a Multi-Speaker TTS system? | Code | 1
A context-aware knowledge transferring strategy for CTC-based ASR | Code | 1
JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT | Code | 1
CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations | Code | 1
TVLT: Textless Vision-Language Transformer | Code | 1
Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM | Code | 1
Deep Sparse Conformer for Speech Recognition | Code | 1
IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages | Code | 1
ASR Error Correction with Constrained Decoding on Operation Prediction | Code | 1
DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition | Code | 1
Improving Mandarin Speech Recogntion with Block-augmented Transformer | Code | 1
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription | Code | 1
MM-ALT: A Multimodal Automatic Lyric Transcription System | Code | 1
Distilling a Pretrained Language Model to a Multilingual ASR Model | Code | 1
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | TM-CTC | Test WER | 10.1 |  | Unverified
2 | TM-seq2seq | Test WER | 9.7 |  | Unverified
3 | CTC/attention | Test WER | 8.2 |  | Unverified
4 | LF-MMI TDNN | Test WER | 6.7 |  | Unverified
5 | Whisper-LLaMA | Test WER | 6.6 |  | Unverified
6 | End2end Conformer | Test WER | 3.9 |  | Unverified
7 | End2end Conformer | Test WER | 3.7 |  | Unverified
8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 |  | Unverified
9 | CTC/Attention | Test WER | 1.5 |  | Unverified
10 | Whisper | Test WER | 1.3 |  | Unverified
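
Word error rate (WER), the metric in the table above, is the word-level edit distance between a reference transcript and the system output divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions and N is the reference length. Lower is better, and because insertions count, WER can exceed 1 (100%). The sketch below computes it with a standard dynamic-programming edit distance. It assumes whitespace tokenization and no text normalization (casing, punctuation, number formatting), which benchmarks usually apply first, so its output is not directly comparable to the claimed figures; the name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via Levenshtein distance over whitespace tokens.

    Assumption: no normalization is applied; callers should lowercase or
    strip punctuation beforehand if their benchmark does so.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )

    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` is 2/6, about 0.33. Multiply by 100 to get the percentage form used in the `WER (%)` rows.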

# | Model | Metric | Claimed | Verified | Status
1 | SpatialNet | CER | 14.5 |  | Unverified
2 | CleanMel-L-mask | CER | 14.4 |  | Unverified
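
Character error rate (CER), the metric in the table above, applies the same edit-distance idea to characters instead of words; it is commonly used for languages such as Mandarin that are written without spaces between words. A minimal sketch, assuming spaces count as ordinary characters (toolkits differ on this) and no further normalization; `character_error_rate` is an illustrative name. It keeps only two rows of the dynamic-programming table, so memory grows with the hypothesis length rather than with the product of both lengths.

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = character-level edit distance / number of reference characters.

    Assumption: every character, including spaces, is compared as-is.
    """
    if not reference:
        raise ValueError("reference must contain at least one character")

    # prev[j] = edits turning the reference prefix processed so far
    # into hypothesis[:j]; starts as the empty-reference row.
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, h in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference char r
                curr[j - 1] + 1,         # insert hypothesis char h
                prev[j - 1] + (r != h),  # match (0) or substitution (1)
            )
        prev = curr

    return prev[-1] / len(reference)


if __name__ == "__main__":
    # kitten -> sitting needs 3 edits over 6 reference characters.
    print(character_error_rate("kitten", "sitting"))
```

For example, `character_error_rate("kitten", "sitting")` is 3/6 = 0.5.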

# | Model | Metric | Claimed | Verified | Status
1 | Conformer | Test WER | 15.32 |  | Unverified
2 | Whisper-largev3-finetuned | Test WER | 10.82 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 1.89 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DistillAV | WER | 1.4 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 4.28 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 8.04 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 3.36 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer (German) | WER (%) | 8.98 |  | Unverified