SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) is the task of converting spoken language into written text. ASR systems let people interact with computers, mobile devices, and other technology by voice, and many are built to transcribe speech in real time as it is spoken. The goal is accurate transcription despite variation in accent, pronunciation, and speaking style, and despite background noise and other factors that degrade speech quality.

Papers

Showing 101–125 of 3012 papers

Title | Status | Hype
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models | Code | 1
Memory-augmented conformer for improved end-to-end long-form ASR | Code | 1
HypR: A comprehensive study for ASR hypothesis revising with a reference corpus | Code | 1
EnCodecMAE: Leveraging neural codecs for universal audio representation learning | Code | 1
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation | Code | 1
ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus | Code | 1
Adaptation of Whisper models to child speech recognition | Code | 1
A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision | Code | 1
NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning | Code | 1
SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization | Code | 1
Can Contextual Biasing Remain Effective with Whisper and GPT-2? | Code | 1
CopyNE: Better Contextual ASR by Copying Named Entities | Code | 1
Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | Code | 1
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition | Code | 1
Back Translation for Speech-to-text Translation Without Transcripts | Code | 1
Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition | Code | 1
A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One | Code | 1
Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition | Code | 1
Audio-Visual Efficient Conformer for Robust Speech Recognition | Code | 1
Towards Voice Reconstruction from EEG during Imagined Speech | Code | 1
Skit-S2I: An Indian Accented Speech to Intent dataset | Code | 1
BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm | Code | 1
SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels | Code | 1
A Persian ASR-based SER: Modification of Sharif Emotional Speech Database and Investigation of Persian Text Corpora | Code | 1
Towards Improved Room Impulse Response Estimation for Speech Recognition | Code | 1
Page 5 of 121

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | TM-CTC | Test WER | 10.1 | - | Unverified
2 | TM-seq2seq | Test WER | 9.7 | - | Unverified
3 | CTC/attention | Test WER | 8.2 | - | Unverified
4 | LF-MMI TDNN | Test WER | 6.7 | - | Unverified
5 | Whisper-LLaMA | Test WER | 6.6 | - | Unverified
6 | End2end Conformer | Test WER | 3.9 | - | Unverified
7 | End2end Conformer | Test WER | 3.7 | - | Unverified
8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | - | Unverified
9 | CTC/Attention | Test WER | 1.5 | - | Unverified
10 | Whisper | Test WER | 1.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SpatialNet | CER | 14.5 | - | Unverified
2 | CleanMel-L-mask | CER | 14.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer | Test WER | 15.32 | - | Unverified
2 | Whisper-largev3-finetuned | Test WER | 10.82 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 1.89 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DistillAV | WER | 1.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 4.28 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 8.04 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 3.36 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer (German) | WER (%) | 8.98 | - | Unverified
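The benchmark tables report word error rate (WER) and character error rate (CER). As a rough illustration of how such figures are usually computed, here is a minimal Python sketch: WER is the word-level Levenshtein distance (substitutions + deletions + insertions) between a reference transcript and the system hypothesis, divided by the number of reference words and expressed as a percentage; CER applies the same formula to characters. The function names are illustrative, and the sketch assumes plain whitespace tokenization with no text normalization (lower-casing, punctuation removal), which real benchmarks often apply, so its numbers will not necessarily match any leaderboard entry.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions
    needed to turn ref into hyp. Works on word lists or on plain strings."""
    # prev[j] = distance between the first i-1 ref tokens and the first j hyp tokens
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (ref token missing from hyp)
                curr[j - 1] + 1,     # insertion (extra hyp token)
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent (assumes whitespace tokenization, no normalization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent; spaces count as characters in this sketch."""
    if not reference:
        raise ValueError("reference transcript is empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        errors += edit_distance(ref_words, hypothesis.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("no reference words")
    return 100.0 * errors / words


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
    print(round(wer("the cat sat on the mat", "the cat sit on mat"), 2))  # prints 33.33
```

Corpus-level WER sums edit operations over all utterances before dividing, rather than averaging per-utterance WER. For the pairs ("a b c d", "a b c d") and ("a b", "a x"), `corpus_wer` gives about 16.7%, whereas averaging the two per-utterance scores would give 25%. Because insertions count as errors, WER can also exceed 100%.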