SOTAVerified

Automatic Speech Recognition

Papers

Showing 151-200 of 3174 papers

Title | Status | Hype
MelHuBERT: A simplified HuBERT on Mel spectrograms | Code | 1
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets | Code | 1
ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications | Code | 1
Towards Improved Room Impulse Response Estimation for Speech Recognition | Code | 1
Multi-blank Transducers for Speech Recognition | Code | 1
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing | Code | 1
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning | Code | 1
Automatic Severity Classification of Dysarthric speech by using Self-supervised Model with Multi-task Learning | Code | 1
There is more than one kind of robustness: Fooling Whisper with adversarial examples | Code | 1
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation | Code | 1
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition | Code | 1
Towards Relation Extraction From Speech | Code | 1
Can we use Common Voice to train a Multi-Speaker TTS system? | Code | 1
A context-aware knowledge transferring strategy for CTC-based ASR | Code | 1
JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT | Code | 1
Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM | Code | 1
Deep Sparse Conformer for Speech Recognition | Code | 1
IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages | Code | 1
ASR Error Correction with Constrained Decoding on Operation Prediction | Code | 1
DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition | Code | 1
Improving Mandarin Speech Recogntion with Block-augmented Transformer | Code | 1
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription | Code | 1
MM-ALT: A Multimodal Automatic Lyric Transcription System | Code | 1
Distilling a Pretrained Language Model to a Multilingual ASR Model | Code | 1
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement | Code | 1
AVATAR: Unconstrained Audiovisual Speech Recognition | Code | 1
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR | Code | 1
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners | Code | 1
Vietnamese Automatic Speech Recognition using Wav2vec 2.0 | Code | 1
Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment | Code | 1
Speaker Recognition in the Wild | Code | 1
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages | Code | 1
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond | Code | 1
Large-Scale Streaming End-to-End Speech Translation with Neural Transducers | Code | 1
PriMock57: A Dataset Of Primary Care Mock Consultations | Code | 1
How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications | Code | 1
indic-punct: An automatic punctuation restoration and inverse text normalization framework for Indic languages | Code | 1
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction | Code | 1
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings | Code | 1
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition | Code | 1
Integrating Lattice-Free MMI into End-to-End Speech Recognition | Code | 1
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT | Code | 1
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion | Code | 1
Earnings-22: A Practical Benchmark for Accents in the Wild | Code | 1
Shifted Chunk Encoder for Transformer Based Streaming End-to-End ASR | Code | 1
Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition | Code | 1
Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition | Code | 1
Automatic Speech Recognition for Speech Assessment of Persian Preschool Children | Code | 1
Neural Predictor for Black-Box Adversarial Attacks on Speech Recognition | Code | 1
DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering | Code | 1

No leaderboard results yet.