SOTAVerified

Automatic Speech Recognition

Papers

Showing 51-100 of 3174 papers

Title | Status | Hype
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | Code | 2
Large Language Models are Strong Audio-Visual Speech Recognition Learners | Code | 2
4-bit Conformer with Native Quantization Aware Training for Speech Recognition | Code | 2
DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition | Code | 2
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition | Code | 2
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec | Code | 2
emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography | Code | 2
Fast Transformers with Clustered Attention | Code | 2
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models | Code | 2
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition | Code | 2
Robust Self-Supervised Audio-Visual Speech Recognition | Code | 2
A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition | Code | 1
Automatic Speech Recognition in Sanskrit: A New Speech Corpus and Modelling Insights | Code | 1
Automatic Speech Recognition for Speech Assessment of Persian Preschool Children | Code | 1
Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for Polish | Code | 1
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet | Code | 1
Dompteur: Taming Audio Adversarial Examples | Code | 1
Automatic Speech Recognition Benchmark for Air-Traffic Communications | Code | 1
Automatic Severity Classification of Dysarthric speech by using Self-supervised Model with Multi-task Learning | Code | 1
Distilling the Knowledge of BERT for Sequence-to-Sequence ASR | Code | 1
Automatic Disfluency Detection from Untranscribed Speech | Code | 1
AVATAR: Unconstrained Audiovisual Speech Recognition | Code | 1
Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts | Code | 1
Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation | Code | 1
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Code | 1
Audio-Visual Efficient Conformer for Robust Speech Recognition | Code | 1
Distilling a Pretrained Language Model to a Multilingual ASR Model | Code | 1
Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text | Code | 1
DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition | Code | 1
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos | Code | 1
ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications | Code | 1
Deep Sparse Conformer for Speech Recognition | Code | 1
DiaCorrect: Error Correction Back-end For Speaker Diarization | Code | 1
Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition | Code | 1
Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition | Code | 1
CTC-synchronous Training for Monotonic Attention Model | Code | 1
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion | Code | 1
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition | Code | 1
D4AM: A General Denoising Framework for Downstream Acoustic Models | Code | 1
ArTST: Arabic Text and Speech Transformer | Code | 1
A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision | Code | 1
A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One | Code | 1
Cross-modal information fusion for voice spoofing detection | Code | 1
A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition | Code | 1
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond | Code | 1
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement | Code | 1
Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition | Code | 1
A transfer learning based approach for pronunciation scoring | Code | 1
Attention-based Contextual Language Model Adaptation for Speech Recognition | Code | 1
ASR Error Correction with Constrained Decoding on Operation Prediction | Code | 1

No leaderboard results yet.