SOTAVerified

Automatic Speech Recognition

Papers

Showing 51–100 of 3174 papers

Title | Status | Hype
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages | Code | 2
Robust Self-Supervised Audio-Visual Speech Recognition | Code | 2
Large Language Models are Strong Audio-Visual Speech Recognition Learners | Code | 2
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Code | 2
Fast Transformers with Clustered Attention | Code | 2
emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography | Code | 2
Dialectal Coverage And Generalization in Arabic Speech Recognition | Code | 2
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity | Code | 2
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec | Code | 2
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | Code | 2
Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation | Code | 2
Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts | Code | 1
Adaptation of Whisper models to child speech recognition | Code | 1
Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition | Code | 1
Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for Polish | Code | 1
Distilling the Knowledge of BERT for Sequence-to-Sequence ASR | Code | 1
Dompteur: Taming Audio Adversarial Examples | Code | 1
DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition | Code | 1
DiaCorrect: Error Correction Back-end For Speaker Diarization | Code | 1
Distilling a Pretrained Language Model to a Multilingual ASR Model | Code | 1
Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation | Code | 1
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities | Code | 1
Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition | Code | 1
CTC-synchronous Training for Monotonic Attention Model | Code | 1
Cross-modal information fusion for voice spoofing detection | Code | 1
D4AM: A General Denoising Framework for Downstream Acoustic Models | Code | 1
Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition | Code | 1
CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese | Code | 1
Cross Attention Augmented Transducer Networks for Simultaneous Translation | Code | 1
Continuous speech separation: dataset and analysis | Code | 1
Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI | Code | 1
Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models | Code | 1
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition | Code | 1
Deep Sparse Conformer for Speech Recognition | Code | 1
Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition | Code | 1
A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition | Code | 1
ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers | Code | 1
Combining Frame-Synchronous and Label-Synchronous Systems for Speech Recognition | Code | 1
Can Contextual Biasing Remain Effective with Whisper and GPT-2? | Code | 1
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation | Code | 1
Can we use Common Voice to train a Multi-Speaker TTS system? | Code | 1
CL-MASR: A Continual Learning Benchmark for Multilingual ASR | Code | 1
BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications | Code | 1
Brazilian Portuguese Speech Recognition Using Wav2vec 2.0 | Code | 1
Adapting End-to-End Speech Recognition for Readable Subtitles | Code | 1
CB-Conformer: Contextual biasing Conformer for biased word recognition | Code | 1
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | Code | 1
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech | Code | 1
CopyNE: Better Contextual ASR by Copying Named Entities | Code | 1
Common Voice: A Massively-Multilingual Speech Corpus | Code | 1

No leaderboard results yet.