SOTAVerified

Automatic Speech Recognition

Papers

Showing 101-150 of 3174 papers

| Title | Status | Hype |
| --- | --- | --- |
| DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Code | 1 |
| Earnings-22: A Practical Benchmark for Accents in the Wild | Code | 1 |
| DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition | Code | 1 |
| EnCodecMAE: Leveraging neural codecs for universal audio representation learning | Code | 1 |
| End-to-end Named Entity Recognition from English Speech | Code | 1 |
| Adaptation of Whisper models to child speech recognition | Code | 1 |
| Cross-modal information fusion for voice spoofing detection | Code | 1 |
| End-to-End Speech Recognition from Federated Acoustic Models | Code | 1 |
| Adapting End-to-End Speech Recognition for Readable Subtitles | Code | 1 |
| Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention | Code | 1 |
| Extending Whisper with prompt tuning to target-speaker ASR | Code | 1 |
| Factorized Neural Transducer for Efficient Language Model Adaptation | Code | 1 |
| Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition | Code | 1 |
| CTC-synchronous Training for Monotonic Attention Model | Code | 1 |
| CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese | Code | 1 |
| FlanEC: Exploring Flan-T5 for Post-ASR Error Correction | Code | 1 |
| Cross Attention Augmented Transducer Networks for Simultaneous Translation | Code | 1 |
| D4AM: A General Denoising Framework for Downstream Acoustic Models | Code | 1 |
| Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for Polish | Code | 1 |
| How2: A Large-scale Dataset for Multimodal Language Understanding | Code | 1 |
| HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | Code | 1 |
| HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models | Code | 1 |
| Improved DeepFake Detection Using Whisper Features | Code | 1 |
| Improved Noisy Student Training for Automatic Speech Recognition | Code | 1 |
| Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model | Code | 1 |
| Improving Mandarin Speech Recogntion with Block-augmented Transformer | Code | 1 |
| DiaCorrect: Error Correction Back-end For Speaker Diarization | Code | 1 |
| Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech | Code | 1 |
| ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | Code | 1 |
| Continuous speech separation: dataset and analysis | Code | 1 |
| Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition | Code | 1 |
| Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI | Code | 1 |
| Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models | Code | 1 |
| Combining Frame-Synchronous and Label-Synchronous Systems for Speech Recognition | Code | 1 |
| Accented Speech Recognition With Accent-specific Codebooks | Code | 1 |
| Common Voice: A Massively-Multilingual Speech Corpus | Code | 1 |
| Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition | Code | 1 |
| CopyNE: Better Contextual ASR by Copying Named Entities | Code | 1 |
| A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications | Code | 1 |
| A context-aware knowledge transferring strategy for CTC-based ASR | Code | 1 |
| CB-Conformer: Contextual biasing Conformer for biased word recognition | Code | 1 |
| Can Contextual Biasing Remain Effective with Whisper and GPT-2? | Code | 1 |
| Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation | Code | 1 |
| Can we use Common Voice to train a Multi-Speaker TTS system? | Code | 1 |
| BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data | Code | 1 |
| BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications | Code | 1 |
| BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm | Code | 1 |
| Advancing Test-Time Adaptation in Wild Acoustic Test Settings | Code | 1 |
| BembaSpeech: A Speech Recognition Corpus for the Bemba Language | Code | 1 |
| AVLnet: Learning Audio-Visual Language Representations from Instructional Videos | Code | 1 |
Page 3 of 64

No leaderboard results yet.