SOTAVerified

Automatic Speech Recognition

Papers

Showing 101-150 of 3174 papers

Title | Status | Hype
Extending Whisper with prompt tuning to target-speaker ASR | Code | 1
D4AM: A General Denoising Framework for Downstream Acoustic Models | Code | 1
Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation | Code | 1
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning | Code | 1
Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts | Code | 1
Automatic Disfluency Detection from Untranscribed Speech | Code | 1
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation | Code | 1
Developing a Multilingual Dataset and Evaluation Metrics for Code-Switching: A Focus on Hong Kong's Polylingual Dynamics | Code | 1
ArTST: Arabic Text and Speech Transformer | Code | 1
CL-MASR: A Continual Learning Benchmark for Multilingual ASR | Code | 1
Accented Speech Recognition With Accent-specific Codebooks | Code | 1
Advancing Test-Time Adaptation in Wild Acoustic Test Settings | Code | 1
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | Code | 1
Speech collage: code-switched audio generation by collaging monolingual corpora | Code | 1
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models | Code | 1
Memory-augmented conformer for improved end-to-end long-form ASR | Code | 1
HypR: A comprehensive study for ASR hypothesis revising with a reference corpus | Code | 1
Unimodal Aggregation for CTC-based Speech Recognition | Code | 1
DiaCorrect: Error Correction Back-end For Speaker Diarization | Code | 1
EnCodecMAE: Leveraging neural codecs for universal audio representation learning | Code | 1
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder | Code | 1
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation | Code | 1
ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus | Code | 1
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures | Code | 1
Adaptation of Whisper models to child speech recognition | Code | 1
A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision | Code | 1
NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning | Code | 1
Quilt-1M: One Million Image-Text Pairs for Histopathology | Code | 1
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation | Code | 1
SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization | Code | 1
Improved DeepFake Detection Using Whisper Features | Code | 1
Can Contextual Biasing Remain Effective with Whisper and GPT-2? | Code | 1
Scaling Speech Technology to 1,000+ Languages | Code | 1
CopyNE: Better Contextual ASR by Copying Named Entities | Code | 1
Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | Code | 1
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition | Code | 1
Back Translation for Speech-to-text Translation Without Transcripts | Code | 1
CB-Conformer: Contextual biasing Conformer for biased word recognition | Code | 1
When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP | Code | 1
Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition | Code | 1
A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One | Code | 1
Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition | Code | 1
Cross-modal information fusion for voice spoofing detection | Code | 1
Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation | Code | 1
Audio-Visual Efficient Conformer for Robust Speech Recognition | Code | 1
Towards Voice Reconstruction from EEG during Imagined Speech | Code | 1
Skit-S2I: An Indian Accented Speech to Intent dataset | Code | 1
BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm | Code | 1
SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels | Code | 1
A Persian ASR-based SER: Modification of Sharif Emotional Speech Database and Investigation of Persian Text Corpora | Code | 1

No leaderboard results yet.