SOTAVerified

Automatic Speech Recognition

Papers

Showing 1301–1350 of 3174 papers

| Title | Status | Hype |
| --- | --- | --- |
| Factual Consistency Oriented Speech Recognition | | 0 |
| Can Whisper perform speech-based in-context learning? | | 0 |
| Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations | | 0 |
| Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos | | 0 |
| Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme Recognition | | 0 |
| Flexible Multichannel Speech Enhancement for Noise-Robust Frontend | | 0 |
| Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios | | 0 |
| FluentNet: End-to-End Detection of Speech Disfluency with Deep Learning | | 0 |
| Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models | | 0 |
| FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator | | 0 |
| Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition | | 0 |
| Fotheidil: an Automatic Transcription System for the Irish Language | | 0 |
| Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition | | 0 |
| Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license | | 0 |
| Can We Train a Language Model Inside an End-to-End ASR Model? - Investigating Effective Implicit Language Modeling | | 0 |
| Frequency Domain Multi-channel Acoustic Modeling for Distant Speech Recognition | | 0 |
| Are Transformers in Pre-trained LM A Good ASR Encoder? An Empirical Study | | 0 |
| From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition | | 0 |
| Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion | | 0 |
| From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition | | 0 |
| Accented Speech Recognition: A Survey | | 0 |
| Extracting Biomedical Entities from Noisy Audio Transcripts | | 0 |
| Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent? | | 0 |
| From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data | | 0 |
| FT Speech: Danish Parliament Speech Corpus | | 0 |
| Full-text Error Correction for Chinese Speech Recognition with Large Language Model | | 0 |
| Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin | | 0 |
| Fully Learnable Front-End for Multi-Channel Acoustic Modeling using Semi-Supervised Learning | | 0 |
| Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices | | 0 |
| Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR | | 0 |
| Cantonese Automatic Speech Recognition Using Transfer Learning from Mandarin | | 0 |
| Exploring WavLM Back-ends for Speech Spoofing and Deepfake Detection | | 0 |
| Fusing ASR Outputs in Joint Training for Speech Emotion Recognition | | 0 |
| Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition | | 0 |
| FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition | | 0 |
| Fusion Models for Improved Visual Captioning | | 0 |
| Exploring Transfer Learning For End-to-End Spoken Language Understanding | | 0 |
| Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices | | 0 |
| Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition | | 0 |
| G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR | | 0 |
| A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition | | 0 |
| GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems | | 0 |
| Exploring the Role of Audio in Video Captioning | | 0 |
| Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance | | 0 |
| Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation | | 0 |
| Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model | | 0 |
| Generating Robust Audio Adversarial Examples using Iterative Proportional Clipping | | 0 |
| Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems | | 0 |
| Generating Synthetic Clinical Speech Data through Simulated ASR Deletion Error | | 0 |
| Are disentangled representations all you need to build speaker anonymization systems? | | 0 |
Page 27 of 64

No leaderboard results yet.