SOTAVerified

Automatic Speech Recognition

Papers

Showing 401-450 of 3174 papers

Title | Status | Hype
A Theory of Unsupervised Speech Recognition | Code | 0
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations | Code | 0
Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset | Code | 0
Multi-Sentence Resampling: A Simple Approach to Alleviate Dataset Length Bias and Beam-Search Degradation | Code | 0
Improving LSTM-CTC based ASR performance in domains with limited training data | Code | 0
Improving CTC-based speech recognition via knowledge transferring from pre-trained language models | Code | 0
A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion | Code | 0
Improving Automatic Speech Recognition for Non-Native English with Transfer Learning and Language Model Decoding | Code | 0
Improving RNN Transducer Modeling for End-to-End Speech Recognition | Code | 0
HydraFormer: One Encoder For All Subsampling Rates | Code | 0
Hybrid phonetic-neural model for correction in speech recognition systems | Code | 0
Hybrid ASR for Resource-Constrained Robots: HMM - Deep Learning Fusion | Code | 0
Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling | Code | 0
HYBRIDFORMER: improving SqueezeFormer with hybrid attention and NSR mechanism | Code | 0
A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2 for improving the child speech recognition | Code | 0
How You Say It Matters: Measuring the Impact of Verbal Disfluency Tags on Automated Dementia Detection | Code | 0
Audiovisual Speaker Tracking using Nonlinear Dynamical Systems with Dynamic Stream Weights | Code | 0
A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization | Code | 0
HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition | Code | 0
How Phonotactics Affect Multilingual and Zero-shot ASR Performance | Code | 0
Human Transcription Quality Improvement | Code | 0
Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition | Code | 0
Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator | Code | 0
Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation | Code | 0
Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech | Code | 0
Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition | Code | 0
A Simplified Fully Quantized Transformer for End-to-end Speech Recognition | Code | 0
AI-Generated Song Detection via Lyrics Transcripts | Code | 0
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR | Code | 0
A Unified Speaker Adaptation Approach for ASR | Code | 0
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech | Code | 0
Assessing the Use of Prosody in Constituency Parsing of Imperfect Transcripts | Code | 0
Measuring the Accuracy of Automatic Speech Recognition Solutions | Code | 0
Finnish Parliament ASR corpus - Analysis, benchmarks and statistics | Code | 0
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Code | 0
Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients | Code | 0
Fine-Grained Grounding for Multimodal Speech Recognition | Code | 0
Multi-Stage Speaker Diarization for Noisy Classrooms | Code | 0
Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study | Code | 0
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation | Code | 0
Improving Voice Separation by Incorporating End-to-end Speech Recognition | Code | 0
Exploring Generative Error Correction for Dysarthric Speech Recognition | Code | 0
Explainability of Speech Recognition Transformers via Gradient-based Attention Visualization | Code | 0
Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization | Code | 0
FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data | Code | 0
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation | Code | 0
Evaluating Variants of wav2vec 2.0 on Affective Vocal Burst Tasks | Code | 0
Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks | Code | 0
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit | Code | 0
Enhanced ASR Robustness to Packet Loss with a Front-End Adaptation Network | Code | 0

No leaderboard results yet.