SOTAVerified

Automatic Speech Recognition

Papers

Showing 3101–3150 of 3174 papers

Title | Status | Hype
Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction | Code | 0
FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization | Code | 0
Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq | Code | 0
Optimized Speculative Sampling for GPU Hardware Accelerators | Code | 0
Boosting Cross-Domain Speech Recognition with Self-Supervision | Code | 0
Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems | Code | 0
SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems | Code | 0
Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems | Code | 0
Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation | Code | 0
Blank Collapse: Compressing CTC emission for the faster decoding | Code | 0
Coupled Training of Sequence-to-Sequence Models for Accented Speech Recognition | Code | 0
A Theory of Unsupervised Speech Recognition | Code | 0
Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks | Code | 0
Audiovisual Speaker Tracking using Nonlinear Dynamical Systems with Dynamic Stream Weights | Code | 0
Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers | Code | 0
MLS: A Large-Scale Multilingual Dataset for Speech Research | Code | 0
Pansori: ASR Corpus Generation from Open Online Video Contents | Code | 0
When Is TTS Augmentation Through a Pivot Language Useful? | Code | 0
FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data | Code | 0
Analysis of EEG frequency bands for Envisioned Speech Recognition | Code | 0
AfriHuBERT: A self-supervised speech representation model for African languages | Code | 0
Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding | Code | 0
Streaming Sequence Transduction through Dynamic Compression | Code | 0
Textless Speech-to-Speech Translation With Limited Parallel Data | Code | 0
Audio Segmentation for Robust Real-Time Speech Recognition Based on Neural Networks | Code | 0
Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks | Code | 0
Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin | Code | 0
Advancing Singlish Understanding: Bridging the Gap with Datasets and Multimodal Models | Code | 0
Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation | Code | 0
A Dataset for Speech Emotion Recognition in Greek Theatrical Plays | Code | 0
A Comparative Study on Transformer vs RNN in Speech Applications | Code | 0
Using Adapters to Overcome Catastrophic Forgetting in End-to-End Automatic Speech Recognition | Code | 0
Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition | Code | 0
Continual Learning for Monolingual End-to-End Automatic Speech Recognition | Code | 0
EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding | Code | 0
Audio Adversarial Examples: Targeted Attacks on Speech-to-Text | Code | 0
A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion | Code | 0
Exploring Generative Error Correction for Dysarthric Speech Recognition | Code | 0
Are Neural Open-Domain Dialog Systems Robust to Speech Recognition Errors in the Dialog History? An Empirical Study | Code | 0
Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization | Code | 0
Robust Unstructured Knowledge Access in Conversational Dialogue with ASR Errors | Code | 0
Explainability of Speech Recognition Transformers via Gradient-based Attention Visualization | Code | 0
Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages | Code | 0
An Automatic Speech Recognition System for Bengali Language based on Wav2Vec2 and Transfer Learning | Code | 0
Whose Emotion Matters? Speaking Activity Localisation without Prior Knowledge | Code | 0
Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences | Code | 0
ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning | Code | 0
Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition | Code | 0
Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks | Code | 0
Evaluating Variants of wav2vec 2.0 on Affective Vocal Burst Tasks | Code | 0

No leaderboard results yet.