SOTAVerified

Automatic Speech Recognition

Papers

Showing 701–750 of 3174 papers

Title | Status | Hype
DPSNN: Spiking Neural Network for Low-Latency Streaming Speech Enhancement | | 0
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation | | 0
Enhancing Dialogue Speech Recognition with Robust Contextual Awareness via Noise Representation Learning | | 0
Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance | | 0
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing | | 0
Improving Whisper's Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text | | 0
Preserving spoken content in voice anonymisation with character-level vocoder conditioning | Code | 0
HydraFormer: One Encoder For All Subsampling Rates | Code | 0
MathBridge: A Large Corpus Dataset for Translating Spoken Mathematical Expressions into LaTeX Formulas for Improved Readability | | 0
ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval | | 0
Self-Supervised Learning for Multi-Channel Neural Transducer | | 0
StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion | | 0
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | | 0
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation | | 0
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition | | 0
Towards interfacing large language models with ASR systems using confidence measures and prompting | | 0
Leveraging Self-Supervised Models for Automatic Whispered Speech Recognition | Code | 0
Improving noisy student training for low-resource languages in End-to-End ASR using CycleGAN and inter-domain losses | | 0
Scaling A Simple Approach to Zero-Shot Speech Recognition | | 0
On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures | | 0
Improving Domain-Specific ASR with LLM-Generated Contextual Descriptions | | 0
A Comparative Analysis of Bilingual and Trilingual Wav2Vec Models for Automatic Speech Recognition in Multilingual Oral History Archives | | 0
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization | | 0
Quantifying the Role of Textual Predictability in Automatic Speech Recognition | | 0
Trading Devil Final: Backdoor attack via Stock market and Bayesian Optimization | | 0
Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance | | 0
Handling Numeric Expressions in Automatic Speech Recognition | | 0
Robust ASR Error Correction with Conservative Data Filtering | | 0
A light-weight and efficient punctuation and word casing prediction model for on-device streaming ASR | | 0
Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training | | 0
Morphosyntactic Analysis for CHILDES | | 0
Beyond Binary: Multiclass Paraphasia Detection with Generative Pretrained Transformers and End-to-End Models | | 0
The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation | | 0
Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data | | 0
Textless Dependency Parsing by Labeled Sequence Prediction | Code | 0
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation | | 0
Text-Based Detection of On-Hold Scripts in Contact Center Calls | Code | 0
HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing | | 0
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation | | 0
Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation | | 0
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition | | 0
Written Term Detection Improves Spoken Term Detection | Code | 0
LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech | | 0
Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect | | 0
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models | | 0
Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models | | 0
Semi-supervised Learning for Code-Switching ASR with Large Language Model Filter | | 0
Romanization Encoding For Multilingual ASR | | 0
Multi-Convformer: Extending Conformer with Multiple Convolution Kernels | | 0
Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis | | 0
Page 15 of 64

No leaderboard results yet.