| Advancing STT for Low-Resource Real-World Speech | Jun 10, 2025 | Sentence, Speech-to-Text | Unverified | 0 |
| OAVA: the open audio-visual archives aggregator | Dec 16, 2023 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| On decoder-only architecture for speech-to-text and large language model integration | Jul 8, 2023 | Decoder, Language Modeling | Unverified | 0 |
| Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture | Jul 5, 2023 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| On the Design of Strategic Task Recommendations for Sustainable Crowdsourcing-Based Content Moderation | Jun 4, 2021 | Recommendation Systems, Speech-to-Text | Unverified | 0 |
| On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models | Jun 13, 2024 | Language Modeling | Unverified | 0 |
| On the Feasibility of Fully AI-automated Vishing Attacks | Sep 20, 2024 | Large Language Model, Speech-to-Text | Unverified | 0 |
| ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020 | May 24, 2020 | Data Augmentation, Decoder | Unverified | 0 |
| Optimization of a Real-Time Wavelet-Based Algorithm for Improving Speech Intelligibility | Feb 5, 2022 | Speech Enhancement, Speech-to-Text | Unverified | 0 |
| PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction | Feb 10, 2023 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling | Jun 21, 2021 | Speech Recognition | Unverified | 0 |
| AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks | Oct 21, 2019 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| Performance Comparison of Pre-trained Models for Speech-to-Text in Turkish: Whisper-Small and Wav2Vec2-XLS-R-300M | Jul 6, 2023 | Speech-to-Text | Unverified | 0 |
| PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection | Sep 13, 2023 | Adversarial Attack, Speech-to-Text | Unverified | 0 |
| Phonemic Representation and Transcription for Speech to Text Applications for Under-resourced Indigenous African Languages: The Case of Kiswahili | Oct 29, 2022 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| Polish Read Speech Corpus for Speech Tools and Services | Jun 1, 2017 | Action Detection, Activity Detection | Unverified | 0 |
| Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison | Jan 4, 2025 | Decoder, Knowledge Distillation | Unverified | 0 |
| Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases | Feb 1, 2024 | Speech Recognition | Unverified | 0 |
| Punctuation restoration in Swedish through fine-tuned KB-BERT | Feb 14, 2022 | Language Modelling, Punctuation Restoration | Unverified | 0 |
| Pushing the performances of ASR models on English and Spanish accents | Dec 22, 2022 | Speech-to-Text | Unverified | 0 |
| Recent Advances in Direct Speech-to-text Translation | Jun 20, 2023 | Data Augmentation, Decoder | Unverified | 0 |
| Representation Purification for End-to-End Speech Translation | Dec 5, 2024 | Machine Translation, Rhythm | Unverified | 0 |
| Revisiting the Entropy Semiring for Neural Speech Recognition | Dec 13, 2023 | Speech Recognition | Unverified | 0 |
| Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking | Mar 13, 2024 | Chinese Spell Checking, In-Context Learning | Unverified | 0 |
| Robust Semantic Communications for Speech Transmission | Mar 8, 2024 | Generative Adversarial Network, Semantic Communication | Unverified | 0 |
| Role of Intonation in Scoring Spoken English | Aug 23, 2018 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| RSD-GAN: Regularized Sobolev Defense GAN Against Speech-to-Text Adversarial Attacks | Jul 14, 2022 | Speech-to-Text | Unverified | 0 |
| S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | Jun 11, 2025 | Reading Comprehension, Speech Synthesis | Unverified | 0 |
| SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation | May 17, 2022 | Representation Learning, Retrieval | Unverified | 0 |
| Self-Supervised Representations Improve End-to-End Speech Translation | Jun 22, 2020 | Cross-Lingual Transfer, speech-recognition | Unverified | 0 |
| Semantic-aware Speech to Text Transmission with Redundancy Removal | Feb 7, 2022 | Semantic Communication, Speech-to-Text | Unverified | 0 |
| Semantic MIMO Systems for Speech-to-Text Transmission | May 13, 2024 | Semantic Communication, Speech-to-Text | Unverified | 0 |
| Semantic-preserved Communication System for Highly Efficient Speech Transmission | May 25, 2022 | Semantic Communication, speech-recognition | Unverified | 0 |
| Simple and Effective Unsupervised Speech Translation | Oct 18, 2022 | Domain Adaptation, Machine Translation | Unverified | 0 |
| SimulSpeech: End-to-End Simultaneous Speech to Text Translation | Jul 1, 2020 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| Speak2Label: Using Domain Knowledge for Creating a Large Scale Driver Gaze Zone Estimation Dataset | Apr 13, 2020 | Gaze Prediction, Speech-to-Text | Unverified | 0 |
| Speaker Independent Continuous Speech to Text Converter for Mobile Application | Jul 19, 2013 | Action Detection, Activity Detection | Unverified | 0 |
| Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction | May 8, 2013 | Speech Synthesis, Speech-to-Text | Unverified | 0 |
| Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning | Sep 21, 2016 | Decoder, Multi-Task Learning | Code Available | 0 |
| Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset | Nov 29, 2019 | Automatic Speech Recognition (ASR) | Code Available | 0 |
| Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin | Oct 21, 2020 | Automatic Speech Recognition (ASR) | Code Available | 0 |
| A Dataset for Speech Emotion Recognition in Greek Theatrical Plays | Mar 27, 2022 | Automatic Speech Recognition (ASR) | Code Available | 0 |
| WACO: Word-Aligned Contrastive Learning for Speech Translation | Dec 19, 2022 | Contrastive Learning, Speech-to-Text | Code Available | 0 |
| Careless Whisper: Speech-to-Text Hallucination Harms | Feb 12, 2024 | Hallucination, Language Modeling | Code Available | 0 |
| Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision | Dec 30, 2023 | Speech-to-Text, Speech-to-Text Translation | Code Available | 0 |
| InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Dec 23, 2021 | Benchmarking, Deep Learning | Code Available | 0 |
| Infusing Future Information into Monotonic Attention Through Language Models | Sep 7, 2021 | Language Modeling | Code Available | 0 |
| Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding | Dec 16, 2019 | Automatic Speech Recognition (ASR) | Code Available | 0 |
| Calibrated SVM for Probabilistic Classification of In-Vehicle Voices into Vehicle Commands via Voice-to-Text LLM Transformation | Jun 28, 2024 | Speech-to-Text, text-classification | Code Available | 0 |
| Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation | Dec 11, 2024 | Automatic Speech Recognition (ASR) | Code Available | 0 |