| Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solution | Sep 27, 2023 | Machine Translation, Management | Unverified | 0 |
| Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing | Sep 27, 2023 | Decoder, Machine Translation | Unverified | 0 |
| Deepfake audio as a data augmentation technique for training automatic speech to text transcription models | Sep 22, 2023 | Data Augmentation, Face Swapping | Unverified | 0 |
| SpeechAlign: a Framework for Speech Translation Alignment Evaluation | Sep 20, 2023 | Speech-to-Text, Speech-to-Text Translation | Unverified | 0 |
| CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders | Sep 14, 2023 | Contrastive Learning, Knowledge Distillation | Unverified | 0 |
| PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection | Sep 13, 2023 | Adversarial Attack, Speech-to-Text | Unverified | 0 |
| An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation | Aug 28, 2023 | Machine Translation, NMT | Code Available | 0 |
| N-gram Boosting: Improving Contextual Biasing with Normalized N-gram Targets | Aug 4, 2023 | Speech-to-Text | Unverified | 0 |
| Let's Give a Voice to Conversational Agents in Virtual Reality | Aug 4, 2023 | Speech-to-Text, text-to-speech | Code Available | 0 |
| Code-Switched Urdu ASR for Noisy Telephonic Environment using Data Centric Approach with Hybrid HMM and CNN-TDNN | Jul 24, 2023 | Automatic Speech Recognition, Sentiment Analysis | Code Available | 0 |
| A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion | Jul 21, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Improving RNN-Transducers with Acoustic LookAhead | Jul 11, 2023 | Hallucination, Speech-to-Text | Unverified | 0 |
| On decoder-only architecture for speech-to-text and large language model integration | Jul 8, 2023 | Decoder, Language Modeling | Unverified | 0 |
| Performance Comparison of Pre-trained Models for Speech-to-Text in Turkish: Whisper-Small and Wav2Vec2-XLS-R-300M | Jul 6, 2023 | Speech-to-Text | Unverified | 0 |
| Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture | Jul 5, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| AudioPaLM: A Large Language Model That Can Speak and Listen | Jun 22, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Recent Advances in Direct Speech-to-text Translation | Jun 20, 2023 | Data Augmentation, Decoder | Unverified | 0 |
| Open Brain AI. Automatic Language Assessment | Jun 11, 2023 | Speech-to-Text | Unverified | 0 |
| Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding | Jun 8, 2023 | dialog state tracking, Language Modeling | Unverified | 0 |
| Towards End-to-end Speech-to-text Summarization | Jun 6, 2023 | Abstractive Text Summarization, Speech-to-Text | Code Available | 0 |
| Improved Cross-Lingual Transfer Learning For Automatic Speech Translation | Jun 1, 2023 | automatic-speech-translation, Cross-Lingual Transfer | Unverified | 0 |
| Strategies for improving low resource speech to text translation relying on pre-trained ASR models | May 31, 2023 | Automatic Speech Recognition, Decoder | Unverified | 0 |
| STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions | May 30, 2023 | All, Automatic Speech Recognition | Unverified | 0 |
| CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training | May 27, 2023 | intent-classification, Intent Classification | Unverified | 0 |
| VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation | May 25, 2023 | Decoder, Language Modeling | Unverified | 0 |