| Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Jun 11, 2024 | Contrastive Learning, Speech Synthesis | Unverified | 0 |
| A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Jun 11, 2024 | Decoder, Simultaneous Speech-to-Speech Translation | Code Available | 2 |
| Synthetic Query Generation using Large Language Models for Virtual Assistants | Jun 10, 2024 | Information Retrieval, speech-recognition | Unverified | 0 |
| StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection | Jun 10, 2024 | Speech-to-Text, Speech-to-Text Translation | Unverified | 0 |
| VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications | May 19, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions | May 18, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Semantic MIMO Systems for Speech-to-Text Transmission | May 13, 2024 | Semantic Communication, Speech-to-Text | Unverified | 0 |
| A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection) | May 2, 2024 | Acoustic Scene Classification, Event Detection | Unverified | 0 |
| Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair | Apr 18, 2024 | Machine Translation, Speech-to-Text | Code Available | 0 |
| NaturalTurn: A Method to Segment Transcripts into Naturalistic Conversational Turns | Mar 22, 2024 | Speech-to-Text | Unverified | 0 |