| Leveraging Virtual Reality and AI Tutoring for Language Learning: A Case Study of a Virtual Campus Environment with OpenAI GPT Integration with Unity 3D | Nov 19, 2024 | Speech-to-Text, text-to-speech | Unverified | 0 |
| Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages | Nov 11, 2024 | Decoder, Machine Translation | Unverified | 0 |
| NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts | Nov 8, 2024 | Mixture-of-Experts, Optical Character Recognition (OCR) | Unverified | 0 |
| CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR | Nov 7, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| LASER: Attention with Exponential Transformation | Nov 5, 2024 | Speech-to-Text | Unverified | 0 |
| SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation | Nov 3, 2024 | speech-recognition, Speech Recognition | Code available | 0 |
| Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? | Oct 31, 2024 | Rhythm, speech-recognition | Unverified | 0 |
| Application of Audio Fingerprinting Techniques for Real-Time Scalable Speech Retrieval and Speech Clusterization | Oct 29, 2024 | GPU, Retrieval | Unverified | 0 |
| Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model | Oct 24, 2024 | speech-recognition, Speech Recognition | Unverified | 0 |
| A Survey on Speech Large Language Models | Oct 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum | Oct 18, 2024 | Speech-to-Text | Unverified | 0 |
| Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck | Oct 15, 2024 | Speech-to-Text | Unverified | 0 |
| Unsupervised Data Validation Methods for Efficient Model Training | Oct 10, 2024 | Data Augmentation, model | Unverified | 0 |
| Transducer Consistency Regularization for Speech to Text Applications | Oct 9, 2024 | Model Optimization, Speech-to-Text | Unverified | 0 |
| Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems | Oct 3, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Unveiling the Role of Pretraining in Direct Speech Translation | Sep 26, 2024 | Automatic Speech Recognition, Decoder | Unverified | 0 |
| How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not | Sep 25, 2024 | Automatic Speech Recognition, speech-recognition | Unverified | 0 |
| On the Feasibility of Fully AI-automated Vishing Attacks | Sep 20, 2024 | Large Language Model, Speech-to-Text | Unverified | 0 |
| Toward Automated Clinical Transcriptions | Sep 20, 2024 | Speech-to-Text | Unverified | 0 |
| Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text | Sep 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach | Sep 13, 2024 | In-Context Learning, Retrieval | Code available | 0 |
| Evaluation of real-time transcriptions using end-to-end ASR models | Sep 9, 2024 | Action Detection, Activity Detection | Unverified | 0 |
| LAST: Language Model Aware Speech Tokenization | Sep 5, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| AI-Based IVR | Aug 20, 2024 | Speech Synthesis, Speech-to-Text | Unverified | 0 |
| CMU's IWSLT 2024 Simultaneous Speech Translation System | Aug 14, 2024 | Decoder, Speech-to-Text | Unverified | 0 |
| CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units | Jul 19, 2024 | Machine Translation, Speech-to-Text | Code available | 0 |
| AI-Powered Immersive Assistance for Interactive Task Execution in Industrial Environments | Jul 12, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Evaluating Voice Command Pipelines for Drone Control: From STT and LLM to Direct Classification and Siamese Networks | Jul 10, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Jul 9, 2024 | coreference-resolution, Coreference Resolution | Code available | 0 |
| Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation | Jul 4, 2024 | Machine Translation, speech-recognition | Unverified | 0 |
| Investigating Decoder-only Large Language Models for Speech-to-text Translation | Jul 3, 2024 | Decoder, parameter-efficient fine-tuning | Unverified | 0 |
| Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders | Jul 2, 2024 | Clustering, speaker-diarization | Unverified | 0 |
| NAIST Simultaneous Speech Translation System for IWSLT 2024 | Jun 30, 2024 | Speech-to-Speech Translation, Speech-to-Text | Unverified | 0 |
| Calibrated SVM for Probabilistic Classification of In-Vehicle Voices into Vehicle Commands via Voice-to-Text LLM Transformation | Jun 28, 2024 | Speech-to-Text, text-classification | Code available | 0 |
| Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects | Jun 27, 2024 | Automatic Speech Recognition, Machine Translation | Code available | 0 |
| SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation | Jun 20, 2024 | Speech-to-Text, Speech-to-Text Translation | Code available | 0 |
| Transferable speech-to-text large language model alignment module | Jun 19, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving | Jun 16, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models | Jun 13, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Jun 11, 2024 | Contrastive Learning, Speech Synthesis | Unverified | 0 |
| Synthetic Query Generation using Large Language Models for Virtual Assistants | Jun 10, 2024 | Information Retrieval, speech-recognition | Unverified | 0 |
| StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection | Jun 10, 2024 | Speech-to-Text, Speech-to-Text Translation | Code available | 0 |
| VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications | May 19, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Semantic MIMO Systems for Speech-to-Text Transmission | May 13, 2024 | Semantic Communication, Speech-to-Text | Unverified | 0 |
| A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection) | May 2, 2024 | Acoustic Scene Classification, Event Detection | Unverified | 0 |
| Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair | Apr 18, 2024 | Machine Translation, Speech-to-Text | Code available | 0 |
| NaturalTurn: A Method to Segment Transcripts into Naturalistic Conversational Turns | Mar 22, 2024 | Speech-to-Text | Unverified | 0 |
| Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking | Mar 13, 2024 | Chinese Spell Checking, In-Context Learning | Unverified | 0 |
| Robust Semantic Communications for Speech Transmission | Mar 8, 2024 | Generative Adversarial Network, Semantic Communication | Unverified | 0 |
| Compact Speech Translation Models via Discrete Speech Units Pretraining | Feb 29, 2024 | Decoder, Self-Supervised Learning | Unverified | 0 |