| An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments | Jul 14, 2025 | Speech-to-Text, text-to-speech | Unverified | 0 |
| LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization | Jun 20, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data | Jun 19, 2025 | Sentence, Speech-to-Text | Unverified | 0 |
| I Speak and You Find: Robust 3D Visual Grounding with Noisy and Ambiguous Speech Inputs | Jun 17, 2025 | 3D visual grounding, Contrastive Learning | Unverified | 0 |
| S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | Jun 11, 2025 | Reading Comprehension, Speech Synthesis | Unverified | 0 |
| Advancing STT for Low-Resource Real-World Speech | Jun 10, 2025 | Sentence, Speech-to-Text | Unverified | 0 |
| Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios | May 30, 2025 | Cross-Lingual Transfer, Phoneme Recognition | Unverified | 0 |
| Improving Language and Modality Transfer in Translation by Character-level Modeling | May 30, 2025 | Speech-to-Text, Speech-to-Text Translation | Unverified | 0 |
| BeaverTalk: Oregon State University's IWSLT 2025 Simultaneous Speech Translation System | May 29, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence | May 29, 2025 | Speech-to-Text | Code Available | 0 |
| Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework | May 24, 2025 | Adversarial Attack, Speech Tokenization | Code Available | 1 |
| Conversational Recommendation System using NLP and Sentiment Analysis | May 17, 2025 | Conversational Recommendation, Dynamic Time Warping | Unverified | 0 |
| Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation | Apr 27, 2025 | RAG, Retrieval | Code Available | 1 |
| MEDIBENG WHISPER TINY: A FINE-TUNED CODE-SWITCHED BENGALI-ENGLISH TRANSLATOR FOR CLINICAL APPLICATIONS | Apr 25, 2025 | Clinical Language Translation, Machine Translation | Code Available | 1 |
| Acquisition of high-quality images for camera calibration in robotics applications via speech prompts | Apr 15, 2025 | Camera Calibration, Speech-to-Text | Unverified | 0 |
| LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect | Apr 3, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Transformer-Based Named Entity Recognition for Automated Server Provisioning | Apr 1, 2025 | named-entity-recognition, Named Entity Recognition | Code Available | 0 |
| Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit | Mar 26, 2025 | speech-recognition, Speech Recognition | Unverified | 0 |
| AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation | Mar 18, 2025 | Decoder, Speech-to-Text | Unverified | 0 |
| Focusing Robot Open-Ended Reinforcement Learning Through Users' Purposes | Mar 16, 2025 | Large Language Model, reinforcement-learning | Unverified | 0 |
| Telephone Surveys Meet Conversational AI: Evaluating a LLM-Based Telephone Survey System at Scale | Feb 27, 2025 | AI Agent, Large Language Model | Unverified | 0 |
| Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision | Feb 26, 2025 | Audio Synthesis, Automatic Speech Recognition | Unverified | 0 |
| Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | Feb 24, 2025 | Automatic Speech Recognition, Language Modeling | Unverified | 0 |
| Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation | Feb 24, 2025 | Automatic Speech Recognition, Diversity | Unverified | 0 |
| Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks | Feb 19, 2025 | Automatic Speech Recognition, speech-recognition | Code Available | 0 |
| DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Feb 16, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| SparQLe: Speech Queries to Text Translation Through LLMs | Feb 13, 2025 | Speech-to-Text, Speech-to-Text Translation | Code Available | 0 |
| Speech to Speech Translation with Translatotron: A State of the Art Review | Feb 9, 2025 | speech-recognition, Speech Recognition | Unverified | 0 |
| High-Fidelity Simultaneous Speech-To-Speech Translation | Feb 5, 2025 | Decoder, Simultaneous Speech-to-Speech Translation | Code Available | 5 |
| When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation | Feb 1, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia | Jan 23, 2025 | Emotion Recognition, Event Detection | Code Available | 3 |
| WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning | Jan 15, 2025 | cross-modal alignment, Language Modeling | Code Available | 1 |
| MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | Jan 10, 2025 | Instruction Following, Language Modeling | Unverified | 0 |
| Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Jan 10, 2025 | Automatic Speech Recognition, Classification | Code Available | 0 |
| Existential Crisis: A Social Robot's Reason for Being | Jan 6, 2025 | Speech-to-Text | Unverified | 0 |
| Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison | Jan 4, 2025 | Decoder, Knowledge Distillation | Unverified | 0 |
| Whisper Turns Stronger: Augmenting Wav2Vec 2.0 for Superior ASR in Low-Resource Languages | Dec 31, 2024 | Automatic Speech Recognition, Data Augmentation | Unverified | 0 |
| How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System? | Dec 24, 2024 | Simultaneous Speech-to-Text Translation, Speech-to-Text | Unverified | 0 |
| Fine-tuning Whisper on Low-Resource Languages for Real-World Applications | Dec 20, 2024 | Form, Sentence | Code Available | 1 |
| Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation | Dec 11, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Representation Purification for End-to-End Speech Translation | Dec 5, 2024 | Machine Translation, Rhythm | Unverified | 0 |
| Leveraging Virtual Reality and AI Tutoring for Language Learning: A Case Study of a Virtual Campus Environment with OpenAI GPT Integration with Unity 3D | Nov 19, 2024 | Speech-to-Text, text-to-speech | Unverified | 0 |
| Whisper Finetuning on Nepali Language | Nov 19, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages | Nov 11, 2024 | Decoder, Machine Translation | Unverified | 0 |
| NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts | Nov 8, 2024 | Mixture-of-Experts, Optical Character Recognition (OCR) | Unverified | 0 |
| CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR | Nov 7, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| LASER: Attention with Exponential Transformation | Nov 5, 2024 | Speech-to-Text | Unverified | 0 |
| SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation | Nov 3, 2024 | speech-recognition, Speech Recognition | Code Available | 0 |
| Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? | Oct 31, 2024 | Rhythm, speech-recognition | Unverified | 0 |
| Application of Audio Fingerprinting Techniques for Real-Time Scalable Speech Retrieval and Speech Clusterization | Oct 29, 2024 | GPU, Retrieval | Unverified | 0 |