| The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence | May 29, 2025 | Speech-to-Text | Code Available | 0 |
| Conversational Recommendation System using NLP and Sentiment Analysis | May 17, 2025 | Conversational Recommendation, Dynamic Time Warping | Unverified | 0 |
| Acquisition of high-quality images for camera calibration in robotics applications via speech prompts | Apr 15, 2025 | Camera Calibration, Speech-to-Text | Unverified | 0 |
| LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect | Apr 3, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Transformer-Based Named Entity Recognition for Automated Server Provisioning | Apr 1, 2025 | named-entity-recognition, Named Entity Recognition | Code Available | 0 |
| Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit | Mar 26, 2025 | speech-recognition, Speech Recognition | Unverified | 0 |
| AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation | Mar 18, 2025 | Decoder, Speech-to-Text | Unverified | 0 |
| Focusing Robot Open-Ended Reinforcement Learning Through Users' Purposes | Mar 16, 2025 | Large Language Model, reinforcement-learning | Unverified | 0 |
| Telephone Surveys Meet Conversational AI: Evaluating a LLM-Based Telephone Survey System at Scale | Feb 27, 2025 | AI Agent, Large Language Model | Unverified | 0 |
| Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision | Feb 26, 2025 | Audio Synthesis, Automatic Speech Recognition | Unverified | 0 |
| Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | Feb 24, 2025 | Automatic Speech Recognition, Language Modeling | Unverified | 0 |
| Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation | Feb 24, 2025 | Automatic Speech Recognition, Diversity | Unverified | 0 |
| Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks | Feb 19, 2025 | Automatic Speech Recognition, speech-recognition | Code Available | 0 |
| SparQLe: Speech Queries to Text Translation Through LLMs | Feb 13, 2025 | Speech-to-Text, Speech-to-Text Translation | Code Available | 0 |
| Speech to Speech Translation with Translatotron: A State of the Art Review | Feb 9, 2025 | speech-recognition, Speech Recognition | Unverified | 0 |
| When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation | Feb 1, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Jan 10, 2025 | Automatic Speech Recognition, Classification | Code Available | 0 |
| MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | Jan 10, 2025 | Instruction Following, Language Modeling | Unverified | 0 |
| Existential Crisis: A Social Robot's Reason for Being | Jan 6, 2025 | Speech-to-Text | Unverified | 0 |
| Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison | Jan 4, 2025 | Decoder, Knowledge Distillation | Unverified | 0 |
| Whisper Turns Stronger: Augmenting Wav2Vec 2.0 for Superior ASR in Low-Resource Languages | Dec 31, 2024 | Automatic Speech Recognition, Data Augmentation | Unverified | 0 |
| How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System? | Dec 24, 2024 | Simultaneous Speech-to-Text Translation, Speech-to-Text | Unverified | 0 |
| Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation | Dec 11, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Representation Purification for End-to-End Speech Translation | Dec 5, 2024 | Machine Translation, Rhythm | Unverified | 0 |
| Leveraging Virtual Reality and AI Tutoring for Language Learning: A Case Study of a Virtual Campus Environment with OpenAI GPT Integration with Unity 3D | Nov 19, 2024 | Speech-to-Text, text-to-speech | Unverified | 0 |