SOTAVerified

Speech-to-Text

Papers

Showing 1-25 of 403 papers

| Title | Status | Hype |
| --- | --- | --- |
| An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments | | 0 |
| LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization | | 0 |
| End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data | | 0 |
| I Speak and You Find: Robust 3D Visual Grounding with Noisy and Ambiguous Speech Inputs | | 0 |
| S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | | 0 |
| Advancing STT for Low-Resource Real-World Speech | | 0 |
| Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios | | 0 |
| Improving Language and Modality Transfer in Translation by Character-level Modeling | | 0 |
| BeaverTalk: Oregon State University's IWSLT 2025 Simultaneous Speech Translation System | Code | 0 |
| The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence | Code | 0 |
| Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework | Code | 1 |
| Conversational Recommendation System using NLP and Sentiment Analysis | | 0 |
| Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation | Code | 1 |
| MEDIBENG WHISPER TINY: A FINE-TUNED CODE-SWITCHED BENGALI-ENGLISH TRANSLATOR FOR CLINICAL APPLICATIONS | Code | 1 |
| Acquisition of high-quality images for camera calibration in robotics applications via speech prompts | | 0 |
| LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect | | 0 |
| Transformer-Based Named Entity Recognition for Automated Server Provisioning | Code | 0 |
| Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit | | 0 |
| AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation | | 0 |
| Focusing Robot Open-Ended Reinforcement Learning Through Users' Purposes | | 0 |
| Telephone Surveys Meet Conversational AI: Evaluating a LLM-Based Telephone Survey System at Scale | | 0 |
| Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision | | 0 |
| Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | | 0 |
| Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation | | 0 |
| Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks | Code | 0 |

No leaderboard results yet.