SOTAVerified

Speech-to-Text

Papers

Showing 1-25 of 403 papers

| Title | Status | Hype |
| --- | --- | --- |
| PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | Code | 6 |
| High-Fidelity Simultaneous Speech-To-Speech Translation | Code | 5 |
| OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia | Code | 3 |
| Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Code | 3 |
| A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Code | 2 |
| LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Code | 2 |
| SeamlessM4T: Massively Multilingual & Multimodal Machine Translation | Code | 2 |
| SONAR: Sentence-Level Multimodal and Language-Agnostic Representations | Code | 2 |
| MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | Code | 2 |
| CVSS Corpus and Massively Multilingual Speech-to-Speech Translation | Code | 2 |
| Speech Model Pre-training for End-to-End Spoken Language Understanding | Code | 2 |
| Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework | Code | 1 |
| Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation | Code | 1 |
| MEDIBENG WHISPER TINY: A FINE-TUNED CODE-SWITCHED BENGALI-ENGLISH TRANSLATOR FOR CLINICAL APPLICATIONS | Code | 1 |
| DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Code | 1 |
| WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning | Code | 1 |
| Fine-tuning Whisper on Low-Resource Languages for Real-World Applications | Code | 1 |
| STTATTS: Unified Speech-To-Text And Text-To-Speech Model | Code | 1 |
| Denial-of-Service Poisoning Attacks against Large Language Models | Code | 1 |
| OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | Code | 1 |
| LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models | Code | 1 |
| Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities | Code | 1 |
| ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Code | 1 |
| Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet | Code | 1 |
| Revisiting Interpolation Augmentation for Speech-to-Text Generation | Code | 1 |

No leaderboard results yet.