
Speech-to-Text

Papers

Showing 150 of 403 papers

Title | Status | Hype
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | Code | 6
High-Fidelity Simultaneous Speech-To-Speech Translation | Code | 5
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Code | 3
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia | Code | 3
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Code | 2
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Code | 2
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations | Code | 2
Speech Model Pre-training for End-to-End Spoken Language Understanding | Code | 2
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation | Code | 2
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | Code | 2
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation | Code | 2
Brilla AI: AI Contestant for the National Science and Maths Quiz | Code | 1
PSST! Prosodic Speech Segmentation with Transformers | Code | 1
Pre-training for Speech Translation: CTC Meets Optimal Transport | Code | 1
MEDIBENG WHISPER TINY: A FINE-TUNED CODE-SWITCHED BENGALI-ENGLISH TRANSLATOR FOR CLINICAL APPLICATIONS | Code | 1
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models | Code | 1
One TTS Alignment To Rule Them All | Code | 1
Pushing the Limits of Zero-shot End-to-End Speech Translation | Code | 1
IESTAC: English-Italian Parallel Corpus for End-to-End Speech-to-Text Machine Translation | Code | 1
Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation | Code | 1
Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation | Code | 1
Information-Transport-based Policy for Simultaneous Translation | Code | 1
LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions | Code | 1
Learning Shared Semantic Space for Speech-to-Text Translation | Code | 1
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation | Code | 1
Fine-tuning Whisper on Low-Resource Languages for Real-World Applications | Code | 1
JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT | Code | 1
OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | Code | 1
Denial-of-Service Poisoning Attacks against Large Language Models | Code | 1
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Code | 1
Cross Attention Augmented Transducer Networks for Simultaneous Translation | Code | 1
DUB: Discrete Unit Back-translation for Speech Translation | Code | 1
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation | Code | 1
Common Voice: A Massively-Multilingual Speech Corpus | Code | 1
CoVoST 2 and Massively Multilingual Speech-to-Text Translation | Code | 1
Cross-modal Contrastive Learning for Speech Translation | Code | 1
A Large-Scale Chinese Multimodal NER Dataset with Speech Clues | Code | 1
Deep Reinforcement Learning For Sequence to Sequence Models | Code | 1
Benchmarking Large Multimodal Models against Common Corruptions | Code | 1
EdiTTS: Score-based Editing for Controllable Text-to-Speech | Code | 1
Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework | Code | 1
End-to-end Speech Translation via Cross-modal Progressive Training | Code | 1
FlexiBO: A Decoupled Cost-Aware Multi-Objective Optimization Approach for Deep Neural Networks | Code | 1
Back Translation for Speech-to-text Translation Without Transcripts | Code | 1
Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities | Code | 1
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet | Code | 1
Late reverberation suppression using U-nets | Code | 1
A^3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | Code | 1
A Whisper transformer for audio captioning trained with synthetic captions and transfer learning | Code | 1
Clotho: An Audio Captioning Dataset | Code | 1
