SOTAVerified

Speech-to-Text

Papers

Showing 251-275 of 403 papers

| Title | Status | Hype |
| --- | --- | --- |
| VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications | | 0 |
| WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment | | 0 |
| wav2vec and its current potential to Automatic Speech Recognition in German for the usage in Digital History: A comparative assessment of available ASR-technologies for the use in cultural heritage contexts | | 0 |
| Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition | | 0 |
| WER-BERT: Automatic WER Estimation with BERT in a Balanced Ordinal Classification Paradigm | | 0 |
| What shall we do with an hour of data? Speech recognition for the un- and under-served languages of Common Voice | | 0 |
| When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation | | 0 |
| Which French speech recognition system for assistant robots? | | 0 |
| Whisper Finetuning on Nepali Language | | 0 |
| Whisper Turns Stronger: Augmenting Wav2Vec 2.0 for Superior ASR in Low-Resource Languages | | 0 |
| With One Voice: Composing a Travel Voice Assistant from Re-purposed Models | | 0 |
| Worldly Wise (WoW) - Cross-Lingual Knowledge Fusion for Fact-based Visual Spoken-Question Answering | | 0 |
| XTREME-S: Evaluating Cross-lingual Speech Representations | | 0 |
| Learning Adaptive Segmentation Policy for End-to-End Simultaneous Translation | | 0 |
| Learnings from Technological Interventions in a Low Resource Language: A Case-Study on Gondi | | 0 |
| Leveraging Virtual Reality and AI Tutoring for Language Learning: A Case Study of a Virtual Campus Environment with OpenAI GPT Integration with Unity 3D | | 0 |
| Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation | | 0 |
| LIA-RAG: a system based on graphs and divergence of probabilities applied to Speech-To-Text Summarization | | 0 |
| LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect | | 0 |
| LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization | | 0 |
| Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation | | 0 |
| Low-Resource Speech-to-Text Translation | | 0 |
| M3ST: Mix at Three Levels for Speech Translation | | 0 |
| MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation | | 0 |
| MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | | 0 |

No leaderboard results yet.