SOTAVerified

Speech-to-Text

Papers

Showing 251-300 of 403 papers

Title | Status | Hype
Using of heterogeneous corpora for training of an ASR system | | 0
VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation | | 0
Visual Features for Context-Aware Speech Recognition | | 0
Voice based self help System: User Experience Vs Accuracy | | 0
VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications | | 0
WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment | | 0
wav2vec and its current potential to Automatic Speech Recognition in German for the usage in Digital History: A comparative assessment of available ASR-technologies for the use in cultural heritage contexts | | 0
Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition | | 0
WER-BERT: Automatic WER Estimation with BERT in a Balanced Ordinal Classification Paradigm | | 0
What shall we do with an hour of data? Speech recognition for the un- and under-served languages of Common Voice | | 0
When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation | | 0
Which French speech recognition system for assistant robots? | | 0
Whisper Finetuning on Nepali Language | | 0
Whisper Turns Stronger: Augmenting Wav2Vec 2.0 for Superior ASR in Low-Resource Languages | | 0
With One Voice: Composing a Travel Voice Assistant from Re-purposed Models | | 0
Worldly Wise (WoW) - Cross-Lingual Knowledge Fusion for Fact-based Visual Spoken-Question Answering | | 0
XTREME-S: Evaluating Cross-lingual Speech Representations | | 0
Label-Synchronous Speech-to-Text Alignment for ASR Using Forward and Backward Transformers | | 0
Language Model Augmented Monotonic Attention for Simultaneous Translation | | 0
LASER: Attention with Exponential Transformation | | 0
LAST: Language Model Aware Speech Tokenization | | 0
Learning Adaptive Segmentation Policy for End-to-End Simultaneous Translation | | 0
Learnings from Technological Interventions in a Low Resource Language: A Case-Study on Gondi | | 0
Leveraging Virtual Reality and AI Tutoring for Language Learning: A Case Study of a Virtual Campus Environment with OpenAI GPT Integration with Unity 3D | | 0
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation | | 0
LIA-RAG: a system based on graphs and divergence of probabilities applied to Speech-To-Text Summarization | | 0
LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect | | 0
LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization | | 0
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation | | 0
Low-Resource Speech-to-Text Translation | | 0
M3ST: Mix at Three Levels for Speech Translation | | 0
MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation | | 0
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | | 0
Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer | | 0
Multi-Discriminator Sobolev Defense-GAN Against Adversarial Attacks for End-to-End Speech Systems | | 0
Multilingual Speech Emotion Recognition With Multi-Gating Mechanism and Neural Architecture Search | | 0
Multilingual Speech Translation from Efficient Finetuning of Pretrained Models | | 0
Multi-teacher Distillation for Multilingual Spelling Correction | | 0
NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022 | | 0
NAIST Simultaneous Speech Translation System for IWSLT 2024 | | 0
Named Entity Detection and Injection for Direct Speech Translation | | 0
Named Entity Recognition for Address Extraction in Speech-to-Text Transcriptions Using Synthetic Data | | 0
Natural Language Interactions in Autonomous Vehicles: Intent Detection and Slot Filling from Passenger Utterances | | 0
Natural Language Robot Programming: NLP integrated with autonomous robotic grasping | | 0
NaturalTurn: A Method to Segment Transcripts into Naturalistic Conversational Turns | | 0
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts | | 0
Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision | | 0
N-gram Boosting: Improving Contextual Biasing with Normalized N-gram Targets | | 0
Noise in Speech-to-Text Voice: Analysis of Errors and Feasibility of Phonetic Similarity for Their Correction | | 0
Numerically Grounded Language Models for Semantic Error Correction | | 0
Page 6 of 9

No leaderboard results yet.