SOTAVerified

Speech-to-Text

Papers

Showing 251-300 of 403 papers

Title | Status | Hype
Transferable speech-to-text large language model alignment module | | 0
Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces | | 0
Unsupervised Data Validation Methods for Efficient Model Training | | 0
Unveiling the Role of Pretraining in Direct Speech Translation | | 0
Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition | | 0
Using of heterogeneous corpora for training of an ASR system | | 0
VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation | | 0
Visual Features for Context-Aware Speech Recognition | | 0
Voice based self help System: User Experience Vs Accuracy | | 0
VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications | | 0
WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment | | 0
wav2vec and its current potential to Automatic Speech Recognition in German for the usage in Digital History: A comparative assessment of available ASR-technologies for the use in cultural heritage contexts | | 0
Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition | | 0
WER-BERT: Automatic WER Estimation with BERT in a Balanced Ordinal Classification Paradigm | | 0
What shall we do with an hour of data? Speech recognition for the un- and under-served languages of Common Voice | | 0
When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation | | 0
Which French speech recognition system for assistant robots? | | 0
Whisper Finetuning on Nepali Language | | 0
Whisper Turns Stronger: Augmenting Wav2Vec 2.0 for Superior ASR in Low-Resource Languages | | 0
With One Voice: Composing a Travel Voice Assistant from Re-purposed Models | | 0
Worldly Wise (WoW) - Cross-Lingual Knowledge Fusion for Fact-based Visual Spoken-Question Answering | | 0
XTREME-S: Evaluating Cross-lingual Speech Representations | | 0
NUVA: A Naming Utterance Verifier for Aphasia Treatment | | 0
A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect | | 0
A Case Study on Filtering for End-to-End Speech Translation | | 0
A combined approach to the analysis of speech conversations in a contact center domain | | 0
A Comparative Study on End-to-end Speech to Text Translation | | 0
A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation | | 0
Acquisition of high-quality images for camera calibration in robotics applications via speech prompts | | 0
AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation | | 0
A Dutch Dysarthric Speech Database for Individualized Speech Therapy Research | | 0
Adversarial Attacks against Neural Networks in Audio Domain: Exploiting Principal Components | | 0
AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR | | 0
A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks | | 0
A.I. based Embedded Speech to Text Using Deepspeech | | 0
AI-Based IVR | | 0
AI-Powered Immersive Assistance for Interactive Task Execution in Industrial Environments | | 0
Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems | | 0
A low latency ASR-free end to end spoken language understanding system | | 0
Analyzing ASR pretraining for low-resource speech-to-text translation | | 0
Analyzing Utility of Visual Context in Multimodal Speech Recognition Under Noisy Conditions | | 0
An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments | | 0
An Experiment on Speech-to-Text Translation Systems for Manipuri to English on Low Resource Setting | | 0
Application-Agnostic Language Modeling for On-Device ASR | | 0
Application of Audio Fingerprinting Techniques for Real-Time Scalable Speech Retrieval and Speech Clusterization | | 0
A Semi-Automated Live Interlingual Communication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking | | 0
A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems | | 0
A Survey on Speech Large Language Models | | 0
A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection) | | 0
Attacks as Defenses: Designing Robust Audio CAPTCHAs Using Attacks on Automatic Speech Recognition Systems | | 0

No leaderboard results yet.