SOTAVerified

Speech-to-Text

Papers

Showing 51–100 of 403 papers

Title | Status | Hype
Learning Shared Semantic Space for Speech-to-Text Translation | Code | 1
Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities | Code | 1
FlexiBO: A Decoupled Cost-Aware Multi-Objective Optimization Approach for Deep Neural Networks | Code | 1
Information-Transport-based Policy for Simultaneous Translation | Code | 1
Pre-training for Speech Translation: CTC Meets Optimal Transport | Code | 1
End-to-end Speech Translation via Cross-modal Progressive Training | Code | 1
Brilla AI: AI Contestant for the National Science and Maths Quiz | Code | 1
A^3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | Code | 1
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Code | 1
Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation | Code | 1
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Code | 1
Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation | Code | 1
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet | Code | 1
DUB: Discrete Unit Back-translation for Speech Translation | Code | 1
EdiTTS: Score-based Editing for Controllable Text-to-Speech | Code | 1
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation | Code | 1
Challenges and Opportunities of Speech Recognition for Bengali Language | | 0
Development of Natural Language Processing Tools for Cook Islands Māori | | 0
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | | 0
Application of Audio Fingerprinting Techniques for Real-Time Scalable Speech Retrieval and Speech Clusterization | | 0
A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks | | 0
Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum | | 0
BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text | | 0
Application-Agnostic Language Modeling for On-Device ASR | | 0
Bridging the Modality Gap for Speech-to-Text Translation | | 0
Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models | | 0
AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR | | 0
A Comparative Study on End-to-end Speech to Text Translation | | 0
Digits micro-model for accurate and secure transactions | | 0
Adversarial Attacks against Neural Networks in Audio Domain: Exploiting Principal Components | | 0
BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge | | 0
Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | | 0
An Experiment on Speech-to-Text Translation Systems for Manipuri to English on Low Resource Setting | | 0
Developing a Speech Recognition System for Recognizing Tonal Speech Signals Using a Convolutional Neural Network | | 0
Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning | | 0
A Dutch Dysarthric Speech Database for Individualized Speech Therapy Research | | 0
An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments | | 0
A Voice Controlled E-Commerce Web Application | | 0
A combined approach to the analysis of speech conversations in a contact center domain | | 0
A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect | | 0
Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solution | | 0
Direct Punjabi to English speech translation using discrete units | | 0
Multilingual Speech Translation with Efficient Finetuning of Pretrained Models | | 0
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing | | 0
CTC Alignments Improve Autoregressive Translation | | 0
CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR | | 0
Cross-modal Contrastive Learning for Speech Translation | | 0
DARTS: Dialectal Arabic Transcription System | | 0
Analyzing Utility of Visual Context in Multimodal Speech Recognition Under Noisy Conditions | | 0
Crossing the SSH Bridge with Interview Data | | 0
Page 2 of 9

No leaderboard results yet.