SOTAVerified

Speech-to-Text

Papers

Showing 51-100 of 403 papers

Title | Status | Hype
"Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation | Code | 1
PSST! Prosodic Speech Segmentation with Transformers | Code | 1
EdiTTS: Score-based Editing for Controllable Text-to-Speech | Code | 1
Towards an AI to Win Ghana's National Science and Maths Quiz | Code | 1
IESTAC: English-Italian Parallel Corpus for End-to-End Speech-to-Text Machine Translation | Code | 1
A^3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | Code | 1
Denial-of-Service Poisoning Attacks against Large Language Models | Code | 1
Consecutive Decoding for Speech-to-text Translation | Code | 1
Stacked DeBERT: All Attention in Incomplete Data for Text Classification | Code | 1
Common Voice: A Massively-Multilingual Speech Corpus | Code | 1
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Code | 1
Towards Automatic Speech to Sign Language Generation | Code | 1
Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities | Code | 1
Deep Reinforcement Learning For Sequence to Sequence Models | Code | 1
DUB: Discrete Unit Back-translation for Speech Translation | Code | 1
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation | Code | 1
Careless Whisper: Speech-to-Text Hallucination Harms | Code | 0
SparQLe: Speech Queries to Text Translation Through LLMs | Code | 0
Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair | Code | 0
Calibrated SVM for Probabilistic Classification of In-Vehicle Voices into Vehicle Commands via Voice-to-Text LLM Transformation | Code | 0
Spanish and English Phoneme Recognition by Training on Simulated Classroom Audio Recordings of Collaborative Learning Environments | Code | 0
Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation | Code | 0
Scribosermo: Fast Speech-to-Text models for German and other Languages | Code | 0
BeaverTalk: Oregon State University's IWSLT 2025 Simultaneous Speech Translation System | Code | 0
Pre-training on high-resource speech recognition improves low-resource speech-to-text translation | Code | 0
An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation | Code | 0
A wearable sensor vest for social humanoid robots with GPGPU, IoT, and modular software architecture | Code | 0
Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach | Code | 0
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training | Code | 0
Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Code | 0
Let's Give a Voice to Conversational Agents in Virtual Reality | Code | 0
Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation | Code | 0
A Dataset for Speech Emotion Recognition in Greek Theatrical Plays | Code | 0
CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units | Code | 0
Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation | Code | 0
Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset | Code | 0
Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning | Code | 0
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Code | 0
A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion | Code | 0
Audio Adversarial Examples: Targeted Attacks on Speech-to-Text | Code | 0
Automatic Quality Assessment for Speech Translation Using Joint ASR and MT Features | Code | 0
Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision | Code | 0
M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation | Code | 0
LibriS2S: A German-English Speech-to-Speech Translation Corpus | Code | 0
Attentively Embracing Noise for Robust Latent Representation in BERT | Code | 0
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | Code | 0
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Code | 0
Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation | Code | 0
fairseq S2T: Fast Speech-to-Text Modeling with fairseq | Code | 0
Code-Switched Urdu ASR for Noisy Telephonic Environment using Data Centric Approach with Hybrid HMM and CNN-TDNN | Code | 0

No leaderboard results yet.