SOTAVerified

Speech-to-Text

Papers

Showing 51-100 of 403 papers

| Title | Status | Hype |
|---|---|---|
| OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | Code | 1 |
| Speech Emotion Recognition with Multi-Task Learning | Code | 1 |
| EdiTTS: Score-based Editing for Controllable Text-to-Speech | Code | 1 |
| Benchmarking Large Multimodal Models against Common Corruptions | Code | 1 |
| Stacked DeBERT: All Attention in Incomplete Data for Text Classification | Code | 1 |
| Pre-training for Speech Translation: CTC Meets Optimal Transport | Code | 1 |
| Brilla AI: AI Contestant for the National Science and Maths Quiz | Code | 1 |
| PSST! Prosodic Speech Segmentation with Transformers | Code | 1 |
| DUB: Discrete Unit Back-translation for Speech Translation | Code | 1 |
| End-to-end Speech Translation via Cross-modal Progressive Training | Code | 1 |
| Deep Reinforcement Learning For Sequence to Sequence Models | Code | 1 |
| Revisiting Interpolation Augmentation for Speech-to-Text Generation | Code | 1 |
| A^3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | Code | 1 |
| Denial-of-Service Poisoning Attacks against Large Language Models | Code | 1 |
| STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation | Code | 1 |
| End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation | Code | 1 |
| Careless Whisper: Speech-to-Text Hallucination Harms | Code | 0 |
| Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation | Code | 0 |
| Calibrated SVM for Probabilistic Classification of In-Vehicle Voices into Vehicle Commands via Voice-to-Text LLM Transformation | Code | 0 |
| Revisiting End-to-End Speech-to-Text Translation From Scratch | Code | 0 |
| SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation | Code | 0 |
| Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy | Code | 0 |
| Pre-training on high-resource speech recognition improves low-resource speech-to-text translation | Code | 0 |
| Scribosermo: Fast Speech-to-Text models for German and other Languages | Code | 0 |
| BeaverTalk: Oregon State University's IWSLT 2025 Simultaneous Speech Translation System | Code | 0 |
| Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach | Code | 0 |
| OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification | Code | 0 |
| An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation | Code | 0 |
| A wearable sensor vest for social humanoid robots with GPGPU, IoT, and modular software architecture | Code | 0 |
| M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation | Code | 0 |
| Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks | Code | 0 |
| MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition | Code | 0 |
| Automatic Quality Assessment for Speech Translation Using Joint ASR and MT Features | Code | 0 |
| Let's Give a Voice to Conversational Agents in Virtual Reality | Code | 0 |
| Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset | Code | 0 |
| CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units | Code | 0 |
| Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation | Code | 0 |
| LibriS2S: A German-English Speech-to-Speech Translation Corpus | Code | 0 |
| Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision | Code | 0 |
| A Dataset for Speech Emotion Recognition in Greek Theatrical Plays | Code | 0 |
| InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Code | 0 |
| Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning | Code | 0 |
| A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion | Code | 0 |
| Contextualized Translation of Automatically Segmented Speech | Code | 0 |
| Audio Adversarial Examples: Targeted Attacks on Speech-to-Text | Code | 0 |
| Infusing Future Information into Monotonic Attention Through Language Models | Code | 0 |
| Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Code | 0 |
| Attentively Embracing Noise for Robust Latent Representation in BERT | Code | 0 |
| Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models | Code | 0 |
| Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Code | 0 |

No leaderboard results yet.