SOTAVerified

Speech-to-Text

Papers

Showing 351-400 of 403 papers

| Title | Status | Hype |
| --- | --- | --- |
| Let's Give a Voice to Conversational Agents in Virtual Reality | Code | 0 |
| Transformer-Based Named Entity Recognition for Automated Server Provisioning | Code | 0 |
| End to End ASR System with Automatic Punctuation Insertion | Code | 0 |
| Pre-training on high-resource speech recognition improves low-resource speech-to-text translation | Code | 0 |
| LibriS2S: A German-English Speech-to-Speech Translation Corpus | Code | 0 |
| Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy | Code | 0 |
| Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Code | 0 |
| Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation | Code | 0 |
| FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | Code | 0 |
| Efficient Speech Translation with Dynamic Latent Perceivers | Code | 0 |
| The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence | Code | 0 |
| Don't Discard Fixed-Window Audio Segmentation in Speech-to-Text Translation | Code | 0 |
| Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation | Code | 0 |
| M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation | Code | 0 |
| An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation | Code | 0 |
| Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks | Code | 0 |
| Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Code | 0 |
| Direct speech-to-speech translation with a sequence-to-sequence model | Code | 0 |
| MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition | Code | 0 |
| Tools and resources for Romanian text-to-speech and speech-to-text applications | Code | 0 |
| Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models | Code | 0 |
| Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation | Code | 0 |
| Revisiting End-to-End Speech-to-Text Translation From Scratch | Code | 0 |
| mask-Net: Learning Context Aware Invariant Features using Adversarial Forgetting (Student Abstract) | Code | 0 |
| Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions | Code | 0 |
| Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects | Code | 0 |
| SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training | Code | 0 |
| SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation | Code | 0 |
| SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition | Code | 0 |
| A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion | Code | 0 |
| SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation | Code | 0 |
| BeaverTalk: Oregon State University's IWSLT 2025 Simultaneous Speech Translation System | Code | 0 |
| Scribosermo: Fast Speech-to-Text models for German and other Languages | Code | 0 |
| CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units | Code | 0 |
| Contextualized Translation of Automatically Segmented Speech | Code | 0 |
| A wearable sensor vest for social humanoid robots with GPGPU, IoT, and modular software architecture | Code | 0 |
| Audio Adversarial Examples: Targeted Attacks on Speech-to-Text | Code | 0 |
| StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection | Code | 0 |
| Streaming Sequence Transduction through Dynamic Compression | Code | 0 |
| Code-Switched Urdu ASR for Noisy Telephonic Environment using Data Centric Approach with Hybrid HMM and CNN-TDNN | Code | 0 |
| fairseq S2T: Fast Speech-to-Text Modeling with fairseq | Code | 0 |
| Attentively Embracing Noise for Robust Latent Representation in BERT | Code | 0 |
| SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation | Code | 0 |
| Automatic Quality Assessment for Speech Translation Using Joint ASR and MT Features | Code | 0 |
| Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair | Code | 0 |
| Spanish and English Phoneme Recognition by Training on Simulated Classroom Audio Recordings of Collaborative Learning Environments | Code | 0 |
| ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit | Code | 0 |
| SparQLe: Speech Queries to Text Translation Through LLMs | Code | 0 |
| Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach | Code | 0 |
| End-to-End Learning of Speech 2D Feature-Trajectory for Prosthetic Hands | Code | 0 |

No leaderboard results yet.