SOTAVerified

Speech-to-Text

Papers

Showing 301-350 of 403 papers

| Title | Status | Hype |
| --- | --- | --- |
| Advancing STT for Low-Resource Real-World Speech | | 0 |
| OAVA: the open audio-visual archives aggregator | | 0 |
| On decoder-only architecture for speech-to-text and large language model integration | | 0 |
| Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture | | 0 |
| On the Design of Strategic Task Recommendations for Sustainable Crowdsourcing-Based Content Moderation | | 0 |
| On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models | | 0 |
| On the Feasibility of Fully AI-automated Vishing Attacks | | 0 |
| ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020 | | 0 |
| Optimization of a Real-Time Wavelet-Based Algorithm for Improving Speech Intelligibility | | 0 |
| PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction | | 0 |
| Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling | | 0 |
| AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks | | 0 |
| Performance Comparison of Pre-trained Models for Speech-to-Text in Turkish: Whisper-Small and Wav2Vec2-XLS-R-300M | | 0 |
| PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection | | 0 |
| Phonemic Representation and Transcription for Speech to Text Applications for Under-resourced Indigenous African Languages: The Case of Kiswahili | | 0 |
| Polish Read Speech Corpus for Speech Tools and Services | | 0 |
| Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison | | 0 |
| Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases | | 0 |
| Punctuation restoration in Swedish through fine-tuned KB-BERT | | 0 |
| Pushing the performances of ASR models on English and Spanish accents | | 0 |
| Recent Advances in Direct Speech-to-text Translation | | 0 |
| Representation Purification for End-to-End Speech Translation | | 0 |
| Revisiting the Entropy Semiring for Neural Speech Recognition | | 0 |
| Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking | | 0 |
| Robust Semantic Communications for Speech Transmission | | 0 |
| Role of Intonation in Scoring Spoken English | | 0 |
| RSD-GAN: Regularized Sobolev Defense GAN Against Speech-to-Text Adversarial Attacks | | 0 |
| S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | | 0 |
| SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation | | 0 |
| Self-Supervised Representations Improve End-to-End Speech Translation | | 0 |
| Semantic-aware Speech to Text Transmission with Redundancy Removal | | 0 |
| Semantic MIMO Systems for Speech-to-Text Transmission | | 0 |
| Semantic-preserved Communication System for Highly Efficient Speech Transmission | | 0 |
| Simple and Effective Unsupervised Speech Translation | | 0 |
| SimulSpeech: End-to-End Simultaneous Speech to Text Translation | | 0 |
| Speak2Label: Using Domain Knowledge for Creating a Large Scale Driver Gaze Zone Estimation Dataset | | 0 |
| Speaker Independent Continuous Speech to Text Converter for Mobile Application | | 0 |
| Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction | | 0 |
| Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning | Code | 0 |
| Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset | Code | 0 |
| Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin | Code | 0 |
| A Dataset for Speech Emotion Recognition in Greek Theatrical Plays | Code | 0 |
| WACO: Word-Aligned Contrastive Learning for Speech Translation | Code | 0 |
| Careless Whisper: Speech-to-Text Hallucination Harms | Code | 0 |
| Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision | Code | 0 |
| InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Code | 0 |
| Infusing Future Information into Monotonic Attention Through Language Models | Code | 0 |
| Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding | Code | 0 |
| Calibrated SVM for Probabilistic Classification of In-Vehicle Voices into Vehicle Commands via Voice-to-Text LLM Transformation | Code | 0 |
| Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation | Code | 0 |

No leaderboard results yet.