SOTAVerified

Speech-to-Text

Papers

Showing 76–100 of 403 papers

Title | Status | Hype
Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Code | 0
Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation | - | 0
Investigating Decoder-only Large Language Models for Speech-to-text Translation | - | 0
Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders | - | 0
NAIST Simultaneous Speech Translation System for IWSLT 2024 | - | 0
Calibrated SVM for Probabilistic Classification of In-Vehicle Voices into Vehicle Commands via Voice-to-Text LLM Transformation | Code | 0
Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects | Code | 0
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Code | 1
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet | Code | 1
Revisiting Interpolation Augmentation for Speech-to-Text Generation | Code | 1
SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation | Code | 0
Transferable speech-to-text large language model alignment module | - | 0
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving | - | 0
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Code | 3
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models | - | 0
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | - | 0
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Code | 2
Synthetic Query Generation using Large Language Models for Virtual Assistants | - | 0
StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection | Code | 0
VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications | - | 0
LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions | Code | 1
Semantic MIMO Systems for Speech-to-Text Transmission | - | 0
A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection) | - | 0
Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair | Code | 0
NaturalTurn: A Method to Segment Transcripts into Naturalistic Conversational Turns | - | 0

No leaderboard results yet.