SOTAVerified

Speech-to-Text

Papers

Showing 51–100 of 403 papers

Title | Status | Hype
A Survey on Speech Large Language Models | - | 0
STTATTS: Unified Speech-To-Text And Text-To-Speech Model | Code | 1
Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model | - | 0
Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum | - | 0
Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck | - | 0
Denial-of-Service Poisoning Attacks against Large Language Models | Code | 1
Unsupervised Data Validation Methods for Efficient Model Training | - | 0
Transducer Consistency Regularization for Speech to Text Applications | - | 0
Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems | - | 0
Unveiling the Role of Pretraining in Direct Speech Translation | - | 0
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not | - | 0
Toward Automated Clinical Transcriptions | - | 0
On the Feasibility of Fully AI-automated Vishing Attacks | - | 0
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text | - | 0
Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach | Code | 0
Evaluation of real-time transcriptions using end-to-end ASR models | - | 0
LAST: Language Model Aware Speech Tokenization | - | 0
AI-Based IVR | - | 0
CMU's IWSLT 2024 Simultaneous Speech Translation System | - | 0
OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | Code | 1
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models | Code | 1
CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units | Code | 0
Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities | Code | 1
AI-Powered Immersive Assistance for Interactive Task Execution in Industrial Environments | - | 0
Evaluating Voice Command Pipelines for Drone Control: From STT and LLM to Direct Classification and Siamese Networks | - | 0
Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Code | 0
Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation | - | 0
Investigating Decoder-only Large Language Models for Speech-to-text Translation | - | 0
Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders | - | 0
NAIST Simultaneous Speech Translation System for IWSLT 2024 | - | 0
Calibrated SVM for Probabilistic Classification of In-Vehicle Voices into Vehicle Commands via Voice-to-Text LLM Transformation | Code | 0
Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects | Code | 0
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Code | 1
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet | Code | 1
Revisiting Interpolation Augmentation for Speech-to-Text Generation | Code | 1
SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation | Code | 0
Transferable speech-to-text large language model alignment module | - | 0
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving | - | 0
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Code | 3
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models | - | 0
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | - | 0
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Code | 2
Synthetic Query Generation using Large Language Models for Virtual Assistants | - | 0
StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection | Code | 0
VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications | - | 0
LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions | Code | 1
Semantic MIMO Systems for Speech-to-Text Transmission | - | 0
A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection) | - | 0
Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair | Code | 0
NaturalTurn: A Method to Segment Transcripts into Naturalistic Conversational Turns | - | 0

No leaderboard results yet.