SOTAVerified

Speech-to-Text

Papers

Showing 101-150 of 403 papers

Title | Status | Hype
Leveraging Virtual Reality and AI Tutoring for Language Learning: A Case Study of a Virtual Campus Environment with OpenAI GPT Integration with Unity 3D | | 0
Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages | | 0
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts | | 0
CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR | | 0
LASER: Attention with Exponential Transformation | | 0
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation | Code | 0
Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? | | 0
Application of Audio Fingerprinting Techniques for Real-Time Scalable Speech Retrieval and Speech Clusterization | | 0
Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model | | 0
A Survey on Speech Large Language Models | | 0
Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum | | 0
Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck | | 0
Unsupervised Data Validation Methods for Efficient Model Training | | 0
Transducer Consistency Regularization for Speech to Text Applications | | 0
Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems | | 0
Unveiling the Role of Pretraining in Direct Speech Translation | | 0
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not | | 0
On the Feasibility of Fully AI-automated Vishing Attacks | | 0
Toward Automated Clinical Transcriptions | | 0
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text | | 0
Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach | Code | 0
Evaluation of real-time transcriptions using end-to-end ASR models | | 0
LAST: Language Model Aware Speech Tokenization | | 0
AI-Based IVR | | 0
CMU's IWSLT 2024 Simultaneous Speech Translation System | | 0
CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units | Code | 0
AI-Powered Immersive Assistance for Interactive Task Execution in Industrial Environments | | 0
Evaluating Voice Command Pipelines for Drone Control: From STT and LLM to Direct Classification and Siamese Networks | | 0
Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Code | 0
Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation | | 0
Investigating Decoder-only Large Language Models for Speech-to-text Translation | | 0
Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders | | 0
NAIST Simultaneous Speech Translation System for IWSLT 2024 | | 0
Calibrated SVM for Probabilistic Classification of In-Vehicle Voices into Vehicle Commands via Voice-to-Text LLM Transformation | Code | 0
Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects | Code | 0
SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation | Code | 0
Transferable speech-to-text large language model alignment module | | 0
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving | | 0
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models | | 0
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | | 0
Synthetic Query Generation using Large Language Models for Virtual Assistants | | 0
StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection | Code | 0
VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications | | 0
Semantic MIMO Systems for Speech-to-Text Transmission | | 0
A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection) | | 0
Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair | Code | 0
NaturalTurn: A Method to Segment Transcripts into Naturalistic Conversational Turns | | 0
Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking | | 0
Robust Semantic Communications for Speech Transmission | | 0
Compact Speech Translation Models via Discrete Speech Units Pretraining | | 0

No leaderboard results yet.