SOTAVerified

Speech-to-Text

Papers

Showing 26-50 of 403 papers

| Title | Status | Hype |
| --- | --- | --- |
| DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Code | 1 |
| SparQLe: Speech Queries to Text Translation Through LLMs | Code | 0 |
| Speech to Speech Translation with Translatotron: A State of the Art Review | | 0 |
| High-Fidelity Simultaneous Speech-To-Speech Translation | Code | 5 |
| When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation | | 0 |
| OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia | Code | 3 |
| WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning | Code | 1 |
| MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | | 0 |
| Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Code | 0 |
| Existential Crisis: A Social Robot's Reason for Being | | 0 |
| Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison | | 0 |
| Whisper Turns Stronger: Augmenting Wav2Vec 2.0 for Superior ASR in Low-Resource Languages | | 0 |
| How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System? | | 0 |
| Fine-tuning Whisper on Low-Resource Languages for Real-World Applications | Code | 1 |
| Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation | Code | 0 |
| Representation Purification for End-to-End Speech Translation | | 0 |
| Leveraging Virtual Reality and AI Tutoring for Language Learning: A Case Study of a Virtual Campus Environment with OpenAI GPT Integration with Unity 3D | | 0 |
| Whisper Finetuning on Nepali Language | | 0 |
| Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages | | 0 |
| NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts | | 0 |
| CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR | | 0 |
| LASER: Attention with Exponential Transformation | | 0 |
| SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation | Code | 0 |
| Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? | | 0 |
| Application of Audio Fingerprinting Techniques for Real-Time Scalable Speech Retrieval and Speech Clusterization | | 0 |

No leaderboard results yet.