
SparQLe: Speech Queries to Text Translation Through LLMs

2025-02-13

Amirbek Djanibekov, Hanan Aldarmaki

Code available.

Abstract

With the growing influence of Large Language Models (LLMs), there is increasing interest in integrating speech representations with them to enable more seamless multi-modal processing and speech understanding. This study introduces a novel approach that combines self-supervised speech representations with instruction-tuned LLMs for speech-to-text translation. The approach employs a modality adapter that aligns extracted speech features with instruction-tuned LLMs using English-language data. Our experiments demonstrate that this method preserves the semantic content of the input speech and serves as an effective bridge between self-supervised speech models and instruction-tuned LLMs, offering a promising solution for various speech understanding applications.
