CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

2024-09-29

Yexing Du, Ziyang Ma, Yifan Yang, Keqi Deng, Xie Chen, Bo Yang, Yang Xiang, Ming Liu, Bing Qin


Code

github.com/x-lance/slam-llm
Official (in paper)

Abstract

Speech Language Models (SLMs) have demonstrated impressive performance on speech translation tasks. However, existing research primarily focuses on direct instruction fine-tuning and often overlooks the inherent reasoning capabilities of SLMs. In this paper, we introduce a three-stage training framework designed to activate the chain-of-thought (CoT) capabilities of SLMs. We propose CoT-ST, a speech translation model that uses multimodal CoT to decompose speech translation into two sequential steps: speech recognition, then translation. We validate the effectiveness of our method on two datasets, CoVoST-2 and MuST-C. The experimental results demonstrate that CoT-ST outperforms previous state-of-the-art methods, achieving higher BLEU scores (CoVoST-2 en-ja: 30.5->30.8, en-zh: 45.2->47.7; MuST-C en-zh: 19.6->21.2). This work is open-sourced at https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/st_covost2
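
The decomposition described in the abstract (recognize the speech first, then translate) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Generate` interface, the function `cot_speech_translate`, and the prompt wording are all hypothetical, chosen only to show how a two-step chain might be wired around any model that maps audio plus a text prompt to text.

```python
from typing import Callable, Dict

# Hypothetical interface (an assumption, not the SLAM-LLM API):
# any callable that maps (audio bytes, text prompt) to generated text.
Generate = Callable[[bytes, str], str]


def cot_speech_translate(
    generate: Generate, audio: bytes, src_lang: str, tgt_lang: str
) -> Dict[str, str]:
    """Two-step chain: transcribe the audio, then translate it.

    The prompt strings are illustrative placeholders only.
    """
    # Step 1: speech recognition. The model sees only the audio and
    # an instruction to transcribe it.
    asr_prompt = f"Transcribe the {src_lang} speech."
    transcript = generate(audio, asr_prompt).strip()

    # Step 2: translation. The model sees the audio again, together
    # with its own transcript from step 1 as intermediate context.
    st_prompt = (
        f"{asr_prompt}\n{transcript}\n"
        f"Translate it into {tgt_lang}."
    )
    translation = generate(audio, st_prompt).strip()

    return {"transcript": transcript, "translation": translation}
```

Passing the model in as a plain callable keeps the sketch independent of any particular checkpoint or framework; a real system would replace it with a call into the trained SLM.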

Tasks

Speech Recognition, Translation
