Chain-of-Thought Prompting for Speech Translation

K Hu, Z Chen, CHH Yang, P Żelasko… - arXiv preprint arXiv …, 2024 - arxiv.org
arXiv preprint arXiv:2409.11538, 2024
Large language models (LLMs) have demonstrated remarkable advancements in language understanding and generation. Building on the success of text-based LLMs, recent research has adapted these models to use speech embeddings for prompting, resulting in Speech-LLM models that exhibit strong performance in automatic speech recognition (ASR) and automatic speech translation (AST). In this work, we propose a novel approach that leverages ASR transcripts as prompts for AST in a Speech-LLM built on an encoder-decoder text LLM. The Speech-LLM model consists of a speech encoder and an encoder-decoder Megatron-T5. By first decoding speech to generate an ASR transcript and then using this transcript along with the encoded speech for prompting, we guide speech translation in a two-step process resembling chain-of-thought (CoT) prompting. Low-rank adaptation (LoRA) is used to adapt the T5 LLM and outperforms full-model fine-tuning. Experimental results show that the proposed CoT prompting significantly improves AST performance, achieving an average increase of 2.4 BLEU points across 6 En->X or X->En AST tasks compared to speech prompting alone. Additionally, compared to a related CoT prediction method that predicts a concatenated sequence of ASR and AST transcripts, our method performs better by an average of 2 BLEU points.
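
To make the two-step prompting concrete, below is a minimal, hedged sketch of how such CoT-style inference could be wired up: first decode an ASR transcript from the speech embeddings, then prompt the encoder-decoder LLM again with both the transcript and the speech embeddings to produce the translation. This is not the authors' implementation; the class name, the `speech_encoder` and `t5` interfaces, and the prompt wording are all hypothetical placeholders for illustration.

```python
import torch

class CoTSpeechTranslator:
    """Illustrative sketch of two-step (CoT-style) speech translation.

    Assumes a `speech_encoder` that maps audio to a sequence of embeddings in the
    LLM's embedding space, and an encoder-decoder `t5` model (e.g. a LoRA-adapted
    T5) with a HuggingFace-style `generate` API. All names are hypothetical.
    """

    def __init__(self, speech_encoder, t5, tokenizer):
        self.speech_encoder = speech_encoder  # audio -> (1, T, d) embeddings
        self.t5 = t5                          # encoder-decoder LLM
        self.tokenizer = tokenizer

    def _prompt_with_speech(self, text_prompt, speech_emb, max_new_tokens=256):
        # Concatenate embedded text prompt with speech embeddings and decode.
        prompt_ids = self.tokenizer(text_prompt, return_tensors="pt").input_ids
        prompt_emb = self.t5.get_input_embeddings()(prompt_ids)
        inputs_embeds = torch.cat([prompt_emb, speech_emb], dim=1)
        out_ids = self.t5.generate(inputs_embeds=inputs_embeds,
                                   max_new_tokens=max_new_tokens)
        return self.tokenizer.decode(out_ids[0], skip_special_tokens=True)

    @torch.no_grad()
    def translate(self, audio, target_lang="German"):
        speech_emb = self.speech_encoder(audio)

        # Step 1: prompt with speech only to obtain an ASR transcript.
        transcript = self._prompt_with_speech("Transcribe the speech:", speech_emb)

        # Step 2: prompt again with the ASR transcript *and* the speech
        # embeddings, asking for the translation (the CoT-style second step).
        ast_prompt = f"Translate the speech to {target_lang}. Transcript: {transcript}"
        return self._prompt_with_speech(ast_prompt, speech_emb)
```

In this sketch the translation step conditions on both the intermediate transcript and the original speech embeddings, mirroring the abstract's description of using ASR output as an intermediate "reasoning" step rather than predicting a single concatenated ASR+AST sequence.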