SEAHORSE: A multilingual, multifaceted dataset for summarization evaluation

E Clark, S Rijhwani, S Gehrmann, J Maynez… - arXiv preprint arXiv:2305.13194, 2023 - arxiv.org
Reliable automatic evaluation of summarization systems is challenging due to the multifaceted and subjective nature of the task. This is especially the case for languages other than English, where human evaluations are scarce. In this work, we introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation. SEAHORSE consists of 96K summaries with human ratings along 6 dimensions of text quality: comprehensibility, repetition, grammar, attribution, main ideas, and conciseness, covering 6 languages, 9 systems and 4 datasets. As a result of its size and scope, SEAHORSE can serve both as a benchmark to evaluate learnt metrics, as well as a large-scale resource for training such metrics. We show that metrics trained with SEAHORSE achieve strong performance on the out-of-domain meta-evaluation benchmarks TRUE (Honovich et al., 2022) and mFACE (Aharoni et al., 2022). We make the SEAHORSE dataset and metrics publicly available for future research on multilingual and multifaceted summarization evaluation.
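To make the dataset's structure concrete, the following is a minimal Python sketch of what one SEAHORSE annotation record might look like, based only on the abstract: a summary of a source article rated by a human along the six named quality dimensions, with metadata for language, system, and underlying dataset. The field names, rating type, and placeholder values are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeahorseExample:
    """Hypothetical record: one rated summary (schema assumed, not official)."""
    article: str                 # source document being summarized
    summary: str                 # summary produced by one of the 9 systems
    language: str                # one of the 6 covered languages
    system: str                  # which summarization system produced the summary
    dataset: str                 # which of the 4 underlying summarization datasets
    # Six human quality ratings; a boolean judgment is assumed here for illustration.
    comprehensibility: Optional[bool] = None
    repetition: Optional[bool] = None
    grammar: Optional[bool] = None
    attribution: Optional[bool] = None
    main_ideas: Optional[bool] = None
    conciseness: Optional[bool] = None


# Example usage with made-up placeholder values:
ex = SeahorseExample(
    article="<source article text>",
    summary="<model-generated summary>",
    language="en",
    system="<system name>",
    dataset="<dataset name>",
    comprehensibility=True,
    repetition=True,
    grammar=True,
    attribution=False,
    main_ideas=True,
    conciseness=True,
)
print(ex.attribution)  # False: the summary is judged not fully attributable to the source
```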