Abstract
This document gives an overview of the strategy, effort and aftermath of the MultiLing 2013 multilingual summarization data collection. We describe how the Data Contributors of MultiLing collected and generated a multilingual multi-document summarization corpus in 10 languages: Arabic, Chinese, Czech, English, French, Greek, Hebrew, Hindi, Romanian and Spanish. We discuss the rationale behind the main decisions of the collection, the methodology used to generate the multilingual corpus, and the challenges and problems faced for each language. This paper covers the work on the Czech, Hebrew and Spanish languages.
Original language | English |
---|---|
Title of host publication | Proceedings of the MultiLing 2013 Workshop on Multilingual Multi-document Summarization |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 13-19 |
Number of pages | 7 |
State | Published - Aug 2013 |