Multi-document multilingual summarization corpus preparation, part 2: Czech, Hebrew and Spanish

Michael Elhadad, Sabino Miranda-Jiménez, Josef Steinberger, George Giannakopoulos

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

This document gives an overview of the strategy, effort and aftermath of the MultiLing 2013 multilingual summarization data collection. We describe how the Data Contributors of MultiLing collected and generated a multilingual multi-document summarization corpus in 10 languages: Arabic, Chinese, Czech, English, French, Greek, Hebrew, Hindi, Romanian and Spanish. We discuss the rationale behind the main decisions of the collection, the methodology used to generate the multilingual corpus, and the challenges and problems faced per language. This paper covers the work on the Czech, Hebrew and Spanish languages.
Original language: English
Title of host publication: Proceedings of the MultiLing 2013 Workshop on Multilingual Multi-document Summarization
Publisher: Association for Computational Linguistics (ACL)
Pages: 13-19
Number of pages: 7
State: Published - 2013
