Automatic identification of biblical quotations in Hebrew-Aramaic documents

Yaakov Hacohen-Kerner, Nadav Schweitzer, Yaakov Shoham

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Scopus citations

Abstract

Quotations in a text document contain important information about the content, the context, the sources that the author uses, their importance and impact. Therefore, automatic identification of quotations from documents is an important task. Quotations included in rabbinic literature are difficult to identify and to extract for various reasons. The aim of this research is to automatically identify Biblical quotations included in rabbinic documents written in Hebrew-Aramaic. We deal with various kinds of quotations: partial, missing and incorrect. We formulate nineteen features to identify these quotations. These features were divided into seven different feature sets: matches, best matches, sums of weights, weighted averages, weighted medians, common words, and quotation indicators. Several features are novel. Experiments on various combinations of these features were performed using four common machine learning methods. A combination of 17 features using J48 (an improved version of C4.5) achieves an accuracy of 91.2%, which is an improvement of about 8% compared to a baseline result.
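The abstract names the feature sets (matches, best matches, and so on) without defining them. As a rough illustration only, and not the authors' actual features, the sketch below shows one way a "best match" style feature could be computed: the longest run of consecutive words that a candidate passage shares with any verse, normalized by the candidate's length. The function names, the whitespace-tokenized word lists, and the normalization are all assumptions made for this example.

```python
from typing import List, Sequence


def longest_common_run(candidate: Sequence[str], verse: Sequence[str]) -> int:
    """Length of the longest contiguous word sequence shared by both inputs."""
    best = 0
    # prev[j] holds the length of the common run ending at the previous
    # candidate word and at verse[j - 1].
    prev = [0] * (len(verse) + 1)
    for c_word in candidate:
        curr = [0] * (len(verse) + 1)
        for j, v_word in enumerate(verse, start=1):
            if c_word == v_word:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def best_match_score(candidate: Sequence[str], verses: List[Sequence[str]]) -> float:
    """Hypothetical 'best match' feature: the longest shared run over all
    verses, divided by the candidate length (0.0 for an empty candidate)."""
    if not candidate:
        return 0.0
    longest = max((longest_common_run(candidate, v) for v in verses), default=0)
    return longest / len(candidate)


# Toy data in English for readability; real input would be Hebrew-Aramaic tokens.
verses = [
    ["in", "the", "beginning", "god", "created"],
    ["and", "the", "earth", "was", "without", "form"],
]
candidate = ["as", "written", "in", "the", "beginning", "god"]
score = best_match_score(candidate, verses)
```

Because the score rewards any sufficiently long shared run, a partial quotation still scores well. In a pipeline like the one the abstract describes, a value of this kind would be one column in the feature vector passed to a classifier such as a decision tree.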

Original language: English
Title of host publication: KDIR 2010 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval
Pages: 320-325
Number of pages: 6
State: Published - 1 Dec 2010
Externally published: Yes
Event: International Conference on Knowledge Discovery and Information Retrieval, KDIR 2010 - Valencia, Spain
Duration: 25 Oct 2010 → 28 Oct 2010

Publication series

Name: KDIR 2010 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval

Conference

Conference: International Conference on Knowledge Discovery and Information Retrieval, KDIR 2010
Country/Territory: Spain
City: Valencia
Period: 25/10/10 → 28/10/10

Keywords

  • Hebrew-Aramaic texts
  • Information retrieval
  • Quotation identification

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
