Multi-lingual detection of web terrorist content

Mark Last, Alex Markov, Abraham Kandel

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

4 Scopus citations


The role of the Internet in the infrastructure of global terrorist organizations is increasing dramatically. Beyond propaganda, the WWW is being heavily used for practical training, fundraising, communication, and other purposes. Terrorism experts are interested in identifying who is behind the material posted on terrorist web sites and online forums and what links they have to active terror groups. The number of known terrorist sites is so large, and their URLs are so volatile, that continuous manual monitoring of their multilingual content is out of the question. Moreover, terrorist web sites and forums often try to conceal their real identity. This is why automated multi-lingual detection methods are so important in the cyber war against international terror. In this chapter, we describe a classification-based approach to multi-lingual detection and categorization of terrorist documents. The proposed approach builds upon the recently developed graph-based web document representation model combined with the popular C4.5 decision-tree classification algorithm. Two case studies are performed on collections of web documents in Arabic and English, respectively. The first case study demonstrates that documents downloaded from several known terrorist sites in Arabic can be reliably discriminated from the content of Arabic news reports using a compact set of filtering rules. In the second study, we induce an accurate classification model that can distinguish between the English content posted by two different Middle-Eastern terrorist organizations (Hamas in the Palestinian Authority and Hezbollah in Lebanon).
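
The general idea of pairing a graph-based document representation with a C4.5-style split criterion can be illustrated with a minimal sketch. This is not the chapter's implementation: the word-adjacency graph, the function names (`doc_to_edges`, `gain_ratio`, `best_edge`), and the neutral toy texts are all assumptions made for illustration, and only the choice of a single root test is shown rather than a full decision tree.

```python
import math
from collections import Counter


def doc_to_edges(text):
    """Assumed simplification: a document becomes the set of directed
    edges (a, b) where term b immediately follows term a."""
    terms = text.lower().split()
    return {(a, b) for a, b in zip(terms, terms[1:])}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(docs, labels, edge):
    """Gain ratio (the split criterion used by C4.5) for the binary
    test 'does the document graph contain this edge?'."""
    present = [y for d, y in zip(docs, labels) if edge in d]
    absent = [y for d, y in zip(docs, labels) if edge not in d]
    if not present or not absent:
        return 0.0  # the test does not split the data at all
    n = len(labels)
    conditional = ((len(present) / n) * entropy(present)
                   + (len(absent) / n) * entropy(absent))
    gain = entropy(labels) - conditional
    split_info = entropy(["in"] * len(present) + ["out"] * len(absent))
    return gain / split_info


def best_edge(docs, labels):
    """Choose the edge with the highest gain ratio as the root test."""
    candidates = sorted(set().union(*docs))
    return max(candidates, key=lambda e: gain_ratio(docs, labels, e))


# Hypothetical toy corpus with neutral topics, for illustration only.
texts = [
    "rain expected over the coast tonight",
    "heavy rain expected in the north",
    "team wins the final match",
    "the final match ends in a draw",
]
labels = ["weather", "weather", "sport", "sport"]

docs = [doc_to_edges(t) for t in texts]
edge = best_edge(docs, labels)
print("root test: graph contains edge", edge)
```

In this sketch each selected edge plays the role of one filtering rule; a full tree learner would recurse on the two resulting subsets until the classes are separated or no useful test remains.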

Original language: English
Title of host publication: Intelligence and Security Informatics
Subtitle of host publication: Techniques and Applications
Editors: Hsinchun Chen, Christopher Yang
Number of pages: 18
State: Published - 3 Jul 2008

Publication series

Name: Studies in Computational Intelligence
ISSN (Print): 1860-949X

ASJC Scopus subject areas

  • Artificial Intelligence

