A decision tree framework for semi-automatic extraction of product attributes from the web

Lior Rokach, Roni Romano, Barak Chizi, Oded Maimon

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Semi-automatic extraction of product attributes from URLs is an important issue for comparison-shopping agents. In this paper we examine a novel decision tree framework for extracting product attributes. The core induction framework consists of three main stages. In the first stage, a large set of regular expression-based patterns is induced by employing a longest common subsequence algorithm. In the second stage, we filter the initial set and retain only the most useful patterns. In the last stage, we represent the extraction problem (in which the domain values are not known in advance) as a classification problem and employ an ensemble of decision trees. An empirical study performed on real-world extraction tasks illustrates the capability of the proposed framework.
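
The first stage can be illustrated with a small sketch. The abstract does not specify the induction procedure, so everything below is an assumption made for illustration: the tokenizer, the function names (`tokenize`, `lcs_alignment`, `induce_pattern`), and the rule that aligned tokens become literals while unaligned gaps become capture groups. It shows one plausible way a longest common subsequence over two example strings could yield a regular expression, not the authors' implementation.

```python
import re


def tokenize(text):
    """Split text into word runs and single non-word characters (assumed tokenizer)."""
    return re.findall(r"\w+|\W", text)


def lcs_alignment(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover the aligned positions.
    pairs, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def induce_pattern(example_a, example_b):
    """Build a regex: shared tokens become literals, gaps become capture groups."""
    a, b = tokenize(example_a), tokenize(example_b)
    parts, prev_i, prev_j = [], -1, -1
    for i, j in lcs_alignment(a, b):
        # Tokens were skipped in at least one example: insert a wildcard group.
        if i - prev_i > 1 or j - prev_j > 1:
            parts.append("(.*?)")
        parts.append(re.escape(a[i]))
        prev_i, prev_j = i, j
    # Unaligned tokens after the last shared token also form a gap.
    if prev_i < len(a) - 1 or prev_j < len(b) - 1:
        parts.append("(.*?)")
    return "".join(parts)


pattern = induce_pattern("Brand: Sony, Color: Black",
                         "Brand: Canon, Color: Silver")
match = re.fullmatch(pattern, "Brand: Nikon, Color: Red")
print(match.groups())  # ('Nikon', 'Red')
```

Applying such a routine to many pairs of examples would produce a large candidate set, which is why a filtering stage like the one described in the abstract would be needed before classification.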

Original language: English
Title of host publication: Advances in Web Intelligence and Data Mining
Editors: Mark Last, Piotr Szczepaniak, Zeev Volkovich, Abraham Kandel
Pages: 201-210
Number of pages: 10
State: Published - 27 Sep 2006

Publication series

Name: Studies in Computational Intelligence
Volume: 23
ISSN (Print): 1860-949X

ASJC Scopus subject areas

  • Artificial Intelligence
