Product Bundle Identification using Semi-Supervised Learning

Hen Tzaban, Ido Guy, Asnat Greenstein-Messica, Arnon Dagan, Lior Rokach, Bracha Shapira

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

13 Scopus citations

Abstract

Many sellers on e-commerce platforms offer buyers product bundles, which package together two or more different items. The identification of such bundles is a necessary step to support a variety of related services, from recommendation to dynamic pricing. In this work, we present a comprehensive study of bundle identification on a large e-commerce website. Our analysis of bundle compared to non-bundle listed items reveals several key differentiating characteristics, spanning the listing's title, image, and attributes. Following, we experiment with a multi-modal classifier, which takes advantage of these characteristics as features. Our analysis also shows that a bundle indicator input by sellers tends to be highly noisy and carries only a weak signal. The bundle identification task therefore faces the challenge of having a small set of manually-labeled clean examples and a larger set of noisy-labeled examples, in conjunction with class imbalance due to the relative scarcity of bundles. Our experiments with basic supervised classifiers, using the manually-labeled and/or the noisy-labeled data for training, demonstrates only moderate performance. We therefore turn to a semisupervised approach and propose GREED, a self-training ensemblebased algorithm with a greedy model selection. Our evaluation over two different meta-categories shows a superior performance of semi-supervised approaches for the bundle identification task, with GREED outperforming several semi-supervised alternatives. The combination of textual, image, and some metadata features is shown to yield the best performance, reaching an AUC of 0.89 and 0.92 for the two meta-categories, respectively.

Original languageEnglish
Title of host publicationSIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
PublisherAssociation for Computing Machinery, Inc
Pages791-800
Number of pages10
ISBN (Electronic)9781450380164
DOIs
StatePublished - 25 Jul 2020
Event43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020 - Virtual, Online, China
Duration: 25 Jul 202030 Jul 2020

Publication series

NameSIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020
Country/TerritoryChina
CityVirtual, Online
Period25/07/2030/07/20

Keywords

  • electronic commerce
  • ensemble learning
  • product bundling
  • self-training
  • semi-supervised learning

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems
  • Software

Fingerprint

Dive into the research topics of 'Product Bundle Identification using Semi-Supervised Learning'. Together they form a unique fingerprint.

Cite this