Structuring the Unstructured: From Startup to Making Sense of eBay's Huge eCommerce Inventory

Ido Guy, Kira Radinsky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Electronic commerce has continued to gain popularity in recent years. On eBay, one of the largest online marketplaces in the world, millions of new listings (items) are submitted by a variety of sellers every day. This yields a rich, diverse inventory characterized by a particularly long tail [7]. In addition, many items in the inventory lack basic structured information, such as product identifiers, brand, category, and other properties, because sellers tend to enter only unstructured information, namely title and description [6]. Such an inventory therefore requires a handful of large-scale solutions to help organize the data and gain business insights. In 2016, eBay acquired SalesPredict to help structure its unstructured data. In this proposed presentation, we will share the story of a research startup from its inception until its acquisition and integration as eBay's data science team. We will review the numerous research and engineering challenges a startup faces, and the principal challenges the eBay data science organization deals with today. These include the identification of duplicate, similar, and related products; the extraction of name-value attributes from item titles and descriptions; the matching of items entered by sellers to catalog products; the ranking of item titles based on their likelihood to serve as "good" product titles; and the creation of "browse node" pages to address complex search queries from potential buyers. We will describe how the eBay data science team approaches these challenges and some of the solutions already launched to production. These solutions involve the use of large-scale machine learning, information retrieval, and natural language processing techniques, and should therefore be of interest to the SIGIR audience at large.

Original language: English
Title of host publication: SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 1351
Number of pages: 1
ISBN (Electronic): 9781450350228
State: Published - 7 Aug 2017
Externally published: Yes
Event: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017 - Tokyo, Shinjuku, Japan
Duration: 7 Aug 2017 to 11 Aug 2017

Publication series

Name: SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017
Country/Territory: Japan
City: Tokyo, Shinjuku
Period: 7/08/17 to 11/08/17
