TY - GEN
T1 - Structuring the Unstructured
T2 - 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017
AU - Guy, Ido
AU - Radinsky, Kira
N1 - Publisher Copyright: © 2017 ACM.
PY - 2017/8/7
Y1 - 2017/8/7
AB - Electronic commerce has continued to gain popularity in recent years. On eBay, one of the largest on-line marketplaces in the world, millions of new listings (items) are submitted by a variety of sellers every day. This yields a rich, diverse inventory characterized by a particularly long tail [7]. In addition, many items in the inventory lack basic structured information, such as product identifiers, brand, category, and other properties, due to sellers' tendency to input unstructured information only, namely title and description [6]. Such inventory therefore requires a handful of large-scale solutions to assist in organizing the data and gaining business insights. In 2016, eBay acquired SalesPredict to help structure its unstructured data. In this proposed presentation, we will share the story of a research startup from its inception until its acquisition and integration as eBay's data science team. We will review the numerous challenges a startup faces from research and engineering perspectives, and the principal challenges the eBay data science organization deals with today. These include the identification of duplicate, similar, and related products; the extraction of name-value attributes from item titles and descriptions; the matching of items entered by sellers to catalog products; the ranking of item titles based on their likelihood to serve as "good" product titles; and the creation of "browse node" pages to address complex search queries from potential buyers. We will describe how the eBay data science team approaches these challenges and some of the solutions already launched to production. These solutions involve the use of large-scale machine learning, information retrieval, and natural language processing techniques, and should therefore be of interest to the SIGIR audience at large.
UR - http://www.scopus.com/inward/record.url?scp=85029387467&partnerID=8YFLogxK
U2 - 10.1145/3077136.3096469
DO - 10.1145/3077136.3096469
M3 - Conference contribution
AN - SCOPUS:85029387467
T3 - SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 1351
BT - SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
Y2 - 7 August 2017 through 11 August 2017
ER -