It Runs in the Family: Unsupervised Algorithm for Alternative Name Suggestion Using Digitized Family Trees

Aviad Elyashar, Rami Puzis, Michael Fire

Research output: Contribution to journal › Article › peer-review

Abstract

Searching for a person's name is a common online activity. However, Web search engines return few accurate results for queries containing names. Unlike a general word, which has only one correct spelling, a name provided as a query can have several legitimate spellings. Today, most techniques used to suggest diminutives and alternative spellings in online search are based on pattern matching and phonetic encoding; however, they often perform poorly. There is therefore a need for an effective tool that suggests alternative names for a name provided as a query. In this paper, we propose a new approach to the problem of alternative name suggestion. Our novel algorithm, GRAFT, combines historical data collected from genealogy websites with network algorithms. GRAFT is a general algorithm that suggests alternatives for input names using a graph of names derived from digitized ancestral family trees. Alternative names are extracted from this graph, which is constructed using generic ordering functions. This approach outperforms other algorithms that suggest diminutives and alternative spellings based on a single dimension, a factor that limits their performance. We evaluated GRAFT's performance on three ground-truth datasets of forenames and surnames, including a large-scale online genealogy dataset with over 16 million profiles, more than 700,000 unique forenames, and 500,000 surnames. We compared GRAFT's performance at suggesting alternative names with that of 10 other algorithms, including phonetic encoding, string similarity, machine learning, and deep learning algorithms. The results show GRAFT's superiority for both forenames and surnames and demonstrate its usefulness as a tool for improving alternative name suggestion.
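The abstract describes GRAFT only at a high level. To illustrate, loosely, the general idea of ranking candidate names from a name graph along more than one dimension, here is a minimal Python sketch. It builds a small weighted graph and orders a name's neighbours by a weighted sum of normalized edge weight and normalized edit-distance similarity. This is not the authors' implementation or their ordering functions. The edge semantics (two names linked whenever they are assumed to refer to the same person), the toy pairs, the scoring formula, and the names `build_name_graph`, `suggest`, `TOY_PAIRS`, and `alpha` are all hypothetical.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def build_name_graph(pairs):
    """Undirected weighted graph; the weight counts how often two names were linked."""
    graph = defaultdict(lambda: defaultdict(int))
    for a, b in pairs:
        a, b = a.lower(), b.lower()
        if a == b:
            continue  # ignore self-loops
        graph[a][b] += 1
        graph[b][a] += 1
    return graph


def suggest(graph, name, k=3, alpha=0.5):
    """Rank a name's neighbours by a hypothetical blend of two dimensions:
    normalized edge weight and normalized edit-distance similarity."""
    name = name.lower()
    neighbours = graph.get(name, {})
    if not neighbours:
        return []
    max_weight = max(neighbours.values())

    def score(candidate):
        weight = neighbours[candidate] / max_weight
        similarity = 1 - levenshtein(name, candidate) / max(len(name), len(candidate))
        return alpha * weight + (1 - alpha) * similarity

    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(neighbours, key=lambda c: (-score(c), c))[:k]


# Hypothetical toy data: each pair stands for two names assumed to refer to the same person.
TOY_PAIRS = [
    ("Elizabeth", "Elisabeth"), ("Elizabeth", "Elisabeth"), ("Elizabeth", "Elisabeth"),
    ("Elizabeth", "Eliza"), ("Elizabeth", "Eliza"),
    ("Elizabeth", "Liz"), ("Elizabeth", "Betty"),
]

if __name__ == "__main__":
    graph = build_name_graph(TOY_PAIRS)
    print(suggest(graph, "Elizabeth"))  # ['elisabeth', 'eliza', 'liz']
```

The `alpha` parameter controls the blend: `alpha=1` ranks purely by graph evidence, and `alpha=0` ranks purely by spelling similarity. Combining both is one simple way to avoid relying on a single dimension.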

Original language: English
Pages (from-to): 1651-1666
Number of pages: 16
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 35
Issue number: 2
State: Published - 1 Feb 2023

Keywords

  • Alternative name suggestion
  • digitized family trees
  • name-based graphs
  • network science
  • networks
  • personal names

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
