Leveraging metadata to recommend keywords for academic papers

Research output: Contribution to journal › Article › peer-review

9 Scopus citations


Users of research databases, such as CiteSeerX, Google Scholar, and Microsoft Academic, often search for papers using a set of keywords. Unfortunately, many authors avoid listing sufficient keywords for their papers. As a result, these applications may need to automatically associate good descriptive keywords with papers. When the full text of a paper is available, this problem has been thoroughly studied. In many cases, however, due to copyright limitations, research databases do not have access to the full text. On the other hand, such databases typically maintain metadata, such as the title, the abstract, and the citation network of each paper. In this paper, we study the problem of predicting which keywords are appropriate for a research paper, using different methods based on the citation network and the available metadata. Our main goal is to provide search engines with the ability to extract keywords from the available metadata. However, our system can also be used for other applications, such as recommending keywords to the authors of new papers. We create a data set of research papers with their citation network, keywords, and other metadata, containing over 470K papers and more than 2 million keywords. We compare our methods with predicting keywords from the title and abstract, in offline experiments and in a user study, and conclude that the citation network provides much better predictions.
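The abstract does not spell out the individual prediction methods, so the following is only a minimal illustrative sketch of the general idea it contrasts: ranking candidate keywords by "votes" from a paper's citation neighbours (a simple collaborative-filtering-style heuristic) versus a text baseline that matches a keyword vocabulary against the title and abstract. The function names `recommend_from_citations` and `recommend_from_text`, the dictionary-based graph format, and the toy data are hypothetical and are not the authors' methods or data set.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend_from_citations(
    paper_id: str,
    citations: Dict[str, Set[str]],
    keywords: Dict[str, Set[str]],
    top_k: int = 3,
) -> List[str]:
    """Hypothetical neighbour-voting recommender (illustration only).

    `citations` maps each paper to the set of papers it cites. Neighbours are
    taken in both directions: papers that `paper_id` cites and papers citing it.
    Each neighbour votes once for every keyword it carries.
    """
    # Outgoing edges: papers cited by `paper_id` (copied so the input is not mutated).
    neighbours = set(citations.get(paper_id, set()))
    # Incoming edges: papers whose reference lists contain `paper_id`.
    neighbours |= {p for p, refs in citations.items() if paper_id in refs}
    neighbours.discard(paper_id)  # ignore self-citations

    votes: Counter = Counter()
    for n in neighbours:
        votes.update(keywords.get(n, set()))
    # Highest vote count first; ties broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [kw for kw, _ in ranked[:top_k]]


def recommend_from_text(
    title: str, abstract: str, vocabulary: Set[str], top_k: int = 3
) -> List[str]:
    """Hypothetical baseline: rank vocabulary keywords by how often they
    occur (naive case-insensitive substring match) in the title and abstract."""
    text = f"{title} {abstract}".lower()
    counts = {kw: text.count(kw.lower()) for kw in vocabulary}
    ranked = sorted(
        ((kw, c) for kw, c in counts.items() if c > 0),
        key=lambda kv: (-kv[1], kv[0]),
    )
    return [kw for kw, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Toy citation graph: "new" cites "a" and "b", and is cited by "c".
    citations = {"new": {"a", "b"}, "a": {"b"}, "b": set(), "c": {"new"}}
    keywords = {
        "a": {"information retrieval", "collaborative filtering"},
        "b": {"information retrieval"},
        "c": {"information retrieval", "automatic categorization"},
    }
    print(recommend_from_citations("new", citations, keywords))
    print(recommend_from_text(
        "Keyword recommendation",
        "We use the citation network and metadata.",
        {"metadata", "citation network", "deep learning"},
    ))
```

In this toy graph, all three neighbours of `"new"` carry "information retrieval", so it ranks first. The text baseline, by contrast, can only return keywords that literally appear in the title or abstract, which is one intuition for why citation neighbours can supply keywords the metadata text never mentions.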

Original language: English
Pages (from-to): 3073-3091
Number of pages: 19
Journal: Journal of the Association for Information Science and Technology
Issue number: 12
State: Published, 1 Dec 2016


Keywords

  • automatic categorization
  • collaborative filtering
  • information retrieval

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Information Systems and Management
  • Library and Information Sciences

