SPVec: A Word2vec-Inspired Feature Representation Method for Drug-Target Interaction Prediction

Yu Fang Zhang, Xiangeng Wang, Aman Chandra Kaushik, Yanyi Chu, Xiaoqi Shan, Ming Zhu Zhao, Qin Xu, Dong Qing Wei

Research output: Contribution to journal › Article › peer-review

54 Scopus citations

Abstract

Drug discovery is an academic and commercial process of global importance. Accurate identification of drug-target interactions (DTIs) can significantly facilitate the drug discovery process. Compared with costly, labor-intensive and time-consuming experimental methods, machine learning (ML) plays an increasingly important role in the effective, efficient and high-throughput identification of DTIs. However, upstream feature extraction methods require substantial human resources and expert insight, which limits the application of ML approaches. Inspired by unsupervised representation learning methods such as Word2vec, we propose SPVec, a novel way to automatically convert raw data such as SMILES strings and protein sequences into continuous, information-rich and lower-dimensional vectors, thereby avoiding the sparseness and bit collisions of cumbersome, manually extracted features. Visualization of SPVec illustrated that similar compounds or proteins occupy similar regions of the vector space, which indicates that SPVec not only encodes compound substructures or protein sequences efficiently, but also implicitly reveals some important biophysical and biochemical patterns. Compared with manually designed features such as MACCS fingerprints and amino acid composition (AAC), SPVec showed better performance with several state-of-the-art machine learning classifiers, such as Gradient Boosting Decision Tree, Random Forest and Deep Neural Network, on BindingDB. The performance and robustness of SPVec were also confirmed on independent test sets obtained from the DrugBank database. In addition, based on the whole DrugBank dataset, we predicted the likelihood of all unlabeled DTIs; two of the top five predicted novel DTIs were supported by external evidence. These results indicate that SPVec can provide an effective and efficient way to discover reliable DTIs, which would be beneficial for drug reprofiling.
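
To make the contrast between the two feature styles concrete, here is a minimal sketch in plain Python. It computes the hand-designed AAC baseline named in the abstract and then splits a protein sequence into overlapping k-mer "words" of the kind a Word2vec-style model could be trained on. The abstract does not state how sequences are tokenized, so the k-mer scheme, the choice `k=3`, the function names `aac` and `kmer_words`, and the toy sequence are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    counts = Counter(sequence)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def kmer_words(sequence, k=3):
    """Split a sequence into overlapping k-mers (assumed tokenization)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


seq = "MKTAYIAKQR"  # hypothetical toy sequence, not from the paper
features = aac(seq)      # fixed-length, hand-designed feature vector
words = kmer_words(seq)  # "sentence" of words for an embedding model
```

AAC discards residue order entirely, whereas the k-mer sentence preserves local context, which is the information an embedding model of this kind can exploit.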

Original language: English
Article number: 895
Journal: Frontiers in Chemistry
Volume: 7
State: Published - 10 Jan 2020
Externally published: Yes

Keywords

  • Word2vec
  • drug-target interaction
  • feature embedding
  • machine learning
  • representation learning

ASJC Scopus subject areas

  • General Chemistry
