Improving molecular representation learning with metric learning-enhanced optimal transport

Fang Wu, Nicolas Courty, Shuting Jin, Stan Z. Li

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond training domains. In this article, we develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems. MROT learns a continuous label of the data by measuring a new metric of domain distances and a posterior variance regularization over the transport plan to bridge the chemical domain gap. Among downstream tasks, we consider basic chemical regression tasks in unsupervised and semi-supervised settings, including chemical property prediction and materials adsorption selection. Extensive experiments show that MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances with desired properties.

Original language: English
Article number: 100714
Journal: Patterns
Volume: 4
Issue number: 4
State: Published - 14 Apr 2023
Externally published: Yes

Keywords

  • DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem
  • deep learning
  • domain adaptation
  • drug discovery
  • geometric neural network
  • materials synthesis
  • molecular representation learning
  • optimal transport

ASJC Scopus subject areas

  • General Decision Sciences
