Surface-VQMAE: Vector-quantized Masked Auto-encoders on Molecular Surfaces

Fang Wu, Stan Z. Li

Research output: Contribution to journal › Conference article › peer-review

Abstract

Molecular surfaces carry fingerprints of the interaction patterns between proteins. However, compared with amino acid sequences and 3D structures, far less effort has gone into exploiting the abundant information on protein surfaces for analyzing proteins' biological functions. We propose a novel surface-based unsupervised learning algorithm, termed Surface-VQMAE, to close this gap. Because surface point clouds are sparse and unordered, we first partition them into patches and arrange the patches sequentially along the Morton curve. Next, we introduce a Transformer-based architecture named SurfFormer to integrate the surface geometry and capture patch-level relations. Finally, we enhance the prevalent masked auto-encoder (MAE) with vector quantization (VQ), which establishes a surface pattern codebook to enforce a discrete posterior distribution over the latent variables and yield more condensed semantics. Our work is the first to implement pretraining purely on molecular surfaces, and extensive experiments on diverse real-life scenarios, including binding site scoring, binding affinity prediction, and mutant effect estimation, demonstrate its effectiveness. The code is available at https://github.com/smiles724/VQMAE.
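
Two of the steps named in the abstract, ordering patches along a Morton curve and looking up a codebook entry, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes patch centers are already available as 3D coordinates, and every name in it (`morton_code`, `morton_order`, `quantize`, the `bits` parameter) is hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _spread_bits(v: int, bits: int) -> int:
    """Move bit i of v to bit 3*i, leaving two zero bits in between."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton_code(cell: Tuple[int, int, int], bits: int = 10) -> int:
    """Interleave the bits of integer grid coordinates (x, y, z)."""
    x, y, z = cell
    return (
        _spread_bits(x, bits)
        | (_spread_bits(y, bits) << 1)
        | (_spread_bits(z, bits) << 2)
    )


def morton_order(centers: Sequence[Point], bits: int = 10) -> List[int]:
    """Return indices of `centers` sorted along a Morton (Z-order) curve.

    Assumption: coordinates are rescaled to a 2**bits grid per axis
    using the bounding box of the given centers.
    """
    lo = [min(c[d] for c in centers) for d in range(3)]
    hi = [max(c[d] for c in centers) for d in range(3)]
    scale = (1 << bits) - 1

    def cell(c: Point) -> Tuple[int, int, int]:
        return tuple(
            int((c[d] - lo[d]) / (hi[d] - lo[d]) * scale) if hi[d] > lo[d] else 0
            for d in range(3)
        )

    return sorted(
        range(len(centers)),
        key=lambda i: morton_code(cell(centers[i]), bits),
    )


def quantize(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the nearest codebook entry (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])),
    )


# Hypothetical patch centers and a toy two-dimensional codebook.
centers = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
order = morton_order(centers)
code_index = quantize([0.9, 0.1], [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

Sorting by the interleaved code keeps spatially close patches near each other in the resulting sequence, which is the property that makes a space-filling curve useful for feeding an unordered point cloud to a sequence model. The nearest-neighbor lookup shows only the discrete assignment step of vector quantization; how the codebook is learned is outside this sketch.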

Original language: English
Pages (from-to): 53619-53634
Number of pages: 16
Journal: Proceedings of Machine Learning Research
Volume: 235
State: Published - 1 Jan 2024
Externally published: Yes
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: 21 Jul 2024 → 27 Jul 2024

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
