Accurate prediction for atomic-level protein design and its application in diversifying the near-optimal sequence space

Menachem Fromer, Chen Yanover

Research output: Contribution to journal, Article, peer-review

22 Scopus citations

Abstract

The task of engineering a protein to assume a target three-dimensional structure is known as protein design. Computational search algorithms are devised to predict a minimal-energy amino acid sequence for a particular structure. In practice, however, an ensemble of low-energy sequences is often sought. The primary reason is that an individual predicted low-energy sequence may not fold to the target structure, owing both to inaccuracies in modeling protein energetics and to the nonoptimal nature of the search algorithms employed. Additionally, some low-energy sequences may be overly stable and thus lack the dynamic flexibility required for biological function. Furthermore, investigating low-energy sequence ensembles can provide crucial insights into the pseudo-physical energy force fields that have been derived to describe structural energetics for protein design. Significantly, numerous studies have predicted low-energy sequences that were subsequently synthesized and shown to fold to the desired structures. However, the sequence space that such energy functions define as compatible with a target structure has not been characterized in full detail. Such a characterization is critical if protein designers are to continue using these force fields successfully at an ever-increasing pace and scale. In this paper, we present a conceptually novel algorithm that rapidly predicts the set of lowest-energy sequences for a given structure. Based on the theory of probabilistic graphical models, it efficiently inspects and partitions the near-optimal sequence space without assuming positional independence. We benchmark its performance on a diverse set of relevant protein design examples and show that it consistently yields sequences of lower energy than those obtained with state-of-the-art techniques. Thus, previously presented search techniques do not depict the low-energy space as precisely. Examination of the predicted ensembles indicates that, for each structure, the amino acid identity at a majority of positions must be chosen extremely selectively so as not to incur significant energetic penalties. We investigate this high degree of sequence similarity and demonstrate how more diverse near-optimal sequences can be predicted, systematically overcoming this bottleneck for computational design. Finally, we exploit this in-depth analysis of a collection of the lowest-energy sequences to suggest an explanation for previously observed experimental design results. The novel methodologies introduced here accurately portray the sequence space compatible with a protein structure and also supply a scheme for generating heterogeneous low-energy sequences, thus providing a powerful instrument for future work on protein design.
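The abstract does not detail the algorithm beyond its graphical-model basis. As a rough, hypothetical illustration of what "predicting the set of lowest-energy sequences" by "partitioning the near-optimal sequence space" can involve, the sketch below enumerates the M lowest-energy sequences of a toy model using a Lawler-style partitioning scheme. It is not the authors' implementation. Assumptions: the energy is pairwise decomposable, E(s) = sum_i E_i(s_i) + sum_{i<j} E_ij(s_i, s_j), so the pair terms couple positions rather than treating them as independent; exhaustive search stands in for the belief-propagation-style minimization suggested by the keywords; and all function names and energy values are made up for illustration.

```python
import heapq
import itertools

# Hypothetical toy problem: 3 designable positions, 2 candidate residues each.
# Energies are made-up numbers (exact binary fractions, so there are no ties).
alphabets = ["AG", "LV", "ST"]
single = [{"A": 0.0, "G": 1.5}, {"L": 0.0, "V": 0.5}, {"S": 0.25, "T": 0.0}]
pair = {(0, 1): {"A": {"L": -1.0, "V": 0.0}, "G": {"L": 0.0, "V": -0.625}}}


def energy(seq, single, pair):
    """Pairwise-decomposable energy: singleton terms plus coupled-pair terms."""
    e = sum(single[i][a] for i, a in enumerate(seq))
    for (i, j), table in pair.items():
        e += table[seq[i]][seq[j]]  # couples positions i and j
    return e


def best_in_subspace(allowed, single, pair):
    """Exact minimum over all sequences permitted by `allowed`.

    Brute force is a stand-in (assumption) for an efficient approximate
    minimizer; it is only feasible for tiny problems.
    """
    best = None
    for seq in itertools.product(*allowed):
        e = energy(seq, single, pair)
        if best is None or e < best[0]:
            best = (e, seq)
    return best  # None if the subspace is empty


def m_best(alphabets, single, pair, m):
    """Return the m lowest-energy (energy, sequence) pairs in ascending order."""
    root = [list(a) for a in alphabets]
    tiebreak = itertools.count()  # keeps heapq from comparing lists
    e0, s0 = best_in_subspace(root, single, pair)
    heap = [(e0, next(tiebreak), s0, root)]
    results = []
    while heap and len(results) < m:
        e, _, seq, allowed = heapq.heappop(heap)
        results.append((e, seq))
        # Split the rest of this subspace into disjoint pieces: piece k keeps
        # seq's residues at positions < k and forbids seq[k] at position k.
        for k in range(len(seq)):
            sub = [[seq[i]] if i < k else list(allowed[i]) for i in range(len(seq))]
            sub[k] = [a for a in allowed[k] if a != seq[k]]
            if not sub[k]:
                continue  # nothing left to explore in this piece
            best = best_in_subspace(sub, single, pair)
            heapq.heappush(heap, (best[0], next(tiebreak), best[1], sub))
    return results


def distinct_per_position(ensemble):
    """Count the distinct residues seen at each position across an ensemble."""
    seqs = [seq for _, seq in ensemble]
    return [len(set(column)) for column in zip(*seqs)]


if __name__ == "__main__":
    top = m_best(alphabets, single, pair, m=3)
    for e, seq in top:
        print("".join(seq), e)
    print(distinct_per_position(top))
```

Because every minimization in this sketch is exhaustive, its cost grows exponentially with the number of positions; a realistic design problem would need an efficient (approximate) minimizer in place of `best_in_subspace`. The per-position counts give a crude view of how selectively each position is constrained within the resulting ensemble.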

Original language: English
Pages (from-to): 682-705
Number of pages: 24
Journal: Proteins: Structure, Function and Bioinformatics
Volume: 75
Issue number: 3
State: Published - 15 May 2009
Externally published: Yes

Keywords

  • Approximate inference
  • Belief propagation
  • Combinatorial optimization
  • Maximum-a-posteriori estimation
  • Probabilistic graphical models
  • Protein design
  • Protein energetics
  • Structural sequence space

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
