Identifying and reducing error in cluster‐expansion approximations of protein energies |
| |
Authors: | Seungsoo Hahn, Orr Ashenberg, Gevorg Grigoryan, Amy E. Keating |
| |
Affiliation: | 1. Department of Biology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139; 2. Department of Biochemistry and Biophysics, University of Pennsylvania, 422 Curie Blvd, Philadelphia, Pennsylvania 19107 |
| |
Abstract: | Protein design involves searching a vast space for sequences that are compatible with a defined structure. This can pose significant computational challenges. Cluster expansion is a technique that can accelerate the evaluation of protein energies by generating a simple functional relationship between sequence and energy. The method consists of several steps. First, for a given protein structure, a training set of sequences with known energies is generated. Next, this training set is used to expand energy as a function of clusters consisting of single residues, residue pairs, and higher order terms, if required. The accuracy of the sequence‐based expansion is monitored and improved using cross‐validation testing and iterative inclusion of additional clusters. As a trade‐off for evaluation speed, the cluster‐expansion approximation introduces prediction errors, which can be reduced by including more training sequences, including higher order terms in the expansion, and/or reducing the sequence space described by the cluster expansion. This article analyzes the sources of error and introduces a method whereby accuracy can be improved by judiciously reducing the described sequence space. The method is applied to describe the sequence–stability relationship for several protein structures: coiled‐coil dimers and trimers, a PDZ domain, and T4 lysozyme as examples with computationally derived energies, and SH3 domains in amphiphysin‐1 and endophilin‐1 as examples where the expanded pseudo‐energies are obtained from experiments. Our open‐source software package Cluster Expansion Version 1.0 allows users to expand their own energy function of interest and thereby apply cluster expansion to custom problems in protein design. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010 |
| |
Keywords: | protein modeling; cluster expansion; heterogeneous variance; protein design; energy function |
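
The steps outlined in the abstract (generate training sequences with known energies, expand the energy over single-residue and pair clusters, check accuracy on held-out sequences) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three-letter alphabet, the four-site toy protein, the random `toy_energy` stand-in for a structure-based energy, and all function names. It is not the interface of the Cluster Expansion Version 1.0 package, and ordinary least squares is used as one plausible fitting choice rather than the authors' confirmed procedure.

```python
import itertools
import numpy as np

ALPHABET = "ALK"   # hypothetical alphabet; the first letter acts as the reference residue
N_SITES = 4        # hypothetical number of designable positions

rng = np.random.default_rng(0)
site_e = rng.normal(size=(N_SITES, len(ALPHABET)))
pair_e = rng.normal(size=(N_SITES, N_SITES, len(ALPHABET), len(ALPHABET)))

def toy_energy(seq):
    """Stand-in for an expensive energy function (purely pairwise by construction)."""
    idx = [ALPHABET.index(r) for r in seq]
    e = sum(site_e[i, a] for i, a in enumerate(idx))
    e += sum(pair_e[i, j, idx[i], idx[j]]
             for i, j in itertools.combinations(range(N_SITES), 2))
    return float(e)

def cluster_functions(max_order):
    """Enumerate clusters as tuples of (site, residue) over non-reference residues."""
    clusters = [()]  # the empty cluster is the constant term
    for order in range(1, max_order + 1):
        for sites in itertools.combinations(range(N_SITES), order):
            for residues in itertools.product(ALPHABET[1:], repeat=order):
                clusters.append(tuple(zip(sites, residues)))
    return clusters

def featurize(seqs, clusters):
    """Indicator matrix: entry is 1 when a sequence contains every residue of a cluster."""
    return np.array([[float(all(s[i] == r for i, r in c)) for c in clusters]
                     for s in seqs])

def fit(seqs, energies, clusters):
    """Least-squares estimate of the cluster coefficients."""
    coef, *_ = np.linalg.lstsq(featurize(seqs, clusters), energies, rcond=None)
    return coef

def rmse(seqs, energies, clusters, coef):
    """Root-mean-square error of the expansion on the given sequences."""
    pred = featurize(seqs, clusters) @ coef
    return float(np.sqrt(np.mean((pred - energies) ** 2)))

all_seqs = ["".join(p) for p in itertools.product(ALPHABET, repeat=N_SITES)]
energies = np.array([toy_energy(s) for s in all_seqs])

# Hold out part of the sequence space to estimate prediction error.
perm = rng.permutation(len(all_seqs))
train, test = perm[:60], perm[60:]
train_seqs = [all_seqs[i] for i in train]
test_seqs = [all_seqs[i] for i in test]

errors = {}
for order in (1, 2):
    clusters = cluster_functions(order)
    coef = fit(train_seqs, energies[train], clusters)
    errors[order] = rmse(test_seqs, energies[test], clusters, coef)
    print(f"max cluster order {order}: {len(clusters)} terms, "
          f"held-out RMSE {errors[order]:.3g}")
```

Because the toy energy contains only single-site and pair contributions, an expansion limited to single-residue clusters cannot reproduce it, whereas including pair clusters can. This mirrors, in a simplified setting, the abstract's point that prediction error can be reduced by adding higher order terms.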