Finkelstein A.V.¹, Badretdinov A.Ya.¹,²
¹Institute of Protein Research, Russian Academy of Sciences, 142292 Pushchino, Moscow Region, Russia
²Laboratory of Molecular Biophysics, Rockefeller University, Box 270, 1230 York Ave, New York, NY 10021-6399, USA
The main obstacle to protein fold prediction is errors in the energy parameters: because of them, the calculated energy of the native fold can lie above the calculated energies of some competing folds. However, using a set of homologs (proteins that have nearly identical 3D structures despite numerous amino acid mutations in their chains), one can diminish these errors by averaging the fold energies over the homologs.
A simple analytical theory, based on the Random Energy Model, estimates the number of homologs necessary for prediction as a function of the homology level and the level of the energy errors.
The theory is substantiated by computer experiments with simplified models of protein chains. The experiments simulate prediction of the lowest-energy fold using "corrupted" energy parameters and show that a sufficiently large set of sufficiently remote homologs allows this fold to be recognized correctly.
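The averaging idea can be illustrated with a toy Monte Carlo sketch. It is not the authors' model or their simulation protocol. It assumes, purely for illustration, that the native fold (index 0) has the same true energy `-gap` in every homolog, that the energies of competing folds are independent Gaussian draws for each homolog (a crude stand-in for remote homologs in a Random Energy Model picture), and that the parameter errors add independent Gaussian noise of width `error_sd`. The function name and all numerical values are hypothetical.

```python
import random


def predict_native_rate(n_homologs, n_folds=200, gap=3.0, error_sd=1.5,
                        n_trials=200, seed=0):
    """Fraction of trials in which fold 0 (the native fold) has the lowest
    calculated energy after averaging over n_homologs homologs.

    Toy assumptions (not taken from the paper): Gaussian energies with unit
    width for competing folds, Gaussian errors that are independent across
    folds and homologs.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        mean_energy = [0.0] * n_folds
        for _ in range(n_homologs):
            for fold in range(n_folds):
                # True energy: fixed low value for the native fold,
                # a fresh random value for every competing fold.
                true_e = -gap if fold == 0 else rng.gauss(0.0, 1.0)
                # "Corrupted" energy: true energy plus a parameter error.
                calc_e = true_e + rng.gauss(0.0, error_sd)
                mean_energy[fold] += calc_e / n_homologs
        # Prediction: the fold with the lowest averaged calculated energy.
        predicted = min(range(n_folds), key=mean_energy.__getitem__)
        if predicted == 0:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    for m in (1, 4, 16):
        print(m, predict_native_rate(m))
```

In this sketch the native fold is rarely recovered from a single sequence, while averaging over more homologs narrows both the error and the spread of competing energies, so the native fold is recovered more often. How many homologs are actually needed depends on the homology level and the error level, which this toy does not model.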