In the OSTI Collections: Determining How Proteins Fold

Article Acknowledgement:
Dr. William N. Watson, Physicist
DOE Office of Scientific and Technical Information

Investigations Focused on Direct Experiments

Investigations Focused on Computation

Reports available through OSTI's SciTech Connect

Proteins are the materials in living cells whose primary structures are specified by the cell’s DNA. These primary structures are sequences of amino acid^[^Wikipedia^] residues bound together into one or more chains. Each link in a chain is formed when a nitrogen atom and a carbon atom, each from a different amino acid molecule, join together in what is called a peptide bond, after the nitrogen has detached from a hydrogen atom that it was originally linked to and the carbon has likewise detached from a hydroxyl (oxygen-hydrogen) group (see Figure 1). As these links are made between amino acid residues, detached hydrogen and hydroxyl groups pair up and bind together to form water molecules. Proteins differ in their residue sequences, some differing in the kinds and numbers of amino acid residues that they’re made of, while others differ only by having the same kinds and quantities of residues in a different order.
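The bookkeeping of peptide-bond formation can be illustrated with a short calculation: each link releases one water molecule, so a single unbranched chain of n residues weighs less than its n free amino acids by the mass of n − 1 water molecules. The sketch below uses standard molar masses for glycine, alanine, and water; the function name and the two-residue example chain are chosen here purely for illustration.

```python
WATER_G_PER_MOL = 18.015
AMINO_ACID_G_PER_MOL = {"Gly": 75.07, "Ala": 89.09}  # free (unlinked) amino acids


def chain_molar_mass(residues):
    """Molar mass of one unbranched chain built from the named amino acids."""
    if not residues:
        raise ValueError("a chain needs at least one residue")
    free_mass = sum(AMINO_ACID_G_PER_MOL[name] for name in residues)
    peptide_bonds = len(residues) - 1  # one bond between each pair of neighbors
    # Every peptide bond formed releases one water molecule.
    return free_mass - peptide_bonds * WATER_G_PER_MOL


dipeptide = chain_molar_mass(["Gly", "Ala"])
print(f"Gly-Ala chain: {dipeptide:.3f} g/mol")
```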

Figure 1. Top: Amino acid residues are linked together when a carbon atom in one amino acid separates from a hydroxyl group, a nitrogen atom in another amino acid separates from a hydrogen atom, and the carbon and nitrogen atoms bind together. The separated hydroxyl and hydrogen can bind together into a water molecule.

Bottom: Chains of amino acid residues, represented in this schematic as spheres, constitute the primary structures of proteins; proteins’ secondary, tertiary, and even quaternary structures result from the folding and combination of primary structures into particular forms. (From Wikipedia: “Peptide bond formation”^[^Wikipedia^], “Main protein structure levels”^[^Wikipedia^].)

While cells build proteins one link at a time, the results aren’t simply strings of amino acid residues. As a protein is formed, the electromagnetic forces exerted by its own atoms on each other and by other atoms in its environment cause the protein’s residue chain or chains to fold up into a particular intricate shape. Each protein’s shape plays an important role in its function, since the shape of its surface affects how it can and cannot come in contact with other molecules in the cell, and thus how it can and can’t react with those other molecules. It was realized years ago that the process by which proteins fold into the shapes in which they function properly is quite efficient; if the process were not steered efficiently by interatomic forces, but were instead a random wandering among all the enormously numerous possible foldings, the average time for any one protein to stumble into its functional or “native” shape would make the entire age of the universe look practically infinitesimal.
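The scale of that mismatch can be illustrated with a rough calculation. The sketch below rests on purely illustrative numbers that do not come from the reports discussed here: a chain of 100 residues, three possible orientations per residue, and ten trillion shapes tried per second.

```python
# Back-of-the-envelope estimate of a purely random search over foldings.
# The first three inputs are illustrative assumptions, not measured values.
RESIDUES = 100                # assumed chain length
STATES_PER_RESIDUE = 3        # assumed number of orientations per residue
TRIALS_PER_SECOND = 1e13      # assumed rate of trying new shapes
AGE_OF_UNIVERSE_S = 4.35e17   # about 13.8 billion years, in seconds


def random_search_seconds(residues, states, rate):
    """Average time to hit one particular shape if no shape is tried twice."""
    total_shapes = states ** residues
    # On average, about half of the shapes are tried before the right one.
    return total_shapes / (2 * rate)


search_time = random_search_seconds(RESIDUES, STATES_PER_RESIDUE, TRIALS_PER_SECOND)
ratio = search_time / AGE_OF_UNIVERSE_S
print(f"random search: {search_time:.1e} s, about {ratio:.1e} ages of the universe")
```

Changing any of the assumed inputs by a few orders of magnitude leaves the conclusion intact, because the number of shapes grows exponentially with chain length.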

Proteins commonly contain dozens of amino acid residues, with each of the 23 amino acids found in living cells containing between 10 and 48 atoms, so that typical proteins are made of hundreds or thousands of atoms each. Since this makes it quite problematic to understand protein folding directly in terms of each atom’s interaction with every other atom around it, people have tried to determine whether groups of atoms behave to any extent as single units, and to sort out larger from smaller influences among these forces, to see what factors are most important. This way of analyzing physical processes has plenty of precedent. The workings of many physical systems, some much larger than proteins, are in fact practically impossible to trace if one tries to account equally for all the system’s numerous constituents and the interactions between them. Yet many such systems can still be profitably analyzed, because certain sets of components behave as units while some of their interactions affect their behavior much more than others. A point of view that accounts for the most important aspects first and treats information about the rest as refinements not only makes analysis feasible, but gives a much clearer idea of how the system works.

The gradual sorting out of which aspects are most important in protein folding has been the work of many years and is still in progress. Many approaches have been tried that are quite different, both in the physical aspects of protein folding that they examine and in the ways they represent proteins and their folding mathematically. The evaluation of such different approaches has been facilitated by the biennial comparison experiment known as Critical Assessment of techniques for protein Structure Prediction (CASP)^[CASP], in which teams of researchers use different methods to calculate proteins’ shapes so the different methods can be compared and researchers can all learn from each group’s results. One thing that has been learned from the various methods is how interatomic forces can steer proteins away from most of their otherwise possible shapes as they fold and thus avoid a practically endless wandering among unsuitable shapes to quickly reach their functional shapes in living organisms.

Some of the research into how proteins fold has been funded by the Department of Energy because of its relevance to bioenergy production and environmental protection.^[^DoE^] The following sample of reports about this research from the last several years describes some of the experiments and mathematical analyses that researchers have used to determine proteins’ shapes and what causes proteins to fold into them. Analysis methods developed in some of the work have been tested in CASP evaluations.


Investigations Focused on Direct Experiments

Although the details of molecular structure are too small to see with the unaided eye, we have several techniques to supplement our eyes, each one able to provide some information that others can’t. One set of techniques is based on how objects affect waves.

Any wave involves back-and-forth oscillations of some kind that spread out from a source whose motion generates the wave. Water waves are generated by objects disturbing the water, sound by objects whose vibrations periodically compress the air or other surrounding medium to make pressure waves, and light by the motion of electrically charged particles as they accelerate within an atom, piece of metal, or empty space. The wave spreads out because its own back-and-forth oscillations cause similar oscillations in their immediate environment that have the same frequency but lag behind them; those oscillations set up similar oscillations in their environment, and so on, so the wave spreads. Each point in a wave is thus, in effect, a wave source itself, so the progress of a wave through any point in space at a particular instant can be worked out by adding up the effects of all the wave’s oscillations that occur right around that point just before that instant. Oscillations at neighboring points, when they’re in step with each other, will mutually reinforce to cause a strong disturbance in their environment a moment later, but if the oscillations are out of step, they’ll interfere with each other and cause little or no disturbance nearby.

The result of all this, when the wave is in a fairly uniform medium, is that the wave propagates along with little change in its form as it moves forward. But when the wave meets a new object in its path that it can interact with, the oscillations that the wave sets up in the object may have a quite different strength, direction, or propagation speed from the ones it sets up in the space around the object, and so the next moment’s oscillations in the immediate neighborhood will differ from what they would have been had the object not been there. Portions of the wave that would have been in step and mutually reinforcing will instead be out of step and at least partly cancel, while other portions that would have mutually interfered may reinforce. The object thus changes the form of the wave. If the object is much smaller than the wave’s own wavelength, the change will also be small, but if the object is about as big as one wavelength, the wave’s form will be more drastically affected.

This phenomenon is the basis of a technique for analyzing the structures of materials with x-rays. X-rays are a form of light (electromagnetic waves) whose wavelengths of roughly 10 picometers^[^Wikipedia^] to 10 nanometers^[^Wikipedia^] are much shorter than the few-hundred-nanometer wavelengths of visible light. Since atoms are no wider than a few tenths of a nanometer, the electromagnetic interactions of atoms, or molecules made of them, can significantly reshape waves of x-ray length. If the atoms or molecules are arranged in a regular lattice of identical small volumes or cells that each contain identical sets of atoms or molecules, thus constituting a crystal, then the way each atom or molecule reshapes an incoming wave is, except for timing, the same at each unit cell of the lattice.^[^Wikipedia^] Thus the multitude of unit cells acts as a set of identical wave sources, some of which oscillate in step with each other, some out of step. The waves coming from the different unit cells will interfere along some directions and reinforce along others, the directions depending on the wavelength and on how the crystal’s atoms are arranged in each unit cell, so the directional pattern of the outgoing x-rays’ intensity can be used to help determine what the atomic arrangement is.
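How identical sources reinforce along some directions and cancel along others can be sketched numerically. The example below is a deliberately simplified picture, not a crystallography method from any of the reports: it assumes a single row of identical point-like unit cells, a wave arriving head-on, and hypothetical values for the spacing and wavelength.

```python
import cmath
import math


def scattered_intensity(n_cells, spacing, wavelength, angle_rad):
    """Relative intensity from a row of identical, equally spaced wave sources.

    Each source re-emits the incoming wave with a timing lag set by its
    extra path length toward the outgoing direction.
    """
    phase_step = 2 * math.pi * spacing * math.sin(angle_rad) / wavelength
    amplitude = sum(cmath.exp(1j * n * phase_step) for n in range(n_cells))
    return abs(amplitude) ** 2


N_CELLS = 50          # assumed number of unit cells in the row
SPACING_NM = 0.5      # assumed unit-cell spacing
WAVELENGTH_NM = 0.15  # assumed x-ray wavelength

# Neighboring cells' path lengths differ by one whole wavelength here (in step) ...
in_step_angle = math.asin(WAVELENGTH_NM / SPACING_NM)
# ... and by half a wavelength here (out of step).
out_of_step_angle = math.asin(0.5 * WAVELENGTH_NM / SPACING_NM)

bright = scattered_intensity(N_CELLS, SPACING_NM, WAVELENGTH_NM, in_step_angle)
dark = scattered_intensity(N_CELLS, SPACING_NM, WAVELENGTH_NM, out_of_step_angle)
print(f"in step: {bright:.1f}   out of step: {dark:.1e}")
```

In the in-step direction the intensity grows as the square of the number of cells, which is why a crystal produces sharp, intense beams where a single molecule would not.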

X-ray beams scattered in this way by a crystal can have significant intensity in many different directions. An isolated molecule, on the other hand, will just spread an x-ray beam slightly. A larger molecule’s wide distribution of electric charges will react to an x-ray beam like a widely distributed set of x-ray sources, with the oscillations they produce in their neighborhoods largely cancelling out in all directions except those within a small angle of the original beam’s direction; the oscillations of the fewer, less widely distributed electric charges of a smaller molecule cancel each other less, so the resulting x-ray beam spreads over a somewhat wider (though still small) angle away from the original beam. The exact spread pattern of the x-ray beam depends on both the size and atomic arrangement of the molecule, so observing this phenomenon also gives clues to a molecule’s structure.
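The link between a molecule's size and the angular spread of the beam can be sketched with the Debye formula, a standard expression for orientation-averaged scattering from a set of identical point scatterers. This is offered as a general illustration rather than as the analysis used in the report discussed here; the cube-shaped toy "molecules", their spacing, and the value of the scattering variable q are all assumptions.

```python
import itertools
import math


def debye_intensity(points, q):
    """Orientation-averaged scattering from identical point scatterers."""
    total = 0.0
    for a in points:
        for b in points:
            x = q * math.dist(a, b)
            total += 1.0 if x == 0 else math.sin(x) / x
    return total


def cube_of_points(per_side, spacing):
    """Toy 'molecule': scatterers on a per_side x per_side x per_side grid."""
    axis = [i * spacing for i in range(per_side)]
    return list(itertools.product(axis, repeat=3))


def relative_intensity(points, q):
    """Intensity at q as a fraction of the straight-ahead (q = 0) intensity."""
    return debye_intensity(points, q) / len(points) ** 2


Q = 1.0  # assumed scattering variable in 1/nm; it grows with the scattering angle
small = cube_of_points(2, 0.5)  # 8 scatterers, 0.5 nm across
large = cube_of_points(4, 0.5)  # 64 scatterers, 1.5 nm across

small_fraction = relative_intensity(small, Q)
large_fraction = relative_intensity(large, Q)
print(f"small: {small_fraction:.2f}   large: {large_fraction:.2f}")
```

At the same off-axis value of q, the larger toy molecule retains a smaller fraction of its straight-ahead intensity, which corresponds to a beam that spreads through a narrower angle.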

While x-ray studies of protein crystals go back many years, the scattering of x-rays through small angles by individual protein molecules has seen increasing use for determining protein structure more recently. The usefulness of the latter method is discussed in a report entitled “Structural analysis of flexible proteins in solution by small angle X-ray scattering combined with crystallography”^[^{SciTech Connect}^] by researchers at Lawrence Berkeley National Laboratory and the Scripps Research Institute. The report presents predictions based on small-angle x-ray scattering of the shapes of two proteins that cells use in repairing their DNA^[^Wikipedia^,^Wikipedia^]. Combining these results with published data, the report shows how small-angle x-ray scattering “combined with high resolution crystal structures efficiently establishes architectures, assemblies, conformations, and unstructured regions for proteins and protein complexes in solution.”

Not all uses of waves involve the kind of scattering effects discussed above. Molecules can not only scatter light waves, but can absorb waves if the waves’ frequencies and polarizations^[^Wikipedia^] are appropriate. This is the basis of a technique advanced and reported by researchers at Lawrence Livermore National Laboratory and their collaborators, which goes beyond the determination of stable protein structures toward solving the problem described in our introduction: how strings of amino acid residues fold up into the shape that the proteins require to function.

The introduction of one of the reports, by researchers at Lawrence Livermore National Laboratory and Stanford University (“Microsecond Microfluidic Mixing for Investigation of Protein Folding Kinetics”^[^{SciTech Connect}^]), discusses some limitations of existing techniques for observing protein folding:

“Although important structural events in protein folding are known to occur on the submillisecond time scale, the investigation of fast protein folding kinetics has been limited by the time required to mix protein and denaturant solutions. Experiments have demonstrated protein folding with mixing times of ~50 μs^[^Wikipedia^] but with high sample consumption.”

The report goes on to describe a microfluidic device that’s designed to overcome these problems by reducing the mixing times to a few microseconds and using only small samples. The device, illustrated in Figure 2, allows measurements of protein folding with visible and ultraviolet light. The mixer takes a solution of protein and a denaturant—a substance that causes the protein to unfold—and focuses the solution to a stream less than a micrometer wide. As the denaturant diffuses out of the stream, the remaining protein molecules are able to refold.

How the folding progresses as the protein molecules move downstream is shown by the molecules’ effect on visible or ultraviolet light or other electromagnetic waves. As the molecules’ folding changes their shape, the shape change affects the ways in which the molecules’ negatively charged electrons and positive nuclei are able to vibrate. This in turn changes how strongly the charged particles absorb light waves of different frequencies or polarizations, so observing the changes in the proteins’ effect on the light that shines on them provides information on how the proteins are folding and how that folding is progressing. Later reports^[^{SciTech Connect}^;^{SciTech Connect}^] from Livermore, and collaborators at the University of California, Davis, the University of Potsdam, and the University of Zurich, describe further work on other microfluidic mixer designs for observing with different optical techniques exactly when proteins’ primary structures (their residue sequences) fold into their secondary structures (the coils and sheets found in various proteins).

Figure 2. Left: Schematic of diffusive mixing in a microfluidic device; a protein-denaturant solution enters from the top middle channel, is hydrodynamically focused into a thin stream, and separates as the denaturant diffuses out, allowing the remaining protein to fold.

Right: Nozzle area of the mixer before its glass cover slip is bonded on top. The channels are 8.6 micrometers deep, nozzles are 2.5 micrometers wide. The etching process that forms the channels leaves their walls scalloped. (From “Microsecond Microfluidic Mixing for Investigation of Protein Folding Kinetics”^[^{SciTech Connect}^], pp. 6-7 of 10.)

A different set of experiments and analyses by researchers at the University of Houston is briefly described in the report “Neutron Compton Scattering as a Probe of Hydrogen Bonded (and other) Systems”^[^{SciTech Connect}^]. (The title indicates that neutrons were used instead of electromagnetic waves to investigate the molecules examined.) The phenomenon probed, hydrogen bonds, is one that appears frequently in protein structures and results from a particular difference in the way positive and negative particles are distributed in some molecules. While the positive charge of an atom’s nucleus keeps most of its negatively charged electrons near that nucleus, the few outermost electrons interact significantly with the outer electrons of any other atoms nearby. When atoms whose charged particles have certain distributions come near each other, the electrical forces among them cause a redistribution that results in the atoms sharing pairs of their outermost electrons with each other to form covalent bonds^[^Wikipedia^]. If one of the atoms is hydrogen, its shared electron pair may lie largely away from its one-proton nucleus, thus leaving the proton’s positive charge more exposed to electrons in other atoms. These protons, and nearby atoms that have a higher concentration of negative electrons, may weakly bind together because of their opposite charges’ mutual electrostatic attraction. Though these hydrogen bonds are not as strong as covalent bonds, they can be strong enough to keep atoms together.

Hydrogen bonds that occur between atoms in a large, flexible molecule like a protein are thought to help the molecule stay in a particular shape if other forces on it aren’t too strong (though some evidence that other important factors exist is presented by one of the reports discussed below). In particular, the secondary structures known as alpha helices and beta sheets, illustrated in Figure 1, have hydrogen bonds between amino acid residues that aren’t directly connected by peptide bonds but become neighbors because of folds in the primary sequence. The experiments by the Houston group that related most directly to protein folding measured mixtures of lysozyme^[^Wikipedia^] in water at different temperatures and concentrations to determine how the momenta of protons varied in it. These experiments indicated that the momentum distribution depended strongly on temperature. The state of the proton in a hydrogen bond, largely described by its momentum distribution, is related to how strong the bond is, so this finding indicates something about how the strength of the protein’s hydrogen bonds changes with temperature.

Other work described in the report revealed some other facts of significance for protein biology. The Houston group’s experiments showed that water confined to spaces about 2 nanometers wide, unlike bulk water, does not behave as a collection of molecules that interact only through electrostatic attractions and repulsions between the water molecules’ more positive hydrogen and more negative oxygen regions. They note that this has major implications for biology, where most of the water in a living cell—the environment of the cell’s protein and other molecules—is confined to spaces of a few nanometers. The group’s theoretical analysis of experimental data also demonstrates that bulk water itself cannot be adequately described as a collection of self-contained, electrostatically interacting molecules, but as a network of hydrogen bonds, with electrons not just confined to particular molecules but distributed throughout the network. Such a situation for electrons in bulk water would resemble that of conduction electrons in a metal, which move among all the atoms in a metal instead of being bound to particular atoms.


Investigations Focused on Computation

Many other reports describe various computational methods to study protein folding which, by comparison of their results with experiment, have provided clues about what features of proteins and their environment have the greatest effect on how they fold. Particular approximate descriptions of the protein molecules and the forces that affect their shapes, when the depictions of protein folding that they imply are accurate, may be highlighting significant characteristics of the forces and proteins. Some computational methods automate the entire deduction of a folding process or its result, while others rely on human experts’ involvement during the calculation. The following sample of eleven reports published since the turn of the century describes specific approaches taken and insights gained from them.

The paper “New local potential useful for genome annotation and 3D modeling”^[^{SciTech Connect}^], published in the Journal of Molecular Biology by researchers at the University of California, San Francisco and Lawrence Berkeley National Laboratory, described a way to augment existing methods of using proteins’ amino acid residue sequences to calculate their shapes, by adding data about how frequently each type of residue is found to form each possible angle with the next residue in the sequence. These statistics indirectly account for the forces that act on all the protein’s constituent particles by treating those forces as acting on each amino acid residue as a unit and affecting its orientation relative to its neighbors. And of course, accounting for all the individual twists and turns’ angles along the residue sequence will yield the protein molecule’s overall shape.
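One way such frequency data could be turned into a score is sketched below. The conversion used, taking the negative logarithm of each observed frequency so that common angles score low and rare angles score high, is a common convention for statistics-based potentials and is an assumption here, not necessarily the formula used in the paper. The bin width, the function names, and the toy observations are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict

BIN_WIDTH_DEG = 30  # assumed angular resolution


def angle_bin(angle_deg):
    """Index of the angular bin that an angle (in degrees) falls into."""
    return int(angle_deg % 360 // BIN_WIDTH_DEG)


def build_local_potential(observations):
    """Turn (residue_type, angle_deg) observations into per-residue scores.

    Returns potential[residue_type][bin] = -ln(fraction of that residue
    type's observations that fall in the bin).
    """
    counts = defaultdict(Counter)
    for residue, angle in observations:
        counts[residue][angle_bin(angle)] += 1
    potential = {}
    for residue, bins in counts.items():
        total = sum(bins.values())
        potential[residue] = {b: -math.log(n / total) for b, n in bins.items()}
    return potential


# Hypothetical observations, standing in for statistics from known structures.
toy_observations = [("GLY", 60), ("GLY", 65), ("GLY", 70), ("GLY", 200), ("PRO", 300)]
potential = build_local_potential(toy_observations)
print(potential["GLY"])
```

A candidate shape for a new sequence could then be scored by summing these values along the chain, with lower totals marking shapes built from commonly observed angles.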

This method also takes advantage of another fact: that structures seen in real proteins tend to be the stable forms for many different residue sequences, not just one or a few. The algorithm comes up with candidate shapes for new residue sequences by checking their similarity with other sequences that have known native structures. This kind of similarity, which is common among the residue sequences of real proteins but not of arbitrary residue sequences, is a point discussed in some detail in the 2003 Iowa State University dissertation of Haibo Cao, “Protein Structure Recognition: From Eigenvector Analysis to Structural Threading Method”^[^{SciTech Connect}^]. Like “New local potential useful for genome annotation and 3D modeling”, Cao’s dissertation accounts for the forces affecting the amino acid residues, but in some respects accounts more directly for the residues’ interactions with each other and with water and other molecules in the surrounding medium.

The statistics examined in the dissertation aren’t about how often each type of residue makes a given angle with its successor in line, but how often each type of residue winds up in contact with residues of the same type and every other type further along the sequence when a fold in the protein brings them together. Physical characteristics of such residue pairs that appear frequently suggest that the way sequences fold is largely affected by the strength of different residue pairs’ mutual repulsion: pairs that repel each other more strongly tend to keep folds from bringing them together. Some amino acids tend to repel water; these “hydrophobic” acids also tend to repel other amino acid molecules, though less strongly when the other molecules are also hydrophobic. If only the forces accounted for in the model are considered, the structures that would be the most stable ones for a given residue sequence turn out to be quite similar to the structures of real proteins having the same sequence.

While the mathematical model presented in this dissertation doesn’t take explicit account of hydrogen bonds between amino acid residues, secondary helical and sheet structures like the alpha helices and beta sheets found in many proteins (cf. Figure 1) turn out to appear in “an unexpected abundance”. The versions in the model generally have more ordered structures compared with random configurations, though with the model’s geometric simplifications the model versions look like distortions of actual protein helices and sheets. (In the model, three-dimensional space is represented in a simplified way by a lattice of discrete volumes, each of which is either empty or occupied by a molecule or some part of a protein, so different sequences’ structures in the model can actually be identical rather than merely similar. See Figure 3.) The relative proportions of helices and sheets with given residue-sequence lengths also turn out to be similar for real and model structures. The dissertation suggests that the “hydrophobic collapse” leading to real proteins’ compact forms also helps build these secondary structures, and says that “[d]ifferent interactions might help the proteins form the native structure cooperatively, even though how this is achieved by nature is still unclear” (p. 31).
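A lattice model of this general kind can be sketched in a few lines. The version below is a minimal illustration and not the dissertation's model: it assumes a two-dimensional square lattice rather than a three-dimensional one, labels each residue only as hydrophobic (H) or polar (P), and uses made-up contact energies in which contacts between two hydrophobic residues are the most favorable, as is usual in hydrophobic-polar lattice models.

```python
# Illustrative contact energies (assumed values): more negative = more favorable.
CONTACT_ENERGY = {
    ("H", "H"): -2.0,
    ("H", "P"): -1.0,
    ("P", "H"): -1.0,
    ("P", "P"): 0.0,
}


def lattice_energy(sequence, path):
    """Energy of a chain laid out on a 2D square lattice.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    path: list of (x, y) lattice sites, one per residue, in chain order.
    """
    if len(sequence) != len(path) or len(set(path)) != len(path):
        raise ValueError("need one distinct lattice site per residue")
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")
    site_to_index = {site: i for i, site in enumerate(path)}
    energy = 0.0
    for i, (x, y) in enumerate(path):
        # Looking only right and up counts each neighboring pair once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = site_to_index.get(neighbor)
            # Skip pairs that are neighbors only because they are bonded.
            if j is not None and abs(i - j) > 1:
                energy += CONTACT_ENERGY[(sequence[i], sequence[j])]
    return energy


folded = lattice_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)])
extended = lattice_energy("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)])
print(f"folded: {folded}   extended: {extended}")
```

Because shapes are restricted to lattice sites, two different sequences can share exactly the same lowest-energy path, which is what allows structures in such models to be identical rather than merely similar.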

As mentioned above, the dissertation also points out that only a small fraction of all possible amino acid residue sequences are found in actual proteins, and that most humanly designed random residue sequences don’t fold into unique structures as proteins tend to. The mathematical model examined in the dissertation also suggests that real proteins’ structures are not just uniquely stable ones, but that the stable structures are very similar for many different residue sequences, not just one or a few. Furthermore, in the mathematical model, many proteinlike structures are found to contain secondary structures that are absent in the similarly compact structures of random sequences, and are found to have many more parallel-running lines folded in a regular way than average random structures do. These findings from the model suggest that natural proteins are thermodynamically more stable than random compact sequences of amino acid residues, meaning that collisions with neighboring molecules are unlikely to be hard enough to distort a real protein’s stable native shape, while such collisions could easily knock most random amino-acid residue sequences out of their less stable shapes into other, almost equally unstable ones. Also, the close secondary-structure resemblances of many different real-protein residue sequences suggest that real proteins’ functions would in many circumstances be only slightly impaired if something caused slight alterations in their compositions. Such differences between the shape-stability of real proteins and random residue sequences would aid organisms’ survival.

Figure 3. One protein as represented in mathematical models that roughly approximate three-dimensional space as a lattice of discrete volumes, each of which is either empty or occupied by a molecule or some part of a protein, as some reports discussed, e.g., “Protein Structure Recognition: From Eigenvector Analysis to Structural Threading Method”^[^{SciTech Connect}^] and “Protein-Folding Landscapes in Multi-Chain Systems”^[^{SciTech Connect}^].
(From “Protein-Folding Landscapes in Multi-Chain Systems”, p. 20.)

The stability of a structure in the face of forces that would distort its shape, whether the structure is that of a protein molecule or any other object, can be described in terms of how the object’s energy would change if a force acted to change the shape. The more stable a shape is, the harder a force has to work (the more energy its application adds to the object) to change the shape by a given amount. If an object of given shape has less energy than it would have for any slightly different shape, the shape is at least somewhat stable. But if there are other more stable shapes, then a big enough force could temporarily add enough energy to overcome the object’s initial resistance to distortion, after which the object would release the added energy, and more besides, as it reconfigured into a more stable shape. If an object is already in its single most stable shape, at least some of the energy added to change the shape would not be released as the new shape was reached. An object’s most stable shape is thus the shape at which it has the least possible energy; finding this shape or energy is an optimization problem.

Some challenges in solving this problem by a suitable numerical algorithm are described in a slide set entitled “Protein-folding via divide-and-conquer optimization”^[^{SciTech Connect}^], a Lawrence Berkeley Lab presentation for a 2004 conference of the Society for Industrial and Applied Mathematics. The mathematical function used in the algorithm to describe how the protein’s energy depends on its shape is only a model, which may or may not be sufficiently accurate. Even if the energy function is sufficiently accurate, the energy may have many local minima corresponding to the protein’s less stable shapes, whose presence can slow a computer’s progress in finding the protein’s most stable shape for which its energy is the absolute minimum. And the problem involves a large number of variables (on the order of a thousand to a million), which by itself also means a lot of work for the computer.
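The difficulty posed by local minima can be seen even with a single variable. The sketch below uses an assumed toy energy function with two valleys, not a protein energy model, and a plain downhill search; which valley the search ends in depends entirely on where it starts.

```python
def energy(x):
    """Toy one-variable 'energy landscape' with two valleys (an assumed form)."""
    return x**4 - 2 * x**2 + 0.5 * x


def slope(x):
    """Derivative of the toy energy with respect to x."""
    return 4 * x**3 - 4 * x + 0.5


def slide_downhill(x, rate=0.01, steps=5000):
    """Plain gradient descent: always move toward lower energy."""
    for _ in range(steps):
        x -= rate * slope(x)
    return x


from_right = slide_downhill(1.5)   # settles in the shallower valley
from_left = slide_downhill(-1.5)   # settles in the deeper valley
print(f"start at +1.5: x = {from_right:.3f}, energy = {energy(from_right):.3f}")
print(f"start at -1.5: x = {from_left:.3f}, energy = {energy(from_left):.3f}")
```

A search that only ever moves downhill has no way to leave the shallower valley, so with a thousand or more variables and many such valleys, an algorithm needs some strategy for exploring beyond the first minimum it finds.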

The investigators’ approach to these challenges was to divide the amino acid residue sequence into clusters and explore the protein shapes in cycles. In each cycle the lowest-energy shapes for each cluster were first found in parallel under the assumption that other atoms in other clusters weren’t changed, and then starting with the whole-protein shape that these individual cluster shapes implied, the lowest-energy whole-protein shape close to that one was found. Further cycles would start with that shape, divide the protein into clusters again, find the lowest-energy cluster shapes closest to those, and then take the implied whole-protein shape of that result and find the lowest-energy whole-protein shape close to it. The cycles were repeated until a cycle was reached with no further significant shape change. A rationale for these parallel cluster-whole protein cycles was that nonparallel algorithms with similar purposes didn’t show great differences in atomic arrangement between trial configurations, and that the differences shown appeared in small clusters. In each cluster-whole analysis cycle, the few “whole-protein” cycle portions seemed to keep the parallel cluster optimization in line with its nonparallel-algorithm equivalent. The cluster-whole cycle of searching for the minimum-energy protein shape, by taking less time per iteration than nonparallel algorithms, was found to significantly reduce computing time for some protein configurations, even though the energy differences of successive trial configurations are less than those found for nonparallel algorithms.
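The cluster-then-whole cycle described above can be sketched on a toy problem. This is an illustration of the general idea only, not the presenters' algorithm: the "energy" is an assumed quadratic function of a short list of variables, the clusters are fixed contiguous blocks, plain gradient descent stands in for whatever local optimizer was actually used, and all names and parameter values are hypothetical.

```python
def energy_gradient(x, targets, coupling):
    """Gradient of a toy energy:
    sum (x_i - t_i)^2 + coupling * sum (x_i - x_{i+1})^2."""
    grad = [2.0 * (xi - ti) for xi, ti in zip(x, targets)]
    for i in range(len(x) - 1):
        diff = 2.0 * coupling * (x[i] - x[i + 1])
        grad[i] += diff
        grad[i + 1] -= diff
    return grad


def descend(x, targets, coupling, free, steps, rate=0.1):
    """Gradient descent that moves only the variables whose indices are in `free`."""
    x = list(x)
    for _ in range(steps):
        grad = energy_gradient(x, targets, coupling)
        for i in free:
            x[i] -= rate * grad[i]
    return x


def cluster_whole_cycles(x, targets, coupling, cluster_size, tol=1e-8, max_cycles=500):
    """Alternate independent cluster relaxations with short whole-chain refinements."""
    n = len(x)
    clusters = [range(s, min(s + cluster_size, n)) for s in range(0, n, cluster_size)]
    for cycle in range(1, max_cycles + 1):
        start = list(x)
        # Cluster phase: each cluster is relaxed independently (so this loop could
        # run in parallel), with all other clusters frozen at their cycle-start values.
        relaxed = list(start)
        for cluster in clusters:
            result = descend(start, targets, coupling, cluster, steps=20)
            for i in cluster:
                relaxed[i] = result[i]
        # Whole-chain phase: a few steps with every variable free to move.
        x = descend(relaxed, targets, coupling, range(n), steps=5)
        # Stop once a full cycle no longer changes the configuration significantly.
        if max(abs(a - b) for a, b in zip(x, start)) < tol:
            return x, cycle
    return x, max_cycles


TARGETS = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # hypothetical preferred values
COUPLING = 0.5                                      # assumed neighbor coupling
final, cycles = cluster_whole_cycles([0.0] * 8, TARGETS, COUPLING, cluster_size=3)
residual = max(abs(g) for g in energy_gradient(final, TARGETS, COUPLING))
print(f"stopped after {cycles} cycles; largest remaining gradient {residual:.1e}")
```

The whole-chain phase is what reconciles the clusters with one another: each cluster was relaxed against neighbors that have since moved, and the short joint refinement corrects for that before the next cycle begins.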

The introduction to another report (“Interactive Protein Manipulation”^[^{SciTech Connect}^]), which researchers at UC Davis and Lawrence Berkeley Lab presented at a 2003 conference of the Institute of Electrical and Electronics Engineers, describes a different approach that accelerates the optimization, reducing the protein-fold determination time from days to hours. Their approach, as described in this report and others (“Manipulating and Visualizing Proteins”^[^{SciTech Connect}^], “ProteinShop: A tool for interactive protein manipulation and steering”^[^{SciTech Connect}^]), is to have a computer use data about amino acid residues and their interactions to make initial inferences about the shape of a residue sequence and present them to human experts in the field in an interactive visual format. The human experts’ refinement of the computer’s inferences informs the computer’s next iterations, so that the final prediction of the protein shape is deduced from what the computer and human expert each do more effectively than the other. The reports describe different iterations of the software + user system they devised for this purpose, explaining the details of the system’s division of labor and the rationale behind it.

The final report by researchers with Oak Ridge National Laboratory and the University of Georgia for a project entitled “Structure and Function of Microbial Metal-Reduction Proteins”^[^{SciTech Connect}^] summarizes their accomplishments in developing two software-based techniques for predicting protein structure. The first is a computer program (PROSPECT) based on “threading”^[^Wikipedia^] (i.e., placing or aligning) a protein’s amino acid residues to known template structures to find the template that fits best. In the fourth CASP experiment, PROSPECT finished sixth out of 123 entries in protein-fold recognition, and first among those that used a threading technique; in the fifth CASP conducted two years later, it finished fifth among 150 entries and received favorable reviews for “offering secondary structure predictions, 3D predictions by threading [an amino-acid residue] sequence through candidate structures, and a 3D prediction pipeline for using a comprehensive set of tools and more submitter-provided information than sequence alone”, its particularly clear user interface and its presentation of output. The group’s second technique was a “pipeline” that integrates multiple prediction programs and was made available to the public and used in bioinformatics courses at the University of Georgia and the University of Tennessee at Knoxville. Fourteen published papers related to the project are listed at the end of the report.

While the algorithms listed thus far account in some way for how the shape of a protein is affected by interatomic forces, none explicitly addresses the fact that proteins, like many molecules, naturally vibrate.^[^Wikipedia^] Those algorithms don’t take into account the fact that a molecule in a given shape isn’t stationary, or changing its shape so slowly and directly from one configuration to another that its kinetic energy is insignificant. Indeed, a stationary protein whose shape would be stable against small perturbing forces, but could be changed readily by larger forces to reach its most stable possible shape, could make the same transition if it were vibrating instead of stationary. A protein vibrating vigorously enough would already have the energy to change its shape, in kinetic form,^[^Wikipedia^] that a stationary protein would have to get from forces applied to it. These facts are not only accounted for but put to use in the algorithm described in “Fast computational methods for predicting protein structure from primary amino acid sequence”^[^DOepatents^], a patent assigned to Oak Ridge National Lab’s management contractor UT-Battelle, LLC. This algorithm calculates shape changes that a protein molecule’s vibrations make possible to determine how the molecule folds into its native shape.
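The energy argument in this paragraph can be written as a pair of conditions. This is a simplified one-dimensional picture assumed here for illustration, not the formulation used in the patent; the symbols are defined in the comments.

```latex
% Assumed one-dimensional picture (requires amsmath):
%   U_local   : potential energy of the protein's current, locally stable shape
%   U_barrier : potential energy at the top of the barrier leading to the more stable shape
%   W         : work done on a stationary protein by forces applied to it
%   K         : kinetic energy of vibration along the coordinate that crosses the barrier
\begin{align}
  \text{stationary protein:} \quad & W \ge U_{\text{barrier}} - U_{\text{local}} \\
  \text{vibrating protein:}  \quad & K \ge U_{\text{barrier}} - U_{\text{local}}
\end{align}
```

Either way the same energy gap has to be supplied; the difference is whether it comes from outside forces or from motion the molecule already has.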

As far as the analysis of single amino-acid residue chains has taken us toward understanding the forces that guide protein folding so efficiently, such analyses only advance our understanding of the process up to a point. Proteins in cells don’t fold up in isolation, but interact with other proteins that affect their folding patterns. Also, some proteins consist of multiple residue chains instead of just one. Such analytical complications are needed to properly account for certain forces that work against a protein’s folding into its native state, to the point of proteins’ not achieving their function or even becoming toxic due to misfolding and aggregation. A team of researchers at the University of California at Berkeley, Virginia Commonwealth University, and Lawrence Berkeley National Laboratory has analyzed such situations using a mathematical model of the same “lattice-volume” type as the one used in Haibo Cao’s Iowa State University dissertation. Their model differs from Cao’s in two ways: it distinguishes amino-acid residue interactions by exactly which two amino acid residues constitute the pair instead of just whether or not the residues involved are hydrophobic, and it analyzes interactions involving pairs and quadruples of residue chains as well as isolated chains. If their model, described in the report “Protein-Folding Landscapes in Multi-Chain Systems”^[^{SciTech Connect}^], is accurate, then the higher a protein’s chain concentration, the lower the temperature at which the protein folds into a compact shape and sharply decreases the number of its interprotein contacts. Also, the presence of other chains for a protein to interact with reduces the bias of interatomic forces toward getting each protein’s residue chain into its native shape, significantly extending the time proteins spend in misfolded states.
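The difference between a hydrophobic-only lattice energy and a residue-pair-specific one can be sketched as follows. This is a minimal illustration, assuming a two-dimensional square lattice, a single chain, and made-up interaction strengths; the hydrophobic residue set and the contents of the pair table are assumptions, not values from the report.

```python
# Sketch of two lattice contact energies for one chain on a 2D square lattice.
# The lattice dimension, residue classes, and energy values are assumptions.

def contacts(coords):
    """Return pairs (i, j) of residues on adjacent lattice sites that are not chain neighbors."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip i+1: bonded neighbors are not "contacts"
            dx = abs(coords[i][0] - coords[j][0])
            dy = abs(coords[i][1] - coords[j][1])
            if dx + dy == 1:
                pairs.append((i, j))
    return pairs

def hp_energy(sequence, coords, hydrophobic="AVLIMFWC"):
    """Hydrophobic-only energy: -1 for each contact between two hydrophobic residues."""
    return -sum(
        1 for i, j in contacts(coords)
        if sequence[i] in hydrophobic and sequence[j] in hydrophobic
    )

def pair_energy(sequence, coords, table):
    """Pair-specific energy: look up each contact by exactly which two residues touch."""
    return sum(
        table.get(frozenset((sequence[i], sequence[j])), 0.0)
        for i, j in contacts(coords)
    )

# A four-residue chain folded around a unit square; residues 0 and 3 are in contact.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# Hypothetical pair table keyed by unordered residue pairs.
table = {frozenset("L"): -2.0, frozenset("KE"): -1.5}

print(hp_energy("KLLE", square), pair_energy("KLLE", square, table))
```

For the chain KLLE the only contact is between K and E, which the hydrophobic-only energy ignores but the pair table rewards, so the two models can rank the same fold differently.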

The model, by not including the presence of a type of molecule known to counteract this latter effect, shows the importance of these molecules and the desirability of determining their composition and structure to gain insight into what makes their counteraction happen. The model also predicts something not found in any experiments that its authors were aware of: that, since interprotein interactions leading to misfolding require the proteins to have more energy than they’d have if they were dissociated and folded into their native state, proteins’ aggregation would be due to there being more possible ways to aggregate than not to. The authors speculated that their model may underestimate the effect of hydrogen bonds, which are known to be important in aggregate formation; if so, this together with the single-chain modeling described in Cao’s dissertation suggests that hydrogen bonds are more important for protein molecules’ shapes when the molecules interact than when they are isolated from each other. The authors also speculated that modeling interactions among more than four chains would show a transition from disordered aggregates to ordered ones, as is found in some more detailed mathematical models.
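The prediction that aggregation could be driven by the number of available arrangements rather than by energy can be illustrated with the standard free-energy relation F = E - T S, with S = k ln(number of states). The sketch below is a textbook simplification, not the authors' model; it uses reduced units (k = 1), treats all arrangements within a state as equally likely, and the energies and state counts are made up.

```python
import math

def free_energy(energy, n_states, temperature, k_b=1.0):
    """Helmholtz free energy F = E - k*T*ln(n_states), in reduced units."""
    return energy - k_b * temperature * math.log(n_states)

def aggregation_favored(e_native, n_native, e_agg, n_agg, temperature):
    """True when the aggregated state has the lower free energy."""
    return free_energy(e_agg, n_agg, temperature) < free_energy(e_native, n_native, temperature)

# Hypothetical numbers: the aggregate has higher energy (-8 vs. -10)
# but can be formed in many more ways (1000 vs. 1).
print(aggregation_favored(-10.0, 1, -8.0, 1000, temperature=1.0))  # entropy term dominates
print(aggregation_favored(-10.0, 1, -8.0, 1000, temperature=0.1))  # energy term dominates
```

With these numbers the aggregate is favored at the higher temperature even though it costs energy, because the entropy term outweighs the energy gap.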

A study of a particular single protein made of four chains of amino-acid residues is described in “Folding and association of a homotetrameric protein complex in an all-atom Go model”^[^{DOE PAGES Beta}^], a paper published in Physical Review E by researchers at the University of Oklahoma. The protein, BBAT1, is one of the simplest and smallest proteins made of multiple subunits (four in this case). The subunits are identical chains of 21 amino acid residues apiece, each chain folding into a motif with two beta sheets and an alpha helix. The authors note that even for a protein complex as small as this, all-atom simulations of folding and association are a challenge because folding takes a long time on a molecular scale (microseconds to milliseconds) and the mathematical model has to accurately describe the relative energies of a wide range of shapes during folding. One way to simplify the calculation of the final shape is to approximate the interatomic forces with a type of all-atom mathematical model published in 1978 by Kyushu University researchers Yuzo Ueda, Hiroshi Taketomi, and Nobuhiro Go^[^Wiley^]. These models favor residue contacts that appear in a protein’s native shape, but intermediate shapes that actually occur during folding may be suppressed in Go models. The authors compared a Go model of BBAT1 with another model (UNRES, for “UNited RESidue”) that represents the protein itself in a simplified manner instead of its interatomic forces. The two models predict the same folding mechanism for single residue chains, but not for the four-chain BBAT1 protein as a whole. Without experimental evidence, it’s unclear which model (if either) is more correct on this point, but the discrepancy does show that even if a Go model’s prediction of how a single chain folds is correct, its prediction of a multiunit protein’s folding may not be accurate. The authors state that the usefulness of modifying Go models to include non-native interactions that depend on the residue sequence is being studied.
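The defining feature of a Go-type model, that only contacts present in the native shape lower the energy, can be sketched at the level of residue contacts. This is a minimal illustration and not the all-atom model used in the paper; the contact lists and the energy unit epsilon are hypothetical.

```python
def go_energy(current_contacts, native_contacts, epsilon=1.0):
    """Go-type contact energy.

    Each contact that is currently formed AND present in the native shape
    lowers the energy by epsilon. Non-native contacts contribute nothing,
    which is why misfolded intermediates tend to be suppressed.
    """
    formed_native = set(current_contacts) & set(native_contacts)
    return -epsilon * len(formed_native)

# Hypothetical native contact map and a partly folded shape with one non-native contact.
native = {(0, 3), (1, 4)}
partly_folded = {(0, 3), (2, 5)}

print(go_energy(partly_folded, native))
```

The modification mentioned by the authors would amount to adding a sequence-dependent term for contacts such as (2, 5) here, which the plain Go energy ignores.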

Figure 4. Comparison of the actual native structures of twelve proteins, shown in red, with the structures predicted by a new algorithm (TERITFIX) from researchers at the University of Chicago, shown in blue. Numbers below each structure refer to differences in position, expressed in Ångström units (tenths of a nanometer), between the actual and predicted centers of (a) each protein’s largest cluster and (b) entire structure: the top line is for the prediction of the TERITFIX algorithm, and the bottom line (in parentheses) is for the prediction of an earlier program (DESRES) that doesn’t incorporate TERITFIX’s accelerating simplification about how actual protein folding incrementally stabilizes nativelike substructures. Blue highlights mark the proteins for which the cluster center predicted by TERITFIX had a smaller position difference than the one predicted by DESRES. The abbreviation RMSD for position difference stands for “root mean square deviation”^[^Wikipedia^]. (From “Simplified Protein Models: Predicting Folding Pathways and Structure Using Amino Acid Sequences”^[^{DOE PAGES Beta}^].)
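For readers unfamiliar with the root mean square deviation, the calculation can be sketched in a few lines. This minimal version assumes the two structures list the same atoms in the same order and have already been superimposed; published RMSD values normally involve an optimal superposition step that is omitted here, and the coordinates below are made up.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) positions.

    Assumes the structures are already aligned; no superposition is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Two hypothetical two-atom structures, one shifted 3 Ångströms along z.
actual = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(actual, predicted))
```

A smaller RMSD means the predicted positions lie closer on average to the actual ones.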

Finally, the observation that real proteins fold in a way that incrementally stabilizes nativelike substructures inspired the mathematical model described by University of Chicago researchers in the more recent Physical Review Letters publication “Simplified Protein Models: Predicting Folding Pathways and Structure Using Amino Acid Sequences”^[^{DOE PAGES Beta}^]. This observation, incorporated into a computer model named TERITFIX, results in folding simulations of twelve proteins that ran orders of magnitude faster than another model’s simulations of the same proteins that weren’t based on such simplifications and yet produced comparable predictions (see Figure 4). The Chicago group’s model indicated that, contrary to the implications of the other model, propensities toward the eventual native configuration in the unfolded protein don’t necessarily determine the order in which the native structure forms; it suggested instead that such propensities may be enhanced or overridden by the protein’s environment.

References

Wikipedia

Amino acid
Peptide bond formation [figure]
Main protein structure levels [figure]
Picometer
Nanometer
Crystal structure
Proliferating cell nuclear antigen: Expression in the cell during DNA synthesis (a DNA repair molecule)
DNA glycosylase (a DNA repair molecule)
Polarization (waves)
Microsecond
Covalent bond
Lysozyme
Threading (protein sequence)
Root mean square deviation of atomic positions

Research Organizations

Reports Available through OSTI’s SciTech Connect

“Structural analysis of flexible proteins in solution by small angle X-ray scattering combined with crystallography” [Metadata and full text available through OSTI’s SciTech Connect]
“Microsecond Microfluidic Mixing for Investigation of Protein Folding Kinetics” [Metadata and full text available through OSTI’s SciTech Connect]
“Development of a Fast Microfluidic Mixer for Studies of Protein Folding Kinetics—Final Report Cover Page” [Metadata and full text available through OSTI’s SciTech Connect]
“Microfluidic Mixers for the Investigation of Protein Folding Using Synchrotron Radiation Circular Dichroism Spectroscopy” [Metadata and full text available through OSTI’s SciTech Connect]
“Neutron Compton Scattering as a Probe of Hydrogen Bonded (and other) Systems” [Metadata and full text available through OSTI’s SciTech Connect]
“New local potential useful for genome annotation and 3D modeling” [Metadata and full text available through OSTI’s SciTech Connect]
“Protein Structure Recognition: From Eigenvector Analysis to Structural Threading Method” [Metadata and full text available through OSTI’s SciTech Connect]
“Protein-folding via divide-and-conquer optimization” [Metadata and full text available through OSTI’s SciTech Connect]
“Interactive Protein Manipulation” [Metadata and full text available through OSTI’s SciTech Connect]
“Manipulating and Visualizing Proteins” [Metadata and full text available through OSTI’s SciTech Connect]
“ProteinShop: A tool for interactive protein manipulation and steering” [Metadata and full text available through OSTI’s SciTech Connect]
“Structure and Function of Microbial Metal-Reduction Proteins” [Metadata and full text available through OSTI’s SciTech Connect]
“Fast computational methods for predicting protein structure from primary amino acid sequence” [Metadata and full text available through OSTI’s SciTech Connect and DOepatents]
“Protein-Folding Landscapes in Multi-Chain Systems” [Metadata and full text available through OSTI’s SciTech Connect]
“Folding and association of a homotetrameric protein complex in an all-atom Go model” [Metadata and full text available through OSTI’s SciTech Connect and DOE PAGES^Beta]
“Simplified Protein Models: Predicting Folding Pathways and Structure Using Amino Acid Sequences” [Metadata and full text available through OSTI’s SciTech Connect and DOE PAGES^Beta]

Additional References
