Most of us have to teach about protein structure. We know the drill: primary structure being the linear sequence of amino acids, secondary involving the folding of portions of this chain into an alpha helix or beta pleated sheet. Then comes tertiary, where these segments and the sequences linking them are further folded into less regular and more complex forms. Finally, for proteins made up of several chains, there is a further level of structure, the quaternary, describing how these subunits come together to form the full molecule. I can remember learning this more than 40 years ago, so this hierarchy has been around a long time, though the number of examples of each structural level has grown tremendously since then. In the 1960s, only a few proteins had been sequenced and even fewer had been worked out structurally. But the basics were known even “back then.” This makes it even more surprising that there is growing evidence that some proteins just don’t behave well, don’t have a defined structure beyond the primary.

In an article entitled “Breaking the Protein Rules,” Tanguy Chouard (2011) describes research on proteins that seem to be either totally or partially disordered. With a bioinformatics computer program that analyzes protein sequence data, Keith Dunker of Indiana University has found that about 40% of human proteins have some disordered segments of 30 amino acids or more; for about 25%, the entire molecule is disordered. This seems pretty startling – and in fact, many structural biologists don’t believe this analysis. While they admit that small segments of a protein may be at least partially disordered, this is usually only when it is active, changing shape to bind a ligand.

In defense of his research, Dunker argues that his results seem surprising because disordered proteins don’t crystallize, and thus their structures can’t be easily studied. These are the proteins that aren’t tackled by structural biologists. However, improvements in nuclear magnetic resonance (NMR) spectroscopy are making it possible to determine the form of small proteins, even if they are twisting and turning in solution. Such studies suggest that disorder may actually be essential for some proteins’ functioning; a shapeless segment can aid in a signaling protein’s recognition of its protein partner or help a regulatory protein interact with more than one target. This is a far cry from the lock-and-key model of protein binding that we all learned, a metaphor that implies a specific shape fitting into another specific form. In the case of a portion of the gene-regulatory protein CREB, it only takes on its lock-like form when it comes in contact with the “key” to which it binds. There is even a signaling protein called Sic1 that stays disordered even when bound to its receptor. It has six phosphate groups, and each occupies the binding site in turn: think of a writhing snake with different parts of its body making contact with a surface.

It’s still not at all clear how common such phenomena are, but they are fascinating and suggest that perhaps Dunker and his associates are correct, that things are not nearly as ordered in the protein world as many biologists had thought. Obviously, the debate between the disorder advocates and opponents will continue until more hard evidence replaces computer modeling. In all likelihood, both sides will prove to be partially correct. Mentioning this controversy could spice up a presentation on protein structure, which is definitely a topic that could use some spice. Structure on top of structure can seem very daunting to a student who is still trying to make sense of peptide bonds.

Doing It Differently

Another long-held tenet of protein structure is also being questioned: the idea that a protein has a unique structure determined by its amino acid sequence. This concept is usually introduced with the story of Christian Anfinsen heating a solution of ribonuclease until this enzyme unfolded, then allowing it to slowly cool, and finding that the protein folded back into its native state, without any other molecules involved. Admittedly, things haven’t turned out to be quite that simple, and there is a whole class of proteins called “chaperonins” that guide folding and prevent unwanted interactions, as this metaphorical name implies (Ranson et al., 1998). However, not all proteins read the rule book, and there are at least a few that adopt different conformations from the same amino acid sequence. The structural change in these “metamorphic” proteins is reversible, and it’s triggered by environmental factors (Murzin, 2008). For example, there is an RNA polymerase that undergoes a significant shape change during the transition from initiation to elongation. In addition, there is an immune protein, a chemokine, that binds to its receptor in one form and to heparin in another – both are essential to its normal activity. Here is still another layer of complexity that makes the exquisite functioning of living things possible. It suggests that there are probably a lot more interesting variations on the conformational theme yet to be discovered.

And before I leave the theme of protein oddities, I want to mention one more: protein containers that occur naturally in a variety of cellular situations. For example, there are protein shells in Escherichia coli that enclose enzymes involved in ethanolamine metabolism. Four related proteins make up the compartment that isolates the acetaldehyde produced in this metabolic process. Not only could excess aldehyde be toxic to the microbe, it could also move across the cell membrane, leading to the loss of valuable carbon.

It would seem that in a cell as well studied as E. coli there would be nothing surprising left to discover, but that’s obviously not the case. Analysis of the container’s proteins suggests that it is more than a mere vessel holding active molecules. Two of the proteins, EutM and EutS, make up the basic assembly of the structure, and EutK is a nucleic acid binding protein. The last protein, EutL, has two conformations, open and closed, which indicates that it might be involved in molecular transport through pores in the container. As noted in an article on this research, the coverings of these microcompartments seem to function as “dynamic skins” (Kang & Douglas, 2010). But E. coli isn’t the only bacterium with such containers. Bacillus subtilis has a shell involved in riboflavin biosynthesis, and Thermotoga maritima has one that plays a role in a stress response. This is a good example of the fact that finding something new in one organism often triggers successful searches for similar phenomena in others.

Searchers for Structure

While all these anomalies are fascinating, there is still a great deal of work to be done on more conventional proteins, the ones that do have primary, secondary, and tertiary structures and that can be studied with more traditional methods such as x-ray diffraction crystallography. The pace of such work has definitely speeded up over the years, from the time when it took Max Perutz 30 years to work out the structure of hemoglobin, a protein with a quaternary structure (Ferry, 2007). Some biologists estimate that only 1% of human protein structures have been fully resolved. This is despite the fact that structures can now be worked out in a matter of weeks or months, if good crystals for these molecules are available. That can be a big “if.” Ultrapure and highly concentrated preparations of the protein in question are needed to get crystals, and even then it can take an expert with a knack for this work a long time to coax it into crystallizing.

There is also the problem that many proteins are associated with membranes when they are in an active state, and the hydrophobic lipid environment of the membrane is essential to the form of these proteins. In other words, if the protein is removed from the membrane, its shape changes considerably and its activity is lost. Even with the variety of new techniques now available, this is still a major stumbling block in figuring out the entire structures of proteins and protein complexes such as ion channels, which control the movement of ions such as sodium and potassium through membranes. Protein chemists have attempted to work around this problem by crystallizing the portions of these molecules that protrude from the membrane, hoping that when these are known, the linking portion in the membrane can be predicted.


Trying to predict protein structures is a major aspect of the field today. This work has been going on for some time, and it is directly related to Anfinsen’s original research. Since he showed that the sequence contained enough information to lead to correct folding, some protein experts think that they should be able to predict structure from sequence data. They use data on proteins for which sequence and structural information are available, and write programs to correlate the two kinds of data. They began by tackling not whole molecules but portions with well-defined forms such as alpha-helices. Of course, many different sequences fold into this form, but there are also many sequences that don’t, so they were able to slowly improve their models.
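To make the idea of correlating sequence with known structure a little more concrete, here is a minimal sketch in Python. It is not any research group's actual program. The three "training" sequences and their helix labels are invented for illustration, and the simple propensity ratio is just one assumed way of scoring how often each amino acid turns up in a helix.

```python
from collections import Counter

# Hypothetical toy "training set": (sequence, per-residue state).
# H = residue sits in an alpha helix, - = anything else.
# These strings are invented for illustration, not taken from real structures.
TRAINING = [
    ("AEELLKKAGPGS", "HHHHHHHH----"),
    ("GSPAEMLKELGN", "---HHHHHHH--"),
    ("PGNGAELLEKMS", "----HHHHHHH-"),
]

def helix_propensities(pairs):
    """Return P(helix | residue) / P(helix) for every residue type seen."""
    total = Counter()
    in_helix = Counter()
    for seq, states in pairs:
        for aa, state in zip(seq, states):
            total[aa] += 1
            if state == "H":
                in_helix[aa] += 1
    overall = sum(in_helix.values()) / sum(total.values())
    return {aa: (in_helix[aa] / total[aa]) / overall for aa in total}

def mean_propensity(segment, prop):
    """Average propensity of a segment; unseen residues count as neutral (1.0)."""
    return sum(prop.get(aa, 1.0) for aa in segment) / len(segment)

prop = helix_propensities(TRAINING)
helix_like = mean_propensity("AELLKE", prop)  # built from residues seen in helices
loop_like = mean_propensity("GPGSGN", prop)   # built from residues seen outside them
```

A segment scoring above 1.0 is made of residues that showed up in helices more often than average in the toy data, and one scoring below 1.0 is not. Real models are trained on thousands of solved structures and consider far more than single residues.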

It takes massive computer power to make such predictions, and supercomputer time can be difficult to obtain and is expensive as well. That’s why David Baker, a biochemist at the University of Washington, developed a computer program called Rosetta@home, which individuals can download onto their home computers, allowing Baker’s group to use the spare computing power of all these machines, which now number over 100,000 (Callaway, 2007). This project has been joined by a similar one called Folding@home, which focuses more on proteins of interest in human health. It is associated with Stanford University and emphasizes the fact that many human diseases, including Alzheimer’s, cystic fibrosis, and sickle-cell anemia, are due to improperly folded proteins. Both projects claim to have had success in predicting protein structures from sequences, though in a limited way. They don’t get the precise structure, but they significantly narrow the possibilities, which makes it much easier to then decipher x-ray crystallography data. One of the problems with such data is that it’s difficult to interpret unless there are related structures with which the diffraction information can be compared. That is how Perutz finally managed to solve the problem of hemoglobin’s structure. He added heavy atoms (mercury) to the crystal, which slightly changed the x-ray diffraction patterns, so then he had a point of comparison for the protein in its native state with iron bound to it.

Working Slowly

While some researchers are using the rather democratic approach of Rosetta@home to do structural work, others are employing less communal methods. For example, Lizzie Buchen (2011) describes the rather solitary 20-year struggle that Brian Kobilka has had with a G-protein-coupled receptor (GPCR). G-proteins trigger a great many cellular processes. Their receptors first bind to specific molecules such as hormones, neurotransmitters, and odor molecules. The receptors, in turn, bind specific G-proteins, each of which triggers a cascade of reactions within cells. The GPCRs make up one of the largest families of human proteins and are the targets for over a third of the drugs now available, making these proteins of great interest to structural biologists.

As you might suspect, however, there’s a big problem in studying GPCRs: they are membrane bound. Kobilka has been working on achieving an atomic-scale image of a G-protein bound to its GPCR. He began moving toward this goal in the 1980s as a postdoc studying the GPCR for adrenaline. He developed a way to work out the sequence for the receptor protein, even though only small amounts were available because the protein was locked in the cell membrane. When the sequence was resolved, it turned out to have seven hydrophobic segments, which suggested that the protein wound through the membrane seven times and thus resembled rhodopsin, which is also known to activate a G-protein. This finding hinted that other receptors that turn on G-proteins would also have this form, which has turned out to be the case for the more than 800 members of this family now known in humans.
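Spotting hydrophobic segments in a sequence is the kind of thing a short program can illustrate. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue sliding window and a cutoff of 1.6, which are the conventional choices for membrane-spanning stretches. This is an assumption about how one might do it, not a description of Kobilka's own analysis, and the test sequence is made up.

```python
# Kyte-Doolittle hydropathy values; higher means more hydrophobic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Average hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]

def count_hydrophobic_segments(seq, window=19, threshold=1.6):
    """Count separate runs of windows whose average reaches the threshold."""
    count = 0
    inside = False
    for avg in hydropathy_profile(seq, window):
        if avg >= threshold and not inside:
            count += 1
            inside = True
        elif avg < threshold:
            inside = False
    return count

# Invented sequence: two 22-residue greasy stretches between polar linkers.
linker = "KDESR" * 4
greasy = "LIVALFLIVALFLIVALFLIVA"
toy = linker + greasy + linker + greasy + linker

segments = count_hydrophobic_segments(toy)
```

Run on a real GPCR sequence, a scan like this would be expected to pick out roughly seven such stretches, though windowed averages are crude and neighboring helices can blur together.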

But knowing the sequence was little help in crystallizing the protein or working out its structure. Kobilka considered the project so iffy that he couldn’t give it to a graduate student, and instead plodded away at it himself, while his laboratory worked out the parts of the molecule that were outside the membrane and thus accessible to traditional methods. They also discovered a great deal about how the molecule worked biochemically, information that gave Kobilka clues as to the receptor’s overall form. He finally was able to grow some small crystals, which couldn’t be used with standard x-ray diffraction but were enough for the focused beam of the European Synchrotron Radiation Facility in France. With this machine, Kobilka got data that enabled him to calculate the general form of the molecule, but not at the atomic level. Then, in order to steady the molecule, to keep it from writhing as it does in the membrane, he anchored it to antibodies and got good crystals of this complex. If this sounds like a roundabout way of doing things, it was. However, it shows how ingenious methods are often born of desperation and frustration. The antibody technique worked, and Kobilka got down to the atomic resolution of the receptor minus a loop that stuck out of the membrane. The next chapter of this winding tale was to get the structure of the loop added, and that took binding it to two other molecules to hold it in place for its x-ray picture.

I know this story is complex, but I think it’s worth telling because it shows how both persistence and genius are needed to do the difficult in science. A good mind isn’t enough; there has to be great passion and stubbornness backing it up. Now that Kobilka had the complete structure of the receptor, he wanted to see how it bound not only to its ligand but also to the G-protein it activated. These goals required building more stabilizers for these complexes, but he was ultimately successful in reaching his goal in 2011. Needless to say, he now has new aims, including understanding other G-protein receptors and figuring out the intermediate states involved in binding.

Getting Together

Kobilka’s story suggests a trend in protein structure research: biochemists are now aiming at more than the structure of single proteins; they are attempting to work out how proteins interact with each other and with other molecules. In an article called “Structures of Desire,” Ananyo Bhattacharya (2009) presents a “wish list” of large structures that crystallographers would love to resolve. These are definitely ambitious projects. The first one Bhattacharya describes is the attack on the structure of the eukaryotic ribosome. About 10 years ago, both subunits of the bacterial ribosome were deciphered, and then the structure of the united subunits. The eukaryotic ribosome is a much bigger problem to crack, and not only because it has about 80 proteins instead of the 50 or so in the bacterial form. There are many more regulatory areas, adding greater conformational complexity. For example, many initiation factors bind to the eukaryotic ribosome at the beginning of translation, whereas only a few are involved in bacterial translation. Some of these factors are themselves complex protein assemblies, and when ribosomes are purified from cells, such structures are often attached. In order to get good crystals and consistent results, these add-ons have to be accounted for.

I can’t see going into such details with my students, but knowing about them myself makes me better able to explain why figuring out such structural problems can take years. I think it’s good for students to know that there are such difficulties, which are even greater for complexes larger than ribosomes. As Bhattacharya notes, the spliceosome has at least 150 proteins needed in the job of “editing” messenger RNA before it even gets to the ribosome to be translated into protein. Making this structure still more of a challenge is the fact that it’s really made of five different segments or “machines” with different roles, and they are in constant movement in relation to each other. Think of the kinds of problems that Kobilka faced and then multiply them many times over. One way to approach these difficulties is to literally put a monkey-wrench into the machinery in the form of a small molecule that can stop the spliceosome in the act of doing its job. This is easier said than done and will probably take a lot of trial and error in chemically bonding different molecules to the structure.

Even larger is the nuclear pore complex, which is about 30× heavier than the ribosome. There are only about 200 of these on the surface of one nucleus, which means that getting enough of the complex to crystallize is virtually impossible, especially because it is membrane bound. This means that wresting the pore from the membrane could significantly alter its form. Researchers at Rockefeller University in New York are approaching the problem by crystallizing components of the complex, thus working out portions of this massive structure, which is essentially made up of eight similar structures in a circle, with two of these circles on top of each other. The fact that the pore thus has an eight-fold rotational symmetry and a two-fold mirror symmetry makes this strategy more likely to be successful than if the structure were totally irregular, as a ribosome is. Balancing that is the problem of the proteins that surround the pore’s inner edge. They have many of those unfolded forms that I discussed earlier, and these may need to be tackled in some novel manner that one of the brilliant minds working on this problem still has to invent. However, programs that predict structure from sequence data have made headway in resolving this difficulty (Service, 2008).
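The value of that symmetry can be shown with a few lines of Python. In this sketch, one point stands in for a single solved component, and the other copies are generated by rotating it in 45-degree steps around the pore's axis and mirroring it across the plane between the two rings. The coordinates are invented, and treating the pore as perfectly symmetric is a simplifying assumption.

```python
import math

def symmetry_copies(point, n_fold=8):
    """Apply n-fold rotation about the z axis plus a mirror across z = 0."""
    x, y, z = point
    copies = []
    for k in range(n_fold):
        angle = 2 * math.pi * k / n_fold
        rx = x * math.cos(angle) - y * math.sin(angle)
        ry = x * math.sin(angle) + y * math.cos(angle)
        copies.append((rx, ry, z))    # copy in the upper ring
        copies.append((rx, ry, -z))   # its mirror image in the lower ring
    return copies

# Hypothetical position of one component, in arbitrary units.
copies = symmetry_copies((50.0, 0.0, 10.0))
```

One solved piece yields 16 positions in this idealized picture, which is why working out a single repeating unit goes such a long way.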

Old Structures

While some scientists are working on the edge of molecular visibility, attempting to decipher huge, unknown structures, others are trying to clarify structures that have long been known. As of 2009, the Protein Data Bank (PDB) had almost 53,000 three-dimensional structures of protein molecules and nucleic acids. The structures can be viewed with a variety of software programs that are freely available. One of the easiest and most informative ways to access these structures is to go to a large-scale wiki called Proteopedia, which presents many protein and nucleic acid structures that can be rotated. There is also a great deal of information on the molecules, so the wiki is much more than just a bunch of gee-whiz images.

These images are only as good as the data behind them, and that’s what sometimes needs enhancing or cleaning up. Mistakes happen as the data are transferred from one form into another. The PDB was created in 1971, so some of the data were generated when structural work was much less sophisticated than it is today. Dutch researchers are developing software that analyzes data and corrects at least some of the errors in the PDB (Sanderson, 2009). They have already gone through thousands of data files and produced new structures using the old data. Among other things, the program catches human errors such as mislabeling of data. In two-thirds of the cases, the new structures were better than the originals. This evaluation is based on calculating a quantity called R-free, which is used by crystallographers to measure structural quality. This work could eventually make the PDB even more valuable, but like everything else to do with molecular structure, correcting errors takes time. This is a great example of how scientists try to improve their results, to get a clearer picture of what is going on in nature. Science is not just brilliant discoveries; it also includes this type of painstaking and vital work.
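For readers who want to see what R-free involves, here is a small sketch. The R-factor compares the diffraction amplitudes actually observed with those calculated from the model, and R-free is the same quantity computed only on a small set of reflections deliberately held out of refinement. The numbers below are invented, and nothing here reflects how the Dutch group's software is actually written.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|); lower means better agreement."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def split_r(reflections):
    """Return (R-work, R-free) from (observed, calculated, held_out) triples."""
    work = [(o, c) for o, c, held_out in reflections if not held_out]
    free = [(o, c) for o, c, held_out in reflections if held_out]
    return r_factor(*zip(*work)), r_factor(*zip(*free))

# Invented amplitudes: (observed, calculated from the model, held out of refinement?)
reflections = [
    (100.0, 95.0, False),
    (80.0, 84.0, False),
    (60.0, 57.0, False),
    (40.0, 46.0, True),
    (20.0, 16.0, True),
]

r_work, r_free = split_r(reflections)
```

Because the held-out reflections never influenced the model, R-free is harder to improve by overfitting, which is what makes it a fair yardstick for comparing an old structure with a recalculated one.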

Moving Targets

While some proteins have well-defined structure, and others seem to be floppy if not completely disorganized, all proteins undergo conformational change during binding of ligands. These interactions are often very short-lived; after all, some enzymes can perform their work hundreds of times or more per second. Structural biologists have devised a number of ways to capture such rapid events. Changes in the orientations of protein domains that occur during binding can be determined with small-angle x-ray scattering. NMR spectroscopy can provide averaged parameters to detect conformational dynamics (Bernadó & Blackledge, 2010). When the results of these two techniques are paired, a picture of protein activity is possible – though, because they are based on statistical analysis, the results tend to be “noisy.” Here is a reminder to us, and our students, that every experimental approach has its limits, and better results often have to await new and better equipment or even radically different approaches.

Another strategy is to use all-atom molecular dynamics (MD) simulations, which produce high-resolution views of the motions of macromolecules (Shaw et al., 2010). However, this technique has the opposite problem to most others, which can’t detect rapid change. This modeling involves time scales even shorter than those involved in conformational changes during protein binding. Researchers at Columbia University managed to develop a new machine that simulated processes at the 1-millisecond scale, which is actually a thousand times longer than most MD simulations. Such time frames are almost impossible for the human mind to fathom, much less experience, and they indicate the sophistication of the equipment and expertise involved. It’s not surprising that the journal article describing this work listed 11 names. These researchers were able to capture the dynamics of interconversion between distinct conformational states of bovine pancreatic trypsin inhibitor.
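A bit of arithmetic shows why a millisecond is such a stretch for this kind of modeling. MD programs advance the atoms in tiny time steps; the sketch below assumes a step of 2 femtoseconds, a typical textbook figure rather than one taken from the article, and counts how many steps a simulation of a given length requires.

```python
FS_PER_SECOND = 10**15  # femtoseconds in one second

def md_steps(simulated_fs, timestep_fs=2):
    """Number of integration steps needed to cover a simulated time span."""
    return simulated_fs // timestep_fs

one_microsecond_fs = FS_PER_SECOND // 10**6
one_millisecond_fs = FS_PER_SECOND // 10**3

steps_microsecond = md_steps(one_microsecond_fs)
steps_millisecond = md_steps(one_millisecond_fs)
```

Under that assumption, a microsecond takes 500 million steps and a millisecond takes 500 billion, with the forces on every atom recalculated at each one.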

I can imagine that at this point, you’ve had enough of protein structure. If you have stuck with me this far, then you must love proteins as much as I do. So I’d like to end by suggesting that you take a look at one of my favorite books. Although it is an older autobiography, I think that Arthur Kornberg’s (1989) For the Love of Enzymes is still worth reading. Here is someone who worked on proteins when they were first being purified and their activities studied. He tells a great story of investigating what enzymes do and how they do it. What story could be better than that?


Bernadó, P. & Blackledge, M. (2010). Proteins in dynamic equilibrium. Nature, 468, 1046–1047.
Bhattacharya, A. (2009). Structures of desire. Nature, 459, 24–27.
Buchen, L. (2011). It’s all about the structure. Nature, 476, 387–390.
Callaway, E. (2007). The shape of proteins to come. Nature, 449, 765.
Chouard, T. (2011). Breaking the protein rules. Nature, 471, 151–153.
Ferry, G. (2007). Max Perutz and the Secret of Life. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press.
Kang, S. & Douglas, T. (2010). Some enzymes just need a space of their own. Science, 327, 42–43.
Kornberg, A. (1989). For the Love of Enzymes: The Odyssey of a Biochemist. Cambridge, MA: Harvard University Press.
Murzin, A.G. (2008). Metamorphic proteins. Science, 320, 1725–1726.
Ranson, N.A., White, H.E. & Saibil, H.R. (1998). Chaperonins. Biochemical Journal, 333, 233–242.
Sanderson, K. (2009). New protein structures replace the old. Nature, 459, 1038–1039.
Service, R. (2008). Problem solved* (*sort of). Science, 321, 784–786.
Shaw, D.E., Maragakis, P., Lindorff-Larsen, K., Piana, S., Dror, R.O., Eastwood, M.P., Bank, J.A. & others. (2010). Atomic-level characterization of the structural dynamics of proteins. Science, 330, 341–346.