School of Chemistry, University of Bristol, Cantock’s Close, Bristol BS8 1TS, UK
School of Biochemistry, University of Bristol, Medical Sciences Building, University Walk, Bristol BS8 1TD, UK
BrisEngBio, School of Chemistry, University of Bristol, Cantock’s Close, Bristol BS8 1TS, UK
Max Planck-Bristol Centre for Minimal Biology, University of Bristol, Cantock’s Close, Bristol BS8 1TS, UK
Protein science is being transformed by powerful computational methods for structure prediction and design: AlphaFold2 can predict many natural protein structures from sequence, and other AI methods are enabling the de novo design of new structures. This raises a question: how much do we understand the underlying sequence-to-structure/function relationships being captured by these methods? This perspective presents our current understanding of one class of protein assembly, the α-helical coiled coils. At first sight, these are straightforward: sequence repeats of hydrophobic (h) and polar (p) residues, (hpphppp)n, direct the folding and assembly of amphipathic α helices into bundles. However, many different bundles are possible: they can have two or more helices (different oligomers); the helices can have parallel, antiparallel or mixed arrangements (different topologies); and the helical sequences can be the same (homomers) or different (heteromers). Thus, sequence-to-structure relationships must be present within the hpphppp repeats to distinguish these states. I discuss the current understanding of this problem at three levels: First, physics gives a parametric framework to generate the many possible coiled-coil backbone structures. Second, chemistry provides a means to explore and deliver sequence-to-structure relationships. Third, biology shows how coiled coils are adapted and functionalized in nature, inspiring applications of coiled coils in synthetic biology. I argue that the chemistry is largely understood; the physics is partly solved, though the considerable challenge of predicting even relative stabilities of different coiled-coil states remains; but there is much more to explore in the biology and synthetic biology of coiled coils.
As a graduate student in the late 1980s, I was drawn to the challenge of the protein-folding problem, and to using protein design as a means of testing our understanding of protein structure and folding. Even then, we suspected that solutions to these problems would come through bioinformatics. However, we did not realize how long it would take for solutions to come, or the form that they would take in terms of combining big data, large computer power, and artificial intelligence (AI), specifically, machine learning (ML). Of course, I refer to the recent successes of AlphaFold2 and RoseTTAFold in predicting protein structure from sequence alone (
). Like many, when I look at an AlphaFold2 model for a complex protein assembly I’m awestruck by the structural solutions that it finds. However, whilst AlphaFold2 and RoseTTAFold provide solutions to the protein-folding problem (and, to be sure, these and other methods will improve to provide models for ever-more complex protein structures and assemblies), they are just that: solutions. In themselves, they do not necessarily, at least not at present, provide an understanding of the physics and chemistry that drive and direct protein folding and assembly, which, in turn, are responsible for protein function.
Back in the 1980s and 1990s it was the desire to understand protein folding that drove interest in the field; it was not just to find solutions to the problem per se. In short, we wanted to understand the physico-chemical principles that underpin protein structure, folding, assembly, and stability. In other words, we sought to decipher the underlying sequence-to-structure relationships for these properties. This perspective is about how far we have progressed in understanding protein folding and design in these terms. Well, at least for one particular protein structure—the α-helical coiled coil. At first sight, this is a relatively straightforward peptide and protein assembly in which two or more α-helical chains wrap around each other to form supercoiled or rope-like structures, Fig. 1.
Figure 1. Early coiled-coil structures. A, 1.8 Å atomic-resolution structure of the leucine-zipper peptide from S. cerevisiae (2zta (
)). Coloring schemes: for A and B, the chains are colored by chainbow from their N termini (blue) through to their C termini (red); for C and D, the protomers of each trimer are colored differently, with the central coiled-coil chains in grey, yellow, and cyan. The images were generated using PyMOL (pymol.org).
). I was drawn to the folding and design of leucine zippers as a post-doc with Tom Alber in the early 1990s. This was because of the apparent simplicity of these coiled coils, and the thought that unless we could understand such simple protein folds and assemblies, we would have no hope with more-complex globular proteins. Therefore, I adopted coiled coils as a model for developing an understanding of sequence-to-structure relationships in proteins. An aim of this perspective is to capture some of that journey, which has been contributed to by many scientists in many groups over the past few decades. However, it is not a complete review of coiled-coil structure, biology, or even design. Such an article would be redundant because many excellent reviews are already available (
). Instead, I want to give some sense of three things: first, of the journey and the joy of discovery, which have led to our current understanding of this relatively straightforward protein structure; second, that in some respects—for instance, the physics and chemistry of coiled-coil folding and assembly—our understanding is complete or very near to it; and third, despite this understanding, we still have much to learn, particularly on the biology of coiled coils.
This article is my perspective on our amassed understanding of coiled-coil proteins. Indeed, it closes some of my own research questions; namely, what are the sequence-to-structure relationships that govern coiled-coil folding and assembly, and how do these allow us to design de novo coiled coils with confidence? However, as some research chapters close, others open. For coiled-coil research, new challenges include: achieving fully quantitative (free-energy) predictions for coiled-coil structure, stability, and partner selection; gaining a deeper understanding of coiled-coil dynamics and plasticity and how these relate to coiled-coil function; and, given that coiled coils are among the best understood protein folds, determining how de novo coiled-coil peptides and proteins can be used in biotechnology and synthetic biology to address real-world applications, for instance in health and the environment. I am certain that the methods, principles, and understanding that have been developed over the past few decades will provide a foundation for these and other endeavors. And to come full circle, it is clear that new AI/ML-based methods for protein-structure prediction will contribute here. Indeed, I see these tools as fantastic hypothesis generators for structural molecular and cell biology, including for processes that involve relatively simple, but adaptable and versatile, coiled coils.
2.1. The physics of coiled coils: a firm foundation for developing understanding
For more-complete historical perspectives on the conceptual origins of α-helical coiled coils please see reviews by Squire and Parry and by Lupas and colleagues (
). Moreover, at that time in the early 1950s there was little experimental data or confirmed details for any protein structure. So, what followed—namely, the first description of any protein structure, the concepts of helical nets and knobs-into-holes (KIH) packing, heptad repeats, and what we now refer to as the Crick Equations—was extremely insightful. Incidentally, Crick published this work in the same year that he and James Watson proposed the double helix for the structure of DNA (
)—the α helix has precisely 3.6 residues per turn (Fig. 2A). Crick reasoned that two or more such helices could interact tightly via seams of interacting side chains spaced 3 and 4 residues apart along polypeptide chains (i.e., with an average spacing of 3.5 residues) to match the 3.6 residues per turn as closely as possible. Now, we annotate these repeats abcdefg with the key interacting side chains falling at the a and d sites. Crick visualized this with helical-net diagrams (Fig. 2B). Highlighting the 3,4 spacing on one of these reveals the seam as a line of connected diamond shapes. Two such seams can interlace to bring the helices into intimate contact (Fig. 2C): the diamonds on one helix form ‘holes’ into which side chains from the other helix can slot. In this way, Crick coined the terms ‘heptad repeat’ (from the Greek for seven) and ‘knobs-into-holes’ (KIH) packing for these sequence and structural features, respectively. These are now the hallmarks for coiled-coil peptides and proteins, and they provide a firm basis for understanding them.
Figure 2. α-Helix and coiled-coil geometry. A, The α helix has a rise per residue (r) of 1.5 Å, 3.6 residues per helical turn, a backbone radius of 2.3 Å, and is stabilized by CO(i) to NH(i+4) hydrogen bonds. B, In Crick’s helical nets the positions of the Cα atoms of an α helix are projected as points onto a 2D plot (red). Here, the heptad repeats, abcdefg, are annotated onto the points with the a and d sites emphasized as discs. C, Overlay of the helical net from B and a second net in blue with crosses and discs for the Cα positions. Note how the a and d sites interdigitate, which leads naturally to the helices packing in a slanted rather than straight manner. In turn, this causes the α helices to wrap or supercoil around each other (see panel F). D, Helical-wheel diagrams where heptad repeats, abcdefg, for a parallel, dimeric coiled coil are projected onto circles representing backbones of each helix viewed from one end. The a → g register is rainbow colored, i.e. by the visible spectrum, red → violet. Note that these wheels are idealized with 3.5 residues per turn to make the 7 residues span exactly 2 turns; they are plotted in ‘superhelical space’. This is opposed to the true α helix, which repeats every 18 residues or 5 turns, as depicted in panel E. F&G, Crick’s parameterization of coiled coils illustrating its three main parameters: coiled-coil radius (R), interface (or Crick) angle (I), and pitch (P). These and the 3.6 residues per turn of the α helix are the only parameters needed to define any regular α-helical coiled-coil assembly. The structure shown in panel F was built using CCBuilder2.0 (
The term ‘coiled coil’ is a consequence of these sequence patterns and the structural packing. This is because 3.5 is less than 3.6. As a result, on a helical net the seam slants to the left (Fig. 2B). Thus, for two nets to interlace requires one to be offset at an angle to the other (Fig. 2C). In a three-dimensional α helix, the seam precesses around the surface of the helix with the opposite sense to the handedness of the helix. Thus, for two α helices to maintain contact, they must wrap around each other like the strands of a rope. As α helices made from l-amino acids are right handed, heptad-based coiled coils are left-handed ropes. This overall assembly—which is a quaternary structure and not a tertiary structure—is also helical. Strictly, it is a superhelix. This is why Crick called the envisaged structures coiled coils; “coils” refers to the α helices, and “coiled” refers to the superhelix.
From Pauling’s α-helical parameters (3.6 residues per turn and a rise per residue of 1.5 Å), Crick predicted that the crossing angle between the two helices would be ≈20˚, which gives a superhelical pitch of ≈126 residues or ≈186 Å (Box 1). These values correspond to those of experimentally determined coiled coils (
). This is formalized in the Crick Equations. Thus, the coiled coil is inherently parametric, which makes it amenable to physics: predictable, modellable and, as we will see, ultimately designable (
). Several of these are accessible and easy-to-use computational tools on the internet (Box 2). Some allow coiled-coil backbones to be built quickly and accurately (
). This has made coiled-coil modelling, engineering and design accessible to non-experts, which is a significant advance. In turn, this has enabled exploration of coiled-coil assemblies beyond the confines of natural proteins (
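To make the parametric nature of the coiled coil concrete, here is a minimal sketch of a Crick-style backbone generator. The article does not reproduce the Crick Equations, so the form used below is an assumption: each Cα is placed on a minor helix of radius R1 wound around a superhelical path of radius R0. The superhelical radius of 5 Å and the function names are illustrative only; the 3.6 residues per turn, 1.5 Å rise, 2.3 Å backbone radius, and 126-residue pitch are the values quoted in the text.

```python
import math

# Values quoted in the text.
RISE_PER_RESIDUE = 1.5    # Å, along the α-helix axis
HELIX_RADIUS = 2.3        # Å, α-helix backbone radius (R1)
RESIDUES_PER_TURN = 3.6   # α helix
PITCH_RESIDUES = 126      # superhelical pitch of a heptad-based coiled coil


def crick_ca_trace(n_residues, r0, phi0=0.0, phi1=0.0):
    """Return Cα coordinates (Å) for one helix of an idealized coiled coil.

    r0   : superhelical radius in Å (an assumed input, not given in the text)
    phi0 : phase of the helix axis around the superhelical axis (radians)
    phi1 : phase of the first residue around its own helix axis (radians)
    """
    # Left-handed superhelix: negative angular frequency per residue.
    w0 = -2.0 * math.pi / PITCH_RESIDUES
    # Helical frequency in the superhelical frame, chosen so that
    # w0 + w1 equals the 100° per residue of a 3.6-residue helix.
    w1 = 2.0 * math.pi / RESIDUES_PER_TURN - w0
    # Tilt of the helix axis relative to the superhelical axis.
    alpha = math.asin(r0 * w0 / RISE_PER_RESIDUE)
    coords = []
    for t in range(n_residues):
        a0 = w0 * t + phi0
        a1 = w1 * t + phi1
        x = (r0 * math.cos(a0)
             + HELIX_RADIUS * math.cos(a0) * math.cos(a1)
             - HELIX_RADIUS * math.cos(alpha) * math.sin(a0) * math.sin(a1))
        y = (r0 * math.sin(a0)
             + HELIX_RADIUS * math.sin(a0) * math.cos(a1)
             + HELIX_RADIUS * math.cos(alpha) * math.cos(a0) * math.sin(a1))
        z = (RISE_PER_RESIDUE * math.cos(alpha) * t
             - HELIX_RADIUS * math.sin(alpha) * math.sin(a1))
        coords.append((x, y, z))
    return coords


def parallel_dimer(n_residues, r0=5.0):
    """Two symmetry-related helices, half a turn apart around the superhelix."""
    return [crick_ca_trace(n_residues, r0, phi0=0.0),
            crick_ca_trace(n_residues, r0, phi0=math.pi)]


def helix_tilt_degrees(r0=5.0):
    """Magnitude of the helix tilt implied by the chosen radius and pitch."""
    w0 = -2.0 * math.pi / PITCH_RESIDUES
    return abs(math.degrees(math.asin(r0 * w0 / RISE_PER_RESIDUE)))
```

With the assumed 5 Å radius, `helix_tilt_degrees()` comes out close to the ≈10˚ inclination discussed in Box 1, and in this sketch the frequency in the superhelical frame works out at 3.5 residues per turn, which is the idealization used for 7-residue helical wheels.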
The following exercise can be done by considering projections of the α helix as helical nets or helical wheels.
First, I need to dispel a key misconception made by students and professors alike. Pauling’s α helix has 3.6 residues per turn with a very low tolerance of variation (
), and small deviations from α-helical parameters incur large energy penalties. Indeed, the nearby 3₁₀ and π helices are rare in protein structures and difficult to design (
). As a result, the α helix does not somehow collapse to 3.5 residues per turn in coiled coils. If it did, we would not have coiled coils: the average sequence repeat of 3.5 and a helical repeat of 3.5 would match, and the helices would pack straight and not wrap around each other. I suspect that the convenience of 7-residue helical wheels (Fig. 2D) rather than the more-accurate 18-residue helical wheels (Fig. 2E) has contributed to this misconception. Crick’s helical nets are a more faithful mapping of an α-helical surface in 2D (Fig. 2B).
With the Cα atoms of an α helix projected on an 18-residue helical wheel, successive residues are separated by 100˚. Thus, the 7 residues of a heptad repeat span 700˚. This is 20˚ short of two full turns of the α helix (2 × 360˚ = 720˚, or 7.2 residues). 20˚ goes into 360˚ 18 times. Therefore, 18 heptad repeats are required for the interacting seam to make one complete revolution of a helix and to bring the helices back into sync. This defines the pitch of the coiled coil. Hence, the pitch is 18 × 7 = 126 residues. Given that the α helix has a rise per residue of 1.5 Å, these would span 189 Å of a straight α helix. However, each α helix is inclined by ≈10˚ relative to the superhelical axis. Therefore, the rise per residue along this coiled-coil axis is 1.5 × cos(10˚) = 1.48 Å, and the ideal superhelical pitch is ≈186 Å (Fig. 2F). In addition to the rise per residue and the coiled-coil pitch, just two other parameters are needed to define and generate regular coiled-coil backbones. These are the radius of the coiled coil, and the interface or Crick angle, Fig. 2G.
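The arithmetic of this exercise can be retraced in a few lines. This is only a sketch of the estimate above: the constant and function names are illustrative, and the 10˚ tilt is the approximate value quoted in the text rather than a fitted parameter.

```python
import math

RESIDUES_PER_TURN = 3.6   # α helix
RISE_PER_RESIDUE = 1.5    # Å
HEPTAD = 7                # residues per sequence repeat
HELIX_TILT_DEG = 10.0     # approximate inclination to the superhelical axis


def heptad_pitch():
    """Retrace the Box 1 estimate of the coiled-coil pitch."""
    deg_per_residue = 360.0 / RESIDUES_PER_TURN          # angle between residues
    heptad_span = HEPTAD * deg_per_residue               # angle covered by a heptad
    shortfall = 2 * 360.0 - heptad_span                  # lag behind two full turns
    heptads_per_revolution = 360.0 / shortfall           # heptads for the seam to go round once
    pitch_residues = heptads_per_revolution * HEPTAD     # pitch in residues
    straight_length = pitch_residues * RISE_PER_RESIDUE  # length of a straight helix, Å
    pitch_angstrom = straight_length * math.cos(math.radians(HELIX_TILT_DEG))
    return pitch_residues, straight_length, pitch_angstrom


residues, straight, pitch = heptad_pitch()
print(f"{residues:.0f} residues; {straight:.0f} Å straight; {pitch:.0f} Å along the superhelix")
```

The three returned values correspond to the 126 residues, 189 Å, and ≈186 Å derived in the text.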
Finally for this Box, we should also credit Pauling. Crick considered only 7-residue repeats. However, Pauling and colleagues considered other repeats in α-helical conformations; specifically, those with 11, 15, and 18 residues (
). Along with 7-residue repeats, these are compatible with combinations of 3- and 4-residue spacings; namely, 3-4, 3-4-4, 3-4-4-4, and 3-4-3-4-4, respectively. However, they have different average spacings between the interacting residues of 3.5, 3.67, 3.75, and 3.6, respectively. As the α-helical structural repeat is fixed at 3.6 residues per turn, when these repeats are realized in packed coiled coils they lead to different superhelical twists. These can also be calculated using the considerations laid out above. They result in further left-handed, some right-handed, and even “straight” coiled coils (
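The average spacings quoted for these repeats, and the sense of supercoiling they imply, can be checked with a short sketch. The comparison rule is an assumption that extends the argument made for heptads (an average spacing below 3.6 gives a left-handed supercoil) to the other cases; the function names are illustrative.

```python
from fractions import Fraction

ALPHA_HELIX = Fraction(18, 5)  # 3.6 residues per turn, held as an exact ratio

# Repeat length -> spacings between successive core positions, as listed above.
REPEATS = {
    7: (3, 4),
    11: (3, 4, 4),
    15: (3, 4, 4, 4),
    18: (3, 4, 3, 4, 4),
}


def average_spacing(spacings):
    """Mean number of residues between interacting positions."""
    return Fraction(sum(spacings), len(spacings))


def supercoil_sense(spacings):
    """Compare the sequence spacing with the fixed α-helical repeat."""
    avg = average_spacing(spacings)
    if avg < ALPHA_HELIX:
        return "left-handed"
    if avg > ALPHA_HELIX:
        return "right-handed"
    return "straight"


for length, spacings in REPEATS.items():
    avg = float(average_spacing(spacings))
    print(f"{length}-residue repeat: average {avg:.2f}, {supercoil_sense(spacings)}")
```

Exact fractions are used so that the 18-residue repeat, whose average spacing equals 3.6 exactly, is not misclassified through floating-point rounding.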
Tools and resources for predicting, building, analyzing, and visualizing coiled-coil structures
Prediction
Several tools for predicting coiled-coil structure from sequence are brought together and implemented at Andrei Lupas’s Max Planck Institute Bioinformatics Toolkit (
). Regarding predicting oligomeric states from sequence, in light of recent advances in coiled-coil design and protein-structure prediction generally, there is considerable room for improvement here. However, the following are currently available: LOGICOIL (
) as a backend, which can also be run using Python-based scripts for more detailed and accurate modelling of parametric biomolecular structures (https://github.com/isambard-uob). CCCP (Coiled-coil Crick Parametrization) (
), which was anticipated to be an extended dimeric coiled coil. This showed clear and unbroken repeats with mainly hydrophobic residues spaced alternately 3 and 4 residues apart; in retrospect, these are unmistakable as heptad repeats. Later, low-resolution 15 Å (
) structures from electron microscopy and X-ray diffraction revealed a supercoiled pair of α helices, Fig. 1B. Wilson, Skehel and Wiley published the first high-resolution structure of a coiled coil in 1981 for the trimeric influenza hemagglutinin (
), Fig. 1C. The hemagglutinin story became more interesting as it unraveled; for example, revealing a spring-loaded switch to a longer trimeric CC leading to virus-host membrane fusion, Fig. 1D (
). In 1991, O’Shea, Alber and Kim determined the first atomic-resolution coiled-coil structure for the leucine-zipper peptide of the yeast transcriptional activator GCN4 (
), Fig. 1A. This showed both the supercoiling of two parallel α helices, and intimate interdigitation of side chains predicted by Crick. Since then, many thousands of coiled-coil structures have been resolved, ushering in efforts to automate their identification, analysis, and categorization (Box 2). With some tweaks, the main tenets of Crick’s model are evident and validated by these structures and analyses (
In summary, Pauling gave us the α helix and, using this, Crick gave us the coiled coil with its sequence signature of 3,4 or heptad repeats and its structural signature of the KIH interactions. Indeed, I contend that for an α-helical assembly to be considered a coiled coil it has to have a recognizable sequence pattern and KIH interactions. Moreover, as described in the next section, the simplicity and reliability of Crick’s model allows protein designers to make reliable coiled-coil models in biro (i.e., simply by drawing) or in silico, build sequences to fit these, realize them experimentally, and confirm that the models match the experimental structures with atomic accuracy (
So, is the physics of the coiled coil solved? In a word, no. This is because, despite our abilities to predict, build, and design coiled-coil structures, we cannot predict ab initio the free energy of folding and stability of a coiled-coil sequence, or the relative free energies between alternate coiled-coil states that it might form. I return to these gaps and challenges later. Nonetheless, and as we will see in the next section, we have sufficient rules of thumb (i.e., chemistry) to understand the assembly of natural coiled coils and to deliver an impressive array of these de novo designed assemblies.
2.2. The chemistry of coiled coils: rules for coiled-coil assembly and design
The foregoing section skipped an important detail on the precise nature of the interacting side chains separated by 3 and 4 residues in heptad repeats. This was because Crick’s model is pure physics, and agnostic of detailed side-chain chemistry. Arguably, however, we understand the chemistry of α-helical coiled coils (i.e., their sequence-to-structure relationships) better than for any other protein structure. Indeed, I contend that we are close to a complete chemical understanding of coiled-coil structure and assembly, and others agree (
). This section describes our current understanding of this chemistry.
The primary interacting side chains in coiled coils are assumed to be hydrophobic. That is, the 3,4 or heptad repeats are traditionally considered as hpphppp repeats, where h and p are hydrophobic and polar side chains, respectively. When folded, these form amphipathic α helices with a hydrophobic seam and a polar face, Fig. 3A. In water, driven by the hydrophobic effect, two or more such helices assemble to bury their hydrophobic seams and form a hydrophobic core, Fig. 3B. However, these cores are very different from those of globular proteins (
). This has important consequences: coiled coils can achieve high stability and specificity from relatively short stretches of sequence. Indeed, the ≈30-residue leucine-zipper domains are half the size of even the smallest recognized globular proteins and on a par with so-called miniproteins, which are a niche type of protein (
). There’s another more-subtle consequence that I develop below: KIH packing discriminates between h-type residues such that they are not all equal in terms of coiled-coil folding, assembly, and stabilization. That’s all chemistry.
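As a small illustration of the hpphppp pattern described above, the sketch below labels a sequence with the abcdefg register and picks out the a and d positions. The sequence is a hypothetical toy repeat, not a peptide from the literature, and the set of residues treated as hydrophobic (aliphatic plus aromatic) is an assumption made for this example.

```python
HYDROPHOBIC = set("AILMVFWY")  # assumed h-type residues for this sketch
REGISTER = "abcdefg"


def hp_pattern(sequence):
    """Reduce a sequence to hydrophobic (h) and polar (p) symbols."""
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in sequence.upper())


def assign_register(sequence, first="a"):
    """Label each residue with its heptad position, starting at `first`."""
    offset = REGISTER.index(first)
    return "".join(REGISTER[(offset + i) % 7] for i in range(len(sequence)))


def core_residues(sequence, first="a"):
    """List (position, heptad site, residue) for the a and d sites."""
    register = assign_register(sequence, first)
    return [(i + 1, site, aa)
            for i, (site, aa) in enumerate(zip(register, sequence.upper()))
            if site in "ad"]


toy_peptide = "IKELKEK" * 4  # hypothetical four-heptad sequence
print(hp_pattern(toy_peptide))
print(assign_register(toy_peptide))
print(core_residues(toy_peptide)[:2])
```

A binary h/p reduction of this kind is deliberately crude: it treats all h-type residues as equivalent, which is exactly the simplification that KIH packing breaks.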
Figure 3. Amphipathic α helices and how they pack in coiled coils. A, Orthogonal views of an hpphppp (abcdefg) repeat superimposed on an α-helical backbone with h residues picked out as spheres and the a and d sites colored red and green, respectively. B, Two such amphipathic helices assembled via their hydrophobic faces with the same coloring as in panel A. C – E, Slices through the X-ray crystal structures of dimeric (C, CC-Di, PDB id 4dzm), trimeric (D, CC-Tri, PDB id 4dzl), and tetrameric (E, pLI, PDB id 3r4a) de novo designed coiled coils (
). The backbones are shown as Cα traces with rainbow coloring for the abcdefg sites, and the side chains at the a and d sites depicted in red and green sticks, respectively. The assemblies were oriented by aligning helices labelled ‘1’ in PyMOL. This highlights the different types of knobs-into-holes (KIH) packing at the a and d sites in each assembly. The directions of the knobs are shown with open red and green arrows, and the bases of the corresponding holes are shown as broken red and green lines on the partnering helices. There are three types of KIH packing: in perpendicular packing, the Cα-Cβ bond vector of the knob residue points directly at the base of the hole, defined by a Cα-Cα vector on the partner helix; in parallel packing the Cα-Cβ bond vector of the knob aligns parallel with the base of the hole; and in acute packing, the arrangement lies between these two extremes. F, Slice through the central heptad of the GCN4-p1 structure (PDB id 2zta) showing an Asn:Asn side-chain hydrogen bond. Images made in PyMOL (pymol.org).
The aim of this section is three-fold: first, to demonstrate that there is much more to coiled-coil sequences than simple hp patterns, and that there are clear sequence-to-structure relationships for coiled-coil folding, assembly, stability and specificity; second, to show that these relationships are more than simple heuristics, and that they can be understood in physico-chemical terms; and, third, to show that these relationships can be used as powerful rules for rational coiled-coil peptide and protein design.
Classical coiled-coil dimers, trimers and tetramers
Our understanding of coiled-coil chemistry leapt forward in the early 1990s through the joint efforts of the Kim and Alber laboratories. Their work centered on the GCN4 leucine zipper, Fig. 1A. Synthetic peptides for this ≈30 amino-acid, 4-heptad sequence are accessible to solid-phase peptide synthesis, amenable to biophysical characterization, and crystallizable allowing the determination of highly informative X-ray crystal structures (
). As a result, the parent peptide, GCN4-p1, became a model for protein folding, assembly, and stability. The rapid turnaround of GCN4-p1 variants advanced the understanding of sequence-to-structure relationships.
). It shows that the nature and order of h-type residues of the a and d sites of heptad repeats largely determine the oligomeric state of classical coiled coils. Harbury’s experiments were straightforward. He made variants of GCN4-p1 with different combinations of two of the most-common hydrophobic amino acids in coiled coils, leucine (Leu, L) and its isomer isoleucine (Ile, I), at all of the a and d sites. Let’s call these peptides pIL, pLI, pII, and pLL, where the first named amino acid is at a and the second is at d (that is, p followed by the a residue and then the d residue). Harbury characterized the peptides in solution and by X-ray crystallography. Unsurprisingly, all were stable α-helical oligomers in aqueous buffer. The surprise was that they formed different oligomers; pIL, pII, and pLI were dimeric, trimeric, and tetrameric, respectively. This was surprising because most bioinformatic analyses would consider Leu and Ile to have similar impacts on protein structure. Harbury’s X-ray crystal structures explained this conundrum as described below and illustrated in Figs. 3C-E.
), the KIH packing at the a and d sites is different, Fig. 3C. The Cα-Cβ bond vector of Leu (the knob) at d points directly towards the neighboring helix and into a hole formed by side chains (at a, d, e, and a+1) of that helix (see Figs. 2B&C). We call this packing perpendicular, and, overwhelmingly, it best accommodates Leu residues (
). By contrast, the side chains at a point out of the core and towards solvent. Here, the Cα-Cβ bond vector of the Ile at a is parallel to its hole on the neighboring helix, which is formed by d-1, g-1, a, and d residues. Thus, the a sites of dimers can accommodate many more residue types than d, including the bulkier β-branched Ile (
Imagine bringing a third amphipathic helix into a dimeric assembly. Driven by the hydrophobic effect, the two original helices will respond and redirect their hydrophobic a+d faces towards that of the incoming helix; effectively, these helices rotate on their own axes. As a result, the KIH packing of all of the a and d side chains changes. This is manifest in the structure of pII, which is trimeric in solution and the crystal state, Fig. 3D (
). From this structure, the change in side-chain packing angles is clear. They are no longer perpendicular or parallel, and they are similar to each other. We call this acute packing. This similarity means that the amino-acid preferences at the two sites are similar (
). Hence, making a = d = Ile drives towards similar packing at the two sites and, therefore, towards trimers.
Adding a fourth helix to the assembly alters the core packing angles again, Fig. 3E. In this case, the a side chains pack perpendicular and those at d parallel. This is the reverse of the dimer. Hence, when the residues at a and d are swapped, pIL → pLI, the new peptide forms a tetramer.
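The relationships described above for the Leu/Ile variants can be written down as a small lookup. This is only a restatement of the three cases reported in the text (pIL, pII, and pLI) and of the packing geometries given for each state; it is not a general predictor, and the names are illustrative. Combinations that the text does not report, such as pLL, deliberately return `None`.

```python
# (residue at a, residue at d) -> number of helices, for the cases in the text.
HARBURY_STATES = {
    ("I", "L"): 2,  # pIL: dimer
    ("I", "I"): 3,  # pII: trimer
    ("L", "I"): 4,  # pLI: tetramer
}

# KIH packing geometry at the a and d sites in each oligomer state.
CORE_PACKING = {
    2: {"a": "parallel", "d": "perpendicular"},
    3: {"a": "acute", "d": "acute"},
    4: {"a": "perpendicular", "d": "parallel"},
}


def oligomer_state(a_residue, d_residue):
    """Return the reported oligomer state, or None if not covered here."""
    return HARBURY_STATES.get((a_residue.upper(), d_residue.upper()))


for a, d in [("I", "L"), ("I", "I"), ("L", "I"), ("L", "L")]:
    state = oligomer_state(a, d)
    packing = CORE_PACKING.get(state)
    print(f"p{a}{d}: state={state}, packing={packing}")
```

Laid out this way, the table makes the reversal explicit: the tetramer has the dimer's packing geometries swapped between the a and d sites, which is why swapping the residues (pIL → pLI) switches the oligomer state.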
Harbury’s GCN4-p1 variants have repeated cores, whereas natural sequences are more complex and heterogeneous, which bioinformatics bears out (
). Nevertheless, since their discovery, Harbury’s basic sequence-to-structure relationships have been confirmed by analyses of many natural coiled-coil sequences and structures (
). Moreover, they have been used widely as rules for protein design by many groups to deliver many de novo designed coiled-coil peptides and proteins (
), which are described in more detail below. A penultimate point on sequence-to-structure relationships that has emerged over the past 2 – 3 decades is that the hydrophobic cores of coiled coils tend to be built from aliphatic amino acids (A, I, L, M, and V) rather than the larger aromatic amino acids (F, W, and Y) (
). This is probably because of the limited volumes of the inter-helical space and packing requirements of KIH interactions in coiled coils. Indeed, although aromatic residues can be introduced into both natural and de novo coiled-coil peptides they tend to result in unusual structures that go beyond the classical and symmetric dimers, trimers, and tetramers (
That said, it’s not all about aliphatic hydrophobic residues either. Approximately 20% of residues at the core a and d sites of coiled-coil sequences are polar, including charged residues (
). These reduce the thermal stabilities of the coiled-coil assemblies. However, given the hyperthermal stability possible with even relatively short coiled coils (
), the disruption of perfect hydrophobic repeats is almost certainly essential for protein dynamics and turnover in natural coiled coils. Moreover, these polar inclusions play important roles in specifying the correct structural state. A prime example of this is the conservation of an Asn residue at a central a site in the wider family of leucine-zipper transcription factors (
). Presumably, this offsets the energy penalty for including polar Asn in the hydrophobic core. However, as shown by reasoning, analysis, modelling, and experiments (
), this interaction cannot be made in alternate states such as antiparallel dimers and parallel trimers. In other words, Asn@a is tolerated in parallel dimers but more destabilizing in other states and, thus, specifies the former. In protein design, this is called negative design, which refers to features that destabilize alternative accessible states more than the targeted state. As a result, Asn@a and other polar inclusions are now widely implemented in peptide and protein design and engineering (
). This has been formalized by Boyken and Baker in the HBNet protocol in Rosetta, which can introduce hydrogen-bond networks into coiled-coil-like de novo proteins beyond the canonical Asn-Asn pairs of dimeric interfaces (
The previous section shows how different combinations of mostly aliphatic residues at the a and d sites of canonical heptad repeats lead to the different parallel oligomer states: dimer, trimer and tetramer. Examination of the high-resolution structures of two series of peptide assemblies—namely, Harbury’s engineered GCN4-p1 peptides, and a set of de novo designed peptides (
)—reveals that something more is going on. In short, as the oligomer state increases more of each component helix becomes engaged in the helix-helix interfaces. This results in residues flanking the a + d seams—the e and g sites—becoming increasingly buried. Thus, potentially, KIH interactions can extend past the a and d sites in trimers and above. The idea that residues at e and g sites progressively become involved in coiled-coil interfaces with increasing oligomeric state was first formalized by Walshaw (
Walshaw’s logic and the resulting nomenclature are straightforward: He called the a and d sites of classical coiled coils with traditional hpphppp repeats “Type N interfaces”. He reasoned that by adding h-type residues (or, generally, residues that can act as knobs) different coiled-coil repeats and assemblies can be envisaged. For instance, expanding the interface by one residue gives hpphhpp or hpphpph repeats, which Walshaw called Type I interfaces. The latter, with the additional knob residue at g, is the more likely and more common of these two repeats. Placing h-type residues at both e and g gives hpphhph repeats and Type II interfaces. As expanded below, it is helpful to consider this as two superimposed 3,4 hydrophobic repeats, hbcdhfg and abchefh, with two distinct interfaces, a + e and d + g.
For the Type I and II interfaces, the original a + d interface is simply expanded and the hydrophobic seam on one face of the amphipathic helix is broadened. As a result, more helices can be recruited to the bundle. Mistakenly, Walshaw and I thought that this would stop at 6 helices (hexamers) (
). Finally, these expanded interfaces need not be contiguous. Repeats of the type hhphphp give two distinct hydrophobic seams formed by the a and d sites and the b and f sites and, thus, on the opposite sides of the helix. The resulting helices are no longer simple amphiphiles, they are bifaceted with the potential to form high-order structures (
In summary and as a rough guide: coiled-coil dimers tend to have canonical repeats and Type N interfaces; trimers have Type I interfaces; tetramers can have Type I or II interfaces, and, as a result, are at an interesting tipping point between trimers and higher-order structures (
); and larger assemblies of 10 helices and above usually require Type III, bifaceted interfaces.
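The nomenclature above can be restated as a small lookup on the hydrophobic/polar pattern of a heptad. The following is a minimal sketch only: the function names are hypothetical, the pattern is assumed to be written in a-to-g order, and labelling the hhphphp pattern as Type III is my reading of the "bifaceted" description rather than a definition given in the text.

```python
HEPTAD = "abcdefg"


def hydrophobic_sites(pattern):
    """Return the heptad positions marked 'h' in an a-to-g pattern such as 'hpphppp'."""
    if len(pattern) != 7 or set(pattern) - {"h", "p"}:
        raise ValueError("pattern must be 7 characters, each 'h' or 'p'")
    return {site for site, kind in zip(HEPTAD, pattern) if kind == "h"}


def interface_type(pattern):
    """Classify a heptad pattern using the Type N / I / II / III labels (illustrative)."""
    sites = hydrophobic_sites(pattern)
    if not {"a", "d"} <= sites:
        return "unclassified"          # no classical a + d seam
    extra = sites - {"a", "d"}
    if not extra:
        return "Type N"                # hpphppp
    if extra in ({"e"}, {"g"}):
        return "Type I"                # hpphhpp or hpphpph
    if extra == {"e", "g"}:
        return "Type II"               # hpphhph
    if extra == {"b", "f"}:
        return "Type III"              # hhphphp, assumed to be the bifaceted case
    return "unclassified"


for p in ("hpphppp", "hpphhpp", "hpphpph", "hpphhph", "hhphphp"):
    print(p, interface_type(p))
```

Any pattern outside the five named in the text is deliberately returned as "unclassified" rather than forced into a category.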
Testing and expanding chemical understanding through de novo coiled-coil design
After Feynman’s epitaph, “What I cannot create, I do not understand”, one test of our understanding of protein structure is to build entirely new proteins from scratch. Whilst de novo protein design has been active for ≈40 years, the field is now advancing rapidly (
). This has led to many de novo coiled-coil peptides and proteins that have been characterized in solution and resolved to atomic resolution by X-ray crystallography. The history and achievements of this subfield are well documented (
), so I will not repeat them here. Instead, and rather shamelessly, I will mainly describe the rational and computational design approaches that my group has taken to deliver a set of autonomous coiled-coil peptide modules. We call this the coiled-coil basis set, which is illustrated in Fig. 4.
Figure 4. A gallery of de novo coiled-coil structures. Top row: coiled-coil bundles with 2 – 4 helices (
). The diameters of the lumens scale approximately with the number of helices in the assembly, ranging from ≈5 – 10 Å. Bottom row from left to right: a monomeric single-chain miniprotein with a polyproline-II helix followed by, and packing with, an α helix (
). Key: systematic names are given above each structure, and 4-digit PDB codes are given below; CC stands for coiled coil, and Di, Tri, Tet, etc. refer to dimer, trimer, tetramer, etc.; all of the assemblies with helices shown in solid colors are parallel bundles or barrels; those with antiparallel arrangements of helices are colored as chainbows from the N terminus (blue) to the C terminus (red), except for apCC-Di-AB, which only has the termini colored blue and red; the systematic names for the antiparallel structures are prefixed with ‘ap’. All images were made in PyMOL (pymol.org) using the PDB codes given or from models generated in CCBuilder/ISAMBARD (
Figure 5. Structural parameters, knobs-into-holes (KIH) packing, and core-packing angles (CPAs) in more detail. A, How coiled-coil radius and superhelical pitch change with oligomeric state for 175 all-parallel structures of the 2022 version of the CC+ database (
). Search parameters: SOCKET packing cutoff, 7 Å; sequence redundancy, 50%; helix orientation, all parallel; number of helices, 2 – 8; experimental method, X-ray crystal structures at 2.2 Å resolution or better. B, How the CPAs calculated by SOCKET (
) made by side chains at the a, d, e, and g sites in the same dataset change with oligomeric state (10,164 CPAs in total). The error bars are for 1 standard deviation; and the points are joined by lines to guide the eye. C, A simple geometric model for CPAs based on an idealized, flat, helical wheel (i.e., with 3.5 residues per turn) for the heptad repeat. In this model, CPAs are approximated as the angles made between vectors for the knob residues (a, d, e, or g) and the bases of the holes. The knob vectors are taken as extensions of the preceding Cα-Cα virtual bond vectors as indicated by the directions of the colored teardrops. The base vectors are the corresponding Cα-Cα virtual bond vectors as follows: CPAa = ga into ga; CPAd = cd into de; CPAe = de into dc; and CPAg = fg into ab. When considered for different oligomer states, this results in the following equations: CPAa = 180˚ - 360˚/N; CPAd = 360˚/N - 77˚; CPAe = 77˚ - 180˚/N; and CPAg = 360˚/N + 26˚; where N = oligomeric state. D, Plot showing how these projected CPA values vary with oligomer state. The zone where most of the experimentally observed CPAs (calculated by SOCKET) occur is shaded gray. The color schemes of panels B – D are matched.
) To test and develop sequence-to-structure relationships for coiled coils in a totally synthetic and controllable framework. This was motivated by much of the work to that point being done on the GCN4-p1 system, which increasingly revealed contexts and alternate states that thwarted systematic studies (
) to deliver a toolkit of modules for which the role of every amino acid in each peptide was understood. In turn, this would allow the modules to be used reliably in synthetic biology to construct more-complex and functional protein-like objects (
Our initial design approach was rational. It used 28-residue synthetic peptides, as these are accessible by solid-phase peptide synthesis, and usually form stable helical assemblies amenable to full biophysical and structural characterization. The peptides had 4 heptad repeats with (gabcdef)4 registers to maximise potential gi-1 → ei salt bridges in parallel homomers. Specifically, the repeat sequences were (EaAAdKX)4, with X usually Gln, Lys, Tyr or Trp to aid helicity and solubility, and to introduce chromophores. First, we used the aforementioned combinations of Leu, Ile and Asn at a and d sites (
) to target parallel dimeric, trimeric and tetrameric coiled-coil assemblies. The resulting peptides were all confirmed as thermostable, cooperatively folded, helical oligomers in solution by circular dichroism (CD) spectroscopy, and with the intended oligomeric states using analytical ultracentrifugation (
Designing heterodimeric two-stranded alpha-helical coiled-coils - Effects of hydrophobicity and alpha-helical propensity on protein folding, stability, and specificity.
) in which complementary acidic (A) and basic (B) chains are achieved by making g = e = Glu and g = e = Lys, respectively. This delivered CC-Di-AB variants with fully quantified affinities in the μM to sub-nM range (
). Interestingly, as we have also found, it appears difficult to obtain crystals and solve structures for heteromeric de novo coiled coils, and few have been resolved to high resolution (
Exploring the dark matter of coiled-coil space – α-helical barrels
The basis-set peptides led to two serendipitous discoveries. First and surprisingly, a permutation of CC-Tet with the repeat changed from EIAALKX to EIKALAX—which moved an Ala to e—formed a parallel hexamer, which we named CC-Hex (
), Fig. 6A. Thus, as introduced above, expanding the a+d hydrophobic seam to include small hydrophobic residues at g and e recruits more helices to coiled-coil assemblies.
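To make the register bookkeeping in this permutation explicit, the sketch below labels each residue of a 7-residue repeat with its heptad position and reports which sites differ between two repeats. It assumes the g → f register, (gabcdef), described earlier for the basis-set peptides; the helper names are hypothetical.

```python
HEPTAD = "abcdefg"


def annotate(repeat, first_site="g"):
    """Map heptad positions to residues for one 7-residue repeat.

    first_site is the heptad position of the first residue in the repeat
    (assumed to be 'g' for the g-to-f register used for the basis-set peptides).
    """
    if len(repeat) != 7:
        raise ValueError("a heptad repeat must have exactly 7 residues")
    start = HEPTAD.index(first_site)
    return {HEPTAD[(start + i) % 7]: res for i, res in enumerate(repeat)}


def changed_sites(before, after, first_site="g"):
    """Return {site: (old_residue, new_residue)} for positions that differ."""
    old, new = annotate(before, first_site), annotate(after, first_site)
    return {s: (old[s], new[s]) for s in HEPTAD if old[s] != new[s]}


print(annotate("EIAALKX"))
print(changed_sites("EIAALKX", "EIKALAX"))
```

Under that assumption, the comparison reports a swap of Ala and Lys between the b and e sites, which is the "Ala moved to e" change described above.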
Figure 6. Structures of designed and natural α-helical barrels. A, A slipped heptamer formed by a mutant of GCN4-p1 peptide with Ala at the e and g positions (2hy6 (
)). This spans the periplasmic space to link the inner and outer membranes to allow efficient efflux from the cell. The upper β-barrel spans the outer membrane, the central 12-helix α-barrel bridges the space, and the lower antiparallel coiled-coil dimers engage other proteins of the efflux machinery at the inner membrane. E, The octameric Wza protein from E. coli (2j58 (
)). This exports polysaccharides for assembly on the outer surface of the bacterium, with the upper part forming an 8-helix barrel in the outer membrane. F, The H protein from the ΦX174 coliphage forms a 10-stranded α-helical tube, which can span the periplasm of the host to deliver its single-stranded DNA genome (4jpp (
)). Note how the coiled coil switches from right-handed (near straight) at the N terminus (bottom) to left-handed at the C terminus (top). G, cryoEM structure of the F1F0 ATP synthase from a green alga (6rde (
)). The membrane-spanning c-ring, which comprises concentric rings of coiled-coil helices (top of the cartoon), couples proton transport to rotatory catalysis in the F1 assembly (bottom) via a central stalk, the γ subunit, which is an antiparallel coiled-coil dimer (slightly obscured and colored silver). H, A pentameric NMR ‘pinwheel’ structure for cardiac-muscle phospholamban (2kyv (
)). Although SOCKET analysis reveals a clear pentameric α-helical barrel, the central pore is too narrow to act as an ion channel. This structure is proposed to be the dominant T state in membranes. Chain coloring varies between the panels: in A, B, C, E, F, and H chainbows are used to trace the N to C termini of the different chains; in D and G the protomers are each colored differently. In panels B and C the atomic surfaces are shown meshed.
) both have central and fully accessible channels, Figs. 4 and 6, making them α-helical barrels (αHBs) rather than α-helical bundles with consolidated hydrophobic cores. As described below, this opens possibilities for functionalizing de novo coiled-coil scaffolds considerably. However, to realize this, CC-Hex and other αHBs would have to be robust to mutation. Despite some early successes (
), we found that CC-Hex often collapsed back to parallel tetramers and other states when altered. Therefore, to deliver other and more-robust αHBs, we turned to computational protein design. This required the development of in-house parametric coiled-coil design tools (
) to assess the helix-helix interfaces. This delivered new and robust sequences for parallel and non-slipped pentameric, hexameric and heptameric coiled coils, CC-Pent, CC-Hex2 and CC-Hept, which were all confirmed in solution and by X-ray crystal structures, Fig. 4 (
Interestingly, the computational αHB designs have sequences related to the initial rational and serendipitous designs, namely: the a = Leu plus d = Ile core from CC-Tet and CC-Hex is preserved; as introduced above, the e and g sites are more-intimately involved in the helix-helix interfaces and tend to be more hydrophobic; and, consequently, the interhelix salt-bridging Lys and Glu residues are moved to b and c, respectively. Incidentally, for the computationally designed αHBs, and for most subsequent designs of higher-order coiled coils, we have changed from sequence repeats with g→f register to c→b registers (
). This maximizes interhelical salt bridges: in classical parallel dimers and trimers, these salt bridges can form between residues at g on one helix and residues at e of the next heptad in the neighboring helix, i.e., g → e’+1 (
Finally on the chemistry of αHBs, there is a conundrum for the natural and serendipitously discovered barrel-like proteins. A basic tenet of coiled-coil assembly—and protein folding in water generally—is that the polypeptide chains fold to minimize their free energy, with a major part of this coming from burying their hydrophobic side chains to form a hydrophobic core. Thus, how do αHBs with predominantly hydrophobic residues at the lumen-facing a and d sites avoid collapse? Again, the answer lies in the stereochemistry of core packing.
Further empirical studies of the computationally designed αHB sequences have revealed the importance of β-branched residues at the a and d sites in maintaining the barrels (
): for open channels, the d sites must be predominantly Ile or Val in combination with a = Leu, Ile or Val. Relaxing this and allowing d = Leu leads to collapsed high-order oligomers with consolidated cores. Furthermore, we have found that the residues at the e and g positions also have profound and different effects on αHB formation and oligomeric state. For example, in parallel αHBs, side chains at g point directly towards the neighboring helices – they pack perpendicularly into e’a’+1b’+1e’+1 holes (discussed below and illustrated in Fig. 5). As a result, the oligomeric state is very sensitive to the size of the side chain here. For the same sequence background, the series Gly → Ala → Ser → Thr at g form nonamer, heptamer, hexamer, and pentamer, respectively (
), Fig. 4 and Table 1. That is, smaller side chains allow closer helix-helix contacts and, thus, recruitment of more helices to the barrel. By contrast, similar changes at e have less predictable effects, leading to αHBs, collapsed structures, and other helical bundles (Martin et al., unpublished data). Intriguingly, a sequence with Gly at e forms both open-barrel and collapsed hexamers in the same crystal structure (Fig. 6B) and in solution (
). It appears that the introduction of Gly@e relaxes the helix-helix interactions sufficiently to allow both close helix-helix contacts and hydrophobic collapse, but with the open αHB still energetically accessible (
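Table 1. Design rules for coiled-coil oligomers.

| Name (oligomer) | a | b | c | d | e | f | g | PDB code |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CC-Di | I/N | A/X | A/X | L | K/E | X | E/K | 4dzm |
| CC-Tri | I | A/X | A/X | I | K/E | X | E/K | 4dzl |
| CC-Tet* | L | K/E | E/K | I | Q | X | Q | 6xy1 |
| CC-Pent* | L | K/E | E/K | I | A | X | T | 7bav |
| CC-Hex2 | L | K/E | E/K | I | A | X | S | 4pn9 |
| CC-Hept | L | K/E | E/K | I | A | X | A | 4pna |
| CC-Oct | I | K/E | E/K | I | A | X | A | 6g67 |
| CC-Non | L | K/E | E/K | I | A | X | G | 7bim |
| apCC-Tet* | L | EEKK | EEKK | I | A | X | Q | 8a3g |

Left-hand column: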
systematic name of the de novo coiled-coil assembly (Fig. 4). Right-hand column: PDB code of a representative structure for the design. Middle columns: favored amino acids at the seven sites of the coiled-coil heptad repeats, abcdefg, for the coiled-coil state. Important note on register: Straight a – g registers are usually not used in de novo coiled coils. Rather, in parallel dimers and trimers the sequence repeats are g → f. This is because side chains at g-1 of one helix can make interactions with those at e of the following heptad repeat on a neighboring helix; for example, to make g-1 → e salt bridges. However, for parallel tetramers and above, because side chains at e and g become increasingly involved in helix-helix interactions, the salt-bridge interactions are moved to c → b+1. Hence, the sequence repeats of these higher-order oligomers are best constructed with c → b register repeats. Key: standard one-letter codes are used for the amino acids; X = any proteinogenic amino acid except Pro. Note: as discussed in the text, although the sequence-to-structure relationships summarized here have been determined bioinformatically, computationally, or empirically and tested in multiple experiments, they are not all hard-and-fast rules. Also, they have largely been developed and tested in the context of 4-heptad sequences. Thus, they may be subject to context dependence.
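As an illustration of how the register note applies to the table, the sketch below writes out a heptad repeat for each parallel design in the recommended register (g → f for dimers and trimers, c → b for tetramers and above). It is a sketch under stated assumptions, not a design tool: where a cell offers alternatives (e.g., K/E or A/X) the first-listed residue is taken arbitrarily; the f-site placeholder X is filled with Gln purely as an example; apCC-Tet* is omitted because its b and c entries vary between heptads; and all names in the code are hypothetical.

```python
HEPTAD = "abcdefg"

# Residues at sites a..g (first-listed option of each Table 1 cell; an
# illustrative choice) and the number of helices in the assembly.
RULES = {
    "CC-Di":   ("IAALKXE", 2),
    "CC-Tri":  ("IAAIKXE", 3),
    "CC-Tet":  ("LKEIQXQ", 4),
    "CC-Pent": ("LKEIAXT", 5),
    "CC-Hex2": ("LKEIAXS", 6),
    "CC-Hept": ("LKEIAXA", 7),
    "CC-Oct":  ("IKEIAXA", 8),
    "CC-Non":  ("LKEIAXG", 9),
}


def heptad_repeat(name, x="X"):
    """Write one repeat in the g-to-f (dimer, trimer) or c-to-b (tetramer and above) register."""
    sites, n_helices = RULES[name]
    first_site = "g" if n_helices <= 3 else "c"
    start = HEPTAD.index(first_site)
    repeat = "".join(sites[(start + i) % 7] for i in range(7))
    return repeat.replace("X", x)   # fill the free f site with the chosen residue


def peptide(name, n_heptads=4, x="Q"):
    """Concatenate n_heptads repeats; x = 'Q' is an example choice for the f site."""
    return heptad_repeat(name, x) * n_heptads


for name in RULES:
    print(f"{name:8s} {heptad_repeat(name)}  {peptide(name)}")
```

Real designs typically vary the f site between heptads (for example, to place a chromophore) and add capping residues, neither of which is modeled here.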
This expansion of coiled-coil structural space presents an opportunity to examine how coiled-coil geometry changes with oligomer state. To do this, Prasun Kumar compiled data for all-parallel coiled coils from the 2022 update of the CC+ database (
). As expected, the radius of the coiled-coil superhelix increases with oligomer state, Fig. 5A. Turning to superhelical pitch, Fig. 5A, for dimers through hexamers the values are near the theoretical value of ≈200 Å, although there is considerable variation around this. For heptamers and octamers there is a sharp increase in coiled-coil pitch. Most likely, this is due to the straightening of the coiled coil needed for peripheral KIH interactions by residues at e and g to be made, though there are still too few high-resolution structures of these coiled coils to draw firm conclusions.
A closer examination of the KIH interactions made by side chains at a, d, e and g sites in the dataset is interesting, Fig. 5B. The aforementioned systematic changes in core-packing angles (CPAs) of residues at a and d between parallel (≈0˚), acute (≈45˚) and perpendicular (≈90˚) packing (see Figs. 3C-E) are clear for the dimers, trimers and tetramers. Extending this beyond tetramers, a number of things become apparent. First, the CPAs at the a and d sites change little above tetramer. Indeed, they asymptote to ≈115˚ (near perpendicular) and ≈25˚ (near parallel), respectively. Second, KIH packing at the e and g positions only comes into play for tetramers and above: for tetramers and pentamers KIH interactions are made here ≈50% and ≈75% of the time, respectively; for the hexamers >90% of side chains at these sites make KIH interactions; and for the few examples of heptamers and octamers all residues at the e and g positions act as knobs, i.e., they are fully Type II interfaces. This is why the tetramer is a tipping point between classical (Type N and Type I) and higher-order (Type II) coiled coils. Third, when KIH interactions are made by residues at e in tetramers and above, the CPA is ≈30˚ regardless of oligomer state, and the packing is like that at d; whereas, at g the CPA changes from ≈95˚ for tetramers to ≈60˚ for the octamers. Thus, in the higher oligomers, side chains at e make parallel KIH interactions, and those at g perpendicular interactions. This is why side chains at g have a greater influence on coiled-coil structure and stability than those at e, as noted above (Table 1 and reference (
Finally, a simple model using projections on idealized helical wheels with 3.5 residues per turn captures many of the changes in CPAs and KIHs, Figs. 5C and D. This is my zeroth-order attempt to include side-chain packing geometries in Crick’s coiled-coil parameterization. It will be developed elsewhere as it may be of use to others engaged in rationalizing complex natural coiled-coil structures or designing them rationally and computationally.
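For readers who want to tabulate the projected angles, the expressions given in the Figure 5 caption can be evaluated directly. The sketch below assumes that the four expressions apply, in order, to the a, d, e and g sites (matching the order of the base-vector definitions in that caption); the function name is hypothetical, and the numbers are the idealized projections only, not the SOCKET-derived values.

```python
def projected_cpas(n):
    """Projected core-packing angles (degrees) for an n-helix parallel coiled coil.

    Uses the flat helical-wheel expressions from the Figure 5 caption, assuming
    they refer to the a, d, e and g sites in that order.
    """
    if n < 2:
        raise ValueError("a coiled coil needs at least two helices")
    return {
        "a": 180.0 - 360.0 / n,
        "d": 360.0 / n - 77.0,
        "e": 77.0 - 180.0 / n,
        "g": 360.0 / n + 26.0,
    }


print(" N      a      d      e      g")
for n in range(2, 9):
    cpa = projected_cpas(n)
    print(f"{n:2d} " + " ".join(f"{cpa[s]:6.1f}" for s in "adeg"))
```

For the dimer the model gives a parallel angle (0˚) at a, and for the tetramer a perpendicular angle (90˚) at a, in line with the classical packing described above. Values for e and g are only meaningful for tetramers and above, where these sites engage in KIH packing.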
Targeting antiparallel structures
Our second serendipitous finding was that certain CC-Hex variants formed another coiled-coil state, an antiparallel tetramer (
), these have been design targets for DeGrado, Dutton, their former group members, and, more recently by computational designers [REFs]. Nevertheless, we were interested in exploring this region of coiled-coil sequence and structure space both to avoid unwanted alternative states in αHB design, and to define rules for a new basis-set member; i.e., apCC-Tet, where the ‘ap’ prefix signifies antiparallel.
The initial antiparallel-tetramer variants of CC-Hex were far from ideal (
), Fig. 4. Subsequently, we have conducted a systematic rational and computational design of new apCC-Tet variants, leading to more-robust sequences and structures for both homo- and heterotypic antiparallel coiled-coil tetramers. Moreover, these helical sequences can be linked with turns and loops to render a single-chain antiparallel 4-helix coiled coil, sc-apCC-4, in a single design step (
). This whole process has been followed at atomic resolution with X-ray crystal structures for apCC-Tet*, apCC-Tet-A2B2*, and sc-apCC-4, Fig. 4. Thus, we have graduated from peptide to protein design using robust and rational design rules. This followed the pioneering work of Regan and DeGrado and by Hecht and the Richardsons (
), the following rules and principles emerge for antiparallel coiled-coil tetramers: the use of a = d = Leu, or better a = Leu d = Ile cores; an obligate Ala at e, similar to so-called Alacoils (
); a preference for Gln at g; and the use of charge complementarity at b & c as a final guide to helix-helix specification and orientation, and specifically, using oppositely charged residues in the N- and C-terminal halves of these designs (
The design of antiparallel coiled-coil dimers has been pursued by others for some time; for examples, see the work of Hodges, Oakley, Gellman, Keating and others (