Subfilamentous Protofibril Structures in Fibrous Proteins

The packing of the constituent molecules in some fibrous proteins such as collagen and intermediate filaments (IF) is thought to consist of several hierarchical levels, the penultimate of which is the organization of subfilamentous units termed protofibrils. However, to date only indirect evidence, such as electron microscopic images of unraveling fibers or the existence of mass quanta, has been adduced in support of the existence of protofibrils. We have reexamined this issue in IF. Cross-links have been induced in trichocyte keratin, cytokeratin, and vimentin IF proteins. Using improved experimental conditions, several additional and reproducible cross-links have been characterized. Notably, many of these link between columns of molecular strands four apart on two-dimensional surface lattices. These data provide robust support for the concept of an 8-chain (4-molecule) protofibril entity in IF. Further, their positions correspond to the axial displacements predicted for protofibrils in the different types of IF. Also, the data are consistent with intact IF containing four protofibrils. In addition, the positions of these novel cross-links suggest that there are multiple possible groupings of four molecular strands to form a protofibril, suggesting a promiscuous association of molecules to form a protofibril. This may underlie the reason that organized elongated protofibrils cannot be visualized by conventional microscopic methods.

The thick filaments in muscle, the collagen fibrils in connective tissue, and the intermediate filaments (IF) found in many cell types typify a diverse class of structures known as fibrous proteins. Over the years there has been much speculation about the manner in which the constituent molecules in these proteins pack together in three dimensions to generate the observed filamentous structures (1-11). The models proposed on the basis of x-ray diffraction patterns, electron micrographs, and biochemical and other data have fallen into two general classes: those in which the largest structural unit is the individual molecule and those in which the molecules first aggregate to form a subfilamentous protofibrillar structure. In the former case the molecules pack in a regular or quasi-regular manner throughout the filament, and in the latter case the protofibrils in turn aggregate further to produce the intact filamentous structure.
Although the concept of a protofibril is an attractive one and has received widespread acceptance in principle, there are, in fact, very few hard data that substantiate this idea. Attempts, for example, to observe protofibrils directly in muscle (12), connective tissue (13), and IF (14) using transmission electron microscopy have not been convincing. The observed filamentous aggregates induced by chemical or physical means have not been reproduced easily, and neither have they had the appearance of specific aggregates. Low resolution x-ray fiber diffraction data obtained for fibrous proteins have been interpreted in terms of both classes of model (15), and it is probable that only through the acquisition and interpretation of much higher resolution data will we be able to resolve the issue in favor of one class of model over the other. Some indirect data, however, have provided support for the existence of protofibrils. For example, using electron microscopy, the diameters of fetal and neonatal collagen fibrils from transverse sections have been measured. In these tissues the fibril sizes are uniform, and this allows an accurate mean diameter to be determined. It has been shown that the values have a discontinuous distribution consistent with the quantization of fibril diameters (16). Consequently, it has been argued that these data imply an underlying protofibril substructure. Likewise, scanning transmission electron microscopy studies on in vitro aggregates of various IF have provided evidence that assembly occurs in multiples of 8 chains (which has been defined as a protofibril) to produce IF in which there are 16, 24, 32, 40, or even more chains in section (17-25). However, the evidence available to date is that in vivo all IF contain about 32 chains in section (25,26), corresponding to four protofibrils.
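The quantization argument above is simple arithmetic, and a minimal Python sketch may make it explicit. It assumes only what the text states (two chains per molecule, four molecules per protofibril); the function and constant names are hypothetical and introduced here for illustration.

```python
# Illustrative sketch only: names are hypothetical, values are those quoted in the text.
CHAINS_PER_MOLECULE = 2         # each IF molecule is a two-chain coiled coil
MOLECULES_PER_PROTOFIBRIL = 4   # the 4-molecule (8-chain) unit discussed in the text
CHAINS_PER_PROTOFIBRIL = CHAINS_PER_MOLECULE * MOLECULES_PER_PROTOFIBRIL


def protofibrils_in_section(chains):
    """Return the protofibril count for a chain count, or None if it is not a multiple of 8."""
    if chains % CHAINS_PER_PROTOFIBRIL != 0:
        return None
    return chains // CHAINS_PER_PROTOFIBRIL


# Chain counts reported for in vitro assembled IF (16, 24, 32, 40 chains in section).
counts = {chains: protofibrils_in_section(chains) for chains in (16, 24, 32, 40)}
print(counts)
```

On this reading, the 32 chains in section reported for IF in vivo correspond to four protofibrils, whereas a chain count that is not a multiple of 8 would not fit the scheme and is returned as `None`.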
Each IF molecule contains a pair of chains that associate over their central, heptad-containing regions to generate a parallel, in-register, two-stranded coiled-coil. The rod itself comprises four coiled-coil segments (1A, 1B, 2A, and 2B) separated by three linker regions (L1, L12, L2). Although all IF chains are built the same way, subtle differences between them have allowed classification into several distinct types. In recent years cross-link data have been established for oligomeric assemblies of reduced and oxidized type Ia/IIa trichocyte keratin (27), type Ib/IIb cytokeratin (28,29), type III vimentin (30), and type IV α-internexin (31) IF. These have documented four common modes of molecular registration, designated A11, A22, A12, and ACN. The first three of these relate to antiparallel packing of two adjacent molecules that are half-staggered with their 1B segments largely overlapped (A11), half-staggered with their 2B segments largely overlapped (A22), and largely overlapped over their entire lengths (A12). The fourth mode, derived from the combination of the A11 and A22 modes, relates to parallel molecules in larger assemblies that have a head-to-tail overlap of about 8-10 residues. This occurs in all cases of reduced IF. In the case of oxidized trichocyte keratins of mature tissues such as hair, there is a 10-residue gap between similarly directed molecules (30) because of realignment of the A11 mode during oxidation (27,32). These four rules have allowed the construction of two-dimensional surface lattices for the packing of molecules in each of the IF types. However, the way in which the columns or assemblies of molecules fold in three dimensions into IF remains unresolved, although one widely speculated possibility is that four adjacent columns of molecules would define a protofibril (14,21,25,33).
Many of these insights have been obtained from cross-linking studies on intact IF or various subassemblies of them. On the other hand, detailed electron microscopic studies have likewise revealed several important kinetic steps by which IF of several types assemble in vitro (21-25). Initially, a single protein chain assembles into a parallel in-register coiled-coil dimer molecule with another compatible chain. This is followed by the formation of a tetramer, or pair of molecules, aligned antiparallel and partly staggered in the A11 and/or A22 alignment modes. These two events presumably occur very rapidly in buffers of neutral pH and low ionic strength. On raising the ionic strength, these oligomerize very rapidly perhaps initially into 3- and 4-molecule species, and then into larger half- and full-width unit length filament (ulf) particles containing 8 or 16 molecules, respectively, or more (34,35). The ulf are ~70 nm long and 20 nm or more wide. Although the structure of ulf remains to be determined, it may consist of lateral arrays of molecules stacked side by side in alternating A11-A12-A22 alignments (36). As for intact IF, the masses of ulf are polymorphic, implying that polymorphism is introduced at this level of IF assembly (22,23). Moreover, their masses quantize in units of 4 molecules, suggesting that if protofibrils exist, they originate at this level of assembly.
TABLE I footnotes. The numbers refer to the orders of elution of the red-arrowed peaks from the HPLC columns (Fig. 1). The heptad position is italicized. The first number refers to the axial displacement between rows or columns of molecules. For example, cross-link 1 for cytokeratins has a value of (-2 × A12 ... The second number refers to the relative protofibril stagger. L, link number.
The final stage of IF assembly in vitro is the end-to-end joining of ulf particles (34), presumably by docking into each other, thereby forming ACN interactions between the end of one molecule and the beginning of the next molecule in each axial row. This process occurs in a time frame of seconds to minutes. Also, the growing IF condense to a more uniform and narrower 10-nm diameter (34,35). Thus a protofibril might consist of a column of four rows of molecules. Likewise, the often postulated protofilament (14,21,25,33) might consist of a column of two rows of molecules, so that two protofilaments might constitute one protofibril.
However, to date, no biochemical or microscopic data on unraveling IF have been obtained to support directly the existence of protofilaments or protofibrils in IF, or any other fibrous protein. This paper reports new data on the characterization of cross-links induced in reduced and oxidized trichocyte keratin, K5/K14 cytokeratin, and vimentin IF. We have now identified additional minor cross-links that have occurred between molecules four columns apart. The data afford direct evidence of the clustering of columns of molecules in groups of four and thus support the concept of a protofibrillar structure in IF.

MATERIALS AND METHODS
IF Proteins Used-Mouse type Ia/IIa trichocyte chains were coassembled from bacterially expressed proteins (27). These IF were also oxidized with the copper-phenanthroline reagent. Human type Ib/IIb K5/K14 cytokeratin IF were likewise coassembled from bacterially expressed proteins (37). For the keratins, the assembly buffer was 10 mM triethanolamine HCl (pH 7.6) containing 5 mM of the reducing reagent tris(carboxyethyl)phosphine (TCEP). Finally, a human vimentin construct (kindly provided by Drs. H. Herrmann and R. D. Goldman) was used for bacterial expression. The purified type III vimentin was assembled in 2 mM sodium phosphate (pH 7.6) buffer containing 5 mM TCEP, 1 mM MgCl2, and 0.15 M NaCl.
Characterization of Cross-links-Samples of each of the above four IF (10-20 mg, at 40-60 μg/ml) were cross-linked with 0.4-0.5 mM disulfosuccinimidyl tartrate (DST), dried, and fragmented with CNBr and trypsin (27-31). Peptides were resolved on a 150 × 2-mm C18 Nucleosil (120 Å pore size) HPLC column (Phenomenex, Torrance, CA) using a 125-min extended acetonitrile gradient (see Fig. 1). Potential cross-linked peaks were identified by comparisons with uncross-linked proteins, recovered, and subjected to cleavage with 0.1 M sodium periodate. In this way, we recovered 20-30 cross-linked peptides from each IF sample, most of which were identical to those reported previously. However, an additional 6-15 reproducible and usually minor cross-linked peaks were identified and characterized further by sequencing. Seven of the cross-links of reduced trichocyte IF have been described previously (27).

Recovery of New Cross-links in Cytokeratin, Trichocyte, and Vimentin IF-Previously, we have established methods for the identification and recovery of DST-induced cross-links in several types of IF (27-31). The procedures involve cross-linking through juxtaposed lysine residues of intact IF or subassemblies of them with the periodate-cleavable bifunctional cross-linking reagent DST (linker arm of 0.6 nm), under mild conditions that allow IF assembly of the modified proteins. Subsequently, the cross-linked IF proteins were fragmented and the peptides resolved by HPLC. Comparisons of the peptide peaks before and after cross-linking revealed shifted peaks (0.02-0.3 mol/mol), which, after a reaction with periodate and sequencing, revealed two lysine residues that had been joined by the cross-linker. In this way, we have identified the four basic modes of alignments of neighboring rows of molecules common to all types of IF (A11, A22, A12, and ACN). In more recent studies on reduced type Ia/IIa trichocyte keratin IF, using a more efficient reducing reagent (TCEP) coupled with higher resolution HPLC programs, we recovered seven additional cross-links that could be explained only by linkages between columns of molecules four apart on a two-dimensional surface map (27). These are listed in Table I. We then set out to examine whether such cross-links could also be identified in oxidized trichocyte IF as well as in type Ib/IIb cytokeratin and type III vimentin IF. Indeed, in each case, many quantitatively minor (0.02-0.001 mol/mol, the lowest level of resolution possible for subsequent protein sequencing analyses) shifted peptide peaks were recovered that had not been detected previously. Some new peaks were not reproducible between separate cross-linking experiments and did not contain cross-links and thus may represent protein chemical modifications of IF peptides which inevitably occur during the procedures.
However, in each case, 11-14 minor peaks were reproducible in separate cross-linking experiments with different batches of proteins (Fig. 1). As these were cleavable with periodate, they were characterized further by sequencing. The new cross-link data are summarized in Table I and displayed on two-dimensional surface lattice drawings in Fig. 2.
FIG. 1. ... Table I. Smaller green arrows mark positions of known cross-links between adjacent antiparallel molecular strands that define A11, A22, or A12 associations.

Characterization of New Links That Support the Concept of a Protofibrillar Structure-The new cross-link data all correspond to axial staggers between parallel molecules and relate directly to the predicted values of the axial projection (z_a) of the surface lattice dimension a (25). The majority of the axial displacements correspond to about 80, 50, and 25 residues in a coiled-coil conformation for reduced trichocyte keratin and K5/K14 cytokeratin, oxidized trichocyte keratin, and the type III IF (25), respectively, and hence correspond to links between molecular "strands" (that is, assemblies or columns of head-to-tail overlapped molecules) that are four apart on the surface lattice. These observations are consistent with the concept of a structural unit with four molecular strands (8 chains) in cross-section. We term this unit the protofibril, to be consistent with many previous predictions and models.
It is important to emphasize that the observed cross-link data for different IF types correspond to the particular value of z_a predicted previously for that particular IF type. We have thus produced data for three unique IF systems that independently give rise to the same overall pattern of cross-links. This lends high credence to the structural significance of the results. Furthermore, for K5/K14 cytokeratin, oxidized trichocyte keratin, and vimentin we have also identified a number of cross-links that correspond to a parallel axial displacement that is either two times larger than z_a (~160, 100, and 50 residues, respectively) or three times larger than z_a (~240, 150, and 75 residues, respectively) (Table I). We interpret this as arising from interactions between molecular strands that are 8 (two protofibrils) or 12 (three protofibrils) apart on the topological surface lattice. Once again, these data strongly imply the presence of a structural unit with four strands (8 chains) in cross-section. In addition to these observations, all available scanning transmission electron microscopy studies indicate that native IF contain 32 chains in cross-section (25,26). It follows from both sets of experimental data therefore that most native IF may contain four protofibrils in section. However, IF assembled in vitro are polymorphic, with masses/unit length consistent with 24-48 chains in section, which are usually multiples of 8 chains (17-19, 21-23, 34); that is, such IF may contain three to six (or more) protofibrils. Using these new cross-link data, we have further refined the parameters relating to the link lengths (where possible) and to the values of A11, A22, and A12 (Table II). These differ only in minor aspects from those data reported previously.
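The relation between an observed axial displacement, the value of z_a, and the number of strands separating the linked molecules can be sketched as follows. The z_a values are the approximate residue counts quoted above; the function name, the dictionary, and the 15% tolerance are assumptions made only for this illustration and are not part of the analysis reported here.

```python
# Illustrative sketch only: approximate z_a values (coiled-coil residues) quoted in the text.
Z_A = {
    "reduced trichocyte keratin": 80,
    "K5/K14 cytokeratin": 80,
    "oxidized trichocyte keratin": 50,
    "vimentin": 25,
}
STRANDS_PER_PROTOFIBRIL = 4


def classify(if_type, displacement, tolerance=0.15):
    """Return (protofibril_stagger, strands_apart) for an axial displacement in residues.

    Returns None when the displacement is not within `tolerance` (a hypothetical
    cutoff, expressed as a fraction of z_a) of a whole multiple of z_a.
    """
    ratio = abs(displacement) / Z_A[if_type]
    stagger = round(ratio)
    if stagger < 1 or abs(ratio - stagger) > tolerance:
        return None
    return stagger, stagger * STRANDS_PER_PROTOFIBRIL


print(classify("vimentin", 50))              # two-protofibril stagger
print(classify("K5/K14 cytokeratin", 240))   # three-protofibril stagger
```

A displacement of about 50 residues in vimentin is thus read as a two-protofibril stagger (strands 8 apart), and one of about 240 residues in K5/K14 cytokeratin as a three-protofibril stagger (strands 12 apart).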
The numbers of interprotofibrillar cross-links observed are likely to correlate with the physical proximity of the protofibrils in the IF studied. Thus both one- and three-protofibril staggers would arise between protofibrils that were immediately adjacent to one another in a four protofibril-containing IF. The likelihood of finding cross-links corresponding to a two-protofibrillar stagger, however, would be expected to be somewhat lower because these protofibrils might be diagonally opposed and hence may not be as close spatially to one another. As regards the extent of protofibril overlap, this would be smaller for those IF with large values of z_a (cytokeratin and reduced trichocyte keratin) than for those with small values of z_a (such as vimentin). The odds of observing an interprotofibril cross-link will be roughly proportional to the degree of overlap that exists between the protofibrils in question. Overall, we report here 30 cross-links for a one-protofibril stagger, 4 for a two-protofibril stagger, and 5 for a three-protofibril stagger (Table I). Qualitatively, these values are generally consistent with the factors listed above.
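The proximity argument can be illustrated with a short sketch. It assumes, purely for illustration, that the four protofibrils are arranged in a closed ring, so that a three-protofibril stagger on the surface lattice wraps around to an immediately adjacent protofibril; the function name is hypothetical, and the tallies are those reported above.

```python
# Illustrative sketch only: assumes four protofibrils arranged in a closed ring.
def ring_separation(stagger, n_protofibrils=4):
    """Shortest separation around the ring for a given protofibril stagger."""
    k = stagger % n_protofibrils
    return min(k, n_protofibrils - k)


# Numbers of cross-links reported for each protofibril stagger (Table I).
observed = {1: 30, 2: 4, 3: 5}

adjacent = sum(n for stagger, n in observed.items() if ring_separation(stagger) == 1)
diagonal = sum(n for stagger, n in observed.items() if ring_separation(stagger) == 2)
print(adjacent, diagonal)
```

Under this assumption, 35 of the 39 cross-links join immediately adjacent protofibrils and 4 join diagonally opposed ones, which is qualitatively in line with the expectation stated above.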
It should also be pointed out that two of the common cross-links that denote links between adjacent antiparallel molecules in keratin IF (green arrows in Fig. 1D) have more than one structural solution (38): they may link between parallel molecules one apart on the surface lattice. These are type II 1B-49 with type II 2B-23, and type I 1B-06 with type II 2B-63. These could represent intraprotofibril links. They may also be interpreted as links between two adjacent protofilaments, but the numbers of such cross-links are too small for a definitive assessment of this possibility.
Complexity of Protofibrillar Composition: Promiscuity Limits Protofibril Length-Our data summarized in Fig. 2 document that there are multiple possible choices for an association of rows of molecules to form protofibrils, especially in reduced keratin and vimentin IF. This arises from a consideration of the manner in which the cross-links can be interpreted structurally on a two-dimensional surface lattice drawing. Although all cross-links reported here arise between one column of head-to-tail overlapped molecules and a parallel one that is four columns (or a multiple) apart, closer inspection reveals another level of complexity. By taking any molecule on the two-dimensional surface lattice as a basis point, some of the new cross-links fit naturally to a parallel molecule four columns over, going left or right. About half, however, are not explained on this basis: it is necessary to move over left/right to an adjacent column (all links are displayed this way in Fig. 2) or go either up/down to the next molecule in a given column (see Fig. 3B) to account for a linkage four columns over. In this case, at any point in the lattice all of the new cross-links can be resolved within a cluster of six laterally adjacent columns (Fig. 2).
FIG. 2. Drawings of two-dimensional surface lattices. These are for vimentin (left panels), cytokeratin and reduced trichocyte keratin (center panels), and oxidized trichocyte keratin (right panels). The positions of the DST cross-links are shown and occur between molecular strands that are four apart on the surface lattice (red lines). In the cases of the vimentin and reduced keratin IF, even though the cross-links are positioned identically, the structural coherence of the protofibril can be defined in several ways (see Fig. 3). However, this may not be the case for oxidized trichocyte IF. Note that many cross-links for reduced trichocyte keratins and cytokeratins involve the same residues; those unique to the reduced trichocyte data set are shown by broken lines. In addition, in this figure, one set only of observed cross-links is represented for each of the three independent IF.
Considerations of how and at what level of IF hierarchy this arises are illustrated in Fig. 3. This is based on the realization that IF elongate by end-to-end joining of ulf entities. At the one ulf stage (Fig. 3A), the organization of four molecular strands into protofibrils is readily apparent from our new cross-linking data; the protofibrils would consist of strands 1-4 (shaded green) (denoted as protofibril a), 5-8 (b), 9-12 (c), and 13-16 (d). As an example, the K5 1A-31/K5 1B-61 cross-link (see item 1 of Table I) links rows 1 and 5 (blue lines in Fig. 3A), 5 and 9 (red), or 9 and 13 (pink). These respectively link protofibrils a to b, b to c, and c to d. However, at the two-ulf stage, there are multiple solutions for each cross-link datum (Fig. 3B). Using the example above, the red cross-linkage joins rows 5 and 9 at one point, and rows 4 and 8 or rows 6 and 10 at other points, simply by moving over left or right to the next molecular strand or up and down to the next molecule on the two-dimensional lattice. This means that a protofibril could consist of rows 5-8 (shaded green, as above), or rows 3-6 (shaded yellow, denoted as protofibril a′), or rows 7-10 (yellow, b′). This level of potential complexity continues and extends as more ulf particles add to the elongating IF (not illustrated). Accordingly, there are multiple possible ways of gathering four elongated columns together to make a protofibril, and it seems likely that this grouping could change along the length of an IF. Hence, the association of molecules is potentially promiscuous. However, all of the available cross-links denoting a protofibril structure in oxidized trichocyte IF can be accounted for by the same set of four molecular strands (Fig. 2); that is, the grouping of four rows of molecules does not overlap so that promiscuity may not occur in this case. This may be because of extensive intraprotofibril disulfide bonding.
FIG. 3. ... Table I) cross-link between adjacent protofibrils: between rows 1 and 5 (blue), 5 and 9 (red), or 9 and 13 (pink). Assuming the individual rows and protofibrils fold in a compact manner, perhaps in some way envisaged by Fraser et al. (15), the blue cross-link joins protofibril a to b; the red cross-link joins protofibril b to c, and so on. However, another level of complexity arises when ulf join, presumably by maintaining discrete rows joined through A12 overlaps (yellow) (panel B). Now there are multiple solutions for the K5 1A-31/K5 1B-61 cross-link, involving adjacent rows. Using the above red cross-link as an example, cross-links occur between rows 4 and 8, 5 and 9, and 6 and 10; that is, these cross-links could denote linkages between five possible protofibrils, consisting of rows 1-4 (protofibril a, shaded green), 3-6 (a′, shaded yellow), 5-8 (b, green), 7-10 (b′, yellow), 9-12 (c), or 11-13 (c′, yellow). It is therefore possible that IF display promiscuous aggregation characteristics because the grouping of four molecular strands could vary along the IF. The expected result is that elongated protofibril subfilamentous entities do not exist and therefore would not be expected to be seen in unraveled IF. Note that in this drawing, reduced keratin molecular alignments are used: similar promiscuity is evident in vimentin. The figure is not drawn to scale, and no attempt has been made to display end domains.
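The row bookkeeping in this example can be sketched in a few lines of Python. The sketch assumes the fixed one-ulf register described above (strands 1-4 as protofibril a, 5-8 as b, and so on) and simply shifts the cross-linked pair by whole strands to reproduce the alternative row pairs at the two-ulf stage; the function names are hypothetical and the code does not model the lattice geometry itself.

```python
# Illustrative sketch only: row arithmetic for four-apart cross-links.
STRANDS_PER_PROTOFIBRIL = 4


def protofibril_label(row):
    """Label (a, b, c, ...) of the protofibril holding a 1-based row in the one-ulf register."""
    return "abcdefgh"[(row - 1) // STRANDS_PER_PROTOFIBRIL]


def linked_rows(row, shift=0):
    """Row pair joined by a four-apart cross-link, optionally shifted by whole strands."""
    start = row + shift
    return start, start + STRANDS_PER_PROTOFIBRIL


# One-ulf stage: the same cross-link joins rows 1 and 5, 5 and 9, or 9 and 13.
one_ulf = [tuple(protofibril_label(r) for r in linked_rows(row)) for row in (1, 5, 9)]

# Two-ulf stage: the red cross-link may also fall one strand to the left or right.
two_ulf_red = [linked_rows(5, shift) for shift in (-1, 0, 1)]

print(one_ulf)
print(two_ulf_red)
```

The first list reproduces the a-to-b, b-to-c, and c-to-d links of the one-ulf stage, and the second lists the row pairs 4 and 8, 5 and 9, and 6 and 10 that give rise to the alternative groupings of four strands discussed above.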
Also, these considerations offer a simple explanation as to why no investigator has yet consistently observed an organized elongated pattern of protofibrils in unraveled IF: protofibrils may have limited axial coherence arising from the competing modes of molecular aggregation that are possible in vivo.
Conclusions-The results presented here provide strong support for the protofibril concept in IF consisting of an 8-chain (4-molecule) entity. Because the α-helical chains in IF are right-handed and the coiled-coil molecules are left-handed, it is possible that the protofibrils (as regards their constituent molecular strands) will continue the alternation of hand and be right-handed. Indeed, all of the images of unraveling IF reported to date display a right-handed orientation (14,39). Furthermore, our data indicate that protofibrils in reduced IF might retain their coherence for only a short distance axially, perhaps just for one or only a few molecules in length, because of the potentially promiscuous association of molecules.
It remains to be seen, however, whether a protofibrillar structure also exists for collagen, although in this case it may be defined predominantly by a specific pattern of naturally occurring covalent cross-links. In this regard, the protofibrils may consist of a "fixed" set of molecular strands, as appears to be the case reported in Fig. 2 for oxidized trichocyte keratin IF. In the case of myosin thick filaments, we submit that an equivalent approach to that used here with IF may prove useful in defining the presence of a protofibrillar substructure.