What Curves α-Solenoids?

The α-helical solenoid proteins adopt a variety of elongated curved structures. They have been examined to identify the interactions that determine their curvature. A sequence pattern characteristic for strongly curved α-helical solenoids has been constructed and was found to match protein sequences containing the proteasome/cyclosome repeats. Based on this, a structural model of the repeat-containing domains of the Rpn1/S2 and Rpn2/S1 proteins, which represent the largest subunits of the 26 S proteasome, has been proposed. The model has a novel architecture resembling an α-helical toroid. Molecular modeling shows that these toroids have a central pore that would allow passage of an unfolded protein substrate through it. This implies that the Rpn1 and Rpn2 toroids are aligned along the common axial pores of the ATPase hexamer and form an “antechamber” of the 26 S proteasome. The proposed quaternary structure agrees with the available experimental data. It is suggested that the function of this antechamber is assistance to the ATPases in the unfolding of protein substrates prior to proteolysis. An evolutionary link between the PC repeat-containing proteins and tetratricopeptide repeat proteins is proposed.

Recently, a new class of protein structures that contain a continuous superhelical or solenoid arrangement of repeating structural units has emerged (1,2). The solenoid proteins, with their elongated and repetitive structure, contrast with globular proteins, which fold in more complex ways. The repeating units of the solenoids consist of one to several segments of secondary structure, among which are α-helices, β-strands, and 3₁₀-helices. There are purely α-helical or β-structural solenoid proteins and others having a mixture of the secondary structures.
The solenoid proteins adopt a variety of non-globular shapes ranging from straight rods to curved shapes resembling a horseshoe. In many instances, the curvature of these proteins predetermines the repertoire of molecules that can bind to them. For example, the α/β-solenoids of leucine-rich repeat (LRR) proteins have a horseshoe shape, and their concave surfaces can embrace certain globular protein domains (3). On the other hand, the β-solenoids have a rod-like shape, and their binding counterparts themselves may require concave surfaces to bind the β-helices (2). Therefore, understanding the interactions that control the curvature is important for correct prediction of the structure and function of the solenoid proteins. For some of them, the reasons for the curvature are well established. Indeed, the pronounced curvature of the α/β-solenoids formed by LRRs is the result of different diameters of the α-helices and β-strands when packed in a parallel arrangement (4,5). On the other hand, the purely β-helical structures do not have any curvature because their intercoil distance is determined by the distance between hydrogen-bonded β-strands, which remains the same on all sides of the solenoid (2). The highest variety of shapes is provided by α-helical solenoids. They can be straight (6), slightly curved (7), or fold into horseshoe-like structures (8,9). However, the interactions that control their shape are still poorly understood.
In this work, I systematically analyzed known three-dimensional structures of the α-helical solenoid proteins to identify interactions and amino acid sequence patterns that lead to their curvature. This analysis revealed two modes of interaction that are common to the curved structures: (i) the helices that form the concave surface have small Ala, Ser, and Gly residues in certain positions, and these residues reduce their interhelical distances; (ii) the "concave" helices also have large hydrophilic or aromatic side chains that intercalate between the helices of the convex surface and increase their interhelical distances. Reduction of the interhelical distances on one side of the solenoid and their increase on the opposite side result in the curved superhelical structure.
To test the predictive power of the principles described above, I constructed a sequence pattern characteristic for the strongly curved α-helical solenoids and ran this pattern against the sequence databases. It turned out that this sequence pattern matches several proteins containing so-called proteasome/cyclosome (PC) repeats (10). The members of this protein family include the two largest subunits of the 26 S proteasome and a protein of the anaphase-promoting complex (APC). The 26 S proteasome is a major intracellular protease in eukaryotes (11-15). Because of its central role in the ubiquitin-mediated proteolytic pathway, this large ATP-dependent enzyme is involved in a wide variety of cellular processes, including cell cycle regulation, antigen presentation, and inflammation. The three-dimensional structures and functions of its protein subunits containing PC repeats are not known. The match between the sequence pattern of the curved α-solenoid and sequences of the PC repeat-containing proteins leads to a model of the three-dimensional structure for the Rpn1 and Rpn2 proteins of the 26 S proteasome. In accordance with this prediction, the repetitive parts of these proteins fold into a structure with a novel architecture resembling an α-helical toroid. The Rpn1 and Rpn2 toroidal structures have a central pore of about 20 Å in diameter that would allow the passage of a peptide substrate. The axial symmetry and the central pore of the Rpn1 and Rpn2 structures suggest that they are aligned along the axial pore of the ATPase hexamer and the 20 S core particle and form an "antechamber" of the 26 S proteasome. The toroidal structures and their arrangement within the proteasome are consistent with the data of electron microscopy (16,17) and protein subunit interaction maps (13, 18-21). It is proposed that the function of this antechamber is to assist the ATPases in the unfolding of protein substrates prior to their proteolysis.
The prediction of the three-dimensional structures of Rpn1 and Rpn2 proteins and the quaternary structure of the base module of the 19 S regulatory particle and its docking onto the 20 S core particle all contribute to a better understanding of regulation in the ubiquitin-proteasome system and open possibilities for a number of experimental tests.

MATERIALS AND METHODS
Sequence Analysis-The PatternFind server (www.isrec.isb-sib.ch/software/PATFND_form.html) was used to search for motifs that are characteristic for the curved α-solenoids.
Molecular Modeling-An α/α-hairpin from the TPR domain of Hop protein (22) was used as an initial template to model a PC repetitive unit. The model was generated using molecular modeling modules implemented in Insight II (23). The interhelical loops were modeled by using a homology module of the Insight II program that allows a search for an appropriate loop conformation in the database of known structures. The resulting structures were energy-minimized by applying 100 steps of the steepest descent algorithm and 600 steps of the conjugate gradient algorithm. The consistent valence force field (24) and the distance-dependent dielectric constant were used for the energy calculations. The final structural models were evaluated using PROCHECK (25) and shown to meet all standard requirements for stereochemistry, packing, and H-bonding of protein structures. The atomic coordinates of the Rpn1 and Rpn2 models are available on the World Wide Web (cmm.info.nih.gov/kajava/). Images of the structures in Fig. 2 were generated using the MOLSCRIPT program (26) and those in Figs. 3C, 4A, and 5 using Insight II (23). The molecular surface in Fig. 4 was produced by the DINO program (www.dino3d.org).

RESULTS AND DISCUSSION
Interactions That Determine the Curvature of α-Helical Solenoid Proteins-To understand the origin of the curvature in α-solenoids, I systematically analyzed their three-dimensional structures. There are over 20 known three-dimensional structures of α-solenoids (1, 2). However, not all are suitable for the present analysis. The curvature is well determined for the structures formed by the uniform packing of several repeated units lying in the same plane. The majority of such structures are α-solenoids containing two-α-helix repetitive units (α/α-solenoids). They are formed by α/α-hairpins stacked in a parallel manner, so that their overall structures are almost "flat" and have pronounced curvatures (Fig. 1). These proteins were the focus of my study. Among them were proteins containing HEAT repeats (9,27,28), tetratricopeptide repeats (TPR) (29-31), and a recently determined MalT protein (32). A leucine-rich repeat variant protein having an α/3₁₀-solenoid structure (7) was also included in the analysis. In addition, I considered the oligomeric structure of the F₁F₀-ATP synthase (33), which is formed by several individual α/α-hairpin subunits stacked in a consecutive array so that they resemble curved α/α-solenoids but lack covalent connections between the hairpins.
The leucine-rich repeat variant (LRV) protein (7) has the smallest curvature among the analyzed structures (Fig. 2A). Its repeated structural unit is a hairpin formed by an α-helix and a 3₁₀-helix. The α- and 3₁₀-helices of each hairpin are parallel to their helix counterparts in a neighboring repeat. This creates a structure composed of a double layer of helices, with the α- and 3₁₀-helices forming the inner (concave) and outer (convex) faces, respectively (Fig. 2A). This is a surprising arrangement because the average diameter of the α-helix is larger than the diameter of the 3₁₀-helix, and one would expect the opposite curvature. The observed curvature is partially explained by the fact that the 3₁₀-helices do not have "knob-into-hole" hydrophobic interactions with each other. As a result, the packing between the outer 3₁₀-helices is looser than between the inner α-helices. On the other hand, the analysis revealed that most of the α-helices have small Ala residues (Fig. 2A) at positions that determine their interhelical distance within the inner layer (denoted as il positions in Fig. 1). In addition, it was noticed that the α-helix has a conserved Arg residue among the residues that form the hydrophobic interior of the solenoid (ii positions in Fig. 1). This Arg_ii intercalates between the 3₁₀-helices, and its charged guanidinium group escapes the hydrophobic environment (Fig. 2A) and forms an ionic bond with an Asp or Glu side chain. This suggests that the intercalation of the Arg_ii residues increases the distances (d_out) between the 3₁₀-helices, that the small Ala_il residues decrease the distances (d_in) between the α-helices, and that these are the main factors governing the curvature of the LRV protein.
Several α-solenoid structures with well defined curvature belong to a family of HEAT repeat-containing proteins (27). Among them are importin-β (28) and the PR65/A subunit of protein phosphatase 2A (9) (Fig. 2B). Analysis of these structures showed that a highly conserved Pro residue in the middle of the outer α-helix induces formation of a π-helical turn, which makes the α-helix wider. The inner α-helix is also distorted in the middle by a couple of 3₁₀ conformations that make the inner α-helix thinner. Forces generated by packing of the helices probably induce the distortion of the inner helix. The difference in the diameters of the inner and outer helices causes the curvature. The side chains also contribute to the curvature. Similarly to the LRV protein, the HEAT proteins have conserved Arg/Lys residues within the apolar cluster of the inner helix, which intercalate between the outer helices. They also contain small Ala residues in il positions that determine the interhelical distances within the inner layer (Fig. 2B).
The repetitive units of the curved superhelical structure of TPR proteins consist of regular α-helices (Fig. 2C). The analysis shows that TPR proteins, analogous to LRV and HEAT proteins, frequently have small Ala and Ser residues in several il positions controlling the d_in distance (Fig. 2C). The ii positions of the TPR inner helices have a preference for bulky aromatic Trp and Tyr residues that intercalate between the outer helices similarly to the Arg/Lys of the LRV and HEAT protein structures. Thus, the decrease of the d_in distance due to the location of small side chains in the il positions and the increase of the d_out distance due to the intercalation of Trp or Tyr protruding from the ii positions are probably the reasons for the curvature. Small Ala residues in the oi positions of the outer helix allow the outer and inner α-helical layers to be closer to each other (Fig. 2C), and this can increase the curvature.
FIG. 1. A schematic representation of three turns of a curved α/α-solenoid viewed along the α-helical axis. Arrows denote orientations of side chains, which determine the distance d_in between α-helices of the inner layer (il) and the distance d_out between α-helices of the outer layer (ol), as well as orientations of the internal side chains belonging to the inner helix (ii) and outer helix (oi). The decrease of the interlayer distance (d) can increase the curvature of the α-solenoid at given d_out and d_in.
Recently, an α/α-solenoid structure of the transcription factor MalT from Escherichia coli has been determined (32). It has a strong curvature combined with a small twist (Fig. 2D). Again, small Ala often occupies two il positions of the inner helix, whereas one of the three ii positions always has Gln, Arg, or Tyr residues intercalating between the outer helices. Similarly to TPR proteins, MalT protein also has Ala in two oi positions (Fig. 1) and, in turn, a reduced distance between the helical layers.
The three-dimensional structure of the transmembrane oligomer of the H⁺-transporting ATP synthase from E. coli has been modeled (33) by using the solution structure of monomeric subunit "c" (36) and several intersubunit distances derived from cross-linking experiments (Fig. 2E). It is a hollow cylinder formed by 12 individual α/α-hairpin subunits stacked in a consecutive array so that their packing is identical to the one in the curved α/α-solenoids, with the only difference that in the solenoids these hairpins are covalently linked to each other. The analysis shows that the inner helices are packed tighter than the outer ones due to a high content of Ala and Gly residues in almost all il positions. There are also residues with small side chains in ii and oi positions that reduce the interlayer distance and contribute to the increase of the curvature.
FIG. 2 (legend, continued). ... (32), a representative of SUPR repeat-containing proteins; and E, an oligomer "c" of H⁺-transporting ATP synthase (PDB code 1J7F) (33). The latter structure consists of several monomers (one monomer is shown in green). On the right, there are cross-sections of a seven-helical fragment of the curved structure, a consensus sequence, and secondary structure of the repeats. The il residues, which reduce the distance between the inner (concave) helices, are shown in red, whereas ii residues, intercalated between the outer helices, are in blue. The α-helices are shown as cylinders in the secondary structure schemes. The consensus sequences have been taken from the literature or updated in this work: LRV repeat (7), HEAT repeat updated based on Ref. 27, TPR-updated, SUPR repeat (32). Capital letters indicate more than 40% occurrence of a given residue in a certain position, and lowercase letters indicate occurrence of a given residue more than 30%.
Thus, despite the fact that the curvature of some α-solenoids is partially driven by specific interactions (e.g. π- and 3₁₀-turns of HEAT proteins), they share common structural features as follows: first, frequent occurrence of small Ala, Ser, and Gly residues in il positions, which reduces the distance between the inner helices; and second, an intercalation of large hydrophilic or aromatic side chains from the inner helix into the space between the outer helices, which increases the distance between the outer helices. It is worth mentioning that a similar conclusion has been made previously about the origin of the curvature of the ankyrin repeat proteins (37,38). Although these proteins are not completely α-helical, the major part of their structure consists of pairs of antiparallel α-helices stacked side by side. It was suggested that conserved non-polar side chains associated with the inner helices tend to be smaller in volume than those associated with the outer helices. At the same time, in accordance with our conclusion, a fragment of clathrin, a straight rod-like α/α-solenoid (6), does not have small residues between the helices. In contrast, its hydrophobic core has an unusually high number of aromatic residues.
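The relation between the three distances of Fig. 1 and the curvature can be made concrete with a small geometric sketch. It is an idealization that is not part of the original analysis: the inner and outer helix axes are assumed to lie on two concentric circles separated by the interlayer distance d, with the two helices of each repeat aligned radially, so that d_in and d_out are chords subtending the same angle. The function name and the numerical values are hypothetical and serve only as an illustration.

```python
import math

def solenoid_curvature(d_in, d_out, d):
    """Idealized two-layer solenoid (assumption: concentric circular layers).

    d_in  : distance between neighboring inner-layer helix axes
    d_out : distance between neighboring outer-layer helix axes
    d     : interlayer distance
    Returns (angle per repeat in radians, inner-layer radius, repeats per full turn).
    """
    if d <= 0 or d_in <= 0 or d_out <= d_in:
        raise ValueError("expected d > 0 and d_out > d_in > 0")
    # Chords on concentric circles: d_in = 2*R*sin(t/2), d_out = 2*(R + d)*sin(t/2)
    half_sine = (d_out - d_in) / (2.0 * d)
    if half_sine >= 1.0:
        raise ValueError("distances are not compatible with this geometry")
    theta = 2.0 * math.asin(half_sine)
    r_in = d_in / (2.0 * half_sine)
    return theta, r_in, 2.0 * math.pi / theta

# Hypothetical distances in angstroms, chosen only for illustration.
theta, r_in, per_turn = solenoid_curvature(d_in=8.0, d_out=12.0, d=10.0)
print(f"{math.degrees(theta):.1f} degrees per repeat, inner radius {r_in:.1f} A")
```

In this simplified picture, a smaller d_in, a larger d_out, or a smaller d each increase the angle per repeat, which agrees qualitatively with the trends described above.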
The discovered principles of the solenoid curvature, in turn, suggest amino acid substitutions that can change the original curvature of the α-solenoid proteins. These mutations can be used to understand the importance of the curvature for proper biological function of such proteins. They also allow the design of novel α-solenoids with a desired curvature.
Proteins with Proteasome/Cyclosome Repeats Contain Sequence Motifs Characteristic for the Curved α-Solenoids-Understanding the molecular basis of the α-solenoid curvature opens a possibility for prediction of yet unknown curved α-solenoid structures. In order to search for such structures, the three-dimensional information about the location of small residues in il positions of the curved α/α-solenoids was translated into a sequence motif. In constructing the sequence pattern, I also took into consideration the fact that the known α/α-solenoids have repetitive sequences with the repeat length ranging between 30 and 40 residues (1, 39). Therefore, the Ala/Gly-rich motif was repeated 3 times and looked as follows in PROSITE format: . It turned out that this sequence pattern matches several proteins containing PC repeats (10). The best characterized members of this protein family include the two largest subunits of the 26 S proteasome and a protein of the anaphase-promoting complex. I further examined the proteins of the 26 S proteasome (termed Rpn1 and Rpn2 in the Saccharomyces cerevisiae nomenclature or S2 and S1 in the mammalian one) using sequence analysis and molecular modeling in order to find additional support for the hypothesis that Rpn1/S2 and Rpn2/S1 have strongly curved α-solenoid structures.
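To illustrate how a pattern in PROSITE format can be run against a sequence, the following minimal sketch converts the basic PROSITE syntax into a regular expression. It is not the PatternFind server and not the pattern used in this work: the converter covers only the common syntax elements, and both the example pattern and the toy sequence are hypothetical placeholders.

```python
import re

# One PROSITE element: residue, x, [allowed] or {excluded}, optional (n) or (n,m).
_ELEMENT = re.compile(
    r"(<?)([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)

def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)

def find_matches(pattern, sequence):
    """Return (start index, matched segment) for all, possibly overlapping, hits."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]

# Hypothetical pattern and toy sequence, for illustration only.
print(find_matches("[AG]-x(3)-[AG]-x(3)-[AG]", "MKAGLLAVLSGQQ"))
```

A lookahead is used so that overlapping hits, which are expected in repetitive sequences, are all reported.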
Structural Models of Rpn1 and Rpn2 Subunits of the 26 S Proteasome-The 26 S proteasome is formed by the association of two subcomplexes, a 20 S proteolytic core and a regulatory 19 S particle that caps the core at both ends (11,13,14). Although the three-dimensional structure and enzymatic mechanisms of the 20 S particle are known (40,41), the structure of the 19 S cap is still unknown, and its functions are less defined.
The two largest proteins of the 19 S particle, called Rpn1/S2 and Rpn2/S1, have tandem arrays of similar 34-40-residue repeats within their C-terminal halves (10). The structures and functions of these repetitive domains are not known. Previously, it was suggested (10) that the three-dimensional structures of Rpn1/S2 and Rpn2/S1 are similar to the one of the LRR proteins. However, significant differences in the lengths and sequence patterns of LRR and PC repeats and an inability of the LRR-like models to explain the conservation of Gly/Ala in certain positions of PC repeats cast doubt on the validity of this model. On the other hand, a number of successful predictions of proteins with repeats below 40 residues suggest that structural prediction for proteins with repeats can be simpler and more reliable than prediction for globular proteins (39, 42-44). This, together with the identification of the sequence motif characteristic for the curved α-solenoids in the PC repeat-containing proteins, prompted a more comprehensive sequence and modeling analysis of the repetitive regions of Rpn1/S2 and Rpn2/S1 proteins.
Sequences of Rpn1 and Rpn2 proteins from yeast (993 and 945 residues) share significant similarities (~20% identity). Both contain nine PC repeats of 35-40 residues (10), which are preceded by two more-divergent repeats (Fig. 3A). The arrays of the repeats are flanked by aperiodic N- and C-terminal sequences of unknown but, most probably, globular structures. The repetitive region of Rpn1 is interrupted in the middle by a hydrophilic negatively charged region of 100 residues.
Alignment of the repeats showed that they consist of two highly conserved parts separated by a variable region: the first part is characterized by an alternating pattern of large aliphatic residues and Gly or Ala residues, and the second one by several conserved positions of large aliphatic residues (Fig. 3B). As described above, the first region is enriched in Gly and Ala residues and matches the sequence motif characteristic for the curved α-solenoid. This suggests that this region corresponds to the inner α-helix, and the small Gly and Ala residues are clustered on their interfaces. Current state-of-the-art methods of secondary structure prediction, for example, PSIPRED (45), also predict an α-helix for this region of the repeat. It is worth mentioning that this conclusion differs from the previous result of the secondary structure predictions (10, 46), which suggested a β-structural conformation for this region. The second part of the repeat contains a pattern of hydrophilic and aliphatic residues, which, as was noticed before (10), has a periodicity of amphiphilic α-helices. It can be assigned to the outer helix of the solenoid. Two regions of variable length located in the middle and end of the repeat can be assigned to the loops connecting the inner and outer α-helices (Fig. 3B).
FIG. 3. PC repeats. A, arrangements of PC repeats within Rpn1 and Rpn2 subunits of the yeast 26 S proteasome. Each subunit contains nine well identified repeats (open boxes) and two "covert" repeats (hatched boxes). In Rpn1, a long negatively charged region (wavy line) interrupts the array of the repeats. In Rpn2, a long highly hydrophilic region denoted by a wavy line follows the repeats. B, alignment of TPR, PC repeat consensus sequences and PC repeats of Rpn2 from yeast. Rpn1 has a similar alignment of PC repeats (not shown). Boxes show the α-helical regions observed in TPR proteins and proposed for the PC repeat models. Columns containing residues with similar physicochemical properties are in boldface type. Internal ii and oi residues are shown in red. The il residues, which determine distance d_in between the inner α-helices, are shown in green. Indexes 1* and 2* denote two covert repeats. C, the three-dimensional structure of one PC repeat modeled using TPR structure as a template. Internal residues are shown in a ball-and-stick representation. The coloring of the residues is the same as used in B. q indicates a non-polar residue of consensus sequences.
Comparison of the PC repeats with those of the known α/α-solenoid structures revealed similarities between PC and tetratricopeptide repeats (Fig. 3B). TPR and PC repeats have similar length and consensus sequences. Furthermore, both types of proteins are involved in the same protein complexes (47) or biological pathways (30). This suggests that PC repeat-containing proteins and TPR proteins may be evolutionarily related to each other. An α/α-hairpin of TPR was used as a template to model a PC repetitive unit. In the modeled unit, the conserved aliphatic side chains are directed inside and form the hydrophobic core (Fig. 3C), whereas variable and mostly hydrophilic residues are exposed to the solvent. The next step was to estimate the curvature of Rpn1 and Rpn2 solenoids. Their curvature should be larger than that in the TPR structures because PC repeats have smaller residues in positions that control the distance between the inner helices. In addition, all known PC repeat proteins have a limited and almost identical number of repeats. Such constancy in the repeat number is unusual for "open" repetitive structures but can be explained by structures where the first and the last repeat interact with each other to "close" the structure. This limited and constant number of repeats suggests that the array of PC repeats folds into a toroid rather than a solenoid structure. It is worth mentioning that TPR solenoids have a slight twist. If a similar twisting tendency exists between PC repeats, the overall toroid structure may be flattened by favorable interactions of the first and the last repeats. The symmetrical toroid structure of Rpn1 or Rpn2 can be obtained by rotation of the α/α-hairpin, corresponding to one PC repeat, around the 9-fold axis of symmetry (or around the 11-fold axis, if two less apparent repeats are also included in the structure).
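The construction of a symmetrical toroid by rotation of one repeat unit can be sketched in a few lines. This is only an illustration of the symmetry operation, not the Insight II procedure used for the models: the symmetry axis is assumed to coincide with the z axis, and the two coordinates standing for the inner and outer helix axes of one hairpin are hypothetical.

```python
import math

def symmetric_copies(points, n_fold):
    """Return n_fold copies of a point set, rotated about the z axis.

    points : list of (x, y, z) tuples describing one repeat unit
    n_fold : order of the rotational symmetry (e.g. 9 or 11)
    """
    copies = []
    for k in range(n_fold):
        angle = 2.0 * math.pi * k / n_fold
        c, s = math.cos(angle), math.sin(angle)
        copies.append([(c * x - s * y, s * x + c * y, z) for x, y, z in points])
    return copies

# Hypothetical unit: inner helix axis 14 A and outer helix axis 24 A from the z axis.
unit = [(14.0, 0.0, 0.0), (24.0, 0.0, 0.0)]
toroid = symmetric_copies(unit, 9)

# Distance between the inner-helix points of two neighboring repeats.
print(round(math.dist(toroid[0][0], toroid[1][0]), 2))
```

The distance between neighboring inner helices then follows from the chosen radius and the order of the symmetry axis, which is why the interhelical distance has to be fixed before the toroid is assembled.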
In order to choose optimal interhelical distances, several structures of three α/α-hairpins stacked in a consecutive array were generated. In these structures, the initial distance between the inner helices was identical to or slightly greater than the corresponding 8-Å distance of the ATP synthase oligomer (33). After a session of energy minimization (100 steps of the steepest descent algorithm and 500 steps of the conjugate gradient algorithm implemented in Insight II (23)), these structures converged to one with a distance of about 9.5 Å. This distance was chosen to model the toroids of Rpn1 and Rpn2 proteins. After the modeling of interhelical loops, the structures were energy-minimized and were shown to meet all standard requirements for protein structures. Fig. 4A displays a model of the PC repeat-containing region of Rpn2 from yeast. Nine α/α-hairpins corresponding to the nine PC repeats form a superhelix where the first and the last units interact to close the symmetrical structure. The model explains the residue conservations that are observed in the sequence alignments of the PC repeats (Fig. 3B). Indeed, the conserved aliphatic residues of both inner and outer helices form the hydrophobic core of the toroid. Small Ala and Gly residues in the il positions allow a tighter packing of the inner helices compared with the outer helices. This leads to the strong curvature of the toroid. Similarly to the known TPR protein structures, the inner α-helix of the model ends with a glycine-specific α_L-conformation that provides a short crossover to the outer helix. The occurrence of this α_L-conformation can explain why PC repeats frequently have a Gly residue in this position (Fig. 3B). The model has a central pore of about 20 Å in diameter. The wall of the pore is hydrophobic with a ring of positively charged residues bordering the pore (Fig. 4B).
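A short calculation shows how the chosen interhelical distance and the number of repeats fix the size of the inner ring. It is a sketch under a simplifying assumption that is not stated in the text: the inner helix axes are treated as the vertices of a regular polygon whose side equals the interhelical distance. The function name is hypothetical.

```python
import math

def ring_radius(neighbor_distance, n_units):
    """Radius of the circle through n_units points spaced neighbor_distance apart."""
    if n_units < 3 or neighbor_distance <= 0:
        raise ValueError("need at least 3 units and a positive distance")
    return neighbor_distance / (2.0 * math.sin(math.pi / n_units))

for n in (9, 11):
    radius = ring_radius(9.5, n)
    print(f"{n} repeats: inner-helix axes on a ring of radius {radius:.1f} A")
```

For 9 repeats and a distance of 9.5 Å, the helix axes lie roughly 14 Å from the toroid axis. Subtracting an assumed helix radius of about 4 to 5 Å (a rough figure that depends on the side chains lining the pore) leaves a pore of roughly 18 to 20 Å in diameter, which is of the same order as the value reported for the model.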
Interestingly, both Rpn1 and Rpn2 models have hydrophobic pores but only the Rpn2 model has a ring of positively charged residues. The remaining surface of the molecules is hydrophilic as in a typical soluble protein.
The proposed toroidal structure represents a novel protein fold. Among the previously determined structures, which have similar hollow cylindrical shapes, are the ATP synthase oligomer (33) (Fig. 2E) and the light-harvesting complex of photosynthetic bacteria (48). However, these are membrane proteins, and their cylinders are formed by several monomers.
It is worth mentioning that although Rpn1, Rpn2, and the other PC-containing proteins have 8-9 well recognized repeats, they also have two additional, more divergent repeats (Fig. 3A). The current theoretical analysis cannot unambiguously predict whether the toroid contains 9 or 11 repeat units. Clarification of this point will need subsequent experimental tests.
Modeling of the Quaternary Structure of the 26 S Proteasome-In the 26 S proteasome, the 19 S regulatory particles associate with either one or both ends of the 20 S proteolytic core complex (11,16). The crystal structure of the 20 S core and its main enzymatic mechanisms have been determined (40,41). However, the overall organization and function of the 19 S regulatory particle are less defined. The 19 S particle is composed of two structurally distinct modules called the base and the lid (49). The base contains eight subunits (six ATPases of the AAA family and two large subunits Rpn1 and Rpn2) and is connected with the lid by Rpn10 protein (49). Like other members of the AAA family, the ATPases of the base should assemble into a hexameric ring that forms the interface of the 19 S complex with the 20 S core (50). A reliable structural model of the ATPase hexamer can be obtained by using the known crystal structures of the other ATPase rings (51). Together with the prediction of the repetitive structures of Rpn1 and Rpn2, this model opens a possibility to build the three-dimensional structure of the base and to dock it to the 20 S core particle.
FIG. 4. A toroidal model of the PC repeat-containing domain of Rpn2 from yeast. The structure may contain either 9 or 11 PC repeats. A model with 9 repeats is shown. Top, an axial projection of the structure. A ribbon shows the protein backbone. The van der Waals contour of the toroid is shown in gray. Bottom, the surface of the toroid pore. The figure represents a section through the toroid axis. The pore of Rpn2 is mostly apolar (green) and bordered on one side by a ring of positively charged residues (blue). Negatively charged residues are in red.
The main part of the ATPase hexamer, which is involved in ATP binding and formation of the hexameric core, has been modeled using the crystal structure of the C-terminal part (residues 209-458) of the AAA ATPase p97 (51). These parts have ~40% sequence identity, which gives confidence in the modeling. On the other hand, the first ~100 residues of the 19 S ATPases that form the hexamer periphery do not have good templates for the modeling. However, even a crude model of this part would be instrumental in fitting the ATPase/Rpn1/Rpn2 structure into the known electron microscopy density. An α-helical coiled-coil hairpin structure of seryl-tRNA synthase from Thermus thermophilus (52) was used to model these regions because most of them have two adjacent α-helical coiled-coil motifs (53). The next ~100 residues have ~20% sequence identity to the second N-terminal domain (residues 112-190) of the ATPase p97 (51), and this four-stranded β-barrel was used for the modeling. Thus, a final model of the 19 S ATPase represents the N-terminal α-helical coiled-coil hairpin followed by the small β-barrel domain and the core domain of the hexamer. It is important to mention that the ATPases of the proteasome are related but nonetheless distinct from each other. Despite this, I modeled only one of them (Rpt1) and used it to construct a 6-fold symmetrical hexamer. Because the goal of the ATPase modeling in this work was to fit the structural models into the electron microscopy density, such an approximation was appropriate.
In accordance with the results of the modeling, the ATPase hexamer and the Rpn1 and Rpn2 toroids have axial symmetry and central pores. If this is true, then the ATPase hexamer and the toroids are probably all aligned along the axial pore of the 20 S core particle. Thus, the proposed structural model of the 19 S base placed on the top of the known 20 S core particle (40) has a common central pore of about 20 Å in diameter that would allow passage of a peptide substrate into the proteolytic chamber of the 20 S core (Fig. 5). The model of the complex fits well into the electron microscopy density of the 26 S proteasome (16,17). It can also explain the observed shape of an incomplete form of the proteasome with only the base of the 19 S particle on the top of the 20 S core (49). This form of the proteasome, similar to the suggested model, has a well recognized cylinder of a smaller diameter, nested on the cylindrical holder of a larger diameter. However, the thin cylinder of the model is longer than the one obtained by averaging the complexes observed on the electron micrographs (49) (Fig. 5). This can be explained by a limited resolution of the micrographs and by a possibility that the average image includes a mixture of complexes with one and with two Rpn1 and Rpn2 molecules. Indeed, the expected volume of the base particle can be calculated from the total mass of the subunits (Rpn1, Rpn2, and Rpt1-6), and this calculation gives a larger volume than the one obtained by the averaging of the images.
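The kind of mass-to-volume estimate mentioned above can be sketched as follows. The text does not state how the calculation was done, so this is only one common approach: the volume is taken as the mass times a partial specific volume, for which a generic value of 0.73 cm³/g for proteins is assumed here. The subunit masses in the example are round hypothetical placeholders, not the actual masses of Rpn1, Rpn2, or the Rpt ATPases.

```python
AVOGADRO = 6.02214076e23           # molecules per mole
PARTIAL_SPECIFIC_VOLUME = 0.73     # cm^3 per gram, assumed generic protein value
CUBIC_ANGSTROM_PER_CM3 = 1.0e24    # 1 cm = 1e8 angstroms

def protein_volume(mass_da, specific_volume=PARTIAL_SPECIFIC_VOLUME):
    """Approximate anhydrous volume (cubic angstroms) of a protein of given mass (Da)."""
    if mass_da < 0:
        raise ValueError("mass must be non-negative")
    return mass_da * specific_volume * CUBIC_ANGSTROM_PER_CM3 / AVOGADRO

def complex_volume(subunit_masses_da):
    """Sum of the estimated volumes of all subunits in a complex."""
    return sum(protein_volume(m) for m in subunit_masses_da)

# Hypothetical placeholder masses (Da): two large subunits and six ATPases.
masses = [100_000, 100_000] + [50_000] * 6
print(f"estimated volume: {complex_volume(masses):.0f} cubic angstroms")
```

With this assumption, each dalton contributes about 1.2 Å³, so the estimate scales linearly with the total mass of the subunits included in the complex.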
In the model, one of the Rpn1 and Rpn2 toroids is stacked on top of the other. Previous studies used a "side-by-side" arrangement to describe the location of Rpn1 and Rpn2 within the 26 S proteasome (21), probably because both proteins can interact with the ATPase subunits (13,19-21). The stacked arrangement proposed here can also explain these experimental data inasmuch as both Rpn1 and Rpn2 form contacts with the ATPase hexamer (Fig. 5). Due to a lack of definitive experimental information, it is difficult to predict which of the two subunits is on top. The experimental evidence supporting interactions of the ATPase subunits with Rpn1 (13,18,19) outweighs that indicative of interactions with Rpn2 (20,54), and this suggests that Rpn1 can be a "buried" pedestal for Rpn2. On the other hand, an indication that the yeast protein Rad23 binds the proteasome by directly interacting with the repetitive region of Rpn1 (55) favors an arrangement with Rpn1 on top of the base. The suggested direct interaction between the toroids is supported by the fact that Rpn1 and Rpn2 have some affinity for each other (13). Interestingly, the overall shape of this dimer resembles the hollow cylinder of the 11 S regulator particle, which sometimes caps the 20 S proteolytic core (56).
In the model, the face of the ATPase hexamer contacting the 20 S particle was chosen based on the arrangement observed in the structure of the ATP-dependent HslVU protease (57). The peripheral β-structural and coiled-coil domains of the ATPases interact with each other, forming a nest for the Rpn1/Rpn2 toroids (Fig. 5). As mentioned above, there are two additional "covert" repeats in the sequences of the Rpn1 and Rpn2 proteins, and these proteins may fold into wider 11-repeat toroids (~60 Å in diameter versus ~55 Å for the 9-repeat toroids). A model of the 11-repeat toroid was also built, and its docking demonstrated that it also fits into the ATPase "nest" (see the buried toroid in Fig. 5).
FIG. 5. Superposition of the observed EM density of the 26 S proteasome (16) and the three-dimensional structure suggested for a complex between the 20 S core particle, ATPase hexamer, and Rpn1 and Rpn2 toroids. The crystal structure of the 20 S core particle (40) is shown in magenta; the structural model of the ATPase hexamer is shown in green, and the two stacked toroids of Rpn1 and Rpn2 are shown in blue. In this figure, the upper toroid consists of 9 repeats, whereas the buried toroid has 11 repeats to visualize the relationship between their sizes. However, in the proteasome, both toroids are assumed to have the same number of repetitive units. A red contour line corresponds to the EM density of the base of the 19 S regulatory particle (49).
The largest subunit of another important protein complex of the cell cycle, called the cyclosome or anaphase-promoting complex (APC) (47,58), also has PC repeats and can be modeled similarly to the Rpn1 and Rpn2 subunits of the proteasome. Given that the amino acid sequences of several other APC subunits are similar to the sequences of TPR proteins with known three-dimensional structures (30), the three-dimensional structures of almost all cyclosome subunits can be modeled. This may open the possibility of reconstructing the cyclosome quaternary structure and understanding its function.
Functional Implication of Rpn1 and Rpn2 Subunits-The structural models of Rpn1 and Rpn2 and their location within the 26 S proteasome suggest that they, together with the ATPase ring, unfold proteins destined for proteolysis. Indeed, the walls of the pores, which are covered by hydrophobic residues, provide an energetically favorable environment for unfolded protein substrates. The pores do not have enough room for partially unfolded protein structures, and this suggests that Rpn1/Rpn2 dimers unfold proteins starting from their N or C terminus. An intriguing detail of Rpn2 is a ring of positively charged residues bordering the pore. These positively charged residues are conserved among Rpn2 proteins from different organisms and are most probably important for function. For example, the positively charged ring can favor protein unfolding starting from the negatively charged C terminus, as was suggested for the similar AAA ATPase complex (59). Further experiments can test the hypothesis about the functional importance of the charged ring of Rpn2.
In connection with the function suggested for these proteins, it is interesting to compare the 26 S proteasome with its bacterial homologue, the ATP-dependent protease HslVU (60). HslVU does not have proteins similar to Rpn1 and Rpn2; however, it is fully competent in degradation and thus, presumably, in unfolding of protein substrates. The comparison shows that, instead, the HslVU ATPase ring has six characteristic distal domains I, which protrude from it and form an open-work antechamber of the protease complex (57,61). It was proposed previously (57,62) that these domains I unfold the substrate polypeptide. The structures formed by the bacterial domains I and by the eukaryotic subunits Rpn1 and Rpn2 share similar locations and shapes, and this suggests that their functions are analogous. In both cases, once the polypeptide is clamped in the ATPase pore, concerted movements promoted by ATP binding and hydrolysis may further facilitate its unfolding and translocation.
Evolutionary Relationships of PC Repeat-containing Proteins-The sequence analysis and modeling of PC repeat-containing proteins suggest that they may be evolutionarily related to TPR-containing proteins. Indeed, TPR and PC repeats have the same length and similar consensus sequences. Some of these proteins are subunits of common protein complexes or are involved in the same cellular processes (47). Furthermore, the molecular modeling suggests that the overall structures of these proteins are similar, with some differences in the curvature of their α-helical solenoids. Previously, it was also noticed that TPR-containing proteins and 14-3-3 proteins share common structural and functional properties despite their lack of obvious sequence similarity (63). Thus, these three protein classes may have a common ancestor that diverged into different families. Similarly to the proposed evolutionary scenarios for proteins containing LRRs (64), β-trefoil repeats (65), and ARM/HEAT repeats (66), the ancestor of 14-3-3, TPR, and PC repeat proteins could be a homo-multimer composed of one repeat.