An Intramolecular Chaperone Inserted in Bacteriophage P22 Coat Protein Mediates Its Chaperonin-independent Folding*

Background: The bacteriophage P22 coat protein has an HK97-like fold with an additional, genetically inserted domain. Results: The insertion domain provides thermodynamic stability to the full-length coat protein. Conclusion: The insertion domain acts as an intramolecular chaperone with the help of a peptidyl prolyl isomerase. Significance: This is the first study to establish the function of domain addition to a protein with the HK97 fold. The bacteriophage P22 coat protein has the common HK97-like fold but with a genetically inserted domain (I-domain). The role of the I-domain, positioned at the outermost surface of the capsid, is unknown. We hypothesize that the I-domain may act as an intramolecular chaperone because the coat protein folds independently, and many folding mutants are localized to the I-domain. The function of the I-domain was investigated by generating the coat protein core without its I-domain and the isolated I-domain. The core coat protein shows a pronounced folding defect. The isolated I-domain folds autonomously and has a high thermodynamic stability and fast folding kinetics in the presence of a peptidyl prolyl isomerase. Thus, the I-domain provides thermodynamic stability to the full-length coat protein so that it can fold reasonably efficiently while still allowing the HK97-like core to retain the flexibility required for conformational switching during procapsid assembly and maturation.

The proper folding of viral coat proteins is a crucial step in the virus life cycle that can drastically affect the subsequent assembly process and the production of infectious particles. The coat proteins of viruses with icosahedral symmetry must retain sufficient flexibility for assembly and maturation. During assembly, identical subunits are "switched" into the slightly different conformations required to occupy quasi-equivalent sites in the icosahedron, a process known as conformational switching. Conformational flexibility is also necessary for the rearrangements that occur during maturation of icosahedral capsids (1,2). Thus, a fine balance is required between conformational polymorphism and the stability of monomeric coat proteins. The HK97-like viruses (named after bacteriophage HK97), which include viruses with bacterial, archaeal, and eukaryotic hosts, have icosahedral symmetry and use structurally conserved coat proteins (3). Many viruses in the HK97-like structural group rely on host- or virus-encoded folding chaperonins for the proper folding of their coat proteins. One member of this group, bacteriophage P22, is an exception to this generality.
Bacteriophage P22, which infects Salmonella enterica serovar Typhimurium, has long been a model for studying the protein folding and assembly mechanisms of dsDNA icosahedral viruses (4). The assembly of bacteriophage P22 in vivo proceeds through a nucleation-limited reaction in which 415 copies of monomeric coat protein copolymerize with 60-300 scaffolding proteins, the dodecameric portal complex, and ejection proteins to form a precursor structure, the "procapsid" (5,6). Scaffolding protein directs the assembly of the procapsid but is not found in mature phage, so it acts as an assembly chaperone. Procapsids mature into virions as dsDNA is packaged through the portal complex (7) and scaffolding protein concomitantly exits (8). Maturation is accompanied by an increase in the volume of the head and a change in capsid morphology from nearly spherical to faceted (9,10).
The P22 coat protein (430 amino acids) contains 148 more residues than the mature HK97 coat protein (282 amino acids). A nearly complete, pseudoatomic model of the 47-kDa P22 coat protein based on three-dimensional image reconstruction and homology modeling (11) revealed that the additional residues are inserted into the primary amino acid sequence. The core (with the HK97-like fold) makes the bulk of the capsid contacts, and the additional domain is positioned as a protrusion. The majority of the known temperature-sensitive-folding (tsf) substitutions in the P22 coat protein cluster in this additional domain (4,14), indicating that it is important for the folding and stability of coat protein monomers. A subsequent higher resolution reconstruction and model of the coat protein (12) also revealed the domain, and its authors proposed alternative hypotheses about the function of the domain in procapsid assembly and stability. The domain has been referred to as the extra density domain (12,13) or the telokin-like domain (11), but we will henceforth refer to it as the "insertion" domain or I-domain.

Because there is evolutionary pressure on viruses to maintain the smallest possible genome, we reasoned that the I-domain must serve a vital purpose. When the N-terminal (residues …) and C-terminal (residues 191-430) halves of the P22 coat protein are expressed separately, they form inclusion bodies. However, the C-terminal half can be refolded into soluble protein that exhibits β-sheet content by circular dichroism (15). The autonomous folding of the C-terminal half suggests that it contains the folding nucleus of the P22 coat protein. The I-domain may therefore act as an intramolecular chaperone, eliminating the need for cellular chaperones to aid in proper folding of the coat protein. To elucidate the role of this domain, we characterized the isolated I-domain and the coat protein core.

* This work was supported, in whole or in part, by National Institutes of …

EXPERIMENTAL PROCEDURES
Chemicals, Buffers, and Media-Ultrapure urea was purchased from ICN. All other chemicals were of reagent grade and purchased from common sources. All in vitro experiments described were done in 20 mM sodium phosphate buffer (unless otherwise specified) made using Na2HPO4, with the pH adjusted to 7.6 with H3PO4. Luria broth (LB, from Invitrogen) was used to support bacterial growth.
Plasmids and Generation of Plasmid-encoded Variants-The I-domain plasmid (pID, comprising coat protein amino acids 223-345 with an N-terminal hexahistidine tag) was constructed by PCR subcloning using primers designed to amplify coat protein amino acids 223-345, with the addition of the hexahistidine tag, a stop codon, and appropriate restriction enzyme sites. The amplified region was digested with NdeI and SalI, ligated into a pET-30b expression vector, and transformed into E. coli strain BL21 (DE3) (18) by electroporation. Transformants were selected by kanamycin resistance, screened for the appropriate insert size by restriction digestion, and verified by DNA sequencing.
The core coat protein plasmid (pPC-Core, comprising coat protein amino acids 1-224 and 346-430 joined by a 5-glycine linker) was constructed from pPC, a pET-3a vector (Novagen) with an insert encoding both the scaffolding and coat protein genes (19). Inverse PCR (20) was performed with 5′-phosphorylated PCR primers that also encoded the 5-glycine linker. Following amplification, the plasmids were recircularized by ligation and transformed into S. enterica serovar Typhimurium strain SB300A#1 (17) by electroporation. Transformants were selected by ampicillin resistance and verified by DNA sequencing. Protein expression from the various constructs was verified by inducing cultures with 1 mM isopropyl β-D-1-thiogalactopyranoside.
Purification of the I-domain-The recombinant I-domain was overexpressed in E. coli strain BL21 (DE3). Cells were grown at 37°C to mid-log phase in LB medium containing 40 μg/ml kanamycin. Protein expression was induced with 1 mM isopropyl β-D-1-thiogalactopyranoside for 16 h at 30°C. After induction, the cells were harvested by sedimentation in a Sorvall SLC-6000 rotor at 5,368 × g for 20 min. The cells were resuspended in 20 mM sodium phosphate (pH 7.6) containing 0.1% (w/v) Triton X-100, lysozyme (200 μg/ml), and a 1:100 dilution of EDTA-free protease inhibitor mixture (Sigma) and then lysed using a French press operating at 20,000 p.s.i. The lysate was brought to 5 mM MgSO4, 0.5 mM CaCl2, and 1 mM PMSF and treated with DNase and RNase (100 μg/ml each) for 30 min. Cell debris was removed by sedimentation in a Sorvall F18-12 × 50 rotor at 38,725 × g for 30 min. The supernatant was loaded onto a 15-ml Talon metal affinity column (Clontech, Mountain View, CA) to purify the I-domain via the engineered N-terminal His6 tag. Fractions containing the I-domain were pooled, and the protein was precipitated with 57.5% ammonium sulfate at 4°C. The purified I-domain was dialyzed three times against 20 mM phosphate buffer (pH 7.6).
Purification of the Core Coat Protein-The core coat protein was overexpressed in SB300A#1. Cells were grown at 30°C to mid-log phase in LB medium containing 100 μg/ml ampicillin. Protein expression was induced with 0.2% L-arabinose (final concentration) for 4 h at 30°C. After induction, the cells were harvested by sedimentation in a Sorvall SLC-6000 rotor at 5,368 × g for 20 min, and the resulting pellets were resuspended in buffer B (50 mM Tris-HCl, 25 mM NaCl, and 2 mM EDTA (pH 7.6)) with a 1:100 dilution of EDTA-free protease inhibitor mixture (Sigma). The cell suspension was sonicated for 6 min, and the pellets were collected by centrifugation in a Sorvall F18-12 × 50 rotor at 38,725 × g for 30 min. The pellet was resuspended in washing buffer (2 M urea, 2% Triton X-100, and 0.2% β-mercaptoethanol in buffer B). This procedure was repeated twice. To remove the detergent and denaturant, the pellets were washed twice with washing buffer lacking Triton X-100 and urea. The core coat protein was solubilized with extracting buffer (9 M urea and 0.2% β-mercaptoethanol in buffer B) and clarified by centrifugation in a Sorvall RP80AT3 rotor at 100,000 × g for 1 h, and the supernatant was reserved. The core coat protein was refolded by extensive dialysis against 20 mM phosphate buffer (pH 7.6) at 4°C.
Purification of the WT Coat Protein-The WT coat protein was obtained from empty procapsid shells that had been prepared as described previously (21,22). The amber mutations in genes 2 and 13 prevent DNA packaging and cell lysis, respectively. Empty procapsid shells were generated from the purified procapsids by incubating them with 0.5 M guanidine hydrochloride, which extracts the scaffolding protein and the minor capsid proteins while leaving the coat protein lattice intact. Coat protein monomers were obtained from urea-denatured, empty procapsid shells. Briefly, urea-unfolded coat protein monomers were refolded by extensive dialysis against 20 mM phosphate buffer (pH 7.6 at 4°C) and clarified by centrifugation at 175,000 × g at 4°C for 20 min in a Sorvall RP80AT3 rotor.
Thermolysin Time Course of Digestion-Thermolysin from Bacillus thermoproteolyticus rokko (Sigma) was prepared in 2.5 M NaCl containing 10 mM CaCl2 (23). WT coat protein monomers, core coat protein, and the I-domain in 20 mM HEPES (pH 7.5) were digested with thermolysin at 20°C at an enzyme:substrate ratio of 1:250. At each time point, a sample was removed, quenched with reducing sample buffer containing extra SDS and EDTA (4.4% SDS and 75 mM EDTA), heated for 5 min at 95°C, and analyzed by 16% Tricine-SDS-PAGE (24). The sequence of the ~20-kDa peptide resulting from digestion of the WT coat protein was determined by in-gel digestion and liquid chromatography coupled with tandem mass spectrometry, performed by the Vermont Genetics Network Proteomics Facility.
Tryptophan Fluorescence-Fluorescence spectra were taken with an SLM Aminco-Bowman 2 spectrofluorometer thermostated at 20°C. The excitation wavelength was set at 295 nm, emission was monitored from 310-400 nm, and the excitation and emission band passes were set to 1 and 8 nm, respectively. All proteins were at a final concentration of 2.1 μM.
Circular Dichroism-CD experiments were done with an Applied Photophysics (Leatherhead, Surrey, UK) Pi-Star 180 circular dichroism spectropolarimeter with a 0.2-cm path length quartz cuvette maintained at 20°C with a circulating water bath. Refolded coat protein monomers were at final concentrations of 2.1 μM for the WT and 2.9 μM for the core. The I-domain was at a final concentration of 14 μM. All measurements were done in 20 mM sodium phosphate buffer (pH 7.6), with or without 6 M urea. Wavelength scans were done over 200-250 nm with the following settings: step resolution, 0.5 nm; bandwidth, 3.0 nm; and data averaging for 30 s/point, giving a scan time of ~52 min.
To determine the melting temperature of the I-domain, the temperature was ramped from 20 to 80°C in 0.5°C steps. At each temperature, the CD at 220 nm was averaged for 30 s. The raw data were normalized to show the fraction unfolded at each temperature.
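The normalization step can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes linear pre- and post-transition baselines, each given as a (slope, intercept) pair, defines the fraction unfolded as (S − S_N)/(S_U − S_N), and takes the transition midpoint as the temperature at which the fraction unfolded crosses 0.5. The function names and all numeric values are hypothetical.

```python
import numpy as np

def fraction_unfolded(temp, signal, native_fit, unfolded_fit):
    """Normalize a melt curve to fraction unfolded using linear baselines.

    native_fit and unfolded_fit are (slope, intercept) pairs describing the
    pre- and post-transition baselines: baseline(T) = slope * T + intercept.
    """
    temp = np.asarray(temp, dtype=float)
    signal = np.asarray(signal, dtype=float)
    s_native = native_fit[0] * temp + native_fit[1]
    s_unfolded = unfolded_fit[0] * temp + unfolded_fit[1]
    return (signal - s_native) / (s_unfolded - s_native)

def midpoint(temp, fu):
    """Temperature where the fraction unfolded first reaches 0.5 (linear interpolation)."""
    temp = np.asarray(temp, dtype=float)
    fu = np.asarray(fu, dtype=float)
    above = np.nonzero(fu >= 0.5)[0]
    if above.size == 0 or above[0] == 0:
        raise ValueError("transition midpoint is not bracketed by the data")
    i = above[0]
    t0, t1, f0, f1 = temp[i - 1], temp[i], fu[i - 1], fu[i]
    return float(t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0))

# Usage with a synthetic two-state melt (all values hypothetical).
T = np.arange(20.0, 80.5, 0.5)                      # 20-80 °C in 0.5 °C steps
fu_true = 1.0 / (1.0 + np.exp(-(T - 60.0) / 2.0))   # midpoint placed at 60 °C
native, unfolded = (-0.01, -1.0), (0.005, -3.0)     # (slope, intercept) baselines
s_n = native[0] * T + native[1]
s_u = unfolded[0] * T + unfolded[1]
cd = s_n + fu_true * (s_u - s_n)                    # simulated CD signal at 220 nm
fu = fraction_unfolded(T, cd, native, unfolded)
print(f"midpoint = {midpoint(T, fu):.1f} °C")
```

In practice the baselines would be fit to the flat regions before and after the transition; fitting them jointly with the transition is a common alternative.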
Equilibrium Urea Titration-Solutions of the folded I-domain and of the I-domain in 6 M urea were diluted to a final concentration of 14 μM (0.2 mg/ml). The solutions were then mixed with a Hamilton Microlab 50 titrator to final urea concentrations of 0-6 M. Samples were incubated at 20°C for 3 h before measurement. Separate equilibrium titrations beginning from only the folded or only the unfolded I-domain were done to confirm that folding is reversible. The CD signal at 220 nm was averaged for 30 s/point with a slit width of 4 nm. A 0.2-cm path length quartz cuvette maintained at 20°C with a circulating water bath was used for all measurements. The thermodynamic stability ($\Delta G^{\circ}_{N\text{-}U}$) and the sensitivity of the transition to denaturant ($m_{eq}$) were determined by fitting the equilibrium data, assuming a linear relationship between the free energy of unfolding and the denaturant concentration (25).
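A minimal sketch of the linear extrapolation analysis, assuming two-state behavior: in the transition region the fraction unfolded is converted to an equilibrium constant K = f_U/(1 − f_U), then ΔG = −RT ln K is regressed linearly on urea concentration, ΔG = ΔG(H2O) − m_eq[urea]. This simple point-by-point form stands in for a full nonlinear fit of the whole curve including baselines, which may differ from the fit actually used; the function name `lem_fit`, the 0.1-0.9 window, and the synthetic parameters are hypothetical.

```python
import numpy as np

R = 1.987e-3  # gas constant, kcal mol^-1 K^-1

def lem_fit(urea, fu, temp_c=20.0, window=(0.1, 0.9)):
    """Two-state linear extrapolation: dG([urea]) = dG_H2O - m * [urea].

    Only points with window[0] < fu < window[1] are used, where the
    equilibrium constant K = fu / (1 - fu) is well determined.
    Returns (dG_H2O in kcal/mol, m in kcal mol^-1 M^-1, midpoint Cm in M).
    """
    urea = np.asarray(urea, dtype=float)
    fu = np.asarray(fu, dtype=float)
    keep = (fu > window[0]) & (fu < window[1])
    rt = R * (temp_c + 273.15)
    dG = -rt * np.log(fu[keep] / (1.0 - fu[keep]))  # free energy of unfolding at each [urea]
    slope, intercept = np.polyfit(urea[keep], dG, 1)
    m = -slope
    return float(intercept), float(m), float(intercept / m)

# Usage with a synthetic, noise-free titration (hypothetical parameters).
urea = np.arange(0.0, 6.05, 0.1)
dG_true, m_true = 5.0, 1.5
rt = R * (20.0 + 273.15)
K = np.exp(-(dG_true - m_true * urea) / rt)
fu = K / (1.0 + K)
dG_fit, m_fit, cm_fit = lem_fit(urea, fu)
print(f"dG(H2O) = {dG_fit:.2f} kcal/mol, m = {m_fit:.2f} kcal/mol/M, Cm = {cm_fit:.2f} M")
```

Restricting the regression to the transition region avoids dividing by values of 1 − f_U that are dominated by baseline noise.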
Kinetics of Unfolding and Refolding-Unfolding and refolding kinetic experiments were done with the folded and unfolded I-domain, respectively. In each experiment, the final concentration of the I-domain was 14 μM. To initiate an unfolding reaction, the folded I-domain at 800 μM was diluted to 14 μM with buffered urea: the folded I-domain solution was placed in the bottom of the cuvette, and the buffered urea was pipetted onto it with constant stirring. Refolding experiments were done similarly with I-domain that had been denatured in 4.5 M urea; to initiate refolding, the unfolded I-domain at 400 μM was diluted to 14 μM with buffered urea solutions. The final urea concentration was determined by measuring the refractive index. The constantly stirred reactions were monitored by CD at 220 nm using a 1.0-cm path length quartz cuvette maintained at 20°C with a circulating water bath. This mixing method resulted in a dead time of ~5-7 s.
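The stock and final concentrations given above fix the dilution factors, and for refolding jumps they also set a floor on the lowest final urea concentration, because denaturant is carried over from the unfolded stock. The sketch below does that arithmetic under two stated assumptions: volumes are additive and the diluent contains no urea. The helper name `dilution` is hypothetical; the actual final urea concentrations were measured by refractive index.

```python
def dilution(stock_uM, final_uM, stock_urea_M=0.0, diluent_urea_M=0.0):
    """Return (dilution factor, final urea in M) for a single-step dilution.

    Assumes additive volumes: the stock contributes a volume fraction of
    1 / factor, and the diluent makes up the rest.
    """
    factor = stock_uM / final_uM
    frac_stock = 1.0 / factor
    final_urea = frac_stock * stock_urea_M + (1.0 - frac_stock) * diluent_urea_M
    return factor, final_urea

# Refolding jump: 400 uM I-domain in 4.5 M urea diluted to 14 uM with urea-free buffer.
factor_r, urea_floor = dilution(400.0, 14.0, stock_urea_M=4.5)
print(f"refolding: {factor_r:.1f}-fold dilution, lowest final urea {urea_floor:.4f} M")

# Unfolding jump: 800 uM native I-domain (no urea) diluted to 14 uM.
factor_u, _ = dilution(800.0, 14.0)
print(f"unfolding: {factor_u:.1f}-fold dilution")
```

Under these assumptions, refolding points cannot be collected below roughly 0.16 M urea, which is worth keeping in mind when reading the low-denaturant end of a chevron plot.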
Refolding … nM. For the coat protein, the reactions were monitored by CD at 222 nm. All kinetic traces were fit to a first-order reaction with either one (for the I-domain) or two (for the coat protein) exponentials using the KaleidaGraph program,

$$\theta(t) = \theta_f + \theta_1 e^{-t/\tau_1} + \theta_2 e^{-t/\tau_2}$$

where $\theta(t)$ is the CD signal as a function of time $t$, $\theta_f$ is the final signal, $\theta_1$ and $\theta_2$ are the amplitudes of the CD signal change due to the first and second relaxations, respectively, and $\tau$ is the relaxation time (the inverse of the apparent rate constant, $1/k$); the $\theta_2$ term is omitted for single-exponential fits. The relaxation times from the kinetic experiments were plotted against urea concentration in a single chevron plot on a semilog scale. The urea dependence of the unfolding relaxation times was determined with KaleidaGraph using the equation (26)

$$\tau_u = \tau_u^{0} \exp\left(\frac{m_u^{\ddagger}\,[\mathrm{urea}]}{RT}\right), \qquad \tau_u^{0} = 1/k_u^{0}$$

where $k_u^{0}$ is the apparent rate constant of the unfolding reaction in the absence of urea ($\tau_u^{0}$ is the corresponding relaxation time) and $m_u^{\ddagger}$ is the denaturant dependence of the unfolding reaction.
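The one- and two-exponential fits can be sketched without a dedicated fitting package. This is not KaleidaGraph's algorithm: it is a swapped-in separable least-squares approach in which the relaxation times are found by exhaustive search over a logarithmic grid, and for each candidate set the model is linear in the final signal and amplitudes, which are solved by ordinary least squares. The function name `fit_relaxation`, the grid ranges, and the synthetic traces are hypothetical; the returned residuals can be inspected for the random scatter used to judge fit quality.

```python
import itertools
import numpy as np

def fit_relaxation(t, signal, n_exp=1, tau_grid=None):
    """Fit signal(t) = s_f + sum_i A_i * exp(-t / tau_i).

    Relaxation times are chosen by exhaustive search over tau_grid; for each
    candidate set, (s_f, A_1..A_n) are solved by linear least squares and the
    set with the smallest residual sum of squares is kept.
    """
    t = np.asarray(t, dtype=float)
    signal = np.asarray(signal, dtype=float)
    if tau_grid is None:
        tau_grid = np.logspace(-1, 4, 150)  # 0.1 s to 10^4 s (assumed search range)
    best = None
    for taus in itertools.combinations(tau_grid, n_exp):  # taus come out in ascending order
        design = np.column_stack([np.ones_like(t)] + [np.exp(-t / tau) for tau in taus])
        coef, *_ = np.linalg.lstsq(design, signal, rcond=None)
        resid = signal - design @ coef
        sse = float(resid @ resid)
        if best is None or sse < best[0]:
            best = (sse, np.array(taus), coef, resid)
    sse, taus, coef, resid = best
    return {"s_final": coef[0], "amplitudes": coef[1:], "taus": taus, "residuals": resid}

# Usage with synthetic traces that start after a ~6 s mixing dead time (hypothetical values).
t = np.arange(6.0, 1800.0, 2.0)
one = -2.0 + 1.5 * np.exp(-t / 120.0)
two = -3.0 + 1.0 * np.exp(-t / 15.0) + 0.8 * np.exp(-t / 400.0)
fit1 = fit_relaxation(t, one, n_exp=1)
fit2 = fit_relaxation(t, two, n_exp=2, tau_grid=np.logspace(0, 3.5, 80))
print(fit1["taus"], fit2["taus"])
```

The resolution of the recovered relaxation times is limited by the grid spacing; a local nonlinear refinement starting from the grid optimum would tighten them.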

RESULTS
Cloning and Purification of the I-domain and the Coat Protein Core-Limited proteolysis of the C-terminal half of the coat protein (residues 191-430) yields a stable fragment identified as residues 223-349 (11). This contiguous, folded domain is linked to the HK97-like core of the coat protein via two strands of the β-hinge, a structurally conserved β-sheet at the center of the protein (4). To investigate the properties and role of the I-domain of P22 coat protein, we cloned the I-domain (residues 223-345; Fig. 1a) and the coat protein without its I-domain (core coat protein, residues 1-224 linked to 346-430; Fig. 1b). The N- and C-terminal sections of the core coat protein were connected with a 5-glycine linker, joining two strands of the β-hinge (Fig. 1, b and e). Distance measurements based on the cryoelectron microscopy reconstructions (11,12) indicate a distance of ~10-13 Å between residues 225 and 345. A typical staggered Cα-to-Cα distance is 3.63 Å (27). Therefore, a 5-glycine linker would span up to about 22 Å and should provide sufficient length and flexibility to bridge the N- and C-terminal sections of the core coat protein.
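The ~22 Å estimate can be checked with simple arithmetic. Our reading, which is an assumption because the text does not spell it out, is that the figure counts six Cα-Cα steps: the five glycines plus the step to each flanking core residue. Multiplying by the 3.63 Å per-residue value quoted in the text gives the maximum span of a fully extended linker, which can then be compared with the 10-13 Å gap measured from the reconstructions.

```python
CA_CA = 3.63        # Å per residue, value quoted in the text (27)
N_GLY = 5           # glycines in the linker
SPANS = N_GLY + 1   # assumed: Cα-Cα steps between the two flanking core residues
GAP_LOW, GAP_HIGH = 10.0, 13.0   # Å, distance between the flanking residues in the models

max_span = SPANS * CA_CA         # fully extended; a real linker would be shorter
print(f"maximum linker span: {max_span:.1f} Å vs. gap of {GAP_LOW:.0f}-{GAP_HIGH:.0f} Å")
print(f"slack at the widest gap: {max_span - GAP_HIGH:.1f} Å")
```

The roughly 9 Å of slack at the widest measured gap is what leaves room for the linker to be flexible rather than fully stretched.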
The I-domain was overexpressed, readily purified on a Talon metal affinity column (Clontech), and remained soluble at concentrations exceeding 40 mg/ml (Fig. 1c). Previous work indicated that, when expressed separately, the C-terminal 239 residues of the coat protein, which include the I-domain, form a dimer in solution. The oligomeric state of the I-domain was therefore assessed by size exclusion chromatography using a Superdex 200 column (GE Healthcare). The protein was loaded onto the column at 4 and 40 mg/ml. Both elution profiles, monitored by absorbance at 280 nm, showed a single peak (data not shown). On the basis of calibration of the column with standards of known molecular weight, the molecular mass of the I-domain (15 kDa) is consistent with a monomeric protein.
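The oligomeric-state assignment rests on a column calibration. A common form, assumed here, converts elution volumes to the partition coefficient K_av = (Ve − V0)/(Vt − V0), regresses log10(MW) of the standards linearly on K_av, and reads the apparent mass of the sample off that line. In the sketch below, the function names and every volume and mass in the usage lines (void and total volumes, standard elution volumes, the 16.0 ml sample peak) are hypothetical placeholders, not the measured values.

```python
import numpy as np

def kav(ve, v0, vt):
    """Partition coefficient K_av = (Ve - V0) / (Vt - V0)."""
    return (np.asarray(ve, dtype=float) - v0) / (vt - v0)

def calibrate(ve_std, mw_std, v0, vt):
    """Fit log10(MW) = a + b * K_av to the standards; return (a, b)."""
    b, a = np.polyfit(kav(ve_std, v0, vt), np.log10(mw_std), 1)
    return float(a), float(b)

def apparent_mw(ve, a, b, v0, vt):
    """Apparent molecular mass of a peak eluting at volume ve."""
    return 10.0 ** (a + b * kav(ve, v0, vt))

def oligomer_state(mw_app, mw_monomer):
    """Nearest whole number of subunits implied by the apparent mass (at least 1)."""
    return max(1, int(round(float(mw_app) / mw_monomer)))

# Usage with hypothetical calibration data.
v0, vt = 8.0, 24.0                           # void and total column volumes (ml)
ve_std = [10.5, 12.0, 14.0, 15.4, 16.5]      # elution volumes of standards (ml)
mw_std = [440e3, 158e3, 44e3, 17e3, 6.5e3]   # masses of standards (Da)
a, b = calibrate(ve_std, mw_std, v0, vt)
mw = apparent_mw(16.0, a, b, v0, vt)
print(f"apparent mass {mw / 1e3:.1f} kDa -> {oligomer_state(mw, 15e3)} subunit(s)")
```

Because elution depends on shape as well as mass, an elongated monomer can run larger than its true mass, so the apparent mass is best read as consistent with, rather than proof of, a given oligomeric state.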
The core coat protein formed inclusion bodies when overexpressed in Salmonella cells, indicating that it does not fold properly in vivo without the I-domain. The inclusion bodies were denatured with urea, and the core coat protein was refolded by dialysis following a procedure similar to that used to refold WT coat protein monomers from urea-denatured empty procapsid shells (Fig. 1d).
Stability of the Core Coat Protein and I-domain Compared with the Full-length WT Coat Protein-The stability of the core and I-domain was first assessed by their sensitivity to protease. Thermolysin (TLN) digestion of refolded WT and core coat protein monomers and of the I-domain was analyzed over time. The core coat protein was digested substantially faster than the WT coat protein: complete digestion of the full-length core coat protein by TLN occurred in 5 min, whereas the full-length WT coat protein required 30-60 min. Conversely, the I-domain was more protease-resistant than the WT coat protein, with a substantial amount of full-length I-domain remaining at 60 min (Fig. 2) and longer (data not shown). Interestingly, digestion of the WT coat protein produced a well defined, stable band at ~20 kDa (Fig. 2, arrow). In-gel digestion and liquid chromatography coupled with tandem mass spectrometry revealed that this band comprised residues 203-361. This protease-resistant peptide is the I-domain with N- and C-terminal extensions of 20 and 16 residues, respectively. These results indicate that the I-domain forms an independently folded domain within the WT coat protein. In addition, the I-domain is highly stable both as an isolated domain and in the context of the full-length WT coat protein.
The Refolded Core Coat Protein Has More Random Coil than the WT Coat Protein-The enhanced protease sensitivity observed for the core coat protein could be due to improper folding or a decrease in stability. The folding of the full-length WT coat protein has been thoroughly investigated using CD for the secondary structure and tryptophan (Trp) fluorescence for the tertiary structure (21). The CD spectrum of the core coat protein was compared with that of WT coat protein monomers to determine whether the refolded core coat protein adopted a conformation with regular secondary structure. The spectrum of the WT coat protein (Fig. 3a, gray) is consistent with a protein with high α-helical content mixed with β-sheets. The spectrum of the core coat protein (Fig. 3a, green) indicates a decrease in α-helical content and an increase in random coil. These data, along with the formation of inclusion bodies in vivo and the enhanced protease sensitivity, indicate that the core coat protein cannot adopt the appropriate HK97-like fold (Fig. 1e) in the absence of the I-domain. The fluorescence of tryptophan residues can be used to probe for the presence of a hydrophobic core in proteins. Under native conditions, both the WT and core coat proteins displayed fluorescence emission maxima at 340 nm (Fig. 3b), indicating that some tryptophan residues are buried in a hydrophobic environment. Under denaturing conditions, the emission maxima of both proteins shifted to 355 nm, and the intensity decreased. Thus, the core coat protein likely adopts a compact conformation even though it is not properly folded.
The core coat protein deletion length (Δ225-345, Δ225-365, Δ233-338, and Δ236-334) and linker (5G, 3A, 3G, GGPGG, and GPG) were varied to verify that the inability to adopt the proper fold was not due to construct design. None of the core coat protein variants remained soluble when expressed in vivo or exhibited a WT-like secondary structure when refolded in vitro (data not shown).
The I-domain Is Spectroscopically Invisible-Next we set out to characterize the isolated I-domain in more detail. The I-domain had little CD signal (Fig. 4a). The I-domain has significant β-sheet content connected by long loops (random coil), and these elements have opposing CD signals (28). In this unusual instance, the signals counterbalance, giving a CD spectrum close to the baseline. In addition, its single tryptophan was solvent-exposed (Fig. 4b). Because the I-domain is spectroscopically invisible at the concentrations and conditions used in previous studies of the WT coat protein, only indirect contributions from this domain were observed in our previous characterization of the WT coat protein (21). For the same reasons, monitoring the unfolding transition of the I-domain with these conventional techniques is difficult. Although small, the change in CD signal upon I-domain unfolding was reproducible. Therefore, the CD signal at 220 nm was used as a probe of the folded state of the I-domain for many of the experiments described below.
The I-domain Is More Stable to Heat and Denaturant than the Full-length Coat Protein-The I-domain shows a thermally induced decrease in the CD signal that can be used to monitor its melting transition. This was compared with the melting transition of the WT coat protein monitored by Trp fluorescence. Because the I-domain has only a single, solvent-exposed tryptophan, changes in fluorescence upon unfolding of the WT coat protein reflect only denaturation of the HK97-like core (with no signal from the I-domain). The transition midpoint of the I-domain is ~15°C higher than that of the coat protein (Fig. 5a). This suggests that, in the full-length coat protein, the core may unfold first, followed by the I-domain. The heat-induced denaturation of the I-domain is ~90% reversible, whereas the full-length coat protein aggregates upon heat denaturation (data not shown).

Figure 3. The core coat protein has significant random coil. a, circular dichroism spectra of the WT and core coat proteins in mean residue ellipticity (degrees cm² dmol⁻¹ amino acid⁻¹). The spectra were obtained as described under "Experimental Procedures." The protein concentrations were 0.1 mg/ml (2.1 μM for the WT coat protein and 2.9 μM for the core coat protein). b, fluorescence emission spectra of native (closed) and denatured (open) WT and core coat protein in arbitrary units (a.u.). The excitation wavelength was 295 nm. The concentrations of the WT and core coat protein were 2.1 μM.
Equilibrium urea denaturation experiments were done with the I-domain to determine its thermodynamic stability ($\Delta G^{\circ}_{N\text{-}U}$). In these experiments, the I-domain was incubated in 0-6 M urea until equilibrium was established, and CD at 220 nm was used to monitor the unfolding transition (Fig. 5b). Despite the small change in signal upon urea denaturation, it was possible to define the transition region and the pre- and post-transition baselines. The equilibrium data were fit to a standard two-state … for the full-length coat protein is −5.8 kcal/mol (21). Additionally, the I-domain is shown to be well folded because its $m_{eq}$ value is consistent with those of model proteins of similar size (29). Thus, a significant portion of the thermodynamic stability of the coat protein is due to the I-domain, but this contribution was not observed previously because of the optical properties of the domain. We hypothesize that this domain may add stability to the monomeric coat protein while still allowing the HK97-like core to retain the inherent flexibility required for conformational switching during procapsid assembly and maturation.
Refolding and Unfolding Kinetics of the I-domain-Single jump experiments were performed to determine whether the I-domain also enhances the refolding kinetics of the WT coat protein. For refolding jumps, the I-domain, denatured in 4.5 M urea, was rapidly diluted into buffered solutions containing lower urea concentrations. For unfolding jumps, the folded I-domain was diluted with buffered urea solutions. For both refolding and unfolding reactions, the relaxation to the new equilibrium was monitored by CD at 220 nm. The refolding and unfolding reactions of the I-domain were fit well by a first-order rate equation with a single exponential (Fig. 6, a and b). The residuals (the differences between the fit and the data) are randomly distributed above and below zero, indicating a good fit (30). In contrast, the folding and unfolding kinetics of the WT coat protein were previously shown to be best fit by a first-order reaction with two exponentials (21).
The relaxation times from fits of the kinetic data obtained by CD were plotted against urea concentration in a chevron plot (Fig. 6c) (31). The peak of the chevron does not occur at the urea concentration corresponding to the midpoint of our equilibrium urea titration (3.2 M urea; see Fig. 5b). In addition, the folding arm is nearly flat; that is, the refolding relaxation time is relatively independent of urea concentration. The … on or off the productive folding pathway) or changes in rate-limiting steps (32). We fit the unfolding arm independently (see "Experimental Procedures") to obtain $\tau_u^{0} = 3.9 \times 10^{4}$ s and $m_u^{\ddagger} = -8.7$ kcal mol⁻¹ M⁻¹, where $\tau_u^{0}$ is the apparent relaxation time of the unfolding reaction in the absence of urea and $m_u^{\ddagger}$ is the denaturant dependence of the unfolding reaction.
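Fitting the unfolding arm independently amounts to a straight-line fit of ln τ against urea concentration. The sketch below assumes the form τ_u = τ_u^0 exp(m_u‡[urea]/RT), so that ln τ_u = ln τ_u^0 + (m_u‡/RT)[urea] and a negative m_u‡ means unfolding speeds up as urea rises. Which chevron points belong to the unfolding arm is a judgment call left to the user; the function name `fit_unfolding_arm` and the synthetic parameters are hypothetical and are not the reported values.

```python
import numpy as np

R = 1.987e-3  # gas constant, kcal mol^-1 K^-1

def fit_unfolding_arm(urea, tau, temp_c=20.0):
    """Fit ln(tau_u) = ln(tau_u0) + (m_u / RT) * [urea] to unfolding-arm points.

    Returns tau_u0 (relaxation time extrapolated to 0 M urea, same units as tau)
    and m_u in kcal mol^-1 M^-1 (negative when tau falls as urea rises).
    """
    rt = R * (temp_c + 273.15)
    slope, intercept = np.polyfit(np.asarray(urea, dtype=float),
                                  np.log(np.asarray(tau, dtype=float)), 1)
    return float(np.exp(intercept)), float(slope * rt)

# Usage with synthetic, noise-free unfolding-arm data (hypothetical parameters).
urea_u = np.array([4.5, 5.0, 5.5, 6.0, 6.5, 7.0])
tau0_true, m_u_true = 2.0e4, -0.6
rt = R * (20.0 + 273.15)
tau_u = tau0_true * np.exp(m_u_true * urea_u / rt)
tau0_fit, m_u_fit = fit_unfolding_arm(urea_u, tau_u)
print(f"tau_u0 = {tau0_fit:.3g} s, m_u = {m_u_fit:.2f} kcal/mol/M")
```

Because τ_u^0 is an extrapolation far outside the measured range, it is very sensitive to small errors in the slope; reporting it together with the urea range actually fit is good practice.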
Nuclear magnetic resonance studies to determine a high-resolution NMR structure of the isolated I-domain are currently underway (Ref. 33 and unpublished data). Preliminary results from these studies revealed that the peptide bond preceding the proline (peptidyl prolyl bond) at position 310 is in the cis conformation (Fig. 6d). A cis peptidyl prolyl bond can complicate in vitro folding analysis because the non-native trans isomer predominates in the unfolded protein and cis ⇌ trans isomerization is intrinsically slow. The slow refolding kinetics of the I-domain suggest that interconversion of the peptidyl prolyl bond at residue 310 from the non-native trans isomer to the native cis configuration may limit the rate of refolding. Such rate-limiting steps in in vitro protein folding are well documented (34,35). To determine whether the slow folding reaction of the I-domain is limited by proline isomerization, we performed the experiments described below.
Double Jump Pulse Proteolysis Refolding Kinetics-As a protein unfolds, its peptidyl prolyl bonds become free to isomerize. Initially, the bond preceding Pro-310 remains in its native cis configuration, and the protein should be in its "fast-folding" form. At longer unfolding times, isomerization of this bond reaches equilibrium, producing "slow-folding" species in which the prolyl bond at residue 310 is in the trans configuration. These slow-folding species require isomerization of this peptidyl prolyl bond before the native structure can form. A double jump experiment coupled with pulse proteolysis was performed to test this possibility (see the experiment schematic in Fig. 7a).
In the first jump, the I-domain is unfolded by rapid dilution into 8 M urea (for 6 s or 3600 s). The 6-s unfolding time was chosen so that refolding initiates from a state that is fully unfolded (as determined by extrapolation of the unfolding arm in the chevron plot, Fig. 6c) but still contains native proline isomers, because cis → trans prolyl isomerizations generally have relaxation times of tens to hundreds of seconds at 25°C (36). Indeed, the relaxation time for cis → trans prolyl isomerization when the preceding residue is a threonine (as in our case) was measured on a model peptide to be 370 s (37). The refolding jump is then initiated by dilution into buffer. Thermolysin pulse proteolysis, which exploits the difference in proteolytic susceptibility between folded and unfolded proteins (38,39), was used to measure the amount of native I-domain at varying times after refolding was initiated (Fig. 7b). The short duration (15 s) and the TLN concentration of the pulse were chosen so that only unfolded protein is digested, leaving native protein intact. Quantification of the full-length I-domain (Fig. 7c) shows that the slow-folding species produced by long-term denaturation (unfolded for 3600 s) is not present in the short-term denaturation sample (unfolded for 6 s). These double jump refolding experiments demonstrate that the short-term denatured I-domain, with all proline residues in their native isomeric state, folds to the native state within 6 s, likely because no slow isomerization reactions need to occur. The refolding pulse proteolysis data for the long-term denaturation sample could be fit to a first-order rate equation with one exponential, and the resulting relaxation time is consistent with the CD kinetics data (Fig. 6c, X), indicating that both techniques monitor a global folding event. However, this folding event is rate-limited by peptidyl prolyl isomerization, for which direct evidence is provided below. Pulse proteolysis has been used previously to determine protein stability (38) and monitor unfolding kinetics (39). Here we show that the technique can also be adapted to monitor refolding kinetics, including in a double jump experiment.

Figure 7. The I-domain folds immediately when the prolines are in the correct isomeric state. a, the I-domain was unfolded in 8 M urea for 6 or 3600 s, diluted into buffer, and then allowed to refold for varying times before being subjected to a 15-s pulse of digestion using a relatively high concentration of the nonspecific protease thermolysin. b, samples were quenched with sample buffer, and the extent of proteolysis was observed by 16% Tricine-SDS-PAGE. c, the full-length I-domain was quantified using densitometry and plotted versus refolding time. The solid gray line is the fit of the 3600-s unfolding data to a first-order rate equation with one exponential (see "Experimental Procedures"). The relaxation time from these data is included in Fig. 6.
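The logic behind choosing a 6-s versus a 3600-s unfolding jump can be made quantitative with a simple model, which is an assumption rather than the analysis used: after unfolding, the Pro-310 bond starts in the native cis state and relaxes toward its unfolded-state equilibrium as a single exponential with the 370-s relaxation time quoted for the model peptide. The equilibrium fraction of the non-native isomer in the unfolded chain is not given in the text, so `f_nonnative_eq = 0.9` below is a hypothetical placeholder, as is the function name.

```python
import math

def slow_folding_fraction(t_unfold_s, tau_iso_s=370.0, f_nonnative_eq=0.9):
    """Fraction of unfolded molecules carrying the non-native Pro isomer after t_unfold_s.

    Assumes every molecule starts with the native isomer at t = 0 and relaxes
    toward equilibrium as a single exponential with relaxation time tau_iso_s.
    f_nonnative_eq is a placeholder, not a measured value.
    """
    return f_nonnative_eq * (1.0 - math.exp(-t_unfold_s / tau_iso_s))

for t_unf in (6.0, 3600.0):
    print(f"unfolded {t_unf:>6.0f} s -> slow-folding fraction {slow_folding_fraction(t_unf):.3f}")
```

Under these assumptions, fewer than 2% of molecules have isomerized after the 6-s jump, whereas the 3600-s jump has essentially reached the isomer equilibrium, which is why only the long-term sample shows a slow-folding population.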
Refolding with Cyclophilin A-In vivo, the cis ⇌ trans isomerization of peptidyl prolyl bonds is catalyzed by a group of ubiquitous enzymes, the peptidyl prolyl isomerases (PPIases). The I-domain was refolded in the presence of the PPIase cyclophilin A (CyPA). This enzyme has been shown to catalyze peptidyl prolyl cis ⇌ trans isomerization in model peptides and numerous proteins in vitro (36). Refolding jumps were performed by rapidly diluting I-domain that had been denatured in 4.5 M urea for 60 min into buffer containing CyPA, and the relaxation to the new equilibrium was monitored by CD at 220 nm. The refolding reactions of the I-domain were fit well by a first-order reaction with a single exponential (Fig. 8a, solid lines).
As the CyPA concentration was increased, the rate of I-domain folding increased correspondingly (the relaxation time decreased). The observed relaxation time of I-domain refolding decreased almost 15-fold at the highest CyPA concentration tested (0.57 μM, Fig. 8a, cyan), providing direct evidence that proline isomerization is responsible for the slow folding kinetics. If the I-domain provides kinetic enhancement for coat protein folding, then CyPA should also increase the rate of coat protein folding. To test this, the full-length WT coat protein was refolded with CyPA to determine how accelerating I-domain folding affects the rate of core folding. Refolding jumps were performed by rapidly diluting coat protein, denatured in 4.5 M urea for 60 min, with buffer containing CyPA, and the relaxation to the new equilibrium was monitored by CD at 222 nm. The refolding reactions of the coat protein were best fit by a first-order reaction with two exponentials (Fig. 8b, solid lines). Although the effect was less drastic than with the isolated I-domain, CyPA decreased both the fast and slow relaxation times. Thus, rapid folding of the I-domain accelerates the folding of the core coat protein.
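A CyPA titration can be summarized by converting each relaxation time to an observed rate constant, k_obs = 1/τ. A common simplification for enzyme-catalyzed prolyl isomerization at enzyme concentrations well below K_M is a linear dependence, k_obs = k_0 + (k_cat/K_M)[CyPA], where k_0 is the uncatalyzed rate. The sketch below assumes that linear regime and fits it by ordinary least squares; the function name and the titration values are hypothetical, chosen only for illustration, and are not measurements from Fig. 8.

```python
from typing import List, Sequence, Tuple

def fit_linear_catalysis(cypa_um: Sequence[float], tau_s: Sequence[float]) -> Tuple[float, float]:
    """Fit k_obs = k0 + slope * [CyPA] by ordinary least squares.

    cypa_um: CyPA concentrations (uM); tau_s: observed relaxation times (s).
    Returns (k0 in s^-1, slope ~ kcat/KM in uM^-1 s^-1).
    Assumes [CyPA] << KM so that k_obs is linear in enzyme concentration.
    """
    if len(cypa_um) != len(tau_s) or len(cypa_um) < 2:
        raise ValueError("need at least two paired (concentration, tau) points")
    k_obs: List[float] = [1.0 / tau for tau in tau_s]  # rate constant = 1 / relaxation time
    n = len(cypa_um)
    mean_x = sum(cypa_um) / n
    mean_y = sum(k_obs) / n
    sxx = sum((x - mean_x) ** 2 for x in cypa_um)
    if sxx == 0.0:
        raise ValueError("concentrations must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cypa_um, k_obs))
    slope = sxy / sxx
    k0 = mean_y - slope * mean_x
    return k0, slope

# Hypothetical titration generated from k0 = 0.004 s^-1 and kcat/KM = 0.1 uM^-1 s^-1
conc = [0.0, 0.1, 0.2, 0.4, 0.57]
taus = [1.0 / (0.004 + 0.1 * c) for c in conc]
k0, eff = fit_linear_catalysis(conc, taus)
print(f"k0 = {k0:.4f} s^-1, kcat/KM = {eff:.3f} uM^-1 s^-1")
print(f"fold decrease in tau at 0.57 uM: {taus[0] / taus[-1]:.1f}")
```

For the two-phase coat protein traces, the same treatment would be applied to each relaxation time separately, with the caveat that a phase not limited by prolyl isomerization would be expected to show little or no dependence on CyPA.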

DISCUSSION
Efficient and rapid folding of viral coat proteins is vital to produce a supply of functional proteins for capsid assembly. However, there is a balance between the stability of the monomeric coat protein and the structural flexibility required for conformational switching during assembly and maturation. Bacteriophage P22 has a genetically inserted domain (I-domain) in its HK97-like core (11-13). Here we provide compelling evidence that the I-domain acts as a folding nucleus and provides thermodynamic stability to the monomeric coat protein.
The I-domain Acts as an Intramolecular Chaperone
The role of a protein domain serving as a nucleator for the folding of multidomain proteins was proposed 30 years ago (40). The kinetic studies performed in the presence of CyPA suggest that folding of the coat protein initiates with proper conformational folding of the I-domain. Once in the native conformation, the I-domain contributes substantial thermodynamic stability, thereby allowing productive folding of the surrounding core. This domain is connected to the core via two strands of the β-hinge and possibly mediates the interactions between the N- and C-terminal halves of the core. Indeed, the core cannot adopt the proper HK97-like fold in the absence of the I-domain, as evidenced by the in vivo aggregation and in vitro destabilization of the core coat protein.
The coat proteins of phages HK97 and T4 require assistance from chaperonin complexes for proper folding both in vivo and in vitro. HK97 hijacks the GroEL-ES chaperone machinery of the host to properly fold its coat protein (41). T4 utilizes a hybrid chaperone complex composed of the bacterial chaperone GroEL and the phage-encoded gp31 (42). Conversely, the P22 coat protein does not require assistance from chaperonins for proper folding (43). The I-domain addition to the P22 core increases the stability of the monomeric coat protein, likely eliminating the chaperonin requirement seen with other HK97-like coat proteins. Another striking difference is that the coat proteins of HK97 and T4 are not stable as monomers but form pentamers and hexamers immediately after folding with the assistance of chaperonin complexes. HK97 and T4 assemble from these preformed capsomers, whereas P22 assembles from monomeric coat proteins (44). This alteration in the assembly pathways may also reflect the stability of the monomeric coat proteins.
There is no evidence that the coat protein of the eukaryotic virus HSV-1 requires chaperones for proper folding. The upper domain of the HSV-1 coat protein has been proposed to function as a folding nucleus (45). The authors of that study suggest that this stable upper domain may act as an inherent chaperone, providing structural stability for the HK97-like floor domain of the protein, which must be flexible to participate in procapsid assembly. We propose a similar role for the I-domain of the P22 coat protein. The additional stability provided by the I-domain does not restrict the necessary conformational plasticity of the core, which is essential because a decrease in flexibility of the core results in assembly defects such as polyhead formation (46). Therefore, I-domain insertion provides a means of stabilizing the core without disrupting its ability to participate in assembly or subsequent rearrangements during maturation. However, this intramolecular chaperone likely requires assistance from a host-encoded PPIase before conformational folding can proceed.
Coat Protein Likely Utilizes a Host-encoded PPIase for Folding
The peptide bond has a partial double-bond character and, therefore, two energetically preferred states: cis and trans. For non-prolyl peptide bonds, the trans conformation, with a dihedral angle (ω) of 180°, is energetically favored because of steric constraints. However, peptide bonds N-terminal to proline residues (peptidyl prolyl bonds) are unique in that the trans conformation is only slightly energetically favored over the cis conformation. Consequently, in native, folded proteins, cis peptidyl prolyl bonds occur more frequently (~6%) than cis non-prolyl peptide bonds (<0.1%) (47). The ribosome is thought to synthesize peptide bonds in the trans form and, therefore, trans → cis isomerization must occur for proteins with native cis peptidyl prolyl bonds. In vivo, the PPIases catalyze this cis/trans isomerization of peptidyl prolyl bonds. Cyclophilins are a structurally conserved family of enzymes with PPIase activity that have been found in both prokaryotic and eukaryotic organisms (48,49). CyPA, a member of the cyclophilin family, was shown here to increase the rate of coat protein folding in vitro by catalyzing the isomerization of the peptidyl prolyl bond at residue 310 in the I-domain to its native cis conformation. Therefore, we propose that a member of the PPIase enzymes is involved in P22 coat protein folding in vivo, catalyzing native I-domain formation and, thereby, allowing its intramolecular chaperone activity.
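The connection between a small free-energy difference and a substantial cis population follows from a two-state Boltzmann distribution, f_cis = 1/(1 + exp(ΔG/RT)) with ΔG = G(cis) − G(trans). The sketch below assumes such a two-state model; the ΔG values in the usage lines are illustrative assumptions representing a small (prolyl-like) and a large (non-prolyl-like) penalty for the cis isomer, not values reported here. The computed fractions describe equilibrium populations of a single bond, which are distinct from the frequencies of cis bonds in folded protein structures cited above.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def cis_fraction(dg_cis_minus_trans_kj: float, temp_k: float = 298.15) -> float:
    """Equilibrium cis fraction for a two-state cis/trans peptide bond.

    dg_cis_minus_trans_kj = G(cis) - G(trans) in kJ/mol; positive values favor trans.
    """
    return 1.0 / (1.0 + math.exp(dg_cis_minus_trans_kj / (R_KJ_PER_MOL_K * temp_k)))

# Illustrative (assumed) penalties for the cis isomer at 25 °C
for label, dg in (("small, prolyl-like", 4.0), ("large, non-prolyl-like", 15.0)):
    print(f"{label:>24}: dG = {dg:4.1f} kJ/mol -> cis fraction {cis_fraction(dg):.4f}")
```

Because the dependence is exponential, a difference of roughly 10 kJ/mol in ΔG shifts the cis population by about two orders of magnitude, which is why prolyl bonds are the usual site of native cis bonds.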
Replacement of proline residues with alanine by site-directed mutagenesis is a classic means of detecting and characterizing proline-limited folding reactions. Substitutions of cis prolines are often strongly destabilizing because the cis conformation of the respective bond may be maintained after replacement (50,51). Coincidentally, an alanine substitution at the cis peptidyl prolyl bond at residue 310 in the coat protein has already been investigated, albeit unintentionally. The P22 coat protein has been used as a model to study protein folding for some time (4). Eighteen single amino acid substitutions in the coat protein, resulting from random mutagenesis of the phage, have been characterized that result in a temperature-sensitive folding (tsf) phenotype (14). One of the most severe of these tsf coat protein variants (lethal at 33°C) is P310A (14,52). This amino acid substitution results in significant coat protein misfolding and aggregation, even at the permissive temperature of 28°C (14). PPIases do not catalyze isomerization of non-prolyl peptide bonds. Therefore, maintenance of a cis peptide bond at this position after alanine substitution would have a detrimental effect on folding. A strong decrease in the folding rate because of slow, uncatalyzed trans → cis isomerization of this non-prolyl peptide bond would abolish the ability of this domain to serve as an intramolecular chaperone. The GroEL-ES complex rescues the folding of the tsf coat protein variants, including P310A, suggesting that encapsulation of the coat protein in the folding chamber of the chaperonin allows sufficient time for proper folding. In vitro characterization of the P310A coat protein is necessary to confirm the maintenance of the cis peptide bond and its effect on the folding mechanism, although a preliminary study indicated that CyPA had no effect on the refolding kinetics of the P310A coat protein in vitro.³ A limited suppressor search of the P310A tsf phenotype identified five suppressors (53), four of which are located in the I-domain. The effect of these suppressor substitutions on the folding of coat protein has yet to be investigated.
Additions to the HK97-like Fold in Other Phage Coat Proteins
Other phages also have additional domains in their coat protein structure that are not part of the HK97-like fold, such as the Big2-like domain in φ29 (54) and a chitin-binding-like insertion domain in T4 (55). These additional domains are hypothesized to stabilize the assembled capsid by bridging the neighboring molecules within one capsomer (T4) or between neighboring capsomers (φ29). Cryoelectron microscopy reconstructions of the phage C1 (56) and phage P-SPP7 (57) capsids also revealed additional density (not attributed to the HK97-like fold) at the two-fold axes resulting from interactions across capsomers. Although the coat protein of HK97 does not have any additional domains, its capsid is uniquely stabilized through a network of covalent cross-links (58).
Protein folding remains incompletely understood, and the folding mechanisms of large multidomain proteins are especially complicated. Much of the progress in understanding protein folding has come from studying small, isolated domains. Our work on the isolated I-domain of the P22 coat protein provides insight into its role in the bacteriophage life cycle that was not possible from studies of the WT coat protein alone. This domain provides thermodynamic stability to the monomeric coat protein and possibly contributes structural stability to assembled particles. In addition, by serving as an intramolecular chaperone, it may abolish the requirement for host chaperonins in coat protein folding. Indeed, it may be possible to use the P22 insertion domain to stabilize HK97-like proteins or other generally unstable proteins. A related approach, in which an unstable guest protein is inserted into a highly stable host protein, has been investigated (59). However, that approach tethers the N- and C-termini of the guest and, therefore, constrains its flexibility. Using a stable insertion as an intramolecular chaperone may be preferable when conformational plasticity is required for assembly or function.