Structural Mapping of an Aggregation Nucleation Site in a Molten Globule Intermediate

Protein aggregation plays an important role in biotechnology and also causes numerous diseases. Human carbonic anhydrase II is a suitable model protein for studying the mechanism of aggregation. We found that a molten globule state of the enzyme formed aggregates. The intermolecular interactions involved in aggregate formation were localized in a direct way by measuring excimer formation between each of 20 site-specific pyrene-labeled cysteine mutants. The contact area of the aggregated protein was very specific, and all sites included in the intermolecular interactions were located in the large β-sheet of the protein, within a limited region between the central β-strands 4 and 7. This substructure is very hydrophobic, which underlines the importance of hydrophobic interactions between specific β-sheet containing regions in aggregate formation.

Protein aggregation is highly important in biotechnology and biomedicine, as well as in studies in vitro focused on the mechanism of protein folding. Moreover, protein aggregation is known to be involved in several disorders, such as Alzheimer's disease, cystic fibrosis, and prion diseases. Aggregation-induced formation of inclusion bodies is often a major obstacle in large scale production of heterologous proteins, which is a drawback in biotechnology and pharmaceutical industries and in biomedical research (1,2). To be able to develop a rational strategy for design of novel drugs that intervene in aggregation and to control aggregation in various situations, it is necessary to understand the mechanism underlying aggregation. Transient formation of aggregates is also a potential problem in studies of refolding kinetics, because aggregates can easily be mistaken for on-pathway folding intermediates (3).
Protein aggregation has long been regarded as a nonspecific process. However, recent observations suggest that aggregation is more likely due to specific interactions of partially folded intermediates (1,2,4). For several proteins, it has been found that partially folded intermediates (5) and molten globule intermediates (6) are highly prone to aggregate. For instance, the molten globule state has been suggested to induce the formation of the pathogenic scrapie form of the prion protein (7). Several reports have also indicated that aggregates often have a very high β-sheet content (2). Knowledge of highly resolved structures of aggregated domains is a prerequisite for understanding the mechanism of aggregation. So far, the most detailed information available on the structural nature of these aggregates has been obtained by using indirect methods, such as mutagenesis experiments (1). Therefore, it is essential that the intermolecular interactions involved be mapped in detail using complementary methods that directly monitor the interactions in the interface of aggregated proteins.
Human carbonic anhydrase II (HCA II) (Fig. 1) has a molecular mass of 29.3 kDa and consists of 259 amino acid residues (8). It is basically a β-sheet protein that is divided into two halves by 10 β-strands that span the entire molecule. Considering the schematic drawing shown in Fig. 1, the upper half of the protein contains an N-terminal subdomain and the active site region, and the lower half contains a large hydrophobic core (9,10). The folding reaction of HCA II has been thoroughly studied (11), and it has been demonstrated that unfolding of the enzyme is a three-stage process that includes formation of a stable equilibrium intermediate of molten globule type at moderate concentrations of denaturant (12). Our folding studies have shown that HCA II forms aggregates during the refolding process and during incubation at elevated temperatures (13-15); similar aggregation behavior has been reported for bovine carbonic anhydrase II (16,17). Refolding in the presence of GroEL significantly increases the reactivation yield of HCA II and prevents aggregation at elevated temperatures, which indicates that the protein unfolds/refolds via an aggregation-prone state (13-15).
In the present study, we used a direct method to map the region involved in aggregation by measuring excimer fluorescence of site-specific pyrene-labeled cysteine mutants to probe intermolecular interactions. We found that the aggregation of HCA II is highly specific and involves a molten globule-like intermediate. Furthermore, the regions that participate in the intermolecular interactions were identified. A key feature proved to be specific interactions in the central β-sheet structure, in which large patches of contiguous hydrophobic strands are probably exposed.

EXPERIMENTAL PROCEDURES
Mutagenesis, protein production, and purification were performed as described in Freskgård et al. (31). Protein concentrations were determined from absorbance at 280 nm using ε280 nm = 54,800 M⁻¹ cm⁻¹ for HCA II and for all mutants in which no tryptophan was removed. For tryptophan mutants, ε280 nm values from Freskgård et al. (31) were used. GroEL was prepared according to Persson et al. (14).
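The absorbance-based concentration determination described above follows the Beer-Lambert law, A = ε·c·l. The following is a minimal sketch of that arithmetic, assuming a 1-cm path length; the function name and the example absorbance reading are hypothetical and are not taken from the study.

```python
# Molar absorption coefficient at 280 nm quoted in the text (M^-1 cm^-1).
EPSILON_280_HCA_II = 54_800.0


def protein_concentration_molar(absorbance_280, epsilon=EPSILON_280_HCA_II, path_cm=1.0):
    """Return the molar protein concentration from A280 (Beer-Lambert: A = epsilon * c * l)."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance_280 / (epsilon * path_cm)


# Hypothetical reading, chosen only for illustration: A280 = 0.548 in a 1-cm cuvette.
example_conc = protein_concentration_molar(0.548)
print(f"{example_conc * 1e6:.1f} uM")
```

For mutants in which a tryptophan was removed, the `epsilon` argument would have to be replaced by the corresponding value from reference 31.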
Stability Measurements-To determine the stability of HCA II and mutants thereof, the enzyme (0.85 μM) was incubated overnight in various concentrations of GuHCl containing 0.1 M Tris-H2SO4, pH 7.5. The intrinsic tryptophan fluorescence was used to monitor the unfolding of the protein. Fluorescence spectra were obtained on a Hitachi F-4500 spectrofluorimeter equipped with a thermostatted cell. The spectra were recorded in a 1-cm quartz cuvette at 23 °C. The excitation wavelength was 295 nm, and three accumulative emission spectra were recorded in the wavelength region 310-450 nm. 5-nm slits were used for both excitation and emission.
Refolding Measurements-HCA II was denatured for 24 h in various concentrations of GuHCl (0.75 to 5 M). Protein concentrations were 11 and 22 μM. Refolding was initiated by dilution of the denatured enzyme solution to 0.3 M GuHCl and a final protein concentration of 0.85 μM. The GuHCl solutions were buffered with 0.1 M Tris-H2SO4, pH 7.5. The CO2 hydration activity of the enzyme was measured after 2 h of refolding. The enzyme activity assay has previously been described (32).
Refolding experiments were also conducted with added chaperonin GroEL. HCA II pwt (22 μM) was denatured for 24 h in different GuHCl concentrations (5.0 M and 2.0 M). Refolding was started by dilution to 0.3 M GuHCl and a protein concentration of 0.85 μM. GroEL-mediated refolding was achieved by inclusion of GroEL (1:1 molar ratio to HCA II pwt) in the dilution buffer (0.1 M Tris-H2SO4, pH 7.5).
Dynamic Light Scattering Measurements-Dynamic light scattering measurements were conducted using a Brookhaven BI-90 instrument, equipped with a 2-W argon laser (Lexel Corp.). The measurements were performed at a wavelength of 488 nm. A laser power of 700 mW and a scattering angle of 90° were used. Sample pathlength was 1 cm in a thermostatted cell, at 20 °C. The 2-ml samples were made up in various concentrations of GuHCl containing 0.1 M Tris-H2SO4, pH 7.5, and were incubated for 1 h prior to measurement. Protein concentration was 17 μM. Each sample was filtered through a 50-nm filter prior to measurement to remove dust particles.
N-(1-Pyrene)maleimide Labeling-The fluorophore N-(1-pyrene)maleimide was used for the excimer screening. This probe has the advantage that the label is nonfluorescent until it has reacted with a cysteine (33); thus only labeled protein will be detected in the resulting emission spectrum.
The protein was labeled overnight with an equimolar amount of N-(1-pyrene)maleimide in the unfolded state, at 5.0 M GuHCl buffered with 0.1 M Tris-H2SO4, pH 7.5, and subsequently refolded to the molten globule state by dilution to 2.0 M GuHCl and a final protein concentration of 1 μM. After 2 h of refolding, pyrene fluorescence spectra were recorded (see below).
N-(1-Pyrenemethyl)iodoacetamide Labeling-In addition to the N-(1-pyrene)maleimide screening, the naturally occurring Cys-206 in the wild type HCA II and the engineered Cys-245 in the W245C/C206S mutant, respectively, were labeled with N-(1-pyrenemethyl)iodoacetamide. The use of this fluorophore enabled purification of the pyrene-labeled species, because the iodoacetamide-thiol conjugate is chemically stable during affinity chromatography purification performed at pH 9. The N-(1-pyrene)maleimide conjugate, on the contrary, can give rise to a hydrolyzed linker if a pH > 8 is applied (34).
The labeling with N-(1-pyrenemethyl)iodoacetamide was performed as follows: 15 mg of protein was dissolved in 15 ml of 5 M GuHCl containing 0.1 M Tris-H2SO4, pH 7.5, and an equimolar concentration of β-mercaptoethanol. A 10-fold molar excess of N-(1-pyrenemethyl)iodoacetamide dissolved in 200 μl of Me2SO was added in aliquots, and the reaction was allowed to proceed overnight on a mechanical shaker in the dark. The reaction was quenched with a 2-fold molar excess of β-mercaptoethanol over reagent and centrifuged to remove precipitated reagent. The pyrene-labeled enzyme was refolded for 3 h by dilution with 0.1 M Tris-H2SO4, pH 7.5, to a volume of 600 ml containing a final concentration of 0.13 M GuHCl and 0.85 μM protein. The pH of the solution was then adjusted to 8.7, followed by purification by affinity chromatography (35). The degree of labeling was determined spectrophotometrically (ε344 nm = 41,000 M⁻¹ cm⁻¹) (33). The protein concentration was estimated from the absorption at 280 nm after subtraction of the contribution from the probe. The degree of modification for both derivatives (HCA II and C206S/W245C mutant) was determined to be 0.9-1.0 pyrene/protein molecule.
Pyrene Fluorescence Measurements-The pyrene emission spectra of N-(1-pyrene)maleimide-labeled mutants were recorded from the labeled protein (1 μM) in 2.0 M GuHCl containing 0.1 M Tris-H2SO4, pH 7.5. The spectra from the N-(1-pyrenemethyl)iodoacetamide-labeled proteins were obtained from 1 μM protein incubated overnight in various concentrations of GuHCl buffered with 0.1 M Tris-H2SO4, pH 7.5. Both types of labeled protein were excited at 344 nm, and the emission spectra were registered in the wavelength range 360-550 nm using 5-nm excitation and 2.5-nm emission slits.
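The degree of labeling described above is the ratio of pyrene concentration (from the absorbance at 344 nm) to protein concentration (from the absorbance at 280 nm after removing the probe's contribution). A minimal sketch of that calculation follows. The correction factor `probe_a280_per_a344` is a hypothetical parameter: the text does not state how large the probe's contribution at 280 nm was, and the absorbance readings in the example are invented for illustration.

```python
EPSILON_PYRENE_344 = 41_000.0   # M^-1 cm^-1, value quoted in the text
EPSILON_HCA_II_280 = 54_800.0   # M^-1 cm^-1, value quoted in the text for HCA II


def degree_of_labeling(a344, a280, probe_a280_per_a344,
                       protein_epsilon=EPSILON_HCA_II_280, path_cm=1.0):
    """Return pyrene molecules per protein molecule.

    probe_a280_per_a344 is an ASSUMED correction factor: the probe's absorbance
    at 280 nm expressed as a fraction of its absorbance at 344 nm.
    """
    pyrene_molar = a344 / (EPSILON_PYRENE_344 * path_cm)
    corrected_a280 = a280 - probe_a280_per_a344 * a344  # subtract probe contribution
    if corrected_a280 <= 0:
        raise ValueError("corrected A280 must be positive")
    protein_molar = corrected_a280 / (protein_epsilon * path_cm)
    return pyrene_molar / protein_molar


# Invented readings and an invented correction factor of 0.1, for illustration only.
example_ratio = degree_of_labeling(a344=0.041, a280=0.0589, probe_a280_per_a344=0.1)
print(f"{example_ratio:.2f} pyrene/protein")
```

For the W245C/C206S mutant, `protein_epsilon` would need the tryptophan-mutant value from reference 31 rather than the wild type value.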

RESULTS AND DISCUSSION
Formation of Aggregates in the Molten Globule State of HCA II-Tryptophan fluorescence was measured to monitor global unfolding of HCA II. The protein contains 7 tryptophan residues rather evenly distributed in the structure; thus, the change in intrinsic tryptophan fluorescence should be a reliable parameter of global conformational changes accompanying unfolding of the protein (18). Equilibrium unfolding of HCA II, and mutants thereof, revealed a three-state transition curve N → I → U (Fig. 2, inset, and Table I). The midpoints of denaturation of HCA II occurred at 1.0 and 2.3 M GuHCl for the first and second unfolding transitions, respectively. The denaturation midpoints for the mutants in this study are summarized in Table I. The structure of the folding intermediate in the GuHCl concentration interval 1-2 M has been characterized in detail and shown to be of molten globule type (12,19). The HCA II intermediate has been shown to have exposed hydrophobic patches, because this state binds the fluorescent probe 1-anilino-naphthalene-8-sulfonate (20).
The refolding yield of HCA II was found to be 70%, as previously reported (21), when the protein was denatured in high concentrations of GuHCl (>3 M; Fig. 2). The refolding yield was, however, significantly lower when refolding was performed on HCA II that was denatured in lower concentrations of GuHCl. The reactivation yield is mirrored by the two unfolding curves N → I and I → U, respectively, and forms a "refolding trough", implying that the amount of protein that can be reactivated decreases in parallel with the increase in formed I (molten globule) at both transitions (Fig. 2). This clearly indicates that the protein cannot be refolded under conditions that promote formation of the molten globule state of HCA II. The width of the refolding trough depends on the protein concentration, which indicates that aggregation is probably the cause of the low recoveries of active enzyme upon refolding (Fig. 2). Goldberg and co-workers (22) reported similar results for urea-denatured Escherichia coli tryptophanase.
Size of the Aggregates-Partial denaturation of HCA II to the molten globule state gives rise to a soluble protein species, because no precipitate can be noticed. Dynamic light scattering measurements were made to determine the size of various states of the protein. In these experiments we used a pseudo-wild type mutant of the enzyme (C206S; designated HCA II pwt) that had previously been shown to exhibit folding behavior indistinguishable from that of the wild type enzyme (12,19). HCA II pwt was employed to prevent the possibility of disulfide formation, because Cys-206 is the only cysteine residue present in the wild type protein (8). Under native solution conditions, the folded state of HCA II has a diameter of 4.5 nm, which is very similar to the dimensions found in the crystal structure, which are 3.9 by 4.2 by 5.5 nm (9). For the unfolded state of HCA II, we found only a minor deviation from the dynamic light scattering-determined diameter of bovine CA II (9 and 10 nm, respectively; the latter value was obtained by Cleland and Wang (16)). The particle diameter of the protein in the molten globule state (13.5 nm at 2.0 M GuHCl) was larger than that of the unfolded state (9 nm at 5.0 M GuHCl); hence, the molten globule species cannot be monomeric, although it is difficult to determine the number of monomers in this larger species. Intermolecular associations leading to formation of dimers/trimers have been observed for the equilibrium molten globule state of bovine carbonic anhydrase II at higher protein concentrations (23,24). On the other hand, Uversky (25) performed measurements at low protein concentrations to avoid aggregation and found a diameter of 5 nm for monomeric molten globule bovine CA II.
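The three-state equilibrium N → I → U described above can be illustrated with the standard linear free-energy model, in which each transition has a free energy that varies linearly with denaturant concentration. The sketch below uses the transition midpoints quoted in the text (1.0 and 2.3 M GuHCl); the m-values are hypothetical placeholders chosen only to give reasonably sharp transitions and are not taken from the study.

```python
import math

R_KJ_PER_MOL_K = 8.314e-3  # gas constant in kJ mol^-1 K^-1


def three_state_populations(gdn_molar, cm1=1.0, cm2=2.3,
                            m1=20.0, m2=12.0, temp_k=296.15):
    """Return fractional populations (f_N, f_I, f_U) at a given GuHCl concentration.

    cm1, cm2: transition midpoints in M GuHCl (values quoted in the text).
    m1, m2:   ASSUMED m-values in kJ mol^-1 M^-1 (not from the source).
    """
    rt = R_KJ_PER_MOL_K * temp_k
    # Linear model: dG = m * (Cm - [GuHCl]); K = exp(-dG / RT).
    k1 = math.exp(-m1 * (cm1 - gdn_molar) / rt)  # [I]/[N]
    k2 = math.exp(-m2 * (cm2 - gdn_molar) / rt)  # [U]/[I]
    denom = 1.0 + k1 + k1 * k2
    return 1.0 / denom, k1 / denom, k1 * k2 / denom


for conc in (0.0, 1.0, 1.65, 2.3, 5.0):
    f_n, f_i, f_u = three_state_populations(conc)
    print(f"{conc:4.2f} M GuHCl: N={f_n:.3f} I={f_i:.3f} U={f_u:.3f}")
```

With these assumed parameters the intermediate population peaks between the two midpoints, which is the concentration range in which the molten globule state dominates.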
Footnotes to Table I:
b Surface accessibility of the native side chain was calculated using a probe radius of 1.4 Å as described by Connolly (36).
c Fractional accessible surface area was calculated as the ratio of the absolute accessible area of each amino acid to the accessible area of the same amino acid situated in an exposed tripeptide, Ala-Xaa-Ala, according to Lee and Richards (37). Residues with ratios larger than the default value of 0.3 were considered exposed.
d β-strand positions according to Mårtensson et al. (12) (see Fig. 1). The β-strand nomenclature as described in Eriksson et al. (9) is given within parentheses.
e Excimer/monomer ratio was calculated as the ratio of the pyrene fluorescence intensities at 465 and 377 nm, respectively (after subtraction of the excimer/monomer ratio of the model compound pyrene-labeled 2-mercaptoethanol (1 μM), which can be regarded as background noise and was in the range 0.012 ± 0.005).
f Determined by loss of enzyme activity as described previously (38).
g Determined by change in tryptophan absorption at 292 nm as described previously (12).
h Determined by change in tryptophan fluorescence emission peak as described under "Experimental Procedures."
A closely packed 13.5-nm particle could hold as many as 15 spherical monomers with a diameter of 5 nm. However, an elongated loosely packed particle could probably be up to 13.5 nm in diameter and still contain few monomers and possibly even dimers or trimers. It has previously been shown that, under refolding conditions, GuHCl-denatured bovine CA II initially exhibited dimeric forms that were later converted to multimeric species. A molten globule intermediate was suggested to form the dimer that could be the nucleating species for the further aggregation (16). In our study, at equilibrium, under conditions of relatively high concentrations of GuHCl, only the stable dimeric/oligomeric species of HCA II were present, and there was no precipitate consisting of micrometer-sized particles. The size of the molten globule species was unchanged during prolonged incubation for over 1 week. Because the dimers/oligomers are stable in solution, we managed to spectroscopically map the subdomain that is involved in the intermolecular interaction.
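The upper estimate of about 15 monomers in a closely packed 13.5-nm particle can be reproduced with a simple volume argument. The sketch below assumes spherical monomers and the close-packing fraction for equal spheres, π/√18 ≈ 0.74; whether the original estimate was derived this way is an assumption, and the function name is hypothetical.

```python
import math

# Maximum packing fraction for equal spheres (close packing), pi / sqrt(18).
CLOSE_PACKING_FRACTION = math.pi / math.sqrt(18)


def close_packed_monomer_estimate(particle_diameter_nm, monomer_diameter_nm,
                                  packing_fraction=CLOSE_PACKING_FRACTION):
    """Estimate how many spherical monomers fit in a spherical particle by volume."""
    if particle_diameter_nm <= 0 or monomer_diameter_nm <= 0:
        raise ValueError("diameters must be positive")
    volume_ratio = (particle_diameter_nm / monomer_diameter_nm) ** 3
    return volume_ratio * packing_fraction


# Diameters quoted in the text: 13.5-nm particle, 5-nm monomeric molten globule.
estimate = close_packed_monomer_estimate(13.5, 5.0)
print(f"about {estimate:.1f} monomers")
```

This is only an upper bound for compact spherical particles; as noted above, an elongated or loosely packed particle of the same apparent diameter could contain far fewer monomers.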
Mapping Intermolecular Interactions of Aggregates by Measuring Pyrene Excimer Fluorescence-To specifically map the interactions involved in the formation of the aggregated species, we developed a novel application of pyrene excimer fluorescence. Pyrene excimer methodology is based on the ability of pyrene molecules to form excited state dimers, called excimers. If two pyrene moieties are within a few Å of each other and are correctly oriented, they can form excimers (26). The pyrene excimer fluorescence band is very broad, structureless, and red-shifted compared with the monomeric pyrene fluorescence emission, and it is centered around 450-470 nm, which is very different from the narrow and structured monomeric pyrene emission bands at 380-400 nm (Fig. 3A, inset). Excimer fluorescence has been employed to study protein conformational changes (26), and we used it to probe intramolecular proximity in HCA II during different stages of folding, employing a doubly pyrene-labeled cysteine mutant (27). With that approach, it was possible to monitor the unfolding of stable residual structure in the "unfolded" state, because extensive unfolding of the protein separated the sites to which the pyrene moieties were linked, leading to disappearance of the excimer fluorescence. The reason for this is that the probes must be close together to allow excimer formation. A similar situation will occur for a singly labeled pyrene cysteine mutant, if aggregation brings two protein molecules together. Therefore, we concluded that exploiting excimer fluorescence of singly labeled HCA II molecules could be a way to specifically investigate intermolecular interactions that have arisen as a result of aggregation. Aggregates formed in the molten globule state of HCA II were mapped by the use of 20 different pyrene-labeled single-cysteine mutants. A pyrene fluorescence spectrum of the molten globule intermediate induced by 2.0 M GuHCl was recorded for each labeled mutant.
The resulting emission spectra illustrated in Fig. 3A, inset, show that all labeled mutants (Fig. 1) were fluorescent, indicating that labeling was successful. The mutants that were labeled in positions 97, 118, 123, 142, 150, and 206 in addition displayed an excimer fluorescence band (a broad band centered around 450-470 nm, the magnified part of the spectra in Fig. 3B). This clearly indicates that the interactions in the aggregated species formed from the molten globule intermediate are highly specific. Based on these results, we selected two positions in the HCA II molecule to represent locations that are involved (position 206) and are not involved (position 245) in aggregation. These positions were pyrene-labeled, and their excimer fluorescence was investigated when these labeled protein variants were incubated in various concentrations of GuHCl (Fig. 4). For the pyrene-labeled position 206, there was a steep rise in excimer fluorescence in the N → I unfolding transition, and the excimer fluorescence was completely lost when the GuHCl concentration was raised to 3 M (I → U). This demonstrates that intermolecular excimer fluorescence, and thereby aggregation, occurs only when the HCA II molecules are in the molten globule state. No excimer fluorescence was detected for the pyrene-labeled position 245 at any stage of the unfolding process (Fig. 4).
We found that the excimer signal, in the wavelength region 440-520 nm, from pyrene-labeled HCA II in position 206 was linearly dependent on the protein concentration in the range 0.5-4 μM (data not shown). This demonstrates that the observed signal at long wavelengths originates from interactions between pyrene molecules. The excimer bands detected from the pyrene survey in Fig. 3A are not very distinct but can be more clearly visualized if the background fluorescence is subtracted, as is shown in Fig. 3B. The fluorescence spectrum of the pyrene-labeled W245C mutant was used as a reference of background fluorescence, because it did not show any excimer bands. The resulting excimer fluorescence spectra show (in the excimer region 440-520 nm) a typical excimer band for each of the positions 97, 118, 123, 142, 150, and 206 but not for the 14 other mutants in this study (Fig. 3B).
Contact Specificity-For pyrene excimer fluorescence to occur, two pyrene moieties have to be within a few Å of each other (26). Therefore, a homogeneous contact surface must be formed between two or more protein molecules to permit excimer fluorescence because of intermolecular proximity, which is what we detected for some of the pyrene-labeled HCA II variants. Most interestingly, we also found that the aggregation interactions between different HCA II molecules were conspicuously specific, as shown by the pyrene mapping of the interacting surfaces (Fig. 5A). It can also be concluded that the aggregation process cannot be driven by pyrene-pyrene affinity, because unlabeled protein formed aggregates and because only a few of the labeled variants gave rise to excimer formation. Because we have monitored singly pyrene-labeled sites, only aggregation surfaces that are formed in isologous interfaces (like-with-like pairing) would be detected. To further explore the possibility of additional aggregation-forming parts of the protein, we also made all possible mixtures of some of the nonexcimer-forming positions 67, 160, 184, and 256. These samples did not show any excimer fluorescence (Fig. 5B), and these positions are thus not involved in any heterologous or isologous aggregation surface. It would be possible to make a more refined structural mapping by additional measurements of mixtures of excimer-forming mutants. The interpretation of such mixed samples can, however, be less straightforward, because a mixture of different species with varying fluorescence would be present.
The selected mutation sites to which the pyrene fluorophore was attached are well separated with respect to the amino acid sequence from positions 23 to 256 and are also very different regarding location in the three-dimensional structure (Fig. 1 and Table I). Thus, several peripheral and several deeply buried positions situated in different structural contexts were probed. Six of 20 probed positions (i.e. positions 97, 118, 123, 142, 150, and 206) displayed excimer fluorescence intensity that was significantly higher than the background that emanates from the monomeric fluorescence in the expected excimer wavelength region (Fig. 3 and Table I). In our previous study (27), a rise in excimer fluorescence was detected in the N → I unfolding transition and was interpreted as being due to a more favorable interaction environment for the pyrene moieties (in the same protein molecule) when the rigid tertiary structure of the protein was disrupted. The discovery that these conditions promote aggregation makes another interpretation more plausible, namely formation of intermolecular excimers. In a recent ¹⁷O magnetic relaxation dispersion study of the hydration of an acid-induced (pH 3) HCA II molten globule, it was found that the relative hydration of the molten globule and the native state of HCA II was very similar (39), indicating a very compact protein structure of the molten globule. Interestingly, in that study it was also found that the molten globule yields protein oligomers.
In the molten globule state most of the secondary structure is intact (12). In HCA II, β-strands 3-5 are extremely stable toward GuHCl denaturation, and that region has been shown to be compact in an unfolded state at 5 M GuHCl, a concentration at which the molten globule intermediate is ruptured (12,19). In a previous study, we demonstrated that this stable residual structure comprises the region that contains β-strands 3 to 7 (27). From our present results, it is apparent that the limited region that participates in aggregate formation is located within this stable part of the β-sheet. A common feature of the β-strands in this region is that they are very hydrophobic. Thus, β-strands 3-5 are part of a large aromatic hydrophobic cluster in the core of the protein, and, according to hydropathy calculations, β-strands 6-7 represent the most hydrophobic part of the molecule (28). A hypothesis that has successively gained experimental support is that aggregation occurs upon specific interactions between hydrophobic surfaces of structural subdomains in partially folded intermediates (1,2). Moreover, intermolecular interactions giving rise to aggregates frequently appear to involve β-sheet-like interactions. In most cases evidence for a specific interaction between specific domains has been obtained in an indirect way, for example by mapping point mutations that affect the formation of aggregates or inclusion bodies. Although in general it has not been possible from case to case to unambiguously establish a relationship between a suggested structural microdomain and aggregation, together the accumulated data convincingly support the idea of a high degree of specificity in aggregation (1,2). To thoroughly understand the mechanism of aggregation, it is essential that the intermolecular interactions involved can be mapped directly, as in our study, to characterize in detail the interface structure of aggregated proteins.
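The excimer/monomer ratio reported in Table I is defined as the ratio of the pyrene fluorescence intensities at 465 and 377 nm, minus the background ratio of the model compound (0.012 ± 0.005). The following is a minimal sketch of that calculation on a sampled spectrum. The nearest-wavelength lookup and the example spectrum are illustrative assumptions, not the procedure or data of the study.

```python
def intensity_at(wavelengths_nm, intensities, target_nm):
    """Return the intensity at the sampled wavelength closest to target_nm."""
    if len(wavelengths_nm) != len(intensities) or not wavelengths_nm:
        raise ValueError("wavelengths and intensities must be non-empty and equal in length")
    idx = min(range(len(wavelengths_nm)), key=lambda i: abs(wavelengths_nm[i] - target_nm))
    return intensities[idx]


def excimer_monomer_ratio(wavelengths_nm, intensities, background_ratio=0.012,
                          excimer_nm=465.0, monomer_nm=377.0):
    """Background-corrected excimer/monomer ratio, I(465 nm) / I(377 nm) - background."""
    i_excimer = intensity_at(wavelengths_nm, intensities, excimer_nm)
    i_monomer = intensity_at(wavelengths_nm, intensities, monomer_nm)
    if i_monomer <= 0:
        raise ValueError("monomer intensity must be positive")
    return i_excimer / i_monomer - background_ratio


# Invented four-point spectrum, for illustration only.
example_wavelengths = [377.0, 397.0, 465.0, 500.0]
example_intensities = [100.0, 60.0, 5.0, 3.0]
print(round(excimer_monomer_ratio(example_wavelengths, example_intensities), 3))
```

A corrected ratio close to zero would correspond to a position without a detectable excimer band, such as position 245.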
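The mixing experiment with positions 67, 160, 184, and 256 can be enumerated explicitly. The sketch below assumes that "all possible mixtures" refers to pairwise mixtures of two differently labeled mutants; that reading is an assumption, and higher-order mixtures would be listed analogously by changing the group size.

```python
from itertools import combinations

# Non-excimer-forming labeled positions named in the text.
non_excimer_positions = [67, 160, 184, 256]

# ASSUMPTION: each mixture contains exactly two differently labeled mutants.
pairwise_mixtures = list(combinations(non_excimer_positions, 2))

for first, second in pairwise_mixtures:
    print(f"mixture of labeled positions {first} and {second}")
print(f"{len(pairwise_mixtures)} pairwise mixtures in total")
```

Each such mixture probes a possible heterologous interface between the two labeled positions, complementing the single-mutant samples, which can report only on isologous contacts.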
Obviously, our results support the ideas that have been put forward regarding the aggregation mechanism. More specifically, we found that an aggregation-prone HCA II molten globule intermediate docks and forms an interface that contains the most hydrophobic part of the ␤-sheet. This approach to specifically map an association surface should be applicable to other proteins.
HCA II forms aggregates during refolding, and this is likely to be the reason for reduced yields of native enzyme obtained during refolding. The molten globule state is believed to be formed as a kinetic folding intermediate of HCA II (29). In the present study, the equilibrium counterpart formed aggregates that involve specific parts of the protein structure. We plan to perform kinetic studies in an attempt to determine whether the same regions of the protein are involved in formation of off-pathway aggregates during refolding.
GroEL Cannot Assist the Refolding of the Aggregated Molten Globule Intermediate-The identified aggregation surface is part of the stable residual structure that has been suggested to be the initiation site for folding of HCA II (19,27). Our present findings show that the suggested initiation site for folding can also be a folding trap, in the sense that hydrophobic patches that become exposed in partly folded proteins can also be a nucleation site for the formation of aggregates. Thus, two competing reactions appear to occur during folding of HCA II.
We have previously shown that aggregation during refolding of HCA II can be effectively prevented by the chaperonin GroEL (13,14). At elevated temperatures, an aggregation-prone molten globule-like intermediate was formed that was also protected from aggregation by GroEL. Interestingly, the same structural region (β-strands 4-7) that in the present study is shown to be involved in specific aggregation was shown to be "loosened up" by the action of GroEL, which probably will facilitate rearrangements of misfolded structure during folding (15). Renaturation experiments were therefore carried out in the presence of the chaperonin GroEL. However, GroEL-mediated refolding of the GuHCl-induced (2.0 M) molten globule did not lead to any significant reactivation of the enzyme. On the other hand, if the enzyme was fully denatured in 5 M GuHCl, almost a 100% yield of active enzyme was recovered in the presence of GroEL. The reactivation curves are shown in Fig. 6.
GroEL was not capable of dissolving aggregates that were preformed from a molten globule intermediate in GuHCl (2.0 M) when GroEL was present solely during the refolding reaction (Fig. 6). Thus, it seems that the surface that is actively affected by GroEL must be exposed, and not hidden as in the HCA II aggregates, to allow GroEL to exert its chaperone activity. Taken together our results suggest that surfaces responsible for off-pathway aggregation are overlapping with the surfaces involved in the interaction with chaperones.