Extended Polyglutamine Tracts Cause Aggregation and Structural Perturbation of an Adjacent β Barrel Protein

Formation of fibrillar intranuclear inclusions and related neuropathologies of the CAG-repeat disorders are linked to the expansion of a polyglutamine tract. Despite considerable effort, the etiology of these devastating diseases remains unclear. Although polypeptides with glutamine tracts recapitulate many of the observed characteristics of the gene products with CAG repeats, such as in vitro and in vivo aggregation and toxicity in model organisms, extended polyglutamine segments have also been reported to structurally perturb proteins into which they are inserted. Additionally, the sequence context of a polyglutamine tract has recently been shown to modulate its propensity to aggregate. These findings raise the possibility that indirect influences of the repeat tract on adjacent protein domains are contributory to pathologies. Destabilization of an adjacent domain may lead to loss of function, as well as favoring non-native structures in the neighboring domain causing them to be prone to intermolecular association and consequent aggregation. To explore these phenomena, we have used chimeras of a well studied globular protein and exon 1 of huntingtin. We find that expansion of the polyglutamine segment beyond the pathological threshold (>35 glutamines) results in structural perturbation of the neighboring protein whether the huntingtin exon is N- or C-terminal. Elongation of the polyglutamine region also substantially increases the propensity of the chimera to aggregate, both in vitro and in vivo, and in vitro aggregation kinetics of a chimera with a 53-glutamine repeat follow a nucleation polymerization mechanism with a monomeric nucleus.

At least nine slowly progressing hereditary neurodegenerative diseases, including autosomal-dominant Huntington disease (HD), dentatorubral and pallidoluysian atrophy, and several spinocerebellar ataxias, arise from the genetic expansion of an unstable polyglutamine (poly(Q))-repeat sequence (1-3). Despite the functional and structural dissimilarity among the poly(Q)-containing proteins and the distinct pathologies they produce, each forms intranuclear inclusion bodies whose role in the disease pathogenesis is still controversial (4). In all of the poly(Q)-repeat diseases, there is a remarkable threshold effect of the length of the poly(Q) tract, which is inversely correlated with the age of onset and disease severity (5). In the specific case of HD, the elongated CAG repeat encoding a stretch of glutamines is located within the N-terminal exon 1 of the gene encoding the 350-kDa protein huntingtin (Htt). Poly(Q) repeats in Htt below the threshold value of 35 are not associated with HD, whereas repeats over 36 residues cause neurodegenerative dysfunction, and longer repeats are characterized by an earlier age of onset (1). In ataxin-3, linked to type 3 spinocerebellar ataxia (Machado-Joseph disease), the poly(Q) repeat is located in the C-terminal portion of the protein, and there is a similar correlation of repeat length with disease severity and onset (6).
The fact that disease risk and the tendency to aggregate in vitro show the same dependence on the number of glutamines supports a central role for polyglutamine-mediated aggregation in disease pathogenesis (1, 7-9). Moreover, expression of htt exon 1 carrying the expanded CAG stretch recapitulates the toxicity in transgenic mice and Caenorhabditis elegans, causing an HD-like phenotype (10, 11). However, other studies argue that aggregate formation is not linked to disease pathology or cell death and may instead reflect cellular strategies to handle toxic species (12-17). Fundamental cellular processes such as proteasome-mediated proteolysis have been proposed to be inhibited by poly(Q)-containing protein products with repeat lengths above the pathological threshold (18-20), raising the possibility of pathological mechanisms apart from aggregation.
Although much of the pathology of CAG-repeat disorders can be attributed to the behavior of the poly(Q) segments, several studies have shown that the context of the poly(Q) tract must also be considered (21-23). Recent work shows that the presence of a proline-rich extension C-terminal to a poly(Q) tract modulates its conformational behavior and decreases its tendency to aggregate (24). In exon 1 of huntingtin, the glutamine-rich tract is followed by a proline-rich region, which, based on this recent study, diminishes its aggregation propensity. In turn, effects on protein domains adjacent to glutamine-rich repeats may play a direct role in aggregation mechanisms. Incorporation of a Gly-Gln10-Gly peptide into chymotrypsin inhibitor 2 (CI2) led to formation of dimeric and trimeric oligomers by the normally monomeric CI2 (25). No perturbations in the structure of the CI2 were reported. By contrast, insertion of poly(Q) tracts longer than the pathological threshold into an otherwise stable protein (myoglobin) at the site of a surface-exposed loop caused modest structural perturbation and a concomitant increase in aggregation propensity (26, 27). Although in this case the structural perturbation was mild and did not appear to involve the core packing, the fact that myoglobin is very stable and contains a heme cofactor may have led it to be more robust.
By comparison, Bevivino and Loll (28) found that increasing the length of a glutamine repeat in the protein ataxin-3 from a non-pathogenic (n = 27) to a pathogenic length (n = 78) was accompanied by significant structural destabilization as well as enhanced propensity to aggregate. To carry out experiments on the protein carrying a 78-residue-long glutamine repeat, these authors used constructs with ataxin-3 linked to a solubility-enhancing fusion partner, maltose-binding protein. Their findings appeared to be supported by another study of a non-pathological variant of the protein ataxin-3, which was observed to form fibrillar aggregates upon partial denaturation using guanidine HCl (29). This result implicates partially folded forms of protein domains neighboring poly(Q) tracts in the mechanism of aggregation. Recently, however, the same group has concluded that extending the poly(Q) tract in ataxin-3 does not destabilize the protein (30). In the latter study the authors were forced to use pH denaturation and kinetics of unfolding/folding to obtain stability data, because aggregation processes confounded the use of chaotropes such as guanidinium hydrochloride. They concluded that the length-dependent influence of the poly(Q) region was not on the native state of the flanking domain but rather on a fibrillogenic state. Moreover, they speculated that a flanking sequence may exert a restraining force on poly(Q) segments, modulating their aggregation propensities. Provocatively, a recent study reports that the AXH domain of ataxin-3 has an intrinsic tendency to aggregate even in the absence of an adjacent poly(Q) tract (23). All of these studies, while not uniform in their conclusions, point to the importance of understanding how poly(Q) tracts influence the sequences around them, and in turn how adjacent sequences may influence the behavior of poly(Q) repeats, to better understand CAG-repeat diseases.
Examining how a poly(Q) sequence may influence an otherwise stably folded adjacent domain has been limited by the technical difficulties of obtaining well behaved soluble protein in the presence of sequences that strongly favor aggregation. We have used fusions between cellular retinoic acid binding protein I (CRABP I) and Htt exon 1 for this work; constructs with the poly(Q)-containing sequence attached at either the N or C terminus of CRABP I were examined (Fig. 1A). The extensive previous characterization of the structure and folding of this protein (31-36) provides a foundation to interpret any changes caused by the poly(Q) fusions, and fortuitously, adequate solubility was obtained even with extended poly(Q) sequences to carry out biophysical study. CRABP I is a 136-amino acid long member of the intracellular lipid-binding protein family. These proteins have a β barrel structure composed of 10 antiparallel β strands, with a short helix-turn-helix between strands 1 and 2, wrapped around a ligand-binding cavity (solvent filled in the absence of ligand) (31, 37). In contrast to the recent study of ataxin-3 (30), we find that poly(Q) expansion within Htt exon 1 in the pathological range (repeat length of 53) causes structural perturbation and decreased stability of the neighboring CRABP I protein domain as well as increased propensity for aggregation both in vivo and in vitro, whereas a fusion of CRABP I with Htt exon 1 containing a shorter poly(Q) repeat (n = 20) remains soluble with no evidence of structural disruption. The tight correlation of these effects with the poly(Q) length argues that the repeat tract is solely responsible for these perturbations and molecular pathologies. Aggregation in vitro of the CRABP I fusions to Htt exon 1 containing poly(Q) repeats above the pathological threshold showed nucleated polymerization kinetics and fit to a monomeric nucleus size, consistent with a dominant role for the poly(Q) sequence.
We incorporated a tetra-Cys sequence within CRABP I to label the expressed protein in the cell with a low molecular weight membrane-permeable fluorescent dye, FlAsH (38). In vivo, constructs with pathological poly(Q) repeats (n = 40, 53, or 64) aggregated, and the apparent lag time could be modified by varying the amount of expressed protein, as expected for nucleated polymerization kinetics. This system provides a manipulable model to explore how polyglutamine tracts may influence juxtaposed folded domains; both conformational and energetic factors can be investigated due to the well behaved nature of these constructs. We speculate that the nature of the CRABP I structure and its moderate stability may make it susceptible to perturbation from the poly(Q) sequences.

MATERIALS AND METHODS
Constructs-Tetra-Cys CRABP I (with a FlAsH-binding tetra-Cys motif) behind the T7 promoter was constructed as described (39). Upstream of the start codon comprising the NdeI restriction site, a new SalI restriction site necessary for the N-terminal insertion of the htt exon 1 was incorporated using a QuikChange protocol (Stratagene). For the C-terminal fusions, the stop codon and its nearest 3′-flanking region were replaced by XhoI and BamHI restriction sites. Due to the genetic manipulations, two new amino acids (Leu-Glu) were inserted at the C terminus of tetra-Cys CRABP I in the C-terminal fusions. Sequences encoding htt exon 1 with CAG repeats of 20 and 53 were amplified by PCR with NdeI/SalI (lacking a stop codon) or XhoI/BamHI flanking recognition sites from plasmids pGEX-htt20Q and pGEX-htt53Q (gifts from U. Hartl and M. Hayer-Hartl, Max Planck Institute, Martinsried, Germany) (40); sequences encoding htt exon 1 with 33, 40, and 64 length poly(Q) tracts were amplified with XhoI/BamHI flanking sites from plasmids pPD30.38-Q33-YFP, pPD30.38-40-YFP, and pPD30.38-Q64-YFP, respectively (gifts of R. Morimoto, Northwestern University, Evanston, IL) (11). The double-digested PCR products were cloned up- or downstream of the tetra-Cys CRABP I gene, yielding N- and C-terminal chimeras of tetra-Cys CRABP and Htt exon 1 with an N-terminal polyhistidine tag (sequence: MGHHHHHHHHHSSGHIEGRH-). The resulting plasmids were verified by DNA sequencing.
Protein Expression and Purification-All constructs (see Fig. 1A) were expressed in Escherichia coli BL21(DE3) cells. Tetra-Cys CRABP I, Htt20 tetra-Cys CRABP (Htt exon 1 fused to the N terminus), and tetra-Cys CRABP Htt20 (Htt exon 1 fused to the C terminus) were purified from the soluble fraction of the cell lysates (39), whereas both the N- and C-terminal tetra-Cys CRABP fusions with Htt53 were isolated from the insoluble fraction after solubilization with 8 M urea (35). Protein solutions (100-150 μM) of tetra-Cys CRABP I and tetra-Cys CRABP Htt20 (N- or C-terminal) were stored at 4°C and used within 2 weeks. Concentrations were determined spectrophotometrically using ε280 of 21,750 M⁻¹ cm⁻¹ (a corrected value for wild-type CRABP I containing four additional cysteines). Refolding was accomplished by dropwise dilution (100-fold) of the urea-containing protein fractions in 10 mM Tris·HCl, pH 8.0, or in 10 mM HEPES, pH 7.8, containing 2 mM β-mercaptoethanol (depending on the subsequent step). The refolding efficacy was determined by ligand (retinoic acid) binding, CD spectroscopy, and measurement of the intrinsic Trp fluorescence. The variants with 53 poly(Q) showed a high tendency to aggregate and were stored in dilute stocks (10-15 nM). The protein samples were proteolytically stable for at least 6 days, as evidenced by mass spectrometry. Before use, small aliquots were concentrated to the desired concentration using a Centricon system (molecular mass cut-off 10 kDa) at 4°C and centrifuged at 14,000 rpm in a microcentrifuge for 10 min to remove aggregates. The properties of refolded tetra-Cys CRABP I Htt53 (N- or C-terminal fusion) were found to be the same as those of protein isolated in very low yield from the soluble fraction 45 min post-induction.
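The spectrophotometric concentration determination above rests on the Beer-Lambert relation, A = ε·c·l. The following is a minimal sketch of that arithmetic, not the authors' procedure: the function name, the 1-cm default path length, and the absorbance reading in the example are illustrative assumptions; only the extinction coefficient is taken from the text.

```python
# Beer-Lambert estimate of protein concentration from absorbance at 280 nm.
# EPSILON_280 is the value quoted in the text; everything else is hypothetical.
EPSILON_280 = 21_750.0  # M^-1 cm^-1


def concentration_molar(a280, path_length_cm=1.0, epsilon=EPSILON_280):
    """Return the molar concentration c = A / (epsilon * l)."""
    if path_length_cm <= 0 or epsilon <= 0:
        raise ValueError("path length and extinction coefficient must be positive")
    return a280 / (epsilon * path_length_cm)


if __name__ == "__main__":
    a280 = 0.435  # hypothetical reading, assumed already blank-corrected
    print(f"{concentration_molar(a280) * 1e6:.1f} uM")
```

Readings outside the linear range of the instrument would need dilution first; the sketch does not check for that.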
In-cell FlAsH-EDT2 Labeling-E. coli BL21(DE3) cells expressing tetra-Cys constructs were labeled as described (39). A 10-fold excess of ethanedithiol (EDT) was added to suppress the labeling of the endogenous cysteine pairs (38). The cells were preloaded with FlAsH-EDT2 one generation prior to induction with 0.4 mM isopropyl β-D-thiogalactoside at A600 = 1. Aliquots of 150 μl were withdrawn and subjected to fluorescence measurements at 530 nm (bandwidth 2 nm) with excitation at 500 nm (bandwidth 2 nm, Photon Technology International QM-1 fluorometer). The fluorescence of a negative control of CRABP I wild-type expressing cells pre-loaded with FlAsH-EDT2 was subtracted from each point. (Note that FlAsH is now marketed by Invitrogen under the trade name "Lumio.")
Cell Fractionation-10-ml aliquots of cells expressing the tetra-Cys CRABP I variants were harvested by centrifugation and fractionated into soluble and insoluble fractions (39). The insoluble fraction was solubilized in 50 mM phosphate buffer, pH 8.0, containing 8 M urea and 2% SDS and centrifuged at 27,000 × g for 30 min. The pellet was re-suspended in 1.5 ml of 20 mM Tris·HCl buffer and defined as the SDS-insoluble fraction. All fractions were analyzed on a 12% SDS-PAGE, and the intensity of the Coomassie-stained bands of interest was detected by optical densitometry (Bio-Rad).
Circular Dichroism and Fluorescence-Far-UV CD spectra were collected at 20°C on a Jasco J-715 spectropolarimeter with a 1-mm path length cell (1 nm bandwidth). The concentration of protein was 5 μM, and each spectrum was an average of five spectra. Fluorescence spectra of 3 μM proteins were taken at 20°C with excitation at 280 nm (1 nm slit width) and emission from 300-380 nm (2 nm slit width).
Stability Measurements-For the urea unfolding titrations, 3 μM protein (FlAsH-labeled) was diluted in 0-8 M urea containing 10 mM Tris·HCl, pH 8.0, and incubated for >2 h at 37°C. Unfolding was monitored at 37°C by Trp fluorescence (as above). Urea titration curves were fit to a two-state model to derive thermodynamic values. To test proteolytic susceptibility and ability to bind ligand, tetra-Cys CRABP I variants were digested with α-chymotrypsin (α-CT:protein = 1:20 ratio) for 45 min on ice, with and without preincubation with a 2.5-fold excess of retinoic acid (5 min in the dark). The reaction was stopped by adding SDS-loading buffer and boiling.
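The text does not spell out the two-state analysis, so the following is only a minimal sketch under a common assumption: the unfolding free energy depends linearly on denaturant, ΔG = ΔG(H2O) − m·[urea], and the midpoint is c_m = ΔG(H2O)/m. The function names, the 10-90% transition window, and the use of a precomputed fraction unfolded (rather than raw Trp fluorescence with baselines) are simplifying assumptions introduced here, not the authors' fitting procedure.

```python
import numpy as np

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1
TEMP_K = 310.15    # 37 degrees C, the titration temperature given in the text


def fraction_unfolded(urea, dg_water, m_value, temp_k=TEMP_K):
    """Two-state model: fraction unfolded at each urea concentration (M)."""
    urea = np.asarray(urea, dtype=float)
    dg = dg_water - m_value * urea           # unfolding free energy, kcal/mol
    k_unf = np.exp(-dg / (R_KCAL * temp_k))  # equilibrium constant [U]/[N]
    return k_unf / (1.0 + k_unf)


def linear_extrapolation(urea, f_unf, temp_k=TEMP_K, lo=0.1, hi=0.9):
    """Estimate dG(H2O), m, and c_m from points inside the transition region."""
    urea = np.asarray(urea, dtype=float)
    f_unf = np.asarray(f_unf, dtype=float)
    mask = (f_unf > lo) & (f_unf < hi)       # keep only well-determined points
    if mask.sum() < 2:
        raise ValueError("need at least two points in the transition region")
    dg = -R_KCAL * temp_k * np.log(f_unf[mask] / (1.0 - f_unf[mask]))
    slope, intercept = np.polyfit(urea[mask], dg, 1)
    m_value = -slope
    return intercept, m_value, intercept / m_value
```

Applied to synthetic data generated with `fraction_unfolded`, `linear_extrapolation` should return the input parameters; with real melts, the pre- and post-transition baselines would have to be fitted first, and aggregation at higher urea concentrations (as noted for the Htt53 fusions) would limit which points are usable.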
Aggregation Time-Course Assays in Vitro-Aliquots of tetra-Cys CRABP I Htt53 (10-15 nM) were labeled with FlAsH-EDT2 for 60 min as described previously (41). The diluted labeled tetra-Cys CRABP I Htt53 solutions were concentrated at 4°C by ultrafiltration (cut-off 10 kDa) and centrifuged at 14,000 rpm in a microcentrifuge for 10 min at 4°C. Solutions were used immediately to conduct kinetic experiments. To study the aggregation process in the laboratory at reasonable time scales, the labeled protein was destabilized with 1 M urea (in 10 mM HEPES, pH 7.8). Aggregation was followed by increasing FlAsH fluorescence or by estimation of the concentration of soluble monomer by absorbance measurements of the clarified supernatant at 280 nm as described (41). In both fluorescence and spectroscopic measurements, the samples were incubated without stirring and gently vortexed prior to withdrawal of an aliquot to ensure an equal distribution of the aggregates over the entire volume of the sample. In parallel, the fluorescence and absorbance at 280 nm of controls without protein were recorded for all experiments and subtracted from those of the samples.
Seeding Experiments-10 μM FlAsH-labeled tetra-Cys CRABP Htt53 was destabilized by dissolving in 1.0 M urea and incubated overnight at 37°C without stirring to allow complete aggregation. To improve their seeding potency, the aggregates were sonicated for 2 min in 30-s bursts after washing with equal volumes of water and 10 mM Tris·HCl, pH 8.0. Solutions of labeled tetra-Cys CRABP I Htt53 were preincubated to reach the test temperature of 37°C, then seeded with pre-formed aggregates representing 1%, 5%, and 10% of the initial tetra-Cys CRABP I Htt53 concentration. Aggregation was monitored by FlAsH fluorescence.
Determination of Nucleus Size-The time course of in vitro aggregation was analyzed by the mathematical model describing nucleation-controlled aggregation kinetics developed by Ferrone (42) that has been applied to several cases of protein aggregation (7, 41, 43). A plot of the initial rate of aggregate formation (followed by FlAsH fluorescence) or the changes of the soluble monomer concentration versus $t^2$ gives a slope equal to $\tfrac{1}{2} k_+^2 K_{n^*} c^{(n^*+2)}$, where $k_+$ is the elongation rate, $K_{n^*}$ is the equilibrium constant for the monomer-nucleus equilibrium, $c$ is the concentration of monomer, and $n^*$ is the number of monomers in the nucleus. Plotting the log of the slope versus the log of the bulk monomer concentration [$\ln(\mathrm{slope}) = \ln(\tfrac{1}{2} k_+^2 K_{n^*}) + (n^* + 2)\ln(c)$] yields a new slope of $n^* + 2$, which can be used to estimate the size of the critical nucleus.
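The two-step analysis described above (a slope against t² for each monomer concentration, then a log-log regression whose slope is n* + 2) can be sketched as follows. This is an illustration of the stated relations only, not the authors' code; the function names and the use of ordinary least squares via `numpy.polyfit` are assumptions made here.

```python
import numpy as np


def initial_t2_slope(time, signal):
    """Least-squares slope of the early-phase signal plotted against t^2."""
    t2 = np.asarray(time, dtype=float) ** 2
    slope, _intercept = np.polyfit(t2, np.asarray(signal, dtype=float), 1)
    return slope


def nucleus_size(monomer_conc, t2_slopes):
    """Regress ln(slope) on ln(c); the fitted slope equals n* + 2.

    Returns (n_star, loglog_slope, intercept), where the intercept
    corresponds to ln(k_+^2 K_n* / 2) in the notation of the text.
    """
    c = np.asarray(monomer_conc, dtype=float)
    s = np.asarray(t2_slopes, dtype=float)
    if c.size < 2 or np.any(c <= 0) or np.any(s <= 0):
        raise ValueError("need at least two positive concentrations and slopes")
    loglog_slope, intercept = np.polyfit(np.log(c), np.log(s), 1)
    return loglog_slope - 2.0, loglog_slope, intercept
```

For orientation, a log-log slope near 3 corresponds to n* near 1 (a monomeric nucleus). Only the early, pre-exponential part of each time course should be passed to `initial_t2_slope`, since the t² dependence holds only there.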
Fluorescence and Electron Microscopy Experiments-Tetra-Cys CRABP I Htt53-expressing cells labeled with FlAsH-EDT2 were harvested at different times after induction and imaged by fluorescence microscopy. Intact inclusion bodies were isolated from tetra-Cys CRABP I Htt53-expressing cells labeled with FlAsH-EDT2 and harvested 4 h post-induction, as described by Oberg et al. (44). For fluorescence microscopy, 2 μl of labeled cells or aggregates isolated from cells were immobilized in 1% agarose in LB and imaged using a Nikon Eclipse E600 with excitation at 485 nm and a 510 nm emission cut-off filter. For electron microscopy, samples of isolated aggregates were applied to carbon-coated copper grids, stained with 3% (w/v) ammonium molybdate (pH 6.8), and imaged under defocus conditions with a Philips CM12 microscope operating at 120 kV.

Htt Exon 1 with an Expanded poly(Q) Tract Causes Aggregation in Vivo-Expression of the fusion proteins in which Htt exon 1 with varying lengths of poly(Q) tracts was attached to the C terminus of tetra-Cys CRABP I (Fig. 1A) was followed by fluorescence at 530 nm (excitation 500 nm) after pre-loading the cells with FlAsH dye as described previously (39). The results show that these constructs exhibit poly(Q) length-dependent aggregation propensities. The fluorescence of cells expressing labeled tetra-Cys CRABP I Htt exon 1 fusions as a function of time after induction was close to that of cells expressing the soluble tetra-Cys CRABP I in the first 200 min when the poly(Q) repeat was in the non-pathological range (20 or 33 glutamines) (Fig. 1B). By contrast, a markedly different time course was observed when the poly(Q) repeat length exceeded the pathological threshold, i.e. when n = 53 and 64 (Fig. 1B). In these two cases, there was essentially no lag from the time of induction to the time at which a rapid increase in fluorescence occurs; we have demonstrated in previous work that the abrupt fluorescence increase correlates with formation of inclusion bodies (39, 41). We confirmed that the steep fluorescence increase reports on aggregation by carrying out cell fractionation (Fig. 1C). The tetra-Cys CRABP Htt20 variant remained soluble during the entire expression cycle and was present in amounts comparable to the amounts of tetra-Cys CRABP I, based on fractionation results (Fig. 1C); a small amount of aggregate was observed for the construct with 33 glutamines. Although the tetra-Cys CRABP Htt40 still partitions between the soluble and insoluble fractions, both tetra-Cys CRABP I fusions with longer poly(Q) stretches (53 and 64 glutamines) were found almost exclusively in the insoluble fraction.
A noteworthy observation is that for the shorter poly(Q) repeats (n = 33 or 40) all the aggregated protein was SDS-soluble, whereas the aggregates of the chimeras with 53- and 64-residue long poly(Q) repeats partitioned between SDS-soluble and SDS-insoluble fractions (Fig. 1C). The longer the poly(Q) stretch, the higher the amount of the SDS-insoluble aggregates. The property of insolubility in SDS is associated with the formation of fibrillar aggregates.
We next asked if the position of the poly(Q)-containing Htt exon 1 in the chimera altered its influence on the aggregation propensity of the fusion protein. The fluorescence time courses for expression of N-terminally positioned Htt20 tetra-Cys CRABP I and Htt53 tetra-Cys CRABP I were essentially the same as when the corresponding Htt exon 1 segments were added C-terminally (Fig. 2, A and B).
The absence of a lag phase in the in vivo kinetic curves for the Htt53 and Htt64 fusion proteins (within the time resolution of the experiment) suggests that aggregation is favorable essentially as soon as expression of protein commences. Reducing the rate of translation and thereby lowering the protein concentration by treatment with low concentrations of chloramphenicol (2.5 μg/ml) led to a short lag phase whose duration increased at higher chloramphenicol concentration (3.5 μg/ml, Fig. 2D). Under these conditions the in vivo aggregation curve took on the typical sigmoidal shape characteristic of amyloidogenesis (45) and mirrored in vitro results (see below).
The aggregates formed by the Htt exon 1 chimeras with longer poly(Q) repeats in vivo changed character over time. As shown in Fig. 2B for the C-terminal fusion, within the first 30 min, a small amount of the protein aggregated, and maximal insoluble material was found at 120 min. At this time point, fluorescence microscopy showed hyperfluorescent aggregates at the poles of the bacterial cells (Fig. 2C). At later time points (>150 min after induction), the isolated insoluble aggregates converted into detergent-resistant aggregates, detectable between the stacking and resolving gels on SDS-PAGE. Aggregates isolated before this conversion appeared amorphous with spherical ensembles in electron microscopy (data not shown). Fluorescence and electron microscopic inspection of the detergent-resistant aggregates isolated from cells suggested, by contrast, that they are fibrillar (Fig. 2, E and F). The transition in the nature of the aggregates paralleled an increase of the fraction of cells with a filamentous phenotype, which is associated with a stress response and is reminiscent of the reported toxicity of Htt fragments with expanded polyglutamine segments in eukaryotic organisms (11, 46).

In Vitro Aggregation of Tetra-Cys CRABP I Htt53 Follows a Nucleation-Polymerization Mechanism through a Monomeric Nucleus-
Although not amenable to a detailed quantitative analysis, the in vivo kinetic data described above are consistent with a nucleation polymerization mechanism with an apparently very short nucleation step when the poly(Q) tract is longer than 40. The high synthesis rates in bacteria in the expression system used appear to have masked the initial nucleation step, because some lag time was observed when synthesis was inhibited (Fig. 2D), suggesting a very low threshold concentration for aggregate propagation. To test this hypothesis, to carry out quantitative analysis of aggregation kinetics, and to further explore the aggregation mechanism, we examined the aggregation of tetra-Cys CRABP Htt53 in vitro. Aggregation kinetics were followed by two different methods: FlAsH fluorescence and changes in the concentration of the soluble monomer. Based on our previous work (41), we correlated the observed hyperfluorescence with the production of non-native states, including aggregates. The FlAsH signal is dominated by the aggregates, and thus FlAsH fluorescence can be used to follow aggregation in vitro. The use of FlAsH fluorescence to quantitate aggregate formation is supported by comparison with results based on measurement of soluble monomer concentration. Aggregation of the tetra-Cys CRABP I Htt53 chimera under native conditions at 37°C is a relatively slow process. To accelerate aggregation kinetics to an extent that facilitated measurements at a reasonable time scale, we chose to work at an intermediate urea concentration (1 M), where partially folded states are populated significantly. The choice of conditions was based on urea-induced unfolding of tetra-Cys CRABP I Htt53 monitored by intrinsic Trp fluorescence, which showed an apparent denaturation midpoint of 1.05 ± 0.08 M (see below and Table 1). The observation of faster aggregation near the denaturation midpoint has been made for several other proteins and exploited in protocols for following aggregation kinetics (47).
Note that this observation points to a likely mechanism of aggregation with an active participation by CRABP I, because the urea concentration is not expected to have as significant an effect on the poly(Q) tract as it does on the CRABP I. Under the same conditions, the fusion between tetra-Cys CRABP I and Htt exon 1 in the non-pathological range (n = 20) did not aggregate (data not shown).
In vitro aggregation of tetra-Cys CRABP I Htt53 showed a sigmoidal time dependence, with a well defined lag phase followed by an exponential increase of the FlAsH fluorescence (Fig. 3A) and decrease of the soluble monomer (Fig. 3B). These observables plateaued at later time points. Increasing the initial concentration of the bulk monomer shortened the lag phase, and the exponential growth reached higher final values of the fluorescence in the plateau phase (Fig. 3A). Overlaying the corresponding curves from both experiments showed a good inverse correlation, and the exponential increase of FlAsH fluorescence, reporting on the aggregate growth, coincided with the monomer disappearance.
More conclusive evidence for nucleation-polymerization aggregation was provided by quantitative analysis of kinetics during the initial time period. The time course of aggregate formation followed by FlAsH fluorescence or disappearance of the soluble monomer could be fitted using methods developed by Ferrone (42) for nucleated polymerization, because the initial aggregation phase was observed to be a function of time squared ($t^2$). Plotting the log of the aggregation rate versus the log of the monomer concentration gives a quantitative measure of the nucleus size with n* monomeric units (Fig. 3C; see "Materials and Methods" and Refs. 7 and 41). Applying this analysis to the initial aggregation rates deduced from the aggregation kinetics followed by the FlAsH fluorescence (Fig. 3A) and by the decrease of the monomer concentrations (Fig. 3B) at different concentrations of tetra-Cys CRABP I Htt53 yielded slopes (n* + 2; where n* is the number of monomers in the nucleus) of 2.86 and 2.90, respectively, consistent with a monomeric nucleus. Interestingly, a monomeric nucleus was previously implicated in aggregation of a mutant form of CRABP I (41) and in peptides with glutamine repeats (7), making it difficult to distinguish which species is mechanistically dominant. Another line of evidence supporting a nucleation-dependent polymerization mechanism is the effect of pre-formed seeds on the aggregation rate: low seed concentrations (1% (w/w)) shortened the apparent lag phase significantly, whereas seeds at 2.5 and 5% (w/w) were able to completely abrogate the well defined lag phase (Fig. 3D).

Fig. 3 legend (partial). Solid lines represent fits to sigmoidal curves. Tetra-Cys CRABP I Htt20 showed no aggregation under the same conditions (not shown). C, determination of the nucleus size based on the initial aggregation rates (panels A and B), following the approach of Ferrone (42). The open symbols and dashed lines correspond to data points from the aggregation kinetics followed by the FlAsH fluorescence, and the data points from the decrease of the monomer concentrations are presented as closed symbols fitted with a solid line. Note that linear fits yielded a slope of 2.8571 (dashed line) and 2.9017 (solid line) (equal to n* + 2), consistent with a monomeric nucleus. The regression coefficients were 0.983 and 0.989, respectively. D, tetra-Cys CRABP I Htt53 aggregation can be accelerated by addition of pre-formed seeds. Aggregation of 2.5 μM FlAsH-labeled tetra-Cys CRABP I Htt53 with and without pre-formed aggregates was monitored by FlAsH fluorescence. Fluorescence values in each experiment were normalized to exclude the contribution of the pre-labeled seed. Reactions contained no seed (●), 1% (○), 2.5% (▲), and 5% (△) seed from the initial protein concentration in each experiment. E, determination of critical concentration. The values represent the concentration of the soluble monomer measured at the end points of aggregation (after 20 h) as a function of the initial tetra-Cys CRABP I Htt53 concentration. The critical concentration varies between 33 and 50 nM (0.7-1.1 ng/ml).

MAY 5, 2006 • VOLUME 281 • NUMBER 18
Linear regression of the final plateau level of soluble monomer concentration for varying initial monomer concentrations and extrapolation of the x-intercept provides an estimate of the critical concentration for aggregation (45, 48). For all initial tetra-Cys CRABP I Htt53 concentrations (1-20 μM), the amount of the soluble monomer decreased progressively as a function of time, reaching at late times an equilibrium concentration of 33-50 nM (Fig. 3E), which provides an estimate of the critical concentration for aggregation of this construct.
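One way to carry out such an extrapolation is sketched below. It rests on an assumption that the text does not state explicitly: the regressed quantity is taken to be the amount of aggregated protein (initial minus final soluble monomer) as a function of the initial concentration, so that the x-intercept is the concentration below which no aggregate forms. The function name and this choice of variables are illustrative only.

```python
import numpy as np


def critical_concentration(initial_conc, final_soluble_conc):
    """x-intercept of aggregated amount versus initial monomer concentration.

    Assumption (not from the text): aggregated = initial - final soluble,
    and the line is fitted by ordinary least squares. Units are those of
    the inputs.
    """
    c0 = np.asarray(initial_conc, dtype=float)
    aggregated = c0 - np.asarray(final_soluble_conc, dtype=float)
    slope, intercept = np.polyfit(c0, aggregated, 1)
    if np.isclose(slope, 0.0):
        raise ValueError("no dependence on initial concentration; cannot extrapolate")
    return -intercept / slope
```

If the plateau of soluble monomer is the same at every initial concentration, the intercept simply returns that plateau value, which matches the reading of the end-point data as a direct estimate of the critical concentration.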
Presence of an N- or C-terminal Adjacent Expanded poly(Q) Tract Destabilizes and Perturbs the Structure of Tetra-Cys CRABP I-The spectral features of the chimeras between tetra-Cys CRABP and Htt20 are nearly identical to those of tetra-Cys CRABP I (see Fig. 4, A-D). Strikingly, expansion of the poly(Q) tract from 20 to 53 residues perturbs the CRABP tertiary and secondary structure, whether the Htt exon 1 domain is added to the N or C terminus (Fig. 4, A-D). For example, the fluorescence of intrinsic Trp residues in native tetra-Cys CRABP I Htt53 was shifted from 328 nm (characteristic of stably folded tetra-Cys CRABP I) to 336 nm, indicating significantly more solvent exposure and changes in the local Trp environments (Fig. 4A). These spectral properties are reminiscent of partially unfolded species of tetra-Cys CRABP I induced by intermediate concentrations of denaturant. The specific inflection at 228 nm in the tetra-Cys CRABP I CD spectrum, previously shown to arise from the fine tertiary structure surrounding Trp-87 and Trp-109 (49), disappeared in the CD spectrum of the tetra-Cys CRABP I fusions to Htt53, and the minimum of the spectrum shifted to shorter wavelengths (Fig. 4, C and D), also suggesting partial unfolding, a gain of random coil structure, and an increased conformational heterogeneity (particularly given the broader minimum). Addition of the native ligand retinoic acid stabilized the structure of both N- and C-terminal fusions of tetra-Cys CRABP I with Htt53, as indicated by a shift of the minimum of the CD spectrum to 218 nm, characteristic of the native β-sheet structure, and a decrease of the fluorescence intensity accompanied by a blue-shift toward the native maximum (to 332 nm). Addition of retinoic acid protects CRABP I against chymotryptic digestion, and this protection is partially maintained in tetra-Cys CRABP I Htt53 (Fig. 4E).
All of these ligand-dependent effects argue that the fusion proteins with the extended poly(Q) tract retain some capacity to bind ligand and that the long poly(Q) tract reversibly perturbs the fold of its neighboring CRABP domain. The added stability from ligand binding counteracts the destabilizing effect of the Htt exon 1 with an extended poly(Q) tract.
Urea-induced equilibrium unfolding titrations of tetra-Cys CRABP I fusions to Htt20, either N- or C-terminal, are very similar to those for tetra-Cys CRABP I (Fig. 5), with comparable free energy differences between native and urea-unfolded states (Table 1). Expansion of the poly(Q) tract to 53 residues led to significantly lower apparent stability in urea melts. Quantitative comparison of free energies was complicated by the pronounced tendency of the tetra-Cys CRABP I/Htt53 fusion proteins to aggregate. At higher urea concentrations, melts showed deviations from the sigmoidal shape expected for reversible denaturation. Aggregation was less of a problem at the lower urea concentration branches of the denaturation curves, enabling us to derive apparent half-maximal urea concentrations for both the N- and C-terminal Htt53 fusions; these c_m values were lower than those for either the tetra-Cys parent protein or the Htt20 variants by nearly 1 M (Table 1), arguing for substantially lower stability. By assuming reversible denaturation, we could estimate that the extended poly(Q) tract led to a 1.5 kcal/mol loss of equilibrium stability (Table 1).
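The relationship between a shift in the half-maximal urea concentration and a loss of stability can be illustrated with a short sketch. It assumes a two-state transition analyzed with the linear extrapolation model, in which the free energy of unfolding in water equals the m-value times c_m, and it assumes the same m-value for both proteins being compared. The m-value and c_m values used below are hypothetical placeholders chosen only for illustration; the measured parameters are those reported in Table 1.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g_water(m_value, c_m):
    """Unfolding free energy in water (kcal/mol) under the linear
    extrapolation model: dG(H2O) = m * c_m."""
    return m_value * c_m


def fraction_unfolded(urea, m_value, c_m, temp_k=310.15):
    """Two-state fraction unfolded at a given urea concentration (M).

    dG(urea) = m * (c_m - urea); K_unfold = exp(-dG / RT).
    The default temperature corresponds to 37 degrees C.
    """
    dg = m_value * (c_m - urea)
    return 1.0 / (1.0 + math.exp(dg / (R_KCAL * temp_k)))


def stability_loss(m_value, c_m_reference, c_m_variant):
    """Loss of stability (kcal/mol) implied by a lower c_m, assuming
    both proteins share the same m-value."""
    return delta_g_water(m_value, c_m_reference) - delta_g_water(m_value, c_m_variant)


# Hypothetical placeholder parameters (NOT values from Table 1):
m_assumed = 1.5          # kcal/(mol*M)
c_m_parent = 3.0         # M urea
c_m_expanded = 2.0       # M urea, i.e. a 1 M downward shift
loss = stability_loss(m_assumed, c_m_parent, c_m_expanded)
```

Under these assumptions the stability loss scales linearly with the c_m shift, which is why deviations from the sigmoidal shape at high urea matter less than an accurate midpoint on the low-urea branch.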

DISCUSSION
Proteins involved in poly(Q) disorders share little or no structural or functional homology except for the presence of a poly(Q) region somewhere in their sequences. Consequently, the similar neuropathological picture in the various CAG-repeat disease states has generally been ascribed to the presence of an expanded poly(Q) tract (2,3). The panoply of effects of the poly(Q) expansions on the host protein and its interactions has been difficult to assess. A dominant role of the poly(Q) tract in favoring aggregation is well established in model systems (7), and the ability of poly(Q) peptides to lead to symptoms akin to the pathological changes in CAG-repeat diseases has been shown in mice (10) and C. elegans (11). However, several possible mechanistic schemes remain under consideration in the etiology of the neurodegenerative diseases. Alterations of the functions and interactions of proteins hosting poly(Q) tracts must be included along with direct toxic effects of aggregates as possible contributors to disease. Also, the aggregates that form in cellular conditions may include all or part of the host protein (due to partial proteolysis) and may be formed both because of intrinsic properties of poly(Q) segments (22) and because of partial unfolding of host proteins, which then become more aggregation-prone. To gain a better understanding of these alternatives, fundamental biochemical and biophysical data are needed on the impact of extended poly(Q) tracts on host proteins and, reciprocally, on how host sequences may modulate the behavior of the poly(Q) sequence.
This study was directed toward assessing how a poly(Q) repeat affects a neighboring, otherwise stably folded globular protein.
Our data using the β clam protein, CRABP I, as a model host protein clearly show that lengthening the poly(Q) tract in Htt exon 1 beyond the pathophysiological threshold leads to a marked propensity of the fusion proteins to aggregate and a substantial perturbation of the structural integrity of the adjacent model protein. We found the fusions of tetra-Cys CRABP I and Htt exon 1 with a non-pathological length of the poly(Q) tract (20 or 33 consecutive glutamines) to be completely soluble when expressed in E. coli, whereas the tetra-Cys CRABP I fusions with Htt exon 1 containing a stretch of 40, 53, or 64 glutamines were highly aggregation-prone. Moreover, the fusion of an Htt exon 1 with an expanded poly(Q) tract (n = 53) to tetra-Cys CRABP I perturbs and destabilizes the native conformation of the neighboring globular protein domain. The Htt53-induced perturbation of the spectroscopic properties of tetra-Cys CRABP I suggests a partial unfolding, leading either to a homogeneous population of imperfectly folded species or to a heterogeneous population with some species more native-like and some partially unfolded. Here, the simple fact that a poly(Q)-containing exon with a high repeat number neighbors CRABP I disrupts its native-state structure, which likely contributes to an enhanced propensity to aggregate. Although the destabilization caused by adjacency to the Htt exon 1 may favor aggregation, we cannot distinguish aggregation mechanisms mediated primarily by the partially folded CRABP I component from those mediated by the poly(Q) component, because studies on both of these show nucleation-polymerization kinetics with a monomeric nucleus (7,41). We are following up on this question by further characterizing the nature of the aggregated species formed by the chimeras.
Intriguingly, the observed change in the character of the aggregates formed in vivo by the chimera with an expanded poly(Q) tract from detergent-soluble to detergent-insoluble, accompanied by a morphological change to fibrillar character, appears to parallel a change from CRABP I-dominated aggregation to poly(Q)-dominated aggregation. The preliminary evidence for this is the observation that the earlier aggregates can seed CRABP I aggregation but not poly(Q) aggregation, and the later aggregates can seed poly(Q) aggregation but not CRABP I aggregation (data not shown). It will be exciting to explore in greater depth how the observed structural perturbation and the mechanism of aggregation are related to one another.
What is the physical basis for the destabilization of a globular protein by poly(Q) sequences? Two formal possibilities must be considered: 1) the tendency of the poly(Q) segment to form intermolecular associations leads to a conformational constraint on adjacent sequences because of the driving force to draw segments into antiparallel β sheet, for example as reported for myoglobin with a 50-glutamine insertion; or 2) the intramolecular conformational propensities of the poly(Q) segment compete with the adjacent sequence's ability to participate in native interactions critical to the stably folded protein. The latter could occur either through the preference of the poly(Q) segment for a particular structure or through indirect effects, for example, stabilization of the denatured state of its neighbor. The work done early on by Perutz and coworkers with oligo-Gln insertions in CI2 argues that short stretches of glutamine can be tolerated without perturbing adjacent sequence, although they saw an increased tendency to form oligomers (25). Although one might have predicted that these oligomers would be stabilized by poly(Q)-poly(Q) interactions, as in the case of the poly(Q)-inserted myoglobin, a later crystal structure instead showed that domain swapping between the CI2 monomers mediated dimer formation (50). This form of interaction suggests that there was a destabilizing influence of the oligo-Gln segment on the CI2, similar to our findings. Analogously, structural perturbation and increased aggregation of ataxin-3 were reported by Bevivino and Loll (28) when the adjacent poly(Q) segment was extended from 27 to 78 residues. This latter study was hampered by insolubility of the protein with the longer glutamine repeat, necessitating use of a solubility-enhancing fusion protein, and has now been called into question by recent work using acid denaturation and unfolding/refolding to assess stability changes (30). That more recent study was also challenged by the insolubility of the constructs with extended poly(Q) repeats.
An advantage of the system we designed is that adequate solubility was retained in the fusions with extended poly(Q) repeats to enable measurements of some biophysically useful parameters and possibly, in the long run, even structural elucidation. A similar fusion construct in which extended poly(Q) segments were soluble enough for biophysical characterization was reported by Masino et al. (51) and enabled the random coil character of the poly(Q) to be deduced. In our construct, we can monitor the poly(Q) segment and the neighboring domain simultaneously. The measurements we have made to date allow a first estimate of how destabilizing the poly(Q)-containing tract is. The estimated change in stability of tetra-Cys CRABP I upon fusion to Htt exon 1 with 53 glutamines was 1.5 kcal/mol at 37°C. We hypothesize that two factors contribute to the impact of the flanking extended poly(Q)-containing Htt exon 1 sequence on CRABP I: the modest native state stability of CRABP I at that temperature (3.6 kcal/mol), which is insufficient to offset the conformational energetic driving force of polyglutamine stretches longer than 20, and the potential susceptibility of the β barrel structure of intracellular lipid-binding proteins to disruption by perturbation of terminal sequence elements (52). Hence, an overall structural perturbation (loss of native structure) was observed.
A major limitation in our understanding is the lack of structural knowledge about polypeptides composed of glutamines. Several models for the preferred conformation of poly(Q)-containing proteins have been proposed (53)(54)(55)(56)(57)(58). There is general agreement that short oligoglutamine stretches are unstructured, but that β structure is strongly preferred by longer poly(Q) tracts. The β structure has been proposed to exist in β hairpins, β sheets (parallel or anti-parallel), or β helices. In general, these structures can form intramolecularly or intermolecularly, and the balance between these modes of formation may be affected by the length of the glutamine repeat. However, estimates of the energetic driving force toward formation of these specific structures are not available. Assuming additivity and a linear dependence on glutamine number, our data suggest that each glutamine after 20 might contribute about 0.05 kcal/mol of conformational free energy that can bias the adjacent chain's behavior. Further experiments are clearly needed to pin down these effects quantitatively.
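The per-glutamine figure follows from simple arithmetic on the numbers quoted in the text, sketched here under the same additivity and linearity assumptions. The choice of 20 glutamines as the reference length reflects the observation that the Htt20 fusions behave like the parent protein; the function name is a hypothetical label used only for this illustration.

```python
def per_glutamine_contribution(ddg_total, n_long, n_short):
    """Average free energy contribution per added glutamine (kcal/mol),
    assuming each residue beyond n_short contributes equally."""
    return ddg_total / (n_long - n_short)


# Values quoted in the text: 1.5 kcal/mol lost on going from 20 to 53 glutamines.
per_q = per_glutamine_contribution(1.5, 53, 20)  # 1.5 / 33, about 0.045
```

The result rounds to 0.05 kcal/mol per glutamine. Because it rests on a single pair of repeat lengths, it should be read as an order-of-magnitude estimate rather than a fitted slope.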
An analogy for the thermodynamic 'tug-of-war' between CRABP I and the poly(Q)-containing Htt exon 1 suggested by our results is the behavior of an engineered chimera between ubiquitin and barnase (59): because of the design of the insertion of ubiquitin into barnase, the topologies of their native states are incompatible with retention of native structure by both components of the chimera. Loh and coworkers showed that the higher free energy of stability for barnase upon binding barstar unfolded the ubiquitin, but in the absence of barstar, ubiquitin remained folded and barnase lost its native structure. Because of its barrel topology, CRABP I may be exquisitely sensitive to truncation at either of its termini. It is known that a three-residue truncation of a related intracellular lipid-binding protein leads to destabilization of the native state (52). The terminal sequences in this protein are close to each other and form several mutual interactions. Thus, the conformational tendency of Htt exon 1 with an extended poly(Q) segment could be sufficient to disrupt key interactions in the terminal regions of the CRABP I β barrel. In future work, it will be crucial to determine the structure of the poly(Q) segment and the mechanism by which it disrupts the CRABP I native state. Its effect is indeed likely to be intramolecular, because our experimental protocols were optimized to favor monomeric products.
The idea that poly(Q) tracts would favor a conformation strongly enough to perturb an adjacent protein's structure, without forming intermolecular interactions, seems at variance with thermodynamic estimates of the energetics of formation of aggregation nuclei from Wetzel and coworkers (60). Their studies suggest that adoption of the conformation that nucleates aggregation is a thermodynamically very unfavorable process with a very small equilibrium constant. Reconciling these results with a model in which the poly(Q) tract affects an adjacent domain in an intramolecular fashion would require that the conformation adopted by the poly(Q) sequence as it perturbs the flanking protein is not the one that nucleates aggregation.
A complete understanding of the process of disease-related amyloid formation relies on information about the mechanism of aggregation directly in the cell. The approach we describe here has enabled us to monitor aggregation in real time directly in cells and in so doing to observe behaviors that depend on the length of the homopolymeric poly(Q) region in a manner comparable to the pathologies described for CAG-repeat diseases. It is of interest to compare the processes we are able to characterize in vitro with those that occur under the solution conditions that obtain in the cell, in order to relate the time courses of aggregation. We estimated from our in vitro data that the critical concentration below which no aggregation of tetra-Cys CRABP I Htt53 occurs is between 33 and 50 nM. The total concentration of tetra-Cys CRABP I Htt53 in E. coli cells 30 min after induction can be estimated to be 1.75 μM using optical densitometry and fractionation results (Fig. 2B). Assuming a constant synthesis rate, the amount of newly synthesized tetra-Cys CRABP I Htt53 in E. coli cells would rapidly approach and exceed the critical concentration for aggregation in vitro. Moreover, the crowded environment in the cell is expected to reduce the critical concentration (24). It is therefore not surprising that we observe aggregation as soon as we monitor samples after induction of the protein and that we see no apparent lag phase.
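The comparison between the in-cell concentration and the in vitro critical concentration can be made concrete with a rough calculation. The sketch below takes the in-cell estimate as 1.75 μM (micromolar units assumed) and further assumes that synthesis starts from zero at induction, proceeds at a constant rate, and that no protein is lost to degradation during the first 30 min. These simplifications and the function name are illustrative assumptions, not part of the measurements.

```python
def minutes_to_reach(target_nm, conc_at_t_nm, t_min):
    """Time (min) needed to reach target_nm, assuming the concentration
    rises linearly from zero to conc_at_t_nm over t_min minutes."""
    rate_nm_per_min = conc_at_t_nm / t_min
    return target_nm / rate_nm_per_min


total_nm = 1.75 * 1000.0            # 1.75 micromolar expressed in nM
critical_low_nm, critical_high_nm = 33.0, 50.0

# Fold excess of the 30-min concentration over the critical concentration range.
fold_excess = (total_nm / critical_high_nm, total_nm / critical_low_nm)

# Time after induction at which each bound of the critical range is crossed.
crossing_min = (
    minutes_to_reach(critical_low_nm, total_nm, 30.0),
    minutes_to_reach(critical_high_nm, total_nm, 30.0),
)
```

Under these assumptions the concentration at 30 min exceeds the critical concentration by roughly 35- to 53-fold, and even the upper bound of 50 nM is crossed in under one minute, consistent with the absence of a detectable lag phase.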
The system we describe sheds light on the marked impact of a naturally occurring poly(Q)-containing sequence, exon 1 of huntingtin, on a model protein that is adjacent to it, either upstream or downstream. The labeling strategy we have used enables the observation of all of these events and behaviors in vivo, with the added benefit that it is possible to screen for the effects of environment and added small molecules on these events. Because this system makes the direct observation of aggregation events possible, it offers advantages over some recently described screens for aggregation inhibitors that are less direct (61,62).