Design and characterization of mutant and wildtype huntingtin proteins produced from a toolkit of scalable eukaryotic expression systems

The gene mutated in individuals with Huntington's disease (HD) encodes the 348-kDa huntingtin (HTT) protein. Pathogenic HD CAG-expansion mutations create a polyglutamine (polyQ) tract at the N terminus of HTT that expands above a critical threshold of ∼35 glutamine residues. The effect of these HD mutations on HTT is not well understood, in part because it is difficult to carry out biochemical, biophysical, and structural studies of this large protein. To facilitate such studies, here we have generated expression constructs for the scalable production of HTT in multiple eukaryotic expression systems. Our set of HTT expression clones comprised both N- and C-terminally FLAG-tagged HTT constructs with polyQ lengths representative of the general population, HD patients, and juvenile HD patients, as well as the more extreme polyQ expansions used in some HD tissue and animal models. Our expression system yielded milligram quantities of pure recombinant HTT protein, including many of the previously mapped post-translational modifications. We characterized both apo and HTT–HTT-associated protein 40 (HAP40) complex samples produced with this HD resource, demonstrating that this toolkit can be used to generate physiologically meaningful HTT complexes. We further demonstrate that these resources can produce sufficient material for protein-intensive experiments, such as small-angle X-ray scattering, providing biochemical insight into full-length HTT protein structure. The work outlined and the tools generated here lay a foundation for further biochemical and structural work on the HTT protein and for studying its functional interactions with other biomolecules.


Huntington's disease (HD) is a devastating inherited neurodegenerative disorder that causes a range of progressive behavioral, cognitive, and physical symptoms. The incidence of HD varies in different parts of the world, but HD is thought to affect between 0.42 and 17.2 per 100,000 of the population (1). There are currently no disease-modifying therapies available for patients (2). HD is hallmarked by an expansion of a CAG-trinucleotide repeat tract in exon 1 of the HTT gene above a critical threshold of ∼35 CAG triplets (3,4), translating to a polyglutamine (polyQ) expansion in the extreme N terminus of the huntingtin (HTT) protein. PolyQ-expanded HTT is thought to be responsible for the wide-ranging biochemical dysfunction observed in HD models and patients, including proteostasis network impairment (5), transcription dysregulation (6), mitochondrial toxicity (7,8), cellular energy imbalance (9), synaptic dysfunction (10), and axonal transport impairment (8).
Although HTT is thought likely to be a scaffold protein (11,12), the function of HTT, whether WT or polyQ-expanded, is still incompletely understood. Biochemical investigation of the role of HTT, in either the WT or the disease state, often depends on obtaining large amounts of pure HTT protein of different polyQ lengths. The HTT protein is 3144 amino acids long (assuming a polyQ stretch of 23 residues, NCBI reference sequence NP_002102.4), a potentially daunting prospect for expression and purification given its size. A number of groups have published tools and methods by which full-length HTT might be expressed and purified from either insect (13-15) or mammalian cells (16). However, the tools produced and shared with the wider community are often limited by the number of different polyQ lengths, the position of an affinity tag, or their tractability for large-scale production for biochemical studies. To date, the published literature reporting experiments with purified HTT protein samples remains limited. Therefore, tools and detailed methods that will enable biochemical and biophysical studies of HTT by a larger number of researchers should accelerate our understanding of the function of this elusive protein.
Toward this end, we have cloned 28 HTT constructs that allow expression of HTT protein through transient transfection of mammalian cells or viral transduction of insect or mammalian cells. Constructs have either N- or C-terminal FLAG tags to assist in purification, and yields of WT and polyQ-expanded HTT protein using these systems are up to 1 mg/liter of suspension culture of either insect or mammalian cells. The protein samples obtained from a simple two-step protocol are highly pure (>90% purity) and amenable to numerous downstream analyses and assays. Our constructs permit production of HTT in complex with the HTT-binding protein HAP40, as well as in its apo-form, and we have characterized these HTT protein samples. This includes mapping post-translational modifications (PTMs) of the proteins derived from both insect and mammalian cells, revealing similar modification motifs to those previously reported in the literature (17-19). Both apo and HTT-HAP40 complex samples are folded, as judged by reasonable thermal melting transitions of protein samples in solution. HTT and HTT-HAP40 samples were also assessed for monodispersity using size-exclusion chromatography in tandem with multiangle light scattering (SEC-MALS). We demonstrate how using these resources to generate large amounts of purified HTT protein sample enables protein-intensive experiments such as small-angle X-ray scattering (SAXS). We analyzed SAXS data for apo HTT samples and the HAP40 complex, providing initial insight into the structure of the complex in solution.

Cloning of HTT expression constructs
Ligase-independent cloning (LIC) was used to clone the full-length HTT gene into the baculovirus transfer vector pBMDEL (Fig. 1A), for expression of proteins in insect cells as well as in mammalian cells. In addition to the sites for LIC, the vector contains a "stuffer" fragment that includes the SacB gene, allowing negative selection on 5% (w/v) sucrose, and a truncated VSVG fragment for pseudotyping of the baculovirus. As described previously for other HTT clones, a 15-bp repetitive element containing a mix of CAG and CAA codons was used to encode the polyQ expansion, in an effort to help maintain stability and integrity of the DNA sequence through various generations of vector propagation (20).
As an ∼10-kb gene with multiple repetitive sequence elements, HTT is nontrivial to subclone between different vectors. We first generated N- and C-terminally FLAG-tagged pBMDEL-HTT constructs lacking part of the exon 1 sequence. Using PCR-generated exon 1 cDNAs encoding different polyQ lengths, our LIC-cloning protocol generated a variety of different polyQ-encoding HTT constructs due to the error-prone nature of the recombination step. By sequencing multiple colonies, we identified HTT clones with a variety of polyQ lengths with both N- and C-terminal FLAG tags (Fig. 1B). These entry vectors serve as valuable reagents to allow future generation of HTT constructs with further polyQ lengths. Additionally, by using a repetitive mix of codons for the polyQ expansion, (CAG CAA CAG CAA CAA)n, we expect improved polyQ stability over generations of plasmid, bacmid, and baculovirus propagation compared with repetitive CAG codon tracts (20).
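The mixed-codon design described above can be illustrated with a minimal Python sketch. It builds a polyQ-coding tract by cycling through the 15-bp (CAG CAA CAG CAA CAA) element, confirms that it still translates to pure glutamine, and compares the longest uninterrupted CAG run with that of a pure CAG tract. The function names and the handling of lengths that are not a multiple of five (the last repeat is simply truncated) are assumptions made for illustration and are not taken from the cloning protocol.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# The 15-bp repetitive element quoted in the text.
REPEAT_UNIT = ("CAG", "CAA", "CAG", "CAA", "CAA")


def mixed_polyq_dna(n_gln):
    """DNA encoding n_gln glutamines, cycling through the repeat unit.

    Assumption: a tract whose length is not a multiple of five simply
    ends part-way through the final repeat unit.
    """
    return "".join(REPEAT_UNIT[i % len(REPEAT_UNIT)] for i in range(n_gln))


def translate(dna):
    """Translate complete in-frame codons into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def longest_pure_cag_run(dna):
    """Length (in codons) of the longest run of consecutive in-frame CAG codons."""
    best = run = 0
    for i in range(0, len(dna) - 2, 3):
        run = run + 1 if dna[i:i + 3] == "CAG" else 0
        best = max(best, run)
    return best


mixed_q23 = mixed_polyq_dna(23)
pure_q23 = "CAG" * 23
print(translate(mixed_q23))             # 23 glutamines
print(longest_pure_cag_run(mixed_q23))  # no two CAG codons are adjacent
print(longest_pure_cag_run(pure_q23))   # the whole tract is one CAG run
```

Both tracts encode the same protein sequence; only the mixed tract avoids long perfect CAG repeats, which is the property expected to improve stability during propagation.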
The resulting HTT open reading frames encoded within this series of constructs have either an N-terminal FLAG-octapeptide between the START methionine and the N-terminal methionine of the HTT amino acid sequence or have the FLAG-octapeptide linked to the extreme C terminus of HTT via a Gly-Gly-Ser-Gly linker (Fig. 1C). As subtle changes to the exon 1 amino acid sequence of the HTT protein have been shown to give rise to changes in the biophysical properties of the protein (21,22), the C-terminally FLAG-tagged constructs allow expression of a "clean" exon 1 sequence.

Figure 1. A, pBMDEL vector map. B, 28 HTT expression constructs with different polyQ lengths were generated with either N- or C-terminal FLAG tags. C, FLAG tags are appended to either end of the full-length HTT expression construct (comprising exon1, the N-terminal HEAT domains, the IDR, the Bridge domain, and the C-terminal HEAT domain) with minimal additional sequence.

A toolkit of HTT protein resources

Expression of HTT variants in insect cells or mammalian cells yields functional proteins
The HTT pBMDEL expression constructs we have developed allow the expression of HTT protein by three different methods: baculovirus-mediated expression in insect cells; transient transfection in mammalian cells; or transduction in mammalian cells (Fig. 2). All three methods allow cell growth in suspension culture, permitting facile scaling of the culture volumes and thus scaling of the protein production as needed. Irrespective of the expression system, HTT protein could be purified from cell lysates in a Tris-salt buffer system using a two-step protocol, as described previously (14), comprising a FLAG pulldown step followed by size-exclusion chromatography using a Superose6 resin column (Fig. 2A). Similar to the HTT purification efforts of other research groups, multiple peaks are present in the size-exclusion chromatography profile, likely indicating the presence of a range of different oligomeric and/or aggregated states.
Yields of purified WT (Q23) HTT protein, measured after FLAG pulldown, can be as high as ∼1.6 mg/liter in Sf9 insect cell culture, ∼1 mg/liter in transduced EXPI293F mammalian cells, and ∼0.4 mg/liter in transiently transfected EXPI293F mammalian cells. Comparisons of preparations of HTT Q23 samples with either an N- or a C-terminal FLAG tag did not show a significant difference in yield. In contrast, comparison of the yields of purified HTT with different polyQ expansions showed a trend of decreasing yield with increasing polyQ length. For example, in insect cells HTT Q42 yielded ∼0.5 mg/liter, whereas Q145 gave yields of <0.1 mg/liter. Longer polyQ lengths were also generally found to be more variable in yield between productions. For all constructs in each expression system, the two-step purification protocol yielded a protein sample that is >90% pure by Coomassie-stained SDS-PAGE analysis (Fig. 2B). Samples were analyzed by Western blotting using anti-HTT antibodies, which revealed a discrete band of the expected molecular weight for each sample (Fig. S1).
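As a planning aid, the approximate yields quoted above can be turned into a rough estimate of the culture volume required for a given amount of protein. The sketch below is illustrative only: the dictionary keys, the function name, and the optional `step_recovery` factor (the fraction retained after later purification steps) are hypothetical, and the yields are the approximate upper values reported after FLAG pulldown, which vary between productions.

```python
# Approximate yields after FLAG pulldown quoted in the text (mg per liter of culture).
YIELD_MG_PER_L = {
    ("Sf9", "Q23"): 1.6,
    ("EXPI293F transduced", "Q23"): 1.0,
    ("EXPI293F transfected", "Q23"): 0.4,
    ("Sf9", "Q42"): 0.5,
}


def liters_needed(target_mg, system, polyq, step_recovery=1.0):
    """Culture volume (liters) needed to obtain target_mg of protein.

    step_recovery is a hypothetical factor (0 to 1] for losses in steps
    after the FLAG pulldown; it is not a value reported in the text.
    """
    if not 0.0 < step_recovery <= 1.0:
        raise ValueError("step_recovery must be in (0, 1]")
    return target_mg / (YIELD_MG_PER_L[(system, polyq)] * step_recovery)


# Volume of Sf9 culture for 2 mg of Q23 versus Q42 HTT.
print(liters_needed(2.0, "Sf9", "Q23"))
print(liters_needed(2.0, "Sf9", "Q42"))
```

Under these assumptions, the same 2 mg target requires roughly three times more culture for Q42 than for Q23, reflecting the trend of decreasing yield with increasing polyQ length.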
To assess whether these protein samples were folded, the C-terminal FLAG-tagged HTT samples of different polyQ lengths, purified from Sf9 cells, were analyzed by differential static light scattering (DSLS) over a temperature gradient from 25 to 85°C to assess thermal stability and propensity to aggregate under increasing temperatures. HTT samples were stable up to ∼55°C with sigmoidal thermal melting curves reflective of a folded globular protein (Fig. 3A). Interestingly, irrespective of polyQ length, the temperature of aggregation (T_agg) values (35) for all HTT samples were similar at ∼60-63°C, indicating that the polyQ repeats did not significantly affect protein thermal stability. This suggests that the polyQ tract may not interact with the folded globular part of the protein.
Purification of the HTT-HAP40 complex, previously reported from an adherent mammalian cell-expression system (23), could be achieved using a 3:1 viral titer ratio of HTT 1-3144 Q23 in a C-terminally FLAG-tagged pBMDEL expression construct and HAP40 in an N-terminally His6-tagged pFBOH-MHL insect cell expression vector (Fig. 2C). The purification protocol was modified so that an additional Ni-affinity chromatography step was included following the FLAG pulldown. The final size-exclusion chromatography step reveals that the HTT-HAP40 complex is a monodisperse sample, indicating increased protein stability and conformational homogeneity compared with apo HTT. Formation of this complex by HTT produced in insect cells indicates that the protein expressed is correctly folded and functional with respect to formation of an important protein interaction.
HTT-HAP40 complexes containing WT (Q23) or polyQ-expanded (Q54) HTT were also analyzed by DSLS (Fig. 3B), yielding similar thermal aggregation profiles (∼57-60°C) and again suggesting a lack of interaction between the polyQ repeat and the globular portion of the protein complex.

HTT expressed in Sf9 insect cells retains reported phosphorylation PTMs
PTM of HTT is well-described for protein derived from various mammalian cell systems and in some detail for HTT extracted from post-mortem brain tissue (17)(18)(19). However, it is not known whether these PTMs are conserved in HTT expressed in Sf9 insect cells. Purified HTT Q23 and Q54 from Sf9 and EXPI293F were subjected to bottom-up proteomics (24, 25). PTMs were mapped for HTT expressed in Sf9 and EXPI293F cells and compared with published PTMs of mammalian-derived HTT (Tables 1-4 detail results for HTT Q23 samples from Sf9 and EXPI293F production, and complete data can be found on PRIDE (accession PXD010865) and in Zenodo).
To map the PTMs on HTT Q23 produced in Sf9 cells as completely as possible, this sample was digested with five proteases having complementary and nonspecific cleavage specificity: trypsin; lysargiNase (27); pepsin; WT α-lytic protease (WaLP); and M190A α-lytic protease (MaLP) (28). Trypsin and lysargiNase cleave at the C and N termini, respectively, of lysine and arginine residues, thus yielding complementary (or mirror-image) peptides. WaLP and MaLP preferentially cleave at aliphatic amino acids, whereas pepsin at pH >2 cleaves at Phe, Tyr, Trp, and Leu in position P1 or P1′ (29). MaLP, WaLP, and pepsin were selected to probe the Lys- and Arg-poor regions of the protein. Digestion of the other HTT samples was performed with trypsin alone.
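The mirror-image relationship between tryptic and lysargiNase peptides can be made concrete with a simplified in silico digest. The sketch below is an idealized model rather than a description of the search parameters used: it cleaves at every Lys and Arg, ignores missed cleavages and the known reluctance of trypsin to cut before proline, and uses a short example sequence chosen only for illustration. The function name `digest` is hypothetical.

```python
def digest(seq, side):
    """Idealized cleavage at every Lys/Arg residue.

    side="C": cut after K/R (trypsin-like, K/R ends up at the peptide C terminus).
    side="N": cut before K/R (lysargiNase-like, K/R ends up at the peptide N terminus).
    Missed cleavages and proline effects are deliberately ignored.
    """
    if side not in ("C", "N"):
        raise ValueError("side must be 'C' or 'N'")
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR":
            cut = i + 1 if side == "C" else i
            if cut > start:
                peptides.append(seq[start:cut])
                start = cut
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


example = "MATLEKLMKAFESLKSF"  # short example sequence for illustration
print(digest(example, "C"))  # ['MATLEK', 'LMK', 'AFESLK', 'SF']
print(digest(example, "N"))  # ['MATLE', 'KLM', 'KAFESL', 'KSF']
```

The two digests cover the same sequence with the basic residue on opposite ends of each peptide, which is why searching them together can improve confidence in peptide and modification-site assignment.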
When LC/MS data from the five proteolysis reactions of Q23 HTT from Sf9 were searched together, at least 90% sequence coverage was observed, whereas trypsin digestion of the other HTT samples yielded at least 50% sequence coverage (Figs. S2 and S3). Because the high levels of production from our expression systems allowed us to digest large overall amounts (hundreds of micrograms) of HTT protein in multiple rounds of MS experiments, overall peptide coverage increased and we were able to map PTMs with lower incidence in the samples. Because of the large size of the HTT protein, routine MS protein identification experiments on the intact sample are not feasible. Instead, we used peptide mapping analysis to confirm that the purified sample was indeed the HTT protein. As expected for the high purity of the samples indicated by SDS-PAGE analysis, HTT peptides were the most abundant species detected, although some contaminating proteins were detected. Details of these contaminants for the HTT Q23 samples from EXPI293F cells (Table S1) show that most of these proteins are unlikely to be true HTT interactors as they have high CRAPome scores (30) or are of very low abundance. Modifications were detected for all samples, with well-described phosphorylation motifs being present in HTT samples from both Sf9 and EXPI293F production methods (Fig. 4).
By employing multiple enzymes, sequences in regions containing sparse Lys and Arg residues, for example a 20-amino acid-long peptide within exon 1 (Fig. S4), were detected. Peptide-spectrum matches (PSMs) were used to prioritize the confident phosphorylation sites. HTT expressed in Sf9 cells retains many of the highly-observed phosphorylation sites described in the literature for mammalian cell lines and post-mortem tissue (Tables 1 and 2

Table 1
Phosphorylation motifs identified for Sf9-expressed Q23 HTT 1-3144 in reference to the literature
Modifications that have been discovered in proteomics studies, but not published, were retrieved from PhosphoSitePlus (17). Some modifications have not been described before. To illustrate the likelihood of these being physiologically relevant modifications, NetPhos 3.1 predictions for the putative enzyme and likelihood score are included (31). Only modifications with at least three peptide spectrum matches for at least one peptide containing the modification are listed in the table. All data are available via PRIDE (accession PXD010865) with summaries in Zenodo.

Site
No. of PSMs a Reports or predictions HD patient/control tissue samples

which have been reported in at least one instance in the literature. Mapping the remaining modifications onto the cryo-EM HTT-HAP40 model shows that most would be surface-exposed residues in the context of apo HTT (Fig. S7); their respective physiological likelihood and probable kinase, as determined by NetPhos analysis (31), are detailed in Table 1. Monomethylation of some lysine and arginine residues was also detected (Table 2). Sequence analysis of HTT using CIDER (32) and IUPred (33) in conjunction with analysis of the recently published near-atomic resolution cryo-EM structure of HTT in complex with HAP40 (Fig. 5) (23) reveals that most of the phosphorylation sites are within disordered regions of the protein structure, as described previously (18). Although some of these previously unreported modifications may be artifacts of the Sf9 expression system, they appear to have a minimal effect on global huntingtin function, as seen by the ability of this sample to form a complex with HAP40.
A total of 25 phosphorylation motifs with at least three PSMs were mapped for HTT Q23 expressed in EXPI293F cells, 19 of which have been described previously in the literature (Tables 3 and 4 and Fig. S5). Interestingly, we also observed acetylation (Lys-826 and Lys-2932), monomethylation (Arg-2781 and His-2786), and dimethylation (Arg-2053) of our samples, none of which have been previously described in the literature. Acetylation of HTT at other sites has been previously described, and methylation motifs are observed in MS data of post-mortem human brain tissue samples of HTT (18), indicating that HTT protein methylation is a physiological modification.

Table 2
Arginine and lysine monomethylation modifications identified for Sf9-expressed Q23 HTT 1-3144 in reference to the literature
Methylation of huntingtin has not been previously described or reported. Only modifications with at least three peptide spectrum matches for at least one peptide containing the modification are listed in the table. All data are available via PRIDE (accession PXD010865) with summaries in Zenodo.

Site
No 3 (4) Arg-2774 7 (15)
a PSMs are reported as the number of peptide spectrum matches for the most abundant peptide containing the modification described, with the total number of peptide spectra for all peptides containing this modification motif in parentheses.

Table 3
Phosphorylation motifs identified for EXPI293F-expressed Q23 HTT 1-3144 in reference to the literature
Modifications that have been discovered in proteomics studies, but not published, were retrieved from PhosphoSitePlus (17). Some modifications have not been described before. To illustrate the likelihood of these being physiologically relevant modifications, NetPhos 3.1 predictions for the putative enzyme and likelihood score are included (31). All data are available via PRIDE (accession PXD010865) with summaries in Zenodo.

Site
No. of PSMs a Previously reported HD patient/control tissue samples

Characterization of HTT and HTT-HAP40 protein sample monodispersity
Size-exclusion chromatography of HTT protein derived from insect Sf9 or mammalian EXPI293F cells using a Superose6 10/300 GL column gives a characteristic elution profile (Fig. 2A), with a void-aggregate peak followed by peaks previously attributed to dimer and monomer species of HTT based on column standards (14-16). The recent cryo-EM structure of HTT in complex with HAP40 reveals a bi-lobed structure of HTT in which the N-HEAT and C-HEAT domains wrap around HAP40, yielding a more compact and globular structure (23). Furthermore, HAP40 was described as being critical for producing a conformationally homogeneous HTT sample amenable to cryo-EM structure determination. Our purified samples of apo HTT lack a binding partner such as HAP40, which may account for the broad and overlapping elution peaks observed in gel-filtration analyses as well as the tendency for self-association when HTT samples are analyzed at higher protein concentrations.

To further understand this tendency for self-association and sample heterogeneity, a C-terminal FLAG-tagged HTT Q23 sample taken from the "monomer" peak of the Superose6 elution profile was analyzed by SEC-MALS using a Superose6 column of the same specification, which allows calculation of the in-solution protein mass. This analysis revealed a peak with a shoulder, with approximate mass calculations indicating that this sample is a mixture of both HTT monomer and dimer (Fig. 6A). In contrast, the HTT-HAP40 complex sample run on the same SEC-MALS setup at the same total protein concentration is monodisperse, and the mass calculated across the peak is stable, indicating that the sample is homogeneous and not self-associating (Fig. 6B). Long-term storage and freeze-thaw of HTT-HAP40 samples had minimal effect on the peak profile, whereas apo HTT samples had a tendency to redistribute from the monomer peak to a peak profile similar to that observed during purification. Taken together, these results suggest HAP40 binding reduces homotypic HTT interaction, possibly by competing for an interaction interface or through a linkage effect (34).

SAXS analysis of the HTT-HAP40 complex in solution
The cryo-EM structure of HTT-HAP40 (23) has laid a tremendous foundation for our understanding of the structure of the huntingtin protein with respect to its global architecture, HEAT repeat organization, and complex formation with its binding partner HAP40. However, technical limitations of cryo-EM combined with sample limitations from the conformational flexibility and heterogeneity of HTT limit our current understanding of certain structural details. The Guo et al. (23) cryo-EM structure was resolved at a resolution of 4-5 Å and is missing several regions of the HTT protein. Roughly 25% of the huntingtin protein, including many functionally important elements such as exon1 and the highly modified intrinsically disordered region (IDR) spanning amino acids 400-650, is not resolved in the cryo-EM maps, presumably because these regions are intrinsically disordered (Fig. 5).
To further understand the structure of the entire HTT protein, we conducted SAXS analysis of both the Q23 isoforms of HTT (isolated monomer peak) and the HTT-HAP40 complex in solution. Similar to other biophysical and structural analysis techniques, SAXS requires large (milligram) quantities of protein. Our toolkit of HTT reagents permits production of sufficient sample for structural analyses, expediting further investigation of the HTT protein by such methodologies.
For both HTT and HTT-HAP40 sample data, R_g-based Kratky plots of the experimental curves do not match those expected for a generic globular protein of similar mass (Fig. 7). The experimentally calculated radius of gyration (R_g) for both samples was also much larger than that expected on the basis of the resolved residues of the cryo-EM structure (Table 5). This indicates that there is a degree of flexibility or disorder in both samples. This is not unexpected, given the large regions of the HTT protein sequence with predicted disorder, which are not present in the Guo et al. (23) cryo-EM HTT-HAP40 model. Interestingly, the normalized pair distance distribution function P(r) of HTT-HAP40 shows a narrower range of interatomic distances compared with the apo HTT sample, consistent with the HAP40 complex being more compact. However, taking into account the high propensity of HTT for self-association observed in our analytical size-exclusion chromatography profiles, apo HTT SAXS analysis may also be confounded by self-association of molecules in the concentrated solutions used for SAXS data collection. This assertion is corroborated by the higher molecular weight estimated from SAXS data (Table 5) for apo HTT compared with HTT-HAP40. Therefore, further SAXS analysis of the apoprotein was not pursued.
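For readers less familiar with how R_g is extracted from scattering data, the Guinier approximation, I(q) ≈ I(0)·exp(−q²R_g²/3), can be fitted as a straight line of ln I(q) against q². The sketch below is a generic illustration and not the analysis pipeline used here: the function name is hypothetical, the data are synthetic and noise-free, and the R_g of 50 Å and I(0) of 1000 are arbitrary values chosen only so that the fit can be checked. The q window mirrors the 0.015 to 0.025 Å⁻¹ range quoted for the HTT-HAP40 Guinier fit.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares fit of ln I(q) = ln I0 - (Rg^2 / 3) * q^2.

    Returns (Rg, I0). Only meaningful in the low-q Guinier region.
    """
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("non-negative slope: no valid Guinier region")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free data with arbitrary parameters (not measured values).
RG_TRUE, I0_TRUE = 50.0, 1000.0
q_values = [0.015 + 0.001 * k for k in range(11)]  # 0.015 to 0.025 per angstrom
i_values = [I0_TRUE * math.exp(-(qi * RG_TRUE) ** 2 / 3.0) for qi in q_values]

rg_fit, i0_fit = guinier_fit(q_values, i_values)
print(round(rg_fit, 3), round(i0_fit, 3))
```

With real data, the fit window is normally restricted to small q·R_g values and checked for linearity, because aggregation or self-association of the kind discussed for apo HTT distorts the lowest-angle points.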
To better understand the HTT-HAP40 structure, including the disordered/missing regions of the cryo-EM model, we performed coarse-grained molecular dynamics (MD) simulations, and calculated an ensemble of conformations that best fits the solution SAXS data for HTT-HAP40. This modeling approach assumed that the residues with known coordinates in the cryo-EM model form a quasi-rigid complex, whereas the residues with missing coordinates are flexible. Predicted SAXS scattering curves were averaged over an ensemble of MD-simulated structures using the optimal weights for each ensemble member obtained with the sparse ensemble selection (SES) method (35). The resulting average "theoretical" scattering curve for the SES weighted ensemble of structures gave a much better fit to the experimental scattering data than that of the cryo-EM structure (Fig. 8A).
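The comparison between a single-structure curve and an ensemble-averaged curve can be sketched in a few lines. The example below illustrates only the averaging and goodness-of-fit steps; it does not implement the SES weight optimization itself, and the intensity values, the uncertainties, the second weight, and the function names are all hypothetical. A commonly used reduced χ² with an analytically optimal scale factor is assumed as the fit metric.

```python
def ensemble_average(curves, weights):
    """Weighted average of per-conformer scattering curves (same q grid)."""
    total = sum(weights)
    n_points = len(curves[0])
    return [
        sum(w * c[k] for w, c in zip(weights, curves)) / total
        for k in range(n_points)
    ]


def scaled_chi2(i_exp, sigma, i_calc):
    """Reduced chi-squared after applying the least-squares optimal scale factor."""
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    scale = num / den
    chi2 = sum(((e - scale * c) / s) ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    return chi2 / (len(i_exp) - 1), scale


# Toy curves for two hypothetical conformers on a four-point q grid.
conformer_a = [10.0, 8.0, 5.0, 2.0]
conformer_b = [10.0, 6.0, 3.0, 1.0]
weights = [0.44, 0.56]  # 0.44 echoes the most populated model; 0.56 is made up

mixture = ensemble_average([conformer_a, conformer_b], weights)
i_exp = [2.0 * v for v in mixture]  # synthetic "experiment" on an arbitrary scale
sigma = [0.1] * len(i_exp)

chi2_ensemble, scale_ensemble = scaled_chi2(i_exp, sigma, mixture)
chi2_single, _ = scaled_chi2(i_exp, sigma, conformer_a)
print(chi2_ensemble, chi2_single)
```

In this toy case, the weighted ensemble reproduces the synthetic data essentially exactly, whereas a single conformer cannot, which is the qualitative behavior expected when flexible regions contribute to the measured scattering.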
The most populated model (44%) in this ensemble (Fig. 8C) shows extensive protruding regions of disorder extending out from the rigid complex core indicating that the overall envelope of the protein is likely to be larger than that calculated from the cryo-EM structure (Fig. 8B). For many of the HEAT repeats, the disordered regions of the connecting sequence are not seen in the cryo-EM structure, but the molecular modeling we have completed allows us to visualize how these might be arranged with respect to the rigid HEAT repeat core structure. The complete SES ensemble (Fig. 8D) further indicates how, in particular, the IDR and exon1 region of huntingtin are probably very structurally heterogeneous and dynamic in their conformation and are able to extend away from the more rigid core of the structure in many different arrangements due to their inherent flexibility. The extension of both exon1 and the IDR away from the core HTT-HAP40 complex is consistent with the accessibility of these domains to enzymes capable of post-translational modification.
A key feature of the cryo-EM structure is the large cavity that extends through the N-terminal HEAT repeat domain. This cavity is approximately the same diameter as a dsDNA helix, and it is tempting to envision a potential nucleic acid-binding role of this region of the HTT protein, especially given the functional links between HTT and DNA damage repair (36,37). At the current resolution of the structural information, it is also difficult to analyze potential surface charge or "greasy" surface residue hotspots that could indicate interaction sites. However, our SES ensemble model indicates how this cavity could be capped by certain conformational states of disordered loop regions that are not resolved in the cryo-EM structure. These loops could act as gatekeepers for any binding partner, whether nucleic acid or protein, accessing this cavity. Similarly, an apparent cavity on the side of the N-terminal HEAT repeat domain in the cryo-EM model may also be capped by a flexible protein chain. Expansion of the polyQ region seems unlikely to affect the global structure of huntingtin given that exon1 is distal from the rigid HEAT repeat domains. Therefore, the mechanism by which the polyQ expansion affects huntingtin protein structure and function remains a question for future structural studies.

Discussion
We have generated a resource of 28 different HTT expression constructs that allow the generation of purified HTT samples of different polyQ lengths and affinity tags from three different expression systems. All constructs are available through Addgene, including the entry vector that will allow other researchers to make additional CAG expansion forms of the HTT gene should they require them. Our expression constructs permit facile scaling of culture volumes to enable the purification of milligram quantities of WT HTT protein from both insect and mammalian cells as well as substantial production of various polyQ-expanded HTT species.
These HTT proteins from different expression systems are modified with PTMs previously described in the literature as well as additional modifications of unknown physiological relevance but that do not seem to alter HTT function in its ability to form a complex with HAP40. We describe HTT protein methylation for the first time, a modification conferred by various protein methyltransferases, many with links to neurodegeneration (38). Further characterization of these modifications and their function could open up novel avenues for understanding HTT protein structure-function. The constructs cloned may also be used in future studies for co-expression with modifying enzymes to make highly site-specific PTMs on the sample as well as to test how certain enzymes might function on HTT.
Our work characterizing the biophysical properties of HTT confirms that the protein is not monodisperse or homogeneous when purified in its apo-form. Co-expression and purification with HAP40 allow purification of a more monodisperse protein sample, rendering it amenable to more detailed structural analysis, as performed previously by cryo-EM (23). It is unclear whether HAP40 is a constitutive binder of HTT in physiological settings, although its effects on the biophysical characteristics of the HTT protein are clearly significant. Interestingly, very few HTT protein-protein interaction studies have identified HAP40 as an interacting protein of HTT, and it is only identified in the published literature in a small number of articles (23, 39) compared with the multiple extensive HTT interaction network publications (41,42). Because of the high yield of both HTT and HTT-HAP40 proteins from our toolkit of resources, we were able to conduct preliminary biophysical analyses of these samples. Our SAXS analysis in tandem with molecular dynamics simulations permitted the generation of an SES ensemble, representing a possible solution structure of the HTT-HAP40 complex. This model gives insight into how HTT is post-translationally modified at flexible and accessible regions of sequence and suggests potential regulatory mechanisms such as steric capping of binding regions of the protein.

Table 5 footnotes
a Radius of gyration (R_g) was calculated using the Guinier fit in the q range 0.015 < q < 0.025 Å⁻¹ and 0.012 < q < 0.019 Å⁻¹ for HTT/HAP40 and HTT, respectively.
b Radius of gyration was calculated using GNOM.
c Maximum distance between atoms was calculated using GNOM.
d Molecular mass was estimated using SAXSMoW (47). The mass expected from the sequence is shown in parentheses.
e Molecular mass was estimated from SAXS data based on volume of correlation (Vc) (48).

Our SAXS model serves as an important resource for understanding the complete HTT-HAP40 complex, and it should help prevent misinterpretation of certain features of the cryo-EM structure that lacks ~26% of the protein molecule. In particular, the exon1 region of HTT is distal to the complex core, and polyQ expansion does not affect HTT thermal stability as shown by DSLS. Therefore, it is likely that the effect of polyQ expansion on HTT structure-function is more nuanced. Both exon1 and the IDR have many of the hallmarks of interacting domains observed for intrinsically disordered protein regions (43), as they are heavily modified by PTMs, are conformationally flexible, and contain regions of charged amino acids (i.e. nearly 20% of residues in the IDR are negatively charged). The purported role of huntingtin as a scaffold protein could be explained through dynamic protein-protein interactions mediated through these regions of the structure (40,41).
The precise molecular function of unexpanded HTT remains elusive, so it is unclear how the polyQ expansion may alter the HTT protein sufficiently to give rise to the wide-ranging biochemical dysfunction observed in HD models and patients. These reagents and the accompanying methods and validation for the production of HTT protein will provide an enabling framework for future research requiring purified HTT and its complexes for a wide range of polyQ lengths.

Cloning of HTT and HAP40 expression constructs
HTT expression constructs were assembled in two steps into the mammalian/insect cell vector pBMDEL, an unencumbered vector created for open distribution of these reagents. First, entry vectors for N-terminal FLAG-tagged and C-terminal FLAG-tagged HTT (amino acids 1-3144) were constructed without the polyQ regions, amino acids 7-85. PCR products encoding WT HTT were amplified from cDNA (Kazusa clone FHC15881) using primers N_int_FWD (ttaagaaggagatatactATGGACTACAAAGACGATGACGACAAGATGGCGACCCTGGAAAAGCgctGACCTTAGTCGCTAAcctgcaGGAGCCGCTGCACCGACCAAAG)/N_int_REV (gattggaagtagaggttttaGCAGGTGGTGACCTTGTGGAC) for the N-terminal FLAG-tagged HTT and C_int_FWD (ttaagaaggagatatactATGGCGACCCTGGAAAAGCgctGACCTTAGTCGCTAAcctgcaGGAGCCGCTGCACCGACCAAAG)/C_int_REV (gattggaagtagaggttttaCTTGTCGTCATCGTCTTTGTAGTCaccgcttccaccGCAGGTGGTGACCTTGTGGAC) for the C-terminal FLAG-tagged HTT. All PCR products were inserted using the In-Fusion cloning kit (Clontech) into pBMDEL that had been linearized with BfuAI. Second, synthetic polyQ regions were inserted into the intermediate plasmids using the In-Fusion cloning kit. The polyQ regions were PCR-amplified using the primers polyQP_Fwd (ATGGCGACCCTGGAAAAGCTG)/polyQP_Rev (TGGTCGGTGCAGCGGCTCCTC) from template plasmids CH00007 (Q23), CH00008 (Q73), and CH00065 (Q145) (all from Coriell Institute Biorepository). PCR products were inserted into the intermediate vectors that had been linearized with AfeI and SbfI. Upon screening the assembled HTT expression constructs, we found that our cloning method generated a range of polyQ lengths. We selected constructs with polyQ lengths from Q15 to Q145. The HTT-coding sequences of intermediate and final expression constructs were confirmed by DNA sequencing. The sequences were also confirmed by Addgene, where these reagents have been deposited. HAP40 cDNA corresponding to amino acids 1-371 was subcloned into the pFBOH-MHL expression vector using ligase-independent cloning.
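As a quick sanity check on the primer design, the coding segments of the primers can be translated in silico. The sketch below is illustrative only: it assumes the primer sequences quoted above with the line-break hyphens removed, uses the standard genetic code, and all helper names (`translate`, `reverse_complement`) are hypothetical. It checks that N_int_FWD encodes a start codon followed by the FLAG epitope (DYKDDDDK) and the first HTT residues, and that the reverse complement of C_int_REV encodes the HTT C terminus, a short linker, the FLAG epitope, and a stop codon.

```python
# Illustrative check of the FLAG-coding segments in two cloning primers.
# Sequences are taken from the text with line-break hyphens removed (assumption).

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in the conventional TCAG ordering.
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper case)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def translate(seq):
    """Translate a DNA sequence in frame 1; trailing partial codons are ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


N_INT_FWD = (
    "ttaagaaggagatatactATGGACTACAAAGACGATGACGACAAG"
    "ATGGCGACCCTGGAAAAGCgctGACCTTAGTCGCTAAcctgcaGGAGCCGCTGCACCGACCAAAG"
)
C_INT_REV = (
    "gattggaagtagaggtttta"
    "CTTGTCGTCATCGTCTTTGTAGTC"
    "accgcttccacc"
    "GCAGGTGGTGACCTTGTGGAC"
)

# Forward primer: translate from the first ATG onward.
n_upper = N_INT_FWD.upper()
n_protein = translate(n_upper[n_upper.index("ATG"):])

# Reverse primer: the coding strand is its reverse complement.
c_protein = translate(reverse_complement(C_INT_REV))

print("N_int_FWD encodes:", n_protein[:15])
print("C_int_REV (reverse complement) encodes:", c_protein[:20])
```

Both translations should contain the DYKDDDDK epitope. Stop codons that appear further downstream in the N_int_FWD translation fall in the placeholder region that is replaced when the polyQ insert is cloned between the AfeI and SbfI sites.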

HTT and HTT-HAP40 protein expression
The recombinant transfer vectors HTT pBMDEL and HAP40 pFBOH-MHL were transformed into DH10Bac Escherichia coli cells (Invitrogen, Bac-to-Bac System) to generate recombinant Bacmid DNA. Sf9 cells (Invitrogen) were transfected with Bacmid DNA using jetPRIME transfection reagent (PolyPlus transfection, catalog no. 89129-924), and recombinant baculovirus particles were recovered. The recombinant virus titer was sequentially amplified from P1 to P3 virus stocks for protein production in the Sf9 insect cells and EXPI293F mammalian cells.
Baculovirus-mediated expression of HTT in insect cells. Sf9 cells at a density of ~4.5 million cells/ml were infected with 8 ml of P3 recombinant baculovirus and grown at 130 rpm and 27°C. HyQ SFX insect serum medium containing 10 µg/ml gentamicin was used as the culture medium. Infected cells were harvested when viability dropped to 80-85%, normally ~72 h post-infection. For HTT-HAP40 complex production, the same general protocol was followed but with a 3:1 ratio of HTT to HAP40 P3 recombinant baculovirus in the infection step.
Transduction of HTT in mammalian cells. EXPI293F cells (ThermoFisher Scientific, catalog no. A14527) were maintained in EXPI293 expression medium (ThermoFisher Scientific, catalog no. A1435102) in a humidified 8% CO2 incubator at 37°C and 125 rpm. Cells were transduced at a density of 2-3 million cells/ml of culture. The transduction used recombinant baculoviruses of HTT constructs generated by transfecting Sf9 cells with the transfer vector pBMDEL and jetPRIME transfection reagent (catalog no. 89129-924). The volume of virus added to the cells was 6% of the total production volume. Infected cells were harvested 7-10 days post-transduction, depending on cell viability.
Transient transfection of HTT in mammalian cells. EXPI293F cells (ThermoFisher Scientific, catalog no. A14527) were maintained in EXPI293 expression medium (ThermoFisher Scientific, catalog no. A1435102) in a humidified 8% CO2 incubator at 37°C and 125 rpm. Cells were transfected at a density of 2-3 million cells/ml of culture. FectoPRO transfection reagent (VWR, catalog no. 116-001) and plasmid pBMDEL-HTT DNA were separately diluted in serum-free OptiMEM complexation medium (ThermoFisher Scientific, catalog no. 31985062) at 10% of the total production volume in a ratio of 1 µg of DNA to 1.

HTT and HTT-HAP40 protein purification
The same protocol was used to purify HTT from insect and mammalian cell culture, adapted from Huang et al. (16) and Guo et al. (23). Cell cultures were harvested by centrifugation at 4000 rpm, 20 min, 4°C (Beckman JLA 8.1000), washed in prechilled PBS, resuspended in 20 cell paste volumes of preparation buffer (50 mM Tris, pH 8, 500 mM NaCl), and stored at -80°C prior to purification. Cell suspensions were thawed and diluted to at least 50 times the cell paste volume with prechilled buffer and supplemented with 1 mM phenylmethylsulfonyl fluoride, 1 mM benzamidine-HCl, and 20 units/ml benzonase. Note: two freeze-thaw cycles of cell suspensions were found to be sufficient for cell lysis. The lysate was clarified by centrifugation at 14,000 rpm, 1 h, 4°C (Beckman JLA 16.2500), and then bound to 0.1 cell paste volumes of anti-FLAG resin (Sigma M2) at 4°C with rocking for 2 h. Anti-FLAG resin was washed twice with 100 cell paste volumes of buffer. HTT protein was eluted with 1 cell paste volume of buffer supplemented with 250 µg/ml 3×FLAG peptide (Chempep) run twice over the anti-FLAG resin. Residual HTT protein was washed from the beads with 0.5 cell paste volume of buffer. The sample was spin-concentrated with a molecular weight cutoff of 100,000. Depending on the protein yield, the sample was run as one or more sample runs on a Superose 6 10/300 GL column in size-exclusion chromatography buffer (20 mM HEPES, pH 7.5, 300 mM NaCl, 5% (v/v) glycerol, 1 mM TCEP) at 0.4 ml/min, ensuring that no more than 2 mg of protein was applied per run to minimize protein aggregation. For HTT-HAP40, the same protocol was followed except that the preparation buffer contained just 300 mM NaCl, and an additional step was included in which the FLAG elution was rocked with 2 ml of equilibrated nickel-nitrilotriacetic acid resin for 30 min before washing in preparation buffer and then elution with buffer supplemented with 300 mM imidazole prior to the size-exclusion chromatography step.
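Because the protocol above scales every step to the cell paste volume, the working volumes for a given preparation can be tabulated with a small helper. The following is a minimal sketch, not part of the published method: the function name `purification_plan` and its arguments are hypothetical, and it assumes that each of the two resin washes uses 100 cell paste volumes, which the text does not state explicitly.

```python
import math


def purification_plan(cell_paste_ml, expected_yield_mg, max_load_mg=2.0):
    """Scale buffer and resin volumes to the cell paste volume (in ml).

    Multipliers follow the protocol text; the per-wash interpretation of
    "100 cell paste volumes" is an assumption.
    """
    return {
        "resuspension_buffer_ml": 20 * cell_paste_ml,
        "min_lysis_volume_ml": 50 * cell_paste_ml,
        "anti_flag_resin_ml": 0.1 * cell_paste_ml,
        "wash_buffer_total_ml": 2 * 100 * cell_paste_ml,  # two washes (assumed 100 volumes each)
        "elution_buffer_ml": 1.0 * cell_paste_ml,
        "residual_wash_ml": 0.5 * cell_paste_ml,
        # No more than max_load_mg of protein per Superose 6 run.
        "sec_runs": max(1, math.ceil(expected_yield_mg / max_load_mg)),
    }


plan = purification_plan(cell_paste_ml=10, expected_yield_mg=5)
for step, value in plan.items():
    print(f"{step}: {value}")
```

For example, 10 ml of cell paste with an expected yield of 5 mg would call for three size-exclusion runs under the 2 mg per run limit.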

SDS-PAGE and Western blot analysis
SDS-PAGE and Western blot analysis were performed according to standard protocols. In brief, purified proteins were denatured in sample buffer (50 mM Tris-HCl, 0.1 M DTT, 2% SDS, 0.1% bromphenol blue, and 10% glycerol, pH 6.8) at 98°C for 5 min, followed by SDS-PAGE. After electrophoresis, the gel was either directly stained with Coomassie Blue or subjected to Western blot analysis. For Western blot analysis, proteins were transferred onto polyvinylidene difluoride membranes. The primary antibody used was anti-HTT (Abcam, ab109115, 1:5000), and the secondary antibody used was IRDye 800CW anti-rabbit IgG (LI-COR, 926-32211, 1:5000). Membranes were visualized on an Odyssey CLx Imaging System (LI-COR).

HTT MS
All data were acquired on an Agilent 1260 capillary HPLC system coupled to an Agilent Q-TOF 6545 mass spectrometer via the Dual Agilent Jetstream ion source.
Bottom-up proteomics for sequence coverage and PTM analysis. Proteins were processed according to established protocols (42). Briefly, proteins were reduced with DTT (10 mM final concentration) for 30 min at room temperature, alkylated with iodoacetamide (55 mM final concentration) for 30 min at room temperature, and incubated with trypsin (6 µl, 0.2 mg/ml) overnight at 37°C. The digests were acidified to pH 2 with hydrochloric acid and desalted on-column (by diverting the first 2 min to waste) before analysis. Peptides were separated on a C18 Advance BioPeptide column (2.1 × 150 mm, 2.7-µm particles) at a flow rate of 400 µl/min and an operating pressure of 4700 p.s.i. Peptides were eluted using a gradient from 100% solvent A (98:2 H2O/ACN with 1% formic acid) to 50% B (96:4 ACN/H2O with 1% formic acid) for 80 min. Mass spectra were acquired from m/z 300 to 1700 at a rate of eight spectra/s. The tandem mass spectra were acquired in automated MS/MS mode from m/z 100 to 1500 with an acquisition rate of three spectra/s. The top 10 precursors were selected and sorted by abundance only. Collision-induced dissociation was done using all ions at [4·(m/z)/100] - 1 and - 5.
Data analysis. Raw data were processed using PEAKS Studio 8.5 (build 20171002) and the reference complete human proteome FASTA file (Uniprot). Cysteine carbamidomethylation was selected as a fixed modification, and methionine oxidation and Asn/Gln deamidation as variable modifications. A minimum peptide length of five, a maximum of three missed cleavage sites, and a maximum of three labeled amino acids per peptide were employed.
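The effect of the search parameters above (minimum peptide length and maximum missed cleavages) can be illustrated with a simple in silico tryptic digest. This is a sketch only and does not reproduce the PEAKS search: the cleavage rule (C-terminal to Lys or Arg, but not before Pro) is the commonly used trypsin rule and is assumed here, the function name `tryptic_peptides` is hypothetical, and the input is the first 17 residues of HTT used purely as an example.

```python
def tryptic_peptides(sequence, max_missed=3, min_length=5):
    """List tryptic peptides allowing up to max_missed missed cleavages.

    Assumed rule: cleave C-terminal to K or R unless the next residue is P.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])

    peptides = []
    for first in range(len(fragments)):
        last_allowed = min(first + max_missed + 1, len(fragments))
        for end in range(first + 1, last_allowed + 1):
            peptide = "".join(fragments[first:end])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


# Example input: N-terminal 17 residues of HTT (illustrative only).
HTT_N17 = "MATLEKLMKAFESLKSF"
print(tryptic_peptides(HTT_N17, max_missed=0))
print(tryptic_peptides(HTT_N17, max_missed=3))
```

Allowing missed cleavages recovers peptides spanning short fragments such as LMK, which on their own fall below the minimum length of five.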

HTT sequence disorder prediction
Disorder prediction was performed using IUPred (33,43). A threshold of 0.5 was used to define disordered or ordered regions, with predicted disordered regions shaded in light red (Fig. 5). Further sequence analysis of HTT was performed using the sequence analysis tool localCIDER (32). Hydrophobicity was calculated using a normalized Kyte-Doolittle scale (43, 44). Resolved structure and domain annotations were based on the solved cryo-EM structure of HTT in complex with HAP40 (23).
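Applying the 0.5 threshold to a per-residue disorder profile amounts to collecting contiguous runs of residues whose score exceeds the cutoff. The sketch below shows one way to do this; it does not call IUPred, the scores are made-up illustrative values, the function name `disordered_regions` is hypothetical, and treating scores strictly greater than the threshold as disordered is an assumption.

```python
def disordered_regions(scores, threshold=0.5):
    """Return 1-based inclusive (start, end) residue ranges with score > threshold."""
    regions = []
    run_start = None
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if run_start is None:
                run_start = position
        elif run_start is not None:
            regions.append((run_start, position - 1))
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(scores)))
    return regions


# Hypothetical per-residue scores, for illustration only.
example_scores = [0.1, 0.6, 0.7, 0.2, 0.9]
print(disordered_regions(example_scores))
```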

SAXS data collection and analysis
SAXS measurements were carried out at the beamline 12-ID-B of the Advanced Photon Source, Argonne National Laboratory. The energy of the X-ray beam was 14 keV (wavelength λ = 0.8856 Å), and two setups (small- and wide-angle X-ray scattering) were used simultaneously to cover scattering q ranges of 0.006 < q < 2.6 Å⁻¹, where q = (4π/λ) sin θ, and 2θ is the scattering angle. Thirty two-dimensional images were recorded for each buffer or sample solution using a flow cell, with an accumulated exposure time of 0.8-2 s to reduce radiation damage and obtain good statistics. No radiation damage was observed, as confirmed by the absence of systematic signal changes in sequentially collected X-ray scattering images. The 2D images were corrected and reduced to 1D scattering profiles using the Matlab software package at the beamlines. The 1D SAXS profiles were grouped by sample and averaged.
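The relationship between beam energy, wavelength, and momentum transfer used above can be checked numerically. The short sketch below assumes the standard conversion λ (Å) = hc/E with hc ≈ 12.398 keV·Å and the definition q = (4π/λ) sin θ, where 2θ is the scattering angle; the function names are hypothetical.

```python
import math

HC_KEV_ANGSTROM = 12.39841984  # Planck constant times speed of light, in keV*Angstrom


def wavelength_angstrom(energy_kev):
    """X-ray wavelength in Angstrom for a photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def q_from_two_theta(two_theta_deg, wavelength):
    """Momentum transfer q (1/Angstrom) for a scattering angle 2*theta in degrees."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength


def two_theta_from_q(q, wavelength):
    """Scattering angle 2*theta in degrees for a given q (1/Angstrom)."""
    return 2.0 * math.degrees(math.asin(q * wavelength / (4.0 * math.pi)))


lam = wavelength_angstrom(14.0)
print(f"wavelength at 14 keV: {lam:.4f} Angstrom")
for q in (0.006, 2.6):
    print(f"q = {q} 1/Angstrom corresponds to 2*theta = {two_theta_from_q(q, lam):.3f} degrees")
```

The computed wavelength agrees with the 0.8856 Å quoted for the 14 keV beam.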

The scattering profile of the protein was calculated by subtracting the background buffer contribution from the sample-buffer profile, and the difference data were extrapolated to zero solute concentration by standard procedures. Guinier analysis and the experimental radius of gyration (R_g) estimation from the data of infinite dilution were performed using PRIMUS. The pair distance distribution function, P(r), and the maximum dimension of the protein, D_max, in real space were calculated with the indirect Fourier transform using the program GNOM (46). The molecular weights were estimated separately based on the volume calculated by SAXSMoW (47), and the volume of correlation (Vc) (48) was calculated by DATVC in the q range of 0 < q < 0.3 Å⁻¹. The theoretical scattering intensity of a structural model was calculated and fitted to the experimental scattering intensity using the CRYSOL (49) and FoXS (50) programs.
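The Guinier analysis mentioned above rests on the low-q approximation ln I(q) ≈ ln I(0) - (R_g² q²)/3, so R_g follows from the slope of ln I against q². The sketch below is a bare least-squares illustration of that relation on synthetic, noise-free data; it is not a substitute for PRIMUS, the "true" R_g of 60 Å is an arbitrary value chosen for the example, and the function name `guinier_fit` is hypothetical.

```python
import math


def guinier_fit(q_values, intensities):
    """Fit ln I = ln I0 - (Rg^2 / 3) q^2 by ordinary least squares.

    Returns (Rg, I0). Assumes all intensities are positive and the
    fitted slope is negative.
    """
    x = [q * q for q in q_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic Guinier-region data (hypothetical Rg and I0, no noise).
TRUE_RG = 60.0  # Angstrom, arbitrary illustrative value
TRUE_I0 = 1.0
q_grid = [0.012 + 0.0005 * k for k in range(15)]  # 0.012 to 0.019 1/Angstrom
synthetic = [TRUE_I0 * math.exp(-(q * TRUE_RG) ** 2 / 3.0) for q in q_grid]

rg, i0 = guinier_fit(q_grid, synthetic)
print(f"fitted Rg = {rg:.2f} Angstrom, I0 = {i0:.3f}")
```

In practice the fit range is chosen so that q·R_g stays small (commonly below about 1.3), which is why only a narrow low-q window is used.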

Fitting structural ensemble to SAXS data
The SAXS data indicate that the HTT/HAP40 complex possesses some degree of flexibility. The known EM structure of the complex (Protein Data Bank code 6ez8) is missing ~26% of the residues, and it does not fit the SAXS data. We assume that the residues with known coordinates form a quasi-rigid part of the complex, whereas the residues with missing coordinates are flexible. We performed coarse-grained MD simulations to generate the initial ensemble of possible conformations of the complex. The MD trajectory of 1000 ns was generated at 300 K, and the theoretical scattering profiles in the q range 0 < q < 0.3 Å⁻¹ for 5000 frames taken from the trajectory were calculated using FoXS. The calculated scattering curves were averaged over the entire ensemble of structures using the optimal weights for each ensemble member obtained with the SES method (35), and this average profile was compared with the experimental scattering data.
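The final comparison step described above, averaging per-frame profiles with ensemble weights and scoring the result against experiment, can be sketched as follows. This is an illustration only: it does not implement the SES weight optimization, the weights are simply supplied by the caller, the reduced χ² with a fitted scale factor is a common goodness-of-fit measure assumed here, and all names and numbers are hypothetical.

```python
def ensemble_profile(profiles, weights):
    """Weighted average of scattering profiles sampled on a common q grid."""
    total = sum(weights)
    n_points = len(profiles[0])
    return [
        sum(w * profile[k] for w, profile in zip(weights, profiles)) / total
        for k in range(n_points)
    ]


def reduced_chi_square(experimental, sigma, model):
    """Reduced chi-square after scaling the model to the data.

    The scale factor c minimizes sum(((I_exp - c * I_model) / sigma)^2).
    Returns (chi2, c).
    """
    numerator = sum(e * m / s ** 2 for e, m, s in zip(experimental, model, sigma))
    denominator = sum(m * m / s ** 2 for m, s in zip(model, sigma))
    scale = numerator / denominator
    residual = sum(((e - scale * m) / s) ** 2 for e, m, s in zip(experimental, model, sigma))
    return residual / (len(experimental) - 1), scale


# Toy example: two frames, three q points (made-up numbers).
frames = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
averaged = ensemble_profile(frames, weights=[3.0, 1.0])
chi2, scale = reduced_chi_square([3.0, 4.0, 5.0], [0.1, 0.1, 0.1], averaged)
print(averaged, chi2, scale)
```

An optimizer such as SES would adjust the weights to minimize this kind of discrepancy while keeping the ensemble sparse; here the weights are fixed inputs.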

Coarse-grained molecular dynamics simulations
We used a coarse-grained model of the HTT/HAP40 protein complex to enhance the sampling efficiency in the conformational space of the complex. In this model, amino acid residues in the proteins are represented as single beads located at their Cα positions and interacting via appropriate bonding, bending, torsion angle, and nonbonding potentials. A Gō-like model of Clementi et al. (51) was employed to maintain the structured globular domains as quasi-rigid in the simulation. For flexible regions, we adopted a simple model in which adjacent amino acid beads are joined together into a polymer chain by means of virtual bond and angle interactions with a quadratic potential as shown in Equation 1, with the constants K_b = 50 kcal/mol and K_α = 1.75 kcal/mol and the equilibrium values b_0 = 3.8 Å and α_0 = 112° for bonds and angles, respectively. The excluded volume between nonbonded beads was treated with a purely repulsive potential as shown in Equation 2, where r_ij is the inter-bead distance, σ_R = 4 Å, and ε_R = 2.0 kcal/mol.
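Equations 1 and 2 are not reproduced in this text, so the sketch below only illustrates the kind of terms described, under explicit assumptions: a harmonic form K(x - x_0)² for bonds and angles (any factor of 1/2 and the angular units of K_α are not specified in the text; radians are assumed), and an inverse 12th-power repulsion ε_R(σ_R/r_ij)¹² for excluded volume, with the 4 Å length read as σ_R. The parameter values are those quoted above; the functional forms and function names are hypothetical and may differ from the authors' implementation.

```python
import math

# Parameters quoted in the text.
K_B = 50.0         # bond constant, kcal/mol (per Angstrom^2 assumed)
K_ALPHA = 1.75     # angle constant, kcal/mol (per radian^2 assumed)
B0 = 3.8           # equilibrium virtual bond length, Angstrom
ALPHA0_DEG = 112.0 # equilibrium virtual bond angle, degrees
SIGMA_R = 4.0      # repulsion length scale, Angstrom (symbol assumed)
EPS_R = 2.0        # repulsion strength, kcal/mol


def bond_energy(b):
    """Assumed harmonic virtual-bond term for a bond length b in Angstrom."""
    return K_B * (b - B0) ** 2


def angle_energy(alpha_deg):
    """Assumed harmonic virtual-angle term for an angle in degrees."""
    delta = math.radians(alpha_deg - ALPHA0_DEG)
    return K_ALPHA * delta ** 2


def repulsion_energy(r_ij):
    """Assumed purely repulsive excluded-volume term for inter-bead distance r_ij."""
    return EPS_R * (SIGMA_R / r_ij) ** 12


for r in (3.5, 4.0, 6.0):
    print(f"r = {r} Angstrom: repulsion = {repulsion_energy(r):.4f} kcal/mol")
```

Whatever the exact form used, a term of this type decays quickly beyond σ_R, so it prevents bead overlap without introducing attraction between flexible segments.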
The interaction between quasi-rigid domains is modeled with residue-specific pair interaction potentials that combine short-range interactions with long-range electrostatics as described (52,53). The short-range interaction is given by a Lennard-Jones 12-10-6-type potential, and a simple Debye-Hückel-type potential is used for the electrostatic interactions (53). In this study, we used a dielectric constant of 80 and a Debye screening length of 10 Å, which corresponds to a salt concentration of about 100 mM. In-house software was developed and used to carry out constant-temperature molecular dynamics simulations of the coarse-grained model described above.
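The correspondence between a 10 Å screening length and roughly 100 mM salt can be checked with the textbook approximation for a 1:1 electrolyte in water at 25°C, λ_D ≈ 3.04/√I Å with the ionic strength I in mol/l. The sketch below also evaluates a generic screened Coulomb (Debye-Hückel) pair energy with the dielectric constant and screening length quoted above. It is a minimal illustration assuming these standard forms, not the authors' in-house code; the function names are hypothetical.

```python
import math

COULOMB_KCAL = 332.0637  # Coulomb constant in kcal*Angstrom/(mol*e^2)


def debye_length_angstrom(ionic_strength_molar):
    """Approximate Debye length for a 1:1 salt in water at 25 C."""
    return 3.04 / math.sqrt(ionic_strength_molar)


def debye_huckel_energy(q_i, q_j, r, dielectric=80.0, debye_length=10.0):
    """Screened Coulomb energy (kcal/mol) between charges q_i and q_j (in e) at distance r (Angstrom)."""
    return COULOMB_KCAL * q_i * q_j / (dielectric * r) * math.exp(-r / debye_length)


print(f"Debye length at 100 mM: {debye_length_angstrom(0.1):.2f} Angstrom")
print(f"Energy of +1/-1 pair at 10 Angstrom: {debye_huckel_energy(1, -1, 10.0):.4f} kcal/mol")
```

At 100 mM the approximation gives a screening length just under 10 Å, consistent with the value used in the simulations.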