The structure of the AXH domain of spinocerebellar ataxin-1.

Spinocerebellar ataxia type 1 is a late-onset neurodegenerative disease caused by the expansion of a CAG triplet repeat in the SCA1 gene. This expansion lengthens a polyglutamine tract in the gene product, ataxin-1, producing a toxic gain of function that results in specific neuronal death. A region of ataxin-1, the AXH domain, shows significant sequence similarity to the transcription factor HBP1. This region of the protein has been implicated in RNA binding and self-association. We have determined the crystal structure of the AXH domain of ataxin-1. The AXH domain is dimeric and contains an OB-fold, a structural motif found in many oligonucleotide-binding proteins, supporting its proposed role in RNA binding. By structural comparison with other OB-fold proteins, we have identified a putative RNA-binding site. We also identified a cluster of charged surface residues that are well conserved among AXH domains. These residues may constitute a second ligand-binding surface, suggesting that all AXH domains interact with a common, as yet unidentified partner.

Spinocerebellar ataxia type 1 (SCA1)¹ is an autosomal dominant neurodegenerative disorder characterized by the loss of Purkinje cells in the cerebellar cortex. It is a member of the polyglutamine expansion disease family. These diseases are caused by the abnormal lengthening of a CAG triplet repeat in the coding region of the respective gene. In normal individuals, the triplet repeat translates into a polymorphic glutamine tract of fewer than ∼35-40 residues. In patients, the length of the polyglutamine tract exceeds this 35-40-residue threshold, resulting in a toxic gain of function that leads to tissue-specific neuronal loss. In most cases, nuclear ubiquitinated aggregates of the pathogenic proteins are observed in affected neurons (1). The relationship between the polyglutamine tract and disease pathology is unclear and has been the subject of much interest (2). In some cases, overexpression of even the normal protein can lead to mild disease phenotypes (3), suggesting that protein misfolding or turnover may play a role in the disease process. SCA1 is caused by polyglutamine expansion in ataxin-1, a nuclear protein of ∼800 residues. Transgenic animal models for this disease have contributed significantly to our understanding of polyglutamine expansion diseases in general. Areas addressed have included the role of protein aggregates in neural toxicity, the effects of chaperones and the proteasome on neuropathology, and the role of post-translational modification and protein-protein interactions in disease progression (4). Recently, it has been shown that neurodegeneration is mediated by the interaction of ataxin-1 with 14-3-3 proteins (5). This finding has emphasized the importance of understanding the normal function of the protein. Ataxin-1 appears to be involved in the regulation of gene expression.
It has been shown to associate with several proteins involved in controlling transcription, and an expanded allele of SCA1 can down-regulate early gene expression in Purkinje cells in transgenic mice (6). A 120-residue region of ataxin-1 is similar in sequence to part of the HMG box transcription factor HBP1 (HMG box-containing protein-1) (Fig. 1) (7-10). HBP1 has a role in chromatin remodeling and regulates gene expression during the arrest of cell proliferation and during cell differentiation. The region of similarity has been shown to act as a transcription repression domain in HBP1 (9). This region has been termed the AXH (ataxin-1/HBP1) domain (SMART Database accession number SM00536).² Small proteins consisting of just the AXH domain are present in Caenorhabditis elegans and Drosophila melanogaster (Fig. 1), suggesting that this is an independently folded unit. The ataxin-1 AXH domain has been implicated in self-association and RNA binding (11,12) as well as in binding of a ubiquitin protease (13) and p80 coilin (14). As part of an effort to understand the structure and function of proteins involved in polyglutamine expansion diseases, we have determined the structure of the AXH domain of ataxin-1.

EXPERIMENTAL PROCEDURES
Crystallization and Data Collection. The DNA sequence encoding the ataxin-1 AXH domain (residues 563-694; all residue numbering in this work follows that of ataxin-1 with a 30-glutamine repeat unless otherwise stated) was subcloned into a modified pRSET-A vector (Invitrogen) that allows overproduction of a C-terminal His-tagged protein with a thrombin cleavage site. The plasmid was transformed into the Escherichia coli C41 strain (15) for overexpression. To facilitate selenium incorporation, a double mutant of the AXH domain containing the I611M and V641M mutations was created. This mutant, SeMet AXH, was overproduced using the protocol of van den Ent et al. (16). Both native and SeMet AXH proteins were purified by Ni²⁺ affinity chromatography, cleaved with thrombin, and further purified by anion-exchange chromatography (Amersham Biosciences Resource-Q) and gel filtration (Superdex-75). The purified recombinant AXH proteins have 133 residues and contain a non-native Gly residue at their N termini. The identities of the proteins were confirmed by electrospray mass spectrometry.
The proteins were concentrated to ∼20 mg/ml in 10 mM Tris (pH 7.0) with 5 mM dithiothreitol. Both native and SeMet AXH crystals were […]
A set of native diffraction data consisting of 114 frames of 1° oscillation each was collected from a single crystal. One SeMet crystal was used to collect a complete set of multiple wavelength anomalous dispersion data at the peak, edge, and remote wavelengths (Table I). For each wavelength, 100 images of 1° oscillation were collected, followed by another 100 images constituting an inverse-beam set. All data were collected at 100 K under a nitrogen stream at beamline 14.2 of the Synchrotron Radiation Source at Daresbury Laboratory (Warrington, United Kingdom). The diffraction data were processed with IPMOSFLM (17) and were merged, scaled, and reduced with programs from the CCP4 suite (27). The statistics for data processing are summarized in Table I. The native crystal belongs to space group P2₁2₁2₁, with unit cell dimensions a = 48.78, b = 82.44, and c = 137.78 Å. There are four molecules in the asymmetric unit, giving a V_M of 2.4 Å³/Da, which is within the range commonly observed for protein crystals (18). This corresponds to a protein content of 51%.
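The Matthews coefficient arithmetic above can be checked with a short Python sketch. It assumes the standard relations V_M = V_cell / (n·M), where n is the number of molecules per unit cell, and protein fraction ≈ 1.23 / V_M (the 1.23 constant is the conventional value and is not stated in the text). The function names are illustrative and do not come from any crystallographic package.

```python
# Matthews coefficient (V_M) bookkeeping for the native AXH crystal.
# Cell dimensions and V_M come from the text; the 1.23 constant is the
# conventional value from the Matthews relation (an assumption here).


def orthorhombic_cell_volume(a: float, b: float, c: float) -> float:
    """Cell volume in cubic angstroms when alpha = beta = gamma = 90 degrees."""
    return a * b * c


def protein_fraction(v_m: float) -> float:
    """Fraction of crystal volume occupied by protein, approximated as 1.23 / V_M."""
    return 1.23 / v_m


def implied_mass_per_molecule(cell_volume: float, molecules_per_cell: int, v_m: float) -> float:
    """Molecular mass (Da) implied by V_M = V_cell / (n * M)."""
    return cell_volume / (molecules_per_cell * v_m)


if __name__ == "__main__":
    volume = orthorhombic_cell_volume(48.78, 82.44, 137.78)
    # P2(1)2(1)2(1) has 4 asymmetric units per cell; 4 molecules per ASU gives 16 per cell.
    n_per_cell = 4 * 4
    print(f"cell volume      : {volume:,.0f} A^3")
    print(f"protein fraction : {protein_fraction(2.4):.0%}")
    print(f"implied mass     : {implied_mass_per_molecule(volume, n_per_cell, 2.4):,.0f} Da")
```

With the values given above, the protein fraction rounds to 51%, and the implied mass per molecule is roughly 14.4 kDa, broadly consistent with a 133-residue polypeptide.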
Structure Solution and Refinement. The structure of the AXH domain was determined by multiple wavelength anomalous dispersion methods. Initial phases were calculated with the program SOLVE, refined with RESOLVE (19,20), and subsequently refined with the program SHARP (21). Phasing statistics are shown in Table II. Solvent flattening with SOLOMON (22) at an optimal solvent content of 42% produced an interpretable electron density map. An atomic model of one monomer was built into this 3.0-Å multiple wavelength anomalous dispersion experimental map with the program MAIN (23,24). The monomer model was then placed into the electron density corresponding to the three other monomers. Because the native and SeMet mutant crystals are isomorphous, the preliminary model could be refined against the native data at this stage; the SeMet data set was not used in further refinement. Each cycle of refinement consisted of manual rebuilding with the program O (25), followed by computational refinement, at the early stages with CNS (26) and later with Refmac5 (27). Rebuilding was guided by electron density maps calculated with 2mFo − DFc and mFo − DFc coefficients (28). At all stages of refinement, 5% of the reflections were excluded from refinement for cross-validation. A bulk solvent model was used, and non-crystallographic symmetry was not imposed. A combination of the programs ARP and Refmac4 (27) was used to build and refine the solvent structure. The statistics of the final round of refinement are summarized in Table III.
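Cross-validation against a held-out 5% of reflections can be illustrated as follows. This is a minimal sketch of the conventional crystallographic R-factor, R = Σ||Fo| − |Fc|| / Σ|Fo|, computed separately over the working and free sets. It assumes Fc is already on the same scale as Fo, and the random free-set selection is illustrative only; in practice the free flags are kept fixed across refinement programs so that R_free stays unbiased. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def select_free_set(n_reflections: int, free_fraction: float = 0.05, seed: int = 0) -> List[bool]:
    """Flag a random subset (free_fraction of all reflections) for cross-validation."""
    rng = random.Random(seed)
    n_free = round(n_reflections * free_fraction)
    free_indices = set(rng.sample(range(n_reflections), n_free))
    return [i in free_indices for i in range(n_reflections)]


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|); assumes Fc is already scaled to Fo."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs: Sequence[float], f_calc: Sequence[float],
                    is_free: Sequence[bool]) -> Tuple[float, float]:
    """R over reflections used in refinement (work) and over the excluded set (free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical amplitudes: Fc deviates from Fo by a few percent.
    f_obs = [rng.uniform(10.0, 100.0) for _ in range(2000)]
    f_calc = [fo * rng.uniform(0.9, 1.1) for fo in f_obs]
    flags = select_free_set(len(f_obs))
    print(sum(flags), r_work_and_free(f_obs, f_calc, flags))
```

Because the free reflections never enter refinement, a model that overfits the working set shows up as R_free drifting well above R_work.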
Quality of Structure Model. The four molecules of the AXH domain in the model are of good quality. In general, the N termini are better defined than the C termini. The C terminus of chain D is clearly defined by electron density and could be built completely, whereas in chains A-C the last five residues are disordered. A region of chain B (residues 600-611) is disordered, and three residues of chain C (residues 635-637) could not be modeled because of a lack of electron density. Chain B residues 600-611 were modeled with reference to weak density calculated from the SeMet data sets. These two regions were allowed to refine to elevated B factors to indicate that they are disordered. The final model was assessed with the program PROCHECK (29), and the stereochemistry was found to be excellent (Table III). The overall G-factor is −0.1, which is similar to that of other structures determined at this resolution. For analysis of the monomer, chain D was used; for analysis of the dimer, the C·D dimer was used.
Sequence Conservation. Five manually aligned AXH domains (human ataxin-1; CG4547 protein from D. melanogaster, Swiss-Prot accession number Q9W3V7; K04F10.1 protein from C. elegans, Swiss-Prot accession number O44771; human HBP1, Swiss-Prot accession number O60381; and frog HBP1, Swiss-Prot accession number Q8JH80) were submitted to the ConSurf server (30)³ for analysis. The ConSurf server assigns a relative conservation score to each residue, taking into account the evolutionary relationships among the family of homologs. The scores are normalized such that the average score is zero; negative and positive deviations represent the degrees of conservation and variation, respectively. Each residue is then assigned a grade from 1 to 9 (1 for most variable, 5 for average, up to 9 for most conserved), which is used to map the relative conservation onto the molecular surface (see figure legends).
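The normalization and 1-9 grading described above can be sketched in Python. The binning rule here, which splits the range below and above the mean into 4.5 equal-width intervals each so that grade 5 straddles the average, is an assumption modeled on how ConSurf describes its color scale; the server's exact implementation may differ. Input scores are hypothetical per-residue values in which lower (more negative) means more conserved, matching the sign convention stated in the text.

```python
import math
import statistics
from typing import List, Sequence


def normalize(scores: Sequence[float]) -> List[float]:
    """Shift to mean zero and scale to unit (population) standard deviation."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]


def grade(z: float, z_min: float, z_max: float) -> int:
    """Map a normalized score to 1 (most variable) .. 9 (most conserved).

    Negative z means conserved. Assumed binning: 4.5 equal intervals on each side of zero.
    """
    if z < 0:
        width = -z_min / 4.5
        steps = math.floor(-z / width + 0.5)
        return min(9, 5 + steps)
    width = z_max / 4.5
    steps = math.floor(z / width + 0.5)
    return max(1, 5 - steps)


def conservation_grades(raw_scores: Sequence[float]) -> List[int]:
    """Normalize raw per-residue scores, then bin them into grades 1-9."""
    z = normalize(raw_scores)
    lo, hi = min(z), max(z)
    return [grade(value, lo, hi) for value in z]


if __name__ == "__main__":
    hypothetical_scores = [-1.2, -0.9, 0.1, 0.4, 1.6]  # not values from this study
    print(conservation_grades(hypothetical_scores))
```

The sketch assumes the scores are not all identical; a constant input would make the standard deviation zero.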
Dimeric Interface Analysis. The dimeric interfaces were analyzed with the Protein-Protein Interaction server at University College London.⁴ Accessible surface areas (ASA) and their differences (ΔASA) were calculated with NACCESS.
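The ΔASA quantities obtained from NACCESS combine as shown in this sketch. It assumes per-chain and complex ASA values (Å²) have already been computed by a tool such as NACCESS; the numbers in the usage example and the residue labels are hypothetical, not values from this structure. Conventions differ between studies: some report the total ΔASA as the buried surface area, others report half of it as the interface area per monomer.

```python
from typing import Dict


def buried_surface_area(asa_chain1: float, asa_chain2: float, asa_complex: float) -> float:
    """Total ASA buried on association: ASA(1) + ASA(2) - ASA(1.2), in A^2."""
    return asa_chain1 + asa_chain2 - asa_complex


def per_residue_delta_asa(asa_isolated: Dict[str, float],
                          asa_in_complex: Dict[str, float]) -> Dict[str, float]:
    """ASA lost by each residue on complex formation (A^2); residues that lose none are dropped."""
    delta = {}
    for residue, isolated in asa_isolated.items():
        lost = isolated - asa_in_complex[residue]
        if lost > 0:
            delta[residue] = lost
    return delta


if __name__ == "__main__":
    # Hypothetical totals for two chains and their complex (not values from this paper).
    total = buried_surface_area(7000.0, 7100.0, 12500.0)
    print(f"buried surface area: {total:.0f} A^2 (half per monomer: {total / 2:.0f} A^2)")

    # Hypothetical per-residue ASA values for three residues of one chain.
    isolated = {"res1": 120.0, "res2": 45.0, "res3": 80.0}
    complexed = {"res1": 40.0, "res2": 45.0, "res3": 10.0}
    print(per_residue_delta_asa(isolated, complexed))
```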

RESULTS
Quaternary Structure. The four chains in the crystal asymmetric unit are arranged into two spherical dimers (A·B and C·D), in which the individual molecules are related by an approximate 2-fold rotation (Fig. 2). Analytical ultracentrifugation indicated that the AXH domain is also a dimer in solution (Fig. 3). The A·B and C·D dimers form similar interfaces, with buried surface areas of 1570 and 1645 Å², respectively. These values agree well with those obtained for other known homodimeric interfaces (Table IV) (31). The other intermolecular contacts seen in the crystal lattice (B·C) are most likely an effect of crystal packing.
Monomer Structure. The main feature of the monomer is an open β-barrel with a Greek key motif formed by strands β3, β4, β5, and β9 (Fig. 4). This arrangement is known as an OB (oligonucleotide/oligosaccharide-binding) fold and is found in many different proteins (32-34). When the DALI server⁶ was queried with the AXH domain for structural homologs, the four top-scoring structures were all OB-fold proteins, the most similar being the N-terminal domain of ribosomal protein L2. The AXH domain lacks the strand corresponding to OB-fold strand 5. In a typical OB-fold structure, the connection between strands 3 and 4 forms an α-helix that caps the barrel. In the AXH domain, the loop between strands β5 and β9 (corresponding to OB-fold strands 3 and 4) also contains a helix. There are, however, insertions at both ends of this loop that form three short strands (β6, β7, and β8) (Fig. 4). The OB-fold is preceded by an α-helix (α1) that packs onto the edge of the β-barrel and by two short strands (β1 and β2) (Figs. 4B and 6B).
Differences among the Four Monomers. There are considerable structural differences among the four chains of the AXH domain in the crystal asymmetric unit. Most notably, the 19 N-terminal residues preceding strand β1 (residues 562-580) adopt one of two different structures: the A and C monomers share a main chain conformation that is distinct from that of the B and D monomers (Fig. 5, A and B). In chains A and C, residues 565 and 566 form a strand (β0), and residues 573-575 form a partial 3₁₀ helix (h1) (Fig. 5A). In chains B and D, strand β0 is replaced by a 3₁₀ helix (h0) formed by residues 568-570 (chain B) or residues 564-566 (chain D), followed by residues 574-576, which form the second 3₁₀ helix (h1) (Fig. 5A). In a strict structural sense, A·B and C·D are therefore heterodimers, with residues 581-688 forming the invariant body of the protein.
When chain A is aligned with chains B-D over this range, the root mean square deviations in Cα atomic positions are 3.8, 0.5, and 1.2 Å, respectively. The large deviation between chains A and B is mainly due to the presence of a 13-residue disordered loop (residues 600-612) in chain B (Fig. 2). This region of chain B is not in contact with other chains of the AXH domain or with any symmetry-related molecules.

FIG. 5. Interactions at the AXH C·D dimeric interface. A, chain D is shown with secondary structures in orange, and the N-terminal tail of 21 residues (positions 562-582) of chain C is in blue. This view is the same as that in Fig. 4B. B, detailed view of residues 562-582 of chain C (blue) and chain D (orange). The dyad axis is shown in yellow. Note that the two chains differ in structure up to residue 579.

⁵ Available at www.mrc-cpe.cam.ac.uk. ⁶ Available at www.ebi.ac.uk/dali.
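The chain-to-chain Cα RMSD values described above come from superposing one chain onto another. A minimal NumPy sketch of the standard Kabsch least-squares superposition follows. It assumes the two coordinate arrays list the same residues in the same order (for example, residues 581-688 of each chain). It is not the specific program the authors used, which the text does not name, and the coordinates in the usage example are random stand-ins.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between matched coordinate sets P and Q (N x 3) after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be N x 3 arrays of matched atoms")
    # Remove translation by centering both sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate every row of P_c and compare with Q_c.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ca_a = rng.normal(scale=10.0, size=(108, 3))  # hypothetical C-alpha coordinates
    theta = np.deg2rad(40.0)
    rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta), np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
    # A rotated, translated, slightly perturbed copy stands in for a second chain.
    ca_b = ca_a @ rot_z.T + np.array([5.0, -2.0, 7.0]) + rng.normal(scale=0.5, size=ca_a.shape)
    print(f"C-alpha RMSD: {kabsch_rmsd(ca_a, ca_b):.2f} A")
```

Restricting the superposition to the invariant body (residues 581-688) keeps the two alternative N-terminal conformations from inflating the comparison.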
The Dimeric Interface. The 20 N-terminal residues are the most important for maintaining the dimeric organization (Fig. 5, A and B). These two extended tails, running parallel to the dyad axis, embrace each other and bind to a mostly nonpolar surface. Both main chain and side chain atoms of the N-terminal tails participate in dimerization (Fig. 5B). Apart from the N-terminal tail, the residues contributing to the dimeric interface are mostly hydrophobic residues from OB-fold elements β3, β5, α2, and β9. The residues most important in forming the dimer are listed in Table V.
Sequence Conservation. Approximately 65% of the residues (82 in total) are conserved among the five AXH domains shown in Fig. 6B. The largest variation occurs in the region spanning the C terminus of helix α1 and the N-terminal half of strand β3. There are three regions of high conservation. The first includes many of the residues that precede the OB-fold, from helix h1 to helix α1. The second extends from the middle of strand β5 to strand β7, including the two β-strands (β6 and β7) that are extensions of the OB-fold. The third extends from the loop after strand β8 to strand β9. Most of the conserved solvent-exposed residues are clustered on one face of the molecule, in an arrangement consistent with a ligand-binding site (Fig. 6A). About half of the residues that are important for dimer formation are conserved (Fig. 6B).
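Counting conserved positions in a multiple alignment can be sketched as below. The text does not say which criterion defines a "conserved" residue; this example assumes the strictest one, identity in every sequence with no gaps, and the toy alignment is hypothetical rather than the AXH alignment. A looser criterion (similar residue classes, or a ConSurf grade threshold) would give a higher count.

```python
from typing import List, Sequence


def fully_conserved_columns(alignment: Sequence[str]) -> List[int]:
    """0-based indices of columns where every sequence has the same residue and no gap."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = {seq[i].upper() for seq in alignment}
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


def percent_conserved(alignment: Sequence[str]) -> float:
    """Percentage of alignment columns (gapped columns included) that are fully conserved."""
    return 100.0 * len(fully_conserved_columns(alignment)) / len(alignment[0])


if __name__ == "__main__":
    toy_alignment = ["ACDEF", "ACDEY", "AC-EF"]  # hypothetical sequences
    print(fully_conserved_columns(toy_alignment), percent_conserved(toy_alignment))
```

The percentage depends on the chosen denominator; here it is the full alignment length, which counts columns that contain gaps.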

Structure and Function of the AXH Domain of Ataxin-1. It has long been established that ataxin-1 dimerizes in vivo (35) and that dimerization is an important requirement for its normal function. The structure of the AXH domain and the analytical ultracentrifugation data provide direct confirmation that the protein dimerizes. Truncation studies identified residues 495-605 as the minimal self-association region of ataxin-1 (Fig. 1) (12). The boundaries of the self-association region do not coincide with those of the AXH domain: the two regions share ∼40 residues, including the 20 residues that determine dimerization (Fig. 1). That this region self-associates indicates that the interactions between the two N-terminal tails are strong enough to mediate dimerization in vitro, even without the rest of the AXH domain. A recent study revealed that the first five residues (positions 563-567, SPAAA) can be removed without affecting dimerization, whereas a further deletion of residues 568-573 (PPTLPP) leads to low yield and non-native structure (36).
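The overlap between the minimal self-association region (residues 495-605) and the AXH construct (residues 563-694) is simple interval arithmetic, sketched here with inclusive residue ranges. The helper names are illustrative, and the dimerization tail is taken as residues 563-582, the range the text later gives for the crucial dimerization residues.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # inclusive (first, last) residue numbers


def overlap(a: Range, b: Range) -> Optional[Range]:
    """Inclusive residue range shared by a and b, or None if they do not meet."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def length(r: Range) -> int:
    """Number of residues in an inclusive range."""
    return r[1] - r[0] + 1


def contains(outer: Range, inner: Range) -> bool:
    """True if every residue of inner lies within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


if __name__ == "__main__":
    self_association = (495, 605)
    axh_construct = (563, 694)
    dimerization_tail = (563, 582)

    shared = overlap(self_association, axh_construct)
    print(shared, length(shared))
    print(contains(shared, dimerization_tail))
```

The shared range is residues 563-605, or 43 residues, matching the ∼40 quoted above, and the 20-residue tail lies entirely within it.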
It is conceivable that these dimerization residues could also mediate heterodimeric interactions with other proteins. When the 20-residue peptide sequence was queried in BLASTP for similarity, the only significant hits were the equivalent regions in the mouse and rat ataxin-1 homologs (data not shown). The search did not even pick up the corresponding region of HBP1, presumably because the overall homology of this 20-residue motif is weak (Fig. 6B). Moreover, the best conserved residues in this 20-residue motif (Lys577, Gly578, and Ser579) contribute little to dimerization (Fig. 6B). It is therefore unlikely that the AXH domains of ataxin-1 and HBP1 interact via this 20-residue motif.
The 20 N-terminal residues constitute an interesting motif that can adopt one of two structures, and these two conformations interact complementarily to form an extensive dimeric interface. Despite the low sequence complexity of the motif, both alternative conformations are well ordered (Fig. 5B). This structural adaptability therefore appears essential for maintaining the AXH dimer. It has been known for some time that some protein or peptide sequences can adopt different conformations depending on the protein context. The most dramatic demonstration is a designed protein that harbors two 11-residue motifs, called the "chameleon sequence," that can form different secondary structures in different parts of the protein (37). There are also many cases of disordered loops in proteins becoming well ordered upon ligand binding. When a dimeric protein binds a single DNA or RNA substrate, parts of the monomers can adopt different structures to allow asymmetric interactions. This is found in the MATα2/MCM1·DNA complex (38), the MutS·DNA complex (39), and the NSP3·mRNA complex (40). The 20-residue chameleon sequence in the AXH domain has several distinctive features compared with these structures. First, the alternative structures are not induced by ligand binding; rather, the two conformations can be viewed as mutually inducing or adapting. Second, the interactions contributed by this motif are extensive and involve 20 residues, whereas other reported chameleon sequences usually involve fewer than 11 residues. To the best of our knowledge, this observation of a dimeric interface formed by two mutually adapting yet different 20-residue chameleon motifs in the AXH domain is novel.

FIG. 7. Structure of the LPCSK sequence of the ataxin-1 AXH domain. By sequence homology, these residues correspond to the residues of the IXCXE RB-interacting motif of the HBP1 protein.

Xenopus laevis (x) HBP1 (Swiss-Prot accession number Q8JH80). Residue numbering is that of ataxin-1 with a 30-glutamine repeat. The black and gray triangles mark the positions where a 12-residue insertion and a single amino acid insertion, respectively, are present in the two HBP1 sequences. The secondary structures corresponding to chain D are shown on top. The IXCXE RB-interacting motif of HBP1 is highlighted in pink. The residues involved in dimer formation are identified by semicircles on top. Blue left-sided and red right-sided semicircles mark the residues of chains C and D, respectively. The number of semicircles is proportional to the change in ASA of that residue upon dimerization, with each semicircle representing 20 Å² and a maximum of four semicircles. Purple double-headed arrows identify conserved charged surface residues; cyan double-headed arrows identify putative RNA-binding residues (see A). Relative sequence conservation was analyzed with the ConSurf server (see "Experimental Procedures"). The colors dark green, light green, gray, light orange, and dark orange represent amino acids that are highly conserved (ConSurf color 9), conserved (ConSurf colors 7 and 8), average (ConSurf colors 4-6), variable (ConSurf colors 2 and 3), and highly variable (ConSurf color 1), respectively. The color bar shows how the color numbers assigned by the ConSurf server are represented in this figure.
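The semicircle glyphs in the alignment legend encode per-residue ΔASA in 20 Å² steps, capped at four. The sketch below assumes that each started 20 Å² step earns one semicircle (ceiling rounding); the legend does not state the rounding rule, so this is one plausible reading, and the residue labels and ΔASA values in the usage example are hypothetical.

```python
import math


def semicircles(delta_asa: float, area_per_glyph: float = 20.0, max_glyphs: int = 4) -> int:
    """Glyph count for one residue: one per started 20 A^2 of buried area, capped at four.

    The ceiling rounding is an assumption; the figure legend only says the count is proportional.
    """
    if delta_asa <= 0:
        return 0
    return min(max_glyphs, math.ceil(delta_asa / area_per_glyph))


if __name__ == "__main__":
    hypothetical_delta = {"res1": 8.0, "res2": 37.5, "res3": 61.0, "res4": 140.0}
    for residue, value in hypothetical_delta.items():
        print(f"{residue}: {value:6.1f} A^2 -> {semicircles(value)} semicircle(s)")
```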
Ataxin-1 binds RNA (but not DNA) via the region defined by residues 541-767, which includes the whole AXH domain (Fig. 1) (11). Importantly, the binding was found to be affected by the length of the glutamine repeat. Sequence analysis failed to reveal any homology between this region and other known RNA-binding motifs (11). A more recent study demonstrated that, within this region, it is the AXH domain (residues 568-689) that binds the RNA homopolymers poly(rG) and poly(rU) (36). Here, we have shown that the AXH domain has an OB-fold. Many proteins with this topology are known to bind nucleic acids, and RNA in particular. Ribosomal proteins S1 and L2, the Rho transcription termination factor, and translation initiation factor eIF1A all harbor RNA-binding OB-fold domains (41-44). OB-fold proteins share a general ligand-binding surface (32,33). When projected onto the AXH domain, the corresponding OB-fold general binding site comprises strands β3, β4, and β5 and the loops between them. This site is not obstructed by dimerization and contains several positively charged and aromatic residues of the kind typically involved in protein-RNA interactions (Fig. 6A). These putative RNA-binding residues are not conserved in other AXH domains (Fig. 6B). It is not known whether RNA binding is a general property of AXH domains.
A cluster of charged residues that are highly conserved among AXH domains (Fig. 6, A and B) could constitute a ligand-binding site. If this were the RNA-binding site, it would represent a novel form of nucleic acid recognition for an OB-fold.
The C-terminal region of ataxin-1, consisting of residues 539-816 and including the whole AXH domain, is responsible for interaction with a ubiquitin-specific protease, USP7 (Fig. 1) (13). This interaction is, again, influenced by the length of the polyglutamine tract of ataxin-1. USP7 interaction has also been found to be disrupted when the crucial dimerization residues of the AXH domain (positions 563-582) are removed. This suggests either that these residues are involved in binding USP7 or that the AXH domain interacts with USP7 as a dimer.
Implications for the Function of HBP1. HBP1 is a sequence-specific DNA-binding protein that can act as both a transcriptional activator and a repressor. HBP1 is believed to function by remodeling chromatin. The three-dimensional structure of HBP1 is unknown, except for the small three-helix DNA-binding HMG box near its C terminus. Based on the structure reported here, HBP1 is likely to contain an OB-fold module as well. Recent work has shown that the AXH domain of HBP1 is monomeric, indicating that dimerization is not a general function of AXH domains (36). The AXH domain of HBP1 overlaps with a known transcription repression domain. The putative second ligand-binding site is strongly conserved in the AXH domains of HBP1 and ataxin-1 (Fig. 6B), suggesting that they may bind similar targets. It is possible that the AXH domain of HBP1 also binds RNA.
HBP1 is a target of members of the retinoblastoma (RB) protein family (9). The AXH domain of HBP1 contains the sequence IXCXE (Fig. 6B), which is similar to the consensus RB protein-binding sequence LXCXE. The structures of several proteins in complex with the RB protein have been determined (45,46). In all of these, the LXCXE motifs adopt an extended conformation and are fully exposed. The region of the ataxin-1 AXH domain equivalent to the HBP1 IXCXE motif is highly conserved (Fig. 6B), suggesting that this region has a similar structure in the two proteins. On this basis, the IXCXE motif of HBP1 is inferred to be buried and inaccessible to the RB protein (Fig. 7). It is more likely that RB protein binding occurs at a second site containing the LXCXE motif in the N-terminal portion of HBP1.
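Scanning a sequence for LXCXE-type motifs, including the IXCXE variant discussed above, reduces to a short regular expression. The sketch uses Python's re module with a lookahead so that overlapping hits are reported. The test sequences are hypothetical except LPCSK, which is the ataxin-1 segment shown in Fig. 7. Positions are 0-based string indices, not ataxin-1 residue numbers, and a sequence match says nothing about whether the motif is exposed in the folded protein, which is the point made above.

```python
import re
from typing import List, Tuple


def find_rb_motifs(sequence: str, first_residues: str = "LI") -> List[Tuple[int, str]]:
    """Return (0-based index, motif) for each [first]XCXE match, overlapping hits included."""
    pattern = re.compile(rf"(?=([{first_residues}].C.E))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_rb_motifs("MKILSCTEQ"))  # hypothetical sequence with an LXCXE match
    print(find_rb_motifs("AIQCAEG"))    # hypothetical sequence with an IXCXE match
    print(find_rb_motifs("LPCSK"))      # ataxin-1 segment from Fig. 7: no match
```

Passing `first_residues="L"` restricts the search to the strict LXCXE consensus.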
Conclusion. The structure of the AXH domain provides the first three-dimensional information on ataxin-1, a polyglutamine expansion disease protein, and allows functional data on the protein to be placed in a structural context. The structure offers supporting evidence for an RNA-binding function for ataxin-1. Ataxin-2, another member of the polyglutamine expansion disease family, also has a region of sequence similarity to proteins involved in RNA splicing (47) and associates with an RNA-binding protein (48). This suggests that disruption of protein-RNA interactions might be a common feature of these diseases. There is considerable evidence that ataxin-1 is involved in the control of gene expression. Noncoding RNA molecules have been shown to play a role in transcription regulation (49), and it is possible that the AXH domain of ataxin-1 binds a regulatory RNA. Given the role of HBP1 in transcription regulation, the AXH domain of ataxin-1 may have a similar role. It will be important to identify the physiological ligand(s) of these domains, as this will help to define the roles of both ataxin-1 and HBP1 in the regulation of gene expression.
What does this structure tell us about the likely effects of polyglutamine expansion in SCA1 on ataxin-1 function and stability? The ataxin-1 AXH domain is remote from the glutamine repeat in sequence (Fig. 1). The length of the polyglutamine tract does not affect dimerization (12), presumably because the AXH dimer is highly stable. The dimer has extensive complementary interfaces and is consistent with noncooperative thermal unfolding behavior (36). Interactions of ataxin-1 with RNA and USP7 are weakened as the glutamine repeat expands (11,13). We speculate that the polyglutamine tract is adjacent in space to the AXH dimerization domain. An expanded repeat may form aggregated local structures (e.g. β-sheet) that interfere with protein-protein or protein-RNA interactions, for example by blocking the respective binding sites, thereby affecting the normal cellular functions of ataxin-1.