Structure of the Chromo Barrel Domain from the MOF Acetyltransferase

[...] histone H4 [...] the structure of this [...] a β-barrel that is [...] the α + β chromo domain. Despite the differences, there are similarities that support an evolutionary relationship between the two domains, and we propose the name “chromo barrel.” The chromo barrel domains may be divided into two groups, MSL3-like and MOF-like, on the basis of whether a group of conserved aromatic residues is present or not. The structure suggests that, although the MOF-like domains may have a role in RNA binding, the MSL3-like domains could instead bind methylated residues. The MOF chromo barrel shares a common fold with other chromatin-associated modules, including the MBT-like repeat, Tudor, and PWWP domains.

The chromo domain was originally defined as a region of homology between heterochromatin protein 1 (HP1) and Polycomb (Pc) (1) and is found in many other chromatin-associated proteins (2,3). Later, it was recognized that the HP1 family of proteins also contains a second, highly related chromo shadow domain (2-4), which, unlike the chromo domain, proves to be dimeric (5,6). Structure determination of the chromo domain suggested that it might act as a protein interaction module (7), and recently, the canonical HP1 and Pc chromo domains have both been shown to recognize and specifically bind methyllysine-modified peptides (8-12). The chromo shadow domain shares a common fold with the chromo domain, and it too has been found to be a protein interaction module (for a review, see Ref. 13). Interestingly, it binds peptides by sandwiching them between two β-strands, a feature it shares with the chromo domain, but it does so in a completely different way. When the chromo domains bind methylated peptides, the peptides become sandwiched between two extended strands in the monomer (14,15). The chromo shadow domain, on the other hand, sandwiches the peptide between extended strands from the C-terminal tails of the two different subunits in the dimer (16) (for a recent review, see Ref. 17).
The structure of the chromo domain-methylated histone H3 complex showed that three aromatic residues are essential for recognition of either mono-, di-, or trimethylamino groups, and these residues are conserved in most chromo domains, implying that they too may be involved in methylated peptide binding (14,15). By contrast, other chromo domains do not contain these conserved aromatic residues, and they are likely to have a different function (14,15). For example, the Drosophila Mi-2 chromo domain has been shown to bind DNA (18). However, structure-guided sequence analysis indicates that a significant fraction of the proteins that are widely believed to contain chromo domains, because they contain a short "chromo box" sequence motif (2), do not conform to this canonical fold (19,20). Examples of such proteins include members of the Drosophila dosage compensation complex: MSL3 and MOF. These proteins contain a putative chromo domain N-terminal to the catalytic domain, which has been shown to be involved in interactions with RNA (21).
The dosage compensation mechanism in Drosophila is different from that in mammals. Genes in both female X chromosomes are actively expressed, but the single male X chromosome is transcribed at a 2-fold higher level to give an equal level of transcription in both sexes. This results from the differential expression of the Drosophila dosage compensation complex (see Scheme 1), which is known to include the MSL1, MSL2, MSL3, MLE, and MOF proteins, as well as two nontranslated RNAs, roX1 and roX2 (22). MOF is a histone acetyltransferase that acetylates histone H4 at Lys-16, and this modification is specific for the dosage compensation mechanism (23-25). In Schneider cells, association of both MOF and MSL3 with the X chromosome is sensitive to RNase treatment (21,26), and point mutations in the MOF chromo-related domain severely affect the interaction of MOF with RNA (21). Here we present the solution structure of the putative chromo domain from MOF, show that this new type of domain (chromo barrel) does not by itself bind RNA, and explore its structural and functional relationships with other proteins.

Expression and Purification of the MOF Chromo Barrel Domain-The DNA sequence corresponding to amino acids 367-454 of Drosophila MOF was inserted into the NdeI and BamHI sites of the pET15b vector (Novagen). The vector was transformed into the expression strain Escherichia coli TUNER (DE3) pLacI (Novagen), and the protein was purified using standard methods (see supplemental information).
NMR Spectroscopy and Structure Calculations-All NMR experiments were recorded at 25°C on Bruker DRX 600 and 800 MHz spectrometers equipped with 5-mm triple resonance H/C/N z-gradient probes. Samples used for structural work were all at pH 5 and 1 mM concentration and contained 10% D2O. Structure calculations were performed using the Crystallography & NMR System (CNS), version 1.0 (27), and ARIA, version 1.1.2 (28). Further details are provided in the supplemental information.

RESULTS AND DISCUSSION
Definition of the Structured Domain-The fragment of MOF used in the structural studies (Fig. 1a) was chosen based on an alignment of canonical chromo domain amino acid sequences together with a sequence alignment and structure prediction of the new family of chromo barrel domains. The chromo barrel domain contains the conserved chromo box motif (Fig. 1b), but in addition, it is possible to recognize a pattern of aliphatic residues that define a strand equivalent to β1 in the chromo domain. N- and C-terminal to this, there are other clusters of aliphatic residues that we thought might also be part of the structure. This sequence, residues 367-440, was then extended at the C terminus until it included a good number of closely spaced charged residues (which were unlikely to be structured), resulting in a final construct composed of MOF residues 367-454.
This construct was used to produce protein for NMR experiments. The domain was found to be monomeric by analytical ultracentrifugation (data not shown). Due to its high level of stability, it was possible to obtain a large number of structural restraints, and the resulting ensemble of structures has a high degree of precision (TABLE ONE). Fig. 2a shows an overlay of the 30 lowest energy structures aligned over the secondary structural elements; the root mean square deviation over the backbone atoms of the well defined part of the construct (residues 373-443) was 0.7 Å. The backbone dynamics of the structure were investigated by measuring the ¹⁵N T1 and T2 relaxation rates and the ¹HN-¹⁵N heteronuclear NOE. These experiments defined the structured region as residues 374-437, with residues N-terminal to 370 and C-terminal to 441 being highly unstructured (data not shown). A model-free analysis of the T1 and T2 relaxation rates suggested that the domain had an overall tumbling time of 7.2 ns, and the S² values also suggested that the well defined region comprised residues 374-438 (S² > 0.8 for data recorded on a 600-MHz spectrometer). However, a careful analysis of the NOE data revealed unambiguous NOEs connecting residues 439-443 to the core of the structure. It could be concluded, therefore, that these residues were probably only in this conformation transiently. Likewise, at the N terminus, there were NOEs connecting residues 370, 371, and 373 to the core of the structure, and these residues also probably only interacted transiently. In conclusion, the structured domain comprised residues 370-443, but the N and C termini only interacted transiently with the core of the protein.
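The ensemble precision quoted above can be illustrated with a short sketch. It assumes that the models have already been superimposed (as in the overlay of Fig. 2a) and that precision is expressed as the mean backbone RMSD of each model to the ensemble mean structure. The text does not state which RMSD convention was used, so that choice and all function names here are assumptions made purely for illustration.

```python
import math


def mean_structure(models):
    """Average the coordinates atom by atom across pre-superimposed models.

    `models` is a list of models; each model is a list of (x, y, z) tuples
    for the same backbone atoms in the same order (an assumed input format).
    """
    n_models = len(models)
    n_atoms = len(models[0])
    return [
        tuple(sum(model[i][k] for model in models) / n_models for k in range(3))
        for i in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(
        (a[k] - b[k]) ** 2
        for a, b in zip(coords_a, coords_b)
        for k in range(3)
    )
    return math.sqrt(squared / len(coords_a))


def ensemble_rmsd(models):
    """Mean RMSD of every model to the ensemble mean structure."""
    mean = mean_structure(models)
    return sum(rmsd(model, mean) for model in models) / len(models)
```

In practice the coordinates would be restricted to the backbone atoms of the well defined residues before the calculation, and the superposition itself (not shown) would be carried out by the structure calculation or analysis software.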
The Fold of the Chromo Barrel Domains-The structure of the MOF chromo barrel domain is virtually all β. It consists of five β-strands (β1, residues 383-386; β2, residues 392-401; β3, residues 411-416; β4, residues 425-428; and β5, residues 432-433) organized into two sheets of three strands with one strand, β2, spanning the two sides of the domain. Strands β5, β1, and β2 form one side of the barrel and strands β2, β3, and β4, the other. A bulge in strand β2 (residues 398-399) helps provide the required geometry (Figs. 1b and 2). The β-barrel comprises amino acids 383-433, and the alignment clearly shows that the β-strands are conserved among the chromo barrel domains (Fig. 1b). However, as discussed above, the NOE data suggest that residues 370-443 are structured. The residues N-terminal to the β-barrel form a turn-like structure, with Ile-373 involved in forming a hydrophobic lid capping the β-barrel (Fig. 2c). This turn only forms transiently in the MOF domain, but analysis of the sequence alignment of the chromo barrel domains shows that a pair of large hydrophobic residues and one with a small side chain are conserved in the N terminus (see residues in green in Fig. 1b). This hydrophobic pair is typically separated by only two other residues, compared with the six in MOF, predicting a more stable hairpin turn in the chromo barrel domains from other proteins (e.g. MSL3). The residues C-terminal to strand β5 form one turn of helix in the MOF domain but are not conserved in the chromo barrel family.
Comparison of the Chromo and Chromo Barrel Domains-It is clear that the fold of the MOF chromo barrel domain is different from that of the chromo domain. The C-terminal helix of the chromo domain is absent in the chromo barrel, whereas two of the five β-strands are not present in the chromo domain (see Figs. 1b and 3). The residues conserved between the two types of domains, the originally defined chromo box motif, correspond to the β-hairpin formed by strands β2 and β3 in the chromo domain and strands β3 and β4 in the chromo barrel. The structural relationship between the chromo and chromo barrel domains is supported by a further feature: Arg-401 in β2 and Trp-426 in β4 stack against each other on the surface, contributing to domain stability. These residues are conserved in a number of chromo and chromo barrel domains, supporting an evolutionary relationship between the two domains. For example, in the HP1, Drosophila Pc, and mouse M33 chromo domains, the structures reveal exactly the same interaction. The tryptophan is conserved in most of the domains, but the arginine is often absent (Fig. 1b). In most cases where this arginine is absent, the amino acid corresponding to Ser-400 is a large hydrophobic residue, and this may instead pack against the Trp. It is also possible that interactions with the conserved Tyr/Phe at position 412 compensate for the loss of the Arg to Trp stacking.
The three histidine residues in the domain are all involved in stabilizing the structure. The chemical shifts of the Nε2 proton in the imidazole side chains show that, at pH 5, His-393 and His-430 are still partly deprotonated. The chemical shift changes of ¹HN-¹⁵N pairs in heteronuclear single quantum correlation spectra recorded at pH values of 5, 6, 7, and 8 also allowed an estimation of the pKa values of the histidines: His-393, pKa ~6; His-430, pKa ~6.5; and His-415, pKa ~7.5. His-415 and Asp-424 form a salt bridge that fixes the ends of the loop between β3 and β4. This salt bridge is likely to be conserved among most of the chromo barrel and some of the chromo domains (Fig. 1b).
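One common way to turn pH-dependent chemical shifts into a pKa estimate is to fit a two-state Henderson-Hasselbalch titration curve, in which the observed shift is the population-weighted average of the shifts of the protonated and deprotonated forms. The sketch below illustrates that approach under the assumption of fast exchange and a single titrating group; the text does not describe the fitting procedure actually used, and the function names and the numbers in the usage example are hypothetical, not measured data.

```python
import math


def protonated_fraction(ph, pka):
    """Fraction of the protonated form at a given pH (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def fit_pka(ph_values, shifts, lo=4.0, hi=9.0, step=0.01):
    """Grid search over pKa; limiting shifts are solved by linear least squares.

    Model: shift = d_prot * f + d_deprot * (1 - f), with f the protonated
    fraction. Returns (pka, d_prot, d_deprot) for the lowest squared error.
    """
    best = None
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        pka = lo + i * step
        f = [protonated_fraction(ph, pka) for ph in ph_values]
        g = [1.0 - x for x in f]
        # Normal equations of the two-parameter linear problem.
        sff = sum(x * x for x in f)
        sgg = sum(x * x for x in g)
        sfg = sum(x * y for x, y in zip(f, g))
        sfy = sum(x * s for x, s in zip(f, shifts))
        sgy = sum(x * s for x, s in zip(g, shifts))
        det = sff * sgg - sfg * sfg
        if abs(det) < 1e-12:
            continue  # limiting shifts cannot be separated at this pKa
        d_prot = (sfy * sgg - sfg * sgy) / det
        d_deprot = (sff * sgy - sfg * sfy) / det
        sse = sum(
            (d_prot * x + d_deprot * y - s) ** 2
            for x, y, s in zip(f, g, shifts)
        )
        if best is None or sse < best[0]:
            best = (sse, pka, d_prot, d_deprot)
    if best is None:
        raise ValueError("no valid fit found")
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Synthetic, noise-free example values (hypothetical, for illustration).
    ph_points = [5.0, 6.0, 7.0, 8.0]
    synthetic = [8.6 * protonated_fraction(p, 6.5)
                 + 7.7 * (1.0 - protonated_fraction(p, 6.5)) for p in ph_points]
    print(fit_pka(ph_points, synthetic))
```

With only four pH points and three fitted parameters, a fit of this kind is poorly constrained for real, noisy data, which is consistent with the pKa values above being quoted as approximate.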
Strikingly, however, when the chromo barrel domain is compared with the HP1 chromo domain in complex with the Lys-9-methylated N-terminal tail of histone H3 (H3-K9), there is a much higher degree of structural identity. In the MOF chromo barrel domain, the strand corresponding to the histone peptide in the chromo domain complex is provided by the N terminus of the protein, similar to the situation with the archaeal proteins Sac7d/Sso7 (29,30) (see Fig. 4). This interaction prevents "chromo domain-like" peptide interactions. When the structures are aligned over the five β-strands, the root mean square deviation over the backbone atoms is only 1.3 Å. In effect, therefore, the chromo barrel structure is an "auto-inhibited" chromo domain, where methylated peptide binding is prevented by interactions of the protein with its own N-terminal tail.
Function of Chromo Barrel Domains-The structure of the chromo domain-histone H3-K9 complex (14,15) shows that binding of the dimethylamino group of the lysine requires a cluster of aromatic residues on the surface of the protein. However, these three critical aromatic residues in the chromo domain are not conserved in the MOF chromo barrel domain; MOF has His-393, Tyr-416, and Leu-419, whereas HP1 has Tyr-21, Trp-42, and Phe-45 (see Figs. 1b and 3b). In addition, superposition of the two structures shows that the MOF residue Arg-387 would clash with the methylated Lys-9 side chain in the chromo domain-histone H3 complex (Fig. 3b). Consequently, binding of methylated residues is prevented in the MOF chromo barrel domain.
FIGURE 1. a, construct of the MOF chromo barrel domain (CBD) used in this work. HAT, histone acetyltransferase zinc finger domain. b, structure-based sequence alignment and secondary structure of protein families related to the chromo domain. In each case, the secondary structure is indicated by green arrows for β-strands and blue cylinders for helices. i, proteins homologous to the MOF chromo barrel domain. These are divided into the MOF and MSL3 families based on the absence or presence of the aromatic pocket residues, respectively. ii, chromo domains with known structures. The thin red arrows indicate short β-strands, which only form in the HP1 chromo domain-histone H3 peptide complex. iii, chromo shadow domains with known structures. In each case (i-iii), the yellow shading indicates conserved residues, which are important for stabilizing the β-sheet structure. The red shading indicates the aromatic pocket residues that bind the dimethylamino group in the HP1 chromo domain-histone H3 peptide complex, and cyan shading indicates the equivalent positions in the MOF and MSL3 families. (The position of a fourth aromatic residue, conserved in the MSL3 chromo barrels and the Pc chromo domains, is shaded in a lighter shade of cyan.) The purple shading indicates residues in the MOF chromo barrels that are structurally equivalent to H3-K9 in the HP1 chromo domain-histone H3 complex, whereas green shading indicates residues that might form a potential conserved turn in the MOF and MSL3 families. The black line shows the chromo box sequence motif originally used in sequence data base searches to identify chromo domains based on primary structure. In both a and b, the mutations in the chromo barrel domain that affect the interaction of MOF with RNA (21) are shown by arrows.
The residues corresponding to the methyllysine-binding aromatic box are, however, present in a subset of the chromo barrel domains (Fig. 1b). On this basis, the chromo barrel domains can be classified into MOF and MSL3 subfamilies. The MSL3 proteins have Tyr, Phe/Tyr, and Trp conserved in the positions corresponding to 21, 42, and 45 in the HP1 chromo domain. Moreover, in the MSL3 subfamily domains, Arg-387 in the MOF domain is replaced by a conserved histidine (Fig. 1b), and the conserved residues of the well defined loop between strands β1 and β2 suggest a different structure in this region. It is therefore possible that the exposed aromatic box provides a binding site for a methylated residue in the MSL3 (but not the MOF) group of chromo barrel domains.
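The classification criterion can be written down as a minimal sketch. It assumes that the three residues aligned with HP1 positions 21, 42, and 45 have already been read off a structure-based alignment such as Fig. 1b, and that only Phe, Trp, and Tyr count as pocket-forming aromatics; histidine is deliberately excluded, in line with His-393 of MOF not being treated as part of an aromatic box. The function and constant names are hypothetical.

```python
# One-letter codes treated as aromatic for this sketch (assumption: His excluded).
AROMATIC = frozenset({"F", "W", "Y"})

# Pocket residues quoted in the text, in the order of HP1 positions 21, 42, 45.
HP1_POCKET = ("Y", "W", "F")  # Tyr-21, Trp-42, Phe-45
MOF_POCKET = ("H", "Y", "L")  # His-393, Tyr-416, Leu-419


def has_aromatic_box(pocket_residues):
    """True if all three aligned pocket positions carry Phe, Trp, or Tyr."""
    return len(pocket_residues) == 3 and all(
        residue in AROMATIC for residue in pocket_residues
    )


def classify_chromo_barrel(pocket_residues):
    """Assign a chromo barrel domain to the MSL3-like or MOF-like subfamily."""
    return "MSL3-like" if has_aromatic_box(pocket_residues) else "MOF-like"
```

For example, `classify_chromo_barrel(MOF_POCKET)` gives `"MOF-like"`, whereas a domain with Tyr, Phe, and Trp at the three positions, as described for the MSL3 proteins, gives `"MSL3-like"`. The sketch captures only the sequence criterion; whether the pocket is actually accessible depends on the structure, as the Arg-387 clash in MOF shows.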
Previous studies of MOF have shown that the interaction of MOF and MSL3 is sensitive to RNase treatment. Furthermore, it has been shown that point mutations in the conserved Tyr-416/Trp-426 abolish RNA binding (21). Using RNA electrophoretic mobility shift assays we found, however, that the MOF chromo barrel domain is not sufficient for binding either a randomly chosen fragment of roX1 RNA or other nonspecific RNA sequences (data not shown). Moreover, even very weak nonspecific binding could not be detected at much higher concentrations by NMR (data not shown). These results suggest, therefore, that the MOF chromo barrel domain is necessary (but not sufficient) for the interaction of MOF with RNA.
The structure suggests that mutation of Tyr-416/Trp-426 is likely to affect the folding of the domain. Interestingly, these residues are adjacent to a conserved positively charged patch, the NRRL sequence in the loop connecting strands β3 and β4, which may be involved in the interaction with RNA. However, other MOF sequences are clearly also required. The involvement of residues outside the chromo barrel domain in RNA binding is also supported by the fact that acetylation of Lys-116 in MSL3 (a residue that is outside the chromo barrel domain) interferes with binding to roX2 RNA both in vitro and in coimmunoprecipitation experiments from cells (26). In future work, we will need to identify the precise binding site on roX1 or roX2 RNA and then define which fragment of MOF (or the MSL complex) is sufficient for RNA binding.
TABLE ONE. Experimental restraints and structural statistics
The loop between strands β2 and β3 is less well ordered than the rest of the structure. The cross peaks corresponding to residues 403 and 410 in the ¹H-¹⁵N heteronuclear single quantum correlation spectrum are broadened, indicative of slow time scale motions, and although the loop appears well defined in the ensemble, the Ramachandran plot reveals that several of the residues are constrained to unfavorable conformations, probably due to averaging of measured NMR parameters. It is, however, likely that this loop is not completely flexible, because the values of the cross-correlated cross-relaxation rates (see "Materials and Methods") are not averaged, as is the case for residues in the N- or C-terminal regions.
a [...] (computed over residues 374-438) for the ensemble. b ⟨SA⟩c represents values for the structure that is closest to the mean. c The Lennard-Jones potential was not used at any stage in the refinement.

Evolutionary Relationship of Chromo-related Domains-The structure of the chromo barrel domain provides a probable evolutionary link between the chromo and chromo shadow domains, on the one hand, and the MBT repeats, the Tudor, and the PWWP domains, on the other. At least three different members of this new superfamily have similar functions. The HP1 and Pc chromo domains, the tandem Tudor domains of the 53BP1 protein (31), and the lethal (3) MBT protein (W. Fischle, personal communication) have all been shown to recognize a specific modification in histones, methylated lysine. The different domains recognize this modification in various peptide contexts, and they therefore need not (and likely do not) share a common peptide-binding site. They do, however, share a putative functional site with each other and with other superfamily members.
Despite the fact that they have now diverged, the chromo and chromo barrel domains, as well as the MBT repeat and Tudor domains, all contain a similar cluster of aromatic residues that, in the HP1 and Pc chromo domains, recognize methylated lysine (see Figs. 1b and 4). In each of these families, these aromatic residues are not always conserved, suggesting that they are not essential for structure or stability. However, in each family, many members retain aromatic residues in all three of the crucial methyllysine-binding positions found in the HP1 and Pc chromo domains. In addition, two of the three aromatic sites are conserved in some PWWP domains, and these have an alternative site for a third conserved aromatic residue. Furthermore, a fourth aromatic position, which has recently been shown to be important for methyllysine binding in yeast CHD1 (32), is also conserved in the MSL3-like chromo barrels, the Pc chromo domains, and some of the MBT repeats.
Strands β2 and β3 in the chromo domain or β3 and β4 in the chromo barrel domain, together with the connecting loop that carries two of the three key aromatic residues, correspond to the originally defined chromo box motif (Fig. 1b). The structure of this chromo box motif has been retained in evolution, because it contains a number of conserved interactions, in particular the conserved stacking interaction involving Trp-426 (Fig. 1b). In the MBT repeat and Tudor domain, the loop has a somewhat different conformation, but the aromatic residues are in similar positions (Fig. 4). This suggests that this aromatic residue pocket was present in a common ancestor but is not now required in all of these proteins. The hypothesis that members of all branches of this superfamily of proteins might be involved in binding methylated residues is supported by a recent study of the 53BP1 protein (31). There the conserved aromatic clusters in the tandem Tudor domains in 53BP1 were found to have a role in the recognition of methylated Lys-79 of histone H3 (H3-K79). Although H3-K79 probably binds in the cleft between the two Tudor domains, i.e. not as H3-K9 does to the chromo domain, its methyl group(s) likely bind the aromatic site in the N-terminal domain. An interaction of this type has also recently been identified in the lethal (3) MBT protein. In addition to the aromatic site, in the chromo barrel, MBT repeat, and Tudor domains, the C terminus forms an extended interaction down one side of the barrel, whereas the N terminus forms a stabilizing hook that caps the β-barrel (Fig. 4). It is these common structural features, together with the conserved aromatic site, that suggest that these proteins are evolutionarily related to each other and that set them apart from other similar β-barrel folds, e.g. the SH3 domains.
As well as revealing the fold of a distinct family of chromo barrel domain proteins, the structure also suggests a probable evolutionary pathway between the chromo barrel and canonical chromo domains involving a dramatic perturbation of structure, the extension of the C-terminal α-helix and loss of the N-terminal β-strand, resulting in the formation of the chromo domain peptide-binding site. By contrast, the dimeric chromo shadow domain, with its dimerization interface harboring a new peptide-binding site, has probably evolved from the canonical chromo domain by a more conventional pathway of point mutation, insertion, and deletion. This has resulted in a change in conformation of the loop carrying the two critical aromatic residues, as well as an insertion of two residues in the loop connecting the β-sheet to the α-helix, which disrupts the binding site for methylated peptides in chromo shadow domains. The similarity of the chromo barrel and chromo domains to the common fold of the MBT repeat, Tudor, and PWWP domains supports the hypothesis that all of these "Royal Family" domains share a common ancestry (33-35).
[...] Fig. 2 with the remainder of the backbone in gray, apart from the loop connecting strands β2 and β3, which is yellow. In the HP1 chromo domain-histone H3 complex, the H3 peptide is black, and the methyl-K9 is purple. In the HP1β chromo shadow domain-CAF-1 p150 complex, one of the monomers is shaded in softer tones, and the CAF-1 peptide is pink. Sso7 is also shown in its complex with DNA. Strands that are equivalent to those in the MOF chromo barrel are shown using the same color scheme, and the positions of the conserved aromatic residues are indicated by spheres. The Protein Data Bank accession numbers for the structures used in the Fig. are [...]