The Methyltransferase NSD3 Has Chromatin-binding Motifs, PHD5-C5HCH, That Are Distinct from Other NSD (Nuclear Receptor SET Domain) Family Members in Their Histone H3 Recognition*

Background: NSD proteins are histone H3K36 methyltransferases linked to multiple human diseases. Results: Crystal structures of chromatin-binding motifs, PHD5-C5HCH, of NSD3 bound to the H3 N-terminal peptide were solved. Conclusion: PHD5-C5HCHs of the NSD family are conserved in their overall structure yet variant in H3 recognition. Significance: Their variant recognitions may play a role in the localization of NSD proteins to different genomic sites.
The NSD (nuclear receptor SET domain-containing) family members, consisting of NSD1, NSD2 (MMSET/WHSC1), and NSD3 (WHSC1L1), are SET domain-containing methyltransferases, and aberrant expression of each member has been implicated in multiple diseases. They have specific mono- and dimethylase activities for H3K36 but play nonredundant roles during development. Aside from the well characterized catalytic SET domain, NSD proteins have multiple potential chromatin-binding motifs that are clinically relevant, including the fifth plant homeodomain (PHD5) and the adjacent Cys-His-rich domain (C5HCH) located at the C terminus. Herein, we report the crystal structures of the PHD5-C5HCH module of NSD3, in the free state and in complex with H3 1-7 (H3 residues 1-7), H3 1-15 (H3 residues 1-15), and H3 1-15 K9me3 (H3 residues 1-15 with trimethylation on K9) peptides. These structures reveal that the PHD5 and C5HCH domains fold into a novel integrated PHD-PHD-like structural module with the H3 peptide bound only on the surface of PHD5, and they provide the molecular basis for the recognition of unmodified H3K4 and trimethylated H3K9 by NSD3 PHD5. Structural studies and binding assays show that the histone binding specificity of the PHD5 domain differs among the three members of the NSD family. For NSD2, the PHD5-C5HCH:H3 N terminus interaction is largely conserved, although with a stronger preference for unmethylated H3K9 (H3K9me0) than trimethylated H3K9 (H3K9me3), whereas NSD1 PHD5-C5HCH does not bind to H3 peptides. Our results shed light on how NSD proteins that mediate H3K36 methylation are localized to specific genomic sites and provide implications for the mechanism of functional diversity of NSD proteins.

Histone post-translational modifications (PTMs) have an important role in regulating chromatin dynamics and functions. The PTMs act either sequentially or in combination to form "histone codes", which are recognized by protein modules referred to as effectors or "readers" (1). Recognition of histone codes by readers recruits or stabilizes various chromatin-associated proteins or complexes (2). In addition, many reader-containing proteins or complexes also possess "writer" or "eraser" modules that further modify the epigenetic landscape by writing or erasing PTMs.
The NSD family of SET domain-containing histone methyltransferases includes NSD1, NSD2 (also known as WHSC1, Wolf-Hirschhorn syndrome candidate 1, or MMSET, multiple myeloma SET), and NSD3 (also known as WHSC1L1, WHSC1 like 1). All three NSD proteins have been directly linked to multiple human diseases. Haploinsufficiency and point mutations of NSD1 cause human Sotos syndrome, a childhood developmental disease (3,4). NSD2 haploinsufficiency is implicated in the pathogenesis of Wolf-Hirschhorn syndrome, a disease characterized by severe growth retardation and by mental as well as cardiac defects (5). Furthermore, each NSD family member behaves as an oncogene. Translocations resulting in NUP98 fusion to NSD1 and NSD3 lead to the development of acute myeloid leukemia (6,7). The t(4;14)(p16.3;q32) translocation, which results in overexpression of NSD2, is responsible for about 15% of multiple myeloma (8,9). Moreover, overexpression of NSD3 via gene amplification occurs in about 15% of breast cancer cell lines (10).
Despite the physiologic importance of NSD family proteins, their mechanisms of action are only beginning to be elucidated (11). A striking feature of the three NSD proteins is that they are highly similar within a block of about 700 amino acids, which contains a catalytic SET domain with its pre-SET (associated with SET) and post-SET domains, a PWWP (named after the conserved motif Pro-Trp-Trp-Pro in NSD2) domain, five PHD (plant homeodomain) fingers, and an NSD-specific Cys-His-rich domain (C5HCH) (Fig. 1A) (10). However, the similar domain architecture of the three NSD family members does not indicate functional redundancy (10,12). For example, NSD1-defective mice exhibit embryonic lethality, but mice defective in NSD2 die shortly after birth and exhibit a spectrum of defects that are consistent with Wolf-Hirschhorn syndrome (13,14). It has been suspected that functional diversity of the three NSD proteins might originate from different substrate specificities of their SET domains (15). Although there have been several conflicting reports on the catalytic activities of NSD proteins, it has become clear that all three members are highly specific H3K36 mono- and dimethylases when nucleosomes are used as their substrates (16-18). The basis for the functional diversity among the three NSD members therefore remains a mystery.
H3K36 methylation has been implicated in multiple biological processes, and the biological consequences of the modification are largely determined by where in the gene the methylated H3K36 mark is placed (12). Because the NSD proteins also contain several histone readers, one attractive possibility is that these readers target the NSD H3K36 methyltransferases to their specific gene sites and thus lead to distinct biological outcomes. Several reports have suggested that NSD family members are recruited to specific gene loci. NSD1 binds upstream of the bone morphogenetic protein 4 promoter, enforces H3K36 methylation levels within this region, and thus promotes bone morphogenetic protein 4 transcription (19). NSD3, LSD2 (an H3K4-specific demethylase, lysine demethylase 2, also known as KDM1B), and G9a (an H3K9-specific methyltransferase, also known as EHMT2) form complexes in vivo, predominantly localize to the gene bodies of actively transcribed genes, and may coordinate modifications at H3K4, H3K9, and H3K36 during elongation for optimal transcription (20). So far, great efforts have been made to understand the substrate specificities of the enzymes, but little structural and functional information is available about the reader domains.
It has been reported that the PHD5-C5HCH module of NSD1 (PHD5-C5HCH NSD1) was the sole region required for tight binding of the NUP98-NSD1 fusion protein to the HoxA9 gene promoter, implying that PHD5-C5HCH NSD1 might have chromatin targeting ability. In this study, we demonstrate that the PHD5-C5HCH module of NSD3 (PHD5-C5HCH NSD3) recognizes the H3 N-terminal peptide containing unmodified K4 and trimethylated K9. We have solved high resolution crystal structures of PHD5-C5HCH NSD3 in its free state and in complex with unmodified H3 residues 1-7 (H3 1-7), residues 1-15 (H3 1-15), and residues 1-15 with trimethylation on lysine 9 (H3 1-15 K9me3), respectively. These structures reveal an integrated tandem PHD-PHD-like fold with the H3 peptide bound only on the surface of PHD5 and provide the structural basis for peptide recognition by PHD5 of NSD3. Further mutagenesis and peptide binding assays show that PHD5-C5HCH NSD1 does not bind H3, whereas the PHD5-C5HCH module of NSD2 (PHD5-C5HCH NSD2) prefers to bind the H3 N-terminal peptide containing unmodified K4 and K9. These differences are likely due to minor sequence changes at the H3 binding surface. Our findings suggest that PHD5-C5HCHs of the NSD family are conserved in overall structure but vary in H3 recognition. These variant recognitions may play a role in localization of these H3K36 methyltransferases to different genomic sites and are consistent with the distinct and nonredundant functions of NSD proteins.

EXPERIMENTAL PROCEDURES
Cloning, Expression, and Purification-Human PHD5-C5HCH NSD1 (2115-2220), PHD5-C5HCH NSD2 (1232-1331), and PHD5-C5HCH NSD3 (1310-1413) were amplified from the human brain cDNA library by PCR and cloned into modified pET-28a(+) (Novagen) plasmids (for NMR, ITC, and crystallization) or pGEX-4T1 vectors (for GST pulldown and peptide pulldown assays). The modified pET-28a(+) plasmid contains an N-terminal His6 tag and a tobacco etch virus protease cleavage site. All of the mutants were generated by the Mutant Best kit (Takara) and verified by DNA sequencing. The recombinant proteins were produced in Escherichia coli BL21(DE3) cells (Novagen). Cells were cultured in LeMaster and Richards minimal medium (LR medium), supplemented with 100 μM ZnSO4. For stable isotope labeling, LR medium was supplemented with 15NH4Cl and [13C6]glucose as the sole nitrogen and carbon sources, respectively. Protein expression was induced when A600 reached around 0.8 with 0.1 mM isopropyl 1-thio-β-D-galactopyranoside at 16°C for 24 h. The expressed His-tagged proteins were purified using a nickel-chelating column (Qiagen), followed by tobacco etch virus cleavage, size exclusion chromatography on a Hiload 16/60 Superdex 75 column (GE Healthcare), and anion exchange chromatography on a MonoQ HR 5/5 column (GE Healthcare). The GST fusion proteins were purified by glutathione-Sepharose (GE Healthcare) and further purified by size exclusion chromatography on a Hiload 16/60 Superdex 200 column (GE Healthcare). The purified proteins were dialyzed against Buffer A (20 mM Bis-Tris-HCl, pH 7.0, 150 mM NaCl) and concentrated for subsequent analysis.
Peptide Synthesis-All of the peptides used in the study were synthesized by GL Biochem (Shanghai). The lyophilized synthesized peptides (purity >98%, verified by mass spectrometry) were carefully weighed and dissolved in Buffer A, and the pH of each sample was adjusted to 7.0.
Peptide Pulldown Assays-16 μg of purified GST fusion proteins were incubated with biotinylated histone peptides (2 μg each) for 2 h at 10°C in Buffer A, and the fusion proteins were also incubated without peptides as negative controls. Then 50 μl of 50% streptavidin beads (Thermo Scientific) were added and incubated for 1 h. After incubation, the beads were washed 3 times using Buffer A supplemented with 0.1% Triton X-100, and then the bound materials were subjected to SDS-PAGE and Coomassie Blue staining.
Calf Thymus Histone Binding Assays-The binding assays were performed as previously described (21). GST fusion proteins were bound to glutathione-Sepharose beads (GE Healthcare), and the beads were subsequently incubated with calf thymus histones (Worthington) in buffer containing 50 mM Tris-HCl, pH 7.5, 1 M NaCl, 1% Nonidet P-40. After a rigorous wash, the bound materials were subjected to SDS-PAGE and Coomassie Blue staining.
Peptide Arrays-A peptide array comprising 384 peptide spots purchased from Active Motif (catalog number 13001) was utilized; detailed information can be obtained from the company website. The array was blocked overnight in 1× TBS-T (0.1%) containing 5% BSA. The array was then washed 3 times in 1× TBS and incubated with 50 nM purified GST-PHD5-C5HCH NSD3 in Buffer A for 1 h at room temperature. After the reaction, the membrane was washed 3 times in 1× TBS-T and incubated with goat anti-GST antibody (Sigma) for 1 h at room temperature. The unbound antibody was then removed by washing 3 times with 1× TBS-T, and the array was incubated with horseradish peroxidase-conjugated anti-goat IgG antibody for 1 h at room temperature. After several washes in 1× TBS-T, the membrane was submerged in an ECL Western blotting reagent kit (Thermo Scientific), and the resulting image was captured with an ImageQuant LAS 4000 mini biomolecular imager (GE Healthcare).
Isothermal Titration Calorimetry-ITC experiments were carried out on a MicroCal iTC200 calorimeter (GE Healthcare) at 25°C. The concentrations of the untagged PHD5-C5HCH NSD3 and its mutants were determined photometrically, and the proteins were diluted to 200 μM. The concentration of each peptide was 6 mM. A reference measurement (peptide injected into buffer) was carried out to compensate for the heat of dilution of each peptide. Curve fitting to a one-binding site model was performed with the ITC data analysis module of Origin 7.5 (Origin Lab Corporation) provided with the iTC200 calorimeter.
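The one-binding-site model used for curve fitting rests on a simple 1:1 binding equilibrium. As a minimal sketch of that equilibrium only (it is not the Origin fitting routine, and it ignores injection volumes and heats), the following Python functions compute how much complex forms at given total concentrations. The function names and the Kd value in the usage lines are illustrative assumptions and are not taken from the measurements.

```python
import math

def complex_concentration(p_total, l_total, kd):
    """Equilibrium concentration of a 1:1 protein-peptide complex.

    All three arguments must share the same concentration unit (e.g. uM).
    Solves Kd = [P][L]/[PL] with mass conservation (quadratic solution).
    """
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0

def fraction_bound(p_total, l_total, kd):
    """Fraction of the protein that is in complex with peptide."""
    return complex_concentration(p_total, l_total, kd) / p_total

# Usage with an assumed, purely illustrative Kd of 50 uM:
# 200 uM protein (as in the cell) and 400 uM total peptide.
c = complex_concentration(200.0, 400.0, 50.0)
print(round(c, 2), round(fraction_bound(200.0, 400.0, 50.0), 3))
```

In an actual ITC fit the heat of each injection is proportional to the change in complex concentration between injections, with the stoichiometry, Kd, and enthalpy as fitted parameters.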
NMR Experiments-The purified 13C,15N-labeled PHD5-C5HCH NSD3 was dissolved to a final concentration of 0.6 mM in 500 μl of Buffer A containing 10% D2O. All the NMR experiments were carried out on a Bruker DMX600 spectrometer or a Varian 700 MHz spectrometer at 298 K.
The following spectra were acquired to obtain backbone resonance assignments: two-dimensional 1H-15N HSQC; three-dimensional sets: HNCA, HNCOCA, CBCA(CO)NH, CBCANH. All of the NMR spectra were processed by NMRPipe (22), and the resonances were assigned with Sparky (T. D. Goddard and D. G. Kneller, University of California, San Francisco, CA).
For NMR titration experiments, 0.25 mM 15N-labeled PHD5-C5HCH NSD3 was prepared in Buffer A containing 10% D2O. Lyophilized H3 peptide was dissolved in NMR buffer of identical composition and added stepwise with a sample dilution of less than 10%. 1H-15N HSQC spectra were recorded to follow chemical shift perturbations at each titration point. Combined chemical shift perturbations and dissociation constants were estimated as previously described (23).
A sample containing 0.6 mM 15N-labeled PHD5-C5HCH NSD3 was prepared for 15N relaxation measurements performed on a Bruker DMX 600 spectrometer as described (24). Data analysis and exponential curve fitting were performed utilizing NMRPipe and Sparky.
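The exact expression for the combined chemical shift perturbation is the one given in Ref. 23 and is not reproduced here. As a hedged illustration, the sketch below uses a commonly applied weighted Euclidean form, in which the 15N shift change is scaled down before being combined with the 1H shift change. The function name and the default scaling factor of 0.2 are assumptions made for this example, not values confirmed by the text.

```python
import math

def combined_csp(delta_h_ppm, delta_n_ppm, n_scale=0.2):
    """Combined 1H/15N chemical shift perturbation in ppm.

    delta_h_ppm: change in the amide proton shift between free and bound.
    delta_n_ppm: change in the amide nitrogen shift between free and bound.
    n_scale: assumed weighting of the 15N dimension (hypothetical default).
    """
    return math.sqrt(delta_h_ppm ** 2 + (n_scale * delta_n_ppm) ** 2)

# Usage with made-up shift changes for a single residue.
print(combined_csp(0.3, 2.0))
```

Residues whose combined perturbation stands well above the average across the protein are the ones usually mapped onto the binding surface.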
Crystallization Conditions, Data Collection, and Structure Determination-All the crystals were grown using the hanging drop vapor diffusion method at 16°C. Crystals of PHD5-C5HCH NSD3 in the free form were grown by mixing equal volumes of 15 mg/ml protein in Buffer A and crystallization buffer (0.1 M Hepes, pH 7.0, 1.1 M sodium malonate, pH 7.0). Crystals of the PHD5-C5HCH NSD3·H3 1-7 complex were obtained by incubating 15 mg/ml protein with peptide at a 1:3 ratio, followed by mixing with an equal volume of reservoir solution containing 0.1 M Tris-HCl, pH 7.0, 30% (w/v) PEG-3000, and 0.2 M NaCl. For crystallization of the H3 1-15 complex, the peptide was incubated with PHD5-C5HCH NSD3 at a 2:1 ratio to a complex concentration of about 12 mg/ml, and then the complex was mixed with an equal volume of crystallization buffer (0.1 M Hepes, pH 7.5, 70% (v/v) MPD). Crystals of the H3 1-15 K9me3 complex were prepared by mixing the peptide with 12 mg/ml protein at a 2:1 ratio, followed by mixing with an equal volume of crystallization buffer (0.1 M Mes, pH 6.5, 20% PEG3350). Reservoir solutions with the addition of 25% glycerol were used as cryoprotectants for all four crystals mentioned above.
Data sets for all crystals mentioned above were collected on beam line 17U at the Shanghai Synchrotron Radiation Facility (SSRF). A zinc SAD data set was collected from a single crystal of free PHD5-C5HCH NSD3 at 1.278 Å. The data set was integrated with iMosflm (25) and scaled with SCALA (26) from the CCP4 program suite (27). Twinning analysis was performed utilizing Ctruncate and revealed perfect merohedral twinning of the free PHD5-C5HCH NSD3 data set, which hampered successful zinc SAD phasing. A data set of the PHD5-C5HCH NSD3 complex with H3 1-7 was collected at 0.979 Å and showed no twinning. Although this wavelength is remote from the zinc edge, all four expected endogenous zinc atom positions were unambiguously determined using SHELX C/D/E (28), and experimental phases could be calculated. An initial model was automatically built by ARP/wARP (29) and was further built and refined using Coot (30) and Refmac (31), respectively. The other two complex structures and the free PHD5-C5HCH NSD3 structure were solved by molecular replacement using the program MOLREP (32), employing the PHD5-C5HCH NSD3·H3 1-7 structure as the search model. The free PHD5-C5HCH NSD3 structure was refined with the twinning-based refinement implemented in Refmac, which gave good electron density (supplemental Fig. S6). For the high resolution data sets of PHD5-C5HCH NSD3·H3 1-7 (1.47 Å) and PHD5-C5HCH NSD3·H3 1-15 (1.55 Å), anisotropic B-factor refinement of protein/ligand heavy atoms (C/N/O or heavier), together with isotropic B-factor refinement of waters, was applied in Refmac during the later stages of the refinement process. Procheck (33) was utilized for the validation of protein model geometry, and the Ramachandran plots show that all the residues of the four structures are within the allowed regions. Crystal diffraction data and refinement statistics are displayed in Table 1.

RESULTS
PHD5-C5HCH NSD3 Recognizes Unmodified K4 and Trimethylated K9 in Histone H3-To determine whether the PHD5-C5HCH region could bind to histones, we performed GST pulldown assays with calf thymus histones using recombinant GST fusion proteins corresponding to the PHD5-C5HCH modules of human NSD1, NSD2, and NSD3, respectively. We detected substantial binding of PHD5-C5HCH NSD2 and PHD5-C5HCH NSD3 to histone H3, whereas almost no binding could be detected for PHD5-C5HCH NSD1 (Fig. 1B). A small quantity of H4 also appeared to be pulled down, possibly because H4 forms a heterotetramer with H3 and is pulled down together with it (Fig. 1B).
The interactions of PHD5-C5HCH NSD3 with H3 tail peptides were further quantitatively analyzed by ITC as well as nuclear magnetic resonance (NMR) spectroscopy (Fig. 1, C-G). The ITC measurements showed that PHD5-C5HCH NSD3 binds the unmodified H3 1-12 and H3 1-15 peptides with similar binding affinity (supplemental Table S1). Shortening of the H3 tail to residues 1-7 significantly reduced this interaction, indicating that the H3 8-12 region is directly involved in the interaction (supplemental Table S1). Moreover, the binding of PHD5-C5HCH NSD3 to histone H3 1-15 was almost completely abolished by mono- or dimethylation or by acetylation at H3K4 (Fig. 1, C and D, and supplemental Table S1). By contrast, di- and trimethylation at H3K9 slightly improved the binding affinity by a factor of ~1.2 and 1.6, respectively (Fig. 1, C and D, and supplemental Table S1).
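To put the modest affinity gains into energetic terms, a fold change in binding affinity can be converted into a difference in binding free energy through the standard relation ΔΔG = -RT ln(fold change). The sketch below applies this at 25°C, the temperature of the ITC experiments; the function name is a hypothetical helper, and the calculation is an illustration rather than an analysis reported in the paper.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def ddg_from_fold_change(fold_tighter, temperature_k=298.15):
    """Free energy difference (kcal/mol) for a given fold gain in affinity.

    fold_tighter > 1 means tighter binding and gives a negative value.
    """
    return -R_KCAL * temperature_k * math.log(fold_tighter)

# Fold changes quoted in the text for H3K9me2 and H3K9me3.
for fold in (1.2, 1.6):
    print(fold, round(ddg_from_fold_change(fold), 2))
```

Under these assumptions the 1.2- and 1.6-fold improvements correspond to roughly -0.1 and -0.3 kcal/mol, which is consistent with the description of the effect as slight.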
PHD5-C5HCH NSD3 Folds as an Integrated Structural Unit, and Only PHD5 Is Responsible for H3 Tail Peptide Recognition-In NMR chemical shift perturbation experiments, large perturbations or disappearances of signals were observed only for residues of PHD5, which suggested that PHD5, but not the C5HCH domain, might be involved in binding to the H3 tail (supplemental Fig. S4A). In addition, the 15N relaxation data implied that these two domains might fold as an integrated structural unit (supplemental Fig. S5).
To better understand the structural basis for H3 binding by PHD5-C5HCH NSD3, we solved the crystal structures of PHD5-C5HCH NSD3 in its free state (2.27 Å) and in complex with unmodified H3 1-7 (1.47 Å), H3 1-15 (1.55 Å), and H3 1-15 K9me3 (1.73 Å), with the structural statistics listed in Table 1. The crystal structures demonstrate that the PHD5 and C5HCH domains of NSD3 indeed fold dependently as a single structural unit, coordinating four zinc atoms in all, whereas only PHD5 is responsible for H3 peptide binding (Fig. 2, A-C). PHD5 adopts a canonical PHD finger fold, which consists of a two-strand anti-parallel β-sheet (β1 and β2) followed by two 3(10)-helices (Fig. 2, A and B), and is stabilized by two zinc atoms coordinated by the Cys4-His-Cys2-His motif in a cross-brace topology (Fig. 2A and supplemental Fig. S7). Interestingly, the C5HCH domain folds into a compact module containing two small β-hairpins: one positioned at its core (β3 and β4), and the other (β5 and β6) positioned close to the C terminus (Fig. 2, A and B). The C5HCH finger is also stabilized by two zinc atoms in a cross-brace topology, anchored by the novel Cys5-His-Cys-His motif (Fig. 2, A and B, and supplemental Fig. S7). A structure-based homology search using DALI (34) shows that the C5HCH finger structurally resembles both the PHD and FYVE fingers. At the protein fold level, the C5HCH finger is in fact more similar to the FYVE domain because it has a second β-hairpin close to the C terminus. However, it lacks the characteristic amino acids of the FYVE domain for lipid binding (supplemental Fig. S8A); conversely, the C5HCH finger has been implicated in chromatin-associated processes in which most PHD domains function, so we still termed the C5HCH domain a PHD-like finger.
The compact fold of PHD5-C5HCH NSD3 originates from extensive contacts, primarily hydrophobic, between the PHD5 and C5HCH fingers. Specifically, the hydrophobic residue Trp 1364, positioned at the interface of the two zinc fingers, bridges these two fingers through hydrophobic interactions with Phe 1325 of PHD5 and Phe 1377 and Pro 1382 of C5HCH (Fig. 2, A and B). These four key residues form the tandem domain fold and are highly conserved in the NSD family (Fig. 2A). In addition, Tyr 1345 and Trp 1360 are characterized as the hydrophobic core residues of PHD5 NSD3, and Phe 1385 is characterized as the key residue responsible for the fold of C5HCH NSD3, and all are identical among the three NSD proteins (Fig. 2A). Moreover, the three PHD5-C5HCHs share about 65% sequence similarity with each other and have the same zinc chelating motifs. This suggests that the PHD5-C5HCH motifs of the NSD family fold as a similar integrated PHD-PHD-like structural module.
In addition to the backbone interactions, side chains of the H3 N-terminal residues establish a network of interactions with PHD5 residues. The side chains of H3A1 and H3T3 are accommodated within two separate shallow hydrophobic pockets (Fig. 3, C and D). The side chains of H3R2 and H3K4 are confined to two shallow negatively charged channels separated by the side chain of Met 1335 (Fig. 3, C and D). In the complex structure of PHD5-C5HCH NSD3 with H3 1-15 K9me3 or H3 1-7, the density of the side chain of H3R2 can be clearly observed (supplemental Fig. S10), and its N proton forms a hydrogen bond to the side chain carbonyl oxygen of Asp 1337 (Fig. 3D and supplemental Fig. S9, C and D). However, in the PHD5-C5HCH NSD3·H3 1-15 complex structure, the side chain of H3R2 is fully exposed to solvent, with relatively poor electron density, indicating moderate positional variability (supplemental Fig. S9G). Notably, asymmetric dimethylation of H3R2 causes only a slight affinity reduction of less than 2-fold, whereas a marked decrease in affinity for the H3 1-15 K9me3 peptide was observed when Asp 1337 of PHD5 was mutated to Ala (Fig. 3E and supplemental Table S1). A reasonable explanation is that the D1337A mutation not only disrupts hydrogen bonding with H3R2 but also destroys hydrogen bonding with the backbone amide of Trp 1360 (Fig. 3D and supplemental Fig. S9, C and D), which might interfere with the H3A1 interactions. Specific recognition of H3K4 in the unmodified state is secured by a canonical mode as observed in BHC80 (also known as PHF21A) (35). The ε-amino group of K4 forms hydrogen bonds to the side chain carbonyl oxygen of Asp 1322, the backbone carbonyl oxygens of His 1320 and Glu 1321, as well as a water molecule (Fig. 3D and supplemental Fig. S9, C and G). Additional K4 recognition is achieved by hydrophobic contacts between the K4 alkyl chain and the side chain of Leu 1333. Methylation at H3K4 reduces binding significantly because of steric clash and unfavorable polar-hydrophobic interactions.

Structure and Histone H3 Tail Binding of NSD PHD5-C5HCH
FEBRUARY 15, 2013 • VOLUME 288 • NUMBER 7
Compared with the first seven N-terminal amino acids of H3, Arg 8 and Lys 9 or K9me3 have relatively larger B factors for their side chain atoms (data not shown), which suggests a higher degree of flexibility and agrees with the relatively little shape complementarity for these residues (Fig. 3C). The side chain carboxyl group of Asp 1329 forms hydrogen bonds to the guanidinium group of H3R8 and the backbone amide of H3K9 or H3K9me3, undergoing an induced flipping coupled with peptide binding (supplemental Figs. S11B and S12). Interestingly, in the complex structure of PHD5-C5HCH NSD3·H3 1-15, the carboxylate side chain of Asp 1329 adopts alternative conformations: one similar to that in the PHD5-C5HCH NSD3·H3 1-15 K9me3 complex and the other corresponding to that in the free state and the PHD5-C5HCH NSD3·H3 1-7 complex (supplemental Fig. S9G), indicating that Asp 1329 might be involved in transient electrostatic interactions with unmodified K9 of the peptide. Moreover, the ammonium group of unmodified K9 is positioned close to the backbone carbonyl oxygen of Gly 1328 (about 3.7 Å for the side chain nitrogen of Lys 9) in the structure involving unmodified K9 (supplemental Fig. S9G), whereas K9me3 points its side chain near the aromatic ring of Tyr 1323 and slightly away from Gly 1328 (Fig. 3D), probably losing the electrostatic interaction with Gly 1328. Consistently, the K9A mutation of the H3 peptide decreased the binding affinity for PHD5-C5HCH NSD3, thus confirming that the side chain of unmodified K9 plays a significant role in the interaction (supplemental Table S1). H3K9me1 showed reduced affinity compared with H3K9me0 (supplemental Figs. S1 and S2). A possible explanation is that the enhanced cation-π interaction with the aromatic ring of Tyr 1323 upon monomethylation of H3K9 is not sufficient to compensate for the loss of the interaction with Gly 1328.
However, the increased cation-π interaction caused by H3K9me3 may overcome this barrier, which provides a basis for the observed preference for K9me3 over K9me0. This phenomenon agrees well with the NMR chemical shift perturbation data. The majority of chemical shift changes in PHD5-C5HCH NSD3 induced by H3 1-15 K9me3 paralleled those caused by unmodified H3 1-15, whereas the amide cross-peaks of Gly 1328 and Asp 1329 were shifted to a larger extent in the PHD5-C5HCH NSD3·H3K9me3 complex (supplemental Fig. S4).
The introduction of an Ala mutation at Asp 1322, Asp 1329, or Asp 1337 caused a significant reduction in the binding affinity of PHD5-C5HCH NSD3 for the H3 1-15 K9me3 peptide, confirming the significance of these residues for H3 binding at K4 (Asp 1322), R2 (Asp 1337), and R8 and K9 (Asp 1329). The D1322A/D1329A/D1337A triple mutant led to a complete loss of binding (Fig. 3E and supplemental Table S1). Furthermore, the Ala mutation of Tyr 1323 weakened the H3 1-15 K9me3 binding of PHD5-C5HCH NSD3 to nearly the level of unmodified H3 1-15 peptide binding. This was accompanied by a remarkably reduced binding enthalpy, confirming the importance of the Tyr side chain for this interaction (Fig. 3E and supplemental Table S1). Taken together, these results demonstrate how the sequence-specific histone H3 binding by PHD5-C5HCH NSD3 is modulated negatively by methylation at H3K4 and positively by trimethylation at H3K9.

A Conservative Sequence Motif Defines a Subgroup of PHD Fingers That Conducts H3K4me0 and H3K9 Cross-talk-Recent structural studies suggest that the N-terminal sequence motif (NTS motif) around the first and the second zinc coordinating cysteines of PHD fingers is often critical in histone H3 peptide recognition (36,37). From sequence alignment (Fig. 4A) and structural comparison and analysis (Fig. 4, B-G), we extracted an NTS motif, which defines a subgroup of PHD fingers reading out H3K4me0 and H3K9me3 (or H3K9me0) simultaneously. This motif is "(D/E){(W/Y/F)/(D/E)}CXXCX(D/N)GG", where "(/)" means alternative amino acids, "{/}" denotes alternative types of amino acid, and "X" represents any amino acid. We identify the "0" position of the NTS motif as the key Asp or Glu residue participating in the recognition of H3K4me0 (Fig. 4, A-G). The residue positioned at "+1" is responsible for K9 recognition and has alternative roles: one is an Asp or a Glu forming hydrogen bonds with the ε-amino group of H3K9, such as in AIRE-PHD1 (the first PHD finger of autoimmune regulator) (Fig. 4B) (38); the other is an aromatic residue optimal for H3K9me3 recognition through a cation-π interaction, such as in CHD4-PHD2 (the second PHD of chromo domain helicase DNA-binding protein 4), TRIM33-PHD (the PHD finger of tripartite motif 33), and NSD3-PHD5 (Fig. 4, C-E) (39,40). The Asp or Asn residue located at the "+7" position is another critical amino acid, implicated in the formation of hydrogen bonds with the main chain amide of H3K9 and sometimes the side chain guanidinium group of H3R8 (Fig. 4, B-E) (38-40). When a Lys is substituted at this position, such as in BHC80-PHD, sensitivity to trimethylated K9 is lost even if the "+1" residue is a Phe (Fig. 4F) (35). In addition, the "+8 to +9" positions seem to be a conserved "G-G" motif, and bulky residues at these two positions might exclude the segment after H3A7 from the binding surface. BRPF2-PHD1 (the first PHD finger of Bromodomain-PHD finger protein 2) does not contain such an NTS motif and can only bind the first six residues of H3 (Fig. 4G) (41). In brief, the key residues that we defined in the NTS motif cooperate to mediate the relatively long range cross-talk between H3K9me3 (or H3K9me0) and H3K4me0.
FIGURE 4. The NTS motif in PHD fingers for H3K4me0 and H3K9 cross-talk. A, sequence alignment of PHD fingers that are known to conduct H3K4me0 and H3K9 cross-talk, including PHD5 of NSD3, PHD1 of AIRE, the PHDs of TRIM33 and TRIM24, and PHD2 of CHD4. The PHD finger of BHC80 and PHD1 of BRPF2, which are not involved in cross-talk, are also included in the alignment. Key residues involved in H3K4me0 and H3K9 recognition are grouped by green background. The conservative NTS motif dedicated to H3K4me0 and H3K9 dual readout is listed at the top of the alignment, with specific positions numbered above. B-G, ribbon representations of the structures of PHD·H3 complexes that are indicated in the alignment in A. The H3 peptides are colored yellow with K4, R8, and K9 or K9me3 side chains shown in stick representation. Key residues of the PHD fingers involved in the recognition of H3 are also shown as sticks.
We searched for this motif in a large collection of human PHD domain-containing protein sequences obtained from the SMART (42) and Pfam (43) databases. The results, listed in supplemental Table S2, suggest that the NTS motif is common among PHD fingers in the human proteome. Interestingly, the NTS motif of PHD5 NSD2 falls into the subset of H3K4me0 and H3K9me0 recognition.
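The motif search described above can be approximated with a regular expression. The following is a minimal sketch only: the text does not specify the actual search procedure, so the function name `find_nts_motifs`, the regular expression encoding, and the toy sequences are all illustrative assumptions rather than the method used in this study. The H3K9 label simply restates the "+1" rule described in the text (aromatic residue for H3K9me3, acidic residue for H3K9me0) and is not a validated predictor.

```python
import re

# NTS motif from the text: (D/E){(W/Y/F)/(D/E)}CXXCX(D/N)GG
# Positions:                0         +1        +2..+6  +7  +8,+9
# A lookahead is used so that overlapping occurrences are also reported.
NTS_MOTIF = re.compile(r"(?=([DE])([WYFDE])C[A-Z]{2}C[A-Z][DN]GG)")


def find_nts_motifs(sequence):
    """Return (start index, 10-residue motif, presumed H3K9 state) for each hit.

    The H3K9 state is inferred only from the "+1" residue, following the
    rule stated in the text; it is an assumption, not a measurement.
    """
    seq = sequence.upper()
    hits = []
    for match in NTS_MOTIF.finditer(seq):
        start = match.start()
        plus1 = match.group(2)
        k9_state = "H3K9me3" if plus1 in "WYF" else "H3K9me0"
        hits.append((start, seq[start:start + 10], k9_state))
    return hits


# Toy, made-up fragments for illustration (NOT real protein sequences).
toy_sequences = {
    "aromatic_plus1": "AAEWCAACADGGKK",
    "acidic_plus1": "AADDCQRCRNGGEL",
    "lys_at_plus7": "AEFCAACAKGGKK",  # Lys at "+7" breaks the motif
}

for name, seq in toy_sequences.items():
    print(name, find_nts_motifs(seq))
```

In this sketch a fragment with Lys at the "+7" position yields no hit, mirroring the BHC80-PHD case discussed in the text.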
The Lys 2134 residue in PHD5 NSD1, corresponding to residue Asp 1337 of NSD3, may interfere with the binding of H3 by disrupting hydrogen bonding interactions with the guanidinium group of H3R2 and destabilizing the H3A1 interaction loop (Fig. 2A). In addition, Ala 2127 in NSD1, which corresponds to residue Gly 1330 of NSD3, may have a negative effect on the binding to the backbones of H3A7 and H3R8 (Fig. 2A). Therefore, the completely blocked interaction of PHD5-C5HCH NSD1 with H3 peptides revealed by the peptide pulldown assay might be due to these two residues (Fig. 5A). To test this notion, we prepared a K2134D/A2127G double mutant of PHD5-C5HCH NSD1. As expected, this mutant bound H3 N-terminal peptides and, like PHD5-C5HCH NSD2, showed a preference for unmethylated H3K9 over H3K9me3 (Fig. 5A).
Taken together, the tandem PHD5-C5HCH modules of the three NSD proteins may have a similar structural fold, yet exhibit different binding abilities and specificities to the histone H3 tail as a consequence of a minor sequence variation located on the H3 binding surface: PHD5-C5HCH NSD1 loses the ability to bind the histone H3 N-terminal tail, PHD5-C5HCH NSD2 targets the H3 tail containing unmodified K4 and K9, whereas PHD5-C5HCH NSD3 prefers to bind the H3 peptide containing unmodified K4 and trimethylated K9. These differences associated with the PHD5-C5HCHs of the NSD proteins shed light on the functional diversity of this family.

DISCUSSION
The H3 Recognition Feature of Each NSD Family Member Is Conserved Across a Wide Variety of Species-The structural and biochemical data reveal that the PHD5 domains of the three human NSD proteins have distinct binding abilities and preferences toward the H3 N-terminal tail, which arise from minor sequence variations located on the H3 binding surface. Because our analysis was carried out predominantly using human NSD proteins, it would be interesting to determine whether this is a common phenomenon in other species.
In worms or flies, there is only one NSD homolog, called maternal-effect sterile 4, which contains no PHD5-C5HCH fingers (Fig. 1A). All vertebrates, from zebrafish to humans, have three NSD members, and each member contains a PHD5-C5HCH region. A detailed sequence alignment of each NSD PHD5-C5HCH region with its homologues from different species demonstrates that the key residues for the three variant binding specificities of PHD5-C5HCHs to H3 are highly conserved during evolution. The lysine residue accounting for the loss of NSD1 binding to H3 is identical among NSD1 homologues, and the key residues responsible for unmodified H3K4 and K9 recognition by NSD2 are identical among its homologues (supplemental Fig. S13, A and B). For NSD3, the preference for H3K9me3 is also well conserved in mammals (supplemental Fig. S13C). Residues that are involved in hydrophobic core formation and zinc coordination for the tandem PHD-PHD-like module have been well maintained throughout evolution (supplemental Fig. S13). These observations imply that both the overall fold and the disparity in H3 tail targeting among the three PHD5-C5HCH members of the NSD family are well conserved across a wide variety of species, which further indicates that the differences in histone binding of this region may be associated with the functional diversity of the NSD family.
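The kind of per-position conservation check described above can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the analysis behind supplemental Fig. S13: the helper `column_conservation` and the toy alignment are hypothetical, and the aligned fragments are made up rather than taken from real NSD homologues.

```python
from collections import Counter


def column_conservation(aligned_seqs, column):
    """Return (most common residue, fraction of sequences carrying it)
    for one column of a gap-padded multiple sequence alignment."""
    residues = [seq[column] for seq in aligned_seqs]
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


# Toy, made-up aligned fragments of equal length (NOT real NSD sequences).
toy_alignment = [
    "EWCAACADGG",
    "EWCSKCADGG",
    "EYCAACADGG",
]

# Column 0 stands in for the "0" position and column 1 for the "+1" position.
for col in (0, 1):
    print(col, column_conservation(toy_alignment, col))
```

A fraction of 1.0 at a key column would correspond to a residue that is identical among all homologues in the alignment, whereas a lower fraction flags a position that varies between species.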
The C5HCH Domain Acts as a Potential Protein-Protein Interaction Module and the Interaction Property May Also Be Divergent Among the Three NSD Members-The t(5;11)(q35;p15.5) translocation, which generates the NUP98-NSD1 fusion protein, gives rise to about 5% of human acute myeloid leukemia (7). NUP98-NSD1 induces acute myeloid leukemia by enforcing expression of the HoxA7, HoxA9, HoxA10, and Meis1 proto-oncogenes. In a mouse model, the PHD5-C5HCH NSD1 motif is essential for recruiting the fusion protein to the HoxA9 gene promoter, indicating that PHD5-C5HCH NSD1 harbors a specific chromatin-targeting ability (44). However, our data strongly suggest that PHD5-C5HCH NSD1 cannot bind histones directly. Therefore, the manner in which the motif contributes to the recruitment remains unclear. One possibility is that PHD5-C5HCH NSD1 targets chromatin via an adapter protein that binds to chromatin and the PHD5-C5HCH motif simultaneously. The C5HCH NSD1 domain has been reported to interact directly with the C2HR domain of Nizp1 (NSD1-interacting zinc finger protein 1) (45). Nizp1 contains four consensus C2H2-type zinc fingers, and the C2H2 finger is widely recognized as a sequence-specific DNA-binding motif. Thus, Nizp1 likely acts as an adapter protein that recruits PHD5-C5HCH of NSD1 to chromatin.
Our GST pulldown assay showed that PHD5-C5HCH NSD1 strongly binds to C2HR of Nizp1, whereas PHD5-C5HCH NSD3 does not (supplemental Fig. S14A), consistent with previous reports (45). In addition, the triple mutant of PHD5-C5HCH NSD3 or PHD5-C5HCH NSD2, in which the three key Asp residues were all mutated to alanine, showed a significant loss of binding to calf thymus histones. This suggests that the histone binding ability of the PHD5-C5HCH module in both NSD3 and NSD2 originates only from the binding of PHD5 to the H3 N-terminal tail and not from C5HCH (supplemental Fig. S14B). Taken together, these results imply that the C5HCH domain may act as a nonhistone protein-interacting motif and that, just as each PHD5 recognizes specific histone H3 modifications, each NSD C5HCH may have distinct protein interaction partners. These findings raise the possibility that the PHD5-C5HCH motif has dual roles in H3 and nonhistone protein interactions, which are carried out by PHD5 and C5HCH, respectively. These hybrid interactions may play important roles in targeting the NSD proteins that mediate H3K36 dimethylation to different genomic sites and thus lead to distinct physiological outcomes.
PHD5-C5HCH Is a New Type of Tandem PHD Fingers-Thus far, only two tandem PHD finger structures have been reported in the literature. These are the tandem PHD (PHD1-PHD2, PHD12) domains of MOZ (monocytic leukemia zinc finger protein, a histone acetyltransferase) and Dpf3b (an adaptor component of the BAF chromatin remodeling complex), which share high sequence identity and structural similarity (46, 47). However, the structure of PHD5-C5HCH NSD3 is considerably different from the PHD12s of MOZ and Dpf3b, although all are tandem PHD finger modules. First, PHD1 and PHD2 of MOZ or Dpf3b associate with each other mainly through polar interactions (supplemental Fig. S15, B and C) (46, 47), whereas PHD5 and C5HCH of NSD3 associate with each other through hydrophobic interactions. Second, PHD1 and PHD2 of MOZ or Dpf3b are pressed against one another in a "face to back" direction, whereas PHD5 and C5HCH of NSD3 are arranged in a "face to side" manner (supplemental Fig. S15, A-C) (46, 47). Third, MOZ or Dpf3b recognizes the unmodified N-terminal residues and acetylated K14 of H3 with PHD2 and PHD1, respectively (supplemental Fig. S15, B and C) (46, 47), whereas PHD5-C5HCH NSD3 forms a complex with the H3 N-terminal peptide only through the first PHD (PHD5) of the tandem module (Fig. 2C and supplemental Fig. S15A). Interestingly, the PHD finger and Bromo domain of TRIM33 also associate with each other mainly through hydrophobic interactions, and the PHD finger of TRIM33 shows an H3 binding preference similar to that of PHD5 NSD3 (40). Therefore, to some extent, PHD5-C5HCH NSD3 resembles PHD-Bromo of TRIM33 (supplemental Fig. S15D; note the direction of the bound peptide). In summary, PHD5-C5HCH of the NSD family defines a new type of tandem PHD finger motif.
Sotos Mutations-In Sotos syndrome, multiple missense mutations that occur in the catalytic SET, PWWP2, PHD1-5, and C5HCH domains produce the same phenotype as the elimination of the entire NSD1 gene (4). The crystal structure of the SET domain of NSD1 and histone lysine methyltransferase (HKMT) activity assays indicate that the missense mutations within the SET domain cause the loss or reduction of the histone lysine methyltransferase activity of NSD1, which may induce an aberrant H3K36 methylation pattern during development (18).
In our study, we show that PHD5-C5HCH NSD1 does not directly interact with histones, and its overall structure may resemble that of PHD5-C5HCH NSD3. Previous work (4) has identified at least 12 missense mutations within the PHD5-C5HCH motif, 10 of which affect zinc-coordinating residues, whereas the other two affect conserved hydrophobic core residues of PHD5 and C5HCH, respectively (supplemental Fig. S16). Thus, Sotos syndrome mutations within PHD5-C5HCH NSD1 seem to obstruct the normal function of NSD1 by disrupting the proper folding of the PHD5-C5HCH region.
The Histone Target Specificity of PHD5-C5HCH NSD3 and Its Implication for the Function of Human NSD3-The N-terminal tail of histone H3 is rich in PTMs, thereby providing a platform for histone PTM cross-talk and a wealth of regulatory potential. Among the PTMs in H3 N-terminal tail, H3K4me3 and H3K9me3 are the most studied modifications. H3K4me3 is a hallmark of active transcription start sites, whereas H3K9me3 usually demarcates heterochromatin. In this study, we define a subgroup of PHD fingers, which conducts the H3K4me0 and H3K9 cross-talk. This subgroup contains an NTS motif that is responsible for the cross-talk. In addition to this subgroup of PHD fingers, the ADD (ATRX-DNMT3-DNMT3L) domain of ATRX was recently confirmed to be an H3K4me0 and H3K9me3 dual reader, in which an atypical "polar pocket" is utilized for K9me3 readout (48,49).
The preference of these fingers for H3K9me3 implies a role associated with heterochromatin. ADD ATRX was recruited to pericentromeric heterochromatin in an H3K9me3-dependent manner (48, 49). Similarly, the NuRD (nucleosome remodeling and histone deacetylase) complex, which contains CHD4, has been shown to localize to heterochromatin and display CHD4-dependent activity in heterochromatin maintenance and assembly (50). In contrast to ATRX and CHD4, which are associated with heterochromatin, the PHD-Bromo cassette of TRIM33 recognizes a poised chromatin platform that contains H3K9me3 and K18ac and is present in the promoter region of master regulators in embryonic stem cells (40).
How PHD5-C5HCH NSD3 contributes to the recruitment of NSD3 to specific genomic regions in vivo is still largely unknown. Shi et al. (21) recently reported that NSD3 participates in transcriptional elongation. Their research indicates that NSD3 can form complexes with the H3K4-specific demethylase LSD2 and the H3K9-specific methyltransferase G9a. Strikingly, the preferential simultaneous binding of PHD5-C5HCH NSD3 to unmethylated H3K4 and methylated H3K9, which are precisely the products of LSD2 demethylation and G9a methylation, indicates that NSD3 might function downstream of LSD2 and G9a. In addition, the distinct distribution of LSD2, which binds preferentially at the 3′ end of actively transcribed genes, is similar to the pattern of H3K36 methylation deposition (20). Therefore, our results support a model in which the LSD2 and G9a enzymes first mediate H3K4 demethylation and H3K9 methylation at the 3′ end of actively transcribed genes, producing a histone modification pattern that is recognized by the PHD5 finger of NSD3 and helps NSD3 locate and then methylate H3K36. However, additional investigations are required to fully understand how LSD2, G9a, and NSD3 orchestrate appropriate histone modifications for optimal transcription elongation.
In summary, this study presents the first detailed analysis of the reader domains of the NSD family. We identify the different histone binding specificities of PHD5-C5HCHs and provide the underlying molecular basis. Our findings thus offer insights into the role of this novel tandem module, suggesting that it may help target the NSD enzymes to different genomic sites and thereby lead to distinct biological outcomes. Our results are consistent with the nonredundancy of NSD proteins and may also imply a link between the deposition of H3K36 methylation and the recognition of H3K4 and H3K9. Another important question concerns the chromatin targeting roles of the other reader domains, including PHD1-4 and the two PWWP domains. Further studies are required to establish an understanding of the interplay between the reader domains and the catalytic SET domain. Each new piece of information will help elucidate the mechanisms underlying NSD family action.