Identification and Characterization of a Novel Human Histone H3 Lysine 36-specific Methyltransferase

Histone methylation plays an important role in eukaryotic transcriptional regulation. A number of histone methyltransferases (HMTases) with distinct functions have been identified. The HSPC069/HYPB gene was originally isolated from human hematopoietic stem/progenitor cells (HSPCs), and it was also identified as a huntingtin interacting protein, implicated in the pathogenesis of Huntington disease (HD). However, its biochemical function is poorly understood. Here we report the structural and functional characterization of the huntingtin interacting protein B (HYPB). 1) The triplicate AWS-SET-PostSET domains mediate a histone H3 lysine 36-specific HMTase activity. 2) A low charged region that is rich in glutamine and proline has been characterized as a novel transcriptional activation domain. The structural features of this region are evolutionarily conserved in vertebrates. 3) Coimmunoprecipitation assays indicate that HYPB protein associates with hyperphosphorylated RNA polymerase II (RNAPII) but not the unphosphorylated form. Furthermore, the RNAPII-association region of HYPB protein has been mapped to the C-terminal 142 amino acids. Thus, our results suggest that HYPB HMTase may coordinate histone methylation and transcriptional regulation in mammals, and they open a perspective for further study of the potential roles of HYPB protein in hematopoiesis and the pathogenesis of HD.
The nucleosome, the bead-like unit of DNA packaging in eukaryotic cells, consists of DNA wound around a protein core made up of eight histone molecules. Covalent modifications of the N-terminal tails of the core histones have emerged as key regulatory mechanisms of gene expression (1-3). These histone modifications, including acetylation, phosphorylation, ubiquitination, and methylation, create both synergistic and antagonistic signals that correlate with the transcriptional activity of a gene, either by recruiting or dispelling protein complexes or by changing the structure of chromatin to allow RNA polymerase access to initiate transcription. Moreover, these histone modifications and the consequent changes in chromatin structure may serve as an epigenetic marking system that is responsible for establishing and maintaining the heritable programs of gene expression during cellular differentiation and organism development (4,5).
Methylations of histone lysine residues, with the exception of H3 lysine 79, are catalyzed by a family of SET domain-containing proteins (6). The SET domain is an evolutionarily conserved, ~130-amino acid sequence motif. It was originally identified in members of polycomb group (PcG), trithorax group (trxG), and Su(var) genes and was named after the genes Su(var)3-9, Enhancer of zeste (E(z)), and trithorax (trx) (7). Not all SET domain-containing proteins possess histone methyltransferase (HMTase) activities. The cysteine-rich regions adjacent to the SET domains are also required (8,9). In addition to the SET domains, most HMTases carry other functional domains such as transcriptional activation or repression domains (10,11), protein-protein interaction domains (9,11), protein-DNA/RNA interaction domains (12,13), etc. These domains direct the HMTases to certain protein complexes and mediate some specific activities.
Many SET domain-containing genes are known to play important roles in embryo development (14-18). Recently, several SET domain-containing genes have been implicated in the pathogenesis of human diseases (19-21). HSPC069, a SET domain-containing gene, was originally isolated from human CD34+ hematopoietic stem/progenitor cells (HSPCs) in our previous work (22,23). Independent studies indicated that the C-terminal region of the encoded protein interacts with the Huntington disease (HD) protein huntingtin, and thus it was also named huntingtin interacting protein B (HYPB), implying a role in pathogenesis (24,25). It was subsequently identified as a DNA-binding factor that binds the proximal E1A promoter of adenovirus serotype 12 (26). Northern blotting assays indicate that the HYPB gene is expressed ubiquitously in all tissues examined (24,26). In the present study, we found that HYPB protein possesses a histone H3 lysine 36 (H3-K36)-specific HMTase activity. Furthermore, it has several other important functional features, such as auto-methylation activity, a novel transcriptional activation domain, and association with hyperphosphorylated RNA polymerase II. Our findings suggest that HYPB may serve as a link between histone H3-K36 methylation and transcriptional regulation in mammals and provide useful implications for elucidating the molecular pathogenesis of HD.

EXPERIMENTAL PROCEDURES
Plasmids-The HYPB cDNA was obtained from a cDNA library of CD34+ HSPCs as described previously (23). A 3′-segment was obtained from KIAA1732-pBluescript plasmid (gift from Dr. O. Ohara) (27). For HMTase activity assays and glutathione S-transferase (GST) pull-down assays, fragments of HYPB cDNA were subcloned into pGEX-5X1 vector (Amersham Biosciences) to create the N-terminal GST-tagged HYPB bacterial expression plasmids. A fragment of human MLL cDNA was PCR-amplified from HL60 cells and cloned into pGEX-5X1 vector. For transactivation assays, fragments of HYPB cDNA were subcloned into pBIND vector (Promega) to create CMV-driven Gal4-tagged HYPB mammalian expression plasmids. For coimmunoprecipitation assays, a fragment of HYPB cDNA was subcloned into pFLAG-CMV4 vector (Sigma) to create a CMV-driven FLAG-tagged HYPB mammalian expression plasmid. All PCR-amplified products described above were confirmed by sequencing. The arginine 1122-to-histidine mutant of HYPB, the lysine 36-to-arginine mutant of H3, and the WW domain-deleted HYPB construct were created with a bridge-PCR mutagenesis strategy.
Bacterial Protein Expression-Recombinant proteins were expressed in Escherichia coli strain BL21 and purified with glutathione-Sepharose 4B according to the manufacturer's protocol (Amersham Biosciences). Protein concentration was determined by Coomassie Brilliant Blue R-250 staining of SDS-PAGE gels. Bead-bound fusion proteins were used for in vitro HMTase activity assays and GST pull-down assays or stored at −80°C.
Cell Culture and Transfection-HEK293 cells and NIH3T3 cells were grown in Dulbecco's modified Eagle's medium with 10% fetal bovine serum (Invitrogen) at 37°C in a humidified 5% CO2 incubator. For transient transfection, SuperFect transfection reagent (Qiagen) was used following the manufacturer's protocol.
Luciferase activity values were normalized for transfection efficiency using β-galactosidase activity.
Coimmunoprecipitation-HEK293 cells were transiently transfected with FLAG-HYPB, FLAG-STAT5A, or Gal4-HYPB expression plasmids, respectively. Cells were harvested 24-36 h after transfection and lysed with Nonidet P-40 lysis buffer (50 mM Tris, pH 8.0, 150 mM NaCl, 1.0% Nonidet P-40) in the presence of protease inhibitor mixture (Roche Applied Science) and 20 mM NaF to inhibit phosphatases. For each immunoprecipitation, 400 μl of cell lysate were incubated with anti-Gal4 antibody (Santa Cruz Biotechnology) for 5 h at 4°C. Then 20 μl of protein A/G beads (Santa Cruz Biotechnology) were added and rotated for 2 h at 4°C. For immunoprecipitation of FLAG-tagged proteins, 20 μl of anti-FLAG beads (Sigma) were added directly to the lysates and rotated for 2 h at 4°C. Bound immune complexes were washed three times with phosphate-buffered saline. Both the complexes and flow-throughs were analyzed by Western blotting.

RESULTS
Structural Features of HYPB Protein-To evaluate structural characteristics of HYPB protein and identify promising regions for experimental investigation, the protein sequence was subjected to several sequence analysis programs. By using the SMART (Simple Modular Architecture Research Tool) server (33), two remarkable conserved regions were identified: 1) a SET domain with the adjacent AWS (associated with SET) and PostSET domains (Fig. 1A). Interestingly, both the AWS and PostSET domains are rich in cysteine, suggesting that the AWS-SET-PostSET region may mediate an HMTase activity. 2) A WW domain located near the C terminus of HYPB protein (Fig. 1A). This domain is widespread in eukaryotic species and may mediate protein-protein interaction, especially interaction with proline-rich regions and SH3 motifs (34).
The SAPS (Statistical Analysis of Protein Sequences) program was then used to evaluate other sequence properties of HYPB protein, including compositional extremes, clusters and runs of charge and other amino acid types, and repetitive structures, etc. (35). As a result, a significantly low charged region (residues 1634-1863) was identified (Fig. 1A). This region includes 230 amino acids, 220 of which are uncharged. Among the charged amino acids, there are 9 negatively charged amino acids (Asp and Glu) and only 1 positively charged amino acid (Lys). Thus, this region displays high net negative charge and is therefore indicated with "0/−". Furthermore, this region is significantly rich in glutamine and proline (14.3 and 16.5%, respectively). It is known that long uncharged segments are frequent in zinc finger-containing proteins and also occur in a number of other transcription factors (36), and net negative charge is one of the most common characteristics observed among transcriptional activation domains (37,38). Taken together, these remarkable features suggest that the low charged region of HYPB protein is likely to carry a biologically significant function.
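The charge bookkeeping described above can be illustrated with a minimal Python sketch. It is not the SAPS program; the function name `charge_profile` and the residue classes (Asp/Glu counted as negative, Lys/Arg as positive, His as uncharged) are assumptions made here for illustration, and the test sequence is synthetic rather than the HYPB sequence.

```python
from collections import Counter

# Assumed residue classes for this sketch (His is treated as uncharged).
NEGATIVE = "DE"  # Asp, Glu
POSITIVE = "KR"  # Lys, Arg


def charge_profile(seq: str) -> dict:
    """Summarize charge and Gln/Pro composition of a protein segment."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    neg = sum(counts[aa] for aa in NEGATIVE)
    pos = sum(counts[aa] for aa in POSITIVE)
    n = len(seq)
    return {
        "length": n,
        "negative": neg,
        "positive": pos,
        "uncharged": n - neg - pos,
        "net_charge": pos - neg,
        "pct_Q": 100.0 * counts["Q"] / n,
        "pct_P": 100.0 * counts["P"] / n,
    }


if __name__ == "__main__":
    # Synthetic segment with the same counts as reported for the region:
    # 9 acidic, 1 basic, 220 uncharged residues (230 in total).
    synthetic = "D" * 5 + "E" * 4 + "K" + "S" * 220
    print(charge_profile(synthetic))
```

With the reported counts (9 acidic, 1 basic, 220 uncharged), the net charge under these assumptions is −8 over 230 residues.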
A BLAST search against public protein databases reveals that HYPB shares significant similarity with SET2 protein, a well defined Saccharomyces cerevisiae H3-K36 HMTase (39-45). Both HYPB and SET2 contain the AWS-SET-PostSET region and the WW domain, whereas SET2 lacks the low charged region. To further determine the similarities between HYPB and SET2 proteins, we performed a sequence alignment using the FASTA program (46). As shown in Fig. 1B, HYPB and SET2 display 45.7 and 29.0% identities in the AWS-SET-PostSET region and the WW domain, respectively. Furthermore, the C terminus of HYPB shares 24.8% identity with the corresponding region of SET2, which is separate from the WW domain. Taken together, these data suggest that HYPB and SET2 may be functional homologues and that the regions of high similarity may define functionally important domains within these proteins.
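For readers unfamiliar with how a percent identity is derived from a pairwise alignment, the following minimal Python sketch shows one common convention. The function name `percent_identity` is hypothetical, and the choice of denominator (columns in which neither sequence has a gap) is an assumption; the FASTA program may count overlap length differently, so this sketch is not expected to reproduce the values in Fig. 1B exactly.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of a pairwise alignment.

    Both inputs must be rows of the same alignment (equal length),
    with gaps written as `gap`.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy alignment, not taken from HYPB or SET2.
    print(round(percent_identity("ACDE-FG", "ACNEQFG"), 1))
```

In the toy alignment, six columns are gap-free and five of them match, giving about 83.3% identity.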
HYPB Protein Possesses SET Domain-dependent HMTase Activity-Within the SET domain, an H(R)φφNHSC motif (where φ indicates a hydrophobic residue) has previously been shown to be an important catalytic site. For SUV39H1 protein, a histidine-to-arginine mutation of the first histidine (His320) in the 320-HφφNHSC-326 motif resulted in a 20-fold higher catalytic activity (8). This observation suggests that the H(R)φφNHSC motif is correlated with the HMTase activity. Notably, HYPB protein contains a 1122-RφφNHSC-1128 motif (Fig. 2A, boxed), implying that it may possess an HMTase activity.
An in vitro HMTase activity assay was performed to determine the potential HMTase activity of HYPB. A fragment of HYPB protein (residues 915-1211) fused with GST was used as enzyme and core histone as substrate. GST-SUV39H1 (residues 82-412), GST-MLL (residues 3745-3966), and GST-mG9a (residues 621-1000) were used as positive controls. The results indicate that GST-HYPB, as well as GST-SUV39H1 and GST-mG9a, possesses HMTase activity that transfers 14C-labeled methyl groups onto histone H3 (Fig. 2B, left). Furthermore, recombinant N terminus of H3 (residues 1-47; Fig. 2B, right) and nucleosomes purified from HEK293 cells (Fig. 2C) are also methylated by GST-HYPB. Interestingly, HYPB methylates nucleosomes more actively than purified histones, suggesting that the native nucleosome structure may effectively promote the HMTase activity of HYPB protein. Notably, GST-MLL does not exhibit detectable HMTase activity in these experiments, which is consistent with the previous hypothesis that the HMTase activity requires the combination of the SET domain with adjacent cysteine-rich regions (8), although weak HMTase activity of MLL protein has been observed in certain contexts (47,48).
Surprisingly, significant radioactive signals are detected at the position of the GST-HYPB protein (Fig. 2B, top panels, Enzymes), and they are much stronger than the nonspecific signals (asterisk), suggesting that the HYPB enzyme itself is methylated in these experiments. To determine whether histone is required for this auto-methylation, we examined the auto-methylation activity in the presence or absence of core histone or GST, respectively. As shown in Fig. 2D, the auto-methylation signal of HYPB protein is consistently detected (top panel, HYPB), indicating that the auto-methylation of HYPB protein is independent of the presence of histone.
To further investigate the contribution of the 1122-RφφNHSC-1128 motif to the HYPB HMTase activity, we created a mutant in which arginine 1122 was replaced with histidine (Fig. 2A, arrow) and then examined the changes in the activities. Both the HMTase activity and the auto-methylation activity were significantly impaired by this mutation (Fig. 2E). These results indicate that the methyltransferase activities of HYPB protein toward both histone and itself are SET domain-dependent and that arginine 1122 is necessary for these activities.
HYPB Protein Selectively Methylates H3-Lysine 36-We next investigated the site specificity of H3 methylation by HYPB protein. A series of recombinant GST-tagged mouse histone H3 (residues 1-47) proteins with several lysine-to-arginine substitutions were used as substrates. As summarized in Fig. 3A, H3N is a wild-type mouse H3 fused with an N-terminal GST. N4, N9, and N27 are double lysine-to-arginine mutants as indicated (28). K36R is a mutant in which lysine 36 is replaced with arginine. The previously defined HMTase mG9a, which exhibits selectivity to lysines 9 and 27 of histone H3, was used as a positive control. As indicated by Fig. 3B, only K36R fails to be methylated by GST-HYPB protein (Fig. 3B, lane 10), whereas the wild-type H3 (H3N) and the other lysine-to-arginine mutants (N4, N9, and N27) can be methylated (Fig. 3B, lanes 6-9). In contrast, GST-mG9a can methylate these substrates except N4 (Fig. 3B, lanes 1-5). These results indicate that HYPB protein specifically methylates lysine 36 of H3.

The Low Charged Region of HYPB Protein Is Evolutionarily Conserved in Vertebrates and Shows Transcriptional Activation Activity-The fact that the low charged region of HYPB protein displays significant structural features prompted us to investigate the biological function of this region. Generally, functional sequences are subject to evolutionary selection; purifying (negative) selection causes functional sequences to change more slowly than the bulk, nonfunctional sequences (49). Therefore we initially investigated the evolutionary conservation of the structural features of this low charged region. By using the SAPS program, we found similar low charged regions, immediately followed by the WW domains, in the putative homologues of five other vertebrates, namely Rattus norvegicus (Rn), Mus musculus (Mm), Gallus gallus (Gg), Tetraodon nigroviridis (Tn), and Fugu rubripes (Fr). The low charged region of human HYPB protein shares 92, 90, 73, and 42% amino acid identities with its homologues in rat, mouse, chicken, and fish, respectively. We aligned these sequences and compared their charge distribution and amino acid composition, using the average amino acid frequencies of the Swiss-Prot database as a reference (supplemental Fig. 1). The comparisons led to the following observations: 1) all these sequences are rich in uncharged amino acids (>95%; the average frequency in Swiss-Prot is ~77%), 2) they display a high ratio of negatively charged amino acids to positively charged ones (≥3.5:1; the average ratio in Swiss-Prot is 1.06:1), and 3) they are rich in glutamine and proline (≥14.30 and ≥11.90%; the average frequencies in Swiss-Prot are 3.94 and 4.83%, respectively). Thus, statistically, the structural features in the low charged region of HYPB protein are conserved in vertebrates.
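The comparison with the Swiss-Prot background can be expressed as simple fold-over-background ratios. The Python sketch below uses only numbers quoted in the text for the human low charged region (220 of 230 residues uncharged, 9 acidic to 1 basic residue, 14.3% glutamine, 16.5% proline) and the quoted Swiss-Prot averages; the dictionary keys and the function name `fold_over_background` are hypothetical labels, and the ratio is an illustrative summary, not the statistical test performed by SAPS.

```python
# Swiss-Prot averages as quoted in the text.
SWISSPROT = {
    "pct_uncharged": 77.0,
    "neg_to_pos": 1.06,
    "pct_Q": 3.94,
    "pct_P": 4.83,
}

# Human HYPB low charged region, values as quoted in the text.
HYPB_HUMAN = {
    "pct_uncharged": 100.0 * 220 / 230,
    "neg_to_pos": 9 / 1,
    "pct_Q": 14.3,
    "pct_P": 16.5,
}


def fold_over_background(observed: dict, background: dict) -> dict:
    """Ratio of each observed feature to its background value."""
    return {key: observed[key] / background[key] for key in background}


if __name__ == "__main__":
    for feature, fold in fold_over_background(HYPB_HUMAN, SWISSPROT).items():
        print(f"{feature}: {fold:.2f}-fold over Swiss-Prot average")
```

Under these figures, glutamine and proline are each enriched roughly 3.4- to 3.6-fold, and the acidic-to-basic ratio is about 8.5 times the database average.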
To investigate whether the low charged region of HYPB protein could function as a transcriptional regulatory domain, different regions of HYPB protein fused with the DNA-binding domain (DBD) of Gal4 protein were examined for their effects on a Gal4 recognition sites-driven luciferase reporter gene (5Gal4-luc; Fig. 4A). Western blotting indicates the appropriate expression of all constructs, although the expression level of the full-length HYPB construct is relatively low, likely due to the very large insert (data not shown). Although longer HYPB constructs did not display transcriptional activation activity (Fig. 4, A and B, constructs 1 and 2), the Gal4-fused low charged region induced the luciferase reporter up to 500-fold in HEK293 cells and 860-fold in NIH3T3 cells, respectively, relative to the induction by the Gal4 DBD expression vector (Fig. 4B, columns 1-3). A dose response was seen when increasing amounts of the Gal4-fused low charged region construct were transfected into both cell lines (data not shown). These results indicate that the low charged region of HYPB protein possesses transcriptional activation activity.
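The fold induction reported for a reporter assay of this kind is conventionally obtained by normalizing luciferase activity to β-galactosidase activity and then dividing by the normalized value of the Gal4 DBD control. The Python sketch below illustrates that arithmetic only; the function name `fold_induction` and all readings are invented for illustration and are not data from this study.

```python
def fold_induction(luc: float, bgal: float,
                   luc_control: float, bgal_control: float) -> float:
    """Normalized reporter activity relative to a control transfection.

    Each luciferase reading is divided by the beta-galactosidase reading
    from the same sample (transfection efficiency), and the sample ratio
    is then divided by the control ratio.
    """
    if bgal <= 0 or bgal_control <= 0 or luc_control <= 0:
        raise ValueError("normalizing readings must be positive")
    return (luc / bgal) / (luc_control / bgal_control)


if __name__ == "__main__":
    # Invented readings (arbitrary units), for illustration only.
    sample = fold_induction(luc=12000.0, bgal=1.5,
                            luc_control=400.0, bgal_control=2.0)
    print(f"{sample:.0f}-fold over Gal4 DBD control")
```

Here the sample ratio is 8000 and the control ratio is 200, so the invented example corresponds to a 40-fold induction.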
In Vivo Association between HYPB Protein and Hyperphosphorylated RNA Polymerase II-Since HYPB protein possesses histone modification ability, a potent transcriptional activation domain identified herein, and a DNA binding activity reported elsewhere (26), we postulated that it could be directly involved in transcriptional regulation. Furthermore, it is known that the WW domains in several proteins (e.g. Pin1/Ess1p, Nedd4/Rsp5p, and Prp40p) mediate RNA polymerase II (RNAPII) association (34). The C-terminal domain of the RNAPII large subunit is composed of heptapeptide (YSPTSPS) repeats, which can be phosphorylated in different patterns and therefore provide recognition sites for a number of factors (50).
FIGURE 2. ... methyl group onto both histone H3 of core histones (left, lanes 2 and 5, H3) and recombinant N terminus of H3 (right, lane 6, rH3). The GST-HYPB proteins are also methylated (lanes 2, 5, and 6, Enzymes). GST-fused SUV39H1 (residues 82-412), MLL (residues 3745-3966), and mG9a (residues 621-1000) were used as positive controls (lanes 1, 3, and 4). The GST-MLL protein displays no detectable activity in these experiments (lane 3). Asterisk indicates the nonspecific signals, and double asterisks indicate the position of H4. C, nucleosomes purified from HEK293 cells can be methylated by GST-HYPB protein (left, lane 3) but cannot be methylated without enzyme (lane 1) or with GST only (lane 2). The methylation level of nucleosomes is higher than that of core histones (compare lanes 3 and 4). The nucleosomes were extracted with phenol-chloroform and examined with agarose gel (right). D, the auto-methylation activity of GST-HYPB protein is independent of the presence of histone or GST. The auto-methylation signals are consistently detected when adding core histone (lane 1), GST (lane 2), or no substrate (lane 3). E, the R1122H mutation of HYPB protein significantly reduces the H3 HMTase activities toward both core histone (left, H3) and recombinant H3 (right, H3). This mutation also significantly reduces the auto-methylation activities (lanes 2 and 4, Enzymes). In B-E, Coomassie stain (bottom) shows purified proteins, and the autoradiograph (top) shows corresponding 14C-labeled products. WT, wild type.
FIGURE 3. ... 6-10). GST-mG9a (residues 621-1000) is used as positive control (lanes 1-5). Coomassie stain (bottom) shows purified proteins, and the autoradiograph (top) shows corresponding 14C-labeled products.
A coimmunoprecipitation assay was performed to investigate whether HYPB protein could associate with RNAPII in vivo. FLAG-tagged HYPB (residues 915-2061, including the AWS-SET-PostSET domains, low charged region, and WW domain; Fig. 5A) was transfected into HEK293 cells. As a negative control, a transcription factor, STAT5A fused with FLAG, was also transfected separately. Anti-FLAG monoclonal antibody (mAb) M2-linked beads were used to immunoprecipitate FLAG-tagged proteins from extracts of transfected cells. Precipitated complexes were subjected to Western blotting probed with either anti-RNAPII or anti-FLAG antibodies (Fig. 5, B and C). A serine 2-phosphorylated RNAPII-specific mAb H5, a serine 5-phosphorylated RNAPII-specific mAb H14, and an unmodified RNAPII-specific mAb 8WG16 were used to probe different forms of RNAPII. The results indicate that hyperphosphorylated RNAPII (recognized by H5 and H14), but not unmodified RNAPII (recognized by 8WG16), can be coimmunoprecipitated by FLAG-HYPB (915-2061) protein (Fig. 5B). In contrast, the hyperphosphorylated RNAPII cannot be coimmunoprecipitated by the FLAG-STAT5A protein (Fig. 5C). Notably, in the hyperphosphorylated RNAPII complexes coprecipitated with HYPB proteins, both serine 2-phosphorylated and serine 5-phosphorylated RNAPII can be detected, suggesting that HYPB protein may associate with both phosphorylated forms or that these two phosphorylations may occur simultaneously on a single molecule.
To further confirm the association between HYPB protein and hyperphosphorylated RNAPII, Gal4-tagged HYPB proteins were used to perform the coimmunoprecipitation experiments. Both the C terminus (residues 915-2061) and the N terminus (residues 1-995) of HYPB protein were fused with Gal4 DBD (Fig. 5A). Similar coimmunoprecipitation and Western blotting assays were performed, except that anti-Gal4 antibody was used to immunoprecipitate the Gal4-tagged proteins. The results indicate that the Gal4-tagged C terminus of HYPB protein (Fig. 5D), but not the N terminus (Fig. 5E), is sufficient to coimmunoprecipitate with the hyperphosphorylated RNAPII.
The RNAPII Association Region of HYPB Protein-To define the RNAPII association region of HYPB protein, a GST pull-down assay was performed. Initially, we analyzed two regions: residues 915-1211 (including the AWS-SET-PostSET domains; Fig. 6A, construct 1) and residues 1695-2061 (including the low charged region and WW domain; Fig. 6A, construct 2). Furthermore, to determine whether the WW domain could interact with RNAPII as reported for other proteins, we created a WW-deleted construct (Fig. 6A, construct 3). These GST-tagged proteins were expressed in bacteria and purified with glutathione-Sepharose 4B beads (Fig. 6B, bottom panel) and incubated with HEK293 cell lysates for 3 h. Then the complexes were precipitated, followed by Western blotting to detect RNAPII. As a result, the C-terminal region (residues 1695-2061; Fig. 6B, lane 2), but not the AWS-SET-PostSET region (residues 915-1211; Fig. 6B, lane 1), displays the RNAPII association ability. In contrast, deletion of the WW domain does not significantly influence the RNAPII association (Fig. 6B, lane 3), indicating that the WW domain is not necessary for this association.
FIGURE 5. In vivo association between HYPB protein and hyperphosphorylated RNAPII. A, schematic representation of the FLAG- and Gal4-tagged HYPB constructs used in the coimmunoprecipitation assays. B, HEK293 cells were transfected with FLAG-HYPB (residues 915-2061) plasmid. Coimmunoprecipitation assays were performed using anti-FLAG antibody as described under "Experimental Procedures." The complexes, flow-throughs (FT), and 10% input were analyzed by Western blotting (WB) with various antibodies as indicated. C, as a negative control, HEK293 cells were transfected with FLAG-STAT5A plasmid. Similar coimmunoprecipitation and Western blotting as described for B were performed. D and E, HEK293 cells were transfected with Gal4-HYPB (residues 915-2061) plasmid (D) or Gal4-HYPB (residues 1-995) plasmid (E). Coimmunoprecipitation assays were performed with anti-Gal4 antibody. Separate assays using protein A/G-linked beads without antibody were performed as negative controls (control). Western blotting assays were performed as indicated. IP, immunoprecipitation.
To further map the RNAPII association region, five C-terminal mutants with deletions of different regions were used (Fig. 6A, constructs 4-8). The results indicate that the C-terminal region downstream of the WW domain is sufficient for the association with hyperphosphorylated RNAPII (Fig. 6C, lanes 6-8), whereas the low charged region is not necessary (Fig. 6C, lanes 4 and 5). Thus, the hyperphosphorylated RNAPII association region is localized to the last 142 amino acids of HYPB protein.

DISCUSSION
Recent studies have demonstrated the functional significance of histone methylation, particularly through identification and characterization of HMTases that mediate site-specific methylation of histone lysine and arginine residues. Major progress has been made in understanding the functions of some histone lysine methylations. For example, histone H3-K9 methylation mediated by SUV39H1 is involved in processes such as heterochromatin organization (51,52) and X inactivation (4,5), whereas H3-K27 methylation mediated by the EED-EZH2 complex is correlated with Hox gene silencing (53,54) and also X inactivation (55). Recently, it has been reported that in metazoans both di- and trimethylation of H3-K36 are distributed on active genes and that the levels of the methylation peak toward the 3′ end of the transcribed regions. These studies clearly suggest that in metazoans H3-K36 methylation is associated with the process of active transcription, although the mechanism and the function are still unclear (56). We report here the identification of a novel human H3-K36-specific HMTase, HYPB, which associates with hyperphosphorylated RNAPII and contains a transcriptional activation domain. These results suggest that HYPB protein may play an important role in transcriptional regulation and that it could serve as an ideal model to interpret the nature of H3-K36 methylation in mammals.
FIGURE 6. ... B and C, the GST-HYPB proteins were purified from bacteria (bottom panels) and used for GST pull-down assays as described under "Experimental Procedures." GST protein was used as a negative control. The complexes were analyzed by Western blotting (WB) with various antibodies as indicated. In B, the RNAPII coprecipitated with GST-HYPB proteins migrates slower than that of the input (asterisks), because these portions of RNAPII are rich in hyperphosphorylated forms. This shift is not visible when the time of electrophoresis is reduced, as shown in C.
Distinct Functions of HYPB HMTase-HYPB is not the only H3-K36 methyltransferase in humans. For example, NSD1 has been identified as an HMTase with selectivity for H3-K36 and H4-K20 (18). Although several HMTases can methylate the same site of histones, their functions are not exactly the same. This is largely because they have diverse structural features that are correlated with distinct functions. In addition to the HMTase activity, HYPB is able to methylate itself in vitro. The two positive controls, SUV39H1 and mG9a, did not show this activity in our experimental system, in agreement with previous observations (8,28). The biological function of this activity is still unclear, but it is possible that the auto-methylation may influence the HMTase activity in a direct or indirect manner. Furthermore, our observation also implies that the substrates of HYPB may not be restricted to histones.
We have characterized a low charged region of HYPB protein as a novel transcriptional activation domain. Although high net negative charge is a common characteristic among transcriptional activation domains, the structural feature(s) required for the transcriptional activation activity are still not well understood (37,38). For the low charged region of HYPB protein, the high net negative charge and the glutamine and proline richness may all contribute to the transcriptional activation activity. Notably, a C-terminal portion of HYPB protein containing a truncated low charged region, with a negatively charged amino acid-to-positively charged amino acid ratio of 1.15:1, also showed significant, but weaker, transcriptional activation activity (Fig. 4B, column 4). This result suggests that the transcriptional activation activity of the low charged region may not be completely provided by the high net negative charge. The precise mechanism of this activity remains to be studied. Generally, the activation domain may interact with certain general transcription factors (GTFs), and this interaction may recruit RNAPII holoenzymes and modify their activity. Although previous studies have identified HYPB protein as a DNA-binding factor (26), we suppose, based on the structural and functional studies, that HYPB protein may act as a cofactor involved in transcriptional regulation through interacting with GTFs in certain contexts.
Another finding in this work is the association between HYPB protein and hyperphosphorylated RNAPII. The unphosphorylated RNAPII preferentially enters the preinitiation complex, where it is subsequently phosphorylated during or shortly after initiation. Transcript elongation is catalyzed by hyperphosphorylated RNAPII (50). Certain transcriptional activators (e.g. VP16 and p53) can affect elongation, as well as initiation. Both processes may involve the contacts between transcriptional activation domains and GTFs (57). Consequently, HYPB protein may be recruited by hyperphosphorylated RNAPII to the transcriptional machinery and play a role in transcriptional regulation. Very recently, it has been reported that S. cerevisiae SET2 protein contains a C-terminal domain, namely Set2 Rpb1 interacting domain, which directly interacts with phosphorylated RNAPII and that this domain appears to be conserved in potential SET2 homologues (45). Notably, the RNAPII association region of HYPB identified herein displays relatively high identity with the Set2 Rpb1 interacting domain of SET2 (Fig. 1B), further supporting that HYPB and SET2 are functional homologues. Moreover, the SET2 data suggest that HYPB may directly interact with the hyperphosphorylated RNAPII, although the precise mechanism of this interaction remains to be studied.
Evolution of HYPB HMTase-In humans, there are at least 40 SET domain-containing proteins, some of which have been characterized as HMTases (6). Evolutionary analysis of these proteins and their homologues in other species may contribute to the understanding of their biological functions (58,59). To outline the evolution of HYPB protein, we have extensively searched and analyzed its closest homologues in humans and in lower organisms. Furthermore, the "reciprocal-best-hits" method was used to predict the orthologous relationship (60). As shown in Fig. 7A, four human proteins, namely ASH1L, NSD1, WHSC1, and WHSC1L1, are clustered into a subfamily with HYPB and its putative orthologues from mouse, fish, fruit fly, and yeast. These members are further divided into two groups: one of them contains HYPB and its putative orthologues, whereas the other contains the above-mentioned human proteins except HYPB (Fig. 7A). This analysis, in turn, supports the orthologue prediction and provides an evolutionary outline of HYPB HMTase from yeast to human.
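The reciprocal-best-hits criterion can be stated compactly: two proteins from different proteomes are predicted orthologues when each is the other's top-scoring hit. The Python sketch below illustrates that logic under stated assumptions; the function names, the protein labels (`humA`, `ystX`, and so on), and the similarity scores are all hypothetical and do not come from the searches performed in this study.

```python
def best_hits(scores: dict) -> dict:
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score
    (for example a bit score from a database search).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: dict, b_vs_a: dict) -> list:
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Invented scores for two hypothetical proteomes.
    human_vs_yeast = {("humA", "ystX"): 300, ("humA", "ystY"): 50,
                      ("humB", "ystX"): 120, ("humB", "ystY"): 40}
    yeast_vs_human = {("ystX", "humA"): 310, ("ystX", "humB"): 110,
                      ("ystY", "humB"): 45, ("ystY", "humA"): 30}
    print(reciprocal_best_hits(human_vs_yeast, yeast_vs_human))
```

In the invented example, `humB` also hits `ystX` best, but `ystX` prefers `humA`, so only the `humA`/`ystX` pair satisfies the reciprocal condition. This mirrors how a paralogue in the same subfamily can be similar to a yeast protein without being called its orthologue.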
According to this analysis, S. cerevisiae SET2 protein is considered the putative orthologue of HYPB protein. Interestingly, the H3-K36 methylation mediated by SET2 depends on the phosphorylation of RNAPII and influences RNAPII elongation (42,43,45). Thus, a model for the role of SET2 in yeast transcriptional regulation can be proposed (Fig. 7B, top). Although much remains to be studied about the physiological function of HYPB protein, both the sequence similarities and the biochemical functions suggest that HYPB may exert functions in mammals similar to those of SET2 in yeast (Fig. 7B, bottom). However, there are some interesting and potentially significant differences between HYPB and SET2 and the related biological processes in humans and yeast (Fig. 7B, compare human to yeast). First, HYPB protein contains a transcriptional activation domain that is conserved in vertebrates, implying that it may interact with GTFs in certain contexts. Because the HYPB sequence is much longer than that of SET2, it is possible that HYPB contains other distinct functional regions that have not been uncovered. Second, whereas SET2 is the only H3-K36 HMTase in yeast, humans have several potential H3-K36 HMTases. In addition to HYPB and NSD1, which have been defined, WHSC1, WHSC1L1, and ASH1L may also possess H3-K36 HMTase activities. Since there is no evidence to suggest that the human H3-K36 HMTases other than HYPB contain RNAPII-interaction domains, H3-K36 methylation in humans may not fully depend on the phosphorylation of RNAPII. Finally, HYPB has been suggested to be involved in the pathogenesis of HD (see below for more discussion), and most of the potential human H3-K36 HMTases are associated with diseases (Fig. 7B) (61-65). Furthermore, NSD1-deficient mice died before E10.5 (18), suggesting that the NSD1 gene is essential for embryo development and that other H3-K36 HMTases, such as HYPB, cannot complement its functions. Taken together, the evolution of HYPB protein, as well as the related biological processes, is consistent with the complexity of an organism.
An Implication for the Molecular Pathogenesis of HD-A potential application of our findings is to elucidate the molecular mechanism underlying the pathogenesis of HD. HD is caused by a CAG repeat expansion that is translated into an abnormally long polyglutamine (poly(Q)) tract in the huntingtin protein. The molecular pathogenesis of HD has not been fully elucidated, but recent microarray studies suggest that alterations in gene expression could be involved (66). It has been reported that the poly(Q) tract of huntingtin interacts with a C-terminal region (residues 1875-2001) of HYPB, likely via recognizing the WW domain (24). According to our results, the huntingtin interaction region of HYPB includes the WW domain and a portion of the RNAPII association region. In view of the fact that the WW domain contains only 29 amino acids and is immediately followed by the RNAPII-association region, it is possible that the huntingtin-HYPB interaction interferes with the RNAPII association of HYPB, thereby influencing the transcription that HYPB regulates. Furthermore, the huntingtin interaction may alter the cellular localization of HYPB protein, so that both the HMTase and transcriptional activation activities would become aberrant. Further studies are required to test these hypotheses.
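The statement that the huntingtin interaction region includes a portion of the RNAPII association region can be checked with simple interval arithmetic. The Python sketch below assumes that HYPB is 2061 residues long, which is inferred here from the C-terminal constructs ending at residue 2061 and is not stated explicitly in the text; the function names are hypothetical.

```python
HYPB_LENGTH = 2061  # assumption: inferred from constructs ending at residue 2061


def last_n_residues(length: int, n: int) -> tuple:
    """Inclusive (start, end) coordinates of the C-terminal n residues."""
    return (length - n + 1, length)


def overlap(a: tuple, b: tuple):
    """Inclusive overlap of two residue intervals, or None if disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


if __name__ == "__main__":
    rnapii_region = last_n_residues(HYPB_LENGTH, 142)  # last 142 residues
    huntingtin_region = (1875, 2001)                    # from reference 24
    shared = overlap(rnapii_region, huntingtin_region)
    print(rnapii_region, shared, shared[1] - shared[0] + 1)
```

Under this length assumption, the RNAPII association region would span residues 1920-2061, and 82 residues (1920-2001) of it would fall within the reported huntingtin interaction region.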