Structural Basis for Substrate Preference of SMYD3, a SET Domain-containing Protein Lysine Methyltransferase*

SMYD3 is a SET domain-containing N-lysine methyltransferase associated with multiple cancers. Its reported substrates include histones (H3K4 and H4K5), vascular endothelial growth factor receptor 1 (VEGFR1 Lys831) and MAP3 kinase kinase (MAP3K2 Lys260). To reveal the structural basis for substrate preference and the catalytic mechanism of SMYD3, we have solved its co-crystal structures with VEGFR1 and MAP3K2 peptides. Our structural and biochemical analyses show that MAP3K2 serves as a robust substrate of SMYD3 because of the presence of a phenylalanine residue at the −2 position. A shallow hydrophobic pocket on SMYD3 accommodates the binding of the phenylalanine and promotes efficient catalytic activities of SMYD3. By contrast, SMYD3 displayed a weak activity toward a VEGFR1 peptide, and the location of the acceptor lysine in the folded kinase domain of VEGFR1 requires drastic conformational rearrangements for juxtaposition of the acceptor lysine with the enzymatic active site. Our results clearly revealed structural determinants for the substrate preference of SMYD3 and provided mechanistic insights into lysine methylation of MAP3K2. The knowledge should be useful for the development of SMYD3 inhibitors in the fight against MAP3K2 and Ras-driven cancer.

Su(var), E(z), Trithorax (SET) domain-containing proteins catalyze the transfer of the methyl group from S-adenosyl-L-methionine (SAM) onto the Nε group of a lysine in the substrate protein. The best known and the most common substrates of the SET domain family of enzymes are histones. The tight site and methylation state specificities of SET domain proteins reflect the important roles of histone lysine methylation in epigenetic control of eukaryotic gene expression and regulation of higher order chromatin structure (1-5). Some SET domain proteins also methylate non-histone substrates, although only a small number of them are known to date. One example is the methylation of p53 by SET7/SET9 (6), which was first identified as an H3K4 monomethylase (7). There are more than 90 SET domain proteins in the human proteome, many of which remain poorly characterized, and more non-histone substrates may turn up through further studies. Recently, a group of SET and myeloid-Nervy-DEAF-1 (MYND) domain-containing proteins (SMYD) have been characterized as versatile lysine methyltransferases (8-10). SMYD1 and SMYD3 were reported to methylate histone H3K4, and SMYD3 also methylates H4K5. SMYD2 has been shown to catalyze dimethylation at H3K36 and repress the transcription of reporter genes. SMYD proteins also have non-histone substrates. It has been reported that p53 and estrogen receptor α are good substrates of SMYD2, and VEGFR1 and MAP3K2 are substrates of SMYD3 (11-14).
SMYD3 was found to be significantly overexpressed in multiple types of cancer, including colorectal, liver, and breast cancer (15-18). It was proposed that the reported H3K4 trimethylase activity, enhanced by heat shock protein HSP90, was responsible for the oncogenic property of SMYD3 (9). SMYD3 has also been shown to methylate histone H4K5 and regulate cancer cell phenotypes (19). VEGFR1, which is a single-pass membrane protein with an extracellular ligand-binding domain and a cytoplasmic tyrosine kinase domain (20-23), has also been reported to be a substrate of SMYD3 (12). SMYD3 catalyzes the methylation of Lys831 of VEGFR1 in vitro and in HEK293 cells. This methylation was reported to enhance the autophosphorylation and kinase activity of VEGFR1, which is important for its role in cancer cell progression. A more recent study identified MAP3K2 as a cytoplasmic substrate of SMYD3. That study shows that in tumors or in LAC and PDAC cells, nearly all SMYD3 is located in the cytoplasm (13, 24). In vivo studies show that SMYD3 catalyzes the methylation of MAP3K2 at Lys260, activates the MAP kinase signaling module, and promotes Ras-driven tumorigenesis.
In this study, we set out to investigate the substrate preference of SMYD3 through structural and biochemical approaches. We showed by in vitro methyltransferase activity assays that histones and VEGFR1 are poor substrates of SMYD3 compared with MAP3K2. By solving the high resolution crystal structures of SMYD3 in complex with VEGFR1 and MAP3K2 peptides, we determined the structural basis for the substrate specificity of SMYD3, and our findings reinforce the notion that MAP3K2 is a physiological substrate of SMYD3. These results also provide novel insights for the design of SMYD3 inhibitors, which may be beneficial for the treatment of MAP3K2 and Ras-driven cancer.

Experimental Procedures
Protein Preparations-cDNA encoding full-length human SMYD3 was amplified by PCR and cloned into an engineered pET-28a-SMT3 vector between the EcoRI and XhoI restriction sites. The plasmid expresses a His6-SUMO N-terminally tagged fusion protein, His-SUMO-SMYD3. The fusion protein was expressed in the Escherichia coli BL21(DE3) codon Plus RIL strain by inducing with 0.2 mM isopropyl β-D-thiogalactopyranoside at 16°C for 20 h when the cell density reached A600 ~0.9. The cells were then harvested by centrifugation, resuspended in a lysis buffer (20 mM Tris, pH 8.0, 300 mM NaCl, 10 mM imidazole), and lysed by sonication. Cell debris was removed by centrifugation, and the supernatant was loaded onto a nickel-nitrilotriacetic acid column (Novagen) pre-equilibrated with lysis buffer. After washing with 10 column volumes of washing buffer (20 mM Tris-HCl, pH 8.0, 300 mM NaCl, 20 mM imidazole), the target protein was eluted with the elution buffer (20 mM Tris-HCl, 300 mM NaCl, 300 mM imidazole) and treated with SUMO protease at 4°C for 2 h to cleave the His-SUMO tag. Then the samples were diluted with buffer A (20 mM Tris-HCl, pH 8.0, 100 mM NaCl, 5% glycerol, 1‰ β-mercaptoethanol) and loaded onto a HiTrap™ SP HP column (GE Healthcare). The untagged protein was eluted with a 100 to 1,000 mM NaCl gradient and was pooled, concentrated, and further purified through a HiLoad™ 16/60 Superdex™ 75 column (GE Healthcare) pre-equilibrated with 20 mM Tris-HCl, pH 8.0, 100 mM NaCl, 5% glycerol, 1‰ β-mercaptoethanol. High purity fractions were pooled and concentrated for crystallization.
Site-directed mutagenesis of SMYD3 was performed using the TaKaRa MutanBEST kit and verified by DNA sequencing. Expression and purification procedures for mutant proteins were the same as those for the wild-type protein.
Crystallization-The purified SMYD3 protein was concentrated to ~8-9 mg/ml, mixed with 1 mM SAH and 5-6 mM MAP3K2 peptide, and incubated on ice for 1 h before being used for crystal screening. Crystallization was performed using the hanging drop vapor diffusion method, with an equal volume of the protein solution and the reservoir solution. Crystals were obtained in a solution containing 3.4 M sodium acetate (pH 7.5) at 16°C. For the SMYD3-VEGFR1 complex, SMYD3 was crystallized first in a solution containing 2.8 M sodium acetate (pH 7.0) at 20°C, and the VEGFR1 peptide was soaked in by the addition of 10 mM peptide.
Data Collection and Structure Determination-By adding 25 mM peptide to the crystallization drop, the SMYD3-MAP3K2 crystals were soaked in a higher concentration of MAP3K2 peptide for ~2 h before x-ray diffraction data collection. The crystals were flash-frozen in liquid nitrogen using a cryoprotectant prepared from the reservoir solution supplemented with 15% (v/v) glycerol. Diffraction data were collected at Beamline BL17U of the Shanghai Synchrotron Radiation Facility using an ADSC Q315r detector. The data were processed using HKL2000 software (25). Single-wavelength anomalous dispersion phasing using anomalous signals from three endogenous zinc atoms was carried out to solve the apo structure of SMYD3. The complex structures containing substrate peptide of MAP3K2 or VEGFR1 were solved by molecular replacement with the PHASER software (26), using the apo-SMYD3 structure as the search model. COOT (27) and PHENIX (28) were used for model rebuilding and refinement. Detailed statistics for data collection and refinement are shown in Table 1. Structure figures were prepared using PyMOL.
In Vitro Methyltransferase Assay-A 20-μl reaction mixture containing 1 μM 3H-labeled S-adenosyl-L-methionine (PerkinElmer Life Science), 1.5 μg of recombinant SMYD3, and 1 μg of peptide in methyltransferase assay buffer (20 mM HEPES, pH 8.0, 10 mM KCl, 20 mM MgCl2, 10 mM DTT, 5% (v/v) glycerol) was incubated for 1 h at 30°C. The reaction products were separated by 13% SDS-PAGE, transferred onto PVDF membranes, and then subjected to autoradiography. For quantification, the membranes were stained by Coomassie Blue G250, followed by liquid scintillation counting for each peptide band.
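The amounts in this reaction mixture can be checked with simple arithmetic. The Python sketch below is illustrative only: it assumes the units are microliters, micromolar, and micrograms, and it takes the molecular mass of SMYD3 as a caller-supplied parameter. The 49 kDa value in the usage lines is an approximate figure assumed here, not a number stated in the text, and the function names are hypothetical.

```python
"""Back-of-the-envelope amounts for a small-volume enzyme reaction."""


def picomoles_from_concentration(conc_micromolar: float, volume_microliters: float) -> float:
    """Amount of solute in pmol; 1 uM x 1 ul equals 1 pmol."""
    return conc_micromolar * volume_microliters


def picomoles_from_mass(mass_micrograms: float, mol_mass_kda: float) -> float:
    """Amount of protein in pmol; ug divided by kDa gives nmol."""
    if mol_mass_kda <= 0:
        raise ValueError("molecular mass must be positive")
    return mass_micrograms / mol_mass_kda * 1000.0


# Assumed units: 1 uM SAM in a 20 ul reaction.
sam_pmol = picomoles_from_concentration(1.0, 20.0)

# Assumed molecular mass of roughly 49 kDa for SMYD3 (not given in the text).
smyd3_pmol = picomoles_from_mass(1.5, 49.0)

print(f"SAM: {sam_pmol:.1f} pmol, SMYD3: {smyd3_pmol:.1f} pmol")
```

Under these assumptions the labeled SAM and the enzyme are present in amounts of the same order, so the assay reports relative activities between substrates rather than steady-state kinetic parameters.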
Computational Docking of VEGFR1 Kinase Domain to the SMYD3 Structure-Comparison between the structures of SMYD1 (PDB code 3N71) and SMYD3 shows that SMYD1 may have a more open conformation in the substrate-binding cleft. An open conformation of SMYD3 was built by homology modeling using the structure of SMYD1 as a template. The DynDom program (29) was used to simulate the conformational rotation of the C-terminal domain of SMYD3. An extremely large widening of the substrate-binding cleft of SMYD3 is required if the structure of the VEGFR1 kinase domain is docked in using the ZDOCK program (30) in Discovery Studio (Accelrys).

Results
Substrate Preference of SMYD3-Previous studies suggested that SMYD3 may have several nuclear and cellular substrates, including histones H3 and H4, VEGFR1, and MAP3K2. We first set out to examine the substrate preference of SMYD3 in vitro. 26-amino acid peptides encompassing the reported lysine methylation sites were used in the in vitro methyltransferase assay (MTA), including histones H3 (a.a. 1-26, Lys4 site) and H4 (a.a. 1-26, Lys5 site), VEGFR1 (a.a. 820-845, Lys831 site), and MAP3K2 (a.a. 249-274, Lys260 site). MTA results show that the catalytic activity of SMYD3 on histones H3K4 and H4K5 is too weak to be detected. In contrast, under the same conditions, robust methylase activity of SMYD3 was detected with the MAP3K2 peptide as the substrate, whereas the activity for the VEGFR1 peptide was considerably lower. The catalytic activity of SMYD3 on the MAP3K2 peptide is ~14-fold higher than that on the VEGFR1 peptide (Fig. 1). Furthermore, SMYD3 catalyzes methylation equally well on the unmethylated, monomethylated, and dimethylated MAP3K2 peptides (Fig. 1). These results demonstrate that SMYD3 is a trimethylase catalyzing mono-, di-, and trimethylation on lysine residues, but histones and VEGFR1 are poor substrates of SMYD3 compared with MAP3K2.
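A fold difference such as the one reported here is a ratio of background-corrected signals. The following Python sketch shows one way such a ratio could be computed from scintillation counts. The count values and the background subtraction step are hypothetical, chosen only for illustration, and are not data or procedures from this study.

```python
"""Fold difference between two background-corrected scintillation signals."""


def fold_difference(signal_a: float, signal_b: float, background: float = 0.0) -> float:
    """Return (signal_a - background) / (signal_b - background)."""
    corrected_a = signal_a - background
    corrected_b = signal_b - background
    if corrected_b <= 0:
        raise ValueError("reference signal must exceed background")
    return corrected_a / corrected_b


# Hypothetical counts per minute, invented for this example.
strong_substrate_cpm = 14100.0
weak_substrate_cpm = 1100.0
blank_cpm = 100.0

ratio = fold_difference(strong_substrate_cpm, weak_substrate_cpm, blank_cpm)
print(f"fold difference: {ratio:.1f}")
```

Subtracting a blank before taking the ratio matters most when the weaker signal is close to background, as would be the case for a poor substrate.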
Structures of SMYD3 in Complex with MAP3K2 and VEGFR1 Peptides-We solved the crystal structure of full-length SMYD3 in complex with substrate peptides of MAP3K2 (a.a. 256-265, EKFGK(me0)GGTYP) and VEGFR1 (a.a. 828-834, KLGK(me2)SLG) in the presence and absence of S-adenosylhomocysteine (SAH), a reaction product of the methyl group donor SAM, at 2.7 and 2.4 Å resolution, respectively. The overall structure of the ternary complex of SMYD3 with MAP3K2 and SAH is very similar to the previously reported apo form of SMYD3 (PDB code 3QWP) and to the binary complexes of SMYD3 with co-factors SAM (PDB code 3MEK) or SAH (PDB code 3OXL) (31) and with the small molecule inhibitor sinefungin (PDB codes 3PDN and 3RU0) (32, 33) (supplemental Fig. S1). In brief, SMYD3 is composed of three domains, an N-terminal catalytic domain (SET and post-SET), a MYND domain inserted into the SET domain, and a C-terminal domain (Fig. 2A). A topological diagram of the arrangement of secondary structure elements of SMYD3 is shown in supplemental Fig. S2. Similar to that found in a previous study (31), three zinc ions are bound to one SMYD3 molecule; two of them are in the MYND domain, and the third one is in the post-SET region. The zinc ions are coordinated either by four cysteine residues or by three cysteine residues and one histidine residue. Co-factor SAH is bound at the SAM-binding pocket of SMYD3, which is composed of three loop regions, including the β1-β2 loop and the 1-2 loop of the SET domain and the α6-α7 loop of the post-SET region (supplemental Fig. S1B). The surrounding environment of the SAH molecule is highly similar to that of SAM (in 3MEK) and the SAM analog sinefungin (in 3PDN) (supplemental Fig. S1B).
The MAP3K2 and VEGFR1 peptides bind to the same amphiphilic pocket of SMYD3, which is a deep cavity embraced by the SET domain, the post-SET region, and the C-terminal domain (Fig. 2, A and B). All of the residues of the MAP3K2 and VEGFR1 peptides have respectable electron density, allowing unambiguous assignment of all residues except the first residue, Glu256, of MAP3K2 (supplemental Fig. S3). Substrate peptides principally interact with the SET domain residues of SMYD3 by intermolecular hydrogen bonds and hydrophobic interactions. Detailed interactions between the MAP3K2 peptide, the SAH molecule, and SMYD3 are shown in Fig. 2C. Lys260 is stabilized through hydrophobic contacts with the phenyl rings of Phe183, Tyr239, and Tyr257. The distance between the Nε group of Lys260 and the sulfur atom of SAH is ~3.6 Å, which is suitable for the transfer of the methyl group when a SAM molecule, with the [...] The SAH molecule is stabilized both by π stacking and by hydrogen bonds as previously described (31, 32) (Fig. 2C).
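An interatomic distance such as the one between the lysine Nε atom and the SAH sulfur is the Euclidean distance between two sets of Cartesian coordinates from the refined model. The Python sketch below shows the calculation. The coordinates are hypothetical values invented for illustration and are not taken from the deposited structures.

```python
"""Euclidean distance between two atoms given Cartesian coordinates in angstroms."""
import math
from typing import Tuple

Coordinate = Tuple[float, float, float]


def atom_distance(atom_a: Coordinate, atom_b: Coordinate) -> float:
    """Return the straight-line distance between two atoms, in angstroms."""
    return math.dist(atom_a, atom_b)


# Hypothetical coordinates (angstroms), invented for this example.
lysine_nz: Coordinate = (0.0, 0.0, 0.0)
sah_sulfur: Coordinate = (2.0, 2.0, 2.2)

separation = atom_distance(lysine_nz, sah_sulfur)
print(f"N-epsilon to sulfur: {separation:.2f} angstroms")
```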
In the SMYD3-VEGFR1 binary structure, the peptide binds to the same pocket as that bound by the MAP3K2 peptide (Fig. 2, B and D). Seven residues of VEGFR1 are visible in the structure. The first four residues of VEGFR1 (Lys828-Lys831) can be aligned with the corresponding MAP3K2 residues (Lys257-Lys260), whereas the last three residues point in different directions and are less involved in interaction with SMYD3 (Fig. 2D). VEGFR1 interacts with SMYD3 in a manner similar to the MAP3K2 peptide. The dimethylated Lys831 residue of VEGFR1 is likewise surrounded by hydrophobic residues of SMYD3, including Phe183, Tyr239, and Tyr257. Intermolecular hydrogen bonds are nearly identical in the two complex structures, except that in the structure of the VEGFR1 complex, the side chain conformations of Glu192 and Asp241 of SMYD3 are altered by interaction with the side chains of Lys828 and Ser832 of VEGFR1, respectively.
Structural Basis for Substrate Preference of SMYD3-Structural comparison between the MAP3K2- and VEGFR1-bound structures shows that the first four residues superimpose well, whereas the conformations of the remaining residues are more divergent (Fig. 2D). This observation suggests a common substrate-binding mechanism of SMYD3 involving the first four residues of the substrate peptides. Sequence alignment of the reported substrates of SMYD3, including MAP3K2, VEGFR1, and histones H3 and H4, shows that major differences lie at the −2 position and the +3 and +4 positions with respect to the acceptor lysine of the substrates (Fig. 3A). The residues at the −2 position are hydrophobic in MAP3K2 (Phe258) and VEGFR1 (Leu829). By contrast, an arginine residue is found at the corresponding position of histones H3 and H4. As shown in the complex structures, Phe258 of MAP3K2 and Leu829 of VEGFR1 are inserted into a shallow hydrophobic pocket of SMYD3 formed by Val178, Ile179, and Val195 (Figs. 2B and 3B). It is energetically unfavorable to place a positively charged arginine residue into the hydrophobic pocket. Furthermore, it is difficult to accommodate the long side chain of an arginine residue in the shallow pocket. Residues at the +3 and +4 positions are similar in VEGFR1 and H3K4 (Gly834-Arg835 and Ala7-Arg8, respectively), but they differ greatly from those in MAP3K2 and H4K5 (Thr263-Tyr264 and Lys8-Gly9, respectively). Thr263 and Tyr264 of MAP3K2 interact with SMYD3 mainly through their main chain amide groups, and Gly834 of VEGFR1 does not interact with SMYD3 directly.
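The positional comparison above can be reproduced by indexing each peptide relative to its acceptor lysine. The Python sketch below is a minimal illustration. The MAP3K2 and VEGFR1 sequences are the crystallized peptides given in the text; the histone H3 and H4 N-terminal sequences are the first ten residues of the canonical histone tails, supplied here as an assumption, and the function and variable names are hypothetical.

```python
"""Look up residues at fixed offsets from the acceptor lysine of each peptide."""
from typing import Dict, Optional, Tuple

# name -> (sequence, residue number of first letter, residue number of acceptor lysine)
PEPTIDES: Dict[str, Tuple[str, int, int]] = {
    "MAP3K2": ("EKFGKGGTYP", 256, 260),  # crystallized peptide from the text
    "VEGFR1": ("KLGKSLG", 828, 831),     # crystallized peptide from the text
    "H3": ("ARTKQTARKS", 1, 4),          # canonical H3 tail (assumed here)
    "H4": ("SGRGKGGKGL", 1, 5),          # canonical H4 tail (assumed here)
}


def residue_at_offset(sequence: str, first: int, acceptor: int, offset: int) -> Optional[str]:
    """Return the one-letter residue at acceptor + offset, or None if outside the peptide."""
    index = acceptor + offset - first
    if 0 <= index < len(sequence):
        return sequence[index]
    return None


def position_table(offsets=(-2, 3, 4)) -> Dict[str, Dict[int, Optional[str]]]:
    """Tabulate residues at the requested offsets for every peptide."""
    return {
        name: {off: residue_at_offset(seq, first, acceptor, off) for off in offsets}
        for name, (seq, first, acceptor) in PEPTIDES.items()
    }


table = position_table()
for name, row in table.items():
    print(name, row)
```

The +4 position of VEGFR1 returns `None` because the crystallized peptide ends at Gly834; the Arg835 cited in the text lies outside it.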
Next, we tested the roles of residues at the −2, +3, and +4 positions of the substrate by in vitro MTA using mutated MAP3K2 peptides, including the −2 position mutants F258L (changed to a leucine residue as in VEGFR1) and F258R (changed to an arginine residue as in histones H3 and H4), and a double mutant, T263G/Y264R (changed to glycine and arginine residues as in VEGFR1), carrying changes at the +3 and +4 positions. MTA results showed that the enzymatic activity of SMYD3 was greatly reduced when the F258L or F258R mutant peptides of MAP3K2 were used as substrates (Fig. 3C). The catalytic activity of SMYD3 toward the F258L peptide decreased ~6-fold compared with that toward the wild-type peptide, whereas no methylation activity was detected when an F258R peptide was used as a substrate. In contrast, SMYD3 methylates the +3 and +4 position mutant peptide T263G/Y264R at a level comparable with that of the wild-type MAP3K2 peptide. These results indicate that the phenylalanine residue at the −2 position of MAP3K2 is crucial for determining the substrate preference of SMYD3.
As previously noted, Phe258 of MAP3K2 and Leu829 of VEGFR1 are situated in a shallow hydrophobic pocket of SMYD3 formed by residues with hydrophobic side chains including Val178, Ile179, and Val195. However, two serine residues, Ser101 and Ser182, are located at the perimeter of this pocket as well (Fig. 3B). The side chain hydroxyl groups of the serine residues are hydrophilic and may not be optimal for the hydrophobic environment surrounding the phenyl ring of Phe258. To test our prediction that the hydrophobic environment of the shallow SMYD3 pocket is important for substrate binding, we changed the composition of this pocket in two opposite ways. First, we changed the existing hydrophobic residues to smaller, less hydrophobic residues and tested their enzymatic activities by in vitro MTA using the MAP3K2 peptide as a substrate. The results show that the V195A mutant lost approximately half of its activity compared with the wild-type SMYD3, whereas the activity of the V195T mutant decreased ~5-fold (Fig. 3D). Similarly, an I179A mutation reduced the SMYD3 activity to a level comparable with that of the V195T mutant. Second, we changed the hydrophilic serines to more hydrophobic alanines. Our prediction implies that the serine to alanine mutations should be beneficial to the methylase activity of SMYD3. Reassuringly, we find that both the S101A and S182A mutants of SMYD3 showed ~2-fold higher activities than the wild-type enzyme (Fig. 3D). Therefore, we conclude that the implicated shallow hydrophobic pocket is a crucial determinant of substrate specificity of SMYD3.
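The direction of each pocket mutation can be illustrated with a side chain hydropathy scale. The Python sketch below uses the Kyte-Doolittle hydropathy index, which is an assumption introduced here for illustration; the study itself does not report using any hydropathy scale, and the helper names are hypothetical. A negative change means the substituted residue is less hydrophobic than the original.

```python
"""Change in Kyte-Doolittle hydropathy for single-residue substitutions."""
from typing import Dict

# Kyte-Doolittle hydropathy index for the residues involved in the pocket mutations.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8,
    "I": 4.5,
    "S": -0.8,
    "T": -0.7,
    "V": 4.2,
}


def hydropathy_change(mutation: str) -> float:
    """Return hydropathy(mutant) - hydropathy(wild type) for a label such as 'V195A'."""
    wild_type, mutant = mutation[0], mutation[-1]
    return round(KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild_type], 1)


# Mutations tested in the text.
POCKET_MUTATIONS = ["V195A", "V195T", "I179A", "S101A", "S182A"]

changes = {mutation: hydropathy_change(mutation) for mutation in POCKET_MUTATIONS}
for mutation, delta in changes.items():
    print(f"{mutation}: {delta:+.1f}")
```

On this scale the three substitutions that reduced activity lower the hydropathy of the pocket, and the two serine to alanine substitutions that raised activity increase it, which matches the direction of the measured effects without implying a quantitative relationship.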

Discussion
Conflicting claims of various substrates for SMYD3 have been made, including histone H3 at lysine 4, histone H4 at lysine 5, VEGFR1 at lysine 831, and MAP3K2 at lysine 260 (9, 12, 13, 19). Could all of them be correct, only one, or some? The answer to this question is important for the interpretation of experimental data, but arriving at a definitive answer may be complicated because in vivo conditions, such as interactions with other cellular factors, could alter the activity of SMYD3. Our approach to this question is to examine the intrinsic properties of SMYD3 using a combined structural and biochemical analysis, with the aim of identifying key elements in SMYD3 that determine its substrate preference. This information may be used in conjunction with in vivo analysis to aid reasonable interpretation and prediction of the biological functions of SMYD3. Judging by our structural and biochemical results, we conclude that lysine 260 of MAP3K2 is the most likely substrate of SMYD3, because of the presence of a shallow hydrophobic substrate-binding pocket hosting the binding of the −2 residue of the substrate peptide.
Previous structural and biochemical studies have examined the important roles of residues in the SAM-binding pocket and the lysine-binding channel for the catalytic activities of SMYD3 using histone mixtures extracted from calf thymus as substrate. These residues include Phe259, Asn132, and Tyr124 in the SAM-binding pocket and Tyr239 and Phe183 in the lysine-binding channel (31). These results agree well with our structural information derived from the ternary structure of SMYD3 in complex with MAP3K2 and SAH (Fig. 2C). In this study, we identified MAP3K2 as the preferred substrate of SMYD3 compared with VEGFR1 and histones H3 and H4. Our analysis shows that the identity of the amino acid at the −2 position of the substrate peptides is key for being an optimal substrate of SMYD3. The shallow hydrophobic pocket of SMYD3 that hosts the binding of the −2 position phenylalanine residue of MAP3K2 is critical for substrate binding and enzymatic activity. Optimal substrate-enzyme interaction through this shallow hydrophobic pocket largely accounts for the preference of SMYD3 for the reported substrates.
FIGURE 3 (legend fragment). ... −2 residues of the substrates bind to a shallow hydrophobic pocket formed by two valine and one isoleucine residues, and two hydrophilic serine residues are located at the perimeter of the pocket. C, in vitro MTA results of SMYD3 activities on mutated MAP3K2 peptides. D, in vitro MTA results examining the methyltransferase activity of SMYD3 mutants.
Natural features of the substrate other than the property of the −2 position residue, such as the folding of the substrate protein, may pose additional hurdles for the catalytic activity of SMYD3. For example, Lys831 is located in the kinase domain of VEGFR1 (23), residing in the middle of an outer β-strand of the N-terminal β-sheet domain (Fig. 4). Assuming that the kinase domain of VEGFR1 is not unfolded under a physiological setting, it is not possible to position Lys831 next to the substrate-binding channel of SMYD3 without a large rearrangement of SMYD3 domains. Docking of a folded kinase domain of VEGFR1 with SMYD3 shows the requirement of a dramatic and unnatural widening (at least 30° is required) of the substrate-binding cleft formed by the C-terminal domain and post-SET domain of SMYD3 (Fig. 4). In addition, the docked Lys831 residue of VEGFR1 is ~90° away from the dimethylated Lys831 residue of the co-crystallized VEGFR1 peptide, and the lysine is far away from the sulfur atom of the SAH molecule, which is next to the enzyme active site. In the case of histone H3, SMYD3 has a low methyltransferase activity on recombinant histone H3 to begin with and shows no detectable activities toward recombinant histone octamer, nucleosome, or oligonucleosome substrates (34) (supplemental Fig. S4A). For MAP3K2, Lys260 is located in a long loop region between the N-terminal PB1 domain and the C-terminal kinase domain, which leaves adequate flexibility for SMYD3 binding (supplemental Fig. S4B).
Taken together, our structural and biochemical analysis provided mechanistic understanding of the substrate preference of SMYD3, and the mechanistic insights should shed light on its biological functions. The structural information should also be useful for the development of SMYD3 inhibitors, which may serve as therapeutic agents against MAP3K2 and Ras-driven cancer.