A Novel Kleefstra Syndrome-associated Variant That Affects the Conserved TPLX Motif within the Ankyrin Repeat of EHMT1 Leads to Abnormal Protein Folding*

Kleefstra syndrome (KS) (Mendelian Inheritance in Man (MIM) no. 610253), also known as 9q34 deletion syndrome, is an autosomal dominant disorder caused by haploinsufficiency of euchromatic histone methyltransferase-1 (EHMT1). The clinical phenotype of KS includes moderate to severe intellectual disability with absent speech, hypotonia, brachycephaly, congenital heart defects, and dysmorphic facial features with hypertelorism, synophrys, macroglossia, protruding tongue, and prognathism. Only a few cases of de novo missense mutations in EHMT1 giving rise to KS have been described. However, some EHMT1 variants have been described in individuals presenting with autism spectrum disorder or mild intellectual disability, suggesting that the phenotypic spectrum resulting from EHMT1 alterations may be quite broad. In this report, we describe two unrelated patients with complex medical histories consistent with KS in whom next generation sequencing identified the same novel c.2426C>T (p.P809L) missense variant in EHMT1. To examine the functional significance of this novel variant, we performed molecular dynamics simulations of the wild type and p.P809L variant, which predicted that the latter would have a propensity to misfold, leading to abnormal histone mark binding. Recombinant EHMT1 p.P809L was also studied using far UV circular dichroism spectroscopy and intrinsic protein fluorescence. These functional studies confirmed the model-based hypotheses and provided evidence for protein misfolding and aberrant target recognition as the underlying pathogenic mechanism for this novel KS-associated variant. This is the first report to suggest that missense variants in EHMT1 that lead to protein misfolding and disrupted histone mark binding can lead to KS.

typic spectrum may be much broader than is currently appreciated (8,9).
The EHMT1 gene encodes a histone lysine methyltransferase (commonly known as G9a-like protein, GLP) that mono- and di-methylates lysine 9 of histone H3 (H3K9me1 and H3K9me2) together with its obligate binding partner, EHMT2 (known as G9a) (10). Both EHMT1 and EHMT2 have large N-terminal domains that include six ankyrin repeats that have been shown to confer binding specificity for H3K9me1 and H3K9me2, as a part of the chromatin reading function of both proteins (10). The EHMT1 ankyrin repeat domain has a 2-fold greater preference for H3K9me1 compared with EHMT2, whereas the opposite is true for H3K9me2, suggesting that the differential binding conferred by each protein results in greater overall affinity for both substrates (10).
Pathogenic variants or deletions in EHMT1 resulting in haploinsufficiency lead to disease in patients. Studies in heterozygous Ehmt1+/− mice revealed that these animals exhibit reduced exploration and show altered social behavior with measurable deficits in spatial learning and memory compared with wild type animals (11). The Ehmt1+/− mice also have structural and functional postsynaptic defects, including significant reductions in spine density, mature spine number, and dendritic arborization, particularly within CA1 hippocampal neurons (12). Additional functional studies of synapses in the CA3-CA1 subfields revealed altered short term plasticity in these animals using the paired pulse facilitation assay (12). Subsequent studies have shown that EHMT1 regulates homeostatic plasticity through control of synaptic scaling (13). Specifically, EHMT1/2 appear to be involved in H3K9me2-mediated transcriptional repression of brain-derived neurotrophic factor during synaptic scaling up in mice, and loss of function mutations in humans are hypothesized to lead to improper neural circuit formation in patients with KS during early development (13). However, the mechanisms underlying abnormal function of pathogenic missense EHMT1 variants are not well understood.
In the current study, we provide novel insights into the molecular mechanisms associated with the development of KS through the biochemical characterization of a new EHMT1 protein variant identified, within the setting of a precision medicine clinic, in unrelated patients affected by this disease. Specifically, we find that alteration of an evolutionarily conserved TPLX motif (p.P809L) within the ankyrin repeat affects the structure and dynamics of the protein, thereby impacting its ability to bind its histone mark substrate. Because TPLX repeats are present in other regions of the ankyrin repeat, these findings not only bear relevance for understanding the deleterious effects of the p.P809L variant but may also help predict the pathogenicity of other variants affecting TPLX motifs throughout the protein. The results described herein therefore have both biochemical and biomedical relevance to the diagnosis and mechanistic characterization of KS-associated variants.

Results
Clinical Description and Laboratory Evaluations-The novel EHMT1 variant characterized in this study was discovered during clinical genetic testing in two unrelated patients who presented with clinical and phenotypic features consistent with KS. Briefly, patient 1 presented with global developmental delay, ASD, aphasia, distinctive craniofacial features, strabismus, an aberrant right subclavian artery, and atrial septal defect. The patient had a normal Canadian newborn screen, karyotype, oligonucleotide array comparative genomic hybridization (aCGH) and SNP (GeneDx) arrays, congenital disorders of glycosylation screen, and plasma/urine amino acids. In addition, MECP2 sequencing and multiplex ligation-dependent probe amplification (MLPA) testing, Angelman syndrome MS-MLPA, and Fragile X (FMR1 PCR and Southern blot) testing were all normal. The patient was referred to the Mayo Clinic (Rochester, MN) for additional testing. A purine and pyrimidine panel, creatine disorders panel, and Beckwith-Wiedemann syndrome/Russell-Silver syndrome molecular analysis were performed and found to be normal.
Similarly, patient 2 presented with significant intellectual disability, ASD, small size, and dysmorphic features including microbrachycephaly, hypertelorism, synophrys, and prognathia. The patient also had bilaterally increased asymmetric T2 and FLAIR signal in the periventricular and peritrigonal brain regions, with scattered white matter changes on MRI. Patient 2 was initially seen at Children's Hospital of Michigan, where several genetic studies including chromosomal microarray analysis (180K oligo-SNP array) and methylation PCR for Angelman syndrome/Prader-Willi syndrome were performed and found to be normal. Further clinical descriptions of both patients can be found in Table 1.
Because the differential diagnosis for neurodevelopmental disorders can be quite broad, extensive molecular evaluations were performed in both patients. Subsequently, clinical whole exome sequencing (Baylor College of Medicine Genetic Laboratory) in patient 1 and a comprehensive non-specific intellectual disability sequencing panel (University of Chicago) in patient 2 were pursued to determine a genetic diagnosis. Sequencing revealed several variants including an identical EHMT1 variant (Chr9(GRCh38): g.137790891C>T, NM_024757.4(EHMT1): c.2426C>T, NP_079033.4: p.P809L) in both individuals (Table 2). The p.P809L EHMT1 variant was determined to be a de novo change in patient 1 (Table 2). The inheritance status of this variant could not be determined in patient 2 because this family was lost to follow-up. The p.P809L variant was not observed in the Exome Aggregation Consortium or in the NHLBI GO Exome Sequencing Project databases (14). Similarly, this variant was not observed in over 126,216 exomes and 15,137 genomes in the recently released Genome Aggregation Database (gnomAD) (14). According to American College of Medical Genetics and Genomics 2015 guidelines, this variant was classified as a variant of uncertain significance in both patients despite clinical and phenotypic evidence suggestive of pathogenicity (15), which prompted us to initiate the current study to search for potential mechanisms by which the function of this protein may be disrupted. Efforts of this type, which seek to provide mechanistic insight compatible with pathogenicity for disease-associated variants of uncertain significance, are necessary for advancing the field of precision medicine and will aid medical practitioners in the future diagnosis and management of genetic diseases.
Clinically, both patients have significant phenotypic overlap with what has been published for KS, including moderate to severe intellectual disability, ASD, global aphasia, and characteristic facial features. There is also variation in their clinical phenotype, including abnormal white matter signal on MRI in patient 2, which has been noted in other cases of KS but which was absent in patient 1 (16). In ~50% of individuals diagnosed with KS, a heart defect is present. Patient 1 had an atrial septal defect and aberrant right subclavian artery, whereas patient 2 did not have any evidence of a heart defect. Renal defects are also seen in ~30% of KS patients. Patient 2 had a left upper renal pole defect, but renal ultrasound in patient 1 revealed no defects. Together these studies suggest a possible role for the novel EHMT1 p.P809L variant identified in both patients in disease. Thus, subsequent studies aimed at determining the structural and biochemical impacts underlying the alteration in this protein were undertaken.

P809L variant in EHMT1 and correlation with previously reported phenotypes in Kleefstra syndrome
The symbols + and − indicate that the phenotype is present or absent, respectively. N/R indicates that the phenotype was not reported or not evaluated.

EHMT1 p.P809L Variant in Kleefstra Syndrome
Structural Analyses Reveal That the p.P809L Variant Likely Alters the Biochemical Properties of EHMT1-The ankyrin repeat of the EHMT1 protein adopts a helix-loop-helix domain structure. This domain is known to fold cooperatively (17). Cooperative folding indicates that "nucleating" folding events increase the probability of further folding. In three dimensions, the protein domain is organized with an outer layer of longer helices and an inner layer of shorter helices, arranged in antiparallel fashion and followed by an outward facing loop region. Each repeat is composed of one long and one short helix. These repeats pack against the helices of the adjacent repeat (Fig. 1) and are stabilized by both hydrophobic interactions and hydrogen bonds (18). Studies of a large number of ankyrin repeats have shown that this domain contains a number of residues that are evolutionarily conserved in organisms ranging from Drosophila to humans (Fig. 1) (19). It has been shown that the proline within the TPLX motif initiates a tight turn responsible for the helix-turn-helix conformation and contributes to conformational stability of the ankyrin repeat, which is stabilized by hydrogen bonding interactions with other residues within the helix and preceding loop. Thus, based on the conservation (Fig. 1) and contribution of the conserved proline to the proper folding and dynamics of the ankyrin domain, we hypothesized that the conversion of a TPLX motif to TLLX, as in the p.P809L variant, alters the structure and/or dynamics of the protein, thereby impacting its function. Consequently, we tested this hypothesis using molecular modeling and molecular dynamics (MD) simulations, as well as a combination of biophysical methods that compared the structural and dynamic properties of the WT EHMT1 and the p.P809L variant.
Initially, we used a series of in silico prediction algorithms to determine the potential impact of the p.P809L variant. SIFT, PolyPhen-2, and MutationTaster2 predicted p.P809L to be tolerated, probably damaging, and disease-causing, respectively (Table 2) (20-22). These tools often have conflicting predictions and vary in accuracy and reliability, which prompted us to investigate this variant further using 3D structure-based analyses. The energy minimized structures were analyzed to better characterize alterations to EHMT1 induced by p.P809L. First, FoldX was used to evaluate the change in folding energy: ΔΔG_fold = 3.77 kcal/mol (23). Thus, FoldX predicts p.P809L to be highly destabilizing. Next, we quantified the changes in solvent-accessible surface area (SASA) using NACCESS (24). p.P809L leads to an increase in total SASA (150.2 Å²) primarily via greater side chain non-polar SASA (95.7 Å²); backbone SASA was slightly lower (−18.2 Å²) and polar side chain SASA greater (54.5 Å²). After alignment with combinatorial extension, the energy minimized structures differed from one another by 3.0 Å root mean square deviation (RMSD) with the largest deviations at the N and C termini and the histone binding loops (25). Finally, we characterized changes in local bonding patterns associated with p.P809L. These changes included a gain of additional hydrogen bonds between p.T801 and p.H778 and between p.D800 and p.A830. Multiple changes (losses and gains) in hydrogen bonding within the loop from p.N798 to p.T801, as well as within the first and third helices, were observed. Together, these data indicate that, at least at a static level, the p.P809L substitution leads to structural and energetic shifts that differ significantly from the WT protein.
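To give a sense of scale for the FoldX estimate above, the following minimal sketch converts a change in folding free energy into the implied shift of the unfolded/folded equilibrium through the Boltzmann factor exp(ΔΔG/RT). It assumes a simple two-state folding model and a temperature of 300 K, neither of which is stated for the FoldX calculation in the text; the function name `stability_fold_change` is illustrative only.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872  # gas constant in kcal/(mol*K)


def stability_fold_change(ddg_kcal_per_mol: float, temperature_k: float = 300.0) -> float:
    """Factor by which the unfolded/folded population ratio changes for a given
    change in folding free energy (two-state assumption, not from the source)."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


ddg_fold = 3.77  # kcal/mol, the FoldX value reported for p.P809L
factor = stability_fold_change(ddg_fold)
print(f"Unfolded/folded ratio shifts by a factor of about {factor:.0f}")
```

Under these assumptions a ΔΔG of this size corresponds to a shift of several hundred-fold toward the unfolded state, which is why a value near 4 kcal/mol is generally read as strongly destabilizing.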
Molecular Simulations Suggest p.P809L Increases Misfolding Propensity-To gain insight into the impact of the p.P809L substitution, we performed MD simulations and compared the time-dependent behavior of the WT protein to p.P809L EHMT1. Room temperature (300 K) simulations were analyzed using RMSD and root mean square fluctuation (RMSF) (Fig. 2). RMSD indicates the magnitude of conformational change, whereas RMSF provides an indication of the flexibility of the protein during the simulation. These measures indicated that p.P809L adopts different conformations than the WT protein and that the second repeat of the ankyrin domain is more mobile as a result of this change. Simulations in the presence of a docked histone tail peptide show that the mutated ankyrin repeat has a conformation more comparable with WT in the presence of substrate (substrate-induced stabilization). Therefore, p.P809L is associated with altered mobility primarily of the first two ankyrin repeats.
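The two measures used above can be illustrated with a short, dependency-free sketch. It assumes that every frame has already been superimposed on the reference structure, and it uses a toy two-atom trajectory rather than any data from this study; the function names are illustrative and do not correspond to the analysis software used by the authors.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Frame = Sequence[Coord]  # one (x, y, z) position per atom


def rmsd(frame: Frame, reference: Frame) -> float:
    """RMSD between a frame and a reference (frames assumed pre-aligned)."""
    if len(frame) != len(reference):
        raise ValueError("frame and reference must contain the same atoms")
    total = sum((a - b) ** 2 for p, q in zip(frame, reference) for a, b in zip(p, q))
    return math.sqrt(total / len(frame))


def rmsf(trajectory: Sequence[Frame]) -> List[float]:
    """Per-atom RMSF about each atom's mean position over the trajectory."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy trajectory: atom 0 never moves, atom 1 oscillates along x.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trajectory = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
print([rmsd(frame, reference) for frame in trajectory])
print(rmsf(trajectory))
```

RMSD yields one value per frame (how far the whole structure has moved from the reference), whereas RMSF yields one value per atom or residue (how much that position fluctuates), which is why the two are reported together.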
We next performed MD simulations at higher temperature (360 K) that better simulate experimental conditions often used to understand how mutations may impact protein folding. Indeed, this simulation was compared with subsequent experimental denaturation-renaturation data generated at high temperature. These simulations show that the changes in RMSD become even more pronounced at this temperature (Fig. 3). This behavior further indicates that the p.P809L variant likely interferes with protein folding. A comparison of the final adopted conformations in the MD simulations demonstrates a significant distortion of the N-terminal domain of the p.P809L variant as compared with the WT protein (Fig.  4). Thus, molecular simulations indicate a loss of local structural stability.
Our model of EHMT1 is shown with repeats colored alternating between red and orange. Each repeat has an outer and an inner helix. The inner helix always begins with a proline or cysteine and is followed by a conserved TPLX motif. The proline α carbons from these motifs are marked with a sphere, and the sequence of each motif is shown. The p.P809L falls within the third helix and begins the motif within the second ankyrin repeat. To indicate the conservation of TPLX motifs, a section from a multiple sequence alignment of EHMT1 orthologs is shown for each; the species order is shown once for brevity; alignment and corresponding coloring were performed using Clustal Omega (33).
Histone tail peptide-bound simulations were performed to analyze hydrogen bonding patterns over time. This analysis was focused on interpeptide hydrogen bonds to more directly indicate how substrate binding may be affected by the variant. We consistently identified residue pairs that gained and lost hydrogen bonding contacts across replicates and over time (Fig. 5). These hydrogen-bonding patterns are known to stabilize the TPLX repeat. Overall, the first two and the final repeat of the ankyrin domain displayed similar conformational changes that were consistently observed across our triplicate simulations. Therefore, we conclude that the p.P809L variant in the critical TPLX motif impacts bonds that stabilize the typical helix-loop-helix structure of the N-terminal ankyrin repeat domain of EHMT1 and affects the dynamics of the protein.
We performed principal component analysis across replicates for each condition to identify and quantify the dominant features within each set of trajectories (Fig. 6). The first PC was a highly dominant motion across all three conditions, representing 37-56% of the total variance sampled. This first PC strongly separated WT from p.P809L simulations. Projecting the first PC's motion onto the initial structure, we identified that in all three conditions, there was correlated motion between the terminal ankyrin repeats that separated WT and p.P809L simulations. When the histone tail peptide was bound, the WT structure tended to maintain contact or establish closer contact with the substrate, whereas p.P809L conformations became more linear. The opposite occurred for apo (no histone tail peptide) simulations at both 300 and 360 K. In summary, our simulation results indicate that the p.P809L variant may increase the misfolding propensity within the N-terminal domain of the protein. Thus, subsequent experiments aimed at testing the folding rates of both proteins using denaturation-renaturation were performed to confirm the results of the molecular dynamics simulations.
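The statement that the first PC represents a given fraction of the total variance can be made concrete with a small sketch. It assumes each frame has been flattened into a vector of pre-aligned coordinates, and it finds the dominant eigenvector of the covariance matrix by power iteration; this is an illustration on toy data, not the PCA implementation used for the trajectories in this study.

```python
import math
from typing import List, Sequence, Tuple


def first_pc(frames: Sequence[Sequence[float]], n_iter: int = 200) -> Tuple[List[float], float]:
    """Return the first principal component (unit vector) and the fraction of
    total variance it explains, via power iteration on the covariance matrix."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    centered = [[f[j] - mean[j] for j in range(dim)] for f in frames]
    cov = [
        [sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(dim)]
        for a in range(dim)
    ]
    vec = [1.0 / math.sqrt(dim)] * dim  # arbitrary non-zero starting vector
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(
        vec[a] * sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)
    )
    total_variance = sum(cov[a][a] for a in range(dim))
    return vec, eigenvalue / total_variance


# Toy "trajectory": large spread along the first coordinate, small along the second.
toy_frames = [[-2.0, 0.1], [-1.0, -0.1], [0.0, 0.0], [1.0, 0.1], [2.0, -0.1]]
pc1, explained = first_pc(toy_frames)
print(pc1, explained)
```

Projecting each centered frame onto `pc1` gives the per-frame coordinate along the dominant motion, which is the kind of quantity that can then separate trajectories of two sequence contexts.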
is located at the top of the middle helix of the first ankyrin repeat. Its mutation to leucine results in a shift of the neighboring helix and following loop. These differences propagate through the structure leading to the allosteric conformational change at the C terminus. These changes in the protein are congruent with the previously described cooperative effects of the helix-loop-helix domains for the folding and dynamics of the ankyrin repeat, suggesting that the p.P809L mutation may alter the folding of the protein.
Biophysical Methods Provide Experimental Validation of the Impact of the p.P809L Substitution on the Folding of the EHMT1 Protein-Circular dichroism spectra (Fig. 7, left panel) were recorded to study the effect of the p.P809L mutation on the secondary structure of EHMT1 and confirm the in silico predictions that the p.P809L variant causes abnormal protein folding. The far UV CD spectrum of EHMT1 is dominated by the α-helical content of the protein, showing two characteristic minima at 222 and 208 nm. Comparatively, the spectrum of p.P809L EHMT1 shows a reduced α-helical content, which is confirmed by the calculation of secondary structure contributions (Fig. 7, right panel). Therefore, in agreement with our molecular simulations, these results demonstrate that the p.P809L mutation leads to an abnormal protein folding behavior of the EHMT1 ankyrin domain. We sought further validation of these results by generating fluorescence emission spectra (Fig. 8) of the WT EHMT1 and the p.P809L variant at a protein concentration of 1 μM using excitation wavelengths of 280 nm and 295 nm, the latter for the selective excitation of Trp residues. The wild type protein has an emission maximum (λmax) of ~350 nm, whereas the p.P809L mutation causes a blue shift of the spectrum to 347 nm and a decrease of the fluorescence emission, which is indicative of an altered tertiary structure. Lastly, because the ankyrin repeat domain of EHMT1 is critical for binding to methylated histones to regulate gene expression, we also studied the interaction of WT EHMT1 and the p.P809L variant with their dimethylated target, the H3K9me2 histone peptide, by monitoring the Trp fluorescence emission at 350 nm after excitation at 295 nm (Fig. 9, left panel). The addition of the H3K9me2 peptide causes a quenching of the fluorescence emission, indicating an interaction of the peptide with both proteins. Analysis of binding curves shows an apparent affinity of 0.76 μM for WT EHMT1 and 0.48 μM for the p.P809L mutant (Fig. 9, right panel). Thus, the p.P809L mutant protein not only displayed loss of secondary structure (unfolding) but also changes in histone reading affinity. In conclusion, molecular modeling and molecular dynamics simulations, when combined with experimental biophysical assays, clearly demonstrate that the p.P809L mutation causes an alteration in the folding and histone reading function of EHMT1.
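As an illustration of how an apparent affinity can be extracted from a fluorescence quenching titration, the sketch below fits a single-site binding isotherm by grid search. The binding model, the neglect of ligand depletion, and the fitting procedure are all assumptions made for this example, because the text does not specify how the binding curves were analyzed; the titration data are synthetic and noise-free, generated with Kd = 0.76 purely so that the fit can be checked.

```python
from typing import Sequence, Tuple


def fractional_saturation(ligand: float, kd: float) -> float:
    """Single-site binding isotherm, ignoring ligand depletion (an assumption)."""
    return ligand / (kd + ligand)


def fit_kd(ligand: Sequence[float], quench: Sequence[float],
           kd_grid: Sequence[float]) -> Tuple[float, float]:
    """Grid search over Kd; for each Kd the amplitude is solved in closed form.
    `quench` is the fluorescence decrease (F0 - F) at each ligand concentration."""
    best = (float("inf"), 0.0, 0.0)  # (sum of squared errors, kd, amplitude)
    for kd in kd_grid:
        theta = [fractional_saturation(x, kd) for x in ligand]
        amplitude = sum(t * q for t, q in zip(theta, quench)) / sum(t * t for t in theta)
        sse = sum((q - amplitude * t) ** 2 for t, q in zip(theta, quench))
        if sse < best[0]:
            best = (sse, kd, amplitude)
    return best[1], best[2]


# Synthetic titration (hypothetical concentrations, same units as Kd, e.g. uM).
concentrations = [0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
signal = [100.0 * fractional_saturation(c, 0.76) for c in concentrations]
grid = [round(0.01 * i, 2) for i in range(1, 301)]  # candidate Kd values 0.01 to 3.00
kd_est, amp_est = fit_kd(concentrations, signal, grid)
print(kd_est, amp_est)
```

With protein at 1 μM and sub-micromolar affinities, ligand depletion is not negligible in practice, so a quadratic binding equation would be the more careful choice for real data; the simpler isotherm is used here only to keep the example short.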

FIGURE 5. Altered interactions between EHMT1 and its target. We show the WT (A) and p.P809L (B) with residues within and nearby the target peptide shown in detail. Hydrogen bonds are represented as dashed yellow lines. Throughout the simulations, the WT and p.P809L demonstrate differences in hydrogen bonding as visible in A and B and quantified in C where the probability of each hydrogen bond interaction is shown. Two pairs are present across WT replicates and lost for p.P809L, whereas two pairs are not present in the WT, but are common in p.P809L (boxed). The abscissa indicates the residue numbers of interacting residues with the donor residue listed first; residues 7-13 are from the histone tail peptide.
FIGURE 6. A single collective motion dominates the differences induced by p.P809L. We perform PC analysis on the combined trajectories of three different initial conditions with three replicates for each. A, C, and E, each frame from each trajectory is shown as a point in PC space. B, D, and F, the motion indicated by the first PC is shown projected onto the initial WT conformation. The strong anti-correlation between the N and C termini is evident. Because the first PC separates WT from P809L, we colored the motion vectors such that the direction indicates which sequence context (WT, blue; p.P809L, orange) is represented. A and B, simulations at 300 K with histone tail peptide bound exhibit a lower degree of motion within the central repeat units, a propensity for the WT to close in around the peptide binding site, and for the p.P809L variant to favor a more extended conformation. C and D, simulations at 300 K without histone tail peptide demonstrate an opposite trend that is consistent across replicates. E and F, simulations at 360 K without histone tail peptide are in close agreement with simulations run at 300 K.

Examination of Non-synonymous Human Population Level Variation within the Ankyrin Repeat Domain of EHMT1 Reveals Greater Amino Acid Conservation within the N-terminal Region Where the p.P809L Variant Falls-By overlaying the frequency of non-synonymous single nucleotide variants within gnomAD on top of the ankyrin repeat domain structure using a color-coded log scale, we are able to show that the C-terminal side of the domain is more polymorphic (Fig. 10). The N-terminal side appears to be more intolerant to non-synonymous variation, which provides further support for the idea that variants located in this region, like the p.P809L reported here, likely affect the structure and/or function of this protein. The sites with the highest variant frequency were found within the loops connecting helices within repeats or were on the solvent exposed "back" side of the protein, suggesting that variation within these regions is more likely to be tolerated. We then looked at additional missense variants reported in ClinVar and show the location of these seven variants using blue spheres in an attempt to infer the effect of these variants (Fig. 10). Only one is within a TPLX motif (p.P809L). Two of these variants, which have a higher frequency of occurrence, are within the same loop in the middle of the domain. Three of the variants analyzed are packed between helices and could affect stability, though they are mostly conservative amino acid substitutions. The last one is an Ala to Thr at the end of a histone-facing loop (p.A947T) that does not interact with the histone tail peptide in our models, but it could interact when the peptide is analyzed within the context of the full histone protein assembled into a nucleosome. Thus, this analysis provides insights into potential deleterious effects of N-terminal substitutions within EHMT1, setting the stage for future studies focused on determining the role of these variants in disease.
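The color-coded log scale used for the population frequency overlay can be sketched as a simple binning step. The bin boundaries, the floor value for unobserved variants, and the example frequencies below are hypothetical and are not taken from gnomAD or from Fig. 10; the sketch only shows how allele frequencies spanning several orders of magnitude map onto a small number of color classes.

```python
import math
from typing import Dict


def log_frequency_bin(allele_frequency: float, floor_exponent: int = -6) -> int:
    """Map an allele frequency to an integer log10 bin.
    Frequencies of zero (variant not observed) are assigned to the floor bin."""
    if allele_frequency <= 0.0:
        return floor_exponent
    return max(floor_exponent, math.floor(math.log10(allele_frequency)))


# Hypothetical per-residue allele frequencies; these are NOT values from gnomAD.
example_frequencies: Dict[str, float] = {
    "residue_A": 0.0,
    "residue_B": 4e-6,
    "residue_C": 2.5e-4,
    "residue_D": 1.2e-2,
}
bins = {name: log_frequency_bin(freq) for name, freq in example_frequencies.items()}
print(bins)
```

Each integer bin can then be assigned one color when painting residues on the structure, so that rare and common sites remain distinguishable on the same figure.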

Discussion
The current study integrates clinical genetic testing, molecular modeling, molecular dynamics simulations, and biochemical studies of protein folding and protein-protein interactions to advance our understanding of the biochemical mechanisms underlying functional alterations in the EHMT1 protein causing KS. Next generation sequencing was used to identify a novel missense variant in EHMT1 (Chr9(GRCh38): g.137790891C>T, NM_024757.4(EHMT1): c.2426C>T, NP_079033.4: p.P809L) in two unrelated patients with KS. We find that p.P809L falls within an evolutionarily conserved TPLX repeat motif of the ankyrin repeat found in many proteins (19). Interestingly, p.P809 falls immediately N-terminal to the beginning of an α helix and stabilizes this secondary structure component of the protein by providing a distinct and necessary angle. This stabilizing aspect is quite different from the destabilizing impact a proline variant often has in an α-helix. Thus, the conservation and topology of p.P809 within EHMT1 by themselves suggest that a leucine substitution would be functionally deleterious to the protein. Given that the Pro to Leu substitution would alter the peptide angle after the adjoining helix and that the helical repeats are cooperative (17), it was likely that this change affects the proper folding and dynamics of EHMT1. Indeed, our molecular modeling and MD simulations support this hypothesis. Molecular dynamics simulations show altered RMSF values for the protein structure when performed at 300 K (26.85°C, room temperature), with enhanced effects at 360 K (86.85°C). Moreover, superimposition of the p.P809L with the WT protein structure at the end of the MD simulations clearly illustrates that the N-terminal region of the ankyrin repeat is highly distorted. More importantly, our in silico predictions are supported by experimental studies using the WT and p.P809L EHMT1 variant ankyrin domains expressed from Escherichia coli.
Far UV circular dichroism spectroscopy and intrinsic protein fluorescence were used to evaluate the effect of the p.P809L variant on the structure and the function of the protein. These experiments clearly demonstrate a decrease in the α-helical content in the p.P809L variant when compared with WT. Thus, together, in silico studies using stringent molecular mechanics and dynamics algorithms combined with biophysical measurements demonstrate that a TPLX to TLLX substitution alters the structure and dynamics of the protein. Interestingly, the existence of additional TPLX motifs within the ankyrin repeat of the EHMT1 protein suggests that similar alterations in other TPLX motifs could also result in disrupted protein folding. These results increase our understanding of biochemical mechanisms underlying disease-causing missense variants observed in EHMT1 that fall outside of the catalytic Su(var)3-9, E(z), and Trithorax (SET) domain.
MARCH 3, 2017 • VOLUME 292 • NUMBER 9

Ankyrin repeats are critical to the ability of EHMT1 to read the H3K9me1 and H3K9me2 histone marks. Thus, there was a strong likelihood that the predicted and observed structural and dynamic alterations caused by the p.P809L variant may also alter this histone reading function. MD simulations demonstrate that p.P809L alters the time-dependent interactions of the EHMT1 reader domain with its corresponding histone substrate. We confirmed this effect in Trp-fluorescence quenching experiments using recombinant EHMT1 protein in the presence of the H3K9me2 peptide. These experiments showed altered substrate binding, characterized by a decreased Kd in the p.P809L variant compared with the WT protein. Ultimately, these changes in the interaction of the p.P809L variant with the histone tail peptide reflect alterations in the association and/or dissociation rates of these proteins caused by the structural and dynamic alterations observed in the in silico and experimental studies.
Nevertheless, our study is the first to describe how mutations in any of the TPLX motifs of the EHMT1 ankyrin domain affect protein folding to induce a moderate change in its affinity for the H3K9me2 peptide. Thus, it becomes important to compare this information to other, better characterized mutations in ankyrin repeats that are known to be pathogenic in humans. Notably, an equivalent p.P81L mutation in the TPLX motif within the ankyrin repeat of the tumor suppressor p16, which causes familial melanoma, also leads to secondary structure alterations (26,27). Thus, this type of mutation (TPLX to TLLX) is destabilizing not only for EHMT1 but also for other proteins, further supporting the validity of our observations. However, it is not possible to compare how this change affects the affinity of p16 for its targets because no Kd values have been published for the p.P81L p16 variant. It is known, however, that p.P81L p16 displays impaired interaction with CDK4 when measured by a non-quantitative two-hybrid GAL4-based reporter assay (26). Based on these results, we conclude that impaired folding and changes in affinity are likely a shared mechanism underlying the functional alterations of conserved TPLX motifs in the ankyrin repeats of both proteins. These two lines of evidence, together with the fact that TPLX motifs are highly conserved in ankyrin repeats across evolution, raise the possibility that similar mutations in other proteins may also be pathogenic. Thus, this knowledge may guide future investigators in the identification and functional characterization of pathogenic variants that share a similar mutational event.
EHMT1 is found in a trimeric complex with EHMT2 (G9a) and WIZ, which together maintain epigenetic states through human development and also maintain homeostasis in several organs, including the nervous system (28). Studies have been performed in a variety of organisms, from Drosophila melanogaster to human cells, which demonstrate that this complex represses gene expression through its ability to read and write the H3K9me2 histone mark (28). Both EHMT1 and EHMT2 bind to the histone code reader proteins HP1α, HP1β, and HP1γ. These readers recruit the histone methyltransferase SUV39H1, which converts H3K9me2 into H3K9me3, leading to further transcriptional repression by inducing the formation of heterochromatin via complex formation with DNA methyltransferases (29). Our studies indicate that the p.P809L variant alters the structure, dynamics, and histone binding properties of the EHMT1 protein, suggesting that a sustained alteration in the regulation of gene expression in conserved developmental pathways caused by misfolding and defective substrate binding could similarly give rise to KS in patients.
In conclusion, we have identified a novel p.P809L missense variant in two unrelated patients affected by KS. The p.P809L mutation occurs within a conserved TPLX motif that is part of the ankyrin repeat of the EHMT1 protein. Because of the conservation and topology of this substitution, we predicted it may cause alterations in the structure and function of the protein.
Through a combination of both in silico and experimental techniques, we reveal that the p.P809L variant disrupts the structure and dynamics of the protein in a manner that alters the interaction of this protein with its H3K9 histone tail substrate. Together, these results provide insights into the biochemical mechanisms underlying the function of disease-causing variants involved in the pathogenesis of KS.

Experimental Procedures
Molecular Identification of a Novel EHMT1 Kleefstra Syndrome-associated Variant. Patient 1 underwent whole exome sequencing (Baylor Miraca Genetics Laboratories) as previously described (30). Briefly, genomic DNA was extracted from the proband, fragmented by sonication, and then ligated to multiplexing paired end adapters (Illumina). The adapter-ligated DNA was then PCR-amplified using primers with sequencing barcodes. For exome capture, the precapture library was enriched by hybridizing to biotin-labeled VCRome 2.1 (Roche NimbleGen) in-solution exome probes at 47°C for 64-72 h. To improve overall exome coverage, probes to 1800 Mendelian disease genes were also included in the capture. The postcapture DNA library was subjected to massively parallel sequencing on an Illumina HiSeq 2000 platform with 100-bp paired end reads. On average, over 70% of reads aligned to target, more than 95% of target bases had greater than 20× coverage, more than 85% of target bases had greater than 40× coverage, and the overall mean coverage of target bases was greater than 100×. The output data from the Illumina HiSeq were converted to FASTQ files by CASAVA 1.8 (Illumina) and mapped using the Burrows-Wheeler Aligner to the Genome Reference Consortium human genome build 37, human genome 19 (GRCh37/hg19). Variant calling was performed using Atlas-SNP and Atlas-indel, developed by the Baylor College of Medicine Human Genome Sequencing Center. The variants were annotated using HGSC-SNP-anno and HGSC-indel-anno (Baylor College of Medicine Human Genome Sequencing Center). Variants were then compared with reported mutations from the professional version of the Human Gene Mutation Database. Variants in this database with a minor allele frequency of less than 5% according to either the 1000 Genomes Project or the ESP5400 data of the National Heart, Lung, and Blood Institute GO Exome Sequencing Project were kept.
Synonymous variants, intronic variants greater than 5 bp from exon boundaries, and common benign variants (minor allele frequency greater than 1%) were excluded unless they were reported as pathogenic by the Human Gene Mutation Database. The variants were interpreted according to American College of Medical Genetics and Genomics guidelines and patient phenotypes. Variants of interest were then confirmed by Sanger sequencing in both the proband and parental samples to determine inheritance. Patient 2 was tested using the Comprehensive Non-Syndromic Intellectual Disability 144 Gene Panel (University of Chicago). Briefly, genomic coordinates were identified for all target regions in 144 genes related to a collection of intellectual disability-associated conditions. A custom Agilent SureSelect capture kit (Agilent Technologies, Santa Clara, CA) was used to target the coding sequence plus 10-bp flanking intronic or UTR sequence in the target genes. Sequencing was performed using MiSeq technology with 150-bp paired end reads. FASTQ files were aligned using the UCSC human genome build hg19 as a reference. Variants within exons and the 10-bp flanking intronic regions within the 144 target genes were identified and evaluated using a validated, custom bioinformatic pipeline and interpreted by a team of board-certified Ph.D. geneticists and genetic counselors. Gaps or regions of poor coverage in the next generation data set were filled by Sanger sequencing. All novel and likely pathogenic variants on this 144-gene panel were confirmed by Sanger sequencing in the proband. The sensitivity of this test is estimated to be greater than 99% for single base changes and for insertions and deletions of less than 20 bp.
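The exclusion rules described for the exome data can be written as a single predicate. The following Python sketch is illustrative only and is not the laboratory's pipeline: every dictionary key (`maf_1000g`, `maf_esp5400`, `hgmd_pathogenic`, `consequence`, `bp_from_exon`) is a hypothetical name, and reading "either" database as the lower of the two reported frequencies is an assumption.

```python
def keep_variant(variant):
    """Return True if a variant passes the frequency and consequence filters.

    All dictionary keys are hypothetical names chosen for this sketch.
    """
    # Allele frequencies from the two population resources; None means absent.
    mafs = [m for m in (variant.get("maf_1000g"), variant.get("maf_esp5400"))
            if m is not None]
    # Assumption: "either" database is read as the lower reported frequency,
    # and a variant absent from both is treated as having frequency 0.
    maf = min(mafs) if mafs else 0.0

    if variant.get("hgmd_pathogenic", False):
        # Previously reported mutations are retained below a 5% frequency.
        return maf < 0.05
    if variant["consequence"] == "synonymous":
        return False
    if variant["consequence"] == "intronic" and variant.get("bp_from_exon", 0) > 5:
        return False
    # Common benign variants (frequency above 1%) are excluded.
    return maf <= 0.01


# Usage: filter a list of annotated calls (toy records, not patient data).
calls = [
    {"consequence": "missense"},                        # novel, kept
    {"consequence": "synonymous"},                      # dropped
    {"consequence": "intronic", "bp_from_exon": 12},    # dropped
]
kept = [v for v in calls if keep_variant(v)]
```

Under these assumptions, a novel missense change absent from both population resources passes every filter and proceeds to interpretation and Sanger confirmation.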
Molecular Modeling and Molecular Dynamics Simulation Studies on the p.P809L EHMT1 Variant. Molecular models of wild type EHMT1 solved by X-ray crystallography (Protein Data Bank code 3B95) and the p.P809L variant generated using in silico mutagenesis, as previously described, were analyzed using MD simulation (31). Simulations of the EHMT1 variants were performed using the all-atom force field in CHARMm c36b2 at temperatures of 300 and 360 K (constant number of particles, volume, and temperature ensemble) (32). The molecule was first energy-minimized using a two-step protocol of steepest descent and conjugate gradient minimization. The SHAKE procedure was used in all stages (31). A distance-dependent dielectric implicit solvent model was used with a dielectric constant of 80 and a pH of 7.4. Trajectories were run for 10 ns. Independent duplicate trajectories were run for 2 ns to assess the stability and consistency of each simulation. System setup was performed in Discovery Studio (35). Multiple analyses were performed in the R programming language (36), leveraging the bio3d (37) package version 2.2.4. Molecular visualizations were generated using PyMOL (38) version 1.7.6 and VMD (39) version 1.9.2. Trajectories were compared using RMSD, RMSF, and principal component (PC) analysis in Cartesian space, using Cα atoms aligned to the initial WT conformation.
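For readers unfamiliar with the two simplest trajectory metrics, the following minimal Python sketch shows how RMSD and RMSF are defined. It is not the analysis code used here (that was done with bio3d in R); it assumes that every frame is a list of Cα coordinates already superposed onto the reference conformation, and the function names are illustrative. The principal component analysis is omitted.

```python
import math


def rmsd(conf_a, conf_b):
    """Root mean square deviation between two pre-aligned conformations.

    Each conformation is a list of (x, y, z) tuples, one per C-alpha atom.
    """
    n_atoms = len(conf_a)
    total = sum((pa[k] - pb[k]) ** 2
                for pa, pb in zip(conf_a, conf_b)
                for k in range(3))
    return math.sqrt(total / n_atoms)


def rmsf(frames):
    """Per-atom root mean square fluctuation about the mean position."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from that mean position.
        msd = sum((f[i][k] - mean[k]) ** 2
                  for f in frames
                  for k in range(3)) / n_frames
        result.append(math.sqrt(msd))
    return result


# Usage with a toy two-atom "trajectory" (coordinates in arbitrary units).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
deviation = rmsd(reference, frame)
fluctuation = rmsf([reference, frame])
```

RMSD summarizes how far a whole frame has drifted from the reference, whereas RMSF reports, residue by residue, how mobile each position is over the trajectory.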
Generation, Purification, Circular Dichroism, and Fluorescence Emission Spectroscopy Analyses of the p.P809L EHMT1 Variant. For experimental purposes, we produced and purified N-terminal His6-tagged recombinant forms of the WT and p.P809L EHMT1 proteins using the pET vector system (Novagen). BL21(DE3) bacterial cells carrying the plasmids were grown overnight and induced with 0.5 mM isopropyl β-D-thiogalactopyranoside for 90 min at 32°C. The recombinant proteins were purified using the Thermo Scientific HisPur cobalt resin kit according to the manufacturer's instructions. Protein was dialyzed overnight and concentrated to a final concentration of 1 mg/ml. Spectra in the far UV range (200-260 nm) for WT and p.P809L EHMT1 were recorded on an Aviv Biomedical Model 420SF circular dichroism spectrometer. The path length of the quartz cuvette was 1 mm, the stepwidth and bandwidth were 1 nm, and the integration time was 20 s. The spectra were corrected by the CD signal of the corresponding buffer and converted to molar ellipticity per amino acid residue (Θ_MRW). Fluorescence measurements were performed at 20°C on a Horiba Jobin-Yvon Fluorolog 3 spectrofluorometer equipped with a Wavelength Electronics LF1-3751 temperature controller. Fluorescence emission spectra of both proteins were recorded between 310 and 440 nm after excitation at 280 or 295 nm using a protein concentration of 1 μM in a 1-cm quartz cell. The titration of the dimethylated histone H3 Lys 9 peptide was performed by stepwise addition of the peptide (36.3 μM) to a solution containing 1 μM of WT or p.P809L EHMT1 under slight stirring. After equilibration for 2 min, the fluorescence signal was recorded and averaged for 20 s using an excitation wavelength of 295 nm for the selective excitation of Trp residues and an emission wavelength of 350 nm.
Apparent affinities were determined after normalization to fraction bound using a logistic function, $f = a\,x^{b} / (c^{b} + x^{b})$, where x is the peptide concentration, a is the maximum asymptote, b is the Hill slope, and c is the midpoint of the curve, which is reported as the apparent affinity.
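As an illustration of this fitting step, the Python sketch below evaluates the logistic function and recovers b and c from normalized data by a simple grid search. It is a sketch under stated assumptions, not the procedure actually used: the text does not specify the fitting routine, the normalization assumes that the first and last titration points represent the free and saturated states, a is fixed at 1 after normalization, and all function names and numbers are illustrative.

```python
def hill(x, a, b, c):
    """Logistic (Hill) function f = a * x**b / (c**b + x**b)."""
    return a * x ** b / (c ** b + x ** b)


def fraction_bound(signal):
    """Normalize a titration signal to the range 0..1.

    Assumption: the first point is the free protein and the last point is
    saturated, so they define the two ends of the scale.
    """
    start, end = signal[0], signal[-1]
    return [(s - start) / (end - start) for s in signal]


def fit_hill(x, y, b_grid, c_grid, a=1.0):
    """Grid search for the (b, c) pair minimizing the sum of squared errors.

    A grid search stands in here for a proper nonlinear least-squares fit.
    """
    best = None
    for b in b_grid:
        for c in c_grid:
            sse = sum((hill(xi, a, b, c) - yi) ** 2 for xi, yi in zip(x, y))
            if best is None or sse < best[0]:
                best = (sse, b, c)
    return best[1], best[2]


# Usage with synthetic, noise-free data generated from known parameters.
conc = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]           # peptide concentration
data = [hill(x, 1.0, 1.0, 2.0) for x in conc]    # true b = 1, c = 2
b_fit, c_fit = fit_hill(conc, data,
                        b_grid=[0.5, 1.0, 1.5, 2.0],
                        c_grid=[0.5 * i for i in range(1, 21)])
```

By construction the function equals a/2 when x = c, which is why the midpoint c can be read as the apparent affinity.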