Rare-variant pathogenicity triage and inclusion of synonymous variants improves analysis of disease associations of orphan G protein–coupled receptors

The pace of deorphanization of G protein–coupled receptors (GPCRs) has slowed, and new approaches are required. Small molecule targeting of orphan GPCRs can potentially be of clinical benefit even if the endogenous receptor ligand has not been identified. Many GPCRs lack common variants that lead to reproducible genome-wide disease associations, and rare-variant approaches have emerged as a viable alternative to identify disease associations for such genes. Therefore, our goal was to prioritize orphan GPCRs by determining their associations with human diseases in a large clinical population. We used sequence kernel association tests to assess the disease associations of 85 orphan or understudied GPCRs in an unselected cohort of 51,289 individuals. Using rare loss-of-function variants, missense variants predicted to be pathogenic or likely pathogenic, and a subset of rare synonymous variants that cause large changes in local codon bias as independent data sets, we found strong, phenome-wide disease associations shared by two or more variant categories for 39% of the GPCRs. To validate the bioinformatics and sequence kernel association test analyses, we functionally characterized rare missense and synonymous variants of GPR39, a family A GPCR, revealing altered expression or Zn2+-mediated signaling for members of both variant classes. These results support the utility of rare variant analyses for identifying disease associations for GPCRs that lack impactful common variants. We highlight the importance of rare synonymous variants in human physiology and argue for their routine inclusion in any comprehensive analysis of genomic variants as potential causes of disease.

The superfamily of G protein-coupled receptors (GPCRs) 3 translates extracellular signals from light, metabolites, and hor-mones into cellular changes, which makes them the targets of a significant fraction of drugs currently on the market (1). Genome-wide association studies (GWASs) on common variants in GPCRs have begun to identify their contributions to various disease processes (2,3). However, many GPCRs lack functionally important common variants, and alternate strategies are needed to understand their roles in human physiology. Recently, sequence kernel association tests (SKATs) have been applied to rare variants to assess disease associations. Initial approaches binned all rare variants in a genomic region or gene (4,5), but current methods group the rare variants most likely to contribute to disease associations or aggregate variants based on domain or family structures (6,7).
Large-scale studies of orphan or understudied GPCRs to characterize natural genetic variation in the human population can provide insights into the biological function and/or potential causal contributions to disease processes (8). The Dis-covEHR collaboration represents a tremendous resource (9) because whole exome sequences can be linked to the electronic health record (EHR) in a deidentified manner. Here we used whole exome sequences and clinical information from 51,289 individuals in the DiscovEHR cohort. We identified sequence variants (common variants, mean allele frequency (MAF) Ն 1%; and rare variants, MAF Ͻ 1%) for 85 orphan or understudied GPCRs not amenable to GWAS studies. Rare variants were binned into functional classes: loss-of-function (LOF, truncation and frameshift) variants; missense variants with predicted pathogenicity; or rare synonymous variants with altered codon bias. We performed independent SKAT analyses on the various functional classes to determine their disease associations. Remarkably, for those GPCRs with sufficient numbers of individuals for statistical analyses, we found that the top disease associations were common to all available variant classes in the GPCR.
For initial in vitro assessment of the validity of the disease associations, we focused on GPR39. Among its particular advantages, GPR39 had sufficient numbers of variants in all three functional classes to be amenable to rare variant approaches. Second, GPR39 is a small family A GPCR, allowing rapid generation of mutants. Third, GPR39 is activated by extracellular Zn 2ϩ and is coupled to inositol phosphate production (10,11), permitting straightforward functional analyses. Finally, we have only a rudimentary understanding of the role(s) of GPR39 in human physiology, despite its broad expression and multiple cellular functions (12,13); thus, results may provide insights into the contribution(s) of GPR39 variation to disease. Our results demonstrate the validity of combined computational and functional approaches to provide important insights into the clinical contributions of orphan and understudied GPCRs, and focus attention on the importance of rare synonymous variants for identifying disease associations. The overall strategy can be readily applied to other classes of genes without clinically important common variants.

Rare variant-based disease associations of orphan or understudied GPCRs
We identified rare variants in whole exome sequences from 51,289 individuals. To predict their functional impact, variants were annotated and sorted into three classes in order of predicted severity, i.e. LOF (premature stop codon, loss of a start or stop codon, or disruption of a canonical splice site), missense variants (predicted to be likely pathogenic (pLP) or pathogenic (pP) by RMPath scoring, derived from application of multiple pathogenicity algorithms (see "Experimental procedures")), and synonymous variants with significantly different codon frequency relative to the reference codon (termed SYN_⌬CB, i.e. synonymous variants with altered codon bias).
Determining disease associations for genes having only lowfrequency variants requires binning of variants to increase statistical power. Binning methods have focused on gene or genetic region and have recently been expanded to incorporate regulatory regions and/or pathways by incorporating biological information from curated knowledge databases (14). For this study, we were specifically looking for clinically impactful rare coding variants that could be validated by functional studies of the relevant GPCRs. We binned variants according to the functional classes described above and used SKAT, focusing on the disease associations that were common to more than one class of rare variant. Fig. 1 illustrates the analysis pipeline for the entire set of 85 GPCRs. Of the 72 GPCRs with sufficient variants to allow at least 2 variant classes to be analyzed by SKAT, 28 yielded significant disease associations. Because this was an exploratory analysis, we used a significance cutoff of p Ͻ 0.05, i.e. p values Ͻ 6.2E-07 after Bonferonni correction for multiple testing. Table 1 shows the top disease associations for the 13 GPCRs that had sufficient variants in the LOF class. Supporting validity, associations between GPR37L1 and epilepsy (15) and LGR5 and alopecia (16,17) have been previously identified by other means. Attesting to the discovery potential, other disease associations were novel, including GPR161 with hyperhidrosis, LGR4 with Sicca syndrome, GPR153 with hyperpotassemia, GPR84 with anemia, and GPR39 with peripheral nerve damage, among others. Notably, the phenome-wide disease associations for the predicted LOF class showed congruence across the other variant classes for the subset of GPCRs having sufficient missense or SYN_⌬CB variants for independent analyses. Although truncation and frameshift variants are easily identified and likely to have the greatest functional impact, missense variants classed as pLP ϩ pP can also have a significant impact on GPCR function. For the subset of GPCRs without sufficient LOF variants, we found significant disease associations for the missense and SYN_⌬CB variants ( Table 1). Some of the associations validate previous studies, including GPR183 and disorders of liver (18) and GPR85 with acute myocardial infarction (19). Other associations were novel and represent potential targets for future study, including GPR132 and disturbances in sulfur-bearing amino acid metabolism, GPR176 with asthma, GPR12 with epilepsy, and GPRC5D with renal osteodystrophy. Of note, the SYN_⌬CB variant class had significant disease associations in common with LOF and/or missense variants and most consistently had sufficient variants to be included in the analysis. Parallel SKAT analyses of the three distinct classes of rare variants produced two important results. First, independent analysis of distinct classes of variants yielded concordant disease associations, increasing confidence in the results. Second, the most consistent source of phenome-wide disease associations was SYN_⌬CB. Altogether, these results provide a strong rationale for including all three functional variant classes in disease association analyses. Rare variants in 85 GPCRs were identified in the 51,289 DiscovEHR WES data set and sorted into LOF, missense, and synonymous categories. Missense variants were further triaged to identify those predicted to be pLP or pP by RMPath scoring and the subset of synonymous variants with large changes in codon usage bias (SYN_⌬CB; see "Experimental procedures"). GPCRs with at least 2 categories of variant having Ն7 distinct members were further analyzed (72 of 85 GPCRs). SKAT analysis was run independently on 123 sets of variants against 656 disease codes having Ն200 individuals with Ն3 independent encounters in the EHR.

GPR39 common and rare variants in the 51,289 WES DiscovEHR cohort
Disease associations provide a rich source of hypotheses regarding the potential contributions of GPCRs to human physiology. The first step in defining a causal role for these rare variants, however, is functional validation of the SKAT results.
Here we focused in particular on rare synonymous variants with altered codon frequency (SYN_⌬CB), whose functional consequences are not well-understood. Many of the GPCRs have no known agonist, and their dominant signaling pathways have not been characterized (Table 1). We therefore chose to focus on GPR39, the Zn 2ϩ receptor, which has been characterized in cellular and knockout mouse models and for which both agonist and dominant signaling pathways are known (12,13).
All common and rare variants in GPR39 were identified in 51,289 individuals in the DiscovEHR cohort (9) (Fig. 2). None of the GPR39 variants were found in the ClinVar database. Accordingly, all LOF, missense, and synonymous variants were analyzed and classified as described in Table 1. The common variants were used for Phenome-wide Association Studies (PheWAS) and included a missense variant (2:133174764, pA50V) predicted to be benign, a synonymous variant (2:133174999; pT128T), which changed a common to the rarest codon (acA (28%) to acG (12%), i.e. a 2.33-fold decrease in frequency) and a noncoding variant (2:133402607). Notably, no phenome-wide disease associations were detected (Fig. 3A).
The rare variants used for SKAT, i.e. LOF, missense (pLP ϩ pP), and SYN_⌬CB variants, were distributed throughout the coding region of GPR39 (Fig. 4). Examination of the 28 missense variants that were categorized as pLP and pP confirmed that the RMPath pipeline identified variants which are likely to impact GPR39 function (Fig. 4). The three pP variants (S78L, C108R, and R133G) included a cysteine that participates in a disulfide bond at the extracellular face of GPR39 (20). The pLP variants included one that contributes to the Zn 2ϩ -binding site (H19R) (21), two variants in a disulfide bond cysteine (C191G and C191W) (20), a residue in the highly conserved NPXXY motif common to family A GPCRs (N340S) (22), and a variant distal to a putative palmitoylation site in the proximal C terminus (R352Q). Other residues in the pLP category introduce charges or proline residues into transmembrane helices or extracellular loops (S49N, M87I, T122M, A167S, P179T, R193C, S222Y, T291I, A293P, R302Q, A307V, P310A, Y314R, Table 1 Orphan GPCRs with congruent disease associations for multiple classes of rare variants All orphan GPCRs with sufficient numbers of rare LOF, MISS, or SYN_⌬CB variants in at least two categories were analyzed by SKAT. Absence of results in a category (marked by ϫ) indicates an insufficient number of variants and/or individuals for well-powered SKAT. A Bonferroni correction for multiple testing (123 sets, 656 ICD9 codes) was used to establish an exploratory significance cutoff of 0.05 (adjusted p value threshold 6.2E-7). NOS, not otherwise specified.  Common variants with MAF Ն 1% were categorized as likely benign (MAF between 1 and 5%) and benign (MAF Ն 5%). All variants having MAF of Ͻ1% were sorted to synonymous, LOF (frameshift or premature stop codon), and missense. Missense variants were thus variants of unknown significance (VUS) and subjected to bioinformatics pathogenicity triage using RMPath scoring ("Experimental procedures") and classified as pB, pLB, pLP, or pP and the subset of SYN variants having a significant change in codon usage bias (SYN_⌬CB).

Disease associations of GPR39 rare variant classes
The initial SKAT screen for GPR39 disease associations compared top associations for the three independent classes of rare variants. To more fully explore the results for GPR39 and increase power to identify all significant disease associations, we combined all individuals heterozygous for rare LOF, pLP, and pP variants (termed LOF/MISS). Analysis was restricted to only those ICD9 codes with Ն200 individuals independent of genotype (23), and the results are plotted against ICD9 codes in Fig. 3B. SKAT analysis was also run on rare SYN_⌬CB variants, plotted in Fig. 3C. Both the LOF/MISS and SYN_⌬CB analyses had a significant number of phenome-wide associations in common. All results for both variant classes that were above the significance cutoff for multiple testing (dotted lines in Fig. 3, B and C) are listed in Table 2. Top associations for GPR39 included lesion of the ulnar nerve and benign prostate hyperplasia with lower urinary tract obstruction. Note that individuals who had both rare LOF/MISS and SYN_⌬CB variants were excluded from analyses, because the sequence data are not phased, and we therefore cannot distinguish between 1 allele having both variants versus a variant on each allele. Also of note in Table 2, some significant disease associations were specific to either the LOF/MISS or SYN_⌬CB variant classes. To assess the importance of rare variant pathogenicity triage for binning prior to SKAT analysis, we combined all rare (MAF Ͻ 1%) variants (LOF ϩ missense ϩ synonymous ϭ 168 variants) and ran SKAT. The only significant disease association was upper quadrant abdominal pain (ICD9 789.02; p value 8.98E-06; significance were run for the indicated GPR39 common variants. No association results exceeded the minimal p value threshold of 0.05, uncorrected for multiple testing. B, rare LOF plus MISS variants were combined (39 total variants), and disease associations were determined by SKAT. Top phenome-wide associations are labeled. C, disease associations of rare SYN_⌬CB variants (21 total variants) were determined by SKAT; top phenome-wide associations are labeled. Full details of associations common to both variant classes (B and C) are presented in Table 2. For both B and C, the significance threshold was set at 0.05 (adjusted p value 3.8E-05, 2 sets, 656 codes) and is indicated by a dotted line. The GPR39 sequence was color-coded to indicate reference codon usage frequency. For all amino acids having multiple potential codons, red indicates the most common codon, with progression from common to rare colored as gradations to light pink (rare). Asterisks indicate the locations of rare synonymous variants exhibiting the largest changes in codon bias, either from common to rare or rare to common. indicates rare LOF variants (truncations or frameshifts), and @ indicates locations of rare missense variants combined for SKAT analysis.

Missense and synonymous variants alter GPCR function
cutoff at p Ͻ 7.6E-05). We also ran SKAT on those rare missense variants predicted to be benign (Fig. 2, pB; 61 variants); no statistically significant disease associations were obtained (data not shown). Likewise, we ran SKAT analysis on all rare synonymous variants and found top associations similar to those obtained for SYN_⌬CB but with less significant p values, suggesting dilution of impactful variants, i.e. lesion of ulnar nerve (p value 1.08E-07), and benign prostate hyperplasia with urinary tract obstruction and other lower urinary symptoms (p value 3.79E-05) (compare Tables 2 and  3). We conclude that pathogenicity triage to refine rare variant binning was crucial to identifying disease associations for the LOF and missense variants, and binning of the subset of rare synonymous variants having large changes in local codon frequency bias strengthened disease associations. Further, refining the rare variant input into the SKAT analyses yielded disease associations common to multiple independent classes of variants, strengthening confidence in the associations.

Validation of the functional impact of rare GPR39 variants
Point mutations can cause changes in GPR39 function by altering protein expression and/or by altering essential features of receptor activation. Therefore, to explore the functional basis for the observed disease associations, we generated the rare missense and SYN_⌬CB variants in the reference human GPR39 background. The cDNA construct included an N-terminal 3ϫ FLAG epitope followed by a minimal bungarotoxinbinding site, for ease of expression analysis and subcellular localization, respectively. Expression in HEK293 cells of equivalent amounts of cDNA and subsequent Western blotting of lysates showed significant differences in net expression levels of the rare missense and SYN_⌬CB variants. Fourteen missense variants had reduced and one had significantly increased expression relative to WT GPR39 (Fig. 5, A and C). Rare missense variants with altered expression were distributed throughout GPR39 structural domains as color-coded in the quantitation of multiple blots (Fig. 5C). Likewise, SYN_⌬CB variant expression was significantly greater than WT GPR39 for three variants, whereas expression of fourteen variants was significantly less that WT (Fig. 5B), as quantified in Fig. 5D. Surprisingly, despite the change from rare to common codon frequency for the majority of SYN_⌬CB variants, only four showed increased expression relative to WT. Four variants went from common to rare codons, denoted by asterisks on the x axis in Fig. 5D. Contrary to expectations, Gln-265 was expressed at significantly higher levels than WT (ϳ150%), despite the large decrease in codon frequency. Overall, results argue that factors in addition to site-specific codon usage bias determine net expression levels for rare synonymous variants.
GPR39 signaling is complex, with multiple signaling outputs. However, in both endogenous and heterologous expression systems, GPR39 activates inositol trisphosphate (IP 3 )-dependent release of intracellular Ca 2ϩ (10,11,24). The first step in the signaling pathway is G q -mediated activation of phospholipase C, which produces diacylglycerol and IP 3 . Here we exposed HEK293 cells transiently expressing WT or variant GPR39 to ZnCl 2 and used a FRET assay to quantify accumulation of inositol monophosphate (IP 1 ) in the presence of LiCl. Fig. 6A illustrates normalized maximal levels of IP 1 for WT and representative missense variants having decreased (R386C) or increased (W314R) EC 50 for activation by ZnCl 2 . Fig. 6C plots the EC 50 values and 95% confidence intervals (CIs) for WT and all other missense variants, color-coded for their locations in GPR39 domains. Three variants, S78L, T122M, and R133G, had no measurable activity, likely accounted for by low levels of expression (recall Fig. 5, A and C), whereas M87I was well-   (Fig. 6, E and H), and generally correlated with the extrapolated V max values for the IP 1 assays (Fig. 6, D and G). The data were corrected for the very low levels of unspecific BgTx-Alexa 594 binding to untransfected HEK293 cells (6.03 Ϯ 1.8% of WT (n ϭ 6 independent experiments)). Total (Western blotting) and plasma membrane localization (BgTx binding) were comparable for many of the missense variants (H19R, A167S, S222Y, A293P, R302Q, P310A, R320W, R362C, and R386C), but total expression was greater than plasma membrane localization for five variants (M87I, C191W, R193C, A307V, and S336R), suggesting potential alterations in trafficking, whereas the remaining 12 variants had reduced total expression relative to plasma

Missense and synonymous variants alter GPCR function
membrane localization, suggesting a potential increase in quality control-mediated degradation at the endoplasmic reticulum. SYN_⌬CB plasma membrane localization varied more widely relative to WT, with a subset of variants having significantly elevated plasma membrane localization (Fig. 6H), which was not a reflection of increased expression assessed by Western blotting (Fig. 5D). Six variants had proportional total and plasma membrane localization (Ser-78, Ala-321, Ser-248, Leu-363, Pro-391, and Ala-394), whereas three had higher total expression versus plasma membrane localization (Ser-147, Gln-265, and Gly-4370), and the remainder had reduced total expression but elevated plasma membrane targeting. The most striking SYN_⌬CB variant, Gln-265, had an increased EC 50 value relative to WT (Fig. 6F), increased V max , and surface local-ization relative to WT (Fig. 6, G and H), as well as increased total expression (Fig. 5D). Comparing total and plasma membrane expression of rare missense and SYN_⌬CB variants thus identified a subset of high-value variants that likely have significantly altered biosynthesis and/or trafficking, which could be confirmed by mechanistic studies.
Overall, the results demonstrate a striking parallel in functional impact on expression and IP 1 signaling for a subset of both rare missense and SYN_⌬CB variants of GPR39, rationalizing the disease associations identified for both classes of variants. The congruence of top disease associations between rare LOF, pathogenic missense, and SYN_⌬CB variants for a significant subset of GPCRs (recall Table 1) argues strongly for the utility of these analytical approaches for identification of functionally impactful rare variants and the  A), plotted as means Ϯ 95% CI for n ϭ 3 independent experiments. The dotted lines indicate the 95% CI range for WT. D, extrapolated V max of Michaelis-Menten fits to IP 1 dose-response relations of rare missense variants, plotted as mean of three independent experiments plus 95% CI. The dotted lines indicate the 95% CI range for WT. E, bungarotoxin binding was used to assess surface localization of rare missense variants, plotted as mean plus all points, normalized to WT (dotted line), and corrected for unspecific binding in untransfected HEK293 cells. F, EC 50 values of all rare SYN_⌬CB variants with measurable activity, analyzed and plotted as in C. G, extrapolated V max of fits to IP1 dose-response relations for rare SYN_⌬CB variants, analyzed, and plotted as in D. H, bungarotoxin binding to assess surface localization of rare SYN_⌬CB variants, analyzed, and plotted as in E. For all plots. *, p Ͻ 0.05; **, p Ͻ 0.01; ***, p Ͻ 0.001; ****, p Ͻ 0.0001, by two-tailed t test relative to WT, assuming unequal variances.
imperative to include rare synonymous variants in genomic and functional analyses.

Discussion
The related GWAS and PheWAS approaches, which utilize common variants, have produced strong, validated targets for drug development (2). However, many genes do not have functionally important common variants, and additional approaches are required. The advantages of rare coding variant approaches are 2-fold. First, rare variants can have stronger effects on protein function, and second, assessment of the impact of rare coding variants on protein expression and function is straightforward. However, because of their low frequency, it is necessary to combine rare coding variants to increase the statistical power for detecting disease associations. In this study, we used pathogenicity triage as a basis for rare variant binning, based on the hypothesis that aggregating variants with similar directions of effect on protein function would amplify disease associations. We considered, in order of the strength of their potential effects, predicted LOF variants, predicted likely pathogenic or pathogenic missense variants, and synonymous variants with altered codon frequency bias. As a test of the importance of rare variant triage on disease associations, we compared agnostic binning of all rare variants in GPR39 with the disease associations identified for the three sorted variant classes. Of note, binning of all rare variants yielded no significant disease associations, whereas curated classes of those same rare variants had concordant top phenome-wide disease associations, obtained with nonoverlapping individuals. Likewise, SKAT analysis of the subset of rare missense variants predicted to be benign yielded no significant disease associations. The results argue strongly that binning all rare variants in a gene or genetic region dilutes the effects of impactful variants, and appropriate pathogenicity triage of rare variants is an integral first step in successful application of disease association analyses.
Over the past 10 years, rare synonymous variants have been shown to impact protein expression and function with a frequency comparable with nonsynonymous variants, although the mechanism(s) by which they exert their biological function are diverse (25)(26)(27)(28)(29)(30)(31)(32)(33)(34)(35)(36). Rare synonymous variants have not, however, been systematically utilized for disease association analysis (25,26,31,37). Here we demonstrate that 39% of understudied or orphan GPCRs have impactful rare synonymous variants that yield disease associations comparable with LOF or missense variants in the same gene, which may provide impetus for exploratory drug development despite the lack of identified endogenous ligands. Our data also highlight the necessity for improved approaches for pathogenicity triage of rare synonymous variants. For GPR39, SKAT analysis on all rare synonymous variants for the most part yielded disease associations similar to those obtained with the SYN_⌬CB subset but have reduced significance, arguing for dilution of impactful variants. However, a few disease associations, including psoriatic arthropathy, showed more significant p values in the combined synonymous variant analysis (3.87E-11) versus SYN_⌬CB (1.03E-06), implying that simply isolating those synonymous variants having altered local codon usage bias missed some variants with significant functional impacts (see Tables 2 and  3). Using both LOF/missense and synonymous variants as independent data sets in the SKAT analyses increases confidence in the identified disease associations and provided the impetus for further functional studies.
A disease association does not prove causation but implies an impact on protein expression and/or function that must be validated experimentally. As proof of concept, GPR39 was chosen because its agonist and dominant signaling pathway were known, and rare variant classes produced strongly congruent disease associations. We identified 168 rare variants in GPR39 in the 51,289 WES DiscovEHR cohort. Most individuals were heterozygous for the rare variants studied, and only 1 of 327 individuals had more than one rare variant. Of the 110 missense variants, the RMPath pipeline predicted that 28 were likely to be pathogenic (pLP ϩ pP). Among these, rare variants previously shown to be important for GPR39 function were identified, including the zinc-binding site (21), a critical extracellular disulfide bond (20), the pH sensor (11), and the NPXXY motif common to all family A GPCRs (22). When expressed in HEK293 cells, these variants had the expected effects on EC 50 values and maximal activities. Of the 47 synonymous variants, the codon usage table identified 21 variants having significantly altered codon usage frequency, and many had significant effects on expression, surface localization, and Zn 2ϩ -induced IP 1 production. Surprisingly, the distributions of rare missense and SYN_⌬CB variants in the GPR39 structural domains were distinct. The majority of missense variants were localized to the transmembrane domain (77%), half in the extracellular loops (10 of 26), one an intracellular loop, and the remainder in the transmembrane helices (9 of 26). In contrast, ϳ50% of SYN_⌬CB variants were in the C terminus; the remainder were distributed among transmembrane helices (3 of 20), intracellular (2 of 20), and extracellular (4 of 20) loops. The C-terminal variants in both classes had significant effects on expression and signaling, primarily reducing V max . In relation to many family A GPCRs, the C terminus of GPR39 is relatively large (108 amino acids) and has 22 putative phosphorylation sites. GPR39 signaling outputs are regulated by agonist-toggled binding of protein kinase inhibitor ␤ (38). In addition, phosphorylation by Rho kinase mediates desensitization of GPR39 (39). Thus, C-terminal missense or synonymous variants may alter secondary structure and/or phosphorylation to impact signaling. The results focus attention on the C terminus as a critical domain mediating distinct aspects of GPR39 signaling. It should be noted that GPR39 signaling is complex, including G s -mediated cAMP generation, G q -mediated IP 3 production, and serum response factor-serum response element (SRF-SRE)-mediated transcriptional regulation via G 12/13 (39). The current study examined only G q -mediated IP 3 generation, leaving open the possible impact of missense and/or synonymous variants on signaling bias. Further, our results suggest that structural studies on a subset of rare missense and SYN_⌬CB variants having strong effects on signaling may reveal important structural determinants for optimal GPR39 function.
Based on ClinVar and related databases, GPR39 has not been causally linked to disease. The present study focused on the general patient population, and all individuals in the present Missense and synonymous variants alter GPCR function study were heterozygous for GPR39 rare variants. Nevertheless, both LOF/MISS and SYN_⌬CB variants yielded disease associations by SKAT, which suggests a role for GPR39 in nerve function, benign hyperplasia of the prostate, psoriatic arthropathy, diseases of hair and hair follicles, and benign essential hypertension. These associations are in agreement with cellular and animal studies suggesting that GPR39 regulates neuronal transport of potassium and chloride, attenuating postsynaptic excitability (12, 40 -42). Animal models demonstrate GPR39-mediated up-regulation of the neuronal K ϩ /Cl Ϫ cotransporter attenuates seizure activity (42), and knockout of GPR39 promotes depression (43). GPR39 is up-regulated by some antidepressants (44) in mouse models. One study in humans showed that GPR39 levels were significantly reduced in the hippocampus and cortex of suicide victims (45). The highest levels of Zn 2ϩ in the body are in the prostate and seminal fluid (46), and GPR39 is differentially expressed in normal versus malignant prostate cells, where it regulates cell growth and proliferation (10,12,13), potentially contributing to the association with benign prostate hyperplasia observed in the present work. Finally, the observation that individuals heterozygous for rare variants in GPR39 have a significant association with diseases of hair and hair follicles should be considered in light of a recent study that found that GPR39 marks a class of stem cells in the sebaceous gland and contributes to wound healing in mice (47). Overall, the identification of a subset of clinical phenotypes that echo cell culture and animal model studies supports disparate roles for GPR39 in human physiology and validates the SKAT approaches used herein to identify disease associations. Additional phenotypes, not previously associated with GPR39 including psoriatic arthropathy and macular puckering of the retina, deserve further attention.
In conclusion, we have identified strong disease associations for GPR39 using either LOF/missense or rare synonymous variants that confirm and extend cellular and animal studies. Rare variants from both the missense and synonymous classes altered GPR39 expression and function, providing a rationale for the observed disease associations. The approach applied to a larger set of orphan or understudied GPCRs also yielded robust associations with phenome-wide significance, providing potential novel targets for further investigation. The methods described in this report are easily applied to any set of genes of potential clinical importance with available genomic and integrated longitudinal electronic health records. Finally, this study highlights the importance of rare synonymous variants in human physiology and argues for their routine inclusion in any comprehensive analysis of genomic variants as potential causes of disease.

Experimental procedures
The MyCode Community Health Initiative recruits Geisinger patients from a broad range of inpatient and outpatient clinics to integrate genetic information with their clinical EHR data to foster discoveries in health and disease (23). This study used only deidentified data and does not therefore involve "human subjects" as defined in 45CFR46.102(f). Although this study was not subject to oversight by the Geisinger Institutional Review Board, the specific uses of the genetic and associated clinical data described herein were approved by the Geisinger Institutional Review Board. We used WES from 51,289 participants (9), which includes patients enrolled through primary care and specialty outpatient clinics. Participants were 59% female, with a median age of 61 years, and predominantly Caucasian (98%) and non-Hispanic/Latino (95%). Patient demographics are in Table 4. Whole exome sequencing was performed using 75-bp paired-end sequencing on an Illumina v4 HiSeq 2500 to a coverage depth sufficient to provide greater than 20ϫ haploid read depth of over 85% of targeted bases for at least 96% of samples. Sample preparation, sequencing, sequence alignment, variant identification, genotype assignment, and quality control steps were carried out as previously described (23).
Rare variants were separated into two groups: 1) nonsense and frameshift variants and 2) missense variants. Nonsense and frameshift variants were scored by SnpEff fields LOF[*]⅐PERC (percent) Ն 0.5 or (nonsense mediated decay NMD)[*]. Number of transcripts (NUMTR) values Ͼ 3 are considered potential LOF variants. To score missense variants, we summed numerical values derived from pathogenicity prediction tools to generate a cumulative score as follows (RMPath, in silico radical mutations pathogenicity prediction; maximum score all other results ϭ 0. Variants with RMPath scores Յ8 were pB, those with scores in the range from 6 to 11 were predicted to be Likely Benign (pLB); those with scores between 11 and 17 were predicted to be likely pathogenic (pLP); and those with scores Ն17 were predicted to be pathogenic (pP). For clinical purposes, all variants are considered variants of unknown significance until family cosegregation and/or functional analyses definitively establish a variant as pathogenic. Our goal in this study was to identify variants likely to impact GPR39 expression or function.
For all rare synonymous codons, we determined the ratio of the genome-wide codon fraction of the GPR39 reference codon over the synonymous codon variant at that position using human-specific codon usage tables (48,49), e.g. for S147S (tcG/ tcA), 0.15/0.06 ϭ 2.5, i.e. a 2.5-fold change. All synonymous variants with at least a 2-fold change in site-specific codon usage were used in the SKAT analyses. Deidentified EHR data were obtained from an approved data broker. All unique ICD9 codes with Ն200 patients (regardless of genotype) with at least three independent incidences of the ICD9 code were extracted from the EHR (50). Individuals having one or two calls were excluded, and those having no calls of a particular ICD9 code were considered as controls. All non-Europeans were excluded from the analysis, as was one sample from pairs of closely related individuals up to first cousins. All models were adjusted for sex, age, age 2 , and the first four principal components. The PheWAS R package was used for association analyses, which were performed for three GPR39 common variants (2:133174764, pA50V; 2:133174999, pT128T; and 2:133402607, noncoding), and plotting of results was done using GraphPad Prism (version 6). We used the sequence kernel association test with default weights (SKAT R package) to compare the burden of rare variants in cases and controls. Cases and controls were defined as for PheWAS analysis (50). Age, age 2 , sex, body mass index, and the first four principal components were used as covariates. All non-European samples were excluded. We used principal components calculated from high quality common SNPs to estimate the ancestry of our samples. Using a kernel density estimator, the likelihood that an individual belonged to each HapMap class (African, admixed American, East Asian, European, or south Asian) was estimated; samples that fell into an unknown group were excluded (51). Statistical significance was determined using the Bonferroni correction, ␣/m, where ␣ is 0.05, and m is the number of variant sets and disease codes, as indicated for each analysis (52).
All missense and SYN_⌬CB variants were generated by gene synthesis (Thermo Fisher Scientific, Gene Art) based on the NP_001499.1 reference sequence (GeneID 2863, 453 amino acids) and included tandem tags at the N terminus to facilitate expression and functional studies. After the initiation methionine codon, we added a 3ϫ FLAG epitope plus a minimal bungarotoxin-binding site as shown: GACTACAAAGACCATGA-CGGTGATTATAAAGATCATGATATCGATTACAAGGA-TGACGATGACAAGGGTTGGAGATACTACGAGAGCT-CCCTGGAGCCCTACCCTGACGGT.
WT and variant constructs were transiently transfected into HEK293 cells. The cells were lysed after 48 h, and 25 g of lysates were separated on 4 -15% SDS-PAGE gels (Bio-Rad), blotted to nitrocellulose, blocked with 5% milk in TBS-T, and exposed for 1 h at room temperature to anti-FLAG mAb conjugated to HRP (Sigma). The blots were exposed to SuperSignal West Pico Chemiluminescence Substrate (Thermo Fisher), and the images were recorded on a FUJIFILM LAS-4000mini luminescence analyzer and processed with Image-Gauge version 3.0. GPR39 expression was quantified using Multi-Gauge software, normalized to WT expression, and corrected for HEK293 background on the same blot. Each variant was assayed in two to four independent transfections. To accurately quantify the range of expression observed for SYN_⌬CB variants, we segregated high-and low-expressing variants plus WT onto separate gels, allowing differential exposure times for chemiluminescence acquisition.
WT and variant constructs were transiently transfected into HEK293 cells and cultured for 24 h, and equivalent numbers of cells were plated in poly-D-lysinecoated 96-well plates for a further 24 h. Replicates were treated with various concentrations of ZnCl 2 for 10 min at 37°C. Levels of IP 1 were determined as recommended by manufacturer (IP-One ELISA kit, Cisbio USA, Inc.). The plates were read on a POLARStar Omega (BMG Labtech) plate reader. We made no assumptions regarding signaling mechanism and its potential cooperativity and estimated V max and EC 50 from Zn 2ϩ versus IP 1 data combined from three independent experiments using GraphPad PRISM (V6) and a general dose-response equation with the potential for cooperativity (Y ϭ Y 0 ϩ [(V max Ϫ Y 0 )/(1 ϩ 10 (logEC 50 Ϫ [Zn 2ϩ ] n)]), where Y 0 is baseline IP 1 in the absence of Zn 2ϩ , V max is the maximal response, EC 50 is the Zn 2ϩ concentration at 50% of V max , and n is the Hill coefficient (slope). Parallel plates from three independent transfections were exposed to Alexa 594 -conjugated bungarotoxin (3 g/ml, 5 min at 37°C, 5% CO 2 ) to quantify surface expression of GPR39 variants. After removal of bungarotoxin and washes with PBS, fluorescence was read on the POLARStar Omega (BMG Labtech) plate reader. WT, standards, and solution blanks were included in each assay, and unspecific binding of BgTx-Alexa 594 to HEK293 cells was determined relative to cells transfected with WT GPR39 in the same experiment.