Rare variant pathogenicity triage and inclusion of synonymous variants improves analysis of disease associations

Many G protein-coupled receptors (GPCRs) lack common variants that lead to reproducible genome-wide disease associations. Here we used rare variant approaches to assess the disease associations of 85 orphan or understudied GPCRs in an unselected cohort of 51,289 individuals. Rare loss-of-function variants, missense variants predicted to be pathogenic or likely pathogenic, and a subset of rare synonymous variants were used as independent data sets for sequence kernel association testing (SKAT). Strong, phenome-wide disease associations shared by two or more variant categories were found for 39% of the GPCRs. Validating the bioinformatics and SKAT analyses, functional characterization of rare missense and synonymous variants of GPR39, a Family A GPCR, showed altered expression and/or Zn2+-mediated signaling for members of both variant classes. Results support the utility of rare variant analyses for identifying disease associations for genes that lack common variants, while also highlighting the functional importance of rare synonymous variants. Author summary Rare variant approaches have emerged as a viable way to identify disease associations for genes without clinically important common variants. Rare synonymous variants are generally considered benign. We demonstrate that rare synonymous variants represent a potentially important dataset for deriving disease associations, here applied to analysis of a set of orphan or understudied GPCRs. Synonymous variants yielded disease associations in common with loss-of-function or missense variants in the same gene. We rationalize their associations with disease by confirming their impact on expression and agonist activation of a representative example, GPR39. This study highlights the importance of rare synonymous variants in human physiology, and argues for their routine inclusion in any comprehensive analysis of genomic variants as potential causes of disease.

The initial SKAT screen for GPR39 disease associations compared top associations for the 155 three independent classes of rare variants. To more fully explore the results for GPR39 and 156 increase power to identify all significant disease associations, we combined all individuals 157 heterozygous for rare LOF, pLP and pP variants (termed LOF/MISS). Analysis was restricted to 158 only those ICD9 codes with ≥200 individuals independent of genotype [23], and the results are 159 plotted against ICD9 codes in Figure 4A. SKAT analysis was also run on rare SYN_∆CB variants, 160 plotted in Figure 4B. Both the LOF/MISS and SYN_∆CB analyses had a significant number of 161 phenome-wide associations in common. All results for both variant classes that were above the 162 significance cut-off for multiple testing are listed in Table 2. Top associations for GPR39 included 163 lesion of the ulnar nerve and benign prostate hyperplasia with lower urinary tract obstruction. 164 Note that individuals who had both rare LOF/MISS and SYN_∆CB variants were excluded from 165 analyses. Also of note in Table 2, some significant disease associations were specific to either the 166 LOF/MISS or SYN_∆CB variant classes. To assess the importance of rare variant pathogenicity 167 triage for binning prior to SKAT analysis, we combined all rare variants (LOF, missense, and 168 synonymous variants; 209 variants, 177 used in analysis) and ran SKAT. The only significant 169 disease association was upper quadrant abdominal pain (ICD9 789.02; P-value 8.98E-06; 170 significance cut-off at P < 7.6E-05). We also ran SKAT on those rare missense variants predicted 171 to be benign (Figure 2, pB; 61 variants); no statistically significant disease associations were 172 obtained. Likewise, we ran SKAT analysis on all rare synonymous variants and found top 173 associations similar to those obtained for SYN_∆CB, but with less significant P-values, suggesting 174 dilution of impactful variants, i.e., lesion of ulnar nerve (P-value 1.08E-07), and benign prostate 175 hyperplasia with urinary tract obstruction and other lower urinary symptoms (P-value 3.79E-05) 176 (compare Tables 2 and 3). We conclude that pathogenicity triage to refine rare variant binning was crucial to identifying disease associations for the LOF and missense variants, and binning of 178 the subset of rare synonymous variants having large changes in local codon frequency bias 179 strengthened disease associations. Further, refining the rare variant input into the SKAT analyses 180 yielded disease associations common to multiple independent classes of variants, strengthening 181 confidence in the associations.  Figure S2), quantified in Figure 5B. Surprisingly, despite the change from rare to common codon 196 frequency for the majority of SYN_∆CB variants, only four showed increased expression relative 197 to WT. Four variants went from common to rare codons, denoted by * on the x-axis in Figure 5B.
despite the large decrease in codon frequency. Overall, results argue that factors in addition to 200 site-specific codon usage bias determine net expression levels for rare synonymous variants.

201
GPR39 signaling is complex, with multiple signaling outputs. However, in both 202 endogenous and heterologous expression systems, GPR39 activates inositol trisphosphate (IP3)-203 dependent release of intracellular Ca 2+ [10,11,24]. The first step in the signaling pathway is Gq-204 mediated activation of phospholipase C, which produces diacylglycerol and IP3. Here we exposed 205 HEK293 cells transiently expressing WT or variant GPR39 to its agonist, ZnCl2, and used a FRET 206 assay to quantify accumulation of intracellular inositol monophosphate (IP1) in the presence of 207 LiCl. Figure 6A illustrates normalized maximal levels of IP1 for WT and representative missense 208 variants having decreased (R386C) or increased (W314R) EC50 for activation by ZnCl2. Figure   209 6B plots the EC50s for WT and all other missense variants, color-coded for their locations in 210 GPR39 domains. Four variants, S78L, M87I, T122M and R133G, had no measurable activity, 211 likely accounted for by low levels of expression (recall Figure 5A). Three additional variants, 212 C108R, R193C and R302Q, had dose-response curves that were significantly right-shifted, 213 precluding determination of EC50 or VMAX, despite expression levels not significantly different 214 from WT. Likewise, Figure 6C illustrates the normalized dose-response relations for WT GPR39 215 and representative SYN_∆CB variants which decrease (V354) or increase (I29) the EC50 for ZnCl2 216 activation, and Figure 6D plots the EC50s of WT GPR39 and all SYN_∆CB variants. In contrast 217 to the missense variants, all SYN_∆CB variants had ZnCl2-stimulated IP1 activity. Six variants 218 had increased EC50s and three variants having significantly reduced EC50s relative to WT. Surface 219 expression of all rare missense and SYN_∆CB variants was assessed by BgTx-Alexa594 binding 220 on non-permeabilized cells (Figures 6E, 6F), and generally correlated with the extrapolated Vmax 221 values for the IP1 assays (Figures 6G, 6H). Notable exceptions include the missense variants C108R, R193C and R320Q, which had surface expression comparable to WT but little IP1 activity.

223
Likewise, the most striking SYN_∆CB variant, Q265, had an overall increase in net expression 224 ( Figure 5B), increased EC50 relative to WT (Figure 6D), and increased surface localization and 225 VMAX relative to WT (Figure 6F, 6H). Overall, results argue for a striking similarity in functional 226 impact on expression and IP1 signaling for a subset of both rare missense and SYN_∆CB variants 227 of GPR39, rationalizing the disease associations identified for both classes of rare variants. The 228 congruence of top disease associations between rare LOF, pathogenic missense and SYN_∆CB 229 variants for a significant subset of GPCRs (recall Table 1) argues strongly for the utility of these 230 analytical approaches for identification of functionally impactful rare variants, and the imperative 231 to include rare synonymous variants in genomic and functional analyses. straightforward. However, due to their low frequency, it is necessary to combine rare coding 240 variants to increase the statistical power for detecting disease associations. In this study, we used 241 pathogenicity triage as a basis for rare variant binning, based on the hypothesis that aggregating 242 variants with similar directions of effect on protein function would amplify disease associations. 243 We considered, in order of the strength of their potential effects, predicted LOF variants, predicted

302
Based on ClinVar and related databases, GPR39 has not been causally linked to disease.

303
The present study focused on the general patient population, and all patients in the present study 304 were heterozygous for GPR39 rare variants. Nevertheless, both LOF/MISS and SYN_∆CB 305 variants yielded disease associations by SKAT which suggest a role for GPR39 in nerve function, 306 benign hyperplasia of the prostate, psoriatic arthropathy, diseases of hair and hair follicles, and 307 benign essential hypertension. These associations are in agreement with cellular and animal 308 studies which suggest that GPR39 regulates neuronal transport of potassium and chloride,    De-identified EHR data was obtained from an approved data broker. All unique ICD9 359 codes with ≥200 patients (regardless of genotype) with at least 3 independent incidences of the ICD9 code were extracted from the EHR [52]. Individuals having 1-2 calls were excluded, and 361 those having no calls of a particular ICD9 code were considered as controls. All non-Europeans 362 were excluded from the analysis, as was one sample from pairs of closely related individuals up to 363 first cousins. All models were adjusted for sex, age, age 2 and first 4 principal components. The

364
PheWAS R package were used for association analyses, which were performed for three GPR39  (1)  blotted to nitrocellulose, blocked with 5% milk in TBS-T, and exposed for 1 hour at room 394 temperature to anti-FLAG monoclonal antibody conjugated to HRP (Sigma). Blots were exposed  VMAX was separately calculated on raw data (GraphPad Prism, V6). Parallel plates were exposed 408 to Alexa-594-conjugated bungarotoxin (3 µg/ml, 5 min at 37 °C, 5% CO2) to assess surface

612
The GPR39 sequence was color-coded to indicate reference codon usage frequency. For all amino 613 acids having multiple potential codons, red indicates the most common codon, with progression 614 from common to rare colored as gradations to light pink (rare). Arrowheads indicate the locations 615 of rare synonymous variants exhibiting the largest changes in codon bias, either from common to rare or rare to common. Asterisks indicate rare LOF variants (truncations or frameshifts) and 617 arrows indicate locations of rare missense variants combined for SKAT analysis.