Impact of amino acid substitutions at secondary structures in the BRCT domains of the tumor suppressor BRCA1: Implications for clinical annotation

Genetic testing for BRCA1, a DNA repair protein, can identify carriers of pathogenic variants associated with a substantially increased risk for breast and ovarian cancers. However, an association with increased risk is unclear for a large fraction of BRCA1 variants present in the human population. Most of these variants of uncertain clinical significance lead to amino acid changes in the BRCA1 protein. Functional assays are valuable tools to assess the potential pathogenicity of these variants. Here, we systematically probed the effects of substitutions in the C terminus of BRCA1: the N- and C-terminal borders of its tandem BRCT domain, the BRCT-[N-C] linker region, and the α1 and α′1 helices in BRCT-[N] and -[C]. Using a validated transcriptional assay based on a fusion of the GAL4 DNA-binding domain to the BRCA1 C terminus (amino acids 1396–1863), we assessed the functional impact of 99 missense variants of BRCA1. We include the data obtained for these 99 missense variants in a joint analysis to generate the likelihood of pathogenicity for 347 missense variants in BRCA1 using VarCall, a Bayesian integrative statistical model. The results from this analysis increase our understanding of BRCA1 regions less tolerant to changes, identify functional borders of structural domains, and predict the likelihood of pathogenicity for 98% of all BRCA1 missense variants in this region recorded in the population. This knowledge will be critical for improving risk assessment and clinical treatment of carriers of BRCA1 variants.

Carriers of germline variants in BRCA1 that disrupt protein expression, stability, or function are at high risk for developing early-onset breast and ovarian cancer (1,2). The identification of women at risk is now an important step in informing and managing preventive and therapeutic clinical decisions. It requires accurate discrimination of genetic variants associated with high risk (relative risk > 4) from those variants that are not associated with a clinically relevant increase in risk (3,4).
Variants in BRCA1 whose effect can be inferred using the genetic code can be accurately classified according to their pathogenicity (5). However, the effects of missense variants, in-frame deletions or insertions, and intronic variants need to be assessed using multiple sources of data, posing a challenge for clinical annotation. Currently, rigorous clinical annotation of BRCA1 variants regarding pathogenicity is obtained using a multifactorial likelihood model that incorporates family history, segregation data, personal history, and co-occurrence data (5-8). Importantly, clinical recommendations associated with each class (i.e. 5, pathogenic; 4, likely pathogenic; 3, uncertain; 2, likely not pathogenic; and 1, not pathogenic) are available (9). Variants for which the association with increased risk has not been established are called variants of uncertain clinical significance (VUS). Missense variants, in particular, constitute a challenge for clinical annotation. Most of these missense VUS alleles are observed at very low frequency (<1/10,000) in the population and, thus, family and population-based data are often insufficient to determine pathogenicity. Functional assays that directly or indirectly test for a biological or biochemical function of BRCA1 in the laboratory can also provide information to clinically annotate missense variants (10-15). However, functional data are not currently considered for the calculation of the multifactorial likelihood but only as a confirmatory source of information. Although genetic and epidemiological models will remain the gold standard methods to associate variants with increased risk, several variants may only be understood through functional assays (16,17).
BRCA1 is a multifunctional protein central to the maintenance of genome integrity through homologous recombination (18). BRCA1 has also been implicated in transcription (19-21), is associated with the RNA polymerase II holoenzyme (22,23), and is enriched at transcription start sites (24). The transcriptional activation (TA) assay has been extensively used to evaluate BRCA1 VUS (25). In this validated assay, a fusion of a heterologous DNA-binding domain (yeast GAL4 DBD) and the C terminus of BRCA1 drives the expression of a reporter gene. Pathogenic variants have compromised transcription activation, whereas nonpathogenic variants display activity comparable with the reference "WT" allele.
To allow for integration of functional assay results in the multifactorial model, a Bayesian two-component mixture model, called VarCall, was developed that generates probabilities of pathogenicity given the functional data (26). Recent analysis of 249 missense variants confirmed the importance of the BRCT domains for BRCA1 function, but also revealed regions within the BRCT domains that seemed tolerant to mutations (25). Thus, to refine and extend the coverage of missense VUS in the BRCA1 C-terminal region, we tested 102 variants (99 missense and 3 truncating) probing the functional borders of the BRCT domains, α-helices in BRCT-[N] and BRCT-[C], and the linker region between BRCT units. Finally, we conducted a joint analysis including all 347 BRCA1 missense variants tested to date.

BRCA1 variants in this study
We tested 102 (99 missense and 3 truncating) VUS in the BRCA1 C-terminal region (aa 1396-1863) not previously analyzed using the TA assay, with a focus on structural features, divided in five groups: (a) 20 variants located at the border region between the disordered region and the BRCT-[N] at residues p.K1648 (to Gln, Glu, Thr, Arg, Ile, and Asn), p.R1649 (to Gly, Thr, Lys, Ile, and Ser), p.M1650 (to Leu, Val, Thr, Arg, and Lys), and p.S1651 (to Ala, Cys, Pro, and Tyr); (b) 67 variants located in the linker region connecting BRCT-[N] and BRCT-[C]; (c) 9 variants in α-helices α1 and α3 in BRCT-[N] and in α′1 and α′3 in BRCT-[C]; (d) 3 deletion mutants to probe the C-terminal border of the BRCT domain; and (e) 3 assorted variants documented in the population (p.S1577Y, p.E1609G, and p.G1770E) (Fig. 1) (Table S1). We also examined two approaches for transcription data normalization: using the internal vector control only, and using the vector control together with ectopic protein expression levels.
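The composition of group (a) and the overall tally can be checked with a few lines of Python. This is only a bookkeeping sketch: the substitutions and group sizes are copied from the paragraph above, while the dictionary and function names are illustrative and not part of the study.

```python
# One-letter codes for the three-letter residue names used in the text.
THREE_TO_ONE = {
    "Gln": "Q", "Glu": "E", "Thr": "T", "Arg": "R", "Ile": "I",
    "Asn": "N", "Gly": "G", "Lys": "K", "Ser": "S", "Leu": "L",
    "Val": "V", "Ala": "A", "Cys": "C", "Pro": "P", "Tyr": "Y",
}

# Substitutions tested at each N-terminal border residue, as listed for group (a).
BORDER_SUBSTITUTIONS = {
    "K1648": ["Gln", "Glu", "Thr", "Arg", "Ile", "Asn"],
    "R1649": ["Gly", "Thr", "Lys", "Ile", "Ser"],
    "M1650": ["Leu", "Val", "Thr", "Arg", "Lys"],
    "S1651": ["Ala", "Cys", "Pro", "Tyr"],
}

# Number of variants per group (a)-(e), as stated in the text.
GROUP_SIZES = {"a": 20, "b": 67, "c": 9, "d": 3, "e": 3}


def border_variants():
    """Expand the substitution table into short protein-level names (e.g. p.S1651P)."""
    return [
        f"p.{site}{THREE_TO_ONE[residue]}"
        for site, residues in BORDER_SUBSTITUTIONS.items()
        for residue in residues
    ]


if __name__ == "__main__":
    names = border_variants()
    print(len(names), "border variants:", ", ".join(names))
    print("total variants across groups:", sum(GROUP_SIZES.values()))
```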

N- and C-terminal borders of BRCT domains
We interrogated 4 amino acid residue positions at the N-terminal border of BRCT domains (aa 1648-1651). When normalizing against an internal vector control for 20 variants in this group, 19 displayed transcription activity comparable with that of WT BRCA1 protein and variant p.S1651P was the only variant in this set with significantly compromised (<80%) function (Fig. 2A). These results suggest that residue p.S1651 is the first position of the tandem BRCT functional and structural unit sensitive to missense changes.
Nonsense variants that result in an 11-amino acid truncation at the C-terminal region of BRCA1 lead to a dysfunctional protein (2). Deletion of the final eight amino acid residues may also disrupt tBRCT function (27). To investigate the minimal region needed for BRCA1 activity we generated three nonsense variants: p.I1855X, p.P1856X, and p.Q1857X (Fig. 1B). All three deletions dramatically affected activity and expression levels of the protein carrying the deletion variants (Fig. 2B). This finding indicates that deletion of as few as 7 amino acid residues from the C terminus leads to protein instability and abrogation of BRCT function, and may therefore be pathogenic.

Linker region
We tested 67 variants at 18 residue positions located across the BRCT linker region (Fig. 1B), and 49 variants displayed reduced activity (mean <80% WT activity). Although the majority of these low activity variants localize to secondary structures, every residue position probed (with the exception of 1755 and 1757) had at least one change that impacted function (<80% WT activity) (Fig. 3, A-C). Residue positions present in intervening sequences were more tolerant to changes than residues at the linker helices. In summary, missense changes in the linker region are likely to have an impact on function and most variants impact protein stability.

α-Helices
We evaluated nine variants in four α-helices in both BRCT domains (α1, α3, α′1, and α′3) (Fig. 3, D and E). Seven variants had significantly reduced activity, indicating the importance of these α-helices, despite the lack of functional impact of most variants previously analyzed (25).

Assorted variants
We tested two variants described in the population and located in the disordered region (p.S1577Y and p.E1609G) but, as expected, they did not have an effect on activity (Fig. 3D). Variant p.G1770E, located in an intervening segment between β′1 and α′1, had a significant impact on function (Fig. 3E).

Protein expression levels
The previous analysis controlled for differences in transfection efficiency in each well by normalizing transcriptional activity, measured by firefly luciferase, to a Renilla luciferase control reporter (driven by the HSV thymidine kinase promoter) (Fig. S1A). Protein expression levels of GAL4DBD:BRCA1 variants were assessed by Western blotting (Fig. S1B) to determine whether missense variants caused protein instability, but they were not used to normalize the results.
To test whether using protein levels for normalization would improve the assay accuracy we also assessed each variant's activity normalized by protein levels (GAL4DBD:BRCA1 and β-actin) (Fig. 4). Similar coefficient of variation distributions Thirty-five variants, all located in the BRCT domains, showed discordant results between the two normalization methods (Fig. 4, A-D) when using the 80% threshold of activity. Of those, 33 variants were called defective when normalizing by vector only but similar to WT when normalizing to vector and protein levels due to low levels of expression (Fig. S1). However, although these variants were discordant using the arbitrary 80% activity threshold, after the generation of likelihood of pathogenicity by VarCall (see below), only five variants were discordant, indicating that this threshold may be too stringent.
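The way a low expression level can flip a call at the 80% threshold can be illustrated with a minimal sketch. It assumes that protein normalization amounts to dividing the vector-normalized activity (% of WT) by the variant's protein level relative to WT; the exact arithmetic used in the study is not spelled out here, and all names and numbers below are hypothetical.

```python
THRESHOLD_PCT = 80.0  # % of WT activity; the cutoff used in the text


def call(activity_pct, threshold=THRESHOLD_PCT):
    """Label an activity value as 'reduced' (< threshold) or 'WT-like'."""
    return "reduced" if activity_pct < threshold else "WT-like"


def compare_normalizations(vector_pct, relative_protein_level):
    """Compare calls made with and without protein-level normalization.

    vector_pct: activity normalized to the internal vector control, in % of WT.
    relative_protein_level: variant protein level divided by WT protein level
        (each already corrected for the loading control); assumed > 0.
    """
    if relative_protein_level <= 0:
        raise ValueError("relative_protein_level must be positive")
    protein_pct = vector_pct / relative_protein_level  # assumed normalization
    vector_call = call(vector_pct)
    protein_call = call(protein_pct)
    return {
        "vector_only": vector_call,
        "vector_and_protein": protein_call,
        "protein_normalized_pct": protein_pct,
        "discordant": vector_call != protein_call,
    }


if __name__ == "__main__":
    # Hypothetical unstable variant: 30% of WT activity at 25% of WT protein level.
    print(compare_normalizations(30.0, 0.25))
    # Hypothetical stable variant with near-WT activity.
    print(compare_normalizations(95.0, 1.0))
```

In the first hypothetical case the variant is called reduced by vector-only normalization but WT-like after dividing by its low protein level, which is the pattern described for the 33 discordant variants.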

Amino acid substitutions in BRCT domains
Expression of these variants was consistently low in independent experiments suggesting these amino acid changes lead to protein instability rather than reflecting errors during transient transfections. Consistent with this, seven of 15 selected variants with compromised expression in mammalian cells also failed to express in stable yeast transformants (Fig. S2B). We selected variants p.V1740E and p.H1746D, which showed instability in mammalian cells but are stably expressed in yeast to assess reporter function. Both variants showed markedly reduced reporter transcription activation function, even when reaching steady state levels comparable with WT (Fig. S2C). Taken together, these results indicate that normalization using protein levels may lead to inflated activity levels when lower levels of expression are due to instability caused by the amino acid changes, which ultimately lead to defective function, and not due to errors during transient transfections.

Likelihood of pathogenicity
Next, we incorporated the results obtained for the 99 missense variants in the present study into the VarCall algorithm in a joint analysis with data from published variants to estimate the likelihood of pathogenicity of 347 variants (25, 26) (Table S2). The output from VarCall represents the likelihood of pathogenicity given the effects on the functional capacity of the variant as previously described (Fig. 5A) (Table S3).
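The general idea of a two-component mixture posterior can be sketched as follows. This is not the VarCall implementation: the Gaussian components on a log10(% WT) scale, their parameters, and the prior are invented for illustration, and the class cutoffs are assumed to mirror the five-tier IARC probability boundaries (>0.99, 0.95-0.99, 0.05-0.949, 0.001-0.049, <0.001).

```python
from math import log10
from statistics import NormalDist

# Hypothetical mixture components on a log10(% of WT activity) scale.
NEUTRAL = NormalDist(mu=2.0, sigma=0.15)     # centred on 100% of WT (assumed)
PATHOGENIC = NormalDist(mu=1.0, sigma=0.30)  # centred on 10% of WT (assumed)


def posterior_pathogenic(activity_pct, prior=0.5,
                         pathogenic=PATHOGENIC, neutral=NEUTRAL):
    """Bayes' rule for a two-component mixture: P(pathogenic | activity)."""
    x = log10(activity_pct)
    weighted_path = prior * pathogenic.pdf(x)
    weighted_neut = (1.0 - prior) * neutral.pdf(x)
    return weighted_path / (weighted_path + weighted_neut)


def five_tier_class(probability):
    """Map a probability of pathogenicity to classes 1-5 (assumed IARC-style cutoffs)."""
    if probability > 0.99:
        return 5
    if probability > 0.95:
        return 4
    if probability >= 0.05:
        return 3
    if probability >= 0.001:
        return 2
    return 1


if __name__ == "__main__":
    for pct in (10.0, 40.0, 100.0):  # hypothetical activities, % of WT
        p = posterior_pathogenic(pct)
        print(f"{pct:6.1f}% WT -> P(pathogenic) = {p:.4f}, class {five_tier_class(p)}")
```

A full model would also estimate the component parameters and experimental variation from the data rather than fixing them, which this sketch deliberately omits.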
To assess the performance of VarCall we used a reference panel of 49 known variants classified by the multifactorial model (6) (Table S4). For this assessment we combined IARC Classes 1 and 2 (nonpathogenic and likely nonpathogenic) and IARC Classes 4 and 5 (likely pathogenic and pathogenic). The assay displayed sensitivity of 0.95 (95% CI = 0.72-1.00) and specificity of 1.00 (95% CI = 0.84-1.00) when including a missense variant whose pathogenicity is due to splicing defects (p.R1495M). When this variant is excluded from the panel on the basis of prior knowledge of its splicing defect, sensitivity improves to 1.00 (95% CI = 0.78-1.00), confirming that the model can be used to classify VUS reliably given the functional data.
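For readers who wish to reproduce this kind of summary on their own reference panel, sensitivity and specificity with approximate confidence intervals can be computed as sketched below. The interval here is the Wilson score interval, chosen because it needs only the standard library; the method used to obtain the intervals reported above is not stated in this section, so the numbers need not match. The counts in the usage example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, its CI) and (specificity, its CI) from confusion counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return (sens, wilson_interval(tp, tp + fn)), (spec, wilson_interval(tn, tn + fp))


if __name__ == "__main__":
    # Hypothetical panel: 9 of 10 pathogenic and 10 of 10 nonpathogenic called correctly.
    (sens, sens_ci), (spec, spec_ci) = sensitivity_specificity(tp=9, fn=1, tn=10, fp=0)
    print(f"sensitivity {sens:.2f} (95% CI {sens_ci[0]:.2f}-{sens_ci[1]:.2f})")
    print(f"specificity {spec:.2f} (95% CI {spec_ci[0]:.2f}-{spec_ci[1]:.2f})")
```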
The contributions of the present data set (n = 99) are not limited to the proposed classification of these individual variants but also provide refinement of previous classifications when VarCall is run with the complete dataset (n = 347). Indeed, not only did the fraction of VUS decrease significantly but more variants are also now classified as fClass 1 (Table 1) (Fig. 5B). The improvement of functional classifications with the present dataset can be visualized in Fig. 5B, which tracks how variants previously assigned to fClasses (P1-P5) are now assigned to fClasses after the addition of the present dataset (N1-N5). Note that 100% of variants assigned to P1 remained in the corresponding N1 fClass and 100% of variants in P5 remained in N5. Thus, nonpathogenic (fClass 1) and pathogenic (fClass 5) assignments were stable (Fig. 5B). Importantly, all variants in P4 were further refined to N5, as was a large fraction (85%) of variants in P2, which were further refined to N1 (Fig. 5B).
All  Table S5). Notably, no variant in the disordered region (n = 95) had a significant impact on function.

Data integration from multiple assays
The data presented above correspond to a single functional assay. Thus, it is conceivable that variants that have no functional impact on transcription might be defective in different assays. We compiled published data for missense variants from 35 functional assays that assess variants in the C terminus of BRCA1 (aa 1315-1863) (Table S6). Using the reference panel (Table S4) we calculated specificity and sensitivity for this region of the protein for 24 assays for which there were at least three pathogenic and three nonpathogenic reference variants tested (Table 2) (for full table, see Table S7).
All assays displayed high specificity (range 0.80-1.00) but a more variable sensitivity (range 0.63-1.00) with lower sensitivity found for yeast-based or protein stability assays, presumably due to a subset of pathogenic variants that are stabilized at lower temperatures and variants that do not affect protein stability, respectively (Table 2). Notably, the transcription assay described here achieves, for this region of the protein, the highest sensitivity and specificity of all assays, surpassing other functional assays assessing other canonical functions of BRCA1 such as homologous recombination (HR) (Table 2).
Next, we determined the concordance between VarCall classification for transcription activation and homologous recombination assays (Tracks 18-23; Table S6). Of 70 variants scored by VarCall that also had at least one homologous recombination assay result, 60 (86%) were in complete agreement. Three additional variants had agreement between VarCall and at least one HR assay (Fig. 6, A and B).
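The tally described above (complete agreement, agreement with at least one HR assay, or no agreement) can be expressed as a short sketch. The variant names and calls in the usage example are hypothetical placeholders, not data from Table S6; only the 60/70 arithmetic comes from the text.

```python
def concordance(varcall_calls, hr_calls):
    """Tally agreement between VarCall and homologous recombination (HR) assays.

    varcall_calls: dict mapping variant name -> 'defective' or 'functional'.
    hr_calls: dict mapping variant name -> list of calls, one per HR assay.
    Variants without any HR result are skipped.
    """
    tally = {"complete": 0, "partial": 0, "discordant": 0}
    for variant, reference_call in varcall_calls.items():
        others = hr_calls.get(variant, [])
        if not others:
            continue
        matches = [c == reference_call for c in others]
        if all(matches):
            tally["complete"] += 1
        elif any(matches):
            tally["partial"] += 1
        else:
            tally["discordant"] += 1
    tally["n"] = tally["complete"] + tally["partial"] + tally["discordant"]
    return tally


def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100.0 * part / whole)


if __name__ == "__main__":
    # Hypothetical calls for illustration only.
    varcall = {"varA": "defective", "varB": "functional",
               "varC": "defective", "varD": "functional"}
    hr = {"varA": ["defective", "defective"],
          "varB": ["functional"],
          "varC": ["defective", "functional"],
          "varD": ["defective"]}
    print(concordance(varcall, hr))
    print("60 of 70 in complete agreement:", percent(60, 70), "%")
```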
However, due to a small sample size there was limited overlap with variants assessed in the present study (three, all concordant). To assess the variants in the present study we compared results from high throughput mutagenesis and cell viability, for which there were 87 variants classified by both methods

Figure 4. Transcription activation assays for BRCA1 missense VUS normalized by internal vector and protein levels.
A-E, transcription activity of missense variants located at the transition between the disordered region and the BRCT domain normalized against vector control and protein levels (Fig. S1). For comparison, squares denote whether the variant displayed activity similar to WT (open squares) or reduced (<80% WT; filled squares) activity when normalizing against vector control only.

Discussion
VUS pose a challenge for genetic counseling. For BRCA1 and BRCA2 variants determination of pathogenicity is based on a multifactorial model incorporating data on segregation analysis, family and personal breast or ovarian cancer history, and co-occurrence (5,7). Due to their low allele frequency many missense alleles of BRCA1 remain as VUS.
Functional assays have emerged as a source of empirical data to aid in the determination of pathogenicity of missense VUS (16,25). The transcriptional assay for variants in the BRCA1 C terminus, in combination with VarCall, was recently assessed for its performance and has been proposed for use in clinical annotation (25). Recent progress in saturation genome editing for BRCA1 VUS functional analysis demonstrates the feasibility of achieving comprehensive functional assessment of VUS in cancer susceptibility genes for the purposes of clinical annotation of variants (28).
Here we compared two normalization methods: internal vector control only, and vector control plus GAL4 DBD:BRCA1 fusion protein levels. Normalization using protein levels may lead to incorrect variant classification when loss-of-function variants are associated with protein instability. Although we do not recommend the use of protein expression levels to normalize reporter activity results, protein expression analysis provides important insights regarding the underlying reason for the loss of function. Interestingly, we estimate that one-third to one-half of variants that score as likely pathogenic or pathogenic in VarCall lead to unstable protein, with a higher fraction among variants in secondary structures in the BRCT domains.
Inspection of VarCall results in Woods et al. (25) showed clearly that the likelihood of a variant being pathogenic varies significantly depending on its location, reflecting the importance of secondary and tertiary structures for function, in particular the tandem BRCT domains. Thus, we focused on three areas in which to improve our ability to identify pathogenic variants using the transcription assay: first, the definition of the N- and C-terminal borders of functional domains; second, a more granular analysis of the linker region; and finally, the role of α-helices in the BRCT domains.
In the present study we analyzed 99 missense and 3 deletion variants and defined the functional borders of the BRCT domains. Because digestion of a bacterially expressed BRCA1 C-terminal region (aa 1528-1863) resulted in a proteolytically stable fragment (aa 1646-1863) and further deletion of residues 1860-1863 yielded a soluble fragment used for structural determination, residue positions 1646-1859 have been loosely considered the functional borders of the BRCT domain (29). Structural determination further refined it to 1650-1859, as the first secondary structure in the BRCT-[N], a β-sheet (β1), starts at residue Met-1650 (29). Our data suggest that this position is also tolerant to changes, as all six variants tested at this position were fClass 1 or 2. Deletion analysis of the C-terminal end of BRCA1 in yeast indicated that residues 1856-1863 were dispensable for function (in the transcription assay), whereas p.I1855 was required for BRCT stability (30). p.Y1853, p.L1854, and p.I1855 represent a cluster of hydrophobic residues present at the end of Motif II in several BRCT domains (31). Our data showed that deletion starting at Gln-1857 significantly compromised the function in mammalian cells, consistent with a higher sensitivity of the transcriptional assay in mammalian cells than in yeast (32,33). Thus, we propose that for clinical variant annotation the BRCT domains in BRCA1 should be conservatively considered to span residues 1650-1857. That is, a variant predicted to modify or truncate protein sequence at residues 1858-1863 is not considered clinically important.
Hayes et al. (30) identified several variants in the linker domain as sensitive to amino acid changes, suggesting the existence of unidentified structural requirements for the linker region. A flexible linker with a helical central portion that connects both BRCT units was identified by X-ray crystallography (29). However, this region had not been systematically explored. Our dense mutagenesis of this region showed that Lβ1-intervening sequence-Lα2 is extremely sensitive to most changes resulting from single nucleotide changes in those residue positions. Previous data had shown that most variants in α1 and α′1 (two corresponding helices in the N- and C-BRCT domains) did not affect function, a case different from α3 and α′3 where most variants have a dramatic effect on function (25). Analysis of the current set shows that some variants in these structures lead to a functional defect. The lack of missense variants in the disordered region showing any impact on function (0 of 95 tested) confirms that this region is unlikely to harbor pathogenic variants. The significant fraction of pathogenic/likely pathogenic variants across all BRCT-related features (β sheets, α helices, loops and turns, BRCT-[N], BRCT-[C], linker, and tandem BRCTs) highlights their requirement for the integrity of the tandem BRCT domains.
We had previously identified a patch of highly conserved amino acid residues forming a groove on the opposite face from the known phosphopeptide-binding pocket formed by residues Thr-1684, Thr-1685, His-1686, Lys-1711, Trp-1712, and Arg-1753, specific to BRCA1 BRCT (34). Interestingly, the conservative change p.R1753K, which is predicted by in silico tools to have no impact on function, shows severely impaired activity, further supporting a role for this putative binding site. Further studies will be needed to investigate this groove on the BRCT surface.

A significant challenge in functional assessment of VUS is the concern that a variant scoring as nonpathogenic in one assay may confer risk of disease through a function other than the one being tested in the assay. Here, using the same reference panel we show that most functional assays achieve high sensitivity and specificity and there is a high level of concordance between assays. Importantly, there was no discordance in variants that scored nonpathogenic in VarCall and in homologous recombination assays. The reason for the excellent agreement across assays of distinct functions (HR, transcription, viability, and cisplatin sensitivity) may be partly due to the significant fraction of variants that lead to the absence of stable protein product, which would affect any readout. It is unlikely that any assay will be ideal and integration of results from different systems will be critical to improve clinical classification of variants.
In summary, data from this study confirm the high accuracy of the transcription assay for BRCA1 missense variants, indicating that it can aid in the classification of VUS for clinical decisions. We performed a systematic analysis of several structural features and provide assessment for 98% of all variants recorded in the C-terminal region of BRCA1, with a small fraction (4%) of variants remaining as VUS. Although at this point these data should not be used as a sole source of information to classify variants, results are encouraging and warrant further exploration on how to use functional data for clinical annotation of VUS.
VUS were introduced into pcDNA3 (SG) BRCA1 13/24 (WT) by site-directed mutagenesis using the QuikChange II Site-directed Mutagenesis Kit, following the manufacturer's protocol (Agilent, Santa Clara, CA). Primers used for site-directed mutagenesis are listed in Table S1. All plasmids containing variants were subjected to Sanger sequencing to confirm the correct reading frame with GAL4 DBD, the presence of the variant to be tested, and absence of additional mutations introduced during PCR amplification.

TA assay in human cells
Variants were assessed by the TA assay (27,35,36) using three technical replicates for each variant in at least two independent experiments for a total of at least six measurements per variant. Briefly, BRCA1 constructs containing missense VUS or controls were co-transfected in HEK293FT cells with the pG5Luc plasmid with a Photinus pyralis luciferase reporter gene driven by GAL4-binding sites, and the phGR-TK plasmid, which constitutively expresses the internal control Renilla luciferase. Cells were transfected using FuGENE HD (Roche Applied Science) or polyethylenimine (Sigma) and harvested 24 h after transfection. Transcriptional activity was assayed using the Dual-Luciferase Reporter Assay System (Promega, Madison, WI). For use in VarCall luciferase activity was normalized using the internal constitutive expression of Renilla sp. luciferase and variant activity was depicted as % of the "WT" activity. For comparison purposes we also provide luciferase activity (as described above) normalized by expression levels measured by Western blotting. Note that Western blots for expression levels are obtained from parallel cultures and not directly from the wells in which luciferase activity is being measured.
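The normalization described above can be sketched in a few lines: each well's firefly signal is divided by its Renilla signal, and the variant's mean ratio is expressed as a percentage of the WT mean ratio. This is a minimal illustration with hypothetical readings; pooling all wells into a single mean is an assumption made for brevity, and function names are illustrative.

```python
from statistics import mean


def well_ratios(wells):
    """Firefly/Renilla ratio for each well; wells is a list of (firefly, renilla) pairs."""
    ratios = []
    for firefly, renilla in wells:
        if renilla <= 0:
            raise ValueError("Renilla signal must be positive")
        ratios.append(firefly / renilla)
    return ratios


def percent_of_wt(variant_wells, wt_wells):
    """Mean Renilla-normalized activity of a variant as % of the WT reference."""
    return 100.0 * mean(well_ratios(variant_wells)) / mean(well_ratios(wt_wells))


if __name__ == "__main__":
    # Hypothetical luminescence readings: three technical replicates each.
    wt = [(1000.0, 100.0), (1100.0, 100.0), (900.0, 100.0)]
    variant = [(500.0, 100.0), (600.0, 100.0), (400.0, 100.0)]
    print(f"variant activity: {percent_of_wt(variant, wt):.1f}% of WT")
```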

TA assay in yeast cells
The TA assay was also conducted in EGY48 yeast cells using a plasmid reporter JK103, which has one LexA operator driving LacZ, and expressing a fusion of LexA:BRCA1 aa 1396-1864 as previously described (30).

VarCall
Results were analyzed using the computational model VarCall (26) with the input dataset as in Table S2, which merged the results from the Woods et al. (25) dataset with the missense variants in this study. The Woods et al. (25) dataset incorrectly included variant p.M1652W, which was not a natural occurrence, and variant p.A1752R, which contained a typographical error. These are corrected in the present dataset. The input dataset (Table S2) contained assays on 367 variants and the WT reference (includes one frameshift, three nonsense, and four in-frame deletions), including the 99 missense variants in the present study, and 4,974 individual measurements (1,291 from

Table S6). First row depicts IARC Class using the multifactorial model (Table S4) with red squares (Classes 4-5) and blue squares (Classes 1-2). Discordant results between VarCall and any HR assay are indicated by a red star. Green stars indicate discordance with one assay but agreement with another HR assay. Variants are in bold font. B, diagram showing classification of variants tested in this study as defective (red squares) or not (blue squares) according to VarCall (Track 8) or to a saturation mutagenesis study (Track 30; Findlay et al. (28)) based on cell viability. Discordant results are indicated by a red star. C, fraction of missense variants with disagreements between assays.