Structure-based prediction of Wnt binding affinities for Frizzled-type cysteine-rich domains

Wnt signaling pathways are of significant interest in development and oncogenesis. The first step in these pathways typically involves the binding of a Wnt protein to the cysteine-rich domain (CRD) of a Frizzled receptor. Wnt-Frizzled interactions can be antagonized by secreted Frizzled-related proteins (SFRPs), which also contain a Frizzled-like CRD. The large number of Wnts, Frizzleds, and SFRPs, as well as the hydrophobic nature of Wnt, poses challenges to laboratory-based investigations of interactions involving Wnt. Here, utilizing structural knowledge of a representative Wnt-Frizzled CRD interaction, as well as experimentally determined binding affinities for a selection of Wnt-Frizzled CRD interactions, we generated homology models of Wnt-Frizzled CRD interactions and developed a quantitative structure-activity relationship for predicting their binding affinities. The derived model incorporates a small selection of terms derived from scoring functions used in protein-protein docking, as well as an energetic term considering the contribution made by the lipid of Wnt to the Wnt-Frizzled binding affinity. Validation with an external test set suggests that the model can accurately predict binding affinity for 75% of cases and that the error associated with the predictions is comparable with the experimental error. The model was applied to predict the binding affinities of the full range of mouse and human Wnt-Frizzled and Wnt-SFRP interactions, indicating trends in Wnt binding affinity for Frizzled and SFRP CRDs. The comprehensive predictions made in this study provide the basis for laboratory-based studies of previously unexplored Wnt-Frizzled and Wnt-SFRP interactions, which, in turn, may reveal further Wnt signaling pathways.

nuclear factor of activated T cells lead to the transcription of genes in cardiomyocytes, neuronal cells, and skeletal muscle (15). Signal transduction via the PCP pathway (Fig. 1c) is initiated through Wnt binding to Fzd and co-receptors ROR and Ryk. Fzd activation leads to Dvl-mediated activation of Rac and Rho. JNK and Rho-associated protein kinase (ROCK) are activated by Rac and Rho, respectively, which mediates actin polymerization and activates transcription factors AP-1 and JUN (16).
Wnts comprise a group of 19 proteins that are subject to numerous post-translational modifications, including the formation of a large number of characteristic disulfide bonds, glycosylation in the endoplasmic reticulum (17), and palmitoylation by Porcupine, which aids in their secretion and facilitates their interaction with Frizzled (18). Structurally, as determined by the co-crystallization of Xenopus Wnt8 (XWnt8) with the mouse Fzd8 CRD, Wnts are composed of two domains: an N-terminal domain and a C-terminal domain (19). The N-terminal domain contains 10 cysteine residues forming five disulfide bridges in a cluster of α-helices, whereas the C-terminal domain contains six disulfide bridges and a two-stranded β-sheet (19). Frizzled receptors are a group of 10 membrane-bound receptors comprising the majority of Class F G protein-coupled receptors (GPCRs). Frizzleds, like other GPCRs, consist of seven hydrophobic transmembrane helices but feature an extracellular cysteine-rich domain (CRD) in their N terminus (20). The CRD is characterized by a conserved pattern of 10 cysteines and can bind Wnt and Norrin ligands (21,22). The five mammalian secreted Frizzled-related proteins (SFRPs) are secreted glycoproteins composed of an N-terminal CRD and a C-terminal netrin-like domain (23). These proteins function to antagonize the Wnt signaling pathway (24) through binding of either the CRD (25) or the netrin-like domain to Wnt ligands (26), thus interfering with Wnt binding to Fzd and preventing β-catenin-mediated gene transcription. The SFRPs have been studied in great detail for their potential roles as tumor suppressors and their implications in carcinogenesis (23).
Because of the large number of possible Wnt-Fzd CRD interactions (which, considering CRDs from both Fzds and SFRPs, totals 285 interactions per species), it is challenging to investigate these experimentally. A recent study utilized biolayer interferometry (BLI) to investigate a small set of mouse Wnt-Fzd CRD interactions in a combinatorial manner (27). Numerous other interactions have been identified via co-immunoprecipitation (coIP) or proposed based on co-expression of particular Wnts with particular Fzds (6). Although coIP and co-expression are valuable methods for suggesting the existence of specific protein-protein interactions, they are unable to provide an indication of the likely strength of an interaction. Computational studies provide the opportunity to complete the knowledge of interactions between Wnts and Fzd CRDs and may reveal previously unexplored high-affinity interactions.
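The size of the combinatorial problem follows directly from the family sizes given above (19 Wnts, 10 Fzds, 5 SFRPs). The short sketch below simply enumerates the pairs; the Wnt labels are placeholders rather than real gene names, and the variable names are illustrative assumptions.

```python
from itertools import product

# Family sizes stated in the text: 19 Wnts, 10 Frizzleds, 5 SFRPs.
wnts = [f"Wnt_{i}" for i in range(1, 20)]   # placeholder labels, not real Wnt names
fzds = [f"Fzd{i}" for i in range(1, 11)]
sfrps = [f"SFRP{i}" for i in range(1, 6)]

# Every CRD-bearing partner considered in the study (Fzds plus SFRPs).
crds = fzds + sfrps

# One entry per possible Wnt-CRD pairing within a single species.
pairs_per_species = list(product(wnts, crds))
n_per_species = len(pairs_per_species)

# Mouse and human are both modeled, so the total doubles.
n_both_species = 2 * n_per_species

print(n_per_species, n_both_species)  # prints: 285 570
```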
In this study, we have generated homology models of Wnt complexes with both Fzd and SFRP CRDs and predict the likely binding affinity associated with these interactions. For a series of Wnt-Fzd CRD interactions for which dissociation constants have been reported (27), we then evaluated the interaction energy for the protein-protein and lipid-protein components of the interactions; this was achieved through scoring the interactions against the full set of functions contained in CCharPPI (28) (for the protein-protein component) and scoring using Prime MM-GB/SA (for the lipid-protein component). Strike was then used to develop and evaluate binding affinity prediction models using scores obtained from CCharPPI and Prime MM-GB/SA as descriptors for the model building. A model with high predictive performance was identified and subsequently applied to predict the binding affinities of all Wnt-Fzd and Wnt-SFRP CRD interactions in both mouse and human cases.
Figure 1. A, canonical Wnt signaling. Wnt binding to Fzd CRD initiates the destabilization of the cytoplasmic destruction complex (adenomatous polyposis coli protein (APC), Axin, GSK3, CK1, and Dvl). This allows cytosolic β-catenin (β-cat) accumulation and subsequent translocation to the nucleus where it binds to T cell factor/lymphoid enhancer-binding factor (TCF/LEF) transcription factors to transcribe Wnt target genes. SFRPs antagonize this cascade, and β-catenin is polyubiquitinated by β-transducin repeats-containing protein (β-TrCP) and degraded by proteolysis. B, the Wnt/Ca2+ pathway. Wnt binding to Fzd CRD or Ryk co-receptor activates Dvl, which stimulates calcium release. Downstream effectors PKC, calmodulin-dependent protein kinase II (CaMKII), and Cn activate transcription factors cAMP-response element-binding protein (CREB), NF-κB, and nuclear factor of activated T cells (NFAT). C, the PCP pathway. Wnt stimulation is effected initially through Fzd-Dvl interaction and co-receptors ROR/Ryk and passed through multiple effectors (Rac, phospholipase C (PLC), Disheveled-associated activator of morphogenesis (DAAM)) downstream to ROCK and JNK. ROCK regulates the actin cytoskeleton, and JNK activates AP1 and JUN transcription factors to regulate cell polarity and migration.

Preparation of homology models of Wnt-Fzd CRD complexes
We prepared homology models of all mouse and human Wnts and all mouse and human Fzd and SFRP CRDs; details of UniProt accession numbers, sequence ranges, and sequence alignments used to build the models are provided in supplemental Table S1 and Figs. S2 and S3. The vast majority of proteins modeled did not feature large insertions or deletions relative to either XWnt8 or mFzd8 CRD, with the exceptions of mouse and human Wnt6, Wnt10a, and Wnt10b; these Wnts feature insertions relative to XWnt8 that are too large (greater than 20 residues) to be built by Prime. To build these structures, we utilized an alternative procedure incorporating the I-TASSER server (described in detail under "Experimental procedures"), which is capable of building much longer insertions than Prime through its use of an iterative template fragment assembly approach to model building (29).
Following assembly of the complexes and refinement using a procedure automated using KNIME (supplemental Fig. S4), the MolProbity score of all models was calculated. The MolProbity score provides a single value metric of structural quality, summarizing the number of atomic clashes, percentage of backbone conformations in regions outside the Ramachandran favored regions, and the percentage of bad side-chain rotamers (30). The TM-score and the root-mean-squared deviation (RMSD) of the Cα atoms of the models with respect to the XWnt8-mFzd8 CRD complex structure (Protein Data Bank code 4F0A) (19), which was the template for all models, were also calculated. These measures assess differences in the coordinates of two structures (31). The mean value for the MolProbity scores for the mouse and human models was slightly greater than the MolProbity score obtained for the XWnt8-mFzd8 CRD complex structure (Table 1) but nonetheless comparable, indicating the generally high quality of the models. The mean values for the model TM-scores with respect to the template crystal structure were generally high, and the mean values for the model Cα RMSD values very low, further indicating the generally high quality of the models and their limited divergence from the template crystal structure. Selected complexes are shown in Fig. 2. Quality metrics are summarized in Table 1, and full details are provided in supplemental Tables S5-S10.
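To make the two coordinate-based measures concrete, the following is a minimal sketch of Cα RMSD and TM-score for two structures that are already superposed and whose residues are already paired. It is not the MM-align procedure used in this study, which also searches for the alignment and superposition; the function names and the 0.5 Å floor on the distance scale are assumptions made for this illustration.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD over paired C-alpha coordinates that are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def tm_d0(length):
    """Length-dependent distance scale of the TM-score, in angstroms."""
    if length > 15:
        # Floor of 0.5 is a guard added here for very short chains.
        return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)
    return 0.5

def tm_score(coords_model, coords_ref):
    """TM-score for one fixed superposition, normalized by the reference length."""
    length = len(coords_ref)
    d0 = tm_d0(length)
    total = sum(1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
                for a, b in zip(coords_model, coords_ref))
    return total / length
```

Identical coordinate sets give an RMSD of 0 and a TM-score of 1; unlike RMSD, the TM-score down-weights large local deviations, which is why the two are often reported together.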

Development and validation of a Wnt-Fzd CRD binding affinity prediction model
We used a set of mouse Wnt-Fzd CRD binding affinities determined by BLI (27) to develop and validate our binding affinity prediction model. The model building and evaluation procedure is summarized in Fig. 3 and herein described.
Within the BLI data, we designated a training set, used to optimize the model, and a test set, used to demonstrate the performance of the model for data against which it had not been trained. Our test set consisted of complexes involving interactions with either mFzd1 or mWnt4; our training set comprised all remaining complexes. The definition of the test set in this manner provided a simple means of selecting a test set covering a wide range of affinities. For all of these complexes, we then rescored, with separate procedures, the protein-protein portion and the lipid-protein portion of the interaction. The protein-protein portion was rescored against the majority of functions available within CCharPPI (28) (listed in supplemental Table S11), a server compiling a wide range of scoring functions suitable for use in protein-protein docking. The lipid-protein portion was rescored using Prime MM-GB/SA, which provides a rapid means for evaluating ligand-receptor binding energies with improved accuracy compared with typical docking scoring functions. The Prime MM-GB/SA calculation is also decomposed into its components (Coulomb/electrostatic, covalent binding, van der Waals, lipophilic, polar solvation/desolvation, hydrogen bonding, and other components; components used in this study are listed in supplemental Table S11). The two strategies are complementary; the functions in CCharPPI are only capable of considering interactions between standard protein amino acids, whereas Prime MM-GB/SA is capable of studying interactions of small organic molecules with proteins. With this in mind, the Wnt lipid was removed from the CCharPPI calculations, and the Wnt protein was removed from the Prime MM-GB/SA calculations (that is, only the interaction between the Wnt lipid and the Fzd CRD was assessed by Prime MM-GB/SA).
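The partner-based split described above can be sketched in a few lines. The example pairs below are illustrative only and are not the BLI data set of ref. 27; the function name and argument names are assumptions.

```python
def split_by_partner(complexes, test_wnts=("mWnt4",), test_fzds=("mFzd1",)):
    """Put a complex in the test set if it involves any held-out partner."""
    train, test = [], []
    for wnt, fzd in complexes:
        if wnt in test_wnts or fzd in test_fzds:
            test.append((wnt, fzd))
        else:
            train.append((wnt, fzd))
    return train, test

# Illustrative pairs only; these are not taken from the experimental data set.
example = [
    ("mWnt3a", "mFzd1"),
    ("mWnt4", "mFzd8"),
    ("mWnt3a", "mFzd8"),
    ("mWnt5a", "mFzd5"),
]
train, test = split_by_partner(example)
```

Holding out whole partners rather than random complexes means the test set probes a Wnt and a Fzd whose interactions the model has never been fitted against.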
Multiple linear regression models combining one Prime MM-GB/SA component with one or more CCharPPI components (all herein referred to as descriptors) were then generated, thus allowing the development of a single model considering both the protein-lipid and the protein-protein portions of the interaction.
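As a rough illustration of how the candidate descriptor sets can be enumerated, the sketch below pairs each Prime MM-GB/SA term with every combination of CCharPPI terms of a given size. The short MM-GB/SA labels are shorthand chosen here, and the CCharPPI list contains only the five descriptor names mentioned in this article rather than the full set that was screened.

```python
from itertools import combinations

def candidate_descriptor_sets(mmgbsa_terms, ccharppi_terms, n_ccharppi):
    """Yield one MM-GB/SA term plus n_ccharppi CCharPPI terms per candidate model."""
    for lipid_term in mmgbsa_terms:
        for protein_terms in combinations(ccharppi_terms, n_ccharppi):
            yield (lipid_term,) + protein_terms

# Shorthand labels for MM-GB/SA components (assumed names for illustration).
mmgbsa = ["vdW", "Lipo", "Coulomb"]
# Subset of CCharPPI descriptors named in the text; the real pool is larger.
ccharppi = ["HBOND2", "AP_calRW", "AP_calRWp", "FIREDOCK_AB", "ROSETTADOCK"]

three_descriptor = list(candidate_descriptor_sets(mmgbsa, ccharppi, 2))
four_descriptor = list(candidate_descriptor_sets(mmgbsa, ccharppi, 3))
```

Each tuple would then be fitted as a separate multiple linear regression; the number of tuples grows combinatorially with the size of the CCharPPI pool, which is why exhaustive search is practical only for small descriptor counts.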
As it was computationally accessible to consider all possible three-descriptor models incorporating one Prime MM-GB/SA term and two CCharPPI-derived terms, we initially explored these. The performance of all models was evaluated using two principal metrics: 1) the root-mean-squared error (RMSE) between the predicted values and the average experimental values (RMSE_train and RMSE_test); lower values indicate a better fit between the predictions and experimental values; and 2) the percentage of complexes for which the predicted value occurred within the experimental range reported (InExp_train and InExp_test); higher values indicate a better fit between the predictions and experimental values.
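The two metrics can be written down compactly. The following is a minimal sketch under the assumption that each complex has a mean experimental value and a reported lower and upper bound; the function names are illustrative and the numbers in the usage example are invented.

```python
import math

def rmse(predicted, experimental_mean):
    """Root-mean-squared error between predictions and mean experimental values."""
    if len(predicted) != len(experimental_mean) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    squared = sum((p - e) ** 2 for p, e in zip(predicted, experimental_mean))
    return math.sqrt(squared / len(predicted))

def percent_in_experimental_range(predicted, exp_low, exp_high):
    """Percentage of predictions falling inside the reported experimental range."""
    if not predicted:
        raise ValueError("need at least one prediction")
    hits = sum(1 for p, lo, hi in zip(predicted, exp_low, exp_high) if lo <= p <= hi)
    return 100.0 * hits / len(predicted)

# Invented values in kcal/mol, for illustration only.
pred = [-10.0, -11.0, -9.0, -12.0]
low = [-10.2, -10.5, -9.1, -12.5]
high = [-9.8, -10.1, -8.9, -12.1]
in_exp = percent_in_experimental_range(pred, low, high)
```

Computing both functions separately on the training and test complexes gives the four quantities used to compare models.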
High-performing three-descriptor models of Wnt-Fzd CRD binding typically incorporated the van der Waals term of the Prime MM-GB/SA calculation (supplemental Table S12). The lipophilic term of the Prime MM-GB/SA calculation also appears frequently in high-performing models. This is unsurprising considering the physicochemical properties of palmitoleic acid, which would suggest that the binding energy will likely be associated with van der Waals/non-polar interactions. The best performing three-descriptor models generally displayed RMSEs for both the training and test sets in the range of 0.3-0.4 kcal/mol, which is well outside the error range of the experiments of ~0.2-0.3 kcal/mol (27); this indicates that three-descriptor models are insufficiently predictive.
Two models containing four descriptors were identified that were capable of high predictive performance (Table 2). Both of these displayed RMSEs for the training and test sets less than 0.3 kcal/mol. Both included the van der Waals term of the Prime MM-GB/SA calculation, the PyRosetta hydrogen bonding potential (HBOND2) (32), and either the RW or RWplus statistical potentials (AP_calRW and AP_calRWp) (33). The fourth term in Model 1 is the antibody-antigen energy function of FireDock (FIREDOCK_AB) (34), whereas in Model 2, it is the total RosettaDock weighted energy (ROSETTADOCK). As the performance of Model 1 appeared slightly improved over Model 2, this model was selected for further study. Additionally, Model 1 was preferred over Model 2 for featuring a smaller constant term, suggesting that it may be able to predict affinities over a wider range than Model 2. The RMSE values for Model 1 suggest that the error associated with its use will be slightly larger than, but nonetheless similar to, the error range achieved by experiment.
The maximum difference between any prediction made by the model and the corresponding experimental value, either in the training set or the test set, is ~0.6 kcal/mol, which corresponds to a difference in Kd of approximately an order of magnitude (Fig. 4 and Table 3). Because there appear to be no particular Wnts or Fzds for which poor predictions are made, failure to make accurate predictions most likely occurs randomly and is not associated with a particular Wnt or Fzd structure; this is perhaps expected given the overall high structural quality of the models used. The binding affinities of the vast majority of cases in the training and test sets are predicted within 0.25 kcal/mol of the mean experimental values reported, which is within the experimental error range. Further elaboration of the selected four-descriptor models into five-descriptor models was performed but did not result in models providing significant improvements in predictions (data not shown); similar RMSEs and a similar number of predictions occurring within the experimental ranges in both the training and test sets were obtained for the best four- and five-descriptor models. Thus, four-descriptor models were deemed sufficient for use in predicting binding affinities.
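Because affinities are discussed both as free energies (kcal/mol) and as dissociation constants, a small conversion helper may be useful. The sketch below uses the standard relation ΔG = RT ln(Kd) with Kd expressed in molar units relative to a 1 M standard state; the temperature of 298.15 K is an assumption, as the temperature used for any conversion in the study is not stated in this section, and the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_to_delta_g(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)

def delta_g_to_kd(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant (mol/L) from a binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))

# Example: a 10 nM interaction, with temperature assumed to be 298.15 K.
dg_10nm = kd_to_delta_g(10e-9)
```

More negative values indicate tighter binding, and the two functions are exact inverses of one another at a given temperature.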

Prediction of binding affinities of Wnt-CRD interactions
In applying Model 1 to predict Wnt-CRD binding affinities in the mouse proteins, numerous trends are apparent (Fig. 5A and supplemental Table S13). Fzd3, Fzd5, SFRP3, and SFRP4 generally display high-affinity, nonspecific binding of Wnts, as evidenced by more than half of the interactions predicted to afford strong binding affinities (i.e. <10 nM). Fzd8 also displays nonspecific binding of Wnts; however, the majority of interactions are predicted to be of lower affinity than those with Fzd3, Fzd5, SFRP3, and SFRP4. Fzd1, Fzd4, Fzd7, and Fzd9 generally display moderate affinity for a wide variety of Wnts. Fzd1, Fzd7, and Fzd9 display high affinity for limited Wnts, indicating more selective binding compared with Fzd3, Fzd5, Fzd8, SFRP3, and SFRP4, whereas Fzd4 displays high affinity for several Wnts, indicating less selective binding. Fzd1 displays high affinity for Wnt6, Fzd7 displays high affinity for Wnt10a, and Fzd9 displays high affinity for both Wnt7a and Wnt16. Fzd2, Fzd6, Fzd10, SFRP1, SFRP2, and SFRP5 all display moderate- to high-affinity binding to less than half of the Wnts. However, this does not strictly translate to high selectivity; Fzd6 and Fzd10 bind with moderate affinity to several Wnts. Fzd2 displays high affinity for Wnt3a, Wnt7b, and Wnt10a. SFRP1, SFRP2, and SFRP5 all display high selectivity for specific Wnt ligands but retain moderate affinity for the majority of remaining Wnts. SFRP1 appears highly selective for Wnt7a, whereas SFRP2 is selective for Wnt2b and Wnt3a. SFRP5 displays moderate affinity for Wnt2b, Wnt5b, and Wnt6.
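The criterion used above for high-affinity, nonspecific binding (more than half of a CRD's predicted interactions stronger than 10 nM) can be expressed as a simple rule. In the sketch below the Kd values and Wnt labels are invented for illustration and are not predictions from this study; the function name is an assumption.

```python
HIGH_AFFINITY_NM = 10.0  # "strong" binding cutoff quoted in the text

def is_nonspecific_high_affinity(kd_by_wnt_nm, cutoff_nm=HIGH_AFFINITY_NM):
    """True if more than half of the Wnts are predicted to bind below the cutoff."""
    if not kd_by_wnt_nm:
        raise ValueError("need at least one predicted interaction")
    n_high = sum(1 for kd in kd_by_wnt_nm.values() if kd < cutoff_nm)
    return n_high > len(kd_by_wnt_nm) / 2

# Invented Kd values (nM) for two hypothetical CRDs; labels are placeholders.
promiscuous_crd = {"WntA": 0.5, "WntB": 3.0, "WntC": 8.0, "WntD": 50.0}
selective_crd = {"WntA": 2.0, "WntB": 120.0, "WntC": 300.0, "WntD": 90.0}
```

Applied to each column of a predicted affinity table, a rule of this kind separates broadly binding CRDs from those with only a few strong partners.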
The human data generally display trends similar to the mouse data (Fig. 5B and supplemental Table S14). Fzd3, Fzd5, SFRP3, and SFRP4 still display generally high-affinity, nonspecific binding of Wnts; however, there are some specific points of difference. The interactions of human Fzd3 with Wnt8a and Wnt9a are predicted to be much higher affinity than in the case of the mouse, although the hFzd9-hWnt9a interaction is still predicted to be of only moderate affinity. Conversely, the interaction of human Fzd3 with Wnt5b is predicted to be of much lower affinity than the equivalent mouse interaction. The affinity of the mouse Wnt2 for SFRP3 and SFRP4 is predicted to be lower than the equivalent interactions in humans; however, Wnt9a is predicted to have increased affinity for these proteins in mouse compared with human. Significant differences in the predicted affinities of human Fzd4 for Wnt1, Wnt5a, and Wnt11 compared with the mouse interactions are observed; all of these interactions are predicted to be very low in binding affinity in humans, whereas in mice these are all predicted to be very high affinity. Large differences in the predicted affinities occur when comparing the interactions of mouse and human Fzd6, Fzd10, SFRP1, SFRP2, and SFRP5 (Fig. 5C); however, these interactions are generally predicted to be of low to moderate affinity and may not be indicative of different roles for Wnt interactions with these proteins in the two species.

Analysis of residues of functional importance to Wnt-Fzd CRD interactions
To propose residues of functional importance to Wnt-Fzd interactions, all 570 Wnt-Fzd CRD models were subject to MM-GB/SA analysis with per-residue decomposition using AMBER14 (35). This calculation allows the identification of specific residues making large contributions to the binding energy, which, in turn, can be used to suggest the most significant intermolecular contacts in the interaction. High-affinity complexes will generally have more residues making large contributions to the binding energy compared with low-affinity complexes; thus, high-affinity complexes will have greater influence on the designation of sequence positions of general importance to Wnt-Fzd CRD interactions.
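One way to turn per-residue decomposition output into a list of generally important positions is to count how often each aligned position makes a large favorable contribution across all complexes. The sketch below assumes the energies have already been parsed into dictionaries keyed by alignment position; the energy cutoff and minimum frequency are hypothetical parameters, not values used in this study.

```python
from collections import Counter

def frequent_contributors(per_residue_energy, energy_cutoff=-1.0, min_fraction=0.5):
    """Map each alignment position to the fraction of complexes where it contributes strongly.

    per_residue_energy: {complex_id: {alignment_position: energy in kcal/mol}}
    energy_cutoff and min_fraction are illustrative thresholds (assumptions).
    """
    counts = Counter()
    for contributions in per_residue_energy.values():
        # A position counts once per complex if its contribution is favorable enough.
        counts.update(pos for pos, e in contributions.items() if e <= energy_cutoff)
    n_complexes = len(per_residue_energy)
    return {pos: c / n_complexes for pos, c in counts.items()
            if c / n_complexes >= min_fraction}

# Toy data: three complexes, energies in kcal/mol (invented numbers).
toy = {
    "complex1": {10: -2.0, 11: -0.2},
    "complex2": {10: -1.5, 11: -1.2},
    "complex3": {10: -0.1, 12: -3.0},
}
important = frequent_contributors(toy)
```

Since high-affinity complexes tend to have more residues passing the cutoff, they contribute more counts, which matches the weighting described above.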
Analysis of Fzd CRD-binding regions of Wnt indicates two major regions utilized by Wnt in binding Fzd CRDs (Fig. 6A). These correspond to the thumb and index finger regions of Wnt, which are already well known as Fzd CRD-binding regions (19,36). Interestingly, Wnt residues beyond these two regions are rarely implicated in Fzd CRD binding (supplemental Fig. S15), and the majority of Wnt residues in these regions frequently implicated in Fzd CRD binding are highly (often entirely) conserved in human and mouse Wnts.
In contrast to the Fzd CRD-binding regions of Wnt, which appear highly conserved and occupy relatively small sections of the Wnt sequence, the Wnt-binding regions of Fzd CRDs are distributed across several segments of the CRDs and often incorporate poorly conserved residues. Four sequences in the Fzd CRDs can be defined (Fig. 6A), two of which interact with the Wnt thumb region and two of which interact with the Wnt index finger region, with several additional residues of importance identified in specific cases (supplemental Fig. S16). Highly conserved Fzd CRD residues frequently implicated in Wnt binding are generally associated with lipid binding: the FXP motif, which frequently occurs within a helix forming one side of the lipid-binding site of the Fzd CRD, and the phenylalanine of an FXW motif in the latter part of the sequence both interact directly with the Wnt lipid (Fig. 6B). Hydrophobic residues adjacent to the final cysteine in the Fzd CRD are frequently implicated in binding the Wnt index finger, as are hydrophobic residues adjacent to the fourth cysteine of the Fzd CRD. However, the involvement of particular Fzd CRD residues in binding is often highly influenced by sequence variation, even for positions frequently implicated in Wnt binding. The greatest deviations in the utilization of Wnt-binding residues with respect to the set of Fzd CRDs occur in Fzd3, Fzd6, SFRP3, and SFRP4. The region corresponding to the FXP motif in SFRP4 occurs as YEE; the tyrosine and glutamate residues in this sequence are never implicated as strong contributors to binding any Wnt. In SFRP3, the phenylalanine of the motif is retained, but the proline is replaced by glycine; the phenylalanine is strongly implicated in binding to all Wnts, whereas this is never the case for glycine. The glutamates of a motif frequently occurring as EAGLE are often implicated in Fzd CRD binding to Wnt. 
In Fzd3 and Fzd6, the residue corresponding to the first glutamate is never strongly implicated in binding to any Wnt; this is replaced by a threonine in Fzd3 and an isoleucine in Fzd6. Substitution of this residue with aspartate (as occurs in several Fzds and SFRPs) or glutamine (as occurs in SFRP3 and SFRP4) does not appear to greatly influence the frequency with which this residue is involved in Wnt binding. Similarly, replacement of the second glutamate in the motif with alanine, as occurs in Fzd3, SFRP3, and SFRP4, eliminates the importance of this position to Wnt binding, whereas retaining it as a glutamate (as in Fzd6 and other Fzds and SFRPs), aspartate, or even glutamine does not seem to affect the frequency of its importance to binding.

Discussion
In this study, we have developed a model for predicting the binding affinity of Wnt-Fzd interactions. Although the model was developed against a relatively small set of data from a single study, there is nonetheless excellent agreement between affinities predicted in the current study and those experimentally determined in other studies that were not included in model building and testing here. The binding affinity of Wnt3a for the mouse SFRP3 was determined by surface plasmon resonance to be 7.9 nM (37); our model predicts this interaction to be at 0.28 nM, suggesting strong binding affinity. Binding affinities of Wnt3a, Wnt7a, Fzd10, and SFRP4 measured using ELISA (38) confirm our model's prediction that the Fzd5-Wnt3a interaction was stronger than that of Fzd10-Wnt7a and Wnt7a-SFRP4. However, direct comparisons of Kd values predicted by our model and those determined by ELISA are challenging as our model has been optimized against BLI data, where a direct interaction is measured, whereas ELISA is a coupled assay; thus, Kd values obtained from BLI are likely to indicate higher affinity than those obtained from ELISA.
As experimentally determined binding affinities of Wnt-Fzd CRD interactions are largely limited to those included in our training and test sets, it is also pertinent to investigate whether interactions demonstrated experimentally through coIP were predicted by our model to have strong binding affinities. mFzd4-mWnt2b (39), hFzd4-hWnt2 (40), mFzd4-mWnt7b (41), mFzd6-Wnt4 (42), and hWnt3a-hSFRP4 (43), which were shown by coIP to interact, are predicted by our model to bind with an affinity in the intermediate or tighter range (<40 nM). However, the interaction of SFRP1 with Wnt5a, which has been demonstrated by coIP (44), is suggested by our model to bind in the low micromolar range. Although this would be within the range detectable by coIP and is indeed a typical range for other interactions of biological relevance, particularly protein-carbohydrate interactions (45), binding affinities of functionally relevant Wnt-Fzd CRD interactions generally appear to occur in the low-to-mid nanomolar range, as evidenced in the data upon which we have based our prediction model. Therefore, it is likely that the affinity of the SFRP1-Wnt5a interaction is drastically underestimated by the model.
Despite the failure of the model in selected cases to achieve accurate predictions, the model nonetheless performs remarkably well at predicting binding affinities and likely interactors, particularly when considering that the Wnt-Fzd CRD interaction is rather complex due to involvement of both protein-protein and protein-lipid interactions at different sites. This would further suggest its usefulness in predicting the effect of Wnt/Fzd mutations to residues involved in either of the binding sites. The predictive success of the model is likely attributable to two main factors. The first is the use of a test set of cases separated from the training set to validate the model, which is not always performed in developing quantitative structure-activity relationships; even more remarkably, the use of an external data set for model validation appears to be a matter of some debate in the quantitative structure-activity relationship literature (46). The second is the incorporation of a term in the model specifically considering the contribution to binding made by the lipid. The direct involvement of Wnt lipidation in facilitating the Wnt-Fzd interaction is likely unusual among protein-protein interactions; lipidation typically appears to influence the solubility and localization of proteins, rather than directly facilitate protein-protein interactions (47). However, other post-translational modifications of proteins, such as glycosylation, phosphorylation, and methylation, are very common and are often involved in facilitating protein folding and mediating protein-protein interactions (48-50). Because post-translational modifications such as these are generally not accommodated in protein-protein docking and scoring, the strategy demonstrated here is one that could be adapted to facilitate their inclusion in protein-protein docking and scoring.
This study has revealed trends with regard to the selectivity and promiscuity of Wnt ligands for Fzd CRDs. The study particularly highlights the promiscuous nature of SFRP4, a Wnt antagonist of interest to our group (51-56). SFRP3 is predicted to display similarly low selectivity for Wnt ligands, whereas SFRP1, SFRP2, and SFRP5 are predicted to display much higher selectivity. The various levels of selectivity are likely to be due to the evolutionary development of tissue expression patterns of Wnt ligands and Fzd receptors, where SFRPs can partially limit aberrant Wnt signaling for controlled tissue development (57).
This study has focused on the interactions of Wnt proteins with the Frizzled-type cysteine-rich domains of the Fzd receptors and the secreted Frizzled-related proteins. However, a variety of other proteins also contain Frizzled-type CRDs, albeit less closely sequence-related to those of the Fzds and SFRPs. These include Smoothened, atrial natriuretic peptide-converting enzyme (CORIN), the tyrosine protein kinase transmembrane receptors ROR1 and ROR2, the skeletal muscle receptor tyrosine protein kinase (MuSK, for which a structure of the Fzd CRD has been experimentally solved (58)), the collagen XVIII α-1 chain, carboxypeptidase Z, and the membrane Frizzled-related protein. With the exception of the RORs (59-62), it is unknown whether any Wnt binds to these proteins and, if so, whether such an interaction is functionally relevant in the context of Wnt signaling. The approaches utilized in the current study could be applied to investigate the binding of Wnts to the Frizzled-type CRDs of these proteins, which in turn could stimulate further research into alternative Wnt signaling pathways.
It is important to note that a high-affinity interaction between a given Wnt and a given CRD does not necessarily translate into a signal transduction event. Wnt signaling involves several additional proteins both extracellularly and intracellularly. For example, in canonical Wnt signaling, Wnt binds to a Fzd CRD as well as the co-receptor LRP5/6 (63); on the intracellular side, this likely causes a conformational change in Fzd, resulting in movement of the Fzd intracellular loop 3 and C-terminal helix, which in turn permits Dvl binding and subsequent signal transduction (64,65). Thus, the biological relevance of given Wnt-Fzd CRD interactions will be influenced by the co-expression/co-localization of these other proteins. Recent structural data on LRP6 (66-68) and the Smoothened receptor, a Class F GPCR related to Fzd receptors (69-72), as well as the availability of Dvl domain structures (73-76) and knowledge of key residues in the Fzd-Dvl interaction (64,77,78) provide the opportunity to investigate more completely the structural basis of canonical Wnt signaling. Additionally, the structures of several intracellular components in non-canonical Wnt signaling pathways are known or adopt structurally characterized folds, suggesting the potential for structural investigations. The models generated in this study provide a solid basis by which to pursue further structural studies of Wnt signaling and, perhaps of greater importance, given the combinatorial nature of potential Wnt-Fzd interactions, suggest specific interacting partners on which to focus experimental and computational efforts.

Template preparation
The template structure for all models was the complex of the XWnt8 with the mouse Fzd8 CRD (Protein Data Bank code 4F0A) (19). This structure was initially processed by the Protein Preparation Wizard, with missing side chains and loops filled in by Prime. Although the identity of the lipid modification to XWnt8 in this structure could not be conclusively determined (19), we have presumed this modification to be palmitoleic acid, as indicated either by direct experimental evidence or comparison with similar sequences for which this modification has been demonstrated (18, 79-83). The lipid in the structure was manually modified using Maestro to be a palmitoleic acid modification, which involved the creation of a double bond between carbons 9 and 10 and the addition of carbons 15 and 16 to the lipid, which were missing from the structure. The lipid was subject to a Monte Carlo multiple minimum conformational search using Macromodel, with the region comprising carbons 9-16, as well as the hydrogen atoms attached to these carbons, defined as a freely moving substructure, residues within 6.0 Å of this defined as a frozen shell, and a torsional constraint to ensure cis double bond geometry about carbons 9 and 10. Automatic setup of the substructure was used to define rotatable bonds to be searched; however, all torsion check parameters were removed. Extended torsion sampling was used. A maximum of 10,000 steps was used for the search with a maximum of 2000 steps per rotatable bond. The lowest energy structure obtained from the search provided a template structure for the lipid that was used in all models.

Homology modeling
Sequences of Wnts, Frizzled, and SFRP CRDs from both mouse and human were obtained from the UniProt database (84) (accession numbers are provided in supplemental Table S1). Homology models were prepared using Prime 4.1 (85) (sequence ranges and alignments used are provided in supplemental Figs. S2 and S3). All models were prepared using knowledge-based building; however, because of the presence of large insertions in the human and mouse Wnt6, Wnt10a, and Wnt10b sequences relative to XWnt8, an alternative strategy was used to build these structures (see below). Disulfide bonds c13-c17 and c16-c24 in Wnt (see Ref. 36 for description of cysteine numbering in Wnts) typically could not be created during the model building process because they are adjacent to insertions/gaps in the sequence alignment; these bonds were manually inserted, and the residues involved were energy-minimized. The lipid structure generated during template preparation was not included during Wnt model building but was manually attached following model building.
To build the structures of the human and mouse Wnt6, Wnt10a, and Wnt10b, an initial model of the complete mouse Wnt10a was generated using the I-TASSER server (86). Structures of the remaining Wnts were then built using knowledge-based building in Prime against both the mouse Wnt10a model generated by I-TASSER (to provide the structure of the insertion) and the XWnt8 structure (to provide a template for modeling the remainder of the structure).

Complex generation and refinement
All combinations of Wnt-CRD complexes were generated by merging the structures of each of the models built in the previous step. The generated complexes were subject to refinement using Prime 4.1. The refinement process was facilitated through the use of a KNIME workflow (supplemental Fig. S4). In each complex, non-template residues and residues within 6.0 Å of the binding interface were subject to Prime Minimization and Prime Side-Chain Prediction, followed by a second Prime Minimization. The Wnt lipidation was excluded from the first minimization to allow CRD residues to relax around it but included in the second minimization. For complexes involving mouse and human Wnt6, Wnt10a, and Wnt10b, the large insertions modeled by I-TASSER were also subject to the refinement procedure. The quality of the refined models was assessed using the MolProbity score as calculated by the MolProbity module within PHENIX (87). The quality of the refined models was also assessed by calculating the RMSD of the Cα atoms and the TM-score with respect to the XWnt8-mFzd8 CRD complex. These measurements were both calculated using MM-align (88), with the option to enforce interface alignment by the default cutoff enabled.

Development and validation of the binding affinity prediction model
Complexes of mouse Wnt3a, Wnt4, Wnt5a, and Wnt5b with mouse Fzd1, Fzd2, Fzd4, Fzd5, Fzd7, and Fzd8 were rescored using all of the scoring functions contained in the CCharPPI server (28). As the scoring functions are generally only capable of considering interactions between protein residues, the lipid modification to Wnt was removed prior to rescoring. To consider contributions to the binding affinity made by the lipid, Prime MM-GB/SA calculations on the interaction between the lipid and the CRDs were performed. For these calculations, the protein component of Wnt was removed.
The scores for each complex by each scoring function in CCharPPI as well as the values of the terms provided by the Prime MM-GB/SA calculations were loaded into Maestro. A property containing the dissociation constants determined by BLI for selected mouse Wnt-Fzd CRD pairs (27) was manually created and used to define the activity property. Complexes involving interactions with either Wnt4 or Fzd1 comprised the test set, whereas all other complexes comprised the training set; the training and test sets are summarized under "Results" (Table 3). Both the training and test sets cover a diverse range of Wnts, Fzds, and binding affinities for Wnt-Fzd interactions.
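The partitioning rule described above can be sketched in Python as follows. This is a minimal illustration only, not the Maestro procedure used in the study; the function name `split_complexes` and the representation of a complex as a (Wnt, Fzd) tuple are assumptions made for the example. The sketch enumerates every pairing of the listed proteins, whereas in practice only pairs with a measured dissociation constant would be retained.

```python
from itertools import product

# Proteins named in the text; each complex is a (Wnt, Fzd) pair.
WNTS = ["Wnt3a", "Wnt4", "Wnt5a", "Wnt5b"]
FZDS = ["Fzd1", "Fzd2", "Fzd4", "Fzd5", "Fzd7", "Fzd8"]

# Any complex containing one of these proteins goes to the test set.
TEST_MEMBERS = {"Wnt4", "Fzd1"}


def split_complexes(wnts, fzds, test_members):
    """Return (training, test) lists of (Wnt, Fzd) pairs.

    Hypothetical helper: a pair is assigned to the test set if either
    partner is listed in test_members, otherwise to the training set.
    """
    training, test = [], []
    for wnt, fzd in product(wnts, fzds):
        if wnt in test_members or fzd in test_members:
            test.append((wnt, fzd))
        else:
            training.append((wnt, fzd))
    return training, test


training_set, test_set = split_complexes(WNTS, FZDS, TEST_MEMBERS)
```

Holding out every complex that contains a given protein, rather than sampling complexes at random, means that the test set probes predictions for a Wnt and a Fzd that the regression has never seen.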
Strike was used to generate affinity prediction models. Multiple linear regression was used to build models. Functions from CCharPPI and properties from Prime MM-GB/SA provided the descriptors used in model building; the full list of functions and properties considered in model building is provided in supplemental Table S5. The success of the models in predicting binding affinities for complexes in both the training and test sets was evaluated using RMSE and the percentage of complexes for which the predicted value occurred within the experimental range (RMSE train, RMSE test, InExp train, and InExp test).
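The two evaluation measures can be illustrated with a short Python sketch. It is not the Strike implementation; the function names are hypothetical, and the experimental range of each complex is assumed to be supplied as explicit lower and upper bounds in the same units as the predictions.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def percent_in_experimental_range(predicted, lower, upper):
    """Percentage of predictions lying within [lower, upper] (InExp).

    Assumption: the experimental range is given per complex as a pair
    of bounds; bounds are treated as inclusive.
    """
    if not (len(predicted) == len(lower) == len(upper)) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, lo, hi in zip(predicted, lower, upper) if lo <= p <= hi)
    return 100.0 * hits / len(predicted)
```

Applying both functions separately to the training and test complexes yields the four quantities referred to in the text.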
All possible three-descriptor models incorporating one term from Prime MM-GB/SA and the remaining two terms from CCharPPI were investigated. Models with RMSE train less than 0.5 kcal/mol and InExp train greater than 50% were selected for testing. Models performing at least as well for the test set as for the training set (i.e. RMSE test ≤ 0.5 kcal/mol and InExp test ≥ 50%) were selected for further elaboration into four-descriptor models, which were generated by adding an additional term from CCharPPI to the best performing three-descriptor models. Four-descriptor models giving RMSE train and RMSE test less than 0.3 kcal/mol and InExp train and InExp test greater than 75% were selected as high-performing models. Elaboration of the four-descriptor models into five-descriptor models was also pursued by adding another term from CCharPPI.
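The enumeration and filtering of three-descriptor models can be sketched in Python as follows. This is an illustrative outline under stated assumptions rather than the procedure as run in Strike: the function names are hypothetical, the descriptor names passed in are placeholders, and the regression step that produces the RMSE and InExp values is omitted.

```python
from itertools import combinations, product


def three_descriptor_sets(mmgbsa_terms, ccharppi_terms):
    """Yield every set of one MM-GB/SA term plus two CCharPPI terms."""
    for mm_term, cc_pair in product(mmgbsa_terms, combinations(ccharppi_terms, 2)):
        yield (mm_term,) + cc_pair


def passes_training_filter(rmse_train, inexp_train):
    """Training criteria: RMSE < 0.5 kcal/mol and InExp > 50%."""
    return rmse_train < 0.5 and inexp_train > 50.0


def passes_test_filter(rmse_test, inexp_test):
    """Test criteria: RMSE <= 0.5 kcal/mol and InExp >= 50%."""
    return rmse_test <= 0.5 and inexp_test >= 50.0


def extend_with_ccharppi_term(descriptor_set, ccharppi_terms):
    """Yield descriptor sets with one further CCharPPI term added."""
    for term in ccharppi_terms:
        if term not in descriptor_set:
            yield descriptor_set + (term,)
```

For m MM-GB/SA terms and n CCharPPI terms, the first generator produces m × n(n − 1)/2 candidate descriptor sets, which is why a staged search that extends only the surviving models is cheaper than enumerating all four- and five-descriptor sets directly.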
As a final check of model quality, we also checked whether the approximate range of binding affinity predicted by the best models falls within that expected. Dijksterhuis et al. (27) used a simplified scheme wherein Wnt-Fzd binding affinities were classified as strong (<10 nM; ++++), intermediate (10-40 nM; +++), weak (40-100 nM; ++), very weak (>100 nM; +), and non-binding (-). We have utilized this scheme with some modification; we have considered predictions of 100-400 nM to constitute the very weak (+) category and predictions greater than 400 nM to be effectively non-binding (-); the 400 nM limit was chosen in relation to the intermediate/weak affinity range defined.
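The modified classification scheme can be written as a small Python function. This is a sketch only; the function name is hypothetical, and the treatment of values falling exactly on a category boundary (10, 40, and 100 nM assigned to the weaker category, 400 nM to the very weak category) is an assumption, as the text does not specify it.

```python
def classify_kd(kd_nm):
    """Map a predicted dissociation constant (nM) to an affinity category.

    Categories follow the modified scheme in the text:
    <10 nM strong (++++), 10-40 nM intermediate (+++),
    40-100 nM weak (++), 100-400 nM very weak (+),
    >400 nM effectively non-binding (-).
    Boundary handling is an assumption of this sketch.
    """
    if kd_nm <= 0:
        raise ValueError("dissociation constant must be positive")
    if kd_nm < 10:
        return "++++"
    if kd_nm < 40:
        return "+++"
    if kd_nm < 100:
        return "++"
    if kd_nm <= 400:
        return "+"
    return "-"
```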

Analysis of functional residues in Wnt-Fzd interactions
All 570 Wnt-Fzd CRD models were subject to MM-GB/SA analysis using AMBER14 (35). Wnt-Fzd complexes were parameterized using the ff14SB force field (89). Parameter generation for O-palmitoleoylserine was facilitated by Antechamber (90), adapting procedures described in both the AMBER14 reference manual and AMBER tutorials. MMPBSA.py facilitated MM-GB/SA calculations (91). The modified generalized Born model of Onufriev et al. (igb = 5) (92) with a salt concentration of 0.1 M was used to calculate the polar desolvation energy. The non-polar desolvation energy was calculated using surface areas derived from the linear combinations of pairwise overlaps (LCPO) method (93) multiplied by surface tension (the default of 0.0072 kcal/(mol Å²) was used). Energies calculated by MM-GB/SA were decomposed on a per-residue basis with 1-4 terms added to the internal potential terms (idecomp = 1) (94). Residues contributing greater than ±2.0 kcal/mol to the total MM-GB/SA binding energy were selected as being of functional importance to binding. Logo analysis of regions within the Wnt and Fzd sequences frequently found to contain residues of functional importance to Wnt-Fzd binding was performed using the WebLogo server (95). Sequence logos were generated as frequency plots.
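Two of the steps above, the non-polar desolvation term and the selection of functionally important residues, can be illustrated with a brief Python sketch. It does not reproduce MMPBSA.py; the function names and the dictionary layout of the per-residue energies are assumptions, the cutoff is interpreted as applying to the magnitude of the contribution, and any offset added to the surface-area term is assumed to be zero.

```python
# Default surface tension quoted in the text, in kcal/(mol Å^2).
SURFACE_TENSION = 0.0072


def nonpolar_desolvation_energy(surface_area, surface_tension=SURFACE_TENSION):
    """Non-polar term as surface tension multiplied by surface area (Å^2).

    Assumption: no additional offset term is applied.
    """
    return surface_tension * surface_area


def functionally_important(per_residue_energy, cutoff=2.0):
    """Select residues whose contribution exceeds the cutoff in magnitude.

    per_residue_energy maps a residue label to its contribution to the
    total MM-GB/SA binding energy in kcal/mol (hypothetical layout).
    """
    return {
        residue: energy
        for residue, energy in per_residue_energy.items()
        if abs(energy) > cutoff
    }
```

Using the absolute value retains both strongly favorable (negative) and strongly unfavorable (positive) contributions, consistent with the ± sign on the threshold.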
Author contributions: M. A. conceived the idea for the work, conducted the experiments, and analyzed the results. M. A. and S. Ö.-G. P. prepared the manuscript. M. A., S. Ö.-G. P., and A. D. critically reviewed and revised the manuscript. All authors reviewed the results and approved the final version of the manuscript.