Combinatorial and Computational Approaches to Identify Interactions of Macrophage Colony-stimulating Factor (M-CSF) and Its Receptor c-FMS

Background: Identifying residues crucial for M-CSF·c-FMS binding remains difficult. Results: Using a combination of experimental and computational methods, we identified mutations on M-CSF that reduce affinity to c-FMS. Conclusion: Affinity-reducing mutations are located both inside and outside of the binding interface. Significance: Knowledge of the critical residues will facilitate a better understanding of the M-CSF mechanism and facilitate drug design.
The molecular interactions between macrophage colony-stimulating factor (M-CSF) and the tyrosine kinase receptor c-FMS play a key role in the immune response, bone metabolism, and the development of some cancers. Because no x-ray structure is available for the human M-CSF·c-FMS complex, the binding epitope for this complex is largely unknown. Our goal was to identify the residues that are essential for binding of the human M-CSF to c-FMS. For this purpose, we used a yeast surface display (YSD) approach. We expressed a combinatorial library of monomeric M-CSF (M-CSFM) single mutants and screened this library to isolate variants with reduced affinity for c-FMS using FACS. Sequencing yielded a number of single M-CSFM variants with mutations both in the direct binding interface and distant from the binding site. In addition, we used computational modeling to map the identified mutations onto the M-CSFM structure and to classify the mutations into three groups as follows: those that significantly decrease protein stability; those that destroy favorable intermolecular interactions; and those that decrease affinity through allosteric effects. To validate the YSD and computational data, M-CSFM and three variants were produced as soluble proteins; their affinity and structure were analyzed; and very good correlations with both YSD data and computational predictions were obtained. By identifying the M-CSFM residues critical for M-CSF·c-FMS interactions, we have laid down the basis for a deeper understanding of the M-CSF·c-FMS signaling mechanism and for the development of target-specific therapeutic agents with the ability to sterically occlude the M-CSF·c-FMS binding interface.

Ligand·receptor interactions are fundamental to many processes occurring in living organisms. The evolution of these protein-protein interactions (PPIs) is dependent on the development of highly specific sites of interaction within the ligand·receptor complex. A comprehensive understanding of how specific PPIs evolve requires the precise identification of the binding recognition sites on the protein agonists (1). Ligand·receptor recognition depends mainly on the physical and chemical properties of the binding interfaces of the two protein surfaces that become buried upon complex formation. A number of studies on the properties of the binding interfaces have revealed that non-covalent interactions, such as hydrogen bonding, burial of the hydrophobic area, and van der Waals interactions, collectively determine the binding affinity between the proteins (2,3). This binding affinity may be affected by mutations in one or both of the binding partners. In addition to residues lying in direct binding interfaces, mutations of more distant residues can also influence the binding strength of ligand·receptor complexes, either through allosteric effects (4) or through changes in the folding of the ligand protein.
Identification of the residues most crucial for binding of a particular ligand·receptor complex is important not only for understanding the evolution of intermolecular interactions but also for the design of therapeutic molecules directed at inhibiting these interactions. However, the identification of such residues is frequently a difficult task, especially when a high resolution structure of the ligand·receptor complex is not available. In the absence of a high resolution structure, such residues have to be identified experimentally (i.e. through epitope mapping). One possible technique for such identification is alanine scanning mutagenesis, in which each residue, in turn, is replaced with Ala, and the change in the free energy of binding (ΔΔG_binding) is measured (5,6). However, such a process is time-consuming and laborious, involving protein production and purification and binding affinity measurements for each mutant. Better alternatives lie in approaches based on various display technologies that allow for a quick exploration of all possible mutations and qualitative mapping of the contribution of each residue to the binding affinity (7-9).
In this study, we chose to perform epitope mapping by using the yeast surface display (YSD) technology (10-12), because this technique facilitates both rapid and quantitative library screening by fluorescence-activated cell sorting (FACS) and the minimization of artifacts resulting from host-expression bias through concurrent expression labeling (12,13). For epitope mapping, in particular, YSD offers two important advantages: there is no need to produce and purify protein variants (11), and, in contrast to phage display, the technique is suitable for eukaryotic proteins, preserving their folds and allowing glycosylation, albeit with slightly different sugar compositions compared with those of humans (15,16). Indeed, epitope mapping by YSD has been successfully performed for epidermal growth factor receptor (17) and gp120 (18) as model systems and has also been used to map the neutralizing antibodies of botulinum neurotoxin (19).
Alongside experimental approaches, binding epitopes have also been mapped by computational means. Recently, Sharabi et al. (20,21) developed a computational saturation mutagenesis protocol that introduces all possible single mutations into the binding interface and predicts ΔΔG_binding due to such mutations. This computational protocol, although successful for predicting the effect of mutations on experimental ΔΔG_binding in several PPIs, requires a high resolution three-dimensional structure of the ligand·receptor complex and can predict only the effect of residues within the ligand·receptor interface (22-24). In addition to the protocol of Sharabi et al. (20,21), there are a number of other protocols that can predict changes in the thermodynamic stability of proteins due to mutations, which may result in ligand unfolding and consequently in a reduction in target binding (25-27). Nevertheless, computational and experimental combinatorial epitope mapping approaches are not mutually exclusive, and their combination is a powerful strategy for studying PPIs. In this study, our combinatorial approach was complemented with computational modeling with the aim to analyze and explain the effects of mutations identified by experimental YSD.
For the purpose of this study, we chose to investigate the important complex between the ligand, macrophage colony-stimulating factor (M-CSF), and the tyrosine kinase receptor, c-FMS, whose epitope is not known due to the absence of the x-ray structure of the complex. The interactions between M-CSF and c-FMS play key roles in the immune response and in bone metabolism. M-CSF is the most pleiotropic macrophage growth factor, regulating macrophage survival, proliferation, differentiation, and chemotaxis (28-30). Furthermore, M-CSF together with the receptor activator of the NF-κB ligand are both essential and sufficient to induce the differentiation of mononuclear monocytes into osteoclasts, the bone-resorbing cells (31,32). In this context, the significance of the interaction of M-CSF with c-FMS is clearly evident in both M-CSF and c-FMS-deficient mice, which suffer from retarded skeletal growth and osteopetrosis as a result of diminished osteoclast function (33-35). Moreover, the M-CSF·c-FMS complex plays an important role in cancer (36). It was found that M-CSF is linked to macrophage recruitment in tumors, thus promoting their survival (37-39). Indeed, the M-CSF·c-FMS complex was shown to be involved in the development and survival of many cancer types, including breast cancer (39,40), prostate cancer (41), melanoma (42), and bone metastasis (39).
The human M-CSF ligand consists of a signal peptide, a 149-amino acid receptor-binding domain, a variable spacer region, and a transmembrane region with a cytoplasmic tail (29). The human M-CSF receptor (c-FMS) is a 972-amino acid glycosylated protein that includes an extracellular region composed of five immunoglobulin-like domains (D1-D5), a transmembrane region, and an intracellular autocatalytic kinase domain (29). The bivalent binding of the soluble M-CSF covalent homodimer to the D2 and D3 extracellular domains of c-FMS results in receptor homodimerization through the membrane-proximal D4 and D5 domains of c-FMS and activation of the M-CSF·c-FMS signal transduction cascade, resulting in rapid stimulation of gene transcription, protein translation, and cytoskeletal remodeling (28,29,35,41,43).
Identifying the M-CSF residues that are crucial for its binding to c-FMS is important for understanding the mechanism of the diseases caused by misregulation of the M-CSF·c-FMS signaling pathway (29,34,44) and could assist in the development of drugs targeting this interaction. In this study, we used YSD to map M-CSF positions where mutations result in a significant decrease in the affinity of M-CSF for c-FMS. With purified proteins and computational modeling, we were able to further explain the mechanism by which the identified mutations are predicted to affect the binding affinity of the complex.

Experimental Procedures
Constructing a DNA Library in Yeast-The gene encoding the secreted isoform of monomeric human M-CSF (M-CSFM), which is 474 nucleotides long, was purchased from Genscript. To produce M-CSFM, a mutation was inserted at position 31 to convert cysteine to serine so as to prevent dimerization of the monomers. Homology sites to pCTCON (45) and recognition sites for the restriction enzymes EcoRI and AvrII were inserted in the beginning and the end of the gene, respectively. The gene was amplified using a PCR with Phusion DNA polymerase (New England Biolabs) with primers that fit pCTCON homology (pCTCON forward, 5′-TAAGGACAATAGCTCGACGATTGAAG-3′, and pCTCON reverse, 5′-CAGATCTCGAGCTATTACAAGTCCTCTTC-3′). Thereafter, the gene was transformed into a competent EBY100 Saccharomyces cerevisiae yeast strain, along with a linear pCTCON plasmid, by using a MicroPulser electroporator (Bio-Rad) for the sake of homologous recombination, as described previously by Wittrup and co-workers (11); the resultant yeast contained the gene for the monomeric version of M-CSF, designated M-CSFM. The plasmid was extracted from the yeast using Zymoprep™ yeast plasmid miniprep I kit (Zymo Research) and was amplified again using a PCR with Phusion enzyme (primers: SGA forward 5′-CAGTAACGTTTGTCAGTAATTGCG-3′, and SGA reverse 5′-GTGTAAAGTTGGTAACGGAACG-3′). The PCR product was amplified and mutated using the GeneMorph II random mutagenesis kit (Agilent Technologies). The conditions of the PCRs for constructing the three libraries, having different mutation frequencies, were set according to the kit protocol, using DNA polymerase, dNTP mix, and reaction buffer that were supplied with the kit. The primers for the reaction were pCTCON forward and reverse. Different amounts of template DNA (500, 750, and 1000 ng) were inserted into each 50-µl PCR, which had 30 amplification cycles, with the aim to obtain different mutation rates in the libraries.
The gene libraries were amplified, using another PCR (primers: pCTCON inner forward, 5′-GTTCCAGACTACGCTCTGCAGG-3′, and pCTCON reverse) using Phusion DNA polymerase, to produce ~30 µg of DNA insert and were transformed into competent S. cerevisiae yeast strain EBY100. The transformed yeast was grown on SDCAA plates (0.54% Na2HPO4, 0.856% Na2HPO4·H2O, 18.2% sorbitol, 1.5% agar, 2% dextrose, 0.67% yeast nitrogen base, 0.5% bacto casamino acids), with dilution plating to determine library size, and in SDCAA medium (2% dextrose, 0.67% yeast nitrogen base, 0.5% bacto casamino acids, 1.47% sodium citrate, 0.429% citric acid monohydrate, pH 4.5) (11). The libraries were combined to produce a library with 6 × 10^4 variants. To determine whether there were enough single point mutation variants to cover all possibilities, 90 different plasmids were extracted from the bacteria using HiYield plasmid mini kit (RBC Bioscience, Taiwan) and then sequenced (sequencing laboratory of the National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Israel). Out of the sequenced plasmids, 29% contained a single point mutation; thus, the library contained 17,400 single point mutation variants at the DNA level, i.e. more than an order of magnitude larger than all the possibilities for single point mutations, which are three times the number of nucleotides in the gene. The M-CSFM gene library was inserted into the pCTCON plasmid, containing all the genetic information for expression of proteins in the YSD system (10), which was subsequently cultivated in SGCAA medium (2% galactose, 0.67% yeast nitrogen base, 0.5% bacto casamino acids, 1.47% sodium citrate, 0.429% citric acid monohydrate) (11), which induces the expression of the YSD system (45) (Fig. 1A).
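The coverage estimate in this paragraph is plain arithmetic and can be checked with a short sketch. The function and variable names are illustrative and not part of the original protocol; the inputs (6 × 10^4 variants, 29% single mutants, 474 nucleotides) are taken from the text.

```python
def single_mutant_coverage(library_size, single_fraction, gene_length_nt):
    """Return (single-mutant clones, possible single substitutions, fold coverage)."""
    # Estimated number of clones that carry exactly one point mutation
    single_clones = round(library_size * single_fraction)
    # Each nucleotide can be replaced by any of the three other bases
    possible = 3 * gene_length_nt
    return single_clones, possible, single_clones / possible


single_clones, possible, fold = single_mutant_coverage(60_000, 0.29, 474)
print(single_clones, possible, round(fold, 1))
```

With these inputs the sketch reproduces the 17,400 single-mutant clones and 1422 possible substitutions quoted in the text, roughly a 12-fold excess. The ratio says nothing about how evenly the substitutions are represented, so it is best read as an upper bound on coverage.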
Flow Cytometry Screening and Sorting-The yeast containing the gene for M-CSFM was grown in SGCAA medium overnight, until the culture reached A600 = 10 (10^8 cells per ml). Cells were then collected and washed with 1 ml of PBSA (phosphate-buffered saline (PBS) + 1% bovine serum albumin (BSA)). The expression of the displayed proteins was detected by labeling the cells with 9E10 mouse anti-c-Myc antibody (Abcam) in a 1:50 ratio, followed by sheep anti-mouse antibody conjugated to phycoerythrin (PE) (Sigma) in a 1:50 ratio. The binding of displayed M-CSFM to the five extracellular domains of the human c-FMS receptor was detected with 1 nM soluble c-FMS conjugated to human Fc (R&D Systems), followed by goat anti-human Fc conjugated to fluorescein isothiocyanate (FITC) in a 1:50 dilution (Sigma). The labeling process was performed according to the protocol previously described by Boder and Wittrup (10). The yeast displaying the labeled proteins were screened using an Accuri C6 flow cytometry analyzer (BD Biosciences). The horizontal axis provides an indication of the expression of the displayed protein (PE stain) and the vertical axis the binding of the presented protein to the target receptor (FITC stain). The same labeling procedure was used for the M-CSFM library and for the individual selected clones, with the latter experiment being repeated three times. For the flow cytometry M-CSFM library sorting process, a desired population (see details below) was enriched using a diagonal sorting gate with iCyt Synergy FACS apparatus (Sony Biotechnology). For the YSD-based titration curves, the same labeling process was implemented, and the percent of the binding population of M-CSFM and the variants V78A, S84P, and L85S was measured at receptor concentrations of 10, 50, 100, and 500 pM, 1, 10, 50, 100, and 500 nM, and 1 µM.
Each measurement was normalized to the same clone's highest binding percent, which was set as 1, to obtain a plot of normalized binding percent against receptor concentration. A non-linear specific binding fit was applied to all the curves using GraphPad Prism (GraphPad Software). The value of KD for each protein was calculated from the fit.
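The normalization and fitting steps can be illustrated with a minimal sketch. It assumes the one-site specific binding model, signal = Bmax · c / (KD + c), and replaces GraphPad Prism's nonlinear regression with a simple log-spaced grid search over KD; the function name `fit_one_site` and the data are hypothetical, with only the concentration series taken from the text.

```python
import math


def fit_one_site(conc_nM, signal, kd_min=1e-3, kd_max=1e4, steps=7000):
    """Least-squares fit of signal = Bmax * c / (KD + c); returns (KD, Bmax)."""
    lo, hi = math.log10(kd_min), math.log10(kd_max)
    best = None
    for k in range(steps + 1):
        kd = 10 ** (lo + k * (hi - lo) / steps)  # log-spaced KD candidates
        occ = [c / (kd + c) for c in conc_nM]    # fractional occupancy
        # For a fixed KD the model is linear in Bmax, so Bmax has a closed form
        bmax = sum(s * o for s, o in zip(signal, occ)) / sum(o * o for o in occ)
        sse = sum((s - bmax * o) ** 2 for s, o in zip(signal, occ))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]


# Receptor concentrations of the titration, in nM (10 pM to 1 µM)
conc = [0.01, 0.05, 0.1, 0.5, 1, 10, 50, 100, 500, 1000]
# Synthetic, noise-free data for a hypothetical clone with KD = 2 nM (not measured values)
raw = [c / (2.0 + c) for c in conc]
normalized = [r / max(raw) for r in raw]  # highest binding percent set to 1
kd_fit, bmax_fit = fit_one_site(conc, normalized)
print(f"KD = {kd_fit:.2f} nM, Bmax = {bmax_fit:.3f}")
```

Normalizing to the highest observed point rescales Bmax but leaves KD unchanged, which is why the normalized curves can still be compared between clones.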
DNA Sequencing-To perform DNA sequencing of the yeast library, the plasmid was first extracted from the yeast by using the Zymoprep™ yeast plasmid miniprep I kit (Zymo Research). The extracted plasmid was transformed into competent Escherichia coli cells via electroporation with a MicroPulser electroporator (Bio-Rad). The transformed bacteria were grown overnight on 100 µg/ml LB-ampicillin (amp) agar plates. The bacterial colonies were picked out and grown in LB-amp medium, and the plasmid was extracted from them with the HiYield plasmid mini kit (RBC Bioscience, Taiwan). The purified plasmids were then sequenced using the Sanger sequencing method (sequencing lab, National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev). The results were analyzed using Geneious R7 (Biomatters, New Zealand).
Protein Expression and Purification-Several variants of M-CSFM were produced. Proteins were expressed in Pichia pastoris strain GS115 (46). The DNA encoding the M-CSFM variants was restricted using EcoRI and AvrII enzymes (New England Biolabs) and then ligated into a linear pPIC9K plasmid (Invitrogen), containing the AOX1 promoter for the expression of the gene (47,48). A FLAG tag was added onto the N terminus of the protein and a His tag onto the C terminus. The ligated plasmid was transformed into E. coli, as described under "DNA Sequencing," and plated on LB-amp plates. Colonies were picked out and sequenced. A colony with the correct sequence was grown in LB-amp medium, and the plasmid was extracted from the bacteria. The plasmids were then linearized with SacI restriction enzyme (New England Biolabs) and transformed into electrocompetent P. pastoris GS115 cells by using the Multi-Copy Pichia expression kit protocol (Invitrogen). The transformed cells were plated on RDB plates (18.6% sorbitol, 2% agar, 2% dextrose, 1.34% yeast nitrogen base, 4 × 10^-5% biotin, and 5 × 10^-3% each of L-glutamic acid, L-methionine, L-leucine, L-lysine, and L-isoleucine) (49) for 48 h, collected, and plated on YPD-G418 (4 mg/ml G418) (49) plates for 48 h for multiple copy screening (50). Colonies were picked out of the plates and grown in 5 ml of BMGY medium (2% peptone, 1% yeast extract, 0.23% K2HPO4, 1.1812% KH2PO4, 1.34% yeast nitrogen base, 4 × 10^-5% biotin, 1% glycerol) (51) overnight and in 5 ml of BMMY medium (2% peptone, 1% yeast extract, 0.23% K2HPO4, 1.1812% KH2PO4, 1.34% yeast nitrogen base, 4 × 10^-5% biotin, 0.5% methanol) (51) for 3 days, with 0.5% methanol being added each day.
The cells were suspended, and overexpression of the desired protein was examined using the Western blot method, with 1:1000 ratio of mouse anti-FLAG primary antibody (Sigma), 1:5000 ratio of anti-mouse secondary antibody conjugated to alkaline phosphatase (Jackson ImmunoResearch), and 2 ml of 5-bromo-4-chloro-3-indolyl phosphate reagent for signal development (Sigma). The culture with the highest recombinant protein expression was grown in 50 ml of BMGY medium overnight and then in 0.5 liters of BMMY medium for 3 days, with 0.5% methanol being added each day. The cell suspension was subjected to two rounds of centrifugation and filtration as follows. The suspension was centrifuged at 3800 × g for 10 min; the supernatant was filtered off using a 0.22-µm vacuum filter; the pH was adjusted to 8.0; and 300 mM NaCl and 5 ml of 1 M imidazole were added for 1 h at 4°C to give a final concentration of 10 mM imidazole. The protein was then purified on nickel-nitrilotriacetic acid-Sepharose beads (Invitrogen) that bind to the His tag within the protein and eluted with 20 ml of a solution containing 50 mM Tris, pH 7, 300 mM NaCl, 250 mM imidazole, and 10% glycerol. The protein buffer was replaced with PBS by using Vivaspin with a 3-kDa cutoff (Vivaproducts). Sugars linked via N-glycosylation sites were removed from half of the concentrated protein by using 1:10 G5 buffer and 2 µl of endoglycosidase HF enzyme (New England Biolabs). Both glycosylated and non-glycosylated proteins were further purified using AKTA pure 150 FPLC with a Superdex 200 16/600 size-exclusion column (GE Healthcare). The yield of each of the purification processes was 10-27 mg per 0.5 liters of yeast culture, as determined using a NanoDrop spectrophotometer (Thermo Scientific) (extinction coefficient 10,345 M^-1 cm^-1), based on protein absorbance at 280 nm.
Protein samples were also subjected to mass spectrometry analysis (Ilse Katz Institute for Nanoscale Science and Technology, Ben-Gurion University of the Negev).
Circular Dichroism Analysis-Structural analysis of the purified proteins was performed using a J-815 CD spectrometer (Jasco, Tokyo, Japan) with a 1-mm path length quartz cuvette. Spectra of 5 µM of purified protein in 400 µl of PBS were obtained at room temperature. The average of three spectra was normalized to obtain ellipticity (degree × cm^2/dmol). Data points with a diode voltage >1000 V were excluded. For M-CSFM, the CD spectra were obtained for both glycosylated and non-glycosylated proteins.
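The averaging, voltage filtering, and normalization described here can be sketched as follows. The sketch assumes that the instrument reports ellipticity in millidegrees and that the normalization is to molar ellipticity, [θ] = θ(mdeg) / (10 · c · l) with c in mol/liter and l in cm; the text does not say whether a per-residue normalization was applied. The function names and the example readings are hypothetical.

```python
def molar_ellipticity(theta_mdeg, conc_molar, path_cm):
    """Convert observed ellipticity (mdeg) to molar ellipticity (deg*cm^2/dmol)."""
    return theta_mdeg / (10.0 * conc_molar * path_cm)


def average_spectra(spectra, diode_voltage, voltage_limit=1000.0):
    """Average replicate spectra point by point; points above the voltage limit become None."""
    averaged = []
    for i, volts in enumerate(diode_voltage):
        if volts > voltage_limit:
            averaged.append(None)  # excluded data point
        else:
            averaged.append(sum(s[i] for s in spectra) / len(spectra))
    return averaged


# Three hypothetical scans (mdeg) at three wavelengths, with the diode voltage per point
scans = [[-10.2, -9.8, -3.0], [-10.0, -10.0, -3.2], [-9.8, -10.2, -2.8]]
voltage = [450.0, 600.0, 1100.0]
mean_mdeg = average_spectra(scans, voltage)
# 5 µM protein in a 1-mm (0.1-cm) cuvette, as in the text
normalized = [None if v is None else molar_ellipticity(v, 5e-6, 0.1) for v in mean_mdeg]
print(normalized)
```

Dividing the result by the number of peptide bonds would give the mean residue ellipticity instead, if that convention is preferred.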
Surface Plasmon Resonance (SPR) Spectroscopy-The binding constants between M-CSFM or its soluble variants and the c-FMS receptor, and between M-CSFM and itself (dimerization assay), were determined by SPR spectroscopy on a ProteOn XPR36 (Bio-Rad). Glycosylated M-CSFM, non-glycosylated M-CSFM, and its variants were immobilized on the surface of the chip by using the amine coupling reagents sulfo-NHS (0.1 M N-hydroxysuccinimide) and EDC (0.4 M 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide, Bio-Rad). The proteins (1 µg) were each covalently immobilized on the chip in 10 mM sodium acetate buffer, pH 4.0, to give 1507, 1960, 2204, 1670, and 2486 response units (RU) for M-CSFM-glycosylated, M-CSFM-nonglycosylated, and the V78A, L85S, and S84P variants, respectively. BSA (3 µg; 4514 RU) was immobilized on the chip as a negative control. Unbound esters were deactivated with 1 M ethanolamine HCl at pH 8.5. Before each binding assay, the temperature was set at 25°C. Soluble human c-FMS receptor (Sino Biological, China) was then allowed to flow over the surface-bound M-CSF at concentrations of 4.375, 8.75, 17.5, 35, and 70 nM and a flow rate of 50 µl/min for 6 min 50 s. While the analyte was flowing over the surface, the interactions between M-CSF and c-FMS were determined. The next step was to examine the dissociation of the proteins, while allowing PBST (phosphate-buffered saline + 0.005% Tween) to flow over the surface for 5 min. After each run, a regeneration step was conducted with 50 mM NaOH at a flow rate of 100 µl/min. For each protein complex, a sensorgram was generated from the RUs measured during the course of the PPI minus the values of the BSA channel. The dissociation constant (KD) was determined from the sensorgram of the equilibrium binding phase. Glycosylated M-CSFM (31.25, 62.5, 125, 250, and 500 nM) was allowed to flow through the system to ensure that dimerization of the monomeric variants did not occur.

Construction of a Model for M-CSF·c-FMS Complexes-Because there was no x-ray structure available for human M-CSF in a complex with human c-FMS, we created a model for this complex based on available x-ray structures of each chain, either as an unbound monomer (M-CSFM) or in a complex with another target (c-FMS). To this end, the structure of the c-FMS monomer was taken from chain C of the structure of human IL-34 bound to human c-FMS (Protein Data Bank code 4DKD (52)), and the structure of the human M-CSF dimer was taken from chains A and B of its x-ray structure (Protein Data Bank code 3UF2 (53)). These separate structures were then superimposed on the murine homolog of the M-CSF·c-FMS complex (Protein Data Bank code 3EJJ (54)) with the aid of PyMOL software. This preliminary superimposed structure was then further refined using the Rosetta docking protocol of the ROSIE server (55,56). The structure with the lowest total score and the smallest root-mean-square deviation with respect to the murine complex structure was chosen.
Calculation of Changes in Binding Free Energy and Protein Stability-For selected mutants, the effect of the mutation on the local stability and change in free energy of binding was determined in silico. For these calculations, we utilized the ORBIT protein design software (26) using the model of the M-CSFM·c-FMS complex as an input. The binding interface was defined and limited to those residues on M-CSFM with atoms located within 5 Å from the c-FMS chain. All other residues in M-CSFM were divided into three groups: core, surface, and boundary, depending on their proximity to the protein surface (26). Core residues are those whose α-carbons are at a distance of 5 Å or greater from the protein surface and β-carbons at a distance of 2 Å or greater from the surface. Surface residues are those residues whose sum of α-carbon and β-carbon distances from the protein surface is less than or equal to 2.7 Å. Boundary residues are all other residues. For calculating the free energy of binding for mutations in the interface, we utilized the energy function optimized by Sharabi et al. (57). In the course of the calculation, the mutation was introduced into M-CSFM; a shell around the mutated residue, consisting of the interface residues and the residues in direct contact with the interface, was allowed to repack; and the free energy of the M-CSFM·c-FMS complex was calculated for the WT and for the mutant complex. The two chains were then separated, and the free energy of each single chain in the WT and the mutant complex was calculated. The change in free energy (ΔG) of the complex was calculated by subtracting the free energy of the WT complex from the free energy of the mutant complex. For calculating ΔG of each individual chain, the free energy of the respective WT chain was subtracted from the free energy of the respective mutant chain.
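The residue classification rules stated in this section can be written as a small decision function. This is a minimal sketch of the stated thresholds, not the ORBIT implementation: the function name, the argument names, and the example distances are hypothetical, and "within 5 Å" is read here as less than or equal to 5 Å.

```python
def classify_residue(min_dist_to_receptor, ca_depth, cb_depth):
    """Assign a residue to interface, core, surface, or boundary.

    min_dist_to_receptor: shortest atom-atom distance to the c-FMS chain (Å)
    ca_depth, cb_depth:   distances of the alpha- and beta-carbon from the protein surface (Å)
    """
    # Interface residues are defined first; the other classes cover "all other residues"
    if min_dist_to_receptor <= 5.0:
        return "interface"
    if ca_depth >= 5.0 and cb_depth >= 2.0:
        return "core"
    if ca_depth + cb_depth <= 2.7:
        return "surface"
    return "boundary"


# Hypothetical distances, for illustration only
examples = {
    "near receptor": (3.8, 1.0, 0.5),
    "deeply buried": (12.0, 6.1, 4.3),
    "exposed": (15.0, 1.2, 0.9),
    "intermediate": (9.0, 3.0, 1.5),
}
for label, distances in examples.items():
    print(label, classify_residue(*distances))
```

The order of the checks matters: a residue close to the receptor is reported as interface even if its depth values would otherwise place it in the core or at the surface.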
The value of ΔΔG_binding was determined by subtracting the ΔG value of the individual chains from the ΔG value of the complex as a result of the mutation. For calculating changes in the thermodynamic stability of the M-CSFM mutants, the standard ORBIT energy function (optimized for design of monomeric proteins) was utilized (26); only the M-CSFM chain was used in the calculation, and residues within 2 Å of the mutant residue were allowed to repack. ΔG was calculated by subtracting the energy of the M-CSFM structure from that of the mutant structure. The following variants were excluded from this analysis: E1V, which is not visible in the crystal structure of M-CSF; all the mutations to/from cysteine that were assumed to have changed the protein conformation due to altered disulfide bond formation; and mutations to proline, which cannot be accurately modeled with this method. The energy functions consist of terms that describe van der Waals attractive and repulsive interactions, hydrogen bond interactions, electrostatic interactions, and surface area-based solvation (see Sharabi et al. (57)). Rotamer libraries were used, which were based on the backbone-dependent library of Dunbrack and Karplus (58) with additional rotamers expanded by one standard deviation around their mean χ1 and χ2 values. The lowest energy rotameric conformation was found using the dead-end elimination theorem (59,60).
tion (Fig. 1A). First, we constructed a M-CSF single mutant in which Cys-31 was replaced by Ser to prevent covalent dimerization of the protein on the yeast surface. This mutant, designated M-CSFM, served as the base for all additional mutations studied in this work. We then confirmed, by SPR spectroscopy of the purified M-CSFM protein, that M-CSFM does not associate with itself at concentrations of up to 500 nM, as described under "Production and Characterization of Selected M-CSFM Variants."
Next, M-CSFM expression on the yeast surface was monitored by labeling the protein with PE, and binding of M-CSFM to the receptor was monitored by labeling c-FMS with FITC. To verify the expression and the correct folding of M-CSFM on yeast, we first analyzed the FACS signal for the interaction between M-CSFM and c-FMS. Our results showed that the binding interaction between the YSD M-CSFM and 1 nM soluble c-FMS is indeed detected in the YSD setup with respect to the negative control, in which M-CSFM is not labeled (Fig. 1, B and C). We next constructed a library that contained predominantly single mutation M-CSFM variants by using the error-prone PCR method (11). The library was sequenced to verify correct incorporation of mutations and showed that 29% of the sequences indeed contained only single mutations within the gene, and the majority of the remaining sequences contained no mutations. The library diversity of 17,400 single point mutations exceeded the number of 1422 corresponding to all possible single M-CSFM mutants by more than 10-fold, thus possibly containing most, if not all, point mutants. As a first step of our selection experiments, we screened the constructed M-CSFM library for binding to c-FMS to verify that the binding signal detected for the M-CSFM·c-FMS interaction was indeed retained (Fig. 1D). To eliminate the M-CSFM mutants that were not well expressed on the yeast surface, we pre-sorted the library by collecting only the top 41% of the population with a good expression signal (Fig. 1E). We subsequently sorted the library twice more, each time collecting populations of M-CSFM variants that exhibited reduced binding signal to the receptor compared with that of the parental M-CSFM. The first sort was conducted against 1 nM c-FMS, yielding a population enriched with mutants exhibiting high expression and a low binding signal (6.7% of the screened population), compared with that of M-CSFM (Fig. 1F, S1).
Using the selected population as an input, we then performed another sort for decreased binding at the same concentration. In the second sort, we collected the two populations S2 (1.01% of the screened population) and S3 (2.02% of the screened population), with slightly different levels of binding, where binding in S3 is weaker than that in S2 (Fig. 1, H and I, respectively). We subsequently sequenced ~100 clones from each population pool (S1, S2, and S3). Sequencing results showed that population S1 (Fig. 1F) included 59 different clones carrying a total of 41 different mutations, with each clone containing a single unique mutation in the M-CSFM gene. Population S2 contained only eight variants carrying a single mutation, and population S3 contained 16 point variants (data not shown). The remaining sequences either contained a stop codon or multiple mutations or were missing a part of the gene and were excluded from further analysis. Table 1 summarizes all identified single mutations from the S1 population that resulted in decreased binding to c-FMS together with the associated change in the chemical nature of the residue. As can be seen in Table 1, several of the identified mutations were either to proline or to/from cysteine. It is quite likely that the reduction in binding signal of such M-CSFM mutants to the receptor was due to a substantial conformational change, to protein misfolding, or to dimerization through an additional disulfide bond (61). Fig. 2A maps the location of the identified affinity-decreasing mutations on the M-CSFM structure, dividing them into the following four categories according to the homology-modeled structure of the human M-CSF·c-FMS complex: receptor binding interface, surface, boundary, and core (see "Experimental Procedures"). We further screened all the identified M-CSFM clones individually against 1 nM c-FMS in the YSD setup and plotted their binding signals to c-FMS normalized to that of the parental M-CSFM. Fig. 2B shows that the majority of the interface, boundary, and core mutations exhibited a greatly decreased YSD binding strength to c-FMS compared with that of M-CSFM, whereas surface mutations exhibited a more moderate decrease. In addition, most of the mutations that decrease the affinity to c-FMS were located in the interface and the core of M-CSFM. We further varied the receptor concentration and performed full titration experiments on YSD for four proteins, M-CSFM and three of its variants as follows: V78A, S84P, and L85S (Fig. 3). Fitting of the data showed that M-CSFM binds the c-FMS receptor with the best affinity (KD = 0.23 nM), followed by V78A (KD = 1.83 nM), L85S (KD = 8.27 nM), and S84P (KD = 36.43 nM). These results correlate well with the results shown in Fig. 2.
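To put the fitted KD values on a common scale, the following sketch computes the fold change relative to the parental protein and converts it to an apparent binding free energy change with the standard relation ΔΔG = RT ln(KD,variant / KD,parent). This conversion is an illustration and not a calculation reported here; the temperature of 298.15 K is an assumption, and the function name is hypothetical. The KD values are those quoted in the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_and_ddg(kd_variant_nM, kd_parent_nM, temp_K=298.15):
    """Return (fold loss in affinity, apparent ddG of binding in kcal/mol)."""
    fold = kd_variant_nM / kd_parent_nM
    return fold, R_KCAL * temp_K * math.log(fold)


kd_parent = 0.23  # nM, parental M-CSFM on yeast
ysd_kd = {"V78A": 1.83, "L85S": 8.27, "S84P": 36.43}  # nM
summary = {name: fold_and_ddg(kd, kd_parent) for name, kd in ysd_kd.items()}
for name, (fold, ddg) in summary.items():
    print(f"{name}: {fold:.1f}-fold weaker, apparent ddG = {ddg:.2f} kcal/mol")
```

A positive value indicates weaker binding than the parental protein. Because these are apparent affinities measured on the yeast surface, the numbers are best used for ranking the variants rather than as absolute energies.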

Sorting and Flow Cytometry
Computational and Structural Analysis of the M-CSFM Variants-To investigate whether the decrease in c-FMS binding to the M-CSFM mutants (shown by the YSD data) was due to the elimination of favorable intermolecular interactions between the two proteins or to destabilization and local or global unfolding of M-CSFM upon mutation, we computationally modeled the two possibilities for the mutations identified in the experimental YSD setup as crucial for M-CSFM·c-FMS binding. Because an x-ray structure of the human M-CSFM·c-FMS complex is not available, we constructed a model for such a complex structure using the homologous complex for the murine M-CSFM and c-Fms proteins and the structures of the unbound human M-CSFM and c-FMS. Applying a previously described protocol (20,57) to the model of the M-CSFM·c-FMS complex, we defined the M-CSFM binding interface and calculated the ΔΔGbinding due to each experimentally identified mutation in the binding interface, excluding S13P (see "Experimental Procedures") (Table 2). Our computational results predicted that most of the identified interfacial mutations would reduce M-CSFM binding affinity to the receptor. To explore whether the remaining mutations identified by YSD (those that do not belong to the binding interface) could lead to significant M-CSFM destabilization and misfolding, we calculated the change in M-CSFM stability due to single mutations (ΔGM-CSF), according to the previously published protocol (see "Experimental Procedures") (57). A mutation was defined as being significantly destabilizing if the ΔGM-CSF value was greater than 2 kcal/mol. To obtain a clear picture of the effect of each mutation on M-CSFM stability, we examined three classes of mutations separately: surface, boundary, and core. Stability changes for surface mutations were minor except for some positions on the M-CSFM binding interface (Fig. 4A), whereas moderate destabilization (ΔGM-CSF = 0-2 kcal/mol) was observed for most M-CSFM boundary positions (Fig. 4B). Fig. 4C shows that the majority of the experimentally identified core mutations were predicted to strongly destabilize M-CSFM, supporting the conclusion that the identified core mutations lead to protein unfolding. We then mapped the experimentally identified mutations onto the M-CSFM structure and color-coded them according to their surface accessibility and their effect on protein stability (Fig. 5). For the interface residues, we found that mutations at positions in which the residue was buried more deeply (i.e. H15L and M10V) resulted in a larger decrease in stability (Fig. 5A). None of the surface or boundary non-interfacial mutations were found to be highly destabilizing (Fig. 5, B and C). In contrast, most identified mutations located in the core of the protein had marked destabilizing effects on the protein (Figs. 4C and 5D).
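The stability criterion described above can be sketched as a small classifier. The two thresholds (greater than 2 kcal/mol for significant destabilization, 0-2 kcal/mol for moderate destabilization) come from the text; the function name, the label for non-positive values, and the example input values are hypothetical and serve only to illustrate the rule.

```python
SIGNIFICANT_THRESHOLD = 2.0  # kcal/mol, threshold stated in the text


def classify_stability(delta_g):
    """Classify a computed stability change (kcal/mol) for a single mutation."""
    if delta_g > SIGNIFICANT_THRESHOLD:
        return "significantly destabilizing"
    if delta_g > 0.0:
        return "moderately destabilizing"
    # Label chosen for this sketch; the text does not name this class.
    return "not destabilizing"


# Hypothetical values for illustration only; these are NOT results from the study.
example_values = {"core_example": 3.4, "boundary_example": 1.1, "surface_example": -0.2}

labels = {name: classify_stability(value) for name, value in example_values.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

A value of exactly 2 kcal/mol is treated as moderate here, because the text defines significant destabilization as strictly greater than 2 kcal/mol.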
Fig. 5 legend (fragment): Red residues indicate predicted destabilizing mutations. Green residues indicate predicted structure-maintaining mutations. For example, H15L is in the interface and is destabilizing, and H9R is in the interface but does not destabilize the structure.
Production and Characterization of Selected M-CSFM Variants-To validate the YSD and computational results, we produced and purified the parental M-CSFM protein and some representative M-CSFM variants. Because of the complexity and labor intensity of the production protocol, we selected only three M-CSFM variants for full characterization, namely L85S, V78A, and S84P, in which the mutations were located on the helix that directly interacts with the receptor according to the complex model (interface mutations) and that were predicted to affect the binding affinity directly or indirectly through protein misfolding (Fig. 6A). L85S carries a mutation from a hydrophobic to a smaller polar residue and was predicted to decrease the affinity between the two proteins without destabilizing M-CSFM. V78A was also predicted to exhibit weaker binding to the receptor due to the creation of an unfavorable cavity in the binding interface. The S84P mutation is located in the middle of the helix and is hence likely to damage the structure of M-CSFM. We first developed the expression and purification protocol using the parental M-CSFM and then used the same protocol for preparation of the variants. To examine the effects of glycosylation on the structure and binding properties of M-CSFM, the parental M-CSFM preparation was divided into two fractions. In the first fraction, the N-glycosylations were removed, and in the second fraction, they were allowed to remain intact. The glycosylated and non-glycosylated proteins were then purified by size-exclusion chromatography (Fig. 6B). MS analysis of the non-glycosylated parental M-CSFM protein gave a molecular mass of 21.6 kDa (Fig. 6C), which is similar to the calculated value of 21.1 kDa for the non-glycosylated protein, whereas the MS spectra of the glycosylated protein contained peaks corresponding to one and two glycosylations present on the recombinant protein (data not shown). SDS-PAGE confirmed the same size of ~21 kDa for both preparations (Fig. 6D). CD spectroscopy of both glycosylated and non-glycosylated proteins (Fig. 7) revealed that the glycosylations did not affect the protein secondary structure. Therefore, we did not remove the glycosylations in the preparation of the M-CSFM variants. Comparison of the CD spectra of the three M-CSFM variants to the spectrum of M-CSFM showed that only one variant, L85S, completely retained the structure of M-CSFM (Fig. 7). V78A, although mostly folded, showed the presence of some unfolded species, as indicated by a more pronounced peak at 210 nm in the CD spectrum. Not unexpectedly, S84P was at least partially unfolded (Fig. 7). To verify that M-CSFM was indeed a monomer, the binding affinity between two monomers of M-CSFM was analyzed using SPR at concentrations of 31.25-500 nM. No binding signal was detected (Fig. 8A). Then, to verify that the binding affinities of the produced mutants were weaker than the binding affinity of M-CSFM, we measured the M-CSFM·c-FMS receptor interaction using SPR. In these experiments, M-CSFM (in both glycosylated and non-glycosylated forms) and its variants were immobilized on the surface of the sensor chip, and the c-FMS receptor was added in solution at concentrations of 4.375-70 nM (Fig. 8, B-F). SPR experiments showed that the binding affinities of the glycosylated and non-glycosylated forms of M-CSFM for c-FMS were very similar, being 42.6 ± 0.92 and 31.6 ± 1.11 nM, respectively (Table 3). The affinity of V78A was reduced by about 2-fold to 77 nM and that of L85S was reduced by about 4-fold to 174 nM.
The affinity of S84P could not be measured due to the absence of an SPR binding signal under the conditions of the experiment. Thus, all M-CSFM variants identified by YSD exhibited reduced binding to the receptor in experiments involving soluble M-CSFM proteins. In this context, we note that the experimentally measured KD values correlated well with the differences in the binding percentile by YSD (Fig. 2B), with the KD values measured by YSD (Fig. 3), and with the computationally predicted values of ΔΔGbinding (Table 3).
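The agreement between the soluble-protein and YSD measurements can be checked with a short calculation. The sketch below uses only the KD values quoted in the text; it assumes that the glycosylated parental protein (42.6 nM) is the appropriate SPR reference, since the variants were prepared without removing glycosylations, and it omits S84P because no SPR signal was obtained. All names are chosen for this example.

```python
# K_D values (nM) quoted in the text.
SPR_KD_NM = {"M-CSFM": 42.6, "V78A": 77.0, "L85S": 174.0}  # glycosylated parent assumed as reference
YSD_KD_NM = {"M-CSFM": 0.23, "V78A": 1.83, "L85S": 8.27}


def rank_order(kd_by_protein):
    """Protein names sorted from tightest (lowest K_D) to weakest binder."""
    return sorted(kd_by_protein, key=kd_by_protein.get)


# Fold reduction in affinity relative to the parental protein, as measured by SPR.
spr_fold = {name: kd / SPR_KD_NM["M-CSFM"] for name, kd in SPR_KD_NM.items()}

# The two methods agree qualitatively if they rank the proteins identically.
same_ranking = rank_order(SPR_KD_NM) == rank_order(YSD_KD_NM)

for name in rank_order(SPR_KD_NM):
    print(f"{name}: SPR fold change {spr_fold[name]:.1f}")
print("Same ranking in SPR and YSD:", same_ranking)
```

With only three proteins, identical ranking is a qualitative consistency check and not a statistical measure of correlation.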

Discussion
This work demonstrates a fast approach to identify residues important for the interaction of human M-CSF with its receptor c-FMS, which is a crucial interaction in several diseases, including osteoporosis and some cancers. The demonstrated method, based on a YSD setup, facilitates the identification of all the residues that are important for binding, not only those in the binding interface but also those that are distant from the interaction site yet reduce the binding affinity either through allosteric effects or as a result of protein unfolding. These more distant residues cannot be identified by other techniques, such as peptide epitope mapping, alanine scanning, or similar approaches that use both combinatorial and rational library design (62). Furthermore, we confirmed the utility of the technique by showing that the YSD results correlate well with the binding affinities measured by SPR for purified proteins in solution (Table 2). The described method may thus be applied for the identification of crucial residues in other PPIs, particularly for protein complexes such as the human M-CSFM·c-FMS complex, for which there are no high resolution structures or for which laborious and time-consuming expression and purification protocols are required.
The computational modeling performed in this work complemented the experimental results, as it allowed us to predict the binding interface residues and to better understand the nature of the mutations that lead to affinity reduction, highlighting the importance of the wild-type residue interactions. The group of mutations located in the binding interface (M10V, H15L, V78A, Q81R, E82A, E82G, and L85S) is predicted to reduce the free energy of binding by destroying some favorable intermolecular interactions with the receptor. The effects produced by these mutations, such as replacing hydrophobic residues with polar residues and vice versa, replacing large residues with smaller ones, and introducing unfavorable electrostatic interactions, have been previously shown to reduce binding affinities in proteins (63,64). The effects of some of the interface mutations can be explained through structural analysis based on the modeled structure of the complex. The H15L mutation, for example, may cause a loss of a potential polar contact with the M-CSF backbone, destabilizing the interface. In the V78A mutation, the replacement of a hydrophobic side chain with a shorter one may reduce hydrophobic interactions with the receptor. The Q81R mutation is clearly unfavorable, as it would position an arginine residue opposite an arginine residue at position 150 of c-FMS. The E82K mutation may create a clash with an arginine or histidine residue at positions 150 and 151, respectively, on c-FMS. The polar-to-hydrophobic E82A mutation may be unfavorable, as the local environment on c-FMS is mainly polar. The E82G mutation in the middle of an α-helix is likely to disturb the local folding of the protein. It is therefore clear that it is critical for residue 82 to be either polar or negatively charged. In the L85S mutant, a hydrophobic residue is replaced with a polar residue opposite a hydrophobic patch on c-FMS.
Mutations H9R and Q79R on M-CSFM were shown experimentally to reduce binding affinity, but computational binding energy evaluations identify them as affinity-enhancing. We attribute this discrepancy to possible inaccuracies in the three-dimensional orientation of the binding partners in our model of the M-CSFM·c-FMS complex. According to this model, His-9 and Gln-79 on M-CSFM are at the periphery of the binding interface and do not directly contact c-FMS. Mutation to a large residue such as arginine at these positions is predicted to introduce direct interactions with the receptor, explaining the computationally calculated improvement in binding affinity. If the binding partners are in reality closer to one another than in our model, such mutations would instead produce steric clashes and reduce binding, which would be consistent with the experimental results.
Although we cannot predict the effects of mutations distant from the binding site, we speculate that they affect binding through an allosteric mechanism, as was previously observed in various studies (4). The mutations at positions 26, 27, 39, and 45 might have caused such an effect and will be the subject of a follow-up biophysical and structural study.
Our YSD method was able to identify residues in M-CSF that are energetically important for c-FMS receptor binding without prior knowledge of the structure of the M-CSF·c-FMS complex, and this information directed the production of several key purified M-CSFM mutants to confirm our YSD results. For the two mutants tested in vitro (V78A and L85S), changes in the measured binding affinities correlated well with the predicted values. In addition to these mutations, we observed a number of mutations that are very likely to lead to protein unfolding (Fig. 4). Such mutations include mutations to/from proline and all substitutions from cysteine, the latter because they destroy the disulfide bonds crucial for M-CSFM stability. One such mutation (S84P) did indeed result in at least partial protein unfolding (as determined by CD for the purified protein). In addition, we identified a large number of mutations that were located in the core of M-CSFM and were predicted to significantly destabilize the M-CSFM protein, resulting in either global unfolding or local conformational changes that were propagated to the binding site (Fig. 4C). We also observed a number of charge-altering mutations that, although distant from the binding site, would be likely to alter the long range electrostatic interactions and could thus reduce binding by changing the interactions in the encounter complex (65). In addition, it is possible that some of the identified mutations created new glycosylation sites on the protein variants, for example mutants P72T, I75T, L128S, and F135S. However, these glycosylation sites are not in proximity to the receptor, according to our model of the M-CSFM·c-FMS complex, and are hence not likely to affect binding affinity.
Table footnotes:
a Measurements were repeated three times and the standard deviation is given.
b ΔΔGbinding was calculated from experimentally measured KD as ΔΔGbinding = −RT ln(KD).
c ΔΔGbinding was predicted using our computational protocol outlined under "Experimental Procedures."
d ND means not determined.
Because the generation of the combinatorial library was based on random mutagenesis, mutations that require a change at two consecutive nucleotide bases (for example, Gly/Leu) were less likely to be included in the library. In a setup such as the one used here, we might have also missed some mutations by sequencing only ~100 clones from each selection pool. In the future, next generation sequencing might help us to obtain a more comprehensive picture of binding hot spots (66).
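The library limitation noted above can be illustrated by counting the minimum number of nucleotide changes needed to convert one amino acid into another under the standard genetic code. The sketch below considers all synonymous codons for each residue, which is an assumption: the actual codon used at each position of the M-CSFM gene is not given here, so the true requirement for a specific position may be higher. Function and variable names are chosen for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def min_nucleotide_changes(aa_from, aa_to):
    """Smallest number of base substitutions converting aa_from into aa_to."""
    return min(
        sum(a != b for a, b in zip(c1, c2))
        for c1 in codons_for(aa_from)
        for c2 in codons_for(aa_to)
    )


# Gly -> Leu is the example from the text; the others are variants found in the screen.
for aa_from, aa_to in [("G", "L"), ("V", "A"), ("S", "P"), ("L", "S")]:
    print(aa_from, "->", aa_to, min_nucleotide_changes(aa_from, aa_to))
```

Substitutions such as V78A, S84P, and L85S are reachable by a single base change, whereas Gly to Leu needs at least two, which is why the latter class is underrepresented in a library built by random mutagenesis.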
In a previous site-directed mutagenesis study, Taylor et al. (14) discovered four positions that were regarded as important for the M-CSF·c-FMS interaction: 9, 15, 20, and 78. In this study, we confirmed three of these positions, namely 9, 15, and 78, as being the sites of affinity-reducing mutations. At one of these positions, position 78, we confirmed the critical role of the V78A mutation, which had already been identified as a binding hot spot in the alanine scanning mutagenesis experiment of Taylor et al. (14). The fact that we were able to isolate the majority of the previously identified positions encourages us to believe that YSD is indeed as effective as, and less laborious than, alternative methods for epitope mapping.
Our findings regarding the identity of the residues that are important for the M-CSF·c-FMS interaction strongly suggest that therapeutic drugs that bind to binding and non-binding site residues on M-CSF might be beneficial. Such drugs would potentially facilitate disruption of the M-CSF·c-FMS interaction, or alternatively promote an M-CSF conformational change, without the need to sterically occlude the M-CSF·c-FMS binding interface. Given that the crystal structure of the human M-CSF·c-FMS complex is not available and that, according to our findings, the M-CSF interface residues are not the only contributors to ligand binding, the information produced in this study could constitute the first step in the design of therapeutics targeting not only the M-CSF·c-FMS interface but also other regions within the ligand. The identification of these surface, boundary, and core residues important for M-CSF·c-FMS interactions may also aid in the understanding of M-CSF·c-FMS mechanisms of action and suggest new targets for M-CSF inhibition.