Rapid Fine Conformational Epitope Mapping Using Comprehensive Mutagenesis and Deep Sequencing*

Background: A new method using comprehensive mutagenesis libraries, yeast display, and deep sequencing is proposed to determine fine conformational epitopes for three antibody-antigen interactions.
Results: For three separate antigens, the experimentally determined conformational epitope is consistent with orthogonal experimental datasets.
Conclusion: We conclude that this new methodology is reliable and sound.
Significance: With this new method, four antibody-antigen interactions can be mapped per day.

Knowledge of the fine location of neutralizing and non-neutralizing epitopes on human pathogens affords a better understanding of the structural basis of antibody efficacy, which will expedite rational design of vaccines, prophylactics, and therapeutics. However, full utilization of the wealth of information from single cell techniques and antibody repertoire sequencing awaits the development of a high throughput, inexpensive method to map the conformational epitopes for antibody-antigen interactions. Here we show such an approach that combines comprehensive mutagenesis, cell surface display, and DNA deep sequencing. We develop analytical equations to identify epitope positions and show the effectiveness of the method by mapping the fine epitope for different antibodies targeting TNF, pertussis toxin, and the cancer target TROP2. In all three cases, the experimentally determined conformational epitope was consistent with previous experimental datasets, confirming the reliability of the experimental pipeline. Once the comprehensive library is generated, fine conformational epitope maps can be prepared at a rate of four per day.

Pinpointing the fine conformational epitope targeted by a given antibody affords a better understanding of the structural basis of its mechanism of protection, which provides an intellectual property basis and can lead to improved prophylactic or therapeutic interventions against human diseases (1-9). Recent technical advances allow an unprecedented look at the adaptive immune response to an immunogen (10-12). For example, single cell isolation methods coupled to deep sequencing have identified thousands of patient-specific paired antibody heavy and light chain sequences elicited in response to infection or vaccination, and such information has begun to be used in antibody discovery. Whereas functional or neutralization assays can be used to determine the efficacy of individual members in these repertoires, a full utilization of this wealth of information awaits the development of a high throughput method of determining conformational epitopes targeted by these antibodies (13).
Existing methods either do not identify conformational epitopes (14,15) or are labor-intensive and costly. Co-crystallization provides unambiguous epitope identification but can require considerable effort and generation of many antigen variants to identify one that is compatible with crystallization (2). Mass spectrometry-based methods utilizing hydrogen/deuterium exchange identify epitopes to a ~5-amino acid resolution only under rigorous control experiments that limit throughput (16,17). Competing display-based methods use many sorts (18), identify only partial epitopes (19,20), or are limited by restricting mutations to alanine (21).
Recently, yeast surface display (22) coupled to deep mutational scanning (23) was used to understand the sequence effects of binding for nearly every single point mutant for two computationally designed proteins targeting a conserved epitope on influenza hemagglutinin (7). This method was used to confirm the paratope for both small proteins, as validated by crystal structures. More recently, other approaches using yeast display and deep sequencing for the purposes of conformational epitope mapping have been demonstrated (18,20). However, current implementations require several sorting steps that severely hinder throughput. Because additional inefficiencies exist at several stages in the deep sequencing and analysis workflow, we asked whether we could simplify the yeast display-deep sequencing pipeline to increase the method throughput, reduce cost, and improve the ability to resolve complete conformational epitopes for full-length proteins.
GenScript. Fabs were produced using the Pierce Fab Preparation kit (Life Technologies). Concentrations were determined using A280 with the recommended estimated extinction coefficient (1 mg/ml) of 1.4, and Fabs were biotinylated at a molar ratio of 1:20 protein:biotin using the EZ-Link NHS-Biotin kit following the manufacturer's instructions (Life Technologies).
Dissociation Constant Determination-Equilibrium dissociation constants ($K_D$) were determined using clonal population yeast display titrations according to Chao et al. (22). Fab concentrations between 50 pM and 1 μM were tested.
Yeast Display Sorts-1 × 10⁷ cells were grown in 2 ml of synthetic dextrose plus casein amino acids (SDCAA) for 6 h at 30°C and re-inoculated at A600 = 1.0 in 2 ml of synthetic galactose plus casein amino acids (SGCAA) at 20°C for 18 h. 3 × 10⁷ cells were labeled with biotinylated Fab or scFv for 30 min at room temperature in Dulbecco's phosphate-buffered saline with 1 g/liter BSA at a concentration of half of the experimentally determined dissociation constant on the yeast surface. Cells were then secondarily labeled with anti-cmyc-FITC (Miltenyi Biotec, San Diego, CA) and streptavidin-phycoerythrin (Thermo Fisher, Waltham, MA). Sorting was done on an Influx Cell Sorter (BD Biosciences). FSC/SSC (gate 1), FSC/FITC (gate 2), and phycoerythrin/FITC (gate 3) gates were set. Three populations were collected: an unselected population satisfying gate 1, a displayed population satisfying gates 1 and 2, and a bound population satisfying all three gates. The number of cells collected for each population was at least 100-fold higher than the theoretical library complexity. Sorting statistics for each population collected are listed in supplemental Table 2. Following the sort, recovered populations were grown for 48 h in 10 ml of SDCAA (22), and 1 × 10⁷ cells from this culture were stored in 1 ml of yeast storage buffer at -80°C.
Deep Sequencing Preparation-Yeast plasmid DNA was prepared for deep sequencing following the protocol in Kowalsky et al. (30) (primers are listed in supplemental Table 3). 5 μl of the PCR products were run on a 2% agarose gel stained with SYBR-GOLD (Thermo Fisher) to ensure one band was obtained at the correct size (~250-350 bp). Agencourt AMPure XP PCR Purification (Beckman Coulter, Beverly, MA) was used per the manufacturer's protocol to purify the PCR product. Library DNA was sequenced on an Illumina MiSeq using either the 300 × 2 or 250 × 2 Illumina MiSeq kits (Illumina, San Diego, CA) at the Michigan State University Sequencing Core.
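Data Analysis-A modified version of Enrich-0.2 as described in Kowalsky et al. (30) was used to compute enrichment ratios of individual mutants from the raw Illumina sequencing files. To normalize the data across the multiple tiles, we define the fitness metric for variant i ($\zeta_i$) as the binary logarithm of the ratio of the fluorescence of variant i ($F_i$) to the fluorescence of the wild-type sequence ($F_{wt}$) (30),

$$\zeta_i = \log_2\left(\frac{F_i}{F_{wt}}\right) \qquad \text{(Eq. 1)}$$

This results in the following equation.

$$\zeta_i = \log_2(e)\,\sqrt{2}\,\sigma'\left[\mathrm{erf}^{-1}\left(1 - \phi\,2^{\varepsilon_{wt}+1}\right) - \mathrm{erf}^{-1}\left(1 - \phi\,2^{\varepsilon_i+1}\right)\right] \qquad \text{(Eq. 2)}$$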
where $\phi$ is the percentage of cells collected, $\varepsilon_i$ is the enrichment ratio for variant i, $\sigma'$ is the log normal standard deviation of a clonal population, and the subscript wt denotes the wild type. Custom Python scripts used to calculate the fitness metric and statistics are at GitHub. The full deep sequencing datasets are provided at figshare. Sorting parameters needed for each tile normalization are listed in Supplemental Table 1. Shannon (sequence) entropy values at a given position j in the protein sequence ($E_j$) were calculated by

$$E_j = -\frac{1}{\log X_j}\sum_{i=1}^{X_j}\frac{2^{\varepsilon_{ij}}}{\sum_{k=1}^{X_j}2^{\varepsilon_{kj}}}\log\left(\frac{2^{\varepsilon_{ij}}}{\sum_{k=1}^{X_j}2^{\varepsilon_{kj}}}\right) \qquad \text{(Eq. 3)}$$

where $\varepsilon_{ij}$ is the enrichment ratio of substitution i at position j, and $X_j$ is the number of mutants with adequate sequencing counts in the unselected population at position j. The derivation for Equation 3 is shown under "Theory." We excluded residues with an $X_j < 12$ from analysis. Enrichment ratios for stop codons were not included in the Shannon entropy analysis.

Soluble Expression of PTxS1-BL21(DE3) cells containing pAK400_PTx-S1-220K (24) with an added C-terminal lysine residue were grown in terrific broth at 25°C to an A600 of 1.5, then induced with 1 mM isopropyl 1-thio-β-D-galactopyranoside for 5 h. Cell pellets were collected, and the outer membrane was lysed by osmotic shock. Lysate was purified by immobilized metal affinity chromatography followed by size exclusion chromatography in PBS (S75, AKTA FPLC) as previously reported (24). Yields were between 0.2 and 0.6 mg/liter. 3 μg of pertussis toxin (PTx; 26.1 kDa) or PTx-S1-220K (25.9 kDa) were run on a 12% polyacrylamide gel at 130 V and stained with GelCode blue.
PTxS1 ELISA Binding Assay-A high binding 96-well plate (Costar, Corning, NY) was coated with either 4 nM PTx or 4 nM PTx-S1-220K in PBS and incubated overnight at 4°C. The plate was blocked with a solution of 5% nonfat dry milk in PBS with 0.05% Tween (PBSTM) for 1 h at room temperature. hu1B7 was serially diluted across the plate in duplicate with a starting concentration of 5 μg/ml in PBSTM and incubated for 1 h at room temperature. Secondary antibody was GαhFc-HRP prepared at 1:2500 in PBSTM and incubated for 1 h at room temperature. The plate was developed with tetramethylbenzidine and quenched with HCl, and absorbance was read on a plate reader at 450 nm.
PTxS1 Western Blot-0.3 μg of PTx or PTx-S1-220K were run on a 12% polyacrylamide gel at 130 V and transferred to a PVDF membrane. The membrane was blocked for 1 h at room temperature with PBSTM, then incubated for 1 h at room temperature with 1 μg/ml hu1B7A in PBSTM. The secondary antibody was GαhFc-HRP prepared at 1:10,000 in PBSTM and incubated for 1 h at room temperature followed by washing and development using SuperSignal West Pico Chemiluminescent Substrate (Pierce) and a 30-s exposure time to x-ray film.
siGENOME SMARTpool ON-TARGETplus TROP2 siRNA and scramble siRNA were purchased from Thermo Scientific (Asheville, NC) and Ambion (Grand Island, NY). The SMARTpool siRNAs and the transfection reagent Lipofectamine were diluted with Opti-MEM (Invitrogen) as described (33,34). The diluted SMARTpool siRNAs were mixed with RNAiMAX to form siRNA-RNAiMAX complexes. The cell culture medium was replaced with antibiotic-free medium containing the siRNA-RNAiMAX complexes at a final concentration of 10 nM siRNA. Media were changed after 12 h, and the cells were incubated in fresh media.
TROP2 Western Blot Analysis-The protein concentrations of the cell extracts were measured by the Bradford method. Protein samples of 15-30 μg were subjected to Western blot analysis as previously described (34,35) using TROP2 antibody (Abcam, Cambridge, MA), β-actin (Sigma), anti-mouse, and anti-rabbit HRP-conjugated secondary antibodies (Thermo Scientific) and donkey anti-goat IgG-HRP (Santa Cruz Biotechnology, Dallas, TX). The blots were visualized by SuperSignal West Femto maximum sensitivity substrate (Thermo Scientific).
Total mRNA Extraction and Quantitative Real Time PCR-Total mRNA from cells was extracted using the RNeasy Plus kit (Qiagen, Valencia, CA) according to the manufacturer's instructions. The total mRNA was reverse-transcribed into cDNA using the cDNA synthesis kit (Bio-Rad) as previously described (36,37). The primer sets (Operon, Huntsville, AL) used for PCR were 5′-tggacttcgagcaagagatg-3′ and 5′-aggaaggaaggctggaagag-3′ for human actin and 5′-gagattcccccgaagttctc-3′ and 5′-aactcccccagttccttgat-3′ for human TROP2. Quantitative real-time PCR was performed using iQ SYBR Green Supermix and the real-time PCR detection system (Bio-Rad). The cycle threshold values were determined by the MyIQ software (Bio-Rad).
Transwell Migration and Invasion Assays-Chemotactic migration or invasion was quantified using a Boyden chamber transwell assay (8-μm pore size; Corning Costar, Cambridge, MA) with either uncoated or Matrigel-coated filters, respectively. Cells were deprived of serum overnight, trypsinized, and introduced into the upper chamber. Mitomycin C (Sigma) was added to the culture media. The chemoattractant in the lower chamber was medium supplemented with 5% FBS. After 8 h of incubation at 37°C, the cells were fixed and stained. Migrated cells in five randomly chosen fields were counted. The experiments were performed in triplicate wells, and each experiment was performed three times as indicated.
Wound-healing Assay-MDA-MB-231 cells were grown to confluence. The growth medium was replaced with fresh medium containing 5% FBS and supplemented with mitomycin C (1 mg/ml) (Sigma), and the monolayer of cells was subsequently scratched using a 200-μl pipette tip. Wound width was monitored over time by microscopy.
Cytotoxicity and Proliferation Assays-Cytotoxicity and proliferation were assessed using lactate dehydrogenase and Alamar Blue microplate assays, respectively. Cells were seeded in 96-well plates and treated with either PBS (control) or various concentrations of m7E6 IgG or Fab for 48 h. Cytotoxicity was detected using the lactate dehydrogenase cytotoxicity detection kit (Roche Applied Science) following the manufacturer's protocol. Proliferation was determined by an Alamar Blue assay (Pierce) following the manufacturer's protocol.
Confocal Microscopy-MDA-MB-231 cells were seeded in glass-bottom 24-well plates (In Vitro Scientific, Sunnyvale, CA) and treated with either PBS (control), m7E6 IgG, or Fab. After treatment, cells were washed 2× with ice-cold PBS, fixed with 3.7% formaldehyde for 15 min at 37°C, and washed 3× with PBS. Next, the samples were incubated in blocking buffer (1% BSA, 0.5% Triton X-100 in PBS) for 1 h at 37°C. The cells were incubated with TROP2 intracellular domain-specific primary antibodies (EMD Millipore #ABC425, 1:500 dilution) in incubation buffer (0.5% BSA, 0.5% Triton X-100) overnight at 4°C. The samples were then washed 3× with PBS and incubated in the respective secondary antibodies (AlexaFluor 488) diluted in PBS for 1 h at 37°C in the dark. Cells were washed 2× with PBS and incubated in nuclear counterstain Hoechst 33342 (Invitrogen) for 10 min at room temperature. After the final incubation, cells were washed twice with PBS and covered with anti-fade solution for imaging. Images were recorded with an Olympus FluoView 1000 Inverted IX81 microscope using a 10× or 60× oil objective, with identical exposure and photomultiplier tube settings for each primary antibody-fluorophore pair across the different treatment conditions.
Statistical Analysis-All experiments were performed at least three times. Representative results are shown as the mean ± S.D. Statistical analyses were performed using unpaired, two-tailed Student's t tests. * indicates p < 0.05, ** indicates p < 0.01, and *** indicates p < 0.001.

Theory
Derivation of Sequence Entropy Metric and Calculation of Estimated Errors-At any given position j in a protein sequence, Shannon entropy for that position ($E_j$) is defined as,

$$E_j = -\sum_{i=1}^{20} P_{ij}\log P_{ij} \qquad \text{(Eq. 4)}$$

Here, $P_{ij}$ is the probability of finding amino acid i at position j after sorting given an equal representation of all 20 amino acids in the starting population. To determine $P_{ij}$, we first define $p_{ij}$ for a single substitution at position j using the frequency of mutant i in the initial ($f_{o,ij}$) and final ($f_{f,ij}$) populations,

$$p_{ij} = \frac{f_{f,ij}}{f_{o,ij}} \qquad \text{(Eq. 5)}$$

This can be written in terms of an enrichment ratio of a given substitution i at position j ($\varepsilon_{ij}$) such that

$$p_{ij} = 2^{\varepsilon_{ij}} \qquad \text{(Eq. 6)}$$

$p_{ij}$ gives the probability of a variant with a mutation i at position j passing through the sort. Because these probabilities will not necessarily sum to unity, we can normalize the probabilities over a single residue such that

$$P_{ij} = \frac{p_{ij}}{\sum_{k} p_{kj}} = \frac{2^{\varepsilon_{ij}}}{\sum_{k} 2^{\varepsilon_{kj}}} \qquad \text{(Eq. 7)}$$

Combining with the definition of Shannon entropy in Equation 4,

$$E_j = -\sum_{i=1}^{20}\frac{2^{\varepsilon_{ij}}}{\sum_{k} 2^{\varepsilon_{kj}}}\log\left(\frac{2^{\varepsilon_{ij}}}{\sum_{k} 2^{\varepsilon_{kj}}}\right) \qquad \text{(Eq. 8)}$$

Because some positions do not have adequate sequencing counts for all 20 amino acids, the sequence entropy metric must be normalized by the maximum possible Shannon entropy,

$$E_j = -\frac{1}{\log X_j}\sum_{i=1}^{X_j}\frac{2^{\varepsilon_{ij}}}{\sum_{k=1}^{X_j} 2^{\varepsilon_{kj}}}\log\left(\frac{2^{\varepsilon_{ij}}}{\sum_{k=1}^{X_j} 2^{\varepsilon_{kj}}}\right) \qquad \text{(Eq. 9)}$$

where $X_j$ is the number of mutants with adequate sequencing counts in the unselected population at position j. Equation 9 is the final form of the sequence entropy metric used in the manuscript. In practice, we excluded residues with an $X_j < 12$. This removed 13/647 (2.0%) positions tested in the present work.
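As a minimal sketch of the sequence entropy calculation derived above, the following Python function computes the normalized entropy for one position from its log2 enrichment ratios. The function name, the `min_mutants` parameter, and the use of the natural logarithm are illustrative assumptions and are not taken from the authors' scripts; because the entropy is divided by its maximum value, the result does not depend on the logarithm base.

```python
import math


def sequence_entropy(enrichment_ratios, min_mutants=12):
    """Normalized Shannon entropy for one position.

    enrichment_ratios: log2 enrichment ratios (epsilon_ij) for the X_j
    substitutions with adequate counts at position j (stop codons excluded).
    Returns None when fewer than `min_mutants` substitutions are available,
    mirroring the X_j < 12 exclusion described in the text.
    """
    x_j = len(enrichment_ratios)
    if x_j < min_mutants:
        return None
    # p_ij = 2 ** epsilon_ij: relative probability of passing the sort
    p = [2.0 ** eps for eps in enrichment_ratios]
    total = sum(p)
    # Normalize so that the probabilities at this position sum to one
    probs = [value / total for value in p]
    entropy = -sum(value * math.log(value) for value in probs if value > 0.0)
    # Divide by the maximum possible entropy for X_j equally likely outcomes
    return entropy / math.log(x_j)
```

With this normalization a position where every substitution is equally enriched scores 1, and a position where a single residue dominates the sorted population scores close to 0.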
To estimate the expected error in $E_j$, we can define the variance in $E_j$ as, Similarly, the variance on $P_{ij}$ can be defined as, Because the minimal error associated with counting sequences approximates Poisson noise (7,30), we can write the variance for the two unknowns as, Here, $x_{f,ij}$ and $x_{o,ij}$ are the raw sequencing counts of mutation i at position j in the final and initial population, respectively. We can write a similar derivation for the variance of the denominator in the probability term, Accurate calculation of the variance on the sequence entropy for a given position requires the raw sequencing counts for each position in the unselected and selected populations. Using these numbers, Equations 12 and 13 can be solved, which can then be plugged into Equations 10 and 11 to solve for the unknown. The variance on sequence entropy should be maximized when the raw sequencing counts are just above the inclusion threshold in the unselected populations. Even in this case, the S.D. of the sequence entropy metric is 0.02. Accordingly, we are justified in not including sequence entropy errors in the determination of the conformational epitope.
Calculation of Relative Dissociation Constants from Sequencing Counts-For a clonal population of yeast cells displaying variant i and labeled with antibody at a labeling concentration of $[L_o]$, we can write the mean fluorescence ($F_i$) as,

$$F_i = (F_{max} - F_{min})\frac{[L_o]}{[L_o] + K_{d,i}} + F_{min} \qquad \text{(Eq. 14)}$$

Here $F_{max}$ and $F_{min}$ are, respectively, the maximum and minimum average fluorescence for clone i in the fluorescence channel used for antibody binding. Using the labeling conditions ($[L_o] = K_{d,wt}/2$),

$$F_i = (F_{max} - F_{min})\frac{K_{d,wt}}{K_{d,wt} + 2K_{d,i}} + F_{min} \qquad \text{(Eq. 15)}$$

Similarly, the mean fluorescence for the wild-type variant can be expressed as,

$$F_{wt} = \frac{F_{max} - F_{min}}{3} + F_{min} \qquad \text{(Eq. 16)}$$

The ratio of mean fluorescence for a variant to wild-type can be written in terms of the fitness metric ($\zeta_i$) derived from the sequencing data (30),

$$\frac{F_i}{F_{wt}} = 2^{\zeta_i} \qquad \text{(Eq. 17)}$$

Substitution of Equations 15 and 16 into Equation 17 leads to,

$$2^{\zeta_i} = \frac{(F_{max} - F_{min})\dfrac{K_{d,wt}}{K_{d,wt} + 2K_{d,i}} + F_{min}}{\dfrac{F_{max} - F_{min}}{3} + F_{min}} \qquad \text{(Eq. 18)}$$

If we assume that $F_{max} \gg F_{min}$, Equation 18 simplifies to,

$$\frac{K_{d,i}}{K_{d,wt}} = \frac{3\cdot 2^{-\zeta_i} - 1}{2} \qquad \text{(Eq. 19)}$$

This assumption is valid for the ratio of $F_{max}/F_{min}$ typically seen in yeast display. The ratio depends on multiple factors, including the sensitivity of the fluorescent detection on the cell sorter, the quantum yield of the fluorescent dye used, the biotin labeling per antibody, and the surface expression of a given antigen. In our hands we observe a range from 50 to 2000 for this ratio. Here the range of $K_d$ values that we can see is relative to the interval of fitness metrics observed from the sequencing counts. Practically speaking, the range on the lower end of fitness metrics should be the average fitness metric for the stop codons (-1.1 to -0.75 depending on the percent collected), which gives a relative dissociation constant of ~2.7. This highlights the sensitivity of the method for differentiating small energetic changes in binding activity ($\Delta\Delta G_{binding}$ ~ 0.1-0.4 kcal/mol). The drawback, however, is that we are unable to distinguish these smaller perturbations from the larger energetic changes typically associated with interface "hot spots" ($\Delta\Delta G_{binding}$ ~ 1-2 kcal/mol).
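The relationship between the fitness metric and the relative dissociation constant can be sketched numerically. The function below assumes a single-site binding isotherm, $F_{max} \gg F_{min}$, and labeling at a fixed fraction of the wild-type dissociation constant (one half, as stated under "Yeast Display Sorts"). The name `relative_kd` and the `label_ratio` parameter are hypothetical and chosen only for illustration.

```python
def relative_kd(fitness, label_ratio=0.5):
    """Estimate K_d,i / K_d,wt from a fitness metric (log2 of F_i / F_wt).

    label_ratio is [L_o] / K_d,wt. With F_max >> F_min the isotherm gives
    F_i / F_wt = (label_ratio + 1) / (label_ratio + K_d,i / K_d,wt),
    which is solved here for the dissociation constant ratio.
    """
    ratio = (label_ratio + 1.0) * 2.0 ** (-fitness) - label_ratio
    if ratio <= 0.0:
        # F_i cannot exceed F_max, so such a fitness value is out of range
        raise ValueError("fitness metric exceeds the range allowed by the isotherm")
    return ratio
```

Under these assumptions a fitness metric of -1.1 maps to a relative dissociation constant of about 2.7, in line with the stop codon value quoted above, and a fitness metric of 0 returns exactly 1.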
Another important consideration is the error associated with digital counting of variants from deep sequencing data, as this will also introduce error on both the fitness metric and corresponding relative dissociation constants for variants with low numbers of counts in the unselected population. We have previously shown that digital counting of variants from deep sequencing data using our methods results in minimal error, approximating Poisson noise (30).
The variance on the fitness metric can be defined as, where $\varepsilon_i$ is the enrichment ratio for variant i. The variance for $\varepsilon_i$ ($\sigma_{\varepsilon_i}^2$) can be estimated from Poisson noise as where $x_{oi}$ and $x_{fi}$ are the number of counts in the unselected and selected populations, respectively. For variants with many counts the error approaches zero, highlighting the importance of sequencing depth of coverage in these experiments. The fitness metric is defined as (30), where $\phi$ is the percentage of cells collected from the respective gate(s), $\sigma'$ is the log normal standard deviation of a clonal population, and the subscript wt denotes the wild type. Combining these we can estimate the variance of the fitness metric as, The error here is largest for variants with low fitness metrics and few counts in the unselected population.
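A hedged numerical sketch of these quantities follows. It assumes that the fluorescence of a clonal population is log-normally distributed, that the sort gate collects the upper tail of that distribution, and that the percentage of cells collected is supplied as a fraction between 0 and 1. The Poisson term is a first-order approximation that ignores uncertainty in the total read counts. All function names are hypothetical, and the expressions are one consistent reading of the definitions in the text rather than the authors' published code.

```python
import math
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def erfinv(y):
    """Inverse error function built from the standard normal quantile."""
    return _STANDARD_NORMAL.inv_cdf((y + 1.0) / 2.0) / math.sqrt(2.0)


def fitness_metric(eps_i, eps_wt, phi, sigma_prime):
    """log2(F_i / F_wt) from log2 enrichment ratios.

    phi: fraction of cells collected in the gate (assumed 0 < phi < 1).
    sigma_prime: log normal standard deviation of a clonal population.
    """
    wild_type_term = erfinv(1.0 - phi * 2.0 ** (eps_wt + 1.0))
    variant_term = erfinv(1.0 - phi * 2.0 ** (eps_i + 1.0))
    return math.log2(math.e) * math.sqrt(2.0) * sigma_prime * (wild_type_term - variant_term)


def enrichment_variance(x_unselected, x_selected):
    """Approximate Poisson variance of a log2 enrichment ratio."""
    return (1.0 / x_unselected + 1.0 / x_selected) / math.log(2.0) ** 2
```

The variance term shrinks as either count grows, which matches the statement that the error approaches zero for variants with many counts.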
The error for the relative dissociation constant is defined as, which can be written as

Results
The streamlined method is shown schematically in Fig. 1. In the first step, single-site saturation mutagenesis (SSM) libraries for 250-300-nucleotide contiguous sections of the gene of interest are prepared by PFunkel mutagenesis (39) and transformed into yeast. These yeast libraries are labeled with a biotinylated Fab or scFv and sorted once by FACS. The labeling concentration and FACS gates are set such that the capture probability of any given variant is a monotonically increasing function of its binding affinity. Three distinct populations are collected: an unselected population of cells that passed through a cell-size gate (unselected population), a displayed population of cells passing through the previous gate as well as a gate confirming display of the C-terminal c-myc epitope tag (displayed population), and a binding population of cells satisfying these two previous gates as well as a gate on the fluorescence channel associated with antibody binding (bound population). The DNA from each population is prepared and sequenced on an Illumina platform. Then the frequencies of each variant in the population are compared and merged into a single fitness metric (30) that allows direct, quantitative comparisons across different mutational libraries. Together, this approach allows for the rapid and comprehensive reconstruction of the sequence binding determinants of full-length proteins for a given antibody.
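The frequency comparison in this pipeline can be illustrated with a short sketch. It computes a log2 enrichment ratio for one variant from its read counts in a selected (displayed or bound) population and in the unselected population. The function name, argument names, and the `min_unselected` threshold are hypothetical; the text only states that variants need adequate counts in the unselected population.

```python
import math


def enrichment_ratio(count_selected, total_selected,
                     count_unselected, total_unselected,
                     min_unselected=1):
    """log2 of (frequency after sorting / frequency before sorting).

    Returns None when the variant has too few unselected reads, or no
    selected reads, so that the ratio would be undefined.
    """
    if count_unselected < min_unselected or count_selected == 0:
        return None
    freq_selected = count_selected / total_selected
    freq_unselected = count_unselected / total_unselected
    return math.log2(freq_selected / freq_unselected)
```

For example, a variant whose frequency doubles after sorting has an enrichment ratio of 1, and one whose frequency halves has an enrichment ratio of -1.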
As a first test of our approach we chose to evaluate the binding of yeast-displayed TNF (TNFα) to the monoclonal antibody infliximab (marketed by Johnson & Johnson as Remicade™). A structure for this complex is known (40), allowing assessment of the ability of the sequence-function method to demarcate discontinuous conformational binding epitopes. TNF is a homotrimeric, multi-disulfide linked, marginally stable protein and thus represents a stringent test of the ability of yeast to surface-display complicated proteins. We ordered a codon-optimized gene encompassing the Gly-57-Leu-233 extracellular portion of TNF and subcloned it into the yeast display plasmid pETCON (6). Next, we created three SSM libraries of TNF and induced yeast surface expression of library variants (22). For each library we performed a single FACS sort using cells labeled with the biotinylated scFv of infliximab (inflix_scFv) at 32 nM, which is half of the observed dissociation constant for the interaction. Approximately 200,000 cells from each library were collected for the unselected, displayed, and bound populations (Supplemental Table 1). Deep sequencing was used to determine the enrichment ratios of the bound and displayed populations compared with the unselected population. These enrichment ratios were then transformed to a fitness metric that allows direct comparisons across the different mutational libraries, allowing the sequence determinants of binding to be evaluated for nearly every possible single point mutant in the protein sequence (Fig. 2a, supplemental Fig. 1). Overall, we observed 95.1% coverage of all possible single non-synonymous mutants in the extracellular TNF sequence (n = 2985/3140) (supplemental Table 2).
To identify the conformational epitope, we reasoned that residues essential to the protein-protein interaction would be conserved in the bound population. Conversely, residues that do not participate at the protein-protein interface would be mostly non-conserved. To discriminate among these positions we introduced a positional Shannon (sequence) entropy metric that is calculated using the enrichment ratios of every variant at a given position (see "Theory"). Because sequence conservation will depend strongly on the stringency of the sorting conditions, we next sought cutoffs to discriminate between a conserved and a non-conserved position. We considered a position to be conserved if the sequence entropy in the bound population compared with the unselected population was less than or equal to the midpoint of the sequence entropy range. Even using this stringent cutoff, 56/177 TNF residues were identified as conserved. Many of these residues are buried in the core of TNF, and their mutation presumably disrupts the fold of the protein. Positions located at the epitope can be partially discriminated from those that disrupt protein stability by calculating the sequence entropy of the displayed sort and using a cutoff of the midpoint of the sequence entropy range. This analysis removed 22 of the 56 residues from consideration as epitope positions. The removed residues were almost all buried, with a mean fraction solvent-accessible surface area of 0.03 (range 0.00-0.22). The 34 remaining residues were a combination of surface positions (using a fraction accessible surface area cutoff of 0.10, n = 18) and buried positions (Fig. 2b).
The conserved surface positions clustered in three regions on TNF (all subsequent residues use PDB numbering): the AB loop (Asn-19, Pro-20, Gly-24), the EF loop (Gln-102, Glu-104, Thr-105, Glu-107, Ala-111), and the GH loop (Asn-137-Tyr-141). There are also a handful of noncontiguous, partially surface-exposed positions scattered throughout (Fig. 2b). To further discriminate epitope from non-epitope positions, we reasoned that the epitope would be depleted in non-conserved positions. Identifying non-conserved positions as the upper quintile of the sequence entropy range removed 48/177 positions from consideration. Of these, only Glu-110 was within 4 Å of infliximab in the bound complex. However, the Cα-Cβ vector of Glu-110 was pointed away from the interface, and its side chain did not make significant interactions with infliximab (Fig. 2c), suggesting that its mutation to other amino acids would not disrupt the affinity of the TNF-infliximab complex.
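The subtractive analysis described in the two preceding paragraphs can be sketched as a small classification routine. The sketch assumes that "midpoint of the sequence entropy range" means halfway between the smallest and largest entropy observed in a given population, and that the "upper quintile of the sequence entropy range" means the top 20% of that same interval; both readings, along with the function and parameter names, are assumptions made for illustration.

```python
def classify_positions(bound_entropy, displayed_entropy, upper_fraction=0.8):
    """Split positions into epitope candidates, fold-sensitive, and non-conserved.

    bound_entropy, displayed_entropy: dicts mapping position -> sequence entropy
    for the bound and displayed populations, respectively.
    """
    def midpoint(values):
        values = list(values)
        return (min(values) + max(values)) / 2.0

    bound_cut = midpoint(bound_entropy.values())
    displayed_cut = midpoint(displayed_entropy.values())
    low, high = min(bound_entropy.values()), max(bound_entropy.values())
    nonconserved_cut = low + upper_fraction * (high - low)

    # Conserved in the bound population
    conserved = {pos for pos, e in bound_entropy.items() if e <= bound_cut}
    # Also conserved in the displayed population: likely needed for folding
    fold_sensitive = {pos for pos in conserved
                      if displayed_entropy.get(pos, float("inf")) <= displayed_cut}
    # Tolerant positions, unlikely to sit in the epitope
    nonconserved = {pos for pos, e in bound_entropy.items() if e >= nonconserved_cut}

    return {
        "epitope_candidates": sorted(conserved - fold_sensitive),
        "fold_sensitive": sorted(fold_sensitive),
        "nonconserved": sorted(nonconserved),
    }
```

Keeping the three sets separate makes it straightforward to overlay conserved and non-conserved positions on a structure, as is done in Fig. 2b.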
Considering both the conserved and non-conserved positions highlights the EF loop and GH loop as essential to the interaction. The single most conserved section documented by sequence entropy analysis is on the GH loop between Asn-137 and Tyr-141, and these residues mapped neatly to the center of the experimentally determined binding region (Fig. 2c). Additionally, conservation of several residues on the EF loop was consistent with the infliximab-TNF structure, including Glu-107, which made a salt bridge interaction across the interface. Furthermore, the importance of these two loops to the energetics of the interaction has been confirmed by mutagenesis (40).

FIGURE 2. TNF-infliximab conformational epitope determination. a, a subset (41/177 residues) of the fitness-metric heat map for the bound population of the TNF-inflix_scFv interaction. Sequence entropy for the displayed (green) and bound (black) populations is plotted below with their respective cutoffs (dashed lines). b, subtractive sequence entropy analysis for the TNF-infliximab interaction. Conserved residues (orange) are found mainly within the binding footprint of the TNF-infliximab interaction (cyan). Non-conserved residues (purple) can also be mapped onto the structure and fall outside of the footprint (middle). These non-conserved residues can be used to find regions where false positive conserved residues appear. For clarity, only one TNF monomer is shown. c, close-up view of the structural interface between TNF (ribbon) and infliximab (cyan surface). TNF residues are colored according to sequence conservation as in panel b.
Examination of non-conserved residues located at the interface identifies limitations in using a single metric for epitope determination. For example, Pro-70, Ser-71, and His-73 on the CD loop and Thr-77 in strand D are interface residues that are potentially energetically important but are above the sequence entropy cutoff (Fig. 2c). For the CD loop residues the positions were conserved, but the sequence entropies were just slightly above the cutoff. Proline at position 70 and serine at position 71 were the most favored amino acids, whereas a substitution of H73K was slightly favored. Thr-77 was removed from consideration of the epitope as it was relatively conserved in the displayed population. Additional epitope residues that were not identified include the above-mentioned Glu-110 and Gln-67. However, neither was expected to be energetically significant as determined by alanine scanning mutagenesis and their positions in the bound complex. Indeed, mutation of Gln-67 to an aromatic residue increased binding affinity for the infliximab interaction (Fig. 2a). Based on this comprehensive mutagenesis dataset enabled by deep sequencing, we conclude that this improved yeast display-deep sequencing pipeline is effective in identifying fine conformational epitopes for antibody-antigen interactions.
We next asked whether the automated identification of the epitope using sequence entropy could be used to map binding footprints of other antibody-antigen interactions. To accomplish this, we evaluated the binding of yeast-displayed PTxS1 against a single humanized neutralizing antibody. Whooping cough, a respiratory disease caused by the bacterium Bordetella pertussis, remains a major cause of infant mortality in both developing and industrialized countries despite widespread vaccination (41). Recently, Nguyen et al. 4 demonstrated the ability of a binary antibody mixture to halt whooping cough disease progression in a baboon model. Although one of the mixture antibodies, hu1B7, is able to bind to the S1 subunit on a Western blot, indicating a linear component of the epitope, previous studies using 15-mer peptides covering the entire S1 sequence were unable to identify a peptide showing binding activity against murine 1B7 (42). Further information about the epitope on S1 targeted by hu1B7 will help elucidate its neutralizing mechanism.
Following a procedure similar to that used for TNF, we used PFunkel mutagenesis to create an SSM library of nearly all possible single point mutants in the Asp-1-Gly-220 fragment of PTxS1 (PTx-S1-220) and performed a single FACS sort collecting 400,000 cells in each of the three populations. Soluble PTx-S1-220 can be expressed in E. coli and retains affinity for hu1B7 (Fig. 3). Positional Shannon entropy was used to determine the most conserved residues at the interface using the same cutoffs identified in the TNF test case (Fig. 4a, supplemental Fig. 2). As before, epitope residues were discriminated from residues that result in disruption of the protein fold by analysis of sequence conservation in the displayed population. Altogether, this procedure identified 16 residues at the proposed antibody-antigen conformational epitope: Glu-75, Gly-78-His-83, Ile-85-Tyr-87, Ala-93, Tyr-148, Asn-150-Ile-152, and Asn-163. Mapping these residues onto the structure of PTx (43) shows that 14 of 16 residues are located in a spatially contiguous location (Fig. 4, b and c). This proposed epitope is consistent with a previous alanine scanning dataset developed by Sutherland and Maynard (24). The two conserved residues outside of this region, Ala-93 and Asn-163, are most likely of structural importance for the conformational epitope as they are buried in the protein core. The epitope surface is typical of antibody-antigen interactions with charged and aromatic residues along with hydrophobic patches (Fig. 4c). Partial conservation of buried residues Phe-84, Gly-86, and His-149 near the identified epitope indicates that the hu1B7 binding affinity depends somewhat on the conformation in the PTxS1 folded state.
In principle, relative dissociation constants (Kd,i/Kd,WT) for antibody-antigen interactions can be calculated directly from the digital counting (21) (see "Theory"). However, our method relies on counting individual sequences after a single cell sort, resulting in a limited dynamic range. To determine whether dissociation constants calculated from deep sequencing results are quantitative, we compared our results with an alanine scanning dataset for in vitro binding of PTx-S1-220 and murine 1B7 (24). Our results are consistent with the alanine scanning data for 16 of the 17 mutations (Table 1). The one discrepancy, R39A, is not in spatial proximity to the highlighted epitope, potentially indicating different long range interactions between PTx-S1-220 and the murine and humanized 1B7 antibodies used in the separate experiments. Consistent with the limited dynamic range of the deep sequencing method, the relative dissociation constants for hot spot residues Arg-79, His-83, Tyr-148, and Asn-150 are significantly underestimated in the deep sequencing datasets compared with in vitro measurements (Table 1). This limitation, caused by digital counting of only a handful of sequences, restricts the experimentally determined range of relative dissociation constants to 0.4–2.5 (Fig. 5).
FIGURE 3. A soluble version of the pertussis toxin S1 subunit can be expressed in E. coli and retains affinity for hu1B7. Truncated S1 in a pAK400 expression vector was produced in BL21(DE3), harvested by osmotic shock, and purified by immobilized metal affinity chromatography and size exclusion. a, SDS-PAGE of truncated S1 (S1-220K, 25.9 kDa) and full-length PTx (26.1 kDa). b, Western blot of S1-220K and PTx, probed by hu1B7 and GαhFc-HRP. c, ELISA of hu1B7 on a 4 nM coat of PTx or S1-220K, detected by GαhFc-HRP.
For further validation we tested four additional mutations identified from our deep mutational scanning datasets. PTx-S1-220 variants T81K, T81H, I152M, and I152P were produced in E. coli and purified. A polyclonal anti-PTx antibody preparation was titrated against ELISA wells coated with 5 nM PTx S1 variants. Similar binding to all variants suggests that no variant has severe folding defects. Relative binding dissociation constants were calculated from observed EC50 values by titration of the hu1B7 antibody on an ELISA plate coated with 5 nM concentrations of the truncated PTx S1 or each variant (Table 1). In vitro binding for variants T81H, I152M, and I152P was quantitatively predicted by the deep sequencing data. In contrast, and consistent with the limited dynamic range highlighted above, the relative binding for the knock-out variant T81K is significantly underestimated in the deep sequencing datasets. We conclude that although relative dissociation constants calculated directly from the deep sequencing data are consistent with in vitro measurements, care must be exercised when calculating a quantitative energetic contribution.
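The comparison between ELISA and deep sequencing values can be illustrated with a short Python sketch. It assumes that the relative dissociation constant is taken as the ratio of the variant EC50 to the wild-type EC50; the text states only that the values were calculated from EC50 values, so this ratio is an assumption. The sketch then checks whether the result falls inside the 0.4–2.5 window that the sequencing-based estimate can report. The function names and the EC50 values are hypothetical and are not taken from Table 1.

```python
def relative_kd_from_ec50(ec50_variant, ec50_wild_type):
    """Relative dissociation constant, assumed here to be EC50(variant) / EC50(WT)."""
    if ec50_variant <= 0 or ec50_wild_type <= 0:
        raise ValueError("EC50 values must be positive")
    return ec50_variant / ec50_wild_type


def within_sequencing_range(relative_kd, low=0.4, high=2.5):
    """True if the value lies inside the range accessible to the deep sequencing estimate."""
    return low <= relative_kd <= high


# Hypothetical EC50 values in nM (illustration only).
wild_type_ec50 = 0.5
for name, ec50 in [("mild variant", 0.75), ("knock-out variant", 25.0)]:
    ratio = relative_kd_from_ec50(ec50, wild_type_ec50)
    print(name, ratio, within_sequencing_range(ratio))
```

A variant whose ELISA-derived ratio lies far above 2.5 would be expected to appear underestimated in the sequencing data, which is the pattern described for T81K.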
Next, we asked whether the method could be used to map the conformational epitope of an antibody targeting tumor-associated calcium signal transducer 2 (TACSTD2, also known as TROP2), a 323-amino acid, 36-kDa transmembrane glycoprotein. TROP2 is overexpressed in numerous human epithelial cancers (44) and has been identified as an oncogene in colon cancer, with metastatic and invasive abilities (36,45). Studies have linked Trop2 to increased tumor growth, as ectopic expression of Trop2 in cancer cell lines causes them to become highly tumorigenic when implanted in mice, whereas silencing Trop2 inhibits cell proliferation in vitro (46). Furthermore, silencing Trop2 in breast cancer cell line MDA-MB-231 decreased migration as observed by transwell migration and wound-healing assays (Fig. 6, a-e).
FIGURE 4. PTxS1 conformational epitope determination. a, a subset (29/220 residues) of the fitness-metric heat map for the PTxS1-hu1B7 interaction. Sequence entropy for the unselected/display population (green) and unselected/bound population (black) is plotted below with their respective cut-offs (dashed lines). b, subtractive sequence entropy analysis for the PTxS1-hu1B7 interaction. The light gray surface represents the S1 subunit, and the dark gray represents other subunits of PTx. Conserved residues (orange) are found on the S1 subunit proximal to the S5 and S6 subunits. Non-conserved residues (purple) are found over most of the solvent-accessible surface area. c, close-up view of the conserved residues at the epitope interface. PTxS1 is represented with a schematic and sticks format, whereas the other subunits are represented as the dark gray surface.
The extracellular portion of TROP2 (TROP2Ex) contains three domains: an N-terminal domain (ND), a middle TY domain, and a C-terminal domain (CD) (Fig. 7a). Like its close paralogue EpCAM (epithelial cell adhesion molecule), TROP2 is a nuclear signal transducer activated by regulated intramembrane proteolysis (47,48). TROP2Ex is first shed by proteolysis followed by intramembrane cleavage to release intracellular TROP2 (TROP2Ic). Because recombinant TROP2Ex forms a homodimer in solution (49), it has been speculated that destabilization of the extracellular ... (29). To investigate the structural basis of m7E6 efficacy, we prepared SSM libraries covering 95.8% of possible single non-synonymous mutations for the yeast-displayed TROP2Ex (residues Thr-28–Thr-274) (supplemental Table 2) and performed a single FACS sort against the biotinylated Fab of m7E6 at a labeling concentration of 22 nM. After the experimental workflow, the same sequence conservation analysis as above was used to determine residues contributing to the epitope.
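The scale of the TROP2Ex library implied by the 95.8% coverage figure can be estimated with a short calculation, sketched here in Python. It assumes that "single non-synonymous mutations" means the 19 alternative amino acids at each position from Thr-28 to Thr-274, with stop codons excluded; this counting convention and the function name are assumptions made for illustration, and supplemental Table 2 remains the authoritative source for the actual numbers.

```python
def ssm_library_size(first_residue, last_residue, coverage, substitutions_per_position=19):
    """Return (positions, possible single substitutions, substitutions covered)."""
    n_positions = last_residue - first_residue + 1
    possible = n_positions * substitutions_per_position
    covered = round(possible * coverage)
    return n_positions, possible, covered


# Residues Thr-28 to Thr-274 at 95.8% coverage.
print(ssm_library_size(28, 274, 0.958))  # (247, 4693, 4496)
```

Under these assumptions the library spans 247 positions and samples roughly 4,500 of about 4,700 possible single amino acid substitutions.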
The subtractive sequence entropy measure removed most of TROP2Ex from consideration as part of the epitope, resulting in unambiguous determination of the binding footprint. In contrast to most previously described mAbs that bind at or near the N-terminal cysteine-rich domain, residues Asp-171, Arg-178, and the ridge on CD (RCD) loop Gly-241–Pro-250 were identified as contributing to the m7E6-TROP2 interaction (Fig. 7, a and b). This epitope was in agreement with m7E6 binding affinity results from domain swapping experiments (29). Using a homology model of TROP2 guided by the structure of the paralogue EpCAM (38), these residues mapped to a membrane-distal region opposite to the face on the CD domain that putatively makes specific interdimer contacts with the TY loop (Fig. 7b).
There are several reasons why targeting this epitope could be effective. For example, m7E6 could partially block the agonist(s) binding site on TROP2, preventing activation. Alternatively, m7E6 could prevent destabilization of the extracellular region by sterically blocking proteolytic cleavage or by preventing dimer dissociation after proteolytic cleavage. Because 4.1 nm separates the centers of the proposed epitope between the dimer subunits (Fig. 7b), an IgG could occupy both subunits of the dimer. In this scenario we speculate that the antibody could prevent destabilization of the extracellular region or block an agonist site through the steric bulk of the IgG. As a test of this hypothesis, we reasoned that, because of its monovalency, m7E6 Fab would not be as effective as the corresponding IgG in preventing dimer destabilization. We performed Boyden chamber assays to investigate the influence of m7E6 IgG and Fab on migration rates in MDA-MB-231 cells. Whereas m7E6 IgG treatment inhibited migration (p = 0.004) (Fig. 7, c and d), m7E6 Fab was unable to inhibit migration (p = 0.187) at the highest tested concentration (40 μg/ml) (Fig. 7, e and f). We confirmed that both m7E6 Fab and IgG were able to label breast cancer cell line MDA-MB-231 (Fig. 8a). Additionally, we tested the influence of m7E6 IgG on proliferation (metabolic activity) and cytotoxicity in MDA-MB-231 cells. We found that the treatment did not result in a statistically significant decrease in proliferation rates (one-way analysis of variance, p = 0.384) or increase in cytotoxicity levels (one-way analysis of variance, p = 0.141), indicating that the mechanism by which m7E6 IgG reduced migration rates was independent of reduced proliferation or cell death (Fig. 8, b and c). We further investigated the localization of the TROP2 intracellular domain in response to the IgG treatment using confocal microscopy.
We found that nuclear expression levels of TROP2Ic were retained in both the IgG- and Fab-treated cells (Fig. 9). This result suggests that the effect of m7E6 binding to TROP2 on reducing migration rates may be mediated by blocking the agonist binding site or by influencing the downstream signaling cascade. Whatever the exact mechanism behind m7E6 efficacy, the conformational epitope uncovered by the present method was used to predict that m7E6 Fab could not inhibit migration.

Discussion
The sequence-function mapping pipeline using a yeast-displayed antigen can be used to elucidate conformational, discontinuous epitopes of complex proteins. As demonstrated with TROP2, a solved structure of the antigen is not essential for identification of the conformational epitope. The methodological improvements developed in this paper allowed us to complete the pipeline using a single cell sorter in 14 days for 24 different antibody-antigen complexes at an approximate material and supply cost of $330 per antibody-antigen epitope. The cost and speed of this method offer significant advantages compared with competing display-based protocols. Notably, our method requires only a few micrograms of the starting antibody and so can be used directly downstream of immortalized B cell or hybridoma screening. Additionally, the ability to comprehensively map sequence determinants of binding may help identify potential escape mutants and predict whether the antibody will maintain affinity for antigen homologs from model organisms. The sequence-function maps may also be integrated into computational prediction software to improve the predictions of specific antibody-antigen structural contacts at the atomic level or improve computational predictions of the effects of individual mutations on protein-protein interactions (28).
There are minor limitations in the current protocol. For example, antigens requiring multiple subunits to fold may be difficult to express on the yeast surface. Additionally, conformational epitopes requiring heterogeneous peptide-glycosyl surfaces cannot be mapped. Nevertheless, our results show that antibody binding surfaces for the complicated homodimeric and homotrimeric human proteins TROP2 and TNF can be assessed, and we speculate that similar proteins can be mapped. Although our approach can be used as is to interrogate antibody panels of 10–50 members, further improvements in speed and cost are needed for integration of the method with other high throughput or single cell technologies. Methodological advances should focus on replacing the bottleneck FACS step with a higher throughput sorting technique and removing the need to prepare multiple libraries for each antigen.