High resolution protein-protein interaction mapping using all-versus-all sequencing (AVA-seq)

Two-hybrid systems test for protein-protein interactions and can provide important information for genes with unknown function. Despite their success, two-hybrid systems have remained largely untouched by improvements in next-generation DNA sequencing. Here we present a method for all-versus-all protein interaction mapping (AVA-seq) that uses next-generation sequencing to remove multiple bottlenecks of the two-hybrid process. The method allows high-resolution protein-protein interaction mapping of a small set of proteins and could potentially support lower-resolution mapping of entire proteomes. Features of the system include open reading frame (ORF) selection to improve efficiency, high bacterial transformation efficiency, a convergent fusion vector that allows paired-end sequencing of interactors, and the use of protein fragments rather than full-length genes to better resolve specific protein contact points. We demonstrate the system’s strengths and limitations on a set of proteins known to interact in humans and provide a framework for future large-scale projects.


Introduction
Methods of studying protein-protein interactions (PPIs) can broadly be categorized as one-vs-one or one-vs-all studies. The goal of this study is to develop and apply a novel methodology that allows all-vs-all screening to compare complex libraries. Here, we have applied our method to define the interacting regions of a set of human proteins at high resolution.
Yeast and bacterial two-hybrid screens and their derivatives have long been a staple of large-scale protein interaction mapping 1,2 . Though there are multiple forms of two-hybrid systems, […]
Sequencing and analysis of the ORF-selected fragments revealed that, on average, 48% of plasmids from pBORF-AD contained in-frame fragments of one of the 6 known proteins used in this study, compared with approximately 16% in the non-ORF-selected libraries in the same vector. ORF selection was less efficient for fragments in pBORF-DBD, with approximately 24% in frame versus 13% for the non-ORF-selected libraries.

Convergent Fragment Generation and Interaction Screening
The ORF-selected fragments in pBORF-AD and pBORF-DBD were amplified, "stitched" together using overlap extension PCR, and inserted into pAVA (Fig. 1d-e). Among convergent-fusion products, approximately 19% of the paired products analyzed contained both DBD- and AD-fused fragments in frame with one of the 6 human proteins, compared with only 1.5% in the non-ORF-selected libraries. ORF selection therefore provides at least a 12-fold improvement in screening efficiency over non-ORF-selected libraries, a gain that matters because both fusion fragments must be in frame for an interaction to be detected. The convergent-fusion fragments in pAVA were transformed into BacterioMatch II Electrocompetent Reporter Cells 2 and challenged in triplicate with 0, 2, or 5 mM 3-AT (Fig. 1g) in histidine-free medium. The same process was repeated for the non-ORF-selected fragments.
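The efficiency figures above reduce to simple ratios of in-frame percentages. The short Python sketch below recomputes them from the rounded values quoted in the text; the dictionary layout and the `enrichment` helper are our own illustration, not part of the authors' pipeline.

```python
# Percent of analyzed plasmids/products carrying in-frame fragments,
# as reported in the text (rounded values).
IN_FRAME_PCT = {
    "pBORF-AD": {"orf": 48.0, "non_orf": 16.0},
    "pBORF-DBD": {"orf": 24.0, "non_orf": 13.0},
    "pAVA (both fusions in frame)": {"orf": 19.0, "non_orf": 1.5},
}


def enrichment(orf_pct: float, non_orf_pct: float) -> float:
    """Fold improvement of ORF-selected over non-ORF-selected libraries."""
    return orf_pct / non_orf_pct


for library, pct in IN_FRAME_PCT.items():
    print(f"{library}: {enrichment(pct['orf'], pct['non_orf']):.1f}-fold")
```

The convergent pAVA ratio (19 / 1.5, roughly 12.7) underlies the "at least 12-fold" statement; the single-vector ratios (about 3.0 for pBORF-AD and 1.8 for pBORF-DBD) show why the requirement that both fusions be in frame amplifies the benefit.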

Library Construction and Sequencing
Libraries were constructed using the NEBNext Ultra II DNA Library Prep Kit for Illumina according to the manufacturer's protocol. A total of 9 libraries (3 replicates each of the 0, 2, and 5 mM 3-AT selections) were pooled and sequenced on a MiSeq. A total of 6.1 million paired-end reads were generated in which both the DBD and AD sequences were of sufficient quality to allow translation of the fused fragments, approximately 680,000 paired reads per replicate. As a control for the selection and sequencing analysis, the pre-constructed positive control carrying the Gal11p and LGF2 domains (Fig. 2b) was spiked in at low concentration before 3-AT selection.

Sequence Analysis
Paired-end sequencing reads were translated in frame with the DBD or AD to which each fragment was fused. Sequencing primers sit upstream of the DBD- or AD-specific sequence, leaving enough sequence (~150 bp) downstream to identify the fused fragment and whether it is fused to the DBD or the AD. The translated sequences were then aligned with BLASTP to a database of the six human proteins. The gene and starting amino acid position to which each fragment aligned were recorded and used as a unique identifier. Read pairs in which both fused fragments were in frame with a known protein were carried forward. This process was repeated for all replicates, and the results were combined into a table of counts for each unique fragment pair in each replicate. We observed a total of 146,531 unique fragment pairs (distinct protein/amino acid start point combinations) in any replicate of the ORF-selected libraries and 10,564 in the replicates of the non-ORF-selected libraries. The ORF-selected libraries had approximately 120,000 paired-end, in-frame read counts per replicate distributed across the 146,531 unique fragment pairs, whereas the non-ORF-selected libraries had approximately 6,500 per replicate distributed across the 10,564 unique fragment pairs.
Differential growth at higher concentrations of 3-AT versus 0 mM 3-AT indicates a potential protein interaction. Fragment pairs were therefore tested for a statistically significant increase in normalized read counts in the 2 and 5 mM 3-AT replicates using DESeq2 13 . A fold-change cutoff (based on read counts) of at least 3 and a false discovery rate below 5% were applied. As expected, the positive control (Gal11p-LGF2) showed a highly significant increase in read counts in selective media (Table 1 and
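The bookkeeping described above (a count table keyed by fragment pair, normalization, and fold-change/FDR cutoffs) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the actual statistics were run in DESeq2, and the negative binomial fit and Wald test are not reimplemented here. The helper names (`count_table`, `prevalence_filter`, `size_factors`, `passes_cutoffs`) and the toy fragment identifiers are hypothetical; `size_factors` follows DESeq2's default median-of-ratios normalization, and `prevalence_filter` mirrors the "at least 4 of 6 samples" rule given in the methods.

```python
import math
import statistics
from collections import Counter

# A fragment is (gene, starting amino acid); a pair is (DBD fragment, AD fragment).
Fragment = tuple[str, int]
Pair = tuple[Fragment, Fragment]


def count_table(samples: dict[str, list[Pair]]) -> dict[Pair, dict[str, int]]:
    """Count reads per unique fragment pair in each sample, filling zeros."""
    table: dict[Pair, dict[str, int]] = {}
    for sample, pairs in samples.items():
        for pair, n in Counter(pairs).items():
            table.setdefault(pair, {})[sample] = n
    for row in table.values():
        for sample in samples:
            row.setdefault(sample, 0)
    return table


def prevalence_filter(table: dict[Pair, dict[str, int]], min_samples: int):
    """Keep pairs with a nonzero count in at least `min_samples` samples."""
    return {pair: row for pair, row in table.items()
            if sum(c > 0 for c in row.values()) >= min_samples}


def size_factors(table: dict[Pair, dict[str, int]],
                 samples: list[str]) -> dict[str, float]:
    """Median-of-ratios size factors (DESeq2's default normalization).

    Rows with any zero count have a geometric mean of zero and are skipped.
    """
    ratios: dict[str, list[float]] = {s: [] for s in samples}
    for row in table.values():
        counts = [row[s] for s in samples]
        if min(counts) == 0:
            continue
        log_geo_mean = statistics.fmean(math.log(c) for c in counts)
        for s, c in zip(samples, counts):
            ratios[s].append(math.exp(math.log(c) - log_geo_mean))
    return {s: statistics.median(r) for s, r in ratios.items()}


def passes_cutoffs(log2_fold_change: float, padj: float,
                   min_fold: float = 3.0, max_fdr: float = 0.05) -> bool:
    """Fold change of at least 3 and FDR below 5%, as stated in the text."""
    return log2_fold_change >= math.log2(min_fold) and padj < max_fdr


# Toy example: invented placeholder genes "A" to "D" and invented counts.
ab = (("A", 10), ("B", 120))
ba = (("B", 120), ("A", 10))
cd = (("C", 5), ("D", 300))
reads = {"s1": [ab] * 2 + [ba] * 4 + [cd], "s2": [ab] * 4 + [ba] * 8}
table = prevalence_filter(count_table(reads), min_samples=2)
sf = size_factors(table, ["s1", "s2"])
print(sorted(table))  # cd is dropped: seen in only one sample
print(sf)             # roughly {'s1': 0.707, 's2': 1.414}
```

Dividing each sample's counts by its size factor yields the normalized counts on which fold changes are judged; in the study's design, `min_samples` would be 4 with six samples per comparison.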

Protein Interaction Analysis
When all libraries are considered, we tested 96.14% (5,686,520/5,914,624 pairwise amino acid combinations) of the possible interaction space at high resolution (Fig. 3a). This was possible because protein fragments were generated with multiple starting points.
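The coverage figure is a direct ratio of the two counts quoted above. The Python sketch below reproduces it; it also notes that the denominator is a perfect square (2,432²), which is consistent with, though not stated in the text as, a grid of every residue position against every residue position across the six proteins. That interpretation is our assumption.

```python
import math

TESTED = 5_686_520  # pairwise amino acid combinations tested (all libraries)
TOTAL = 5_914_624   # possible pairwise amino acid combinations

print(f"coverage: {TESTED / TOTAL:.2%}")

# Assumption (not stated in the text): TOTAL is an N x N grid of residue
# positions; if so, N is the combined length considered.
n = math.isqrt(TOTAL)
print(n, n * n == TOTAL)
```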
The tested interaction space was reduced to 5.6% (331,580/5,914,624) and 1.9% […] (Tables 1 and 2). Additionally, in the non-ORF-filtered data this interaction accounted for 35/38 and 28/31 significant interactions in 2 and 5 mM 3-AT, respectively (Sup. Tables 2 and 3). Importantly, the fragments align to the interaction regions for the proteins ([…]). In addition, Fig. 4b shows that AKAP5 and PKA also had interacting fragment pairs in both orientations. For the ORF-filtered fragments, AKAP5-PKA pairs accounted for 4/23 and 2/28 significant interactions in 2 and 5 mM 3-AT, respectively (Tables 1 and 2). An advantage of this fragment-based method over more traditional full-length protein screens is the ability to identify the interaction region(s) between proteins. Looking at […]
Although the literature shows that p53 and MDM2 interact, AVA-seq did not detect statistically significant interactions between the two proteins (Fig. 4c). This is likely due to the complex nature of the interaction. The interaction site of MDM2 is large, and the MDM2-p53 complex depends primarily on van der Waals contacts, unlike most characterized protein interactions. Furthermore, the interaction occurs at a buried surface consisting mostly of hydrophobic contacts between pseudosymmetric domains 11,15 .
Fragments from these proteins were, however, detected as significantly interacting with other proteins (Sup. Fig. 3). Interestingly, there were significant self-interactions for p53 and PKA (Fig. 4d and e, respectively). The yellow bars in Fig. 4d-e represent residues involved in the p53-p53 dimer interface 16

Discussion
Protein-protein interaction data can provide important information for understanding how an individual protein functions and its system-wide role in the context of other proteins. Despite this importance, methods such as the two-hybrid system have not seen significant reductions in labor and cost since their first use. Although some improvements have been made using next-generation sequencing, deep screening or large-scale two-hybrid studies remain labor intensive.
At the center of our approach to using NGS for two-hybrid-based protein interaction mapping is the convergent fusion vector, pAVA. The novelty of pAVA is that it joins the traditionally separate "bait" and "prey" DNA sequences on a single DNA molecule, allowing them to be amplified and sequenced as paired ends. This, combined with high transformation efficiency, has allowed us to test almost the entire interaction space of the 6 proteins at high resolution (approximately 146,000 fragment pairs tested). As with methods such as RNA-seq, this system is limited mainly by library diversity and sequencing depth. Higher resolution of interacting domains is achievable with deeper sequencing of diverse libraries.
An important feature of the system is the ability to "dial in" the level of selection by changing the concentration of 3-AT. Here we used 2 and 5 mM, but the concentration could be adjusted depending on the strength of the interactions being targeted. Our results show that many of the interactions detected in the more stringent 5 mM selection are also found in the 2 mM 3-AT selection, while some interactions are detected in only one of the conditions. In the future, it may be possible to rank interaction strengths relative to the positive control (Gal11p-LGF2), which is included in every experiment. Here, for example, the JUN-FOS fragment interactions were generally stronger relative to the positive control in the 2 mM 3-AT selection, while the PKA-AKAP5 fragments were weaker.
We observed 2 of the 3 known interactions among the 6 proteins tested: JUN-FOS and PKA-AKAP5. Homodimerization of p53 and PKA was also observed, albeit with one fragment pair each (Fig. 4d-e). The p53-MDM2 interaction was not detected at a statistically significant level, showing the system's limitation in detecting interactions between large domains (Fig. 4c). A frequent criticism of two-hybrid systems centers on false positives (detected interactions that do not actually occur) and false negatives (interactions that are missed). The AVA-seq system employs several features to mitigate both types of error. While this does not guarantee that the detected interactions occur in vivo, it reduces the likelihood that detected interactions are spurious within the system or that true interactions are simply missed. These features include observing interactions in both orientations, that is, with the bait and prey fragments fused to the DBD and AD, respectively, and vice versa.
Additionally, requiring that multiple, overlapping fragments from the same genes be observed to interact decreases the possibility that an interaction is invalid. Auto-activators, fragments that activate the system regardless of the fragment they are paired with, can be removed by applying an upper-bound cutoff to the number of fragments each is reported to interact with. While current NGS read-length limitations generally require that we test fragments rather than full-length genes, we observed benefits from testing multiple fusions at various amino acid positions for each protein. We noted that not all fragment pairs expected to interact were observed, likely because not every fusion point creates a functional protein capable of interaction 12 . Thus, using multiple fusion points within a gene, rather than a full-length gene, may help overcome both false positives and false negatives. However, the limitation of fragment length was clear in the missed p53-MDM2 interaction, likely because larger fragments are required to span the interaction region 11 . Lastly, the ability to screen under multiple levels of selective pressure in a cost-effective manner should increase the chances of detecting a range of interactions.
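The three safeguards described in the two preceding paragraphs (detection in both orientations, support from multiple fragment pairs, and an upper-bound cutoff on partners to flag auto-activators) can be expressed as simple filters over a list of significant hits. The Python sketch below is illustrative only: the helper names, the cutoff values, and the toy placeholder genes are hypothetical, and counting distinct fragment pairs per gene pair is used as a proxy for "overlapping" fragments, since true overlap would require fragment end coordinates that this sketch does not model.

```python
from collections import Counter, defaultdict

# A fragment is (gene, starting amino acid), as in the text;
# a hit is a significant (DBD fragment, AD fragment) pair.
Fragment = tuple[str, int]
Hit = tuple[Fragment, Fragment]


def reciprocal_gene_pairs(hits: list[Hit]) -> set[frozenset[str]]:
    """Gene pairs seen in both orientations (A on DBD/B on AD and vice versa).

    Self-pairs are skipped because swapping orientation is uninformative there.
    """
    oriented = {(dbd[0], ad[0]) for dbd, ad in hits}
    return {frozenset(p) for p in oriented
            if p[0] != p[1] and (p[1], p[0]) in oriented}


def gene_pair_support(hits: list[Hit]) -> Counter:
    """Number of distinct fragment pairs supporting each unordered gene pair."""
    return Counter(frozenset((dbd[0], ad[0])) for dbd, ad in set(hits))


def autoactivators(hits: list[Hit], max_partners: int) -> set[Fragment]:
    """Fragments reported to interact with more than `max_partners` partners."""
    partners: dict[Fragment, set[Fragment]] = defaultdict(set)
    for dbd, ad in hits:
        partners[dbd].add(ad)
        partners[ad].add(dbd)
    return {f for f, ps in partners.items() if len(ps) > max_partners}


def filter_hits(hits: list[Hit], max_partners: int,
                min_support: int) -> set[frozenset[str]]:
    """Gene pairs that remain reciprocal and well supported after removing auto-activators."""
    bad = autoactivators(hits, max_partners)
    kept = [(d, a) for d, a in hits if d not in bad and a not in bad]
    support = gene_pair_support(kept)
    return {g for g in reciprocal_gene_pairs(kept) if support[g] >= min_support}


# Toy hits with placeholder genes; ("X", 1) behaves like an auto-activator.
hits = [
    (("A", 10), ("B", 50)),
    (("A", 20), ("B", 50)),
    (("B", 55), ("A", 12)),
    (("X", 1), ("A", 10)),
    (("X", 1), ("B", 50)),
    (("X", 1), ("C", 30)),
    (("X", 1), ("D", 40)),
]
print(autoactivators(hits, max_partners=3))
print(filter_hits(hits, max_partners=3, min_support=2))
```

In practice the partner cutoff and the minimum support would need to be tuned to library complexity; the values here are arbitrary.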
The reduction of the potential interaction space by approximately 17-fold (2 mM) and 50-fold (5 mM) to a small set of interactors is substantial, all the more so given that only 3 pairs of the proteins were expected to interact. The additional interactions detected in this study (Sup. Fig. 3) provide a starting point for future validation studies. Some of these, such as the interaction between AKAP5 and p53, are supported by multiple overlapping fragment pairs, with at least one in the reverse-fusion orientation.
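The approximate fold reductions follow from the percentages reported in the results (96.14% of the space tested; 5.6% and 1.9% remaining after 2 and 5 mM 3-AT selection). The sketch below assumes, as our reading rather than a stated method, that the reduction is measured against the tested 96.14%; it also prints the ratio against the full space for comparison.

```python
def fold_reduction(reference_pct: float, remaining_pct: float) -> float:
    """How many times smaller the remaining space is than the reference."""
    return reference_pct / remaining_pct


TESTED_PCT = 96.14  # % of pairwise combinations tested (all libraries)
REMAINING_PCT = {"2 mM 3-AT": 5.6, "5 mM 3-AT": 1.9}

for condition, pct in REMAINING_PCT.items():
    print(f"{condition}: {fold_reduction(TESTED_PCT, pct):.1f}-fold "
          f"(vs. full space: {fold_reduction(100.0, pct):.1f}-fold)")
```

Using the tested 96.14% as the reference gives about 17.2- and 50.6-fold, matching the quoted figures; the full-space reference gives about 17.9- and 52.6-fold.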
With the improvements we have made to the two-hybrid system, we envision two uses for AVA-seq. The first, as demonstrated here, is high-resolution protein-protein interaction mapping of a small set of proteins, either all-versus-all or few-versus-all. This might include proteins of unknown function tested against a whole cDNA library, or proteins from a single pathway. The required number of transformants and sequencing depth are readily achievable for high-resolution domain mapping. The second application is large all-versus-all interaction screening of entire bacterial genomes and beyond. The ORF selection process drastically reduces the number of screening events while maintaining the benefits of fragment-based mapping described above. Indeed, given the high transformation efficiency of a single vector and deep sequencing, we believe whole-genome protein interaction mapping could be achieved by small laboratory groups in a relatively short time.

Design of pBORF vectors
The pBORF plasmid was designed by replacing the AmpR in the pBlueScript II SK (+) vector (Stratagene #212205) with a kanamycin resistance gene just after the AmpR promoter.
The β-lactamase localization sequence and the enzyme-coding portion of β-lactamase were inserted in the multiple cloning site region of the modified pBlueScript II SK (+) vector.
Approximately 25 bp inserts were designed and placed between the β-lactamase localization sequence and the β-lactamase gene to allow ORF filtering and to distinguish the pBORF associated with the DBD (λcI), referred to as pBORF-DBD, from the pBORF associated with the AD (RNAP), referred to as pBORF-AD. Using the modified pBlueScript vector above as a template, primers A and B were used to create the insert for pBORF-AD, and primers C and D to create the insert for pBORF-DBD. Primers were annealed (95 °C for 2 min, then slow cooling to 25 °C) and diluted 1:1000 from 50 µM to 50 nM. To prepare for ligation, the modified pBlueScript plasmid was linearized with primers E and F. Ligation of the linearized plasmid was set up using a 3:1 insert-to-vector ratio, where the insert […]. The resulting ligations created pBORF-DBD or pBORF-AD.
When needed, pBORF-DBD and pBORF-AD were linearly amplified on the same day as ligation. The PCR used primers G and H for pBORF-AD and primers G and I for pBORF-DBD. The column-purified linearized pBORF-DBD and pBORF-AD vectors were then digested with DpnI (NEB; R0176S), followed by another PCR column cleanup.
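Setting up a 3:1 insert-to-vector ligation means converting a molar ratio into masses. The Python sketch below uses the standard relation that, for double-stranded DNA, moles scale with mass divided by length, so the insert mass equals vector mass × (insert length / vector length) × molar ratio. The vector mass and length are hypothetical placeholders because the text does not give them; the sketch also checks the stated 1:1000 dilution of the annealed primers.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   molar_ratio: float = 3.0) -> float:
    """Insert mass (ng) for a given insert:vector molar ratio.

    Equal masses of dsDNA contain moles in inverse proportion to length, so
    the vector mass is scaled by the length ratio and then by the molar ratio.
    """
    return vector_ng * (insert_bp / vector_bp) * molar_ratio


def dilution_factor(stock_nM: float, final_nM: float) -> float:
    """Fold dilution needed to go from a stock to a working concentration."""
    return stock_nM / final_nM


# Hypothetical placeholders: the text does not give vector mass or length.
VECTOR_NG, VECTOR_BP, INSERT_BP = 50.0, 3000, 25
print(f"insert: {insert_mass_ng(VECTOR_NG, VECTOR_BP, INSERT_BP):.2f} ng")
print(f"annealed primers, 50 uM -> 50 nM: 1:{dilution_factor(50_000, 50):.0f}")
```

For very short inserts like these, the required insert mass is tiny, which is one practical reason to work from a diluted annealed stock.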

Design of pAVA
The pAVA plasmid was constructed from the BacterioMatch II two-hybrid system (Agilent) vectors pBT and pTRG. First, the AD in pTRG was amplified using primers that included XhoI and NotI restriction sites. Both the pBT plasmid and the amplified AD PCR product were then digested with XhoI and NotI.
Next, ligation placed the AD in convergent orientation with the DBD in the pBT plasmid. This new construct is referred to as pAVA (plasmid all-versus-all). The pAVA plasmid was linearized using primers N and O. After amplification, the product was column-purified using the GenElute PCR Cleanup Kit and then treated with DpnI using a standard protocol. Primers P and Q were used to introduce BstXI restriction sites.
Ligation was performed at a 3:1 insert-to-vector ratio using linearized pAVA digested with BstXI and the BstXI-digested insert obtained by amplification with primers P and Q.

pAVA controls
Gal11p and LGF2 sequences (sp|P19659|MED15_YEAST and sp|P04386|GAL4_YEAST, respectively) were amplified from the pBT-LGF2 and pTRG-Gal11p control plasmids of the BacterioMatch II Two-Hybrid System and ligated into both pBORF-AD and pBORF-DBD. The final convergent positive control constructs in pAVA were made as described above (Fig. 2b-c). The first negative control is the empty plasmid containing the AD and DBD without a DNA insert (Fig. 2a). The second and third negative controls are the positive control constructs with one added nucleotide that introduces a frameshift into LGF2 only, or into both LGF2 and Gal11p (Fig. 2d-e). Each control plasmid was tested separately in the presence of 0, 2, and 5 mM 3-amino-1,2,4-triazole (3-AT) in DMSO, as well as in an unselected sample containing neither DMSO nor 3-AT as an additional control.
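The second and third negative controls rely on a single inserted nucleotide to shift the downstream reading frame. The Python sketch below illustrates the effect on a short invented sequence (not the actual LGF2 or Gal11p sequence) using the standard genetic code; the names `CODON_TABLE` and `translate` are our own.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate complete codons in frame 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


original = "ATGGCTGAAAAACTG"                 # invented example sequence
shifted = original[:3] + "T" + original[3:]  # one nucleotide inserted after codon 1
print(translate(original))
print(translate(shifted))
```

Every codon after the insertion point changes, and in this example a premature stop codon appears, which is why a single-nucleotide insertion abolishes the fused domain and should abolish growth under selection.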

Amplification of 6 Human Genes
The following 6 human genes were purchased from Origene as 10 µg stocks in the pCMV6-Entry vector. […] This wash of the pellet was repeated for a total of 4 washes, followed by resuspension in 5 mL of fresh minimal medium. The OD600 of the cells was measured, and the cells were diluted to OD600 = 0.05 in 75 mL of minimal medium. […] were allowed to grow for 9 h at 37 °C and 250 RPM. After 9 h of growth, OD600 was measured.
Samples were centrifuged at 3,000 × g for 5 min, and DNA was extracted using the GenElute MiniPrep kit.
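Diluting the washed cells to OD600 = 0.05 in 75 mL is a C1V1 = C2V2 calculation. The Python sketch below computes the volume of resuspended cells to use; the measured OD600 value is a hypothetical placeholder, and the calculation assumes OD600 is proportional to cell density, which holds only within the spectrophotometer's linear range.

```python
def culture_volume_ml(measured_od: float, target_od: float = 0.05,
                      final_volume_ml: float = 75.0) -> float:
    """Volume of washed suspension to bring to `final_volume_ml` at `target_od`.

    Uses C1*V1 = C2*V2 and assumes OD600 is proportional to cell density.
    """
    if measured_od <= target_od:
        raise ValueError("suspension is not denser than the target OD600")
    return target_od * final_volume_ml / measured_od


measured = 1.5  # hypothetical OD600 of the 5 mL resuspension
v = culture_volume_ml(measured)
print(f"add {v:.2f} mL of cells to {75.0 - v:.2f} mL of minimal medium")
```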

Library Construction
To begin library construction, interacting fragments from the 3-AT selection were amplified from the pAVA plasmid using primers R and S (Sup. Table 1).

Preparation of media and reagents for 3-AT selection
For each selection experiment, 100 mL of fresh minimal medium was prepared using the following recipe adapted from the BacterioMatch II system manual. In the following order, 2 mL of 20% glucose, 1 mL of 20 mM adenine HCl, and 10 mL of 10x His-dropout amino acid mix (Clontech, 630415; autoclaved according to the manufacturer's directions and stored at 4 °C until use) were combined. Then 100 µL of each of the following was added: 1 M MgSO4, 1 M thiamine-HCl, 10 mM ZnSO4, 100 mM CaCl2, and 50 mM IPTG. After mixing well, 76 mL of autoclaved millipure water, 10 mL of 10x M9 salts, and 100 µL of 25 mg/mL chloramphenicol (CAM) were added.
Adenine, IPTG, thiamine-HCl, and 3-AT aliquots were used once, and any remainder was discarded.
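The working concentration of each component follows from C_final = C_stock × V_added / V_total. The Python sketch below applies this to the recipe above, using the nominal 100 mL batch volume as the divisor; note that the listed volumes sum to about 99.6 mL, so the computed values are approximate to within roughly 0.4%. Units are converted (1 M to 1000 mM, 25 mg/mL to 25,000 µg/mL) only for readability.

```python
TOTAL_ML = 100.0  # nominal batch volume stated in the text
WATER_ML = 76.0   # autoclaved water added

# component -> (stock concentration, unit, volume added in mL), from the recipe
COMPONENTS = {
    "glucose": (20.0, "%", 2.0),
    "adenine HCl": (20.0, "mM", 1.0),
    "His-dropout mix": (10.0, "x", 10.0),
    "MgSO4": (1000.0, "mM", 0.1),
    "thiamine-HCl": (1000.0, "mM", 0.1),
    "ZnSO4": (10.0, "mM", 0.1),
    "CaCl2": (100.0, "mM", 0.1),
    "IPTG": (50.0, "mM", 0.1),
    "M9 salts": (10.0, "x", 10.0),
    "chloramphenicol": (25_000.0, "ug/mL", 0.1),
}


def final_concentration(stock: float, volume_ml: float,
                        total_ml: float = TOTAL_ML) -> float:
    """C_final = C_stock * V_added / V_total."""
    return stock * volume_ml / total_ml


for name, (stock, unit, vol) in COMPONENTS.items():
    print(f"{name}: {final_concentration(stock, vol):g} {unit}")

added_ml = WATER_ML + sum(vol for _, _, vol in COMPONENTS.values())
print(f"listed volumes sum to {added_ml:.1f} mL")
```

For example, 2 mL of 20% glucose gives 0.4% glucose and 100 µL of 25 mg/mL chloramphenicol gives 25 µg/mL in the final medium.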

Analysis of significant interactions
Greater growth in the presence of 2 mM and 5 mM 3-AT than at 0 mM indicates a potential protein-protein interaction. For each comparison, the statistical significance of differential growth was evaluated from three replicates per growth condition at a positive predictive value of 95% (FDR < 0.05) using DESeq2 13 . For each comparison, only protein fragment pairs observed in at least 4 of the 6 samples were included in the differential growth analysis. DESeq2 performs an internal normalization step in which the geometric mean of each row is calculated across all samples, and the counts in each sample are then divided by this geometric mean. The median of these ratios within a sample is used as that sample's size factor to correct for library size and composition bias. Rows containing count outliers are automatically removed using Cook's distance. In addition, an optimization procedure removes fragment pairs with low counts by filtering out rows whose mean of normalized counts falls below an automatically determined threshold. Finally, a negative binomial generalized linear model is fitted to determine differential growth, and the Wald test yields a p-value and an adjusted p-value (FDR) for each protein fragment pair. Only fragment pairs with a positive log2 fold change and an FDR < 0.05 in the presence of 3-AT relative to 0 mM 3-AT were deemed significantly interacting.

The optical density (OD600) was normalized to 0 mM growth after 9 hours of expression.

Tables

Table 1. Significant interaction pairs of ORF-filtered fragments, 2 mM vs 0 mM 3-AT.
Significantly interacting ORF-filtered fragment pairs are listed as gene name of protein 1:starting amino acid of its fragment:gene name of protein 2:starting amino acid of its fragment. The first protein in each fragment pair was fused to the DBD and the second to the AD.
Only fragment pairs that show a positive log2 fold change with a p-adjusted (FDR) < 0.05 in presence of 2 mM 3-AT when compared to 0 mM 3-AT were deemed as significantly interacting.