Application of the “Codon-shuffling” Method

Library-based approaches to non-rational and part-rational de novo peptide design are promising routes in the search for bioactive peptides and proteins of medicinal importance. In this report, we have used a recently developed directed-evolution method called “codon shuffling” for the synthesis and selection of bioactive proteins. Selection relied on an inducible library of “codon-shuffled” genes, constructed by ligation-based assembly of judiciously designed hexameric DNA duplexes called dicodons. Upon induction with isopropyl 1-thio-β-D-galactopyranoside, some library members expressed dicodon-incorporated proteins, and as a result the host cells, in our case Escherichia coli, were unable to grow any further. The bacteriostatic/lytic nature of the dicodon proteins was monitored by growth curves as well as by zone-clearance studies. Transmission electron microscopy of the affected cells illustrated the extent of cell damage. The proteins themselves were overexpressed as fusion partners and subsequently purified to homogeneity. One such purified protein was found to bind heparin strongly, an indication that the de novo proteins may interact with the nucleic acids of the host cell, much like many naturally occurring antibacterial peptides, e.g. buforin. Our approach may therefore help in generating a multitude of finely tuned antibacterial proteins that could be regarded as lead compounds once the method is extended to pathogenic hosts such as mycobacteria.

The objective of recasting proteins, or constructing them from scratch, to obtain entirely new entities that can carry out a preordained catalytic or structural function is currently an area of intense research. That small proteins or peptides can be constructed from the first principles of secondary structure design has been known for some time, largely through seminal contributions of DeGrado and co-workers (1) and others (2,3). This so-called de novo protein design is the rational approach toward the creation of functional proteins, and it is perhaps only now, as its once-prohibitive computational requirements are being met, that an increasing number of functional peptides and proteins are being successfully generated de novo (4,5). On the other hand, non-rational and part-rational approaches toward creating de novo proteins for specific purposes appear to be considerably more powerful at generating functional proteins, especially for in vivo biological requirements (6,7). Methods such as phage display (8) and mRNA display (9) increasingly allow one to generate and then pan non-rational peptide and protein libraries to select the right protein. Where an effective assay-based selection system is available, part-rational and non-rational approaches may be more effective than a rational approach in isolating a functional protein, largely because they mimic evolutionary selection (of the fittest). In this context, two studies deserve special attention.
A little more than a decade ago, Hecht and co-workers (10) put forward a novel hypothesis: the binary patterning of proteins. According to this simple concept, it is the binary pattern of polar (○) and non-polar (●) amino acids in a designed protein, and not the exact identity of those amino acids, that eventually dictates its secondary structure. Binary patterning may be viewed as a part-rational approach, whereby one can predetermine the folds of the library but not the exact folds of its individual members. Indeed, de novo protein libraries based on binary patterning have furnished many functional as well as structural proteins (11,12), although the lack of a suitable assay system and the fixed length of library members have yet to be addressed satisfactorily. The second seminal study, by Mekalanos and co-workers (13), involved the generation of non-rational peptide libraries from degenerate oligonucleotides. Their work elegantly demonstrated that an inducible non-rational DNA library, once translated in vivo, could lead to the selection of a functional peptide. This so-called ABBIS approach was used successfully to isolate a number of peptides that, upon overexpression in the host cell, severely attenuated host growth. The ABBIS method also has limitations, arising primarily from its use of fixed-length degenerate oligonucleotides as the starting point for making a peptide library. Nonetheless, it remains a powerful example of a non-rational approach toward the de novo synthesis of useful proteins, in their case antimicrobial peptides (AMPs). There is now an urgent need for newer, more potent AMPs, given their importance in the context of widespread antibiotic resistance among human pathogens (14-16). AMPs have been isolated from a variety of species, including around 40 from humans (15,17).
Thus far, around 700 AMPs have been isolated and characterized (18), and many of them are in advanced clinical trials as drug candidates against bacterial and fungal infections (15,19). A general characteristic of AMPs is a net positive charge, believed to be crucial for gaining entry through the negatively charged bacterial outer membrane, although some anionic AMPs have lately also been isolated (20,21). Accordingly, many models of bacterial entry have been postulated (17,22). However, recent DNA-microarray studies of bacteria exposed to AMPs suggest that cell wall lysis may not be the sole mode of AMP action, given that the expression of a host of cytoplasmic genes is severely affected upon AMP entry (23,24). Notwithstanding the uncertainty surrounding their mode of action, AMPs clearly represent a class of molecules that may in the near future provide an effective alternative to currently used antibiotics (14). We therefore believe that an approach that yields a diverse pool of AMPs, and that addresses the problems of earlier AMP library methods noted above, would be of considerable help.
In this report, we describe one such approach that is based on the application of a directed-evolution technique that we recently developed called codon shuffling (25). We also describe how, through codon shuffling, libraries are generated that: (a) can be both non-rational as well as part-rational; (b) have members large enough to be classified as proteins and not just peptides; (c) are not restricted by limits of length of their corresponding genes; and (d) can be preferentially skewed in amino acid attributes like charge and hydrophobicity, thereby narrowing down the search for a successful AMP.

EXPERIMENTAL PROCEDURES
Construction of Expression Plasmid pDNp28-An unrelated 875-bp DNA fragment from the plasmid pET21c was PCR-amplified using Pfu polymerase and the oligonucleotides 5′-aaccatgggttacgtacatttccgtgtcgccctt-3′ and 5′-aactcgagtacgtaccaatgcttaatcagtgaggc-3′ as forward and reverse primers, respectively. The PCR product was cloned in pBluescriptSRF vector, and the fragment was excised using NcoI and XhoI restriction enzymes and cloned into pET28a vector previously cut with the same two enzymes. The resulting plasmid was isolated and digested with SnaBI to yield the expression plasmid pDNp28. Plasmid pDNp28 carries the SnaBI site into which dicodon (DC) fragment libraries can be inserted. The site is immediately downstream of the ATG start codon and just upstream of the His6 tag.
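As a quick sanity check of the cloning design, the restriction sites carried by the two pDNp28 primers can be located programmatically. The sketch below is our illustration, not part of the original protocol; it uses the standard recognition sequences of NcoI (CCATGG), SnaBI (TACGTA), and XhoI (CTCGAG), and the helper name `find_sites` is hypothetical.

```python
import re

# Standard recognition sequences (5'->3') of the enzymes named in the protocol.
RE_SITES = {
    "NcoI": "ccatgg",
    "SnaBI": "tacgta",
    "XhoI": "ctcgag",
}

# Primers used to build pDNp28 (copied from the text).
FWD = "aaccatgggttacgtacatttccgtgtcgccctt"
REV = "aactcgagtacgtaccaatgcttaatcagtgaggc"


def find_sites(seq: str, sites: dict = RE_SITES) -> dict:
    """Return 0-based start positions of each recognition sequence in seq.

    All three sites are palindromic, so scanning a single strand finds them.
    A zero-width lookahead is used so that overlapping matches are reported.
    """
    seq = seq.lower()
    return {
        name: [m.start() for m in re.finditer(f"(?={site})", seq)]
        for name, site in sites.items()
    }


for label, primer in (("forward", FWD), ("reverse", REV)):
    print(label, find_sites(primer))
```

In this reading, the forward primer places an NcoI site (containing the ATG) just ahead of a SnaBI site, and the reverse primer places XhoI just ahead of SnaBI, which fits the NcoI/XhoI subcloning and the later SnaBI digestion described above.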
Construction of ds Hairpin-encapsulated DC Fragment (HPDN) Library-For the initial part of library construction, the protocol was as described earlier (25), with one variation. Immediately after the 14 DCs were mixed, the solution was adjusted to 7.5% polyethylene glycol, and the DC mixture was incubated at 4°C for 24 h. To this mixture was added 100 pmol of a 5′-phosphorylated, PAGE-purified ds hairpin (5′-tttaaacacgtggcggccgctctagaggcccgcgcgggcctctagagcggccgccacgtgtttaaa-3′) that had been self-annealed beforehand. The ligation temperature was raised to 16°C, and incubation was continued for another 12 h. The DNA in the ligation mixture was precipitated and extracted once with phenol/chloroform. The resuspended DNA was digested with XbaI for 4 h at 37°C, after which 1 µl of the digested DNA was used as template for PCR with the 5′-phosphorylated oligonucleotide 5′-agcggccgccacgtgtttaaa-3′, which served as both forward and reverse primer. The PCR products were eluted using a DEAE membrane and fractionated by length (50-400 bp). The purified fragments were used directly as inserts for the creation of de novo libraries.
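The hairpin and primer sequences given above can be checked with a short script. Our reading of the sequences (an interpretation, not stated in the text) is that the hairpin is its own reverse complement, consistent with its being self-annealed, and that the single primer matches the hairpin's 3′ end while the primer's reverse complement forms its 5′ end, which is why one oligonucleotide can act as both forward and reverse primer. The enzyme names attached to the embedded sites (DraI, PmlI, NotI, XbaI) are our identification from their standard recognition sequences.

```python
HAIRPIN = "tttaaacacgtggcggccgctctagaggcccgcgcgggcctctagagcggccgccacgtgtttaaa"
PRIMER = "agcggccgccacgtgtttaaa"

# Recognition sequences we identified inside the hairpin (our reading).
HAIRPIN_SITES = {"DraI": "tttaaa", "PmlI": "cacgtg", "NotI": "gcggccgc", "XbaI": "tctaga"}

_COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_site(seq: str, site: str) -> int:
    """Number of (possibly overlapping) occurrences of site in seq."""
    k = len(site)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == site)


print("self-complementary:", revcomp(HAIRPIN) == HAIRPIN)
print("primer matches 3' end:", HAIRPIN.endswith(PRIMER))
print("revcomp(primer) matches 5' end:", HAIRPIN.startswith(revcomp(PRIMER)))
print({name: count_site(HAIRPIN, s) for name, s in HAIRPIN_SITES.items()})
```

Each site appears twice, once in each half, as expected for a palindromic sequence.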
Construction of Expression Plasmid pTEMDNp28-A 420-bp DNA fragment from the TEM-1 gene of plasmid pSC1 (25) was PCR-amplified using Pfu polymerase and the oligonucleotides 5′-aagcatgcaaggagatggcgcccaacagtccc-3′ and 5′-cctacgtatgcacccaactgatcttcagcatc-3′ as forward and reverse primers, respectively. The PCR product was cloned in pBluescriptSRF vector, and the fragment was excised using NcoI and SnaBI restriction enzymes and cloned into pDNp28 vector previously cut with the same two enzymes. The resulting plasmid was designated pTEMDNp28. Plasmid pTEMDNp28 carries the SnaBI site into which DC fragment libraries can be inserted. The site is immediately downstream of the TEM-1 secretion signal sequence and just upstream of the His6 tag.
Heparin Binding Studies-Procedures for construction of the GST-SKAMP1 gene and purification of the corresponding protein are described under supplemental data. Pure GST-SKAMP1 protein (20 µg) was loaded onto a heparin-Sepharose column. The column was washed extensively with buffer C. Elution buffer (buffer C with an NaCl gradient from 250 mM to 1 M) was then applied to the column, and fractions were collected.
Western Blot-The eluted fractions, as well as the heparin resin, were probed with anti-GST antibodies (1:25,000 dilution) for 1 h. The nitrocellulose membrane was then washed four times (5 min each) with 1× phosphate-buffered saline plus Tween 20. Secondary probing was done with alkaline phosphatase (AP)-conjugated anti-rabbit IgG (heavy and light chain) antibodies (1:5000 dilution) for 1 h. After the membrane was washed with 1× phosphate-buffered saline plus Tween 20, the blot was developed with 5-bromo-4-chloro-3-indolyl phosphate/nitro blue tetrazolium substrate.

RESULTS AND DISCUSSION
Exploring the Possibility of Using Codon Shuffling for Constructing de Novo Protein Libraries-We initially conceived the codon-shuffling method as a directed-evolution technique. However, because the method relies not on selecting progeny after random mutations have been induced in the parent gene, as in gene shuffling (26), but on extending a truncated parent gene with new DNA fragments constructed de novo, codon shuffling can also be viewed as a method for creating stand-alone new DNA fragments. Codon shuffling fundamentally entails the random assembly of 6-bp DNA duplexes (called DCs) drawn from a set of 14 dicodons (Table I), each chosen to encode an amino acid pair such that, in an equimolar dicodon pool, all 20 natural amino acids are represented in proportions that mirror their usage in Escherichia coli. Furthermore, there is no restriction on the lengths of the resulting DNA fragments. Consequently, the diversity achievable by ligating dicodons into DNA fragments ~250-300 bp long is astronomically high, far beyond what may be required to sample all possible protein folds. At first glance, therefore, codon shuffling represents a useful method for generating de novo protein libraries.
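The size of this sequence space follows from simple arithmetic. Assuming (our simplification) that each 6-bp dicodon is equally likely at every position and ligates in a single orientation, a fragment of n dicodons has 14^n possible dicodon orders; hairpin flanks and fragments whose length is not a multiple of 6 bp are ignored. The dicodon labels below are hypothetical placeholders, since Table I is not reproduced here.

```python
import math
import random

N_DICODONS = 14  # size of the dicodon set (Table I)
DC_BP = 6        # each dicodon is a 6-bp duplex

# Placeholder labels; the actual dicodon sequences are listed in Table I.
DC_LABELS = [f"DC{i}" for i in range(1, N_DICODONS + 1)]


def n_orders(n_dc: int) -> int:
    """Distinct dicodon orders for a fragment of n_dc dicodons (14**n_dc)."""
    return N_DICODONS ** n_dc


def log10_orders(fragment_bp: int) -> float:
    """log10 of the number of dicodon orders for a fragment of the given length."""
    return (fragment_bp // DC_BP) * math.log10(N_DICODONS)


def random_fragment(n_dc: int, rng: random.Random) -> list:
    """One simulated equimolar assembly: each position drawn uniformly."""
    return rng.choices(DC_LABELS, k=n_dc)


for bp in (252, 300):
    print(f"{bp} bp: about 10^{log10_orders(bp):.1f} dicodon orders")
print(random_fragment(5, random.Random(0)))
```

For a 252-bp fragment (42 dicodons) this gives roughly 10^48 possible orders, many orders of magnitude beyond the ~10³-10⁴ transformants obtained per library in this work. Note that n_orders(2) equals 196, the number of DC-DC motifs searched later.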
Here, it is pertinent to mention one disadvantage of peptide libraries made from degenerate oligonucleotides. In addition to the problems of length restriction (the upper limit being 100 nt on average, resulting in 30-40-amino acid-long peptides) and the random appearance of stop codons, such libraries yield few folded peptides, which severely limits their usefulness (27,28). One reason why few peptides from such libraries fold properly may be the completely non-rational nature of their parent DNA fragments, which can produce a peptide containing a string of polar (○○○○○○ . . . ) or non-polar (●●●●●● . . . ) residues rather than an arrangement of such residues that promotes a secondary structure feature. Indeed, by predetermining the degeneracy of the parent oligonucleotides such that the resulting peptides possess defined polar/non-polar arrangements, Hecht and co-workers (10,29) were able to generate a library of peptides in which most members were well folded. Consequently, the library in such situations need not be very large to begin with, much as in codon shuffling. One possible explanation in the codon-shuffling context is that the 14 dicodons themselves possess a range of inherent secondary structure-forming attributes because of the manner in which their residues are paired, such as polar/non-polar, non-polar/polar, non-polar/turn, non-polar/non-polar, and other such combinations (Table I). For example, Glu-Leu or Asp-Ile repeats yield a binary pattern (○●○●○●○●○●○●) that represents a β-sheet structure, whereas the string Met-His-Asp-Ile-Met-His-Glu-Leu-Ser-Thr yields a binary pattern (●○○●●○○●○○) that would preferentially form an α-helix. Additionally, the dicodons Trp-Pro, Pro-Gly, or Gly-Ala may induce turns or breaks, whereas Cys-Ala dicodons scattered within the DC protein sequence may create intramolecular or intermolecular disulfide bridges.
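The binary patterns in the examples above can be derived mechanically from a sequence. The sketch below writes polar as "p" and non-polar as "n" and takes the non-polar set from the Fig. 1 search definition; treating every other residue (including Thr, Trp, Pro, and Tyr) as polar is our simplifying assumption, and the strict-alternation test is only a crude stand-in for the β-sheet pattern, not a structure predictor.

```python
# Non-polar ("n") set as defined for the Fig. 1 search; all other residues are
# treated as polar ("p"). Lumping e.g. Thr, Trp, Pro, Tyr into "p" is our simplification.
NONPOLAR = set("LVIFMGA")


def binary_pattern(seq: str) -> str:
    """Map a one-letter protein sequence to its polar/non-polar pattern."""
    return "".join("n" if aa in NONPOLAR else "p" for aa in seq.upper())


def strictly_alternating(pattern: str) -> bool:
    """Crude flag for the p/n alternation associated with beta-strands."""
    return len(pattern) > 1 and all(a != b for a, b in zip(pattern, pattern[1:]))


for name, seq in (("(Glu-Leu)3", "ELELEL"), ("MHDIMHELST", "MHDIMHELST")):
    pat = binary_pattern(seq)
    print(name, pat, "alternating" if strictly_alternating(pat) else "not alternating")
```

The Met-His-Asp-Ile-Met-His-Glu-Leu-Ser-Thr example maps to "nppnnppnpp", the same pattern quoted in the text.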
To investigate this hypothesis further, we searched all proteins in the Protein Data Bank for the locations of dicodon pairs within protein structures. Each search string was a dicodon-dicodon pair (DC-DC), amounting to a total of 196 DC-DC motifs (14 DC × 14 DC). The DC preferences for particular secondary structure elements are listed in Table I. Predictably, the proline-containing DCs (Trp-Pro and Pro-Gly) were found preferentially in turn elements, whereas DCs formed from amino acid pairs of high α-helix-forming propensity, such as Phe-Glu, were found embedded in α-helices. Notably, only 11% of DC-DC pairs were found in unstructured and random coil elements. This may indicate that a de novo DC protein library would contain proteins that possess secondary structure elements and are therefore well folded, unlike proteins from a degenerate oligonucleotide library. To test our hypothesis, we cloned codon-shuffled genes in specially designed vectors (see "Experimental Procedures") using the equimolar DC protocol described previously (25). Two representative clones were sequenced and shown to consist wholly of DC sequences (Supplemental Fig. S1A). When overexpressed in E. coli, the clones yielded DC proteins that were found preferentially in the insoluble fractions. Nonetheless, because expression was reasonably good, the proteins could be routinely purified by metal-affinity chromatography in 8 M urea buffer. After dialysis, one of the proteins remained soluble in as little as 300 mM urea (Supplemental Fig. S1B).
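The DC-DC motif search can be sketched as a scan of sequences paired with per-residue secondary-structure labels. This is a minimal sketch under stated assumptions: the dicodon list holds only the nine dipeptides named in the text (Table I lists all 14), structure labels are assumed to be DSSP-style H/E/C strings, each motif is assigned the majority label over its four residues, and the toy input is invented for illustration rather than taken from the PDB.

```python
from collections import Counter
from itertools import product

# Partial, illustrative DC list: only the nine dipeptides named in the text.
# With all 14 dicodons of Table I there are 14 x 14 = 196 DC-DC motifs.
DICODONS = ["EL", "DI", "MH", "ST", "WP", "PG", "GA", "CA", "FE"]


def dc_dc_motifs(dicodons):
    """Map each 4-residue DC-DC motif to its (first DC, second DC) pair."""
    return {a + b: (a, b) for a, b in product(dicodons, repeat=2)}


def tally_motifs(seq, ss, dicodons=DICODONS):
    """Count, per DC-DC motif found in seq, the majority structure label.

    ss is a per-residue secondary-structure string of the same length as seq
    (assumed H/E/C alphabet). Ties go to the label seen first in the window.
    """
    if len(seq) != len(ss):
        raise ValueError("sequence and structure strings differ in length")
    motifs = dc_dc_motifs(dicodons)
    hits = {}
    for i in range(len(seq) - 3):
        pair = motifs.get(seq[i:i + 4])
        if pair is not None:
            label = Counter(ss[i:i + 4]).most_common(1)[0][0]
            hits.setdefault(pair, Counter())[label] += 1
    return hits


# Toy input, invented for illustration (not PDB data).
toy_seq = "MKFEELAKWPGA"
toy_ss = "CHHHHHHCCCCC"
print(len(dc_dc_motifs(DICODONS)))
print(tally_motifs(toy_seq, toy_ss))
```

Summing such tallies over many structures would give per-motif preferences of the kind summarized in Table I; the majority-label rule is our choice, and the original search criteria may differ.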
These results indicated that de novo DC protein libraries were capable of generating reasonably large quantities of proteins of the expected sizes and that a subsequent selection system could fish out desirable proteins. However, one caveat had to be addressed before codon shuffling could be used efficiently to construct de novo protein libraries.
Because codon-shuffled DNA fragments were ligated directly into expression vectors for translation, and only a single copy of each ds DNA fragment (i.e. insert fragment) was available, not every fragment could be represented in the DC library. In other words, because it was impossible to clone each and every ds DNA fragment experimentally, the available complexity of the DC library was compromised.
Use of ds DNA Hairpins for Amplification of Codon-shuffled Fragments-To address this problem, we envisaged a ds DNA hairpin that would first cap the two exposed ends of the DC-assembled DNA fragment and then serve as a template for fragment amplification, using oligonucleotides complementary to the hairpin sequence (30). Moreover, in addition to its primary role in amplifying DC fragments, the ds hairpin could also serve as a scaffold that encapsulates DC proteins. Protein scaffolds have previously been employed to constrain randomly generated de novo peptides, in the hope that the peptides would fold better once their N and C termini are no longer free (31,32). One such scaffold, the thioredoxin (trx) protein, has been employed on numerous occasions (13,33,34). However, whether the peptides are as effective once removed from their scaffolds or, more worryingly, whether the de novo peptide activity is partly due to the scaffold itself remain largely unaddressed issues. One can nevertheless reduce the deleterious effects of a scaffold simply by keeping its size to a minimum, since a large protein scaffold may force the de novo protein into a shape that would be altogether different without the scaffold. Therefore, in designing the ds hairpin, we focused on our initial goal: to isolate proteins that act as antibacterials.
A significant number of isolated AMPs are α-helical, and many that are not become so once they interact with bacterial membranes (15,35). It would therefore be helpful to construct AMPs on an α-helical scaffold. As a starting point, we envisaged that the ds hairpin should code for a string of 6-7 amino acid residues. After a preliminary exploration of binary patterns for the scaffold, we settled on the pattern shown in Fig. 1. The binary pattern ○●○○●●○, along with its "antisense" pattern ●○○●●○●, is predicted to form an α-helix according to the Hecht hypothesis (10). We then searched the Protein Data Bank to document the occurrence of these two patterns in the various fold elements. Interestingly, both patterns show a marked preference for α-helices, supporting the practicability of Hecht binary patterning for de novo peptide design (Fig. 1). As a next step, we embedded several unique restriction enzyme (RE) sites (none of which appears in any possible DC fragment) within the ds hairpin sequence, such that they would not disturb the binary pattern of the translated hairpin (Fig. 1B). The final designed hairpin (shown in Fig. 1) codes for two sequences, "SGRHVFK" and "FKHVAAA", depending on which hairpin strand is translated. Importantly, both sequences conform to an α-helix-forming Hecht pattern.
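The two hairpin-encoded peptides can be recovered by translating the ends of the hairpin sequence given under "Experimental Procedures". In this sketch we assume that each 21-nt arm is read in the frame beginning at its first nucleotide; the codon table is the standard genetic code, built from the usual TCAG-ordered amino acid string.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

HAIRPIN = "tttaaacacgtggcggccgctctagaggcccgcgcgggcctctagagcggccgccacgtgtttaaa"
ARM_NT = 21  # seven codons per hairpin arm (assumed reading frame)


def translate(dna: str) -> str:
    """Translate complete codons from position 0; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


print(translate(HAIRPIN[:ARM_NT]))   # 5' arm
print(translate(HAIRPIN[-ARM_NT:]))  # 3' arm
```

The 5′ arm reads FKHVAAA and the 3′ arm (identical to the PCR primer) reads SGRHVFK, the two sequences quoted above.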
Construction of de Novo DC Protein Libraries with the Hairpin Scaffold-Our next objective was to investigate experimentally the merit of using the ds hairpin both for amplifying DC fragments and for providing a scaffold for the DC proteins. As a starting point, we added a molar excess of ds hairpin to an equimolar DC pool (see "Experimental Procedures" for details). This was followed by amplification of the encapsulated DC fragments with a 5′-phosphorylated oligonucleotide that was complementary to the ds hairpin and could therefore act as both forward and reverse primer. The PCR products were separated by length (50-400 bp) and ligated directly into SnaBI-cut expression vector pDNp28. Upon transformation of E. coli BL21(DE3) followed by selection on kanamycin, we obtained on the order of 10⁴ colonies. We sequenced some representative clones and found that all of them indeed contained DC sequences flanked on either end by the ds hairpin.
FIG. 1. A, search-string sequences denoted by the drawn Hecht patterns that are predicted to form α-helical secondary structure elements. The Protein Data Bank was searched using regular expressions, wherein n represents any of the non-polar residues Leu, Val, Ile, Phe, Met, Gly, and Ala, and p represents any of the polar residues Asp, Glu, Lys, His, Arg, Ser, and Cys. The search was carried out on the Medical Research Council server. Boldface numbers within parentheses give the percentages of particular search strings found within helices. B, illustration of the ds hairpin design and its use as a scaffold for DC proteins. Unique restriction enzyme sites that cannot appear within any possible DC fragment were incorporated within the hairpin sequence, as shown. The two strands of the annealed hairpin are shown in normal and boldface type, as is the corresponding translated amino acid hairpin sequence.
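The regular-expression search described in the Fig. 1 caption can be sketched locally: each position of a binary pattern becomes a character class built from the caption's n and p residue sets. This does not reproduce the MRC server query; it only shows the pattern-to-regex step, and the function names are hypothetical. The hairpin pattern is written here as "pnppnnp" (polar = p, non-polar = n) and its antisense pattern as "nppnnpn".

```python
import re

# Residue classes exactly as defined in the Fig. 1 caption; residues in neither
# class (e.g. Thr, Trp, Pro) cannot match either position type.
CLASSES = {"n": "LVIFMGA", "p": "DEKHRSC"}


def pattern_to_regex(pattern: str) -> str:
    """Turn a Hecht-style pattern such as 'pnppnnp' into a regular expression."""
    return "".join(f"[{CLASSES[c]}]" for c in pattern)


def find_pattern(seq: str, pattern: str) -> list:
    """0-based start positions of (possibly overlapping) matches in seq."""
    regex = pattern_to_regex(pattern)
    return [m.start() for m in re.finditer(f"(?={regex})", seq.upper())]


SENSE = "pnppnnp"      # hairpin pattern
ANTISENSE = "nppnnpn"  # its "antisense" pattern
print(pattern_to_regex(SENSE))
print(find_pattern("AAASGRHVFKAA", SENSE))
```

Counting how many matches fall inside helix-annotated residues, over a structure database, would give percentages like those shown in Fig. 1A.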
The corresponding DC protein library showed little sequence similarity among its members and varied widely in length and primary sequence attributes (Fig. 2A). To test protein expression levels in our DC library, we induced growing cultures of a representative clone, HPDN5, with IPTG. We observed overexpression of HPDN5 protein and were subsequently able to purify it by Ni-NTA chromatography (Fig. 2B). These results indicated that generating DC proteins from a ds hairpin-encapsulated DC fragment library is a viable route toward de novo proteins that are expressed at good levels from their genes and can therefore be purified easily.
Finally, we wanted to explore a case where the DC protein is encapsulated by the traditional, often employed trx scaffold. We therefore ligated the hairpin-encapsulated DC fragment library into an expression vector (pTRXDNp21) carrying a modified trx gene in which a unique SnaBI site had been positioned within the region between the two cysteines (see supplemental data for details). The DNA of many of the resulting clones was sequenced and found to contain hairpin-encapsulated DC fragment sequences positioned, as expected, within the SnaBI site (Supplemental Fig. S2A). One representative member of the trx-DC library was analyzed further for protein expression. Upon induction with IPTG, the protein trxHPDN3 was strongly overexpressed. However, most of the protein was in the insoluble fractions, although it could be purified to homogeneity under denaturing conditions (8 M urea) by Ni-NTA chromatography (Supplemental Fig. S2B). After dialysis, the protein remained soluble in as little as 150 mM urea. Circular dichroism studies of trxHPDN3 and of native trx protein showed that the chimera had a secondary structure composition (helix, 24.1%; sheet, 10.0%; turn, 34.4%; random, 31.4%) much like that of the parent trx protein (helix, 21.5%; sheet, 14.3%; turn, 31.3%; random, 32.8%; Supplemental Fig. S2C). Without more incisive techniques such as x-ray crystallography or NMR, it is not possible to say whether the DC protein has folded outside the trx scaffold or whether the CD data reflect a single protein entity. Further work in this direction is currently under way in our laboratory.
In summary, we have investigated the usefulness of DC proteins as candidates for de novo protein libraries. We have also found that it is worthwhile to encapsulate the DC proteins using scaffolds. Of the two scaffolds that we used for this purpose, we found the trx scaffold to yield chimeras that were insoluble and thus of little use in vivo.
Design and Synthesis of de Novo Antibacterial Proteins-Having put in place a method for the design, synthesis, and selection of de novo proteins with two different scaffolds, we now focused our efforts on the synthesis of de novo proteins that would act as antibacterials. Our strategy was to select for AMPs in vivo by identifying transformed bacterial colonies that were unable to grow in the presence of IPTG but grew robustly in its absence. Additionally, we decided to tether the DC proteins to an N-terminal signal sequence, for the following reason. Because the AMP site of action could be the cytoplasm, the periplasm, or indeed the extracellular environment of the host, in our case E. coli, we felt it important not to exclude, through the method of selection, DC proteins acting at any of these three sites. For example, without an N-terminal signal, all DC proteins would be restricted to the cytoplasm of the cell. On the other hand, if the ompA signal sequence were chosen as the tether, all DC proteins would be directed toward the extracellular environment of the host cell, and AMPs with a cytoplasmic or periplasmic site of action would not be selected.
Keeping this in mind, we decided to employ the N-terminal signal sequence of TEM-1 β-lactamase as the AMP tether. We and others (25,36) have shown that, under induction conditions, the TEM-1 signal sequence directs the β-lactamase protein to all three regions of the bacterial cell (the cytoplasm, the periplasm, and the extracellular environment) in an approximate ratio of 70:25:5. By choosing the TEM-1 signal, we would therefore not restrict the site of action of de novo AMPs to a particular cellular environment. The expression vector that would accept the incoming hairpin-encapsulated DC fragments was redesigned to include the segment of the TEM-1 gene that codes for the 23-amino acid-long N-terminal signal (see "Experimental Procedures"). This vector (Supplemental Fig. S3, pTEMDNp28) was digested with SnaBI and ligated with an equimolar hairpin-encapsulated DC mixture. The ligation mixture was used to transform competent E. coli BL21(DE3) cells, and the cells were plated on medium containing kanamycin (the vector marker; plate I). We obtained a library size of ~10³ colony-forming units; the size varied between repeat experiments, presumably because of differences in ligation and transformation efficiencies. The colonies were then replica-plated onto plates that, in addition to kanamycin, also contained 0.5 mM IPTG (plate II). Although the majority of colonies grew robustly, we isolated four plate I colonies that showed no growth on plate II (kanamycin + IPTG). Sequencing of DNA isolated from these colonies showed that all of them contained hairpin-DC sequences. Predicted translation products as well as physical properties of the three unique sequences are shown in Fig. 3. The three proteins, named EQAMP1, EQAMP2, and EQAMP3, vary widely in length and pI.
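The replica-plating readout amounts to a set difference: candidate AMP clones are those present on plate I but absent from plate II. The sketch below assumes colonies have been scored by grid position; the colony IDs are hypothetical and do not come from the actual screen.

```python
def inducer_sensitive(plate_i: set, plate_ii: set) -> list:
    """Colonies present on plate I (kanamycin) but absent from plate II (kanamycin + IPTG).

    Colonies seen only on plate II are ignored; they have no plate I counterpart.
    """
    return sorted(plate_i - plate_ii)


# Hypothetical grid positions, for illustration only (not the actual screen).
plate_I = {"A1", "A2", "A3", "B1", "B2"}
plate_II = {"A1", "A3", "B1"}
print(inducer_sensitive(plate_I, plate_II))
```

In practice, each candidate would then be retested, as in the confirmation checks described in the following section, because a colony can also fail to transfer during replica plating.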
As a control, DNA from a colony that grew well on both plates I and II was also sequenced (cEQ1). Consensus secondary structure predictions for all four proteins revealed a dramatic difference between the three EQAMP proteins and the control protein (Fig. 3): whereas EQAMP1-3 were predicted to be rich in α-helical content, the control protein cEQ1 was predicted to be rich in random coil. This contrast between the three EQAMP proteins and the control prompted us to revisit the initial HPDN DC protein library (Fig. 2). Indeed, all HPDN1-8 proteins were predicted to have a random coil content >55%. It is therefore tempting to suggest that the in vivo AMP behavior of the EQAMP proteins is linked to their pronounced predicted helical character, which contrasts with the predicted fold characteristics of non-selected DC proteins.
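The coil-content comparison can be expressed as a simple composition count over a per-residue prediction string. This sketch assumes a three-state H/E/C alphabet (helix, strand, coil) and uses the 55% coil threshold from the text; the example strings are hypothetical, and any other letters count toward the length but are not reported.

```python
from collections import Counter


def ss_composition(prediction: str) -> dict:
    """Fraction of helix (H), strand (E) and coil (C) in a per-residue prediction."""
    if not prediction:
        raise ValueError("empty prediction")
    counts = Counter(prediction.upper())
    return {state: counts[state] / len(prediction) for state in "HEC"}


def coil_rich(prediction: str, threshold: float = 0.55) -> bool:
    """True when the predicted random-coil fraction exceeds the threshold."""
    return ss_composition(prediction)["C"] > threshold


# Hypothetical prediction strings, for illustration only.
for pred in ("HHHHHHCCCCEE", "CCCCCCCHHHEC"):
    print(pred, ss_composition(pred), coil_rich(pred))
```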
Confirmation of AMP Behavior of EQAMP Proteins-To confirm beyond reasonable doubt that the in vivo AMP activity we observed was due to the DC proteins and nothing else, we carried out the following experimental checks. (a) Plasmid DNA from the strains exhibiting growth inhibition upon addition of IPTG was isolated, purified, and used to transform a fresh batch of competent E. coli cells. Adding IPTG to growing cultures of the freshly transformed cells again caused growth inhibition, indicating that growth inhibition was a property of the plasmid DNA used for transformation. (b) The DC fragment within each plasmid was amplified using universal primers that bound only to the vector regions flanking the DC fragments. The amplified products were digested with restriction enzymes that cut within the vector sequences and ligated into freshly cut pET28a vector. The resulting plasmids (which should be identical to the starting plasmids) were tested and all exhibited the same growth inhibition. This ruled out any cause of growth inhibition (such as contamination) other than the DC fragments themselves. (c) The supernatant of IPTG-induced EQAMP cultures was collected and spread on agarose plates carrying E. coli DH5α lawns, and the plates were incubated for 14-18 h at 37°C. The absence of any plaques ruled out phage-mediated inhibition as the cause of growth inhibition in EQAMP cultures. (d) As a final check, we wanted to confirm that growth inhibition was due to the DC proteins and not to their DC mRNA. For this purpose, the unique NcoI site in each EQAMP plasmid (which also contains the ATG start codon) was cut and filled in using Klenow polymerase, and the plasmids were self-ligated. This introduced a frameshift immediately after the NcoI ATG codon, so that the DC proteins were no longer the translated products, while the mRNA in each case remained identical to the wild-type EQAMP mRNA apart from the 4-nt insertion at the NcoI site. When the resulting plasmids EQAMPnco1-3 were used, no growth inhibition was seen (Supplemental Fig. S4), confirming that the DC protein and not the DC mRNA caused the inhibition.
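The effect of the NcoI fill-in in check (d) can be simulated on a short sequence. NcoI cuts C^CATGG, leaving 5′-CATG overhangs; filling in both ends and religating them blunt duplicates those 4 nt, so CCATGG becomes CCATGCATGG. Because 4 is not a multiple of 3, the reading frame after the retained ATG shifts. The toy sequence is the first 16 nt of the pDNp28 forward primer (NcoI site followed by SnaBI), used here only as an illustration.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
NCOI = "CCATGG"  # cut C^CATGG, leaving 5'-CATG overhangs


def ncoi_fill_in(seq: str) -> str:
    """Simulate NcoI digestion, Klenow fill-in of both ends and blunt self-ligation.

    The left end is filled to ...CCATG and the right end to CATGG..., so rejoining
    them duplicates the 4-nt CATG overhang (CCATGG -> CCATGCATGG).
    """
    seq = seq.upper()
    i = seq.find(NCOI)
    if i < 0:
        raise ValueError("no NcoI site")
    return seq[:i + 5] + seq[i + 1:]


def translate_from_first_atg(seq: str) -> str:
    """Translate from the first ATG until a stop codon or the end of the sequence."""
    seq = seq.upper()
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for j in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[j:j + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


toy = "AACCATGGGTTACGTA"  # first 16 nt of the pDNp28 forward primer
filled = ncoi_fill_in(toy)
print(filled, len(filled) - len(toy))
print(translate_from_first_atg(toy), "->", translate_from_first_atg(filled))
```

The start codon survives, but every codon after it changes, which is why the construct still produces mRNA but no longer encodes the DC protein.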
Monitoring of Growth Inhibition by EQAMPs-The extent of inhibition by EQAMP1-3 was followed directly by growth-inhibition curves. Again, the experiments were carried out alongside extensive controls: the pET28a vector; the scaffold vector pDNp28; the E. coli BL21(DE3) strain; and the scaffold vector containing a dummy gene (in our case trx) whose translated product does not cause growth inhibition. The control curves and those for EQAMP1-3 are shown in Supplemental Fig. S5 and Fig. 4, respectively. Fig. 4 also shows the growth curve for the control clone cEQ1. As can be seen, growth is almost completely inhibited upon IPTG addition in the EQAMP samples, whereas there is little or no difference between the induced and uninduced curves for the controls, including cEQ1. Interestingly, the EQAMP samples appeared to display bacteriostatic rather than bacteriolytic behavior, because small aliquots of induced cultures taken at the end points and used as seed samples to inoculate IPTG-free media grew robustly (results not shown). Identical growth inhibition results were obtained in zone-clearance studies. Small filter paper discs resting on growing lawns of EQAMP1-3 cultures were gently soaked with IPTG, and the plates were incubated for 14-18 h. Clearance zones appeared in the EQAMP1-3 samples but not in the control plates (Supplemental Fig. S6), confirming that the inhibition is inducer-driven and results from DC protein expression.
Identification of DC Proteins Responsible for Growth Inhibition-It is well documented that gratuitous overexpression of a protein can itself lead to severe growth inhibition or bacteriostasis of the host (37). To rule out the possibility that the growth inhibition we observed arose because the EQAMP proteins were being massively overexpressed upon IPTG induction, we looked for the proteins in lysed cultures of EQAMPs as well as of cEQ1. Collecting sufficient biomass for this purpose proved challenging because of the bacteriostatic nature of the expressed DC proteins. Nonetheless, pellets of 1-liter cultures could routinely be lysed, and enough total protein was extracted for SDS-PAGE analysis. We could not detect EQAMP1-3 proteins at their expected sizes by Coomassie Blue staining. In contrast, the control cEQ1 protein was readily visible under the same conditions (with and without the TEM-1 signal; Fig. 5A). However, because all of the DC proteins are expressed as C-terminal His tag fusions, proteins EQAMP1-3 could be detected in Western blots using monoclonal anti-His antibodies (Fig. 5B). The presence of minute amounts of EQAMP proteins at their expected sizes, alongside large amounts of the control cEQ1 protein, provided strong evidence that the growth inhibition of the host was not caused by gratuitous overexpression of DC proteins but was instead a consequence of their antibacterial nature. All efforts to isolate purified EQAMP proteins have so far been unsuccessful. In vitro transcription/translation of EQAMP genes may, however, yield sufficient amounts of the proteins for further investigation of their antibacterial nature and structure, and such efforts are currently underway in our laboratory.
As stated earlier, two major factors, namely charge and tertiary structure, are responsible for the AMP activity of peptides and proteins. The overall charge profiles of the EQAMP proteins gave an inconclusive picture. Whereas EQAMP1 was predicted to be positively charged at physiological pH, EQAMP2 and EQAMP3 had predicted pI values of 6.6 and 6.5, respectively. However, inspection of the charge distribution of the latter two proteins suggested that they may possess a domain structure in which one domain is positively charged (results not shown). In addition, the charge state (negative or positive) at physiological pH of the multiple histidine residues present in all EQAMPs could affect the overall charge of the proteins. As for tertiary structure, consensus secondary structure predictions label all three EQAMP proteins as rich in α-helical elements (Fig. 3). Indeed, if this prediction is assumed to be correct, the helical-wheel projection of EQAMP2, for example, nicely illustrates the partitioning of its residues, with non-polar residues in the interior and polar residues on the exterior of the assumed all-helical bundle (Supplemental Fig. S7). At present, however, these are at best conjectures; an accurate structural picture will require isolation of purified EQAMP proteins and their subsequent structural characterization. As the next best alternative, we isolated representative AMPs (EQAMP2 and SKAMP1) as trx fusion proteins and obtained their CD spectra (Fig. 6). Both fusion proteins displayed a well-folded structure, as indicated by the CD data for EQAMP2 (helix, 31%; sheet, 0%; turn, 38%; random, 31%) and for SKAMP1 (helix, 29%; sheet, 4%; turn, 34%; random, 33%; Fig. 6B).
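Predicted net charge and pI values of the kind discussed above are usually obtained by summing Henderson-Hasselbalch terms over all ionizable groups and locating the pH at which the net charge is zero. The sketch below assumes independent groups and one commonly used set of approximate pKa values (EMBOSS-like); the tool and pKa set behind the values quoted in this report are not stated, so the sketch will not necessarily reproduce them. It also makes the histidine point concrete: with an assumed side-chain pKa of 6.5, a histidine is only about 11% protonated at pH 7.4, so small shifts in its local pKa can noticeably change the net charge of a His-rich protein.

```python
"""Sketch: net charge and pI of a protein sequence (Henderson-Hasselbalch)."""

# ASSUMPTION: approximate EMBOSS-style pKa values; other published sets
# give somewhat different pI estimates for the same sequence.
PKA_POSITIVE = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of an ionizable group carrying its proton at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def net_charge(seq: str, ph: float) -> float:
    """Approximate net charge, treating every group as independent."""
    seq = seq.upper()
    # Termini: one free amino group (+) and one free carboxyl group (-).
    charge = fraction_protonated(PKA_POSITIVE["N_term"], ph)
    charge -= 1.0 - fraction_protonated(PKA_NEGATIVE["C_term"], ph)
    for aa in seq:
        if aa in PKA_POSITIVE:      # K, R, H: positive when protonated
            charge += fraction_protonated(PKA_POSITIVE[aa], ph)
        elif aa in PKA_NEGATIVE:    # D, E, C, Y: negative when deprotonated
            charge -= 1.0 - fraction_protonated(PKA_NEGATIVE[aa], ph)
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH; the net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Fraction of a histidine side chain protonated at pH 7.4 (pKa 6.5 assumed).
print(round(fraction_protonated(PKA_POSITIVE["H"], 7.4), 3))   # 0.112
```

Because the dictionary keys "N_term" and "C_term" are multi-character strings, single residues from the sequence can never match them, so the termini are counted exactly once.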
Direct Visualization of the in Vivo Effect of EQAMPs-Although the growth inhibition experiments had indicated that the EQAMP1-3 proteins were most probably bacteriostatic rather than bacteriolytic, we used TEM to examine the state of the host cells after IPTG induction, with a non-induced culture as a reference control. TEM (Supplemental Fig. S8, A-D) showed that the induced samples had a morphology markedly different from that of the non-induced samples. In addition to disintegration of the cell wall, widespread cytoplasmic contraction was also clearly visible. The latter is generally observed when a membrane protein of the cell is affected or sequestered or when the nucleic acids of the cell are targeted (17). We investigated this phenomenon further by studying the possible association of EQAMPs with heparin (heparan sulfate). It is well known that proteins that bind heparin also bind nucleic acids (38,39). In addition, many AMPs have been shown to bind heparin directly (40). Although attempts to obtain pure EQAMPs had been unsuccessful, we were able to obtain good quantities of soluble EQAMPs as fusion partners of the GST protein. As a representative example of such fusion proteins, we chose to investigate the SKAMP1-GST fusion protein (see below for the synthesis and isolation of SKAMP1). Purified SKAMP1-GST (molecular mass 36.7 kDa) bound heparin over a range of pH conditions, and the fusion protein could be partly eluted with buffer containing 1 M NaCl (Fig. 7). Some of the protein remained bound to the heparin resin even after elution with this buffer, indicating very strong binding. It has previously been shown that GST on its own does not bind heparin (41), a finding we confirmed using purified GST protein as a control (Fig. 7).
These results therefore indicate an association of AMPs with a negatively charged moiety (in this case heparin), suggesting that the cellular target could be either a nucleic acid or a negatively charged membrane protein. Further studies in this direction are ongoing.
Generating AMPs Using Skewed DC Libraries-As mentioned earlier, a unique advantage of codon shuffling over other methods of library generation is that skewed libraries can be generated, in which proteins possess desired physical or chemical attributes owing to the overrepresentation or complete absence of one or more dicodons. We took advantage of this facet of the codon-shuffling method by constructing an HPDN protein library from a DC mixture in which the positively charged dicodons Lys-Leu and Arg-Thr were present at three times the proportion of each of the other 12 dicodons. Following the same method for synthesis and selection of AMPs as described previously, we obtained, from an initial library size of 10², one colony that was unable to grow on plate II (kanamycin + IPTG). The predicted sequence of this protein, designated SKAMP1, is shown in Fig. 3. The protein is highly positively charged (pI 10.1) and is predicted to possess predominantly α-helical elements. Growth inhibition and zone clearance studies with SKAMP1 yielded results similar to those with EQAMPs, indicating that expression of SKAMP1 was also severely deleterious to the host cell (Fig. 4 and Supplemental Fig. S6). Thus, the isolation of an AMP from a skewed DC library is a step toward tailoring protein libraries to meet specific needs. We are currently exploring the creation of "severely skewed" libraries in which the negatively charged dicodons are removed from the DC mixture altogether.
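The effect of skewing on library composition follows from simple proportions. If each of the 12 ordinary dicodons is present at one relative unit and Lys-Leu and Arg-Thr at three units each, every skewed dicodon makes up 3/18 = 1/6 of the mixture, so the two positively charged dicodons together rise from 2/14 (1/7) in an unskewed mixture to 6/18 (1/3). The sketch below assumes, as an idealization, that ligation incorporates dicodons in proportion to their amounts in the mixture; the identities of the other 12 dicodons are not given here, so they appear as hypothetical placeholders `DC1` to `DC12`, and the helper names are illustrative only.

```python
"""Sketch: composition of a skewed dicodon (DC) mixture."""
import random
from fractions import Fraction

SKEWED_DCS = ("Lys-Leu", "Arg-Thr")
# Hypothetical placeholders for the other 12 dicodons in the set.
OTHER_DCS = tuple(f"DC{i}" for i in range(1, 13))


def mixture_weights(skew_factor: int = 3, removed=()) -> dict:
    """Relative amounts: 1 for each ordinary DC and `skew_factor` for each
    skewed DC; DCs listed in `removed` are omitted ("severe" skewing)."""
    weights = {dc: 1 for dc in OTHER_DCS}
    weights.update({dc: skew_factor for dc in SKEWED_DCS})
    for dc in removed:
        weights.pop(dc, None)
    return weights


def expected_fractions(weights: dict) -> dict:
    """Exact share of each DC in the mixture."""
    total = sum(weights.values())
    return {dc: Fraction(w, total) for dc, w in weights.items()}


def sample_insert(weights: dict, n_dicodons: int, rng: random.Random) -> list:
    """Draw one hypothetical insert, assuming DCs are incorporated in
    proportion to their amounts in the mixture."""
    names = list(weights)
    return rng.choices(names, weights=[weights[dc] for dc in names], k=n_dicodons)


shares = expected_fractions(mixture_weights())
print(shares["Lys-Leu"], shares["DC1"])   # 1/6 1/18
print(sample_insert(mixture_weights(), 10, random.Random(0)))
```

A "severely skewed" mixture corresponds to passing the negatively charged dicodons in `removed`; their identities are not listed in this section, so any names used there would be placeholders.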

CONCLUSIONS
In this report, we have described an application of the codon-shuffling method to the creation of useful de novo protein libraries. Building on the initial idea of using the DC set to make de novo proteins, we have streamlined library creation by designing tailored hairpin scaffolds that favorably encapsulate the DC proteins. The method described here yields de novo proteins and not just peptides. Because of this, we believe that the library members are much less susceptible to proteolysis by cellular proteases. The longer average length of the library members also presumably favors protein folding. Second, we have explored a unique property of the codon-shuffling method: the ability to skew library properties by inclusion, exclusion, or predominance of particular DCs. We believe that the ability to skew a protein library further narrows the protein space that needs to be explored, so that skewed libraries can be designed to isolate proteins for specific needs. For example, a library of arginine-rich sequences could be generated by overwhelming use of the Arg-Thr dicodon. Polyarginine peptides have lately attracted much interest for their ability to act as cell-penetrating peptides that act by binding to the cell surface of a pathogen (42). Furthermore, the recent sequencing of organisms such as Mycobacterium tuberculosis and Plasmodium falciparum has unearthed many virulence-determining proteins composed exclusively of short sequence repeats (43). What selective advantage this provides the pathogen is not known. Proteins rich in DC repeats constructed using our method could serve as models for studying this hitherto unexplored natural phenomenon. Finally, our proof-of-concept experiments on generating AMPs against E. coli can be conveniently extended to pathogenic microorganisms such as M. tuberculosis. The molecular targets of such AMPs could then be identified using DNA microarrays.
Such studies, we believe, would provide a novel direction toward discovering new molecules against common pathogens and, importantly, would further extend the scope of de novo protein design as an important chemical tool for tackling general problems in medicine.