Sequence Requirements of the GPNG β-Turn of the Ecballium elaterium Trypsin Inhibitor II Explored by Combinatorial Library Screening*

The Ecballium elaterium trypsin inhibitor II (EETI-II) contains 28 amino acids and three disulfides forming a cystine knot. Reduced EETI-II refolds spontaneously and quantitatively in vitro and regains its native structure. Due to its high propensity to form a reverse turn, the GPNG sequence of segment 22–25 comprising a β-turn in native EETI-II is a possible candidate for a folding initiation site. We generated a molecular repertoire of EETI-II variants with variegated 22–25 tetrapeptide sequences and presented these proteins on the outer membrane of Escherichia coli cells via fusion to the Igaβ autotransporter. Functional trypsin-binding variants were selected by combination of magnetic and fluorescence-activated cell sorting. At least 1–5% of all possible tetrapeptide sequences were compatible with formation of the correct three disulfides. Occurrence of amino acid residues in functional variants is positively correlated with their propensity to be generally found in β-turns. The folding pathway of two selected variants, EETI-βNEDE and EETI-βTNNK, was found to be indistinguishable from EETI-II and occurs through formation of a stable 2-disulfide intermediate. Substantial amounts of misfolded byproducts, however, were obtained upon refolding of these variants corroborating the importance of the wild type EETI-II GPNG sequence to direct quantitative formation of the cystine knot architecture.

Numerous small proteins, typically no longer than 40 residues, share a common structural motif consisting of a cystine knot and a small triple-stranded β-sheet. Members of the "knottin" family (1) of small proteins have a common architecture, but diverse biological activities and negligible amino acid sequence identity. Examples are (i) ω-conotoxin MVIIA, a 26-residue polypeptide found in the venom of the cone snail Conus magus, which acts as a neurotoxin by its high affinity binding to voltage-gated Ca2+ channels (2); (ii) potato carboxypeptidase inhibitor (PCI), a 39-amino acid peptide (3); and (iii) EETI-II from the squirting cucumber Ecballium elaterium, a member of the squash family of protease inhibitors (4). These proteins are mainly stabilized by three intramolecular disulfide bonds, where the first cysteine residue in the polypeptide chain is connected with the fourth, the second with the fifth, and the third with the sixth (Fig. 1, top panel). The cystine knot formed by these three disulfide bonds is defined by a ring formed by the first and the second disulfide bond in the peptide sequence and the intervening polypeptide backbone, through which the third disulfide bond passes (Fig. 1, bottom panel). Structural alignment of several cystine knot proteins revealed that the second and third disulfide bonds, together with the three β-strands that are interconnected by these disulfide bridges, superimpose very well (5-7). In contrast, the first disulfide bond shows greater structural variation. Therefore, it was proposed that the region stabilized by the other two bridges forms a 2-disulfide motif serving as a basic scaffold (5,8). This view is supported by the finding that the cellulose binding domain of the fungal enzyme cellobiohydrolase from Trichoderma reesei displays the typical cystine knot fold, but contains the central two pairs of cysteine residues only, and lacks the cysteine residues forming the first disulfide bond (9).
Cucurbitaceae seeds have proven to be a rich source of trypsin inhibitors with the cystine knot folding motif, thereby defining a family of serine protease inhibitors known as the squash family (10). All members possess an extended amino-terminal inhibitor loop, which is tethered at its amino and carboxyl termini to the cystine knot framework. A structurally and functionally well characterized member of the squash inhibitor family is the E. elaterium trypsin inhibitor II (EETI-II) (1,4). The major features of the EETI-II secondary structure are a short 3₁₀-helix for residues 11-15, a β-turn at residues 16-19, and a triple-stranded antiparallel β-sheet at residues 20-28, with a β-turn formed by residues 22-25 (4) (Fig. 1). During the folding of EETI-II, formation of a rigid core, which contains two native disulfide bonds (C9-C21, C15-C27), precedes inhibitor loop anchoring. This core, which comprises residues 9-28, appears to be the direct precursor of the natural, fully oxidized product and is structurally closely related to it (8).
A two-dimensional NMR study of an EETI-II variant in which all six cysteine residues were replaced by serine revealed the presence of native-like secondary structures for segments 10-15 (3₁₀-helix) and β-turns 16-19 and 22-25, but no native tertiary interactions were observed (12). Hence, it was hypothesized that these native-like local conformations could play a major role early in the folding of EETI-II (12). Folding of EETI-II was found to be a clean and quantitative process (8). Unlike any other protein of the cystine knot family, EETI-II contains the sequence GPNG in the 22-25 β-turn segment connecting β-strands 2 and 3 (Fig. 1). The quantitative folding of EETI-II has been mainly attributed to the high β-turn propensity of this tetrapeptide sequence (13,14), which may facilitate association of β-strands in early folding stages of EETI-II followed by covalent fixation of the tertiary fold by disulfide bond formation (8). In order to investigate the influence of the 22-25 turn sequence on the folding of EETI-II, we have generated a molecular repertoire of EETI-II variants with variegated 22-25 turn sequences and displayed this repertoire on the surface of Escherichia coli cells. This library was then selected for binding to trypsin to assess whether we could generate EETI-II mutants that allow formation of the correctly disulfide-bonded cystine knot framework. We have obtained numerous EETI-II derivatives with inhibitory activity and analyzed the in vitro refolding kinetics of selected variants compared with the wild type protein.
Library Construction and Selection of Trypsin-binding Variants-The library of variant EETI-II genes was constructed by PCR amplification of the bla-EETI-II gene residing in pHK-EETI-II using the primers AW-blabio and betaturn. The polymerase chain reaction using Tfl polymerase was run as follows: 30 s of denaturation at 94°C, 30 s of annealing at 55°C, and 40 s of elongation at 72°C, for 30 cycles. The PCR product was used as template DNA for PCR amplification with the primer pair AW-blabio and AW-Etilo. The EETI-II encoding sequence was released from the resulting PCR product by cleavage with AvaI and BamHI. After removal of the bla fragment by binding to streptavidin-coated paramagnetic beads (Promega), the library of EETI-II genes was ligated to similarly digested pHK-EETI-CKSend-Igaβ with the EETI-CKSend encoding fragment removed by sucrose gradient density centrifugation (17). pHK-EETI-CKSend-Igaβ was constructed by ligation of a BamHI/XbaI-cleaved Igaβ encoding PCR product (obtained with primers IgAseup and IgAselo from vector pEX1000; kindly provided by J. Pohlner and T. F. Meyer) into similarly digested pHK-EETI-CKSend plasmid. Electrocompetent cells of BMH71-18 dsbA were transformed with the ligation mixture. Cells were grown overnight in rich medium in the presence of chloramphenicol (25 μg/ml). 5 ml of medium containing 1 mM isopropyl-1-thio-β-D-galactopyranoside and 25 μg/ml chloramphenicol were inoculated 1:100 with the library cell culture and incubated for 14 h at 37°C. 50 μl of the bacterial culture were centrifuged, and cells were resuspended in 10 μl of biotinylated trypsin (1 mg/ml, diluted in 50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 5 mM CaCl2) and incubated on ice for 10 min. After washing in the same buffer without trypsin, cells were consecutively labeled with streptavidin superparamagnetic beads and streptavidin, R-phycoerythrin conjugate and subjected to magnetic and fluorescence-activated cell sorting.
Cells obtained from the enrichment were lysed by heating to 98°C for 2 min and directly subjected to PCR amplification of the variant EETI-II genes using the primer pair JuFoup and RSPX, which flank the EETI-II gene at its 5′ and 3′ ends. The resulting DNA fragment was recloned into pHK-EETI-CKSend-Igaβ as described above.
EETI-II Protein Production-EETI-II wild type and selected variants were produced in E. coli as soluble periplasmic proteins via fusion to maltose-binding protein by cloning the respective EETI-II gene as an AvaI/BamHI fragment into similarly digested pMETI-II plasmid. To obtain sufficient amounts of the respective variant for refolding experiments, the respective EETI-II gene was amplified by PCR using the primers PLZMET and RSPX, cleaved with EcoRI and XbaI, and ligated into similarly digested pLZPWB1 (18). E. coli W3110 (18) was used as expression host. Inclusion bodies were obtained from French press lysates of bacterial liquid cultures grown overnight at 37°C in rich medium with 1 mM isopropyl-1-thio-β-D-galactopyranoside and ampicillin (100 μg/ml) added. These were dissolved in 15 ml of 70% formic acid/g of inclusion bodies. After 24 h of incubation with cyanogen bromide (150 mg/g of inclusion bodies), proteins were precipitated with two volumes of diethylether. The precipitate was dissolved by sonication in 8 M urea, 100 mM NaCl, 100 mM sodium phosphate buffer, pH 8.0, and subjected to immobilized metal ion adsorption chromatography. To the EETI-II containing fractions from the imidazole step gradient elution, dithiothreitol was added at 50 mM final concentration and samples were subjected to reversed-phase HPLC using the following conditions. The HPLC column was Waters Bondapak C18 (3.9 × 150 mm); solvent A was water containing 0.1% trifluoroacetic acid; solvent B was acetonitrile containing 0.1% trifluoroacetic acid. The gradient was 10% B to 37% B linear in 16 min at a flow rate of 1 ml/min. The detector wavelength was set to 217 nm. Samples were freeze-dried and dissolved in 10 mM HCl. Protein concentration was determined by derivatization of fully reduced EETI-II with 5,5′-dithio-bis-(2-nitrobenzoic acid) (19).
Folding Experiments-Refolding was performed in 100 mM NH4HCO3, pH 9.1, at 25°C at a concentration of 3.5 μM for EETI-II and 10 μM for EETI-βNEDE and EETI-βTNNK. Folding intermediates were trapped in a time-course manner by mixing aliquots of the sample with 1/20 volume of concentrated phosphoric acid and analyzed by reversed-phase HPLC using the conditions described above.
Calculation of Residue Scores-The nucleotide sequence of 40 individual clones of the EETI-βXXXX unselected library and of 30 trypsin-binding variants was determined. Of the 480 nucleotides of codons 22-25 from the unselected library, 119 were A, 45 T, 167 C, and 149 G. Using an NN(G/C) coding scheme, one would expect to find in a set of 480 nucleotides 80 A, 80 T, 160 G, and 160 C nucleotides. Since the observed distribution deviated significantly from the expected, a correcting weight factor was calculated for a given nucleotide N, as shown in Equation 1, where F_N is the scored number of nucleotide N and P_N is the expected number.
The probability of obtaining a particular codon was then calculated as shown in Equation 2.
The probability for a particular amino acid residue aa defined by i codons is given as shown in Equation 3.

FIG. 1. Primary and three-dimensional structure of EETI-II.
A, amino acid sequence of the EETI-II peptide. The disulfide bonds found in the native peptide are indicated by solid lines linking the Cys residues. B, the ribbon diagram of mature EETI-II was drawn with the program MOLSCRIPT (11) from the atomic coordinates of one of the two-dimensional NMR solution structures (4). The β-turn segment of residues 22-25 is indicated as a bold ribbon.
The score for a particular residue aa is given by Equation 4 (20), where F(aa) is the observed frequency of a residue found in region 22-25 of the set of trypsin-binding EETI-II variants. If the number of residue aa in the obtained data set was found to be zero, it was increased to 0.5 so that F(aa) was never zero.
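The scoring scheme described in this section can be sketched in a few lines of Python. Equations 1-4 are not reproduced in this text, so the forms used here are assumptions inferred from the verbal definitions: the weight factor is taken as W_N = F_N / P_N, the codon probability as the product of the weighted and renormalized nucleotide probabilities at the three codon positions, the residue probability as the sum over its codons, and the score as the natural logarithm of the observed over the expected frequency. All function names are hypothetical.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nucleotide counts reported for codons 22-25 of the 40 unselected clones.
OBSERVED = {"A": 119, "T": 45, "C": 167, "G": 149}
# Counts expected for 480 nucleotides under an ideal NN(G/C) scheme.
EXPECTED = {"A": 80, "T": 80, "C": 160, "G": 160}


def weight_factors(observed=OBSERVED, expected=EXPECTED):
    """Assumed form of Equation 1: W_N = F_N / P_N."""
    return {n: observed[n] / expected[n] for n in observed}


def position_probs(allowed, weights):
    """Weighted nucleotide probabilities at one codon position, renormalized."""
    raw = {n: weights[n] / len(allowed) for n in allowed}
    total = sum(raw.values())
    return {n: p / total for n, p in raw.items()}


def codon_probs(weights):
    """Assumed form of Equation 2: product over the three codon positions."""
    p12 = position_probs("ACGT", weights)  # positions 1 and 2: any base
    p3 = position_probs("GC", weights)     # position 3: G or C only
    return {a + b + c: p12[a] * p12[b] * p3[c]
            for a in p12 for b in p12 for c in p3}


def residue_probs(weights):
    """Assumed form of Equation 3: sum over the codons encoding each residue."""
    probs = {}
    for codon, p in codon_probs(weights).items():
        aa = CODON_TABLE[codon]
        probs[aa] = probs.get(aa, 0.0) + p
    return probs


def residue_score(aa, count, total, probs):
    """Assumed form of Equation 4: ln(observed / expected frequency)."""
    count = max(count, 0.5)  # zero counts are raised to 0.5, as in the text
    return math.log((count / total) / probs[aa])
```

With this sign convention a residue that is enriched among the trypsin-binding variants relative to the unselected library receives a positive score and a depleted residue a negative one, which matches how the scores are read later in the paper. The `"*"` key in the result of `residue_probs` is the single stop codon (TAG) that an NN(G/C) scheme still permits.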

Display of EETI-II Variants on the Surface of E. coli Cells-In squash inhibitors, the inhibitor loop is held in place by amino- and carboxyl-terminal cysteine residues tethering it through disulfide bond formation to the structural framework. Removal of any of the three disulfide bonds in EETI-II by cysteine to serine replacement (C2S, C21S, C27S) was found to abolish cystine knot formation and trypsin binding (data not shown). Likewise, misfolded variants of Cucurbita maxima trypsin inhibitor I (CMTI-I) containing three non-native intramolecular disulfide bonds had no inhibitory activity (21). These findings indicated that trypsin binding could be used as an indicator of the correct cystine knot framework.
Initial experiments to present EETI-II and derived variants on the surface of phage in order to enrich functional variants via binding to immobilized trypsin were unsuccessful (data not shown). As a practicable alternative, we opted for the presentation of a repertoire of EETI-II variants on the outer membrane of E. coli cells. Cells displaying an EETI-II variant, which retains the ability to bind trypsin can then be isolated by labeling with biotinylated trypsin followed by specific fluorescence labeling with streptavidin, R-phycoerythrin conjugate and enrichment of binders by magnetic and fluorescence-activated cell sorting.
Display of proteins on the surface of E. coli cells requires passage across two membranes, the inner and the outer membrane. The Neisseria gonorrhoeae IgA protease COOH-terminal domain (Igaβ) has been shown to translocate passenger proteins fused to the amino terminus through the outer membrane and anchor them on the bacterial cell surface (22). Igaβ integrates into the outer membrane and is assumed to form a translocation pore for covalently attached protein domains. To ensure transport of the EETI-Igaβ fusion across the cytoplasmic membrane, we fused the EETI-Igaβ encoding sequence in-frame to the gene coding for periplasmically located β-lactamase (23). This tripartite gene fusion resides in expression vector pHK-EETI-Igaβ under lac promoter/operator control (23). Efficient translocation of a passenger domain fused to Igaβ that contains an intramolecular disulfide bond was achieved only in an E. coli mutant carrying a defect in the dsbA gene encoding periplasmic disulfide oxidoreductase. This finding suggests that translocation through the outer membrane requires an unfolded conformation of the passenger domain and can be blocked by disulfide loop formation (24). Fluorescence microscopy of BMH71-18 dsbA cells containing pHK-EETI-Igaβ treated with biotinylated trypsin, followed by incubation with streptavidin, R-phycoerythrin conjugate, revealed that only a small fraction of the cell population (approximately 5-10%) was phycoerythrin-labeled. No fluorescent cells were detected with non-presenting control cells (data not shown). A majority of the cells that carried a phycoerythrin label could be simultaneously stained with propidium iodide (data not shown), which preferentially stains non-viable cells (25). Whether the high mortality of EETI-II presenting cells is the cause or the result of the cell surface presentation of the β-lactamase-EETI-II fusion protein is currently not clear.
Nevertheless, the finding that EETI-II producing cells labeled with biotinylated trypsin/streptavidin, R-phycoerythrin conjugate were distinguishable by FACS from cells producing an EETI-II variant that is unable to bind trypsin (Fig. 2) prompted us to use indirect fluorescent trypsin labeling followed by magnetic and fluorescence-activated cell sorting to identify and isolate cells carrying a functional EETI-II protein.
Construction of an EETI-II Library and Selection of Trypsin-binding Proteins-A library of EETI-II genes was generated by randomizing PCR, where the EETI-II codons 22-25 were randomized using an NN(G/C) coding scheme. These variant EETI-II genes were ligated to pHK-EETI-Igaβ. Transformation of BMH71-18 dsbA yielded 2.2 × 10⁷ individual transformants, which represents a 20-fold coverage of the theoretical number of possible combinations (32⁴). Cells from 10 individual transformants of the EETI-βXXXX repertoire were tested for their ability to bind anti-β-lactamase antibody and trypsin. Upon labeling with anti-β-lactamase antibody followed by incubation with biotinylated anti-rabbit antibody and streptavidin, R-phycoerythrin conjugate, at least 10% of the cells of each clone revealed a phycoerythrin label, while no specific label was obtained with biotinylated trypsin (data not shown). This indicates that only a fraction of the variants in the repertoire has trypsin binding activity. To enrich cells presenting trypsin-binding EETI-II variants on their cell surface, 5 × 10⁷ cells of the library were subjected to labeling with biotinylated trypsin followed by incubation with streptavidin-coupled colloidal superparamagnetic microbeads. After washing, cells were incubated with streptavidin, R-phycoerythrin conjugate. Prior to FACS, labeled cells were presorted by magnetic separation by passage through a high gradient magnetic separation column (26). Since the colloidal magnetic particles are too small to be detected by the flow cytometer, the 8 × 10⁶ cells eluted from the magnetic separation column could be immediately subjected to FACS and were run through a MoFlo cell sorter (Cytomation) and sorted on the basis of fluorescence intensity. After immediate resorting, 65% of the 450,000 cells obtained fell into the positive window (Fig. 3).
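The coverage figure quoted above follows from simple arithmetic, sketched here in Python. The constant and function names are illustrative, and the Poisson estimate of library completeness is an added assumption (equal codon frequencies and independent sampling), not a calculation reported by the authors.

```python
import math

CODONS_PER_POSITION = 32   # NN(G/C): 4 x 4 x 2 codons
RANDOMIZED_POSITIONS = 4   # residues 22-25
TRANSFORMANTS = 2.2e7      # reported number of individual transformants


def library_coverage(transformants=TRANSFORMANTS):
    """Return (nucleotide-level diversity, fold coverage, estimated completeness)."""
    diversity = CODONS_PER_POSITION ** RANDOMIZED_POSITIONS  # 32^4 = 1,048,576
    fold = transformants / diversity
    # Poisson estimate of the fraction of sequences sampled at least once,
    # assuming every codon combination is equally likely (an idealization).
    completeness = 1.0 - math.exp(-fold)
    return diversity, fold, completeness


diversity, fold, completeness = library_coverage()
print(f"{diversity} combinations, {fold:.1f}-fold coverage")
```

Because the unselected library showed a skewed nucleotide distribution, rare codon combinations will in practice be covered less deeply than this idealized estimate suggests.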
To rescue the EETI-βXXXX genes from the non-viable cells, the EETI-II genes encoded in the enriched population of bacterial cells were amplified by PCR from cell lysates and ligated into pHK-EETI-Igaβ expression vector. After transformation, overnight cultures from 120 individual transformants were labeled with trypsin and subjected to flow cytometer analysis. Clones were classified as trypsin-binding when at least 4% of the cells fell into the positive window. Of the 120 clones, 58 were positive for trypsin binding.
Characterization of Trypsin-binding EETI-II Variants-The nucleotide sequence of the respective EETI-II gene was determined for 30 of the 58 positive clones exhibiting trypsin binding activity (Table I). 40 clones from the initial, unselected library were also sequenced to compare the distribution of residues in the 22-25 region before and after enrichment of trypsin-binding EETI-II variants (data not shown). A score was calculated for each residue, which reflects the deviation of the occurrence of a particular amino acid in the data set of residues 22-25 of trypsin-binding EETI-II variants from the probability to find the respective residue in the initial unselected library. A negative value indicates underrepresentation of the respective amino acid, a positive value overrepresentation. The 20 amino acids were ordered according to their overall β-turn potentials, which have been determined by Hutchinson and Thornton by classification of β-turns from proteins with known three-dimensional structure (14). As can be seen from Fig. 4, amino acid residues that are rarely found in β-turns are also underrepresented in the set of functional EETI-II variants, while asparagine, which has the second highest overall β-turn potential, is most frequently found (26/180). Exceptions to this correlation are cysteine and proline.
To quantitatively assess the trypsin binding activity of the selected peptides, six EETI-II variants together with the EETI-II wild type protein were produced in soluble form via secretion into the E. coli periplasmic space by fusion to maltose-binding protein. These proteins were purified from osmotic shock fluid of the bacterial cells by immobilized metal ion adsorption chromatography. Very similar dissociation constants toward trypsin were measured for MalE-EETI-II wild type and the selected library variants (Table I).
We have further characterized the disulfide-coupled folding of EETI-II wild type, of the trypsin-binding variants EETI-βNEDE and EETI-βTNNK, and of EETI-βATVF, which was obtained from library screening as a variant devoid of trypsin binding activity. To obtain sufficient amounts of protein for the study of refolding kinetics, proteins were produced in E. coli as insoluble inclusion bodies via fusion to a modified β-galactosidase shortened to 449 residues, where all methionine as well as cysteine residues are replaced by leucine (18). By PCR amplification of the respective EETI gene, methionine codon 7 was replaced by a leucine codon. Simultaneously, a single methionine codon was introduced at the junction between the lacZ and the EETI-II gene to allow release of the EETI-II moiety from the fusion protein by cyanogen bromide cleavage. EETI-II variants were isolated by immobilized metal ion affinity chromatography in the presence of 8 M urea, completely reduced by addition of 50 mM dithiothreitol, and finally purified by reversed-phase HPLC. To study the folding kinetics of EETI-II, EETI-βNEDE, EETI-βTNNK, and EETI-βATVF, respectively, completely reduced variants were allowed to refold by air oxidation in 100 mM NH4HCO3, pH 9.1. Samples were withdrawn at various time points from a folding reaction, quenched by mixing with 1/20 volume of concentrated phosphoric acid, and subsequently analyzed by HPLC. Fig. 5 shows HPLC profiles of the disulfide-bonded forms trapped after various times of refolding. Under the chromatographic conditions used here, molecules with the largest fraction of exposed nonpolar surface area are expected to elute late from the column. In accordance with this expectation, the fully reduced peptide of each variant eluted later than any other species containing disulfide bonds, while the native form eluted first from the column. With EETI-II wild type, over 90% of the reduced form yielded the native peptide after overnight oxidation.
EETI-βNEDE and EETI-βTNNK, however, were only obtained in yields of approximately 65%. The non-trypsin-binding EETI-βATVF behaved in a completely different manner. This peptide eluted in its reduced form from the reversed-phase column in several overlapping peaks, which indicates aggregation and oligomer formation. Both the reduced and the oxidized forms of the protein adsorbed to a large extent to the walls of the reaction vial. Furthermore, refolding of EETI-βATVF resulted in a large number of different species, and no predominant HPLC peak corresponding to the native form was found (data not shown).
For EETI-II wild type refolding, the distribution of species was dominated within 5 min by a major component. This folding intermediate was identified by Castro and co-workers as dihydro-2,19 EETI-II, a species that lacks the (C2-C19) disulfide bond and is structurally closely related to the native molecule (8). The peak of the intermediate disappeared within the next 120 min to the benefit of the peak corresponding to the native product. A predominant folding intermediate with HPLC retention times very similar to the dihydro-2,19 intermediate of wild type EETI-II emerged in the same manner during the folding reaction of EETI-βNEDE and EETI-βTNNK together with several other species. This indicates that folding of these variants most likely proceeds through the same folding pathway as the wild type EETI-II molecule with the dihydro-2,19 species being the major intermediate. In all three folding reactions, the rate-limiting step is the formation of the (C2-C19) disulfide bond. Compared with wild type EETI-II, EETI-βNEDE folds slightly more slowly and EETI-βTNNK considerably more slowly, requiring overnight incubation for formation of the native product.

DISCUSSION

Work in other laboratories has previously demonstrated that several of the squash inhibitors can correctly form their disulfides in vitro with yields varying among different species (1,21,28). EETI-II folds quantitatively simply by air oxidation, while another inhibitor of the squash family under similar conditions yields only about 60-80% of native protein (21). We have recently shown that EETI-II folds in high yield in vivo when expressed in E. coli via secretion into the periplasmic space, where oxidizing conditions prevail. Comparison of the amino acid sequences of the turns connecting β-strands in squash inhibitors and other cystine knot proteins reveals a unique feature of EETI-II, which is the GPNG sequence of segment 22-25. This segment forms a typical type I β-turn connecting β-strands 2 and 3.
Type I is a frequently found β-turn type with near-helical φ, ψ values that can be adopted by any amino acid (14). The structure of the corresponding region in C. maxima trypsin inhibitor I, ω-conotoxin, and kalata B1, a 29-residue peptide with cystine knot fold from the tropical plant Oldenlandia affinis DC, shows similar linkage of β-strands 2 and 3 by a type I turn, where the central residues lie in the α-region of a Ramachandran plot (5). Unlike these proteins and all other 20 known squash inhibitors, EETI-II contains a proline residue at position i + 1 of the β-turn. Proline is by far the most favored residue at the second position because of the restriction of its φ angle to about −60°. The other preferred residues at the second position, glutamic acid and serine, can stabilize the angles by forming a hydrogen bond between their side chain oxygen atoms and the main chain amide (14). Indeed, many of the squash inhibitors contain a glutamic acid or serine residue at the respective position (11/21).

TABLE I. ... combinatorial library screening
The combinatorial library of EETI-II variants differing in the amino acid sequence of the 22-25 turn region was presented on the surface of E. coli cells. Cells presenting an EETI-II variant capable of trypsin binding were trypsin-labeled and isolated by combination of magnetic and fluorescence-activated cell sorting. The amino acid sequences of 30 EETI-II variants were deduced from the nucleotide sequences of the respective EETI-II genes residing in cell surface display vector pHKBLa-EETI-Igaβ. The EETI-II variants displayed in the first column were produced as maltose-binding protein fusions and purified to homogeneity by metal chelate affinity chromatography. Varying amounts of MalE-EETI fusion protein were incubated with 5 nM trypsin for 10 min at 37°C in 100 μl of 50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 5 mM CaCl2. Residual trypsin activity was monitored by following the linear release at 412 nm of 4-nitroaniline from added Boc-Leu-Gly-Arg-pNA (0.4 mM). The dissociation constants ± S.D. were calculated as described (27).

FIG. 4. Occurrence of amino acid residues in the 22-25 segment of trypsin-binding EETI-II variants.
For each residue, the score was calculated as described under "Experimental Procedures," which reflects the preference to be found in the set of trypsin-binding EETI-II variants compared with the unselected library. Residues are ordered according to their overall turn potentials (14). These indicate the preference for each residue to occur in a β-turn generally and are averaged over a large data set of β-turns from various proteins and over all classified turn types (14).
The high turn-forming propensity of the GPNG sequence was corroborated in the EETI-II sequence context by the finding that a well defined local structure (reverse turn) of the 22-25 segment can be observed in a Cys6→Ser6 EETI-II variant lacking all six cysteine residues (12). This segment has therefore been invoked as a candidate for a folding initiation site (12). To elucidate the possible function of that segment in guiding the formation of the correct fold of EETI-II, we have generated a molecular repertoire of residue 22-25 variants and selected EETI-II derivatives that retain the ability to promote cystine knot formation. Of the 5 × 10⁷ cells presenting a particular EETI-II variant on their outer membrane, about 1.5 × 10⁵ proved to be functional trypsin binders. Taking into account that in our experimental system only about 5-10% of positive cells can be detected by fluorescence labeling, one can estimate that functional variants make up approximately 1-5% of all possible tetrapeptide sequences. The proportion of EETI-II variants forming a correct cystine knot may even be higher, since we cannot exclude the possibility that some residue 22-25 variants possess the canonical tethered cystine knot but display reduced affinity to trypsin due to conformational changes of the inhibitor loop and may therefore be lost during the FACS enrichment of trypsin binders.
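The order of magnitude of this estimate can be checked with a back-of-the-envelope calculation. The sketch below assumes the simplest possible reading, in which the apparent fraction of binders is divided by the detection efficiency; the authors do not state their exact calculation, and the function name is hypothetical.

```python
def functional_fraction(binders, cells_screened, detection_efficiency):
    """Apparent fraction of binders corrected for incomplete detection."""
    apparent = binders / cells_screened
    return apparent / detection_efficiency


# Reported numbers: about 1.5e5 binders among 5e7 cells, 5-10% detectable.
at_10_percent = functional_fraction(1.5e5, 5e7, 0.10)
at_5_percent = functional_fraction(1.5e5, 5e7, 0.05)
print(f"{at_10_percent:.1%} to {at_5_percent:.1%}")
```

Under this reading the corrected fraction comes out at a few percent, which is the same order of magnitude as the range quoted in the text.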
From the 30 sequences analyzed, no consensus sequence can be deduced. However, a clear preference for hydrophilic amino acid residues is found (Fig. 4). We have compared the occurrence of residues in the sequenced data set of functional variants with their propensity to be generally found in β-turns. Hutchinson and Thornton have identified and classified 3899 β-turns using a nonhomologous data set of 205 protein chains (14). These data were used to derive β-turn overall and positional potentials for the different turn types. As shown in Fig. 4, there is a correlation between the overall β-turn potential of a particular residue in the whole database of high resolution structures and its occurrence in the β-turn of folded EETI-II variants. Exceptions to this rule are cysteine and proline. No cysteine residue was found in the sequenced data set of functional EETI-II variants. Sequence analysis of unselected clones revealed the presence of a variant containing the sequence N²²CRH²⁵, which proved to be nonfunctional (data not shown). This is not unexpected, since the additional cysteine residue in the β-turn raises the number of possible combinations of three disulfide pairs in a molecule from 15 to 48, thereby expanding the possibilities for misfolded species with non-native disulfide bonds considerably. Proline was only found twice in the sequences of trypsin-binding variants, less frequently than expected from its overall β-turn potential. In type I turns, proline is frequently found at position i + 1 of the loop, but occurs rarely at positions i + 3 and i + 4 (14). In the 22-25 segment of EETI-II variants, a proline residue in the i + 1 position is evidently not required and may even be disfavored in positions i + 3 and i + 4, which might in total account for its relatively low occurrence.
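The figure of 15 possible three-disulfide combinations for six cysteines can be verified by direct enumeration. The following sketch lists every way of pairing the six cysteine positions of EETI-II and confirms that the native pattern is one of them. The function name is hypothetical, and only the six-cysteine case is shown, because the count for a variant with a seventh cysteine depends on a counting convention that the text does not spell out.

```python
def pairings(items):
    """Yield every way to split an even-sized list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


# Cysteine positions in EETI-II and the native disulfide pattern.
EETI_CYS = [2, 9, 15, 19, 21, 27]
NATIVE = [(2, 19), (9, 21), (15, 27)]

all_pairings = list(pairings(EETI_CYS))
print(len(all_pairings))        # number of distinct three-disulfide isomers
print(NATIVE in all_pairings)   # the native knot is one of them
```

The count follows the double factorial 5 × 3 × 1: the first cysteine can choose among five partners, the next unpaired cysteine among three, and the last pair is then fixed.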
We have investigated the folding kinetics of EETI-II wild type and variants EETI-βNEDE and EETI-βTNNK by in vitro refolding of the purified, fully reduced variants and subsequent acid trapping of the products at various time points. Differences in the yields of native protein notwithstanding, the folding pathway of the EETI-βNEDE and EETI-βTNNK variants is fundamentally indistinguishable from that of EETI-II wild type protein. With all three proteins, a predominant folding intermediate emerges during early stages of folding. The EETI-II wild type intermediate has been shown to be the stable (C9-C21, C15-C27) intermediate lacking the (C2-C19) disulfide bridge (8). Two-dimensional NMR studies showed that the intermediate is very similar in structure to native EETI-II (8). Our finding that dihydro-2,19-EETI-II and the EETI-βNEDE intermediate isolated by acid trapping were both converted to the fully folded protein without accumulation of any other intermediate supports the notion that this form is the direct precursor of the native product (data not shown). Dihydro-2,19-EETI-II was the predominant folding intermediate already after 0.5 min of refolding (data not shown). However, progression from the fully reduced state to the main folding intermediate and to the native product occurs more sluggishly with the two EETI-II β-turn derivatives. The process of (C2-C19) disulfide bond formation represents the major rate-limiting step in the folding of EETI-II and the variants with different β-turn sequences. This may be attributed to the fact that loop crossing has to occur at this last step in the folding pathway. In native EETI-II, the macrocycle made up of two disulfides linking the sequences C9KQDSDC15 and C21GPNGFC27 is penetrated by the disulfide bridge (C2-C19), thus forming a tight pseudo-knot structure (5). Fixation of the amino-terminal 6-residue inhibitor loop (3-8) by (C2-C19) disulfide bond formation occurs after formation of the structural framework.
As a consequence, formation of the cystine knot fold is largely independent of the length and amino acid sequence of the inhibitor loop. This renders EETI-II an ideal scaffold for the presentation of a repertoire of conformationally constrained peptides aimed at isolating variants with novel binding characteristics (Footnote 2). The folding pathway of EETI-II contrasts remarkably with that observed with other proteins of the cystine knot family that share striking structural but little sequence homology, i.e. ω-conotoxin MVIIA and PCI (29-32). The folding mechanism of PCI can be dissected into two steps (32). The sequential flow of fully reduced PCI through equilibrated one- and two-disulfide species results in formation of equilibrated scrambled species. In the final and major rate-limiting step, PCI reorganizes to attain the native disulfide structure. From these findings, it was concluded that there is not a predominant folding pathway for PCI (32). Experiments performed with toxins derived from Conus species reached similar results (29-31). These conotoxins are able to refold correctly with efficiencies ranging from 15% to 50%. The distribution of equilibrated disulfide-bonded species compared with the native form observed under optimal reshuffling conditions indicates that the stability of the native conformation relative to other forms is only marginal. Furthermore, in contrast to EETI-II, the forms with two native disulfides appear to be largely devoid of folded structure. Both in PCI and in conotoxins, there appears to be little specificity in the formation of the initial disulfides during the folding reaction. Again, the non-native two-disulfide forms are able to form scrambled non-native three-disulfide species which can only interconvert through partial reduction.
Hence, without any thiol reshuffling reagents like GSH present, folding progresses to fully oxidized scrambled species which become trapped due to the concomitant loss of free thiols acting as reshuffling catalysts. Folding of EETI-II, however, occurs by air oxidation without accumulation of scrambled species and does not require addition of thiol compounds as catalysts.
Our data show that several thousand of the 160,000 possible tetrapeptide combinations in the 22-25 β-turn segment of EETI-II are compatible with the formation of the correct cystine knot fold. In contrast to EETI-II wild type, however, which refolds nearly quantitatively, the 24-h samples of EETI-βNEDE and EETI-βTNNK refolded by air oxidation contain substantial amounts of misfolded byproducts, and yields of correctly folded proteins are only around 65%. This is strikingly similar to the refolding by air oxidation of CMTI-I, which contains a LEHG β-turn sequence (21). In conclusion, the GPNG sequence of EETI-II directs the clean and quantitative formation of the dihydro-2,19 EETI-II intermediate, which is finally oxidized to the native three-disulfide form. Hence, it may be interesting to see whether transplantation of that sequence into the sequence context of other cystine knot proteins like PCI or conotoxins might influence their folding pathway and the yield of correctly folded protein.
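The figures quoted here follow from the size of the tetrapeptide sequence space. As a minimal illustration, assuming all 20 proteinogenic amino acids are allowed at each of the four randomized positions and taking the 1-5% estimate of functional variants at face value:

```python
N_AMINO_ACIDS = 20   # proteinogenic amino acids allowed per position (assumption)
N_POSITIONS = 4      # randomized residues 22-25

total_sequences = N_AMINO_ACIDS ** N_POSITIONS

# Bounds corresponding to the 1-5% estimate of functional variants.
functional_low = total_sequences * 1 // 100
functional_high = total_sequences * 5 // 100

print(f"possible tetrapeptides: {total_sequences:,}")
print(f"functional variants: {functional_low:,} to {functional_high:,}")
```

Both bounds are in the range of several thousand sequences.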