Combinatorial Analysis of the Structural Requirements of the Escherichia coli Hemolysin Signal Sequence

We have investigated the substrate specificity of the Escherichia coli hemolysin transporter system. Translocation of hemolysin is dependent on a C-terminal signal sequence located within the last 60 amino acids of this protein. Previous comparative studies of the signal sequence have revealed a conserved helix(α1)-linker-helix(α2) motif, suggesting that secondary structure is important for transport. In this study, we generated three random libraries in the α1, linker, and α2 regions, as well as an α1-amphiphilic helical library to identify features buried within the structural motif that contribute to transport. Combinatorial variants were generated by altering the primary sequence of specific regions, and correlation between the genotype and phenotype of the mutant populations allowed us to objectively identify any functional features involved. It was found that the α1-amphiphilic helix and the linker are both important for function. To our surprise, the second helix of the conserved structural motif was not essential for transport. The finding that a predicted amphiphilic helix and hydrophobicity, rather than primary sequence, contribute to transport in the α1 region allows us to speculate on the mechanism of multiple substrate recognition. This may have implications for understanding the broad substrate specificity common among other ATP-binding cassette transporters.

The transport of substrates across biological membranes is an essential function of all cells. The ATP-binding cassette (ABC) transporter superfamily of proteins plays an active role in this process. Its members are found in all kingdoms of life and are involved, for example, in amino acid uptake, phosphate import, protein secretion, polysaccharide export, ion transport, and cellular drug efflux (1,2). Because of the high degree of sequence homology among members of this superfamily, the fundamental mechanism of transport is believed to be similar among ABC proteins. One of the most captivating questions in the field of ABC transporters is the phenomenon of broad substrate specificity, either across the family or within single members. In the latter case, for instance, two of the best characterized ABC proteins, P-glycoprotein and multidrug resistance protein, actively pump a wide variety of chemotherapeutic agents out of cells (3). Overexpression of these proteins contributes to the multidrug resistance phenotype in many types of human cancer.
In the present study, we have investigated the substrate specificity of the Escherichia coli hemolysin system, a well characterized bacterial ABC exporter (4). Studies of substrate specificity in this system have been facilitated by the fact that hemolysin is a protein substrate, which allows easy genetic manipulation. Secretion of hemolysin is directed by a signal sequence located within the C-terminal 60 amino acids. This short peptide can be secreted by itself (5) and, as demonstrated by fusion protein studies, is sufficient to guide the translocation of foreign proteins directly from the cytoplasm to the outside of cells (6-9).
Defining the specificity of hemolysin transport has been elusive, however. Although the system apparently has one unique substrate, it tolerates a wide variety of primary sequences. Extensive point mutations and minor deletions have been created in the signal sequence, but few have had a dramatic effect on transport efficiency (7,10). This has led to the proposal that a handful of critical residues scattered throughout the signal sequence are specifically recognized by the transporter complex to trigger the transport process (11,12). On this view, the other amino acids in the signal sequence can accommodate a wide range of changes.
At the same time, work done in our laboratory has demonstrated that the signal sequence of hemolysin can be replaced by that of Pasteurella hemolytica leukotoxin with retention of wild-type secretion, providing evidence that the two C-terminal signal peptides are functionally equivalent (13). Comparison of the two signal sequences revealed that they share very little primary sequence homology; however, they appear to share a common predicted helix-linker-helix motif (Fig. 1). This structure has been confirmed by circular dichroism (9) and 15N nuclear magnetic resonance studies (14). Both peptides exhibit similar biophysical properties: they appear unstructured in an aqueous environment but assume an amphiphilic helical secondary structure in certain membrane mimetic environments. A similar motif has also been reported for the C-terminal 56 residues of Erwinia chrysanthemi protease G (15), which is secreted by a transport system analogous to the hemolysin system. This conservation of secondary structure among different organisms has led to the hypothesis that the helix-linker-helix structural motif within the signal sequence may be a prerequisite for transport. Such a secondary structure may be required for the signal sequence to interact with a "binding pocket" in the transporter complex.
The critical residues model and the conserved structural model present two distinct, although potentially overlapping, explanatory frameworks for understanding how the hemolysin signal sequence is recognized by its transporter complex. If both models are correct, the "critical" amino acids may be presented to the presumptive binding pocket of the transporter in the appropriate three-dimensional orientation as dictated by the conserved secondary structure. In this study, we have investigated the validity of these models in greater detail. We have used a combinatorial approach to systematically investigate the range and nature of primary sequences that can be accommodated by the hemolysin transporter system. Three random libraries (α1, linker, and α2) were created based on the structural anatomy of the conserved helix-linker-helix motif, and the ability of the random variants to support transport was then measured. This approach has allowed us to generate a very large number of variants, creating a rich data set with the potential to substantiate or refute the above models. For example, we have made the surprising observation that the conserved second helix (α2) of the signal sequence is not required for transport, while the amphiphilic nature of the first helix (α1) is a critical determinant of function.
Construction of Plasmids-The pUCAC494 plasmid containing the hlyA gene was constructed as follows: pUC19 was digested with SalI and HindIII, blunt-end ligated, and then digested with BamHI and KpnI to allow the directional insertion of a 2.9-kb BamHI-KpnI fragment encoding HlyA from pLG583sk/#2 (9).
The plasmid pLGBCD, containing hlyB, hlyD, and hlyC, was constructed as follows: pLG575 (16), a pACYC derivative encoding HlyB and HlyD, was digested with SalI, and the ends were filled in with Klenow. The blunt-ended product was then ligated with fragments from PvuII-digested pLG570 (16), a plasmid that contains hlyA, hlyB, hlyC, and hlyD, and transformed into cells harboring pUCAC494. Plasmid DNA was obtained from transformants that were hemolytic (and thus contained a full complement of the hemolysin genes) and re-transformed into E. coli to isolate pLGBCD.
Random oligonucleotide mutagenesis involved insertion of a cassette containing random sequence into the coding region of the signal sequence between two flanking restriction sites. For practical reasons, we limited ourselves to cassettes of 185 bp or less: the maximum length of each oligonucleotide synthesized by Life Technologies, Inc. is 100 bases, and two such oligonucleotides annealed through a 15-bp overlap yield at most 2 × 100 − 15 = 185 bp. The distance between the two restriction sites therefore could not exceed 185 bp. To meet this requirement in the construction of the linker random library, two sites were engineered into pUCAC494 to create pUCAC494BN.
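The length limit above is simple arithmetic on the two overlapping oligonucleotides. The sketch below restates it; the 100-nt and 15-bp values come from the text, while the helper names are hypothetical.

```python
# Length limit for a cassette built from two overlapping synthetic oligonucleotides.
MAX_OLIGO_NT = 100  # maximum oligonucleotide length quoted in the text
OVERLAP_BP = 15     # annealing overlap quoted in the text

def max_cassette_length(oligo_nt: int = MAX_OLIGO_NT, overlap_bp: int = OVERLAP_BP) -> int:
    """Length of the filled-in duplex: both oligos end to end, minus the shared overlap."""
    return 2 * oligo_nt - overlap_bp

def sites_fit(distance_bp: int) -> bool:
    """True if two restriction sites this far apart can be spanned by a single cassette."""
    return distance_bp <= max_cassette_length()

print(max_cassette_length())
```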
Construction of pUCAC494BN from pUCAC494 consisted of two steps. The first step involved the creation of a BstBI restriction site in the α1 region of the hemolysin signal sequence. Primers ASalC494 (5′-TCGATCGTGAACACCTTGGAAGGTAACGCG-3′) and HlyA-BstBI-R (5′-CAGCTGAAATGATTTTCGAAATTTCATTAATTAATG-3′; underlined region represents nucleotides changed) were used to generate a 97-bp fragment from pUCAC494. This PCR product was used directly as a primer, in conjunction with M13R1 (5′-AAAACGACGGCCAGTGAATTC-3′), to amplify a 319-bp fragment from pUCAC494. The resulting reaction product was purified with a QIAquick PCR purification kit (Qiagen, Inc.), digested with KpnI and SalI, and inserted into pUCAC494 at these two restriction sites to generate pUCAC494B.
The second step involved the creation of an NheI site in the α2 region of pUCAC494B. Primers HlyA-NheI-F (5′-GAAAGATCTGCCGCTAGCTTATTGCAGTTGTCC-3′) and M13R1 were used to amplify a 198-bp fragment from pUCAC494. This reaction product was purified with a QIAquick PCR purification kit and ligated into the KpnI and BglII sites of pUCAC494B. The resulting vector, pUCAC494BN, differed from pUCAC494 in that it contained two additional unique restriction sites, BstBI (in the α1 region) and NheI (in the α2 region). These sites were designed so that the original amino acid sequence remained unchanged. The hemolytic zones of pUCAC494B and pUCAC494BN were identical to that of pUCAC494.
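One quick way to confirm that the mutagenic primers carry the intended sites is to scan them for the recognition sequences. The sketch below uses the standard recognition sequences of BstBI (TTCGAA), NheI (GCTAGC), and BglII (AGATCT); all three are palindromic, so scanning one strand is sufficient. The function name is hypothetical.

```python
# Recognition sequences (5'->3') of the enzymes named above; all are palindromic,
# so a site on either strand appears identically in a one-strand scan.
SITES = {
    "BstBI": "TTCGAA",
    "NheI": "GCTAGC",
    "BglII": "AGATCT",
}

def find_sites(seq: str, sites: dict = SITES) -> dict:
    """Return {enzyme: [0-based start positions]} for every site present in seq."""
    seq = seq.upper()
    hits = {}
    for name, motif in sites.items():
        positions = [
            i for i in range(len(seq) - len(motif) + 1)
            if seq[i:i + len(motif)] == motif
        ]
        if positions:
            hits[name] = positions
    return hits

# Primer sequences as given in the text.
primers = {
    "HlyA-BstBI-R": "CAGCTGAAATGATTTTCGAAATTTCATTAATTAATG",
    "HlyA-NheI-F": "GAAAGATCTGCCGCTAGCTTATTGCAGTTGTCC",
}
for name, seq in primers.items():
    print(name, find_sites(seq))
```

The scan shows that HlyA-NheI-F carries a BglII site next to the new NheI site, which fits the BglII end used to clone its PCR product into pUCAC494B.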
The plasmid pUCAC494-adapter was generated by replacing 89 bp of the hemolysin signal sequence with a 27-bp adapter. The two oligonucleotides, forward adapter (5′-TCGACGCGGCCGCTTTAATAGTGATCGATGA-3′; underlined region represents the complementary sequence where annealing occurs) and reverse adapter (5′-GATCTCATCGATCACTATTAAAGCGGCCGCG-3′), were annealed to form a double-stranded cassette and inserted into pUCAC494 at the SalI and BglII sites. pUCAC494-adapter was used to generate the α1-amphiphilic helical variants, and it allowed positive clones to be selected by colony PCR based on differences in band size.
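As a check on the adapter design, the sketch below anneals the two adapter oligonucleotides in silico. It assumes 4-nt 5′ overhangs, which matches the ends left by SalI (G^TCGAC, overhang TCGA) and BglII (A^GATCT, overhang GATC). The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (N is left as N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def cassette_ends(top: str, bottom: str, overhang_len: int = 4):
    """Split two annealed oligos into (top 5' overhang, bottom 5' overhang, duplex core).

    Assumes each oligo begins with a single-stranded 5' overhang of overhang_len nt
    and that the remainder of the two oligos pairs perfectly.
    """
    core = top[overhang_len:].upper()
    if revcomp(bottom[overhang_len:]) != core:
        raise ValueError("oligonucleotides do not form a perfect duplex")
    return top[:overhang_len].upper(), bottom[:overhang_len].upper(), core

forward = "TCGACGCGGCCGCTTTAATAGTGATCGATGA"   # forward adapter from the text
reverse = "GATCTCATCGATCACTATTAAAGCGGCCGCG"   # reverse adapter from the text
top_end, bottom_end, core = cassette_ends(forward, reverse)
print(top_end, bottom_end, len(core))
```

With the sequences from the text, this yields a SalI-compatible TCGA end, a BglII-compatible GATC end, and a 27-bp duplex, in agreement with the stated adapter length.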
Cloning of Random Library Mutants-For each of the three random libraries, a pair of oligonucleotides (Life Technologies, Inc.) was annealed to form a cassette, which was facilitated by the presence of a 15-base pair complementary sequence. Each cassette contained the mutated target sequence and flanking regions, and was used to replace the wild-type sequence. Insertion of each cassette into pUCAC494 or pUCAC494BN was facilitated by the presence of two unique restriction sites located at the ends of the cassette (Fig. 2).
Oligonucleotides were dissolved in 10 mM Tris, pH 8.5, to a concentration of 100 pmol/µl. 10 µl of each was mixed with its partner, heated to 80°C for 5 min, cooled to 25°C (ramp time 1 h), and incubated at 25°C for 1 h. The annealed oligonucleotides were filled in with Klenow and purified with a QIAquick PCR purification kit (Qiagen) before restriction digestion. Table I lists the cloning plasmids, restriction sites, and oligonucleotide sequences for each library. The ligation product was transformed into E. coli Top10F′, and the transformants were screened for proper insertion of the combinatorial cassette by colony PCR.
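The Klenow fill-in step turns two partially overlapping oligonucleotides into one full-length duplex. The sketch below models only the resulting top-strand sequence, assuming the 3′ ends pair over exactly `overlap` bases; degenerate N positions pass through the complement step unchanged. The function names and the test sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (N is left as N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def fill_in(fwd: str, rev: str, overlap: int = 15) -> str:
    """Top strand of the duplex obtained by extending two oligos annealed at their 3' ends.

    fwd and rev are both written 5'->3'. Their 3'-terminal `overlap` bases must be
    complementary; polymerase extension then copies the rest of each partner.
    """
    fwd, rc_rev = fwd.upper(), revcomp(rev)
    if fwd[-overlap:] != rc_rev[:overlap]:
        raise ValueError("3' ends are not complementary over the stated overlap")
    return fwd + rc_rev[overlap:]

# Hypothetical example: a 30-bp target assembled from a 20-mer and a 25-mer.
target = "ATGCGTACGTTAGCCATGACCGGTATCAGT"
fwd_oligo = target[:20]
rev_oligo = revcomp(target[5:])
print(fill_in(fwd_oligo, rev_oligo))
```

The duplex length is len(fwd) + len(rev) − overlap, which is the same relationship behind the 185-bp cassette limit.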
Since random oligonucleotide mutagenesis would likely destroy many of the previously mapped restriction sites within the targeted region (for example, there is only a 1/1024 chance for a specific 6-nucleotide restriction site to appear at the same position), amplification of a PCR fragment and subsequent digestion with a selected restriction enzyme allowed the selection of clones whose PCR product could not be cleaved and which therefore likely contained a random sequence. For the α1, linker, and α2 random libraries, a 579-bp fragment was amplified from each colony using primers HA24 (5′-GATTTCCGGGACGTTGCC-3′) and M13R1 and then digested with PstI, BglII, and HpaII, respectively. All positive colonies were grown overnight, and plasmid DNA of each variant was prepared with the Quantum miniprep kit (Bio-Rad) following the manufacturer's instructions. Each variant was sequenced to obtain its genotype.

FIG. 1. The helix-linker-helix motif of the E. coli hemolysin signal sequence. Residues representing the two conserved helices are underlined. In this study, the α1 and linker regions (■) were found to contain some critical elements of transport, while the α2 region (□) could tolerate almost any combination of amino acids.

FIG. 2. Cloning of the combinatorial signal sequence variants. Shown above is a linear representation of the DNA sequence of the hemolysin signal sequence coding region, with the unique restriction sites and the names of the different regions labeled. Note that the BstBI and NheI sites are present only in pUCAC494BN. The design of a combinatorial cassette is illustrated using the α1 region as an example. Two oligonucleotides (dark lines), HlyA-α1-F and HlyA-α1-R, were annealed to each other, and a double-stranded cassette was formed after the Klenow reaction (dotted line). The second oligonucleotide contained a string of Ns, which represents the degenerate region.
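The screen above keeps clones in which the diagnostic site inside the randomized region has been destroyed. The sketch below expresses that logic by comparing site counts between the wild-type and clone amplicons, so that copies of the site outside the target do not confound the call. For example, primer HA24 itself contains a CCGG (HpaII) site. The toy amplicon sequences are hypothetical.

```python
# Diagnostic enzyme used for each library, with its recognition sequence.
DIAGNOSTIC_SITE = {
    "alpha1": ("PstI", "CTGCAG"),
    "linker": ("BglII", "AGATCT"),
    "alpha2": ("HpaII", "CCGG"),
}

def count_site(seq: str, site: str) -> int:
    """Number of (possibly overlapping) occurrences of a recognition sequence."""
    seq, site = seq.upper(), site.upper()
    return sum(seq[i:i + len(site)] == site for i in range(len(seq) - len(site) + 1))

def lost_diagnostic_site(wild_type_amplicon: str, clone_amplicon: str, site: str) -> bool:
    """True if the clone amplicon has fewer copies of the site than wild type,
    i.e. the copy inside the randomized region was destroyed."""
    return count_site(clone_amplicon, site) < count_site(wild_type_amplicon, site)

# Hypothetical short amplicons beginning with the HA24 primer sequence.
HA24 = "GATTTCCGGGACGTTGCC"
wt_alpha2 = HA24 + "TACCGGTA"      # CCGG in the primer and in the target region
clone_alpha2 = HA24 + "TACCGATA"   # target-region CCGG destroyed
enzyme, site = DIAGNOSTIC_SITE["alpha2"]
print(enzyme, lost_diagnostic_site(wt_alpha2, clone_alpha2, site))
```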
Cloning of Amphiphilic Helical Variants-The cloning procedure for α1-amphiphilic helical variants was slightly different from that for random variants. Four oligonucleotides were mixed at equimolar concentrations, incubated at 95°C for 3 min, and then left to cool at room temperature. The annealed oligonucleotides were incubated at 70°C for 1 h with Taq DNA polymerase, purified with a Qiaex II Gel Isolation Kit (Qiagen), and then digested with BglII and SalI. The double-stranded, digested cassette was purified using the QIAquick Nucleotide Removal Kit (Qiagen). Table I lists the cloning plasmid, restriction sites, and oligonucleotide sequences for the construction of the amphiphilic helical library. The ligation product was transformed into JM83 cells containing pLGBCD and plated on blood agar plates.
All transformants on a plate were picked and subjected to colony PCR, regardless of the size of the hemolytic zone, in order to generate an unbiased library of helices. Unlike the random variants, α1-amphiphilic helical variants were selected on the basis of PCR fragment size rather than the absence of a restriction site. This was made possible by the use of pUCAC494-adapter as the cloning vector, which contained a 63-bp deletion within the signal sequence coding region; insertion of an α1 cassette into this plasmid restores its original length. The primers HA23 (5′-GACGGCAGGGTAATCACA-3′) and M13R1 amplify a 411-bp fragment from a positive candidate and a 348-bp fragment from a negative clone. Once amphiphilic helical clones were identified, the plasmid DNA (a mixture of pUCAC494 and pLGBCD) was isolated and re-transformed into JM83 with ampicillin selection to obtain a pure source of plasmid for DNA sequencing.
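The size-based screen can be written as a nearest-expected-band call. The sketch below uses the 411-bp and 348-bp sizes from the text; the ±15-bp tolerance for reading a gel band is an assumption chosen only for illustration, and bands outside that tolerance are reported as ambiguous.

```python
# Expected colony-PCR product sizes (HA23 + M13R1) from the text.
EXPECTED_BP = {"insert restored": 411, "adapter only": 348}

def classify_band(observed_bp: float, tolerance_bp: float = 15) -> str:
    """Assign a colony-PCR band to the closest expected product.

    tolerance_bp is a hypothetical allowance for band-size estimation error.
    """
    best = min(EXPECTED_BP, key=lambda label: abs(EXPECTED_BP[label] - observed_bp))
    if abs(EXPECTED_BP[best] - observed_bp) > tolerance_bp:
        return "ambiguous"
    return best

for band in (410, 350, 380):
    print(band, classify_band(band))
```

The 63-bp difference between the two products is wide enough that a tolerance well below half of it keeps the two calls separate.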
DNA Sequencing-Plasmid DNA was quantitated by the ethidium bromide fluorescence assay described in the instruction manual of an SSF-600 solid state fluorimeter (Tyler Research Instruments Corp.). DNA samples were prepared with the ABI PRISM BigDye terminator cycle sequencing ready reaction kit according to the manufacturer's instructions and analyzed on a 310 Genetic Analyzer (PE Biosystems). The forward sequence was obtained for all variants using αSEQ (5′-GACGGCAGGGTAATCACACC-3′) as primer. The reverse sequence of selected variants was obtained using M13R1.
Blood Agar Plate Assay-The hemolysin secretion level of each variant was determined using a blood agar plate assay. Five plating conditions were tested: 20 ml of LB agar with 1, 2, or 5% defibrinated sheep blood (PML Microbiologicals), or two-layered plates with 10 ml of plain LB bottom agar and 10 ml of 2 or 5% blood LB top agar. In general, high secretors were better resolved on plates with a higher percentage of blood, and double-layered plates gave better contrast. The optimal condition was 10 ml of LB bottom agar with 10 ml of 5% blood LB top agar, and these plates were used to determine the phenotype of all clones.
Because an average of 100 variants was assayed at the same time for each library, the blood agar plate assay, being quick and convenient, was the only feasible method for phenotype determination. Plasmid DNA isolated from each variant was transformed into E. coli JM83 containing pLGBCD and spread on blood agar plates in triplicate. After 19 h of incubation at 37°C, each variant was assigned a zone rank from 0 (no hemolysin) to 6 (wild-type) by comparison with a set of standards. As a control for lysis, cells transformed with pUCAC494 and pLGCD (i.e. without HlyB) had a zone size of zero (no secretion), identical to cells transformed with pLGBCD only (i.e. without HlyA). Hemolytic zone size, brightness, and colony size were all taken into account. Each plate was read twice independently, giving six readings per variant. The average and standard deviation were calculated as indicators of secretion level and variability, respectively. Assignments were most consistent at the two extremes (0, 1, and 6), and variability was greatest for clones secreting at levels 3 and 4. It should be emphasized that the zone assignment scale (rank 0 to 6) is non-linear and that the numbers obtained are only semiquantitative.
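The scoring scheme (three plates, each read twice) can be summarized as in the sketch below. The text does not say whether the sample or the population standard deviation was used; this sketch assumes the sample standard deviation (`statistics.stdev`). The readings are hypothetical.

```python
from statistics import fmean, stdev

def summarize_readings(plates):
    """Mean and sample SD of zone ranks from triplicate plates read twice each.

    plates: three pairs of independent readings on the 0-6 rank scale.
    """
    readings = [r for plate in plates for r in plate]
    if len(readings) != 6:
        raise ValueError("expected 3 plates x 2 readings")
    if any(not 0 <= r <= 6 for r in readings):
        raise ValueError("ranks must lie between 0 and 6")
    return fmean(readings), stdev(readings)

# Hypothetical readings for one variant.
example_plates = [(4, 4), (5, 4), (4, 3)]
mean_rank, sd_rank = summarize_readings(example_plates)
print(f"mean {mean_rank:.2f}, SD {sd_rank:.2f}")
```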
All variants within one library were assayed on the same day to minimize inconsistencies. In addition, selected variants from each of the three random libraries, as well as all amphiphilic helical variants, were plated together in a final round to allow a direct comparison between the different libraries. To determine the reproducibility of this assay, the averages obtained on the first and second occasions were compared. Of a total of 54 random variants, 11 were given the same assignments as before. Thirty-nine mutants had averages that differed by less than one, and five other variants had averages that differed by more than one. The Pearson correlation coefficient between the two sets of measurements was 0.96 (data not shown).
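The reproducibility figure is a Pearson correlation between two rounds of average zone ranks. A minimal, dependency-free sketch follows. The paired values in the usage lines are hypothetical and are not the study's data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

# Hypothetical average ranks for the same variants on two occasions.
first_round = [2.8, 5.5, 1.3, 4.0, 6.0]
second_round = [3.0, 5.2, 1.5, 3.7, 6.0]
print(round(pearson(first_round, second_round), 3))
```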
Data Analysis-DNA sequences were translated and analyzed with the Wisconsin Package Version 9.1, Genetics Computer Group (GCG), Madison, WI. Additional Perl scripts were written by Dr. Eric Cabot and David Hui to facilitate bulk analysis. Peptool version 1.1 (Biotools Inc., Edmonton, Canada) was used for helix hydrophobic moment determination.
SDS-Polyacrylamide Gel Electrophoresis and Western Blotting-To determine the amount of endogenous hemolysin, JM83 bacteria were transformed with plasmids encoding the hemolysin variants and harvested at A600 = 0.85 ± 0.05. Following centrifugation (7,000 × g for 15 min), cell pellets were resuspended in STE buffer (10 mM Tris, pH 8.0, 150 mM NaCl, 1 mM EDTA) supplemented with various protease inhibitors. For each sample, an equivalent of 200 µl of cells was boiled for 3 min and run on an SDS-polyacrylamide gel (7.5% separating gel) under reducing conditions. After standard Western transfer and blotting (anti-hemolysin antiserum at 1:20,000 dilution and goat anti-rabbit antibody at 1:10,000 dilution (Jackson ImmunoResearch Laboratories, Inc.)), bands were visualized with enhanced chemiluminescence Western blotting detection reagents (Amersham Pharmacia Biotech). The majority of the combinatorial mutations in the signal sequence did not have a dramatic effect on the quantity of intracellular hemolysin (data not shown).
Enzyme-linked Immunosorbent Assay-The amount of hemolysin secreted into the medium was quantitated by enzyme-linked immunosorbent assay. Mutant plasmids were transformed into JM83 bacteria harboring pLGBCD and harvested at A600 = 0.85 ± 0.05. After centrifugation (3,000 × g for 5 min), the supernatant was concentrated with Centricon-30 filtration columns (Amicon). The concentrate was then applied to Immulon-2 microtiter plates (Dynatech Labs) that had been pre-blocked with phosphate-buffered saline containing 1% bovine serum albumin and 0.05% Tween 20, and incubated at 37°C for 2 h. After removal of the supernatant, anti-hemolysin antiserum (diluted 1:5,000 in phosphate-buffered saline containing 0.1% bovine serum albumin and 0.005% Tween 20) was added, and the plates were incubated at room temperature for 2 h. The primary antibody was then replaced by goat anti-rabbit antibody (diluted 1:10,000 in phosphate-buffered saline containing 0.1% bovine serum albumin and 0.005% Tween 20) and incubated at room temperature for 1 h. For detection, the ABTS (Sigma) method was used following the manufacturer's instructions.

Nucleotide and Amino Acid Distributions of Random Library Variants-A random library was generated in each of the α1, linker, and α2 regions by replacing the targeted coding segment with a string of degenerate nucleotides (G, A, T, C) of the same length (Table II). The resulting variants carried a random combination of amino acids in place of the wild-type sequence in the targeted region. After isolation, the genotype of each variant was determined by sequencing the entire secretion signal. Each mutant was assigned to one of three classes on the basis of its DNA sequence: those of the intended design (simply called random mutants), those with at least one stop codon (stop mutants), and those with unanticipated mutations elsewhere in the signal sequence (not analyzed). For a region with $k$ residues, $20^{k}$ different random variants and $\sum_{x=0}^{k-1} 20^{x}$ distinct stop variants could theoretically be generated.
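The two counts above follow from treating each randomized codon as one of 20 amino acids: a full-length random variant has $20^{k}$ possibilities, while a stop variant whose first stop codon lies at position $x+1$ is defined by its $x$ preceding random residues. The sketch below evaluates both expressions; the function names are hypothetical, and `k = 10` corresponds to the 10-residue linker region.

```python
def random_variant_count(k: int) -> int:
    """Distinct full-length variants when each of k codons encodes one of 20 residues."""
    return 20 ** k

def stop_variant_count(k: int) -> int:
    """Distinct truncated variants: the first stop after x random residues, x = 0..k-1."""
    return sum(20 ** x for x in range(k))

for k in (1, 2, 10):  # k = 10 is the length of the linker region
    print(k, random_variant_count(k), stop_variant_count(k))
```

The sum is a geometric series, so it equals $(20^{k} - 1)/19$, which the test below checks.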
To estimate the randomness of the resulting variants, the nucleotide frequency within the mutated region was determined for all random and stop variants in each library. Although the random regions were designed so that each of the four nucleotides had an equal chance (25%) of being incorporated at each position, the actual proportions deviated dramatically. In fact, each of the three random libraries had a different nucleotide distribution (Table III). Based on χ² statistics, these frequencies could not have arisen by chance. This skewing could result from manipulations during library generation, from biological selection, or from both.
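The test against an even 25% expectation is a χ² goodness-of-fit test with 3 degrees of freedom, as stated in the Table III notes. The sketch below computes the statistic and compares it with the tabulated critical values for df = 3 (7.815 at p = 0.05, 11.345 at p = 0.01). The nucleotide counts in the usage example are hypothetical.

```python
# Tabulated upper-tail critical values of the chi-square distribution, df = 3.
CRITICAL_VALUES_DF3 = {0.05: 7.815, 0.01: 11.345}

def chi_square_vs_uniform(counts: dict) -> float:
    """Goodness-of-fit statistic against equal (25%) expected frequencies of A, C, G, T."""
    observed = [counts.get(base, 0) for base in "ACGT"]
    total = sum(observed)
    if total == 0:
        raise ValueError("no nucleotides counted")
    expected = total / 4
    return sum((o - expected) ** 2 / expected for o in observed)

def deviates_from_uniform(counts: dict, alpha: float = 0.05) -> bool:
    """True if the statistic exceeds the critical value for the chosen alpha."""
    return chi_square_vs_uniform(counts) > CRITICAL_VALUES_DF3[alpha]

# Hypothetical nucleotide counts pooled over one library's randomized region.
skewed = {"A": 150, "C": 80, "G": 120, "T": 50}
print(chi_square_vs_uniform(skewed), deviates_from_uniform(skewed, alpha=0.01))
```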
Analysis of the amino acid distribution provided a partial answer to this observation. The frequencies of the 20 amino acids and of stop codons were tabulated for each random library. None of these agreed with the expected ratios (based on the genetic code and the assumption that every nucleotide had an equal chance of being incorporated). However, when the predicted amino acid distribution was recalculated from the observed nucleotide frequencies, the corrected prediction closely matched the observed amino acid frequencies (Table IV). To a certain extent, this observation argued against biological selection: if selection had acted at the amino acid level, the amino acid frequencies predicted from the nucleotide frequencies would probably not have matched the observed ones so closely, because of the intrinsic degeneracy of the genetic code. Despite the skewed nucleotide frequencies, all amino acids were represented in significant proportions in each random library, providing a diverse population of combinatorial mutants for further analysis.
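The "corrected prediction" can be obtained by weighting each of the 64 codons by the observed nucleotide frequencies and summing over the codons of each amino acid. The sketch below assumes that every position of a randomized codon is drawn independently from the same pooled nucleotide distribution. The text does not state whether position-specific frequencies were used, so treat this as one plausible version. The input counts are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)
}

def predicted_aa_frequencies(nucleotide_counts: dict) -> dict:
    """Expected residue (and stop) frequencies for a fully degenerate NNN codon.

    Assumes each of the three positions is drawn independently from the
    same observed nucleotide distribution.
    """
    total = sum(nucleotide_counts.get(b, 0) for b in BASES)
    if total <= 0:
        raise ValueError("need at least one nucleotide count")
    p = {b: nucleotide_counts.get(b, 0) / total for b in BASES}
    freqs = {}
    for codon, aa in CODON_TABLE.items():
        freqs[aa] = freqs.get(aa, 0.0) + p[codon[0]] * p[codon[1]] * p[codon[2]]
    return freqs

# Hypothetical skewed counts; compare with the uniform expectation (Leu = 6/64).
skewed = predicted_aa_frequencies({"A": 150, "C": 80, "G": 120, "T": 50})
print(round(skewed["L"], 4), round(skewed["*"], 4))
```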
Secretion Efficiency of α1, Linker, and α2 Random Variants-The phenotypes of all combinatorial mutants were determined by blood agar plate assays. E. coli harboring pLGBCD, which encodes HlyB and HlyD (the transporter complex) as well as HlyC (the toxin activation unit), were transformed with hemolysin signal sequence variants and grown on blood agar plates. The size of the hemolytic zone surrounding the colonies on each plate was assigned a rank from 0 (no halo) to 6 (wild-type) (Fig. 3). This zone size depends on three factors: the hemolytic activity, the amount of endogenous hemolysin, and the secretion efficiency. We have confirmed by enzyme-linked immunosorbent assay that the level of secreted protein increased with the hemolytic zone size. Thus, the size of the halos could be used as an indicator of secretion efficiency. In this study, a hemolytic zone assignment of 4 (~50%) or above was considered to represent efficient transport.

Table III notes. a The sample size equals the total number of random and stop variants multiplied by the number of nucleotides in each target region. b The χ² value was calculated with an expected ratio of 25% for each nucleotide, and the p value was determined with 3 degrees of freedom.

Table II notes. a N represents any of the 4 nucleotides (g/a/t/c). b X represents any of the 20 amino acids or the stop codon. c Non-polar amino acids used the degenerate codon NtN. With the exception of Ser and Thr, all polar amino acids were represented by the degenerate codon BaN (where B = g/a/c). Ser and Thr residues were substituted by the codon aBN. The 11th and 12th amino acids in the α1 helix are alanines and were replaced by Jct (where J = g/a/t) and gBcBc (where each Bc = g/t/c), respectively, in order to maintain the amphiphilicity of the helix.
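For reading averaged ranks against approximate secretion levels, the sketch below uses the rank-to-percentage correspondence given in the Fig. 3 legend and the efficiency cutoff of rank 4 stated above. Linear interpolation between neighboring ranks is an assumption added here for non-integer averages. Because the scale is non-linear and only semiquantitative, the interpolated values are rough guides only.

```python
# Approximate secretion (% of wild type) for each zone rank, from the Fig. 3 legend.
RANK_TO_PERCENT = {0: 0, 1: 2, 2: 10, 3: 30, 4: 50, 5: 90, 6: 100}
EFFICIENT_RANK = 4  # ranks of 4 (~50%) or above count as efficient transport

def approx_percent(mean_rank: float) -> float:
    """Rough % of wild-type secretion for an averaged rank.

    Interpolates linearly between tabulated ranks; this interpolation is an
    assumption, not part of the original scoring scheme.
    """
    if not 0 <= mean_rank <= 6:
        raise ValueError("rank must lie between 0 and 6")
    lower = int(mean_rank)
    if lower == 6:
        return 100.0
    fraction = mean_rank - lower
    span = RANK_TO_PERCENT[lower + 1] - RANK_TO_PERCENT[lower]
    return RANK_TO_PERCENT[lower] + fraction * span

def is_efficient(mean_rank: float) -> bool:
    return mean_rank >= EFFICIENT_RANK

for rank in (2.8, 4.0, 5.1):
    print(rank, round(approx_percent(rank), 1), is_efficient(rank))
```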
Thirty-four α1 random variants were generated to investigate the functional role of this region. If the α1 amphiphilic helix plays a critical role in transport, most α1 random mutants would be expected to be secreted at low levels, since they would not contain this specific structural feature. However, if the α1 region does not contain any critical elements, then most of the random variants would be secreted at a level equivalent to wild type. The hemolytic zones for α1 random mutants ranged from 2 to 3.5. The population was relatively homogeneous, with a mean and a median of 2.8, indicating that secretion was greatly hampered (Fig. 4A). Since the rest of the signal sequence was intact, any change in hemolysin transport could be attributed to modifications in the α1 region. More specifically, the dramatic reduction in secretion indicated that some feature(s) essential for efficient transport must be located in this region.
A random library was also created in the 10-amino acid region just downstream of the α1 helix (linker). The hemolytic zones for the 45 linker random variants ranged from 1.3 to 4, with a mean of 2.4 and a median of 2.3. Secretion in these linker variants was clearly reduced despite the presence of an intact α1 helix (Fig. 4B). The dramatic decrease in transport upon random mutagenesis of this region suggested that the linker, like the α1 region, contains some element(s) that may be required for efficient transport.
The secretion pattern of α2 random variants was expected to resemble that of the α1 random variants, since both regions are conserved. However, examination of the 32 mutants from the α2 random library revealed a different distribution. The hemolytic zones for α2 random mutants ranged from 1 to 6, with a mean of 5.1 and a median of 5.5. Twenty-four of the 32 full-length mutants secreted at 5 or higher, and only one transported at a level lower than 2.8 (Fig. 4C). This surprising distribution provides strong evidence that the α2 region can tolerate almost any combination of amino acids without a major effect on transport.
Secretion Efficiency of α1-Amphiphilic Helical Variants-Results from our α1 random library suggested that the α1 region contains features important for recognition and transport. We therefore generated an α1-amphiphilic helical library to determine whether an amphiphilic helix in this region could support efficient transport regardless of primary sequence. This approach was based on a strategy described by Kamtekar et al. (17), in which the sequence locations of polar and nonpolar residues are specified explicitly but the precise identities of the side chains are not constrained and vary extensively (Table II). Because the periodicity of polar and non-polar residues within the α1 region was kept the same as in wild type, all amphiphilic helical variants were predicted to retain an amphiphilic helical structure. A total of 1.5 × 10^8 different amino acid sequences was theoretically possible in this library.
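The binary patterning relies on degenerate codons that encode only polar or only non-polar residues. The sketch below expands the codons listed in the Table II notes using the standard genetic code. The paper's own symbol definitions are used: "B" means g/a/c, which is not its IUPAC meaning, and "J" and "Bc" are the paper's labels. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)
}

# Degeneracy symbols as defined in the Table II notes (not IUPAC for B, J).
SYMBOLS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",  # any nucleotide
    "B": "GAC",   # g/a/c
    "J": "GAT",   # g/a/t
    "Bc": "GTC",  # g/t/c
}

def expand(codon_symbols):
    """Every concrete codon matched by a degenerate codon given as three symbols."""
    if len(codon_symbols) != 3:
        raise ValueError("a codon has exactly three positions")
    return ["".join(bases) for bases in product(*(SYMBOLS[s] for s in codon_symbols))]

def encoded_residues(codon_symbols):
    """Sorted one-letter codes ('*' = stop) encoded by a degenerate codon."""
    return sorted({CODON_TABLE[codon] for codon in expand(codon_symbols)})

designs = {
    "NtN (non-polar)": ["N", "T", "N"],
    "BaN (polar)": ["B", "A", "N"],
    "aBN (Ser/Thr positions)": ["A", "B", "N"],
    "Jct (11th residue, Ala)": ["J", "C", "T"],
    "gBcBc (12th residue, Ala)": ["G", "Bc", "Bc"],
}
for label, symbols in designs.items():
    print(label, encoded_residues(symbols))
```

Expanded this way, NtN gives F, I, L, M, V and BaN gives D, E, H, K, N, Q. Because B excludes t, none of these codons can produce a stop.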
Examination of the 22 α1-amphiphilic helical variants allowed us to establish a relationship between structure and function in the α1 region. If an amphiphilic helical structure in the α1 region is sufficient for transport, then most α1-amphiphilic helical variants should be secreted at high levels, since they are all predicted to contain this structural feature. The hemolytic zones for α1-amphiphilic helical mutants ranged from 1.3 to 5.7, with a mean of 4.0 and a median of 4.3. Only one proposed critical residue, Glu-46, has been reported in this region (11). However, we have isolated a number of amphiphilic helical mutants that do not contain this residue at this position but still support efficient transport, suggesting that this residue may not be essential. Rather, the finding that the average secretion level of α1-amphiphilic helical mutants was significantly higher than that of α1 random mutants (mean secretion of 2.8) strongly suggests that an amphiphilic helical structure in the α1 region supports efficient transport (Fig. 4D).

Table IV notes. b The predicted amino acid frequencies were calculated from the observed nucleotide frequencies in Table III. c The sample size equals the total number of random and stop variants multiplied by the number of amino acids in each target region. d The χ² value was calculated from the observed and predicted frequencies above, and the p value was determined with 20 degrees of freedom.

FIG. 3. Hemolytic zone size. E. coli JM83 bacteria harboring the plasmid pLGBCD, which encodes HlyB and HlyD (transporters) as well as HlyC (toxin activation), were transformed with plasmid DNA encoding a hemolysin signal sequence variant. Transformants were plated on blood agar plates in triplicate and incubated at 37°C for 19 h. Each plate was read twice independently by comparison with a set of standards and assigned a rank from 0 (no transport) to 6 (wild-type activity). The six readings for each variant were averaged. The approximate percentage of secretion relative to wild type for each zone size was: rank 6, 100%; rank 5, 90%; rank 4, 50%; rank 3, 30%; rank 2, 10%; rank 1, 2%; rank 0, 0%.
Helix Hydrophobic Moment and Hydrophilicity of Random and Amphiphilic Helical Variants-Whether a region contains important elements can be inferred from the secretion distribution of the corresponding random variants. Once a region is found to be important, the actual feature(s) can be identified by correlating the genotypes of the combinatorial population with their phenotypes. To this end, a number of biophysical properties, including helix hydrophobic moment and Kyte-Doolittle hydrophilicity, were calculated for the mutated region of each random and amphiphilic helical mutant and analyzed in relation to transport efficiency (Table V).
The helix hydrophobic moment is an indicator of amphiphilic helix formation (18). This value was relatively low for random mutants (average of 0.14) compared with wild type (0.23), suggesting that most random variants do not contain an amphiphilic helical structure. The Spearman correlation coefficient between secretion and helix hydrophobic moment was 0.38 (p < 0.05) for α1 random variants. This weak correlation implies that an amphiphilic helix in the α1 region contributes in part to efficient transport.
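The helix hydrophobic moment treats each residue's hydrophobicity as a vector pointing at its angular position around an ideal α-helix (100° per residue). It reports the length of the summed vector, here divided by the number of residues. The hydrophobicity scale used by Peptool is not stated in the text, so the sketch takes the scale as a parameter. The two-letter toy scale and the example peptides are hypothetical, and the absolute values will not reproduce those in Table V.

```python
from math import cos, sin, radians, hypot

def mean_hydrophobic_moment(seq: str, scale: dict, angle_deg: float = 100.0) -> float:
    """Per-residue hydrophobic moment of a peptide modeled as an ideal helix.

    scale maps one-letter residue codes to hydrophobicity values; angle_deg is
    the angular step between successive residues (100 degrees for an alpha-helix).
    """
    if not seq:
        raise ValueError("empty sequence")
    delta = radians(angle_deg)
    x = sum(scale[aa] * cos(i * delta) for i, aa in enumerate(seq))
    y = sum(scale[aa] * sin(i * delta) for i, aa in enumerate(seq))
    return hypot(x, y) / len(seq)

# Hypothetical two-residue scale: L hydrophobic, K hydrophilic.
toy_scale = {"L": 1.0, "K": -1.0}
amphiphilic = "LKKLLKKLLKKL"   # hydrophobic residues spaced roughly every 3-4 positions
uniform = "LLLLLLLLLLLL"       # hydrophobic all around the helix
print(round(mean_hydrophobic_moment(amphiphilic, toy_scale), 3),
      round(mean_hydrophobic_moment(uniform, toy_scale), 3))
```

With this toy scale, the patterned peptide gives a larger moment than the uniformly hydrophobic one, because the latter's vectors largely cancel around the helix.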
The average hydrophilicity differed substantially among the three random libraries, possibly as a result of the uneven distribution of nucleotides. Nevertheless, there was a wide range of hydrophilicity values within each library. The Spearman correlation coefficient between secretion and hydrophilicity for α1 random variants was relatively strong (-0.53, p < 0.01), providing some evidence that hydrophobicity in the α1 region may be important for efficient transport.
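Spearman's coefficient, used in this section because zone ranks are non-linear and semiquantitative, is the Pearson correlation of the ranks, with tied values given their average rank. The sketch below shows this generic formulation. The example values are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation computed as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical hydrophilicity values and mean zone ranks for five variants.
hydrophilicity = [0.8, -0.2, 0.5, -1.1, 0.1]
zone_rank = [2.2, 3.3, 2.5, 3.5, 2.8]
print(round(spearman(hydrophilicity, zone_rank), 2))
```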
When the helix hydrophobic moment and hydrophilicity were calculated for the linker and α2 random variants, no relationship was observed between either value and secretion level. This was expected for α2 random mutants, since efficient secretion was obtained regardless of primary sequence. In the case of the linker region, neither helix hydrophobic moment nor hydrophilicity appeared to be particularly important for transport.
The average helix hydrophobic moment for α1-amphiphilic helical mutants was 0.25 (Table V), indicating that the amphiphilic helical design indeed yielded variants with the intended structure. Interestingly, no specific relationship was observed between secretion level and helix hydrophobic moment for α1-amphiphilic helical variants. It is possible that the helix hydrophobic moment was saturated, so that a stronger amphiphilic helix would not necessarily improve transport. Consistent with results from the α1 random variants, there also appeared to be some negative correlation (−0.38) between hemolytic zone size and hydrophilicity for the amphiphilic helical variants.
Secretion Efficiency of α1, Linker, and α2 Stop Variants-The stop mutants in each of the random libraries were also analyzed. These variants contain a stop codon in the targeted region, resulting in a truncated signal sequence with a random tail. For example, α1 stop mutants were identical to wild type up to, but not including, the α1 region, and the rest of the signal sequence (49 amino acids in this case) was replaced by a random string of zero to 11 amino acids. Analysis of stop mutants from the different random libraries provided some insight into the functional properties of the extreme C terminus.
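Classifying a sequenced variant as a stop mutant, and measuring its random tail, amounts to translating the randomized region and locating the first stop codon. The sketch below uses the standard genetic code and follows the convention in the FIG. 5 caption that a stop codon at position n of the region leaves a random tail of n − 1 residues. The helper that counts lysine and arginine at positions −2 to −8 relates to the charge comparison described for α2 stop mutants; treating only K and R as positively charged is an assumption, and all function names are hypothetical.

```python
from typing import Optional

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks a stop. Trailing partial codons are ignored."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        b1, b2, b3 = dna[i:i + 3]
        index = 16 * BASES.index(b1) + 4 * BASES.index(b2) + BASES.index(b3)
        protein.append(AMINO_ACIDS[index])
    return "".join(protein)


def classify_region(region_dna: str) -> tuple[str, Optional[int]]:
    """Return ('stop', tail_length) if the region contains a stop, else ('random', None)."""
    peptide = translate(region_dna)
    if "*" in peptide:
        stop_position = peptide.index("*") + 1  # 1-based codon position in the region
        return "stop", stop_position - 1
    return "random", None


def positive_charges_near_c_terminus(protein: str) -> int:
    """Count K/R at positions -2 to -8, where -1 is the last residue before the stop."""
    return sum(aa in "KR" for aa in protein[-8:-1])
```

Grouping stop mutants by `classify_region` tail length and then comparing `positive_charges_near_c_terminus` within each group separates the two parameters, length and composition, that change together in these variants.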
Thirty-four α1 stop variants and 15 linker stop variants were generated. Their secretion ranged from 1 to 3, with means of 1.9 and 1.7, respectively. Since the majority of the signal sequence was removed, the dramatic reduction in hemolysin secretion was not surprising (Fig. 4, A and B). Rather, it was interesting to note that no null mutants were obtained despite the drastic deletions.
Examination of the 53 stop mutants in the α2 random library revealed a skewed distribution (Fig. 4C). The zone size for α2 stop mutants ranged from 1 to 4, with a mean of 1.5 and a median of 1. It should be pointed out that 13 of the 53 stop mutants had a stop codon at position 1 of the α2 region and thus share the same genotype. Phenotype determination by blood agar plate assay was consistent for these mutants: all had an assignment of 1, with a single exception (1.2). The low level of secretion of α2 stop mutants suggested that the residues downstream of the linker must contain some important element(s) for efficient transport. The α2 region itself can be excluded, since it tolerates almost any amino acids; it is therefore likely that the required element(s) lie within the region after the second helix.
FIG. 4. Distribution of signal sequence variants. Upon sequencing of the combinatorial variants, the phenotype of each mutant was determined by blood agar plate assay. Based on the size of the halo, each variant was assigned a rank from 0 (no hemolytic zone) to 6 (wild type). The secretion patterns of variants from: A, the α1 random library (34 random and 34 stop); B, the linker random library (45 random and 15 stop); C, the α2 random library (32 random and 39 stop); and D, the α1 helical and random libraries (22 helical and 34 random) are plotted. Each secretion category has a 0.5 margin (e.g. a hemolytic zone size of 5 represents mutants secreting between and including 4.5 and 5.495).
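The 0.5 margin in the FIG. 4 caption corresponds to rounding each hemolytic-zone score to the nearest integer rank, with x.5 rounding up. A minimal sketch of that binning and of the summary statistics quoted above (histogram, mean, median) follows. Whether the published means were computed on raw scores or on binned ranks is not stated, so computing them on raw scores is an assumption, and the function names are hypothetical.

```python
import math
import statistics
from collections import Counter


def secretion_category(zone_score: float) -> int:
    """Round a hemolytic-zone score to its integer category (x.5 rounds up), clamped to 0-6."""
    return min(6, max(0, math.floor(zone_score + 0.5)))


def summarize(scores: list[float]) -> dict:
    """Histogram of binned categories plus mean and median of the raw scores."""
    categories = [secretion_category(s) for s in scores]
    return {
        "histogram": dict(sorted(Counter(categories).items())),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
    }
```

`math.floor(x + 0.5)` is used instead of the built-in `round` because Python's `round` sends halves to the nearest even integer (`round(4.5) == 4`), which would place a score of 4.5 in category 4 rather than 5.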
Interpretation of the stop mutants is complicated by the fact that two parameters, the length and the composition of the extreme C terminus, were changed concurrently. When the hemolytic zone was plotted against the position of the stop codon for α1 and linker stop variants, no obvious trend was observed (data not shown). However, when the same plot was made for α2 variants, there appeared to be some dependence on length (Fig. 5). Furthermore, when length was controlled for by comparing mutants with a stop codon at the same position, those that contained positively charged residues between positions −2 and −8 consistently transported at lower levels than their counterparts. Taken together, the results suggest that the length and hydrophobicity of the last few amino acids could be important.
DISCUSSION
The hemolysin transporter complex is dedicated to the active export of the 107-kDa hemolysin directly from the cytoplasm to the outside of the cell. Although this transporter system has a single natural substrate, it is able to recognize and transport many different primary sequences, from heavily mutated versions of hemolysin (7,10) to protein toxins from other species (13). To investigate the substrate specificity of the hemolysin transporter complex, we have attempted to identify features within the hemolysin signal sequence that are important for transport. The conserved helix-linker-helix motif within the targeting signal was divided into three regions (α1-linker-α2) and subjected to random oligonucleotide mutagenesis. Using this approach, we have shown that the α1 and linker regions are sensitive to random changes in the primary sequence. For the α1 region, it appears that only sequences that result in a predicted amphiphilic helical structure and appropriate hydrophobicity can support efficient transport. In contrast, the α2 region has no specific primary or secondary structural requirements.
This was surprising, since this region represents the second helix of the conserved helix-linker-helix motif. While the α2 helix may not be important for secretion, it is possible that this structure is required for some as yet unidentified function.
A slight variation in the design of the combinatorial library allowed us to investigate the structural requirements of the signal sequence at higher resolution. We created an amphiphilic helical library to determine the contribution of secondary structure to transport in the absence of other features in the α1 region. Our study revealed that any one of a large number of sequences predicted to form an amphiphilic helix in this region is sufficient for transport, demonstrating that there is no specific requirement at the primary sequence level. This is consistent with past reports that the α1 region can tolerate a wide variety of point mutations (10,11,19). Our data suggest that higher-order structure is more important for function than primary sequence.
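The construction of the amphiphilic helical library is not detailed in this section. One common way to design such a library is binary patterning: positions that fall on one face of an ideal α-helix (100° per residue) are drawn from a hydrophobic alphabet and the rest from a polar alphabet. The sketch below illustrates that idea only; the 180° hydrophobic arc, both residue alphabets, and the function names are hypothetical choices, not the composition used in this study.

```python
import random

HYDROPHOBIC = "LIVFMA"   # hypothetical hydrophobic-face alphabet
HYDROPHILIC = "KEQRSDN"  # hypothetical polar-face alphabet


def helical_pattern(length: int, angle_deg: int = 100) -> str:
    """'H' where a residue points into the hydrophobic half of the helical wheel, else 'P'."""
    return "".join(
        "H" if (i * angle_deg) % 360 < 180 else "P" for i in range(length)
    )


def amphiphilic_variant(length: int, rng: random.Random) -> str:
    """Draw one library member: a random residue from the alphabet matching each position."""
    return "".join(
        rng.choice(HYDROPHOBIC if p == "H" else HYDROPHILIC)
        for p in helical_pattern(length)
    )
```

For example, `[amphiphilic_variant(12, random.Random(seed)) for seed in range(5)]` yields five sequences that differ in primary sequence but share the same hydrophobic/polar pattern, which is the property such a library is meant to hold constant.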
In essence, results from the α1 combinatorial libraries demonstrated that while the hemolysin transporter system can accommodate many variations of the substrate, certain restrictions apply. This observation seems to hold for the linker region as well. Although the leukotoxin signal sequence is transported efficiently by the hemolysin transporter system despite a lack of sequence similarity in the equivalent linker region, most hemolysin linker random variants were not secreted at high levels. By analogy to the α1 region, the features important for transport in the linker region are likely to be occult elements buried within the primary sequence, rather than the specific identity of individual residues.
Table V footnotes:
a The helix hydrophobic moment was calculated using Peptool, v.1.1. A window of 9 residues was used, resulting in a value for each residue within the mutated signal sequence.
b The Kyte-Doolittle hydrophilicity was determined with Wisconsin Package™ Version 9.1, Genetics Computer Group (GCG). A window of 7 residues was used, resulting in a value for each amino acid within the mutated signal sequence.
c The helix hydrophobic moment or hydrophilicity values for residues within the mutated region of each mutant were averaged. A mean was then obtained for each library.
d The average helix hydrophobic moment and hydrophilicity values for each mutant were correlated with hemolytic zone size. The values shown represent Spearman correlation coefficients.
e Significant at the 0.05 level (2-tailed).
f Significant at the 0.01 level (2-tailed).
FIG. 5. Effect of changing the length and amino acid composition of the C terminus on secretion, as demonstrated by α2 stop mutants. The position of the stop codon of 39 α2 stop mutants is plotted against hemolytic zone size. Note that the random region becomes the extreme C terminus in stop mutants. For instance, α2 variants that have a stop codon at position 11 contain a 10-amino acid random tail. The probability of secretion appears to increase with length. Furthermore, when mutants of the same length were compared, those with a relatively hydrophobic tail consistently secreted at higher levels.
Determination of the identity of the functional features within the hemolysin signal sequence provides the first step in understanding the broad substrate specificity of the hemolysin transporter complex. In the case of the α1 region, the conserved amphiphilic helical structure was found to be important for function. The distribution of the 22 amphiphilic helical mutants points to the likelihood that as many as 1.5 × 10^8 different variants of the α1 region could be transported at greater than 50% of wild-type levels. It is intriguing that the hemolysin transporter system can tolerate such a large number of primary sequences. If secondary structure is the only functional requirement of the α1 region, what role does it play in transport? Two mechanistically distinct, but not mutually exclusive, models could explain this. It has been proposed that the α1-amphiphilic helical structure may interact with the membrane (19). This would allow hemolysin to diffuse in a two-dimensional plane rather than in three-dimensional space, greatly enhancing the chance of the substrate binding to the transporter complex. If the functional role of the α1 region is limited to the membrane level, the primary sequence need not be specific as long as an amphiphilic helix is present.
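This section does not state how the figure of 1.5 × 10^8 was derived. One plausible form of such an estimate, shown here purely as an assumption, scales the fraction of sampled helical mutants that pass the secretion threshold by the theoretical size of the library, with a Wilson score interval to reflect the small sample (22 mutants). The library size and pass count are inputs the caller must supply, the function name is hypothetical, and nothing here reproduces the published number.

```python
import math


def estimate_functional(n_pass: int, n_sampled: int, library_size: float,
                        z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and Wilson-interval bounds for the number of passing variants.

    Assumes sampled mutants are an unbiased draw from a library of `library_size`.
    """
    if n_sampled <= 0 or not 0 <= n_pass <= n_sampled:
        raise ValueError("need 0 <= n_pass <= n_sampled and n_sampled > 0")
    p = n_pass / n_sampled
    denom = 1 + z ** 2 / n_sampled
    centre = (p + z ** 2 / (2 * n_sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / n_sampled + z ** 2 / (4 * n_sampled ** 2)) / denom
    return p * library_size, (centre - half) * library_size, (centre + half) * library_size
```

With only 22 sampled mutants the interval is wide, so an extrapolated count of this kind is best read as an order-of-magnitude figure.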
It is also possible that the α1 region interacts directly with the transporter complex. In this case, the amphiphilic helical structure may be required to present residues within the α1 region in the proper orientation to the binding pocket of the transporter. The ability of the transporter complex to recognize many different primary sequence variations can be explained if this binding pocket contains multiple "contact" residues, all of which could act as potential "docking" sites for residues within the signal sequence. Interaction with any one of several possible combinations of these contact residues would trigger the transport process. Following this rationale, hemolysin and leukotoxin, two proteins that share little primary sequence similarity but are secreted with equally high efficiency, may actually interact with different sets of contact residues within the binding pocket. Indeed, two point mutants have been isolated in the hemolysin ABC transporter, HlyB, in which the transport of hemolysin was unaffected but the secretion of leukotoxin was reduced (20,21). This observation could be explained if the two mutated residues specifically recognize leukotoxin and not hemolysin. The large number of region-specific mutants generated in this study could also be used in further genetic complementation analyses to elucidate the intricate nature of this transporter-substrate interaction.
HlyB is a member of the ABC transporter superfamily, which is responsible for the translocation of many important substrates across biological membranes. Interestingly, many ABC proteins, such as P-glycoprotein and multidrug resistance protein, exhibit broad substrate specificity. Although the hemolysin transporter system is dedicated to the translocation of one natural substrate, our study clearly demonstrates that it can recognize and transport a wide range of signal sequence variants. This raises the possibility that multiple substrate specificity could be an intrinsic property of ABC transporters, and thus that the principles of transport proposed for the hemolysin system could apply more generally. For example, substrates of P-glycoprotein are relatively amphiphilic or hydrophobic in nature. This property is believed to facilitate their partitioning into the plasma membrane before they interact with the transporter (22). The high substrate concentration in the membrane environment may help to overcome the specificity issue. It has also been suggested that there are multiple functionally distinct but potentially overlapping binding sites within P-glycoprotein that are responsible for drug efflux (23). This parallels our interpretation of a multivalent binding pocket in the hemolysin system, in which different substrates may interact with different sets of contact residues to promote transport.
In a broader context, combinatorial analysis provides a powerful tool for investigating regions of interest in other biological systems. Random oligonucleotide mutagenesis allows drastic alterations to be engineered into a well-defined region while maintaining the natural spatial relationships of the target and non-target regions. Furthermore, a large number of variants can be generated, allowing greater confidence in the interpretation of results. The distribution of variants can provide a clear-cut answer regarding the presence of any important features within the target region. Once a region is determined to be important, more in-depth correlation analysis can be carried out on the mutant population to objectively identify the exact features involved. Finally, as demonstrated by our helical library, a more specialized library design allows primary and secondary structural elements within the same region to be dissected and evaluated separately for their biological contributions.
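A practical consequence of random oligonucleotide mutagenesis is that some variants carry stop codons, which is how the stop mutants analyzed here arise. Assuming equimolar nucleotides at each degenerate position (which, as noted for the hydrophilicity data, may not hold exactly), the expected stop frequency per codon and the chance that a region of n randomized codons contains at least one stop can be computed by enumeration. The codon schemes (NNN, NNK, NNS) are illustrative, the scheme used for these libraries is not stated in this section, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# IUPAC degenerate-base codes used below (subset).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "S": "CG"}
STOPS = {"TAA", "TAG", "TGA"}


def stop_fraction(scheme: str) -> Fraction:
    """Fraction of codons matching a degenerate scheme that are stop codons."""
    codons = ["".join(c) for c in product(*(IUPAC[b] for b in scheme))]
    return Fraction(sum(c in STOPS for c in codons), len(codons))


def fraction_with_stop(scheme: str, n_codons: int) -> Fraction:
    """Chance that n independent randomized codons contain at least one stop."""
    return 1 - (1 - stop_fraction(scheme)) ** n_codons
```

Exact fractions keep the comparison between schemes transparent: NNN yields 3/64 stops per codon, whereas NNK and NNS each admit only TAG and yield 1/32.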