Distinctions between Hydrophobic Helices in Globular Proteins and Transmembrane Segments as Factors in Protein Sorting*

Transmembrane (TM) segments in proteins can be distinguished in amino acid sequences as continuous stretches of hydrophobic residues. However, examination of a data base of helical water-soluble (globular) proteins revealed that nearly one-third contained helices of sufficient length to span a bilayer (≥19 residues) that had mean hydrophobicity ≥ that of actual TM segments. We now report that synthetic peptides corresponding to these globular protein sequences, which we termed “δ-helices,” behave like native TM sequences and readily insert into membrane mimetic environments in helical conformations. As well, certain δ-helix sequences can integrate into the membrane bilayer when placed into a membrane-targeted chimeric protein. We establish that δ-helices can be distinguished computationally from bona fide TM segments by the decreased frequency of occurrence of Ile/Val residues and by their relatively decreased solvent accessibilities (versus other globular helices) within tertiary structure. The further observations that (i) δ-helices generally contain three or more charged residues and (ii) δ-helices display relatively even distribution of these charged residues along their lengths, rather than concentration near their N and C termini as observed for TM segments, may constitute key recognition factors in diverting δ-helices from the membrane in vivo. Although a discrete biological role for δ-helices remains to be pinpointed, our overall results suggest that such segments may be required for globular protein folding and identify additional factors that may be important in the correct selection of TM segments by the cellular machinery.

Membrane proteins regulate the molecular traffic and information flow across biological membranes, acting as receptors (1,2), membrane pores (3), ion channels (4,5), and metabolite transporters (6,7). Despite their centrality in cellular function, little high resolution structural information is currently available for membrane proteins (currently ~1-2% of the >54,000 structures deposited in the Protein Data Bank). However, from this information two major structural classes of membrane proteins have been elucidated. In one category, proteins embedded in eukaryotic cell membranes and the inner membranes of bacteria and mitochondria generally adopt structures characterized by bundles of α-helical transmembrane (TM) segments. An alternative arrangement where β-strands span the membrane and assemble into barrel-like structures is common to proteins embedded in the bacterial and mitochondrial outer membranes (8).
Because high resolution structural determination of membrane proteins is not yet routine, computer simulation methodologies are often used to evaluate helical protein-protein (9) and protein-lipid interactions (10,11). As a prerequisite to such studies, protein helical TM segments and their boundaries must be defined. Hydropathy plots are commonly utilized to identify from primary sequence both the location of TM segments and their approximate entry/exit points (12-15). The hydropathy values assigned to each residue to create such plots are drawn from one or more scales, among them the Liu-Deber values developed in our laboratory (16). TM segments and peptides derived from them that meet or exceed a segmentally averaged "threshold" hydropathy on this scale (approximately equivalent to a poly-Ala strand) have been shown to spontaneously insert into micellar media (17). The average hydropathy levels of ~96% of natural TM segments were also found to exceed this threshold value (18).
From these observations, in conjunction with measurements of residue helical propensity in n-butanol (19), our laboratory developed the TM segment prediction program TM Finder (10) that uses segmental Liu-Deber hydropathy and nonpolar phase helicity values to query primary sequences for potential TM segments. TM Finder demonstrated a 98% predictive value in pinpointing TM segments in a training set of known membrane proteins (10). TM Finder was additionally specifically trained to limit the occurrence of false positives, i.e. globular (soluble) protein regions mispredicted as membrane-embedded. For this purpose the initial TM Finder code was applied to a sequence data base of globular proteins of known tertiary structure to assess helices that were of sufficient length (estimated at ≥19 residues) to span the membrane bilayer. Of a total 174 α-helical globular protein regions from 134 different proteins in this data base, we observed that ~30% were identified as potential TM segments from the primary sequence alone but could be separated computationally from bona fide TM sequences based on the presence of ≥3 charged residues. We subsequently termed these TM-like sequences that occur within helical globular proteins as δ-helices to reflect their intermediacy between TM properties and extramembranous localization (20). In the present work we sought to extend this preliminary computational separation of δ-helices, globular helices, and TM segments with a view toward understanding the criteria that may serve to distinguish these sequence features of globular proteins from TM segments in vitro or even within the cell.
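The screening idea described above (a segmentally averaged hydropathy threshold over a 19-residue window, followed by the ≥3 charged-residue rule) can be sketched as follows. This is an illustration only and is not the TM Finder implementation: the Liu-Deber values are not reproduced in this text, so the widely used Kyte-Doolittle scale is substituted, the threshold is left as a caller-supplied parameter, the nonpolar-phase helicity term is omitted, and all function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values, substituted here for the Liu-Deber scale
# (an assumption: the Liu-Deber values are not listed in this text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGED = set("DEKR")   # Asp, Glu, Lys, Arg
WINDOW = 19             # minimum bilayer-spanning length quoted in the text


def mean_hydropathy(segment, scale=KYTE_DOOLITTLE):
    """Segmentally averaged hydropathy of a sequence segment."""
    return sum(scale[aa] for aa in segment) / len(segment)


def count_charged(segment):
    """Number of Asp/Glu/Lys/Arg residues in a segment."""
    return sum(1 for aa in segment if aa in CHARGED)


def candidate_windows(sequence, threshold, window=WINDOW):
    """Return (start, segment, hydropathy, label) for each window whose mean
    hydropathy meets the threshold. Windows with >= 3 charged residues are
    labeled 'delta-like'; all other hits are labeled 'TM-like'."""
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        hydropathy = mean_hydropathy(segment)
        if hydropathy >= threshold:
            label = "delta-like" if count_charged(segment) >= 3 else "TM-like"
            hits.append((start, segment, hydropathy, label))
    return hits
```

Because the scale and threshold are stand-ins, the labels produced by this sketch should not be expected to reproduce the classifications reported here.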

EXPERIMENTAL PROCEDURES

Data Base Construction
The databases of globular and δ-helices were initially compiled by searching the SwissProt release 34 with the keyword "helix." The ~4200 sequences returned by this query were subsequently refined to 174 entries (20) as follows: (i) removing all redundant sequences (defined as those >25% identical); (ii) removing sequences with nonstandard amino acids; (iii) removing sequences without high resolution structure coordinates deposited in the Protein Data Bank; (iv) removing segments with <19 residues. The 174 remaining sequences were then divided into globular helices (see supplemental Table S1) and δ-helices (see supplemental Table S2) via submission to TM Finder; those segments with segmental hydropathy values at or above the Liu-Deber insertion threshold (≥0.4 on the Liu-Deber scale, see Ref. 21) were deemed δ-helices. The data base of TM segment sequences was compiled from a non-redundant list (redundancy again defined as >25% sequence identity) of TM α-helices with available high resolution structures. Thirty-seven non-homologous TM proteins were identified containing a total of 212 different TM segments (see supplemental Table S3). Unlike the globular and δ-helix databases, segments shorter than 19 amino acids were retained in the TM data base because the membrane localization of each was confirmed in a high resolution structure (16).
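The sequence-level filters (ii) and (iv) and the threshold-based split into globular and δ-helices can be sketched as below. This is a simplified illustration under stated assumptions: the segmental hydropathy of each entry is taken as a precomputed input (for example, a value reported by TM Finder), the redundancy and Protein Data Bank checks (i) and (iii) are not modeled, and the function and variable names are hypothetical.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 19            # criterion (iv): segments with <19 residues are removed
INSERTION_THRESHOLD = 0.4  # Liu-Deber insertion threshold quoted in the text


def passes_sequence_filters(sequence):
    """Criteria (ii) and (iv): standard residues only, at least 19 residues."""
    return len(sequence) >= MIN_LENGTH and set(sequence) <= STANDARD_AMINO_ACIDS


def partition_helices(entries):
    """Split entries into globular and delta-helix identifier lists.

    entries: iterable of (identifier, sequence, segmental_hydropathy), where
    the hydropathy is assumed to have been computed elsewhere.
    """
    globular, delta = [], []
    for identifier, sequence, hydropathy in entries:
        if not passes_sequence_filters(sequence):
            continue  # dropped from both databases
        if hydropathy >= INSERTION_THRESHOLD:
            delta.append(identifier)
        else:
            globular.append(identifier)
    return globular, delta
```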

Amino Acid Composition Analysis
Amino acid composition was determined for each segment in each data base by counting the number of each residue and/or group of residues in each individual helix and dividing by helix length to obtain a normalized residue frequency. Mean residue and/or group residue composition values were then calculated for each of the globular helix, TM helix, and δ-helix datasets. For group comparisons, hydrophobic residues were defined as Ala, Cys, Phe, Ile, Leu, Met, Val, Trp, and Tyr, as determined previously (19), polar residues were defined as Asp, Glu, Gly, His, Lys, Asn, Pro, Gln, Arg, Ser, and Thr, and charged residues were defined as Asp, Glu, Lys, and Arg. The overall mean amino acid compositions of globular, TM, and δ-helices were compared with an online one-way analysis of variance test. Pair-wise comparisons of amino acid frequencies among the globular, TM, and δ-helix datasets were performed using t tests.
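The per-helix normalization described above can be sketched as follows. The residue groups are those defined in the text; the function names are hypothetical, and the statistical comparisons (analysis of variance and t tests) are not included.

```python
from collections import Counter

# Residue groups as defined in the text (one-letter codes).
HYDROPHOBIC = set("ACFILMVWY")   # Ala, Cys, Phe, Ile, Leu, Met, Val, Trp, Tyr
POLAR = set("DEGHKNPQRST")       # Asp, Glu, Gly, His, Lys, Asn, Pro, Gln, Arg, Ser, Thr
CHARGED = set("DEKR")            # Asp, Glu, Lys, Arg (a subset of the polar group)


def residue_frequencies(helix):
    """Normalized frequency of each residue type: count / helix length."""
    counts = Counter(helix)
    return {aa: n / len(helix) for aa, n in counts.items()}


def group_fraction(helix, group):
    """Fraction of residues in the helix that belong to a residue group."""
    return sum(1 for aa in helix if aa in group) / len(helix)


def mean_group_fraction(helices, group):
    """Mean of the per-helix group fractions over a dataset of helices."""
    fractions = [group_fraction(h, group) for h in helices]
    return sum(fractions) / len(fractions)
```

Averaging the per-helix fractions, rather than pooling all residues, gives each helix equal weight regardless of its length, which matches the description of normalizing within each helix before computing dataset means.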

Solvent Accessibility Analysis
The solvent accessibility of amino acids in globular helices and δ-helices was evaluated by application of the program NACCESS (22) to structure coordinate files obtained from the Protein Data Bank as described previously (16) with the following modifications: (i) a probe size of 1.40 Å was used to approximate the radius of a water molecule, and (ii) the relative solvent accessibility for each amino acid was calculated by comparison of calculated solvent accessible surface areas to the default reference set supplied with NACCESS. Helix solvent accessibility was calculated as the sum of the individual residue relative solvent accessibility values in each helix divided by the helix length. Helix solvent accessibility distributions were determined for the globular and δ-helix groups by sorting individual helix solvent accessibility values into bins (0-5%, >5-10%, >10-15%, >15-20%, >20-25%, >25-30%, >30-35%, >35-40%, >40-50%, >50-55%, >55-60%); the mean solvent accessibility values for the globular and δ-helix groups were also calculated from the individual solvent accessibility data. Within the globular and δ-helix groups, the relative solvent accessibility values of residues in the hydrophobic, polar, and charged residue categories defined above were averaged to obtain mean solvent accessibility values. Comparisons of the mean solvent accessibility of hydrophobic, polar, and charged residues between globular and δ-helix segments were performed using t tests.
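The helix-level averaging and binning can be sketched as below, assuming the per-residue relative solvent accessibility values (in percent) have already been extracted from the NACCESS output. Parsing of NACCESS files is deliberately not shown, the bin edges are copied from the list in the text, and the function names are hypothetical.

```python
from bisect import bisect_left

# Upper edges (percent) of the bins listed in the text: 0-5, >5-10, ..., >55-60.
BIN_UPPER_EDGES = [5, 10, 15, 20, 25, 30, 35, 40, 50, 55, 60]


def helix_accessibility(residue_rsa):
    """Helix solvent accessibility: sum of per-residue values / helix length."""
    return sum(residue_rsa) / len(residue_rsa)


def bin_index(value):
    """Index of the bin whose range (lower, upper] contains the value."""
    if value < 0 or value > BIN_UPPER_EDGES[-1]:
        raise ValueError("accessibility outside the binned range")
    return bisect_left(BIN_UPPER_EDGES, value)


def accessibility_histogram(helices):
    """Count helices per bin; helices is a list of per-residue value lists."""
    counts = [0] * len(BIN_UPPER_EDGES)
    for residue_rsa in helices:
        counts[bin_index(helix_accessibility(residue_rsa))] += 1
    return counts
```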

Residue Position Analysis
Residues in each helix sequence were sequentially numbered from 1 to n, beginning at the N-terminal residue and ending at the C-terminal residue, such that n represents the total peptide length in residues. The number assigned to each charged residue (Asp, Glu, Lys, and Arg) in each helix sequence was divided by n to obtain a positional value normalized to helix length. Positional values were sorted into bins corresponding to fractions of helix length (≤20%, >20-40%, >40-60%, >60-80%, >80-100%). Numbers of charged residues in each bin were totaled for the globular helix, δ-helix, and TM helix groups. χ² tests were used to analyze residue distributions.
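A minimal sketch of the positional binning and of a goodness-of-fit statistic against an even distribution is given below. It assumes equal expected counts in the five bins, which is only approximate when helix lengths are not multiples of five; the exact form of the test used here is not stated in the text. Function names are hypothetical, and the p value is not computed; with five bins there are 4 degrees of freedom, for which the 5% critical value of the χ² distribution is 9.488.

```python
CHARGED = set("DEKR")  # Asp, Glu, Lys, Arg
N_BINS = 5             # <=20%, >20-40%, >40-60%, >60-80%, >80-100% of helix length


def charged_position_counts(helices):
    """Total charged residues per positional bin over a set of helices.

    Position i (1-based) in a helix of length n has normalized value i / n;
    integer arithmetic is used so that bin boundaries are exact.
    """
    counts = [0] * N_BINS
    for sequence in helices:
        n = len(sequence)
        for i, aa in enumerate(sequence, start=1):
            if aa in CHARGED:
                # Smallest k in 1..5 with i / n <= k / 5, i.e. ceil(5 * i / n).
                k = (N_BINS * i + n - 1) // n
                counts[k - 1] += 1
    return counts


def chi_square_even(counts):
    """Chi-square statistic versus equal expected counts in every bin."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)
```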

Peptide Synthesis and Purification
Five δ-helix segments were selected at random from the data base as candidates for in vitro study based on the presence of an intrinsic Trp residue: 1ECA (residues 114-133 of the full-length protein), 1HC1 (residues 218-236), 1MBA (residues 127-145), 2MNR (residues 96-116), and 5LDH (residues 247-266). The boundaries of the δ-helix peptides of 1ECA, 1HC1, 1MBA, 2MNR, and 5LDH were chosen as prescribed by examination of the high resolution structure and the TM Finder output (10). Peptides with sequences corresponding to these δ-helix segments were synthesized with a PS3 peptide synthesizer (Protein Technologies, Inc.) using standard Fmoc chemistry. Additional lysine residues were incorporated into the peptide sequences to increase aqueous solubility as previously described (23). A 4-fold amino acid excess on a 0.1-mmol scale synthesis was used with the O-(7-azabenzotriazol-1-yl)-N,N,N',N'-tetramethyluronium hexafluorophosphate/N,N-diisopropylethylamine activator pair. Synthesis utilized a low load (0.18-0.22 mmol/g) PAL-polyethylene glycol-polystyrene resin that produced an amidated C terminus upon peptide cleavage. Peptide cleavage and deprotection was achieved using a mixture of 88% trifluoroacetic acid, 5% phenol, 5% ultrapure water, 2% triisopropylsilane followed by precipitation with ice-cold diethyl ether, drying, and resuspension in ultrapure water. Crude peptides were purified by reverse phase high performance liquid chromatography on a C4 preparative column (Phenomenex) with a water/acetonitrile gradient in the presence of 0.01% trifluoroacetic acid. Peptide molecular weights were confirmed by mass spectrometry, and the Micro BCA assay (Pierce) was used to determine peptide concentration.

Circular Dichroism (CD) and Fluorescence Spectroscopy
CD spectra were recorded on a Jasco J-720 circular dichroism spectrometer at room temperature. Spectra in aqueous buffer (10 mM Tris, pH 7.2, 10 mM NaCl) and aqueous buffer with 10 mM SDS were taken using a 0.1-cm path length cuvette at peptide concentrations of 25 μM. A 0.01-cm path length cuvette was used for secondary structure determination at peptide concentrations of 100 μM in aqueous buffer with 50 mM sodium perfluorooctanoate. Fluorescence measurements were carried out in the aqueous and SDS buffer conditions described above on a Hitachi F-400 Photon Technology International C-60 fluorescence spectrometer at an excitation wavelength of 295 nm. Emission spectra were recorded at room temperature from 305 to 405 nm using a peptide concentration of 5 μM. The correlation between Chou-Fasman α-helical propensity (Pα) and peptide mean residue ellipticity at 222 nm was evaluated using the R statistical software package.
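For readers unfamiliar with the quantity, the conversion of an observed CD signal to mean residue ellipticity can be sketched with the commonly used relation [θ] = θ_obs / (10 · c · l · N), with θ_obs in millidegrees, c in mol/L, l in cm, and N the number of residues. The text does not state which convention was applied (some laboratories use the number of peptide bonds instead of residues), so this is an assumption, and the function name is hypothetical.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Mean residue ellipticity in deg cm^2 dmol^-1.

    theta_mdeg: observed ellipticity in millidegrees
    conc_molar: peptide concentration in mol/L
    path_cm:    cuvette path length in cm
    n_residues: number of residues in the peptide (assumed convention)
    """
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)
```

As a worked instance with the cuvette and concentration quoted above and an illustrative (not measured) signal, a 20-residue peptide at 25 μM in a 0.1-cm cell giving −10 mdeg corresponds to `mean_residue_ellipticity(-10.0, 25e-6, 0.1, 20)`, i.e. −10 / (10 × 25×10⁻⁶ × 0.1 × 20) = −20,000 deg cm² dmol⁻¹.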

TOXCAT Assay
Plasmid Construction-The expression vector pccKAN and the maltose-binding protein-deficient (malE⁻) Escherichia coli strain NT326 were kindly provided by Dr. Donald M. Engelman, Yale University (24). TOXCAT chimeras fusing the TM sequence of glycophorin A (GpA) and the G83I GpA mutant between the ToxR and maltose-binding protein domains have been previously described (11); chimeras encoding δ-helix sequences instead of the GpA sequence were constructed in an essentially identical manner via restriction digestion of oligonucleotide cassettes encoding each δ-helix segment with NheI and BamHI and subsequent ligation into the NheI and BamHI sites of the pccKAN plasmid. The identity of all constructs was confirmed with DNA sequencing before further characterization.
MalE Complementation Test-NT326 cells expressing TOXCAT chimeras with δ-helix sequences, the wild-type GpA TM sequence, or the G83I GpA mutant were streaked onto M9 minimal plates with 0.4% maltose as the only carbon source. Under these conditions, transformants capable of growth must target a portion of the chimeric TOXCAT protein into the cytoplasmic membrane (24). Transformant growth was evaluated for all constructs after incubation for 2 days at 37 °C.
Chloramphenicol Acetyltransferase (CAT) Enzyme-linked Immunosorbent Assay-NT326 cells harboring TOXCAT chimeras were grown at 37 °C, harvested into 1-ml fractions at an A600 of 0.6, pelleted, and stored at −80 °C. Cell lysates were prepared from cell pellets as previously described (11) and assayed for CAT concentration using the CAT enzyme-linked immunosorbent assay kit (Roche Applied Science). A standard curve was generated with CAT provided by the manufacturer. Cells expressing the wild-type GpA and G83I GpA sequences were included in each CAT assay as positive and negative controls, respectively. CAT measurements were performed in at least triplicate and were normalized for the relative expression level of each construct using Western blotting as described (11).
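One plausible form of the expression-level normalization is sketched below: each CAT reading is divided by the relative expression level of its construct and then expressed relative to the wild-type GpA control. The exact procedure is given in Ref. 11 and is not reproduced in this text, so the arithmetic, the function names, and the input layout are all assumptions made for illustration.

```python
def normalized_cat(cat_signal, expression_level):
    """CAT concentration divided by the relative expression level of the
    construct (e.g. a Western blot band intensity; assumed form)."""
    if expression_level <= 0:
        raise ValueError("expression level must be positive")
    return cat_signal / expression_level


def relative_to_reference(sample, reference):
    """Expression-normalized CAT signal of a sample relative to a reference
    construct such as wild-type GpA.

    sample, reference: (cat_signal, expression_level) tuples.
    """
    return normalized_cat(*sample) / normalized_cat(*reference)
```

With made-up inputs, `relative_to_reference((30.0, 0.5), (100.0, 1.0))` evaluates to (30 / 0.5) / (100 / 1) = 0.6, i.e. 60% of the reference signal.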

RESULTS
Hydropathy of δ-Helices-The globular protein sequences ≥19 residues in length considered in our initial study were divided into globular helix and δ-helix based on their mean segmental Liu-Deber hydropathy values (see "Experimental Procedures"). A data base of confirmed TM α-helical segments was also assembled from a non-redundant set of protein TM segments with available high resolution structures (supplemental Table S3); this TM helix data base contains 212 TM segments from 37 non-redundant proteins. We note that the length criterion of ≥19 residues was not applied to the TM helix data base because each TM segment was identified as residing in the membrane bilayer via high resolution structure determination. The mean Liu-Deber hydropathy of each sequence in the globular, δ-, and TM helix classes was calculated (see supplemental Tables S1-S3) and averaged for each group. We found that the mean hydropathy values of the δ- and TM-helix classes (0.72 ± 0.26 …
Hydrophobic and Charged/Polar Residue Content in δ-Helices-The amino acid compositions of δ-, globular, and TM helix groups were determined (see "Experimental Procedures") to investigate whether the intermediate hydropathy of δ-helix segments versus other globular helices and TM segments arose from decreased numbers of hydrophobic residues, increased numbers of polar residues, or both. As expected, TM helix segments contained ~1.4-fold more hydrophobic, ~3-fold fewer charged, and ~2-fold fewer polar residues than globular helices (p < 0.0001, see Table 1). δ-Helix percentage occurrence values in these residue categories, however, presented as different from the TM and globular groups (p < 0.0001) and appeared to be intermediate between them (Table 1). For example, the average percentage occurrence of hydrophobic residues per δ-helix (59.2 ± 6.2%) lies between the globular and TM helix values (47.1 ± 7.8 and 67.3 ± 9.2%, respectively). The distribution of hydrophobic and polar/charged amino acid residue types in δ-helices, thus, appears to be transitional between TM and globular helix segments.
Amino Acid Composition of δ-Helices Versus TM and Other Globular Helices-To further probe the origins of the compositional distinctness of δ-helices versus globular and TM segments, we compared individual amino acid percentage occurrence frequency among the three helix categories (Fig. 1). Consistent with the results of other groups (25), we observed that certain hydrophobic residues (i.e. Phe, Ile, Leu, Val, and Trp; Fig. 1A) were significantly enriched (p ≤ 0.01) in TM helices versus globular helical segments; the situation was reversed for polar and charged residues (i.e. Asp, Asn, Glu, Gln, Gly, His, Lys, and Arg; Fig. 1A). These trends are easily rationalized given the respective intramembranous versus cytoplasmic localization of these sequences in their native proteins. Similarly, we noted that δ-helices contained significantly more Asp, Glu, Arg, and Lys residues than TM segments (0.05 ≥ p ≤ 0.01, see Fig. 1B), results consistent with our previous observation that the presence of three or more charged residues could delineate δ-helices as extramembranous (10). Interestingly, the observed decrease in hydropathy of δ-helices versus TM segments could be traced to two individual residues, viz. fewer Ile and Val residues are present in δ-helix sequences than in TM segments (p ≤ 0.01); conversely, the content of all other residues classed as hydrophobic is statistically indistinguishable (p ≥ 0.05, Fig. 1B).
Overall, the individual residue composition of δ-helices appeared to be more similar to globular than TM segments (compare Fig. 1, B and C); however, δ-helices were significantly enriched in the hydrophobic residues Leu and Phe and depleted in Glu, Lys, and Gln compared with their counterparts in globular proteins (Fig. 1C). In terms of hydropathy, δ-helices, thus, appear to be distinguished from bona fide TM segments based on a decreased content of β-branched hydrophobic residues.
δ-Helices Are More Buried within Their Native Folds Than Other Globular Helices-Because δ-helices contain greater percentages of hydrophobic residues and lower percentages of polar/charged residues than other globular helices, we hypothesized that δ-helices might be more buried within their native protein folds. To determine whether this was the case, the locations of δ- and globular helices within their native structures were mapped using a water-sized probe to calculate residue solvent accessibility (see "Experimental Procedures"). Helices within the TM data base were excluded from this analysis due to their established burial within the membrane bilayer. δ-Helices were found on average to be more buried than other globular helices within their native folds (mean solvent accessibility values of 21.0 ± 7.7 versus 28.9 ± 8.4%, p ≤ 0.01). Moreover, the distribution of solvent-accessible residues differs between the δ-helix and globular helix databases (Fig. 2A); the majority of δ-helix segments are 15-20% solvent-accessible, whereas the majority of globular helices have 25-30% accessibility. Rather than being confined to a single residue category, however, the increased accessibility of globular versus δ-helices is reflected across all residue groupings (Fig. 2B), i.e. hydrophobic, polar, and charged residues are ~1.3-1.4× more exposed in globular versus δ-helices.
Folding of the δ-Helix Peptides in Aqueous and Membrane Mimetic Environments-Because δ-helix segments represent sequences that are mispredicted as transmembranous, we initiated experiments to examine the physical properties of δ-helices versus TM helices in vitro. Accordingly, we synthesized peptides corresponding to five δ-helix segments selected from the data base based on the presence of an intrinsic Trp residue for anticipated fluorescence experiments. Segments were selected from 1ECA, 1MBA, 1HC1, 2MNR, and 5LDH; see Table 2 for the corresponding δ-helix sequences. The hydrophobicity of the δ-helix sequences necessitated the introduction of Lys residues at their termini to facilitate synthesis and characterization (23); such "Lys tags" have been shown not to interfere with the core peptide sequence of interest (19,26). The location of each of these δ-helix segments in their native protein structures is shown in Fig. 3.
CD spectra were obtained for each δ-helix peptide in aqueous buffer and in buffer containing SDS or sodium perfluorooctanoate. Although each sequence demonstrably adopts α-helical structure within its soluble protein tertiary fold, none of the peptides exhibited a large amount of helical structure in aqueous buffer (Fig. 4). The 1MBA and 1HC1 peptides, however, displayed a degree of helical character in aqueous solution (minima at 208 and 222 nm; Fig. 4) that was not observed in the 1ECA, 2MNR, and 5LDH δ-helix peptides. We noted that the helix content of the δ-helix peptides was correlated with their Chou-Fasman secondary structure propensity (Pα) in aqueous solvent (27) (Fig. 5). For example, 1MBA and 1HC1 have the highest calculated α-helical structural propensity (Table 3) and also the greatest amount of helical structure (Fig. 5). The lack of strong aqueous helicity in the 1ECA and 2MNR δ-helix peptides may be similarly rationalized by the relatively high Chou-Fasman β-strand structural propensity (Pβ) predicted for these segments (Table 3). Interestingly, each δ-helix peptide sequence exhibits regions with overlapping α and β secondary structure prediction (Table 3), suggesting that they may represent segments with competing secondary structure preferences.
All δ-helix peptides increased in α-helix content when exposed to the membrane-mimetic environments of SDS or sodium perfluorooctanoate micelles (Fig. 4). Induction of helical structure in apolar media has been observed with TM proteins such as GpA and the epidermal growth factor receptor (26) but can also occur when intact globular proteins are exposed to SDS micelles (for review, see Ref. 28). In both instances, ordered secondary structures such as α-helices are thought to arise in the hydrophobic environment of the detergent micelles to satisfy the hydrogen-bonding requirement of the peptide backbone in the low-dielectric environment of detergent acyl chains. δ-Helix peptide exposure to an environment of reduced polarity in SDS micelles was confirmed by examining the Trp fluorescence emission spectra of the δ-helix peptides. Trp has a characteristic fluorescence emission maximum of ~350 nm in an aqueous environment; blue shifting of this maximum to a lower wavelength accompanies accommodation of the Trp side chain in a more hydrophobic environment (29). Indeed, blue shifts in Trp emission maxima were observed for each δ-helix peptide in the presence of SDS (Table 4), suggesting that each is solvated in the apolar SDS micelle interior.

Competence of δ-Helix Segments for in Vivo Membrane Insertion-The behavior of the δ-helix peptides in SDS micelles implies that these sequences are competent for solvation by micelles in vitro but offers no information as to whether or not they meet the requirements for in vivo membrane incorporation. We, therefore, undertook to investigate whether δ-helix segments could mimic TM sequences by assessing their ability to insert and self-associate in the E. coli inner membrane using the TOXCAT assay (24) (see "Experimental Procedures"). Of the five δ-helix sequences evaluated in the TOXCAT assay, only 1ECA was capable of self-association at levels higher than the monomeric GpA G83I control, with a self-association strength ~60% of the GpA dimer (Fig. 6). The remaining δ-helix sequences reported lower levels of association strength than the monomeric control, with the possible exception of the 1MBA sequence. Because the correct membrane insertion of each fusion protein was confirmed by growth of NT326 cells expressing each fusion protein construct on M9-maltose plates (not shown), the latter data suggest that whether or not self-association occurs, at least a portion of each δ-helix TOXCAT construct is correctly incorporated in the E. coli inner membrane.

FIGURE 5. Correlation of δ-helix secondary structure with Chou-Fasman α-helix propensity. The mean residue ellipticity at 222 nm determined from the CD spectrum of each peptide in aqueous buffer is given as a function of the Chou-Fasman aqueous α-helix propensity of the 1ECA, 1HC1, 1MBA, 2MNR, and 5LDH peptides. The regression line of best fit, correlation coefficient, and associated p values are shown; the correlation meets marginal significance levels (0.05 ≤ p ≤ 0.10).

FIGURE 6. TOXCAT assay of δ-helix peptides in the E. coli inner membrane. Mean levels of CAT expression from δ-helices relative to the wild-type GpA dimer are shown ± S.D. (top) along with a representative Western blot used to evaluate protein expression levels (bottom). Blot bands excerpted from separate gels are indicated by solid lines between lanes. G83I denotes the GpA G83I mutant, used as a monomeric control (24). The mean CAT expression levels were compared in t tests; symbols above the bars denote significance level: +, 0.05 ≤ p ≤ 0.10; *, p ≤ 0.05; **, p ≤ 0.01. See "Experimental Procedures" for assay details.

… Table 2 for sequences). Pα and Pβ were computed using Chou-Fasman structure propensities (27). Lys tags were excluded from propensity calculations.

FEBRUARY 20, 2009 • VOLUME 284 • NUMBER 8

Charged Residue Distribution Distinguishes δ-Helix and TM Sequences-Given that δ-helices are comparable in length and in overall hydropathy/helicity to native TM segments, we inquired whether the TOXCAT results might be explained by other characteristics of δ-helix sequences. For example, the bilayer integration efficiency of model TM segments in vivo has been shown to depend strongly on the position of charged residues, i.e. when they are placed toward the ends of sequences, insertion efficiency is increased (30). As such, it is possible that δ-helix sequences might be fundamentally distinguished from TM sequences based on charged residue positioning. The δ-helix, TM helix, and globular helix databases were queried accordingly for charged residue position (Fig. 7A). We found that charged residues were essentially evenly distributed along the lengths of δ- and globular helix sequences (p ≥ 0.1 compared with expected frequencies based on an even distribution). In contrast, charged residues were more abundant within the first 20% or last 20% of segment length than in the middle of TM segments (i.e. near the ends of TM helices). The sequences of δ-helices and TM segments can, thus, be computationally distinguished based on charged residue distribution.
The notion that factors other than raw hydropathy must influence the membrane insertion propensity of a given δ-helix segment is also supported by the lack of correlation between segmental hydropathy and the apparent free energy of membrane insertion (ΔGapp), calculated using biological partitioning measurements (30,31) (Fig. 7B).
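The kind of correlation check referred to above can be illustrated with a plain Pearson coefficient. The sketch below is generic: it does not include the hydropathy or apparent free energy values of this study, which are not listed in the text, and the function name is hypothetical. The text does not state which correlation measure was used, so the choice of Pearson's r is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences,
    e.g. segmental hydropathy values and apparent insertion free energies."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

Values of r near zero, as in the symmetric toy input `pearson_r([1, 2, 3, 4], [1, -1, -1, 1])`, indicate the absence of a linear relationship between the two quantities.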

DISCUSSION
Role in Globular Proteins-δ-Helices represent ~30% of the globular-based α-helices investigated. This relatively frequent occurrence implies that these segments must be of some utility in the proteins that contain them, and considerable evidence supports a major role for hydrophobic interactions in globular protein folding (32). We nevertheless could discern no trend in terms of localization of δ-helix segments to known surfaces of protein-protein interaction or segregation into a particular globular protein type (data not shown). However, δ-helices are by definition highly apolar and display increased burial within their native protein folds versus other globular helices (Fig. 2, A and B). As such, we suspect that δ-helix sequences may be important to the stability of proteins that contain them, perhaps via sequestration of their hydrophobic residues in the protein interior during folding.
δ-Helix segments are nevertheless unable to adopt their helical native structures without input from the remainder of the protein, perhaps because of their competing secondary structure preferences (Fig. 4, Table 3). In fact, δ-helix sequences generally display strong propensities to exist as β-sheet type segments when considered in the absence of the constraints imposed by globular protein tertiary structure (Table 3); this mixed potential is manifested in CD experiments, where several segments do not develop significant helical structure, even in membrane-mimetic environments. This disparity of structural propensity versus conformation of δ-helix segments mimics that of "discordant helices," globular protein segments that undergo an α-to-β structure transition and form amyloid-like fibrils (33), and suggests that these sequences may exhibit structural instability in their native folds.
Role of Residue Content-The high intrinsic hydropathy of δ-helix segments is imparted by a different set of hydrophobic residues than TM segments. Ile and Val residues are depleted in δ-helices compared with TM segments (Fig. 1), and this may be rationalized in the requirement of such amino acids to be evolutionarily retained in the hydrophobic, restrictive environment of the membrane bilayer versus aqueous solvent. The β-branched residues such as Ile and Val have only one populated rotamer as a result of residing in membrane-induced α-helices, where they are structurally optimized for the folding and helix-helix interaction requirements of membrane proteins (25). There may, thus, be considerable selective pressure to retain Ile and Val in TM segments relative to δ-helices as they may retain a structural role beyond maintaining levels of hydrophobicity. Ile and Val are additionally better β-sheet formers than helix formers in aqueous solvent (27), perhaps necessitating their depletion in natively α-helical δ-helix sequences.
Recognition of Hydrophobic Segments-δ-Helix peptides are sufficiently similar to bona fide TM segments in terms of their segmental hydrophobicity and apolar helicity to be competent for membrane insertion, and our TOXCAT results indicate that certain δ-helix sequences are not only capable of membrane insertion but also of self-association within the bilayer when placed in the correct protein context (Fig. 6). How then are hydrophobic segments destined for the interior of globular proteins distinguished from those that become incorporated into the interior of the membrane bilayer? Correct δ-helix versus TM segment sorting must rely on factors distinct from bulk biochemical properties. Based on our results, it appears that charged residue distribution and/or protein context may act to exclude δ-helix segments from the bilayer. Thus, compared with δ- and soluble α-helical segments, authentic TM segments have a skewed distribution of charged residues, with such residues concentrated at helix termini (Fig. 7A). As well, proteins containing δ-helix segments appear to lack any additional TM-mimic sequences that could aid their membrane integration in a manner similar, for example, to the bilayer integration of voltage-sensor domains of the voltage-dependent potassium channels via "helper helices" (34); of the 51 globular helical proteins containing δ-helix sequences, only 6 have ≥2 δ-helix regions (Table S2). It is also possible that the potential sequestration of δ-helix sequences in the protein interior at an early stage in folding may prevent recognition of their high intrinsic hydrophobicity.
Fig. 7 legend (fragment): ... ≤ 0.0001). B, comparison of δ-helix partitioning under "in vivo" versus "in vitro" conditions. The insertion efficiency of δ-helix sequences under in vivo conditions is given as the apparent free energy of membrane insertion (ΔG app), calculated using biological partitioning measurements (30,31); the in vitro condition is represented by Liu-Deber segmental-averaged hydropathy (16).
Examination of the estimated efficiency of δ-helix sequence partitioning into the lipid bilayer in vivo and into membrane-mimetic media in vitro further reinforces the notion that bulk biochemical properties are not sufficient to predict the non-TM versus TM location of a δ- versus TM-helix sequence. The two sets of values do not correlate, although a regression line and correlation coefficient are shown for illustrative purposes (Fig. 7B). Charged residues and their location in the δ-helix segments may, therefore, help to predict these hydrophobic protein regions as destined to be globular.
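The contrast between terminally concentrated charges (TM segments) and evenly distributed charges (δ-helices) can be quantified per segment. The sketch below is illustrative only and rests on assumptions not taken from this study: Asp, Glu, Lys, and Arg are treated as the charged residues (His is excluded), a "terminal" position is defined by a hypothetical `flank` of 5 residues from either end, and the function names are invented for this example.

```python
# Assumption: D, E, K and R are counted as charged; His is not.
CHARGED = frozenset("DEKR")


def charged_positions(segment: str) -> list[int]:
    """Return the 0-based indices of charged residues in the segment."""
    return [i for i, aa in enumerate(segment.upper()) if aa in CHARGED]


def terminal_charge_fraction(segment: str, flank: int = 5) -> float:
    """Fraction of charged residues lying within `flank` residues of either terminus.

    Returns 0.0 for a segment with no charged residues.
    `flank` is a hypothetical parameter, not a value from the paper.
    """
    positions = charged_positions(segment)
    if not positions:
        return 0.0
    n = len(segment)
    terminal = [i for i in positions if i < flank or i >= n - flank]
    return len(terminal) / len(positions)
```

Under these assumptions, a value near 1 corresponds to the TM-like pattern of charges clustered at the helix ends, whereas lower values correspond to the more even distribution reported here for δ-helices.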
Conclusions-The present study shows that (i) nearly 30% of globular protein helices of sufficient length to span a membrane bilayer (≥19 amino acids) have mean hydropathy values equivalent to or greater than those of known TM segments, and (ii) differentiation between these TM-mimicking portions of helical globular proteins and bona fide TM segments could generally be achieved in the first instance by flagging sequences with three or more charged residues as non-TM regions. We further observed that although significant hydrophobicity is a necessary feature in identifying potential TM segments, additional factors such as the location of the charged residues and an increased occurrence of Ile and Val residues should also be considered. Although δ-helix segments in globular proteins may embody important hydrophobic structural features of the in vivo protein fold, the work described herein provides additional clues as to how proteins may be sorted by the cell. Our current examination of factors that act to divert δ-helices from membrane insertion to an aqueous phase indicates that TM Finder and similar software could exploit these specific features to improve the prediction of transmembrane segments in proteins.
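The first-pass rule summarized above (sufficient length, sufficient hydrophobicity, then flagging segments with three or more charged residues as non-TM) can be sketched as follows. This is a minimal illustration, not the TM Finder algorithm: the Kyte-Doolittle scale is substituted for the Liu-Deber scale used in this study, the cutoff of 1.6 is a commonly used Kyte-Doolittle threshold adopted here as an assumption, charged residues are taken to be Asp, Glu, Lys, and Arg, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (a stand-in for the Liu-Deber scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGED = frozenset("DEKR")   # assumption: His is not counted as charged
MIN_TM_LENGTH = 19            # bilayer-spanning length stated in the text
MIN_CHARGED_NON_TM = 3        # "three or more charged residues" rule


def mean_hydropathy(segment: str) -> float:
    """Mean Kyte-Doolittle hydropathy; raises KeyError on non-standard residues."""
    seq = segment.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def classify_segment(segment: str, hydropathy_cutoff: float = 1.6) -> str:
    """Apply the length, hydropathy, and charge-count filters in order."""
    seq = segment.upper()
    if len(seq) < MIN_TM_LENGTH:
        return "too short"
    if mean_hydropathy(seq) < hydropathy_cutoff:
        return "not hydrophobic"
    if sum(1 for aa in seq if aa in CHARGED) >= MIN_CHARGED_NON_TM:
        return "non-TM (delta-like)"
    return "candidate TM"
```

The filters are applied in this order so that the charge-count rule is only consulted for segments that already pass as hydrophobic and membrane-spanning, which is the population in which δ-helices and TM segments are otherwise indistinguishable. Charge location and Ile/Val content would be natural additional criteria.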