The YefM Antitoxin Defines a Family of Natively Unfolded Proteins

Although natively unfolded proteins are being observed increasingly, their physiological role is not well understood. Here, we demonstrate that the Escherichia coli YefM protein is a natively unfolded antitoxin, lacking secondary structure even at low temperature or in the presence of a stabilizing agent. This conformation of the protein is suggested to have a key role in its physiological regulatory activity. Because of the unfolded state of the protein, a linear determinant rather than a conformational one is presumably being recognized by its toxin partner, YoeB. A peptide array technology allowed the identification and validation of such a determinant. This recognition element may provide a novel antibacterial target. Indeed, a pair-constrained bioinformatic analysis facilitated the definite determination of novel YefM-YoeB toxin-antitoxin systems in a large number of bacteria including major pathogens such as Staphylococcus aureus, Streptococcus pneumoniae, and Mycobacterium tuberculosis. Taken together, the YefM protein defines a new family of natively unfolded proteins. The existence of a large and conserved group of proteins with a clear physiologically relevant unfolded state serves as a paradigm to understand the structural basis of this state.

The "thermodynamic hypothesis" of protein folding, introduced more than 40 years ago, suggests that the folded state of a given protein represents a global minimum of free energy (1). Although this theory is widely valid, there is a considerable group of "natively unfolded" proteins (as first denoted by Mandelkow and coauthors (2)) that instead favor the thermodynamically unfolded state (3-6; for a recent review on natively unfolded proteins see Ref. 5). The unfolded state of this group of proteins does not reflect a need for molecular chaperones to overcome a large energetic barrier on the way to a global energy minimum; rather, it is a truly energetically favorable unfolded state. The natively unfolded state is also distinct from the misfolded state in which proteins self-assemble to form large supramolecular assemblies such as amyloid fibrils (7-9).
Although the number of natively unfolded proteins identified is increasing steadily (4,10), their physiological significance is poorly understood. One case in which a natively unfolded state of a protein appears to have physiological significance is that of the Phd protein of the phage P1 (11). This protein is part of a bimolecular complex that acts as the "plasmid addiction" module of the phage (12). The addiction module mechanism ensures efficient inheritance of the extrachromosomal phage and is based on the differential physiological stability of its two components, the stable toxin Doc and the labile antitoxin Phd. Upon loss of the phage in a postsegregational event, no de novo synthesis of either the toxin or the antitoxin occurs. Because of the physiological instability of the antitoxin, only the toxic component of the module is ultimately retained within the cured cells, causing their death. Consistent with the fact that Phd is recognized and degraded by the ClpXP "quality control" machinery of infected cells (13), we suggested that its unfolded state is the key to its physiological instability, thus serving as a critical element in the function of the toxin-antitoxin (TA) module. Many "damaged" or misfolded proteins are identified and eliminated by the ClpXP system. These unfolded target proteins may be recognized by ectopic exposure of hydrophobic amino acids, which are normally buried within the hydrophobic core of the protein. Therefore, we assumed that ClpXP recognizes the unfolded Phd protein on the basis of its structural properties, because it may appear to be a damaged protein.
TA systems were also identified on chromosomes in both bacteria and archaea, but not in eukaryotes (14-19). These systems share the same paradigm of a stable toxin and an unstable antidote, organization as a polycistronic operon, and the small size of the protein components (70-100 amino acids). Although TA systems are widely present, their physiological role is not fully understood. It is assumed that the systems play a significant role in survival under stringent conditions (14-19).
The absolute lack of TA systems in eukaryotes, as opposed to their ubiquitous presence in bacteria and archaea, makes the systems a very attractive antibacterial target. Unlike conventional antibiotics, there is no need for the external introduction of toxic material that may affect the host as well. The blockage of the toxin-antitoxin physical interaction may result in the execution of the inherent toxic potential of the toxin.
In this work, we demonstrate that the Escherichia coli YefM antitoxin protein, although showing very low homology to the Phd protein, is also natively unfolded. Pair-constrained bioinformatic analysis allowed the identification of a large family of natively unfolded host proteins that are based on the Phd-YefM structural framework. The chromosomal organization of the proteins implies that they are part of functional TA systems in a related group of bacteria, including some major pathogens. The unfolded YefM-like proteins are an attractive target for the development of antibacterial agents because the toxin partner of the TA module recognizes a linear determinant within the antitoxin, which could be mimicked by a therapeutic agent.

EXPERIMENTAL PROCEDURES
Gene Sequence Identification and Alignments-Sequences related to the yefM and yoeB genes of E. coli were identified by a pair-constrained bioinformatic analysis. Sequences were identified using TBLASTN and PSI-BLAST searches (20) of the nonredundant microbial genomes database at NCBI (www.ncbi.nlm.nih.gov/BLAST/). Putative yefM and yoeB homolog sequences were obtained and examined for a toxin-antitoxin gene pair module in the chromosome. Low homology unpaired sequences were discarded. Alignments were produced by ClustalW (21) with default settings and edited using the JALVIEW editor.
Growth Rate Analysis-E. coli TOP10 bacteria transformed with pBAD-yefM, pBAD-yoeB, and pBAD-yefMyoeB were cultured overnight in LB broth supplemented with 100 μg/ml ampicillin at 37°C. The next day, the three cultures were diluted and adjusted to an absorbance of ~0.01 (A600) in LB-ampicillin. Next, each culture was divided into two equal volumes; at time zero, 0.2% L-arabinose was added to the first half to induce expression of the target gene, and 0.2% D-glucose was added to the second half to suppress low level transcription from the pBAD promoter. All cultures were grown at 37°C/200 rpm, and samples were taken sequentially approximately every 40-60 min for 9 h. Cell density was measured by absorbance at 600 nm. To inspect the growth rate upon gene induction during the logarithmic growth phase, the same assay was conducted except for the time of induction: cultures were divided, and expression was induced (or suppressed) when they had reached an absorbance of ~0.45 (A600).
Colony Formation Analysis-E. coli TOP10 bacteria transformed with pBAD-yefM, pBAD-yoeB, and pBAD-yefMyoeB were grown at 37°C in LB broth containing ampicillin as indicated. After overnight growth, cultures were diluted to an A600 of 0.01 in LB-ampicillin medium. The cultures were then grown at 37°C until an A600 value of 0.5 was reached. At that point, cells were diluted 10⁴-10⁷ times in 10-fold dilution steps and applied as 5-μl drops on LB-ampicillin-agar plates containing arabinose at the following decreasing concentrations: 0.2%, 0.1%, 0.05%, 0.02%, 0.005%, and 0.0005%. In addition, a negative control plate without arabinose and supplemented with 0.2% glucose was plated. All plates were incubated at 37°C for at least 20 h.
Cloning, Expression, and Purification of YefM from E. coli-The DNA fragment containing the coding sequence of yefM, flanked by primer-encoded BsrGI and HindIII sites, was produced by PCR using E. coli strain MC1061 chromosome as template and oligonucleotide primers YEFMSTART (5'-GTACAATGAACTGTACAAAAGAAG-3') and YEFMEND (5'-GACAAGCTTAGTTTCACTCAATG-3'). The product was digested with BsrGI and HindIII enzymes (New England Biolabs), cloned into the BsrGI and HindIII restriction sites of a pET42a expression vector (Novagen) in fusion to glutathione S-transferase (GST), and transformed into E. coli BL21(DE3) pLysS (Novagen). Transformed bacteria were grown in 2YT broth at 37°C/200 rpm to an A600 of ~0.4. Protein expression was induced by the addition of 2 mM isopropyl-β-D-thiogalactopyranoside. After 1 h, cells were harvested and resuspended in phosphate-buffered saline, pH 7.3 (PBS; 140 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 1.8 mM KH2PO4), protease inhibitor mixture as recommended (Sigma), and 0.5 mM phenylmethylsulfonyl fluoride, and lysed by three passages through a French pressure cell (1,400 p.s.i.). The insoluble material was removed by centrifugation for 20 min at 20,000 × g at 4°C followed by a 0.45-μm filtration. The supernatant was applied onto a 1-ml glutathione-Sepharose column (Amersham Biosciences) preequilibrated with PBS, pH 7.3. The protein was eluted using 10 ml of 50 mM Tris-HCl, pH 8.0, 10 mM glutathione. YefM proteins were separated from the GST using 16 units of factor Xa protease (Novagen)/1 mg of YefM fusion. After a 14-h incubation at 37°C, the reaction was terminated by the addition of 1 mM phenylmethylsulfonyl fluoride. Two different methods were applied for YefM purification.
In the first method, gel filtration was conducted to remove the GST and linker protein (~40 kDa) from YefM (~11 kDa) using a Sepharose HR 10/30 (fast protein liquid chromatography) gel filtration column (Amersham Biosciences) and a fast protein liquid chromatography instrument (Amersham Biosciences). Proteins were eluted with PBS, pH 7.3, at 0.8 ml/min, and a peak that included the ~11-kDa YefM proteins was collected after 13 min. Fractions containing the YefM protein were further purified using 1 μmol of immobilized glutathione-agarose (Sigma) agitated for 16 h at room temperature. At this point, YefM was greater than 95% pure as estimated by Coomassie staining of SDS-PAGE. In the second purification method, the YefM and GST protein mixture was divided into 0.5-ml fractions, boiled for 10 min, and then centrifuged at 14,000 rpm for 10 min. The supernatants, containing the purified YefM, were collected and pooled.
To determine the YefM concentration, tyrosine absorbance was measured in 0.1 M KOH. Protein concentrations were calculated using an extinction coefficient of 2,391 M⁻¹ cm⁻¹ (293.2 nm in 0.1 M KOH) for a single tyrosine.
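The concentration calculation above follows the Beer-Lambert law. The sketch below is only an illustration of that arithmetic: the function name, the `n_tyrosines` parameter (the number of tyrosines per YefM chain is not stated in the text), and the 1-cm default path length are assumptions, whereas the extinction coefficient is the value quoted above.

```python
# Extinction coefficient of a single tyrosine at 293.2 nm in 0.1 M KOH,
# as quoted in the text (M^-1 cm^-1).
TYR_EPSILON_KOH = 2391.0


def protein_molar_concentration(absorbance, n_tyrosines, path_cm=1.0,
                                epsilon_tyr=TYR_EPSILON_KOH):
    """Return the protein concentration in mol/l from tyrosine absorbance.

    Beer-Lambert: A = epsilon * c * l, where the protein's extinction
    coefficient is taken as n_tyrosines * epsilon_tyr (an assumption that
    ignores contributions from other chromophores).
    """
    if n_tyrosines < 1:
        raise ValueError("at least one tyrosine is required")
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    return absorbance / (epsilon_tyr * n_tyrosines * path_cm)


if __name__ == "__main__":
    # Hypothetical reading: A = 0.2391 for a protein with one tyrosine.
    print(protein_molar_concentration(0.2391, n_tyrosines=1))
```

The tyrosine count must be taken from the actual protein sequence before the function is used for real measurements.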
The molecular mass of YefM was verified by matrix-assisted laser desorption ionization time-of-flight mass spectrometry using a Voyager-DE STR Biospectrometry work station (Applied Biosystems). α-Cyano-4-hydroxycinnamic acid was used as the matrix.
Cloning, Expression, and Purification of GST-YoeB from E. coli-The DNA fragment containing the coding sequence of yoeB, flanked by primer-encoded EcoRI and HindIII sites, was produced by PCR using E. coli strain MC1061 chromosome as template and oligonucleotide primers YOEBSTART (5'-AAAGGACATGAATTCGTGAAACTAATC-3') and YOEBEND2 (5'-CCTTTGAAGCTTTTCAATAATGATAA-3'). The product was digested with EcoRI and HindIII enzymes (New England Biolabs), cloned into the EcoRI and HindIII restriction sites of the pET42a expression vector in fusion to GST, and transformed into E. coli BL21(DE3) pLysS. Bacteria were grown, induced, and lysed in the same manner described above for the GST-YefM fusion. The supernatant was applied onto a 1-ml glutathione-Sepharose column (Amersham Biosciences) preequilibrated with PBS, pH 7.3. The bound protein was eluted using 10 ml of 50 mM Tris-HCl, pH 8.0, 10 mM glutathione. Eluted fractions containing the GST-YoeB protein were collected and assessed quantitatively by Coomassie staining of SDS-PAGE.
Circular Dichroism (CD)-CD spectra were obtained using an AVIV 202 spectropolarimeter equipped with a temperature-controlled sample holder and a 5-mm path length cuvette. Mean residue ellipticity, $[\theta]$, was calculated as $[\theta] = (100 \times \theta \times m)/(c \times L)$, where $\theta$ is the observed ellipticity, $m$ is the mean residue weight, $c$ is the concentration in mg/ml, and $L$ is the path length in cm. All experiments were performed in PBS, pH 7.3, at a protein concentration of 10 μM. For thermal denaturation experiments, samples were equilibrated at each temperature for 0.5 min, and CD ellipticity at 222 nm and 217 nm was averaged for 1 min.
Fourier Transform Infrared Spectroscopy (FTIR)-Infrared spectra were recorded using a Nicolet Nexus 470 FTIR spectrometer with a DTGS detector. The sample, 1 μg of lyophilized YefM suspended in 30 μl of PBS in D2O, pD 7.3, was placed on a CaF2 plate. The measurements were taken using a 4 cm⁻¹ resolution and averaging of 2,000 scans. The transmittance minima values were determined by the OMNIC analysis program (Nicolet).
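As a worked illustration of the mean residue ellipticity defined under Circular Dichroism, the sketch below assumes the standard relation [θ] = 100·θ·m/(c·L), with θ in degrees, c in mg/ml, and L in cm, giving deg cm² dmol⁻¹. The function name and the numbers in the usage example are hypothetical and are not measurements from this work.

```python
def mean_residue_ellipticity(theta_deg, mean_residue_weight,
                             conc_mg_per_ml, path_cm):
    """Convert observed ellipticity to mean residue ellipticity.

    theta_deg            observed ellipticity in degrees
    mean_residue_weight  molecular mass / number of residues (g/mol)
    conc_mg_per_ml       protein concentration in mg/ml
    path_cm              optical path length in cm
    Returns [theta] in deg cm^2 dmol^-1.
    """
    if conc_mg_per_ml <= 0 or path_cm <= 0:
        raise ValueError("concentration and path length must be positive")
    return 100.0 * theta_deg * mean_residue_weight / (conc_mg_per_ml * path_cm)


if __name__ == "__main__":
    # Hypothetical reading: -10 mdeg, mean residue weight 110 g/mol,
    # 0.11 mg/ml protein, 0.5-cm (5-mm) cuvette.
    print(mean_residue_ellipticity(-0.010, 110.0, 0.11, 0.5))
```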
Analysis of YefM Stability-An overnight culture of E. coli carrying the pBAD-yefM plasmid was grown at 37°C/200 rpm in LB broth to stationary phase (A600 = 1.4). YefM expression was then induced for 10 min with 0.2% arabinose, and the culture was subsequently treated with 200 μg/ml rifampicin and 0.2% glucose to repress further expression from the PBAD promoter. Aliquots of 2 ml were removed before and at 15-min intervals after repression and analyzed by Western blot (see below) to assess the quantity of YefM in the bacteria. Densitometric assessment of YefM was achieved using an ImageScanner (Amersham Biosciences) and the ImageMaster one-dimensional prime (version 3.01) program (Amersham Biosciences).
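One common way to turn such a densitometry time course into a half-life is a log-linear fit. The sketch below assumes first-order (exponential) decay, which the text does not state explicitly, and uses a hypothetical function name and synthetic band intensities rather than data from this study.

```python
import math


def half_life_minutes(times_min, intensities):
    """Estimate a half-life by least-squares fitting ln(I) = ln(I0) - k*t.

    Assumes first-order decay; returns ln(2)/k in the units of times_min.
    """
    if len(times_min) != len(intensities) or len(times_min) < 2:
        raise ValueError("need at least two (time, intensity) pairs")
    logs = [math.log(value) for value in intensities]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("intensities do not decay over time")
    return math.log(2) / -slope


if __name__ == "__main__":
    # Synthetic example: samples every 15 min, decaying with a 60-min half-life.
    times = [0, 15, 30, 45, 60]
    bands = [100.0 * 0.5 ** (t / 60.0) for t in times]
    print(half_life_minutes(times, bands))
```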
Western Blot Analysis-Aliquots (2 ml) were centrifuged at 14,000 rpm for 5 min at 4°C and resuspended in 80 μl of double-distilled water. Samples of 60 μl were added to 20 μl of 4× sample buffer, and the remaining 20 μl was used to quantify the total protein using the Coomassie Plus protein assay reagent (Pierce). Aliquots containing equal total protein amounts were loaded on a Tris-Tricine SDS 15% polyacrylamide slab gel. After electrophoresis, the proteins were electroblotted to polyvinylidene difluoride membrane filters (Bio-Rad). The detection of YefM was performed using anti-YefM serum raised in rabbit. The membrane was then incubated with peroxidase-conjugated anti-rabbit antibodies, and YefM proteins were detected through the enhanced chemiluminescence reaction after exposure to a sensitive film.
Amino Acid Composition and Charge-Hydrophobicity Values Analysis-The rate of occurrence of each amino acid in the YefM family proteins ($P_{Mi}$) was determined by averaging its frequencies across the 30 YefM homolog sequences. The general amino acid occurrence statistics ($P_{Gi}$) were those compiled by the Rockefeller authors using the NCBI database (prowl.rockefeller.edu/aainfo/masses.htm). The amino acid occurrences were compared by their fractional difference, $(P_{Mi} - P_{Gi})/P_{Gi}$. The variances of these ratios were calculated as $\mathrm{Var}(P_{Mi})/P_{Gi}^{2}$.
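The fractional difference and its variance can be computed as in the following sketch. It assumes that Var(P_Mi) is the sample variance of the per-sequence frequencies, which is one reading of the description above; the function names and the example frequencies are hypothetical.

```python
from statistics import mean, variance


def residue_frequency(sequence, residue):
    """Fraction of positions in one sequence occupied by the given residue."""
    return sequence.count(residue) / len(sequence)


def fractional_difference(family_freqs, general_freq):
    """(P_Mi - P_Gi) / P_Gi, with P_Mi the mean frequency over the family."""
    return (mean(family_freqs) - general_freq) / general_freq


def scaled_variance(family_freqs, general_freq):
    """Var(P_Mi) / P_Gi**2, using the sample variance across homologs."""
    return variance(family_freqs) / general_freq ** 2


if __name__ == "__main__":
    # Hypothetical frequencies of one amino acid in three homologs,
    # compared with a hypothetical general occurrence of 0.05.
    freqs = [0.09, 0.11, 0.10]
    print(fractional_difference(freqs, 0.05), scaled_variance(freqs, 0.05))
```

A positive fractional difference indicates enrichment relative to the general occurrence, and a negative one indicates depletion.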
The mean hydrophobicity and the mean net charge of the YefM and the YefM homologs proteins were calculated as described by Uversky and coauthors (3).
Peptide Array Analysis-Tridecamer peptides corresponding to consecutive overlapping sequences of the YefM protein were arrayed on a cellulose membrane matrix and covalently bound to a Whatman 50 cellulose support (Whatman). Approximately 50 μg of soluble GST-YoeB protein was examined for its selective peptide binding ability, on the basis of the putative YefM-YoeB interaction. In the low stringency binding procedure, the membrane was washed briefly in 100% ethanol, washed three times with Tris-buffered saline (TBS; 50 mM Tris-HCl, pH 7.5, 150 mM NaCl), and then blocked for 4 h using 5% (w/v) non-fat milk in TBS. Next, the membrane was washed three times in TBS + 0.1% (v/v) Tween 20 (TBS-T) and incubated for 14 h with 10 ml of GST-YoeB solution with slow shaking at 4°C. Subsequently, the membrane was washed once in TBS-T. The membrane was then incubated with 10 ml of TBS containing mouse anti-GST antibody and horseradish peroxidase-conjugated goat anti-mouse antibody at the appropriate titers. After a 1-h incubation at room temperature, the membrane was washed briefly with TBS-T and TBS. When the high stringency binding procedure was performed, washing steps were extensive and multiple. Moreover, the washing step after the blocking solution was reduced to a single brief wash. Bound GST-YoeB proteins were detected through the enhanced chemiluminescence reaction after exposure to a sensitive film.
Surface Plasmon Resonance (SPR) Analysis-Binding affinities were evaluated by SPR using a BIAcore 2000 (BIAcore Inc.). Approximately 30 resonance units of the peptide NH2-RTISYSEARQNLS-COOH, denoting the YefM recognition determinant sequence, were immobilized onto a research grade sensor chip CM5 using an amine coupling kit (BIAcore) as described by the manufacturer. GST-YoeB proteins (at 12.5, 25, and 50 nM concentrations) were passed over the chip surface in 50 mM Tris, pH 7.2, at room temperature at a flow rate of 10 μl/min. The chip surface was regenerated with 10 mM HCl in water after each run and reequilibrated with Tris buffer. Sensorgram data were analyzed using the BIAevaluation 3.0 software package. The rate constants were calculated from the binding data using local fitting for the data set as described in the BIAevaluation 3.0 manual with the 1:1 Langmuir binding model.
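For orientation, the 1:1 Langmuir model named above describes the association phase as dR/dt = ka·C·(Rmax − R) − kd·R, which has the closed-form solution used in the sketch below. This is not the BIAevaluation fitting routine; the function names are illustrative, and the rate constants and Rmax in the usage example are hypothetical placeholders, not values determined in this study. Only the analyte concentrations are taken from the text.

```python
import math


def dissociation_constant(ka, kd):
    """Equilibrium dissociation constant KD = kd / ka (in M)."""
    return kd / ka


def association_response(t_s, conc_molar, ka, kd, rmax):
    """Response (RU) at time t_s during analyte injection, 1:1 Langmuir model.

    R(t) = Req * (1 - exp(-(ka*C + kd) * t)),  Req = Rmax * ka*C / (ka*C + kd)
    """
    k_obs = ka * conc_molar + kd
    r_eq = rmax * ka * conc_molar / k_obs
    return r_eq * (1.0 - math.exp(-k_obs * t_s))


if __name__ == "__main__":
    ka, kd, rmax = 1.0e5, 1.0e-3, 100.0  # hypothetical M^-1 s^-1, s^-1, RU
    for conc_nm in (12.5, 25.0, 50.0):   # analyte concentrations from the text
        print(conc_nm, association_response(120.0, conc_nm * 1e-9, ka, kd, rmax))
```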

RESULTS
Identification of the yefM-yoeB System Genes-The YefM protein of E. coli was suggested to be homologous to the Phd protein (23), and, similar to the Phd antitoxin, it was considered to serve as the antitoxin partner of a YoeB toxin. However, this homology is very low and in fact not statistically significant (E = 18, according to pairwise BLAST analysis). This is still very intriguing because the Phd protein appears to have unique structural properties and shows no clear homology to any other proteins. To justify the suggested "YefM-Phd protein family" term (23), a systematic exploration of YefM and Phd protein sequences is required. Homologs of YefM were demonstrated to reside on the Francisella tularensis plasmid pFNL10 (23) and on a multidrug resistance plasmid identified in a clinical isolate of Enterococcus faecium (24). The existence of homologs of the YefM and YoeB proteins in bacterial chromosomes was also suggested (24). However, many unpaired YefM and YoeB homologs were presented (24), indicating that a methodical pairing of YefM and YoeB homologs is required to verify their authenticity as a functional module. Therefore, we used a pair-constrained homology search. In this search, the homology values (albeit low) for both the putative toxin and the putative antitoxin were considered together with their chromosomal organization. Only pairs of proteins that revealed the paradigmatic TA genetic organization, in which the physical distance between the pair of genes is less than 100 bp, were regarded as putative TA systems. The resulting findings are shown in Fig. 1.
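The pairing constraint described above can be sketched as a simple filter over homology hits. The sketch is illustrative only: the `Gene` record, the coordinate convention, and the example hits are hypothetical, and only the 100-bp distance threshold comes from the text. It does not reproduce the BLAST searches themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    genome: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def intergenic_gap(a, b):
    """Number of base pairs between two genes (0 if they touch or overlap)."""
    first, second = sorted((a, b), key=lambda gene: gene.start)
    return max(0, second.start - first.end - 1)


def paired_modules(antitoxin_hits, toxin_hits, max_gap_bp=100):
    """Keep only antitoxin/toxin hits from the same genome that lie closer
    than max_gap_bp to each other; unpaired hits are discarded."""
    return [
        (antitoxin, toxin)
        for antitoxin in antitoxin_hits
        for toxin in toxin_hits
        if antitoxin.genome == toxin.genome
        and intergenic_gap(antitoxin, toxin) < max_gap_bp
    ]


if __name__ == "__main__":
    # Hypothetical hits: genome A carries an adjacent pair, genome B does not.
    antitoxins = [Gene("yefM-like", "A", 1000, 1250),
                  Gene("yefM-like", "B", 500, 750)]
    toxins = [Gene("yoeB-like", "A", 1254, 1510),
              Gene("yoeB-like", "B", 5000, 5250)]
    for antitoxin, toxin in paired_modules(antitoxins, toxins):
        print(antitoxin.genome, antitoxin.name, toxin.name)
```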
In view of this homology analysis, it became clear that a subset of the YefM homolog sequences that are highly similar to Phd are located adjacent to homologs of the prophage P1 Doc protein, instead of YoeB (Fig. 1 and Fig. 2A). Therefore, we refer to these sequences as hypothetical phd genes. This group includes translations of genomic sequences from Salmonella typhimurium, Klebsiella pneumoniae, and Yersinia enterocolitica. The sequences from these bacteria are actually closer to Phd (with E values of 2 × 10⁻⁹, 7 × 10⁻⁹, and 2 × 10⁻⁴, respectively) than to E. coli YefM (E = 2 × 10⁻⁴, 3 × 10⁻⁴, and 0.8, respectively). Nevertheless, the two systems may exist together: the Y. enterocolitica genome includes both YefM-YoeB and Phd-Doc homolog sequences (see Figs. 1 and 2).
Alignment of all of the homologous translated sequences was conducted to estimate their degree of conservation. The YefM homolog alignment (Fig. 2A) consists of 29 different homologs, in addition to the Phd protein sequence of phage P1 (last sequence). The toxin alignment (Fig. 2B) is divided into two sections: the upper panel includes the YoeB homologs, consisting of 26 different sequences, and the lower panel includes the Doc homolog alignment, consisting of 3 different Doc homologs in addition to the Doc protein sequence of the phage P1 itself. YoeB and Doc homologs cannot be combined into a reliable alignment because their sequences are too divergent.
The yefM-yoeB Genes Act as a TA System-To examine the toxic and antitoxic effects that the expressed proteins have on the cell, YefM and YoeB were overexpressed separately and together as an operon using the pBAD-TOPO plasmid. E. coli strain TOP10 cells carrying the plasmids were grown in LB medium, and 0.2% arabinose was added at time zero. A significant effect on growth was observed in these bacteria (Fig. 3, A-C). Overexpression of the putative toxin, YoeB, inhibited bacterial growth to a maximum A600 of ~0.15 (Fig. 3B). Overexpression of both YefM and YoeB as an operon abolished this toxic effect, indicating a TA relationship between YoeB and YefM (Fig. 3C), as previously reported (24). Surprisingly, overexpression of YefM alone had an effect on cell growth similar to that of YoeB (Fig. 3A). The same results were observed when expression of the system genes was induced during the logarithmic growth stage (Fig. 3, D-F): 0.2% arabinose was added to the different cultures when they reached an A600 of ~0.45. In the cases of YefM or YoeB expression, absolute growth inhibition was observed after less than 1 h (Fig. 3, D and E) as cells reached an A600 of ~0.7, whereas the expression of both genes together enabled normal growth (Fig. 3F).
To confirm that YefM is an actual antitoxin, we tested the colony formation capability of each of the clones at decreasing expression levels (Fig. 3G). On the whole, yefM clones consistently demonstrated a certain degree of growth at all arabinose concentrations, whereas yoeB clones did not form colonies at most concentrations. Moreover, in the presence of 0.005% arabinose, the yoeB clone did not grow, whereas the yefM clone still demonstrated clear growth, indicating that YoeB is a real toxin whereas YefM displays toxicity only at high expression levels.
Biophysical Characterization of YefM-YefM Is Natively Unfolded-YefM was purified as described under "Experimental Procedures," either by performing gel filtration (obtaining ~0.1 mg/ml) or by boiling GST and YefM proteins subsequent to factor Xa cleavage (~0.35 mg/ml).
The far-UV CD spectra of the purified YefM protein (from both purification methods) at increasing temperatures (25, 37, 42°C) show a typical random coil pattern with a minimum in the vicinity of 200 nm (25), with only slight changes in the spectra caused by an increase in temperature (Fig. 4A). FTIR spectroscopy also indicates that the YefM protein has a random coil structure (Fig. 4B). The FTIR spectrum of the purified YefM (room temperature) showed a transmittance minimum at 1,643 cm⁻¹, which is characteristic of random coil structure (26).
A thermal denaturation experiment (Fig. 4C) shows that YefM keeps a predominantly random coil structure over the entire temperature range: a continuous temperature increase of the YefM sample from 2 to 80°C did not significantly shift the CD ellipticity at 222 nm or at 217 nm (the wavelengths characteristic of maximum CD ellipticity of α-helix and β-sheet structures, respectively), implying that the structure remained unchanged. Further support for the natively unfolded state of YefM comes from its extraordinary solubility during boiling (Fig. 4D).
Determination of YefM Stability in Vivo-To gain insight into the structural stability of the YefM antitoxin in its native state within cells, we examined its proteolytic stability in vivo. To that end, we performed a short expression of YefM followed by its full repression during stationary growth. Analysis of YefM levels in E. coli, before and at different intervals after repression, reveals that the YefM antitoxin is proteolytically unstable (Fig. 5). YefM was degraded in vivo with a half-life of approximately 1 h. This result correlates with the expected features of TA systems, in which the antitoxin proteins are preferred substrates for a protease, and is consistent with the half-life reported for the unfolded Phd antitoxin (13).
Amino Acid Composition of YefM Family Proteins-To visualize differences between the amino acid composition of the YefM proteins and the general amino acid composition, and to gain further insight into the role of the sequence in providing disorder characteristics, we compared the general occurrence of each amino acid with its mean occurrence in YefM proteins. As shown in Fig. 6A, YefM family proteins are considerably enriched in Met and Glu (30-50%) and substantially depleted in Trp, Cys, Pro, Phe, and Gly (>50%). The results obtained for these amino acids are significant, with a p value < 0.001, as determined by a one-sample t test. Other amino acids do not display significant enrichment or depletion relative to the general occurrence of amino acids.

Charge-Hydrophobicity Relationships in the YefM Family Proteins-A comparative study published by Uversky et al. (3) demonstrated that it is possible to predict whether a given sequence encodes a folded or a natively unfolded protein from a two-dimensional plot of the overall hydrophobicity and the net charge of the studied proteins. To assess whether the charge-hydrophobicity properties of the YefM family proteins correlate with those previous findings, we examined these relationships for YefM, Phd, and their homolog sequences as described previously (3) (Fig. 6B). Unexpectedly, the YefM-Phd family proteins were found to be localized mostly within the defined "folded region" of the plot. The localization of the Phd protein and its homologs is indistinguishable from that of the YefM homologs.
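The classification on the charge-hydrophobicity plot can be sketched as follows, using the boundary R = 2.785H − 1.151 quoted in the legend of Fig. 6 (3). The function names are illustrative. The mean net charge helper assumes that only Lys and Arg (+1) and Asp and Glu (−1) are charged and that the absolute value is taken, and the mean hydrophobicity (normalized to a 0-1 scale in Ref. 3) must be supplied separately; both points are assumptions of this sketch rather than details given here.

```python
def boundary_net_charge(mean_hydrophobicity):
    """Mean net charge on the folded/unfolded border: R = 2.785*H - 1.151."""
    return 2.785 * mean_hydrophobicity - 1.151


def mean_net_charge(sequence):
    """|(K + R) - (D + E)| / length, with His and termini ignored (assumption)."""
    positive = sequence.count("K") + sequence.count("R")
    negative = sequence.count("D") + sequence.count("E")
    return abs(positive - negative) / len(sequence)


def predicted_region(mean_hydrophobicity, net_charge):
    """Return 'unfolded' above the border (upper left), 'folded' otherwise."""
    if net_charge > boundary_net_charge(mean_hydrophobicity):
        return "unfolded"
    return "folded"


if __name__ == "__main__":
    # Hypothetical points on the plot, not values measured for YefM.
    print(predicted_region(0.50, 0.10))
    print(predicted_region(0.30, 0.10))
```

A protein that falls on the "folded" side of this border, as reported above for most of the YefM-Phd family, is therefore not necessarily folded; the plot is a statistical predictor rather than a structural measurement.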
Identification of YefM Recognition Determinant-On the basis of the natively unfolded structure of YefM, we assumed that a linear determinant rather than a conformational one is recognized by its toxin partner. To identify this determinant in the YefM sequence, we designed an array consisting of 41 overlapping tridecamer peptides corresponding to amino acid residues 1-12 up to 80-92 of the whole YefM sequence in successive order with 2-amino acid shifts (Fig. 7A), synthesized on a cellulose membrane matrix. The YefM fragments capable of binding the GST-YoeB fusion were identified by immunoblotting. Using a low stringency procedure to obtain the maximum number of putative interaction sites, we identified three such regions. As seen in Fig. 7A, the first region included three tridecamer peptides (YefM 11-23 to YefM 15-27) in decreasing binding capacity, including the sequence RTISYSEARQNLSATMM (underlined sequence represents major bound site); the second region included the single YefM 33-45 peptide sequence, APILITRQNGEAC; the third region comprised the two YefM 75-87 and YefM 77-89 peptides, which cover the MDSIDSLKSGKGTEKD sequence.
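The design of such an overlapping peptide array amounts to sliding a 13-residue window along the sequence in steps of two residues. The sketch below illustrates this on the first binding region quoted above; the function name and the `first_residue` numbering parameter are illustrative, and the full YefM sequence is not reproduced here.

```python
def overlapping_peptides(sequence, length=13, shift=2, first_residue=1):
    """Return (start, end, peptide) tuples for a sliding-window peptide array.

    start and end are 1-based residue numbers; first_residue is the residue
    number of the first character of `sequence` within the full protein.
    """
    peptides = []
    for offset in range(0, len(sequence) - length + 1, shift):
        start = first_residue + offset
        peptides.append((start, start + length - 1,
                         sequence[offset:offset + length]))
    return peptides


if __name__ == "__main__":
    # First binding region from the text, which begins at YefM residue 11.
    for start, end, peptide in overlapping_peptides("RTISYSEARQNLSATMM",
                                                    first_residue=11):
        print(f"YefM {start}-{end}: {peptide}")
```

Setting `shift=1` gives the single-residue shift used for a higher resolution array.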
To verify our results, we used a second peptide array membrane comprising those regions with the intention of performing a high resolution analysis of the putative binding sites (Fig. 7B). We used a high stringency procedure (see "Experimental Procedures") to minimize nonspecific binding of the GST-YoeB fusion protein or antibodies. The examined sites were extended to include YefM 8-31 as the first region, YefM 29-48 as the second region, and YefM 72-92 as the third region. The shift between consecutive arrayed tridecamer peptides was reduced to a single amino acid. Of all examined regions, the YefM 11-23 peptide (RTISYSEARQNLS) was detected as the best YoeB binding sequence.
The Arginine in Position 19 Is Essential for YefM-YoeB Interaction-Alongside the verification of the major binding sequence, we tried to detect a single amino acid that would be crucial for the YefM-YoeB interaction. The identified binding sequence is rather conserved throughout the YefM-Phd protein families. However, two amino acids within it are notably conserved: arginine (position 19) and leucine (position 22), as seen in Fig. 2A. We examined the binding of a GST-YoeB fusion to a cellulose membrane array of tridecamer peptides corresponding to the YefM 11-23 sequence in which Arg-19 or Leu-22 was replaced by alanine or glycine (Fig. 7C). Although the L22A and L22G replacements only attenuated the binding of YoeB, R19A or R19G totally abolished the binding, suggesting that the arginine in position 19 is essential for the binding of the YoeB toxin.

DISCUSSION
Non-native protein structures attract an increasing degree of attention because of their abundance on the one hand and the lack of understanding of their physiological significance on the other. Identification of distinct families of natively unfolded proteins, understanding their conservation on the structural level, and understanding their physiological role are therefore of high importance. Here, using a combination of bioinformatic, biophysical, and physiological analyses, we define a new family of natively unfolded proteins, the YefM-Phd family. Using a pair-constrained bioinformatic approach, we were able to demonstrate that members of the family are present in a large number of bacteria. Although the level of homology within the antitoxin family is relatively low (Fig. 2A), we were surprised to find Phd homologs that share a higher percentage of homology with YefM than Phd itself does (Y. enterocolitica, K. pneumoniae, and S. typhimurium).
Although YefM and Phd proteins share very low sequence homology, the key feature that the proteins share is the natively unfolded state at physiological temperatures (Fig. 4 and Refs. 11 and 27). Because both Phd-Doc (12) and YefM-YoeB (Fig. 3) are proven to be functional TA systems, these findings may suggest that the Phd and YefM antitoxins have evolved from a common ancestor system and that at a certain point in the past the antitoxin may have branched out to establish new TA systems consisting of different toxins.
Interestingly, the level of homology within the YoeB family (Fig. 2B) appears to be significantly higher than that within the YefM family of proteins (Fig. 2A). The level of conservation observed for the YoeB proteins is highly consistent with a toxic activity that explicitly targets specific cellular determinants and that requires a well defined fold, as in lock-and-key or induced fit recognition. On the other hand, the low degree of conservation of the extended YefM-Phd family is consistent with a protein lacking a clear structural recognition and/or catalytic activity that would otherwise require a defined configuration. It is important to note that YefM and Phd proteins can be coupled to either Doc-like or YoeB-like toxins, two families of toxins that could not be aligned and do not share any substantial homology. This is more consistent with a family of proteins that is essentially designed to be recognized as a damaged protein and does not represent an interactive or catalytic scaffold. Moreover, the relatively small area of YefM that shows the highest level of conservation was found to include the target of linear recognition by the YoeB protein (Fig. 7).
Physiological assays have verified that the yoeB gene encodes a toxin that is lethal or inhibitory to host cells and that yefM encodes an antitoxin that prevents the lethal action of the toxin (Fig. 3 and Ref. 24). Unexpectedly, upon overexpression, YefM itself inhibited bacterial growth. However, the dose dependence of this toxicity suggests that it is an artifact of overexpression rather than a true physiological phenomenon (Fig. 3G).
It is hypothesized that the difference in proteolytic stability between the components of a TA system arises from a difference in their thermodynamic stability. YefM strongly supports this hypothesis, as it was demonstrated to be a natively unfolded protein. Furthermore, among all structurally described antitoxins (Phd of P1 (11,27), ParD of RK2/RP4 (28), CcdA of F (29), and ε of pSM19035 (30)), YefM is the most unstable protein. One of the general structural characteristics of a natively unfolded protein is the lack of secondary structure. At 37°C, the Phd antitoxin also seems to be in a largely unfolded, random coil conformation (11). However, at 4°C, or at 37°C in the presence of the chemical chaperone trimethylamine N-oxide, Phd folds into an ordered protein containing ~45% α-helix. In contrast, analysis of the YefM far-UV CD spectra indicates a low content of ordered secondary structure (α-helices and β-sheets) that does not change even at a low temperature of 2°C (Fig. 4, A and C) or upon the addition of trimethylamine N-oxide (data not shown). YefM was also confirmed to be random coil by FTIR analysis (Fig. 4B). Additional substantiation for YefM being a largely unstructured protein comes from its unusual resistance to aggregation upon boiling (Fig. 4D), which is consistent with a lack of secondary structure elements that mediate aggregate formation through intermolecular association.
FIG. 6. Analysis of the physicochemical properties of the identified proteins. A, YefM amino acid occurrence relative to the general amino acid occurrence (prowl.rockefeller.edu/aainfo/masses.htm), given by (P_Mi − P_Gi)/P_Gi. Error bars represent the S.D. values. The significance of the difference between the mean antitoxin amino acid occurrences and the general occurrences, designated by *, indicates p < 0.001 as determined by a one-sample t test. The amino acids are arranged according to residue flexibility (32), with increasing flexibility to the right. B, comparison of the mean net charge and the mean hydrophobicity for the YefM (circles) and the Phd (triangles) protein families. The solid line represents the border between natively unfolded proteins (upper left) and folded proteins (bottom right), calculated using the equation R = 2.785H − 1.151, where H is the mean hydrophobicity and R is the mean net charge, as proposed by Uversky and coauthors (3). The YefM protein (gray circle), the Phd protein (gray triangle), and their homologs are mostly localized in the "folded" region. Mean net charge and mean hydrophobicity values were calculated as described in Ref. 3.
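As an illustration of the boundary quoted in the legend of Fig. 6B, the following minimal sketch evaluates R = 2.785H − 1.151 for a given pair of mean hydrophobicity and mean net charge values. The two helper functions that derive H and R from a sequence are assumptions made only for this sketch (Kyte-Doolittle hydropathy rescaled to the range 0 to 1, and a simple count of Lys/Arg against Asp/Glu); the values in the figure were calculated as described in Ref. 3, and the toy sequence is hypothetical rather than a YefM sequence.

```python
# Kyte-Doolittle hydropathy values (choice of scale is an assumption of this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq):
    """Mean hydropathy per residue, rescaled from [-4.5, 4.5] to [0, 1]."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute net charge per residue, counting Lys/Arg as +1 and Asp/Glu as -1.

    This ignores His and the termini, a simplification assumed here.
    """
    net = sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")
    return abs(net) / len(seq)


def boundary_charge(h):
    """Mean net charge on the border line R = 2.785H - 1.151 for hydrophobicity h."""
    return 2.785 * h - 1.151


def predicted_unfolded(h, r):
    """True if the point (h, r) lies above the border, i.e. in the 'unfolded' region."""
    return r > boundary_charge(h)


if __name__ == "__main__":
    toy = "MKEELLKKAEEDSR"  # hypothetical sequence, for illustration only
    h, r = mean_hydrophobicity(toy), mean_net_charge(toy)
    print(f"H = {h:.3f}, R = {r:.3f}, predicted unfolded: {predicted_unfolded(h, r)}")
```

A point that falls below the line, as most YefM and Phd family members do in Fig. 6B, would be reported as "folded" by this criterion.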
Indeed, YefM is proteolytically unstable in vivo (Fig. 5), suggesting that it maintains an unfolded conformation within cells. This feature further correlates with the observed proteolytic instability of other antitoxins, such as Phd and MazE (13,19).
It was suggested recently that the relationship between sequence and disorder in proteins involves amino acid compositional bias and high predicted flexibility (6, 31). These studies demonstrated that natively unfolded proteins are substantially depleted in Trp, Cys, Phe, Ile, Tyr, Val, Leu, and Asn and substantially enriched in Ala, Arg, Gly, Gln, Ser, Pro, Glu, and Lys. Indeed, we found that the same amino acid compositional bias holds when comparing the occurrence in these disordered sequences (using the ALL-disorder sequences database (31)) with the general occurrence of amino acids (prowl.rockefeller.edu/aainfo/masses.htm) (data not shown). In addition, the depleted amino acids were shown to correspond to low flexibility residues, whereas the enriched amino acids corresponded to high flexibility ones (6). The flexibility ranking is based on a scale developed by Vihinen et al. (32) and reflects the propensity of a given residue to be buried or exposed (i.e. low or high flexibility, respectively) in the crystal structure of globular proteins. However, the amino acid composition of the natively unfolded YefM family proteins is rather different (Fig. 6A). Although both the studied disordered proteins and the YefM family proteins are significantly depleted in Trp, Cys, and Phe, the YefM proteins are additionally depleted in Gly and Pro, amino acids considered to be disorder-promoting (6,22). Moreover, Glu is the sole amino acid that seems to be significantly enriched in both. Notably, the most rigid residues (Trp, Cys, and Phe) remained depleted in both surveys, suggesting that the absence of core-forming side chains is essential to the coding of intrinsically disordered sequences.
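The relative occurrence plotted in Fig. 6A, (P_Mi − P_Gi)/P_Gi, can be sketched as follows. This is a minimal illustration and not the analysis pipeline used for the figure: the function names are hypothetical, the background frequencies must be supplied by the caller (for example from the general occurrence table cited above), and the sequences and background values in the usage lines are toy placeholders.

```python
from collections import Counter
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fractional occurrence of each standard amino acid in one non-empty sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def relative_occurrence(family, general):
    """Return (P_M - P_G) / P_G for every amino acid listed in `general`.

    P_M is the mean fractional occurrence over the family sequences and
    P_G is the general (background) occurrence supplied by the caller.
    """
    compositions = [composition(seq) for seq in family]
    result = {}
    for aa, p_g in general.items():
        p_m = mean(comp[aa] for comp in compositions)
        result[aa] = (p_m - p_g) / p_g
    return result


if __name__ == "__main__":
    # Toy family and toy background values, for illustration only.
    family = ["AAEE", "AEEE"]
    background = {"A": 0.25, "E": 0.5}
    for aa, value in relative_occurrence(family, background).items():
        print(aa, round(value, 3))
```

Negative values correspond to depletion relative to the background and positive values to enrichment, which is the sign convention implied by the formula in the legend.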
Recent comparative studies suggested that it is possible to predict whether a given sequence encodes a folded or a natively unfolded protein (3-5). According to these studies, a natively unfolded protein must possess a combination of low mean hydrophobicity and relatively high net charge under physiological conditions. However, the majority of the YefM family proteins, including the YefM and Phd proteins, do not conform to this criterion (Fig. 6B). This result is evidently coupled with the unique amino acid compositional bias of the YefM family proteins mentioned above, which does not fit the established characteristics of disordered sequences. The relative lack of high flexibility side chains (e.g. Lys, Pro, Gly, Ser, and Gln), together with an insufficient depletion in hydrophobic rigid side chains (e.g. Ile, Tyr, Val, and Leu), accounts for the relatively low net charge and rather high overall hydrophobicity that characterize the YefM family. Furthermore, in the case of the YefM family proteins, we propose that the lack of aromatic residues, rather than of hydrophobic residues in general, maintains the disordered state of YefM. As seen in Fig. 6A, the depletion in the aromatic residues Phe and Trp, unlike that in other hydrophobic residues, is conserved throughout the YefM family. The lack of aromatic moieties is consistent with the lack of an organized and packed hydrophobic core.
FIG. 7. Identification of the YoeB binding sequence in the YefM protein using a peptide array. A, 41 tridecamer peptides corresponding to consecutive overlapping sequences of the 92-amino acid YefM protein (2-amino acid shift between peptides) were arrayed on a membrane. GST-YoeB binding to the membrane was analyzed. B, tridecamer peptides corresponding to consecutive overlapping sequences of YefM 8-31, YefM 29-48, and YefM 72-92 (single amino acid shift between peptides) were arrayed on a membrane and analyzed for GST-YoeB binding. C, tridecamer peptides corresponding to the YefM-YoeB recognition sequence with Arg-19 and Leu-22 replacements were analyzed for GST-YoeB binding. No GST-YoeB binding could be detected to the R19A or R19G tridecamer peptide.
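The peptide array layout described in the legend of Fig. 7 (overlapping 13-residue peptides with a fixed shift along the protein) can be sketched as a simple sliding window. The function below is hypothetical and makes one explicit assumption: when the regular shift does not reach the last residue, one extra peptide is anchored at the C terminus. The source does not state how the final peptide was chosen; this assumption is used here only because it reproduces the stated count of 41 peptides for a 92-residue protein with a 2-residue shift. A placeholder string stands in for the YefM sequence.

```python
def tile_peptides(sequence, length=13, shift=2, anchor_last=True):
    """Return (start, end, peptide) tuples with 1-based inclusive positions.

    Windows of `length` residues are taken every `shift` residues. If
    `anchor_last` is True and the regular windows stop short of the final
    residue, one extra window covering the last `length` residues is added
    (an assumption of this sketch, not a detail given in the source).
    """
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, shift))
    if anchor_last and starts and starts[-1] != last_start:
        starts.append(last_start)
    return [(s + 1, s + length, sequence[s:s + length]) for s in starts]


if __name__ == "__main__":
    placeholder = "X" * 92  # stands in for the 92-residue YefM sequence
    peptides = tile_peptides(placeholder, length=13, shift=2)
    print(len(peptides), peptides[0][:2], peptides[-1][:2])

    # Finer scan of a 24-residue segment (e.g. positions 8-31) with a shift of 1.
    segment = "X" * 24
    print(len(tile_peptides(segment, length=13, shift=1)))
```

The same function with `shift=1` covers the finer scans of the three subregions, where every 13-residue window within the segment is represented.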
As discussed in the introduction, the TA system may serve as an excellent target for antibacterial agents. One approach is to prevent the toxin and antitoxin components from interacting in vivo, which would trigger the inhibitory (or lethal) effect of the toxin on cell growth. Here, we have identified the molecular recognition sequence within the YefM protein using a peptide array (Fig. 7) and SPR analysis (Fig. 8). In the future we intend to use this information for the design of agents that will affect the YefM-YoeB interaction.