ClbP Is a Prototype of a Peptidase Subgroup Involved in Biosynthesis of Nonribosomal Peptides*

The pks genomic island of Escherichia coli encodes polyketide (PK) and nonribosomal peptide (NRP) synthases that allow assembly of a putative hybrid PK-NRP compound named colibactin that induces DNA double-strand breaks in eukaryotic cells. The pks-encoded machinery harbors an atypical essential protein, ClbP. ClbP crystal structure and mutagenesis experiments revealed a serine-active site and original structural features compatible with peptidase activity, which was detected by biochemical assays. Ten ClbP homologs were identified in silico in NRP genomic islands of closely and distantly related bacterial species. All tested ClbP homologs were able to complement a clbP-deficient E. coli mutant. ClbP is therefore a prototype of a new subfamily of extracytoplasmic peptidases probably involved in the maturation of NRP compounds. Such peptidases will be powerful tools for the manipulation of NRP biosynthetic pathways.

Polyketide (PK), nonribosomal peptide (NRP), and hybrid NRP-PK natural products are compounds of considerable interest in drug development (1). Their biosynthesis occurs on multimodular enzymatic assembly lines called megasynthases (2). The conformations that establish biological activity are attained through tailoring and editing enzymes such as acylases, alkylases, glycosylases, and oxidoreductases, which act during chain growth and after thioesterase-mediated release. The dedicated tailoring enzymes are encoded by genes clustered with the assembly line genes for coordinated regulation.
Recently, Nougayrède et al. (3,4) identified in Escherichia coli strains and other Enterobacteriaceae a large (54 kb) genomic island named pks, which encodes a typical cluster of NRP synthases (NRPS), PK synthases (PKS), and hybrid NRPS/PKS. The pks island was detected in E. coli strains involved in extraintestinal infections such as urinary tract infection and septicemia but also in commensal strains from healthy people (3,5,6). The genetic and functional analyses of the pks island indicate that it encodes the synthesis of a PK-NRP hybrid compound, named colibactin. E. coli strains expressing this gene cluster cause DNA double-strand breaks in human eukaryotic cells during infection (3). This DNA damage induces phosphorylation of the H2AX histones (γH2AX) and activates the ATM-Chk2 signaling pathway that leads to transient G2/M cell cycle arrest and cell swelling (megalocytosis). Exposed cells exhibit signs of incomplete DNA repair, which leads to chromosomal instability, an increase in gene mutation frequency and anchorage-independent colony formation, demonstrating the mutagenic and transforming potentials of colibactin (7).
In addition to eight NRP and PK megasynthases, the pks island encodes nine accessory, tailoring and editing enzymes (3). Eight of these proteins, including the protein ClbP, are required to induce DNA damage in infected cells. No protein similar to ClbP has been reported so far as an accessory or tailoring enzyme. Here, we determined the crystal structure of the major domain of ClbP, assessed its biochemical activity, and characterized distant ClbP homologs that form a new subgroup of serine-reactive peptidases associated with NRP assembly lines.

EXPERIMENTAL PROCEDURES
Strains, Plasmids, and Media-The strains and plasmids used in this study are listed in supplemental Tables 1 and 2, respectively. E. coli strains were routinely grown at 37°C in lysogeny broth (LB) or on LB agar plates. Ampicillin (100 μg/ml), kanamycin (50 μg/ml), chloramphenicol (25 μg/ml), or tetracycline (200 ng/ml) was added to the medium as required. Non-E. coli strains were grown as recommended by the Deutsche Sammlung von Mikroorganismen und Zellkulturen or Collection de l'Institut Pasteur collections or by the providers.
DNA Cloning for ClbP Investigations-The DNA sequences of clbP were amplified from the genome of E. coli strain IHE3034 (3). The DNA fragments encoding the entire ClbP and the catalytic domain of ClbP, designated ClbPpep, were amplified by PCR using primer pairs clbPMK-F and clbPBN-R, and clbP-NdeI2-F and clbP-NotI2-R (supplemental Table 3), respectively. The PCR reactions for cloning experiments were performed using the High Fidelity Platinum Taq polymerase or the Expand High Fidelity PCR System according to the manufacturer's instructions (Invitrogen). The amplified fragments were purified and either double-digested by MfeI/BamHI for cloning at the EcoRI/BamHI sites of pASK-IBA33plus to obtain ClbP-RGS6His (ClbP + C-terminal in-frame RGS6His tag), or double-digested and cloned at the NdeI/NotI sites of pET9am to obtain pClbPpep (ClbP catalytic domain). In other experiments, ClbP-6His was subcloned from pMB702 (3) into the pBRSK vector (8) to obtain pOB901. The constructs were checked by double-stranded DNA sequencing (Genome Express). Plasmids were subsequently electroporated into competent E. coli strains: p33ClbP and pOB901 into the DH10B/pBACpksΔclbP strain for HeLa cellular challenges, and pClbPpep into the BL21(DE3) strain for overexpression of ClbPpep and its derivative (original strains are listed in supplemental Table 1).
Site-directed Mutagenesis-The site-directed mutagenesis experiments were performed using the QuikChange™ site-directed mutagenesis kit according to the manufacturer's instructions (Stratagene). The substitutions E159K, S188A, H257A, G328S, F316R/G328S, N331A, and C337A were individually introduced into ClbP-RGS6His cloned into the p33ClbP plasmid, and the substitution Y186E was introduced into ClbPpep cloned into the pClbPpep plasmid. In other experiments, ClbP-6His S95A, K98T, and Y186G mutants were generated by "reverse PCR" using pOB901 as template. Reverse PCR products were purified, ligated, and transformed into DH5α. Successful mutations were confirmed by double-stranded DNA sequencing, and the final constructs were used to transform E. coli DH10B/pBACpksΔclbP for p33clbP and pOB901 derivative vectors, or E. coli BL21(DE3) for the pClbPpep derivative vector. The mutagenic primers are listed in supplemental Table 4.
ClbPpep and ClbPpepY186E Production and Purification-The pClbPpep plasmid encodes the peptidase domain of ClbP without the first 102 nucleotides and the last 390 nucleotides of the clbP gene. These sequences encode the signal peptide and three putative anchor hydrophobic transmembrane helices that were removed to facilitate cytoplasmic expression and protein purification. E. coli BL21(DE3) transformed with the pClbPpep plasmid was cultured in Terrific broth medium supplemented with 50 μg/ml kanamycin, 0.4 M D-sorbitol, and 2.5 mM β-betaine. The cell culture was grown at 37°C with shaking to an A600 nm of ~0.7. Overexpression of the clbPpep-encoding gene was induced with 0.2 mM isopropyl β-D-thiogalactopyranoside overnight at 25°C with shaking. Cells were harvested by centrifugation (10,000 × g for 10 min at 4°C) and resuspended in 20 mM Tris-HCl buffer (pH 9.0) (4 ml/g of pellet). After addition of 5 μg of lysozyme (2 h at 4°C), cells were disrupted by ultrasonic treatment (four times for 30 s, each time at 20 watts). The extract was clarified by centrifugation at 14,000 × g for 60 min at 4°C. After addition of 2 mg of DNase I (15 min at room temperature; Roche), the supernatant (10,000 × g for 60 min at 4°C) was dialyzed overnight against 20 mM Tris-HCl buffer (pH 9.0) and filtered (0.22 μm filter; Millex-GS, Millipore, Carrigtwohill, Ireland). Purification was carried out by ion-exchange chromatography on a HiTrap™ Q Sepharose™ High Performance column (10 ml; Amersham Biosciences) equilibrated with 20 mM Tris-HCl (pH 7.0) and eluted with a linear NaCl gradient (0 to 500 mM). The ClbPpep-containing elution peak was then purified by gel filtration chromatography on a Superose™ 12 column (3.2 × 30 cm; Amersham Biosciences), which had been equilibrated and eluted with 100 mM NaCl and 5 mM phosphate buffer (pH 8.0).
The ClbPpep-containing elution peak was extensively dialyzed against 50 mM NaCl, 5 mM phosphate buffer (pH 8.0), and concentrated by ultrafiltration to 10 mg/ml for crystallization. The enzyme was >95% homogeneous as determined by Coomassie Blue staining after SDS-PAGE. The N-terminal sequencing of the isolated protein was achieved by sequential Edman reaction (Plate-forme de Protéomique Analytique et Fonctionnelle, Institut National de Recherche Agronomique, Nouzilly, France). Production and purification of ClbPpepY186E were performed in the same way using the plasmid pClbPpepY186E.
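The nucleotide deletions described for pClbPpep can be translated into residue boundaries with simple codon arithmetic. The sketch below is illustrative only: the function name `construct_residue_range` is hypothetical, the 504-residue length is the one predicted for ClbP, and counting the stop codon among the deleted 3' nucleotides is an assumption rather than something stated in the text.

```python
def construct_residue_range(protein_length, nt_removed_5p, nt_removed_3p,
                            count_stop_codon=True):
    """Return (first, last) 1-based residues retained in a truncated construct.

    Assumption: deletions are whole codons; if count_stop_codon is True, the
    stop codon is counted as part of the gene (and of the 3' deletion).
    """
    if nt_removed_5p % 3 or nt_removed_3p % 3:
        raise ValueError("deleted lengths must be multiples of 3 nucleotides")
    gene_codons = protein_length + (1 if count_stop_codon else 0)
    first = nt_removed_5p // 3 + 1              # first codon left after the 5' deletion
    last = gene_codons - nt_removed_3p // 3     # last codon left after the 3' deletion
    return first, min(last, protein_length)

# ClbP: 504 residues, first 102 nt and last 390 nt removed.
print(construct_residue_range(504, 102, 390))
```

Under these assumptions the retained segment ends at residue 375, which matches the end of the predicted periplasmic domain; without the stop codon the same arithmetic would end one residue earlier.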
Crystallization and Structure Determination-ClbP crystals were grown in hanging drops over a well solution of 0.8 M monosodium dipotassium phosphate buffer (pH 7.0). Crystals reached a size of 40 μm × 40 μm × 70 μm within 4 weeks. Before data collection, crystals were immersed for ~30 s in a cryoprotectant solution (20% sucrose, 1.0 M monosodium dipotassium phosphate, pH 7.0) and were flash-cooled in liquid nitrogen. Data were collected using a Q315r ADSC-CCD detector on ESRF beamline 14-4 at the European Synchrotron Radiation Facility (Grenoble, France). Reflections were indexed, integrated, and scaled using the CCP4 package (9). The initial model was obtained by molecular replacement with the program PHASER (10) and the x-ray structures (Protein Data Bank codes 1E15, 1RGY, and 1FR1) as search models. The structure was automatically and manually refined with the REFMAC5 and COOT programs, respectively (11,12). Cross-validation was used throughout, and 5% of the data were used for the Rfree calculation. The stereochemical quality of the models was monitored with the PROCHECK program (13). Ramachandran plots were calculated by RAMPAGE (14). Processing and crystallographic refinement statistics for the ClbPpep crystal structure are listed in supplemental Table 5.
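The cross-validation principle behind Rfree can be sketched in a few lines: a random 5% of reflections is set aside and never used in refinement, and the same residual is computed on the working and free sets. This is a conceptual illustration with hypothetical function names and toy amplitudes, not the procedure implemented in REFMAC5.

```python
import random

def r_factor(f_obs, f_calc):
    """Crystallographic residual: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Randomly assign a fraction of reflection indices to the free (test) set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = round(n_reflections * free_fraction)
    free = set(indices[:n_free])
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)

# Toy amplitudes (hypothetical values, for illustration only).
work, free = split_free_set(100)
print(len(work), len(free))
print(r_factor([10.0, 20.0], [9.0, 22.0]))
```

In practice the free set is chosen once, before refinement starts, and kept fixed so that Rfree remains an unbiased check on overfitting.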
Mass Spectrometry-ClbPpep (20 μM) was incubated at 20°C for 24 h in a 100 mM Tris, 75 mM NaCl buffer (pH 7.5) alone, or with 20 mM imipenem (Sigma), in a 1-ml final volume. The reaction mixtures were desalted by dialysis against a 5 mM ammonium bicarbonate buffer before analysis by MALDI-TOF mass spectrometry. The sample was spotted onto the stainless steel target (Applied Biosystems) and an equal volume of matrix solution (14 mg/ml sinapinic acid in 50% acetonitrile in water containing 0.1% trifluoroacetic acid) was deposited into the sample drop. MALDI-TOF mass spectra were acquired on a Voyager DE-PRO mass spectrometer (Applied Biosystems) in a linear positive mode, using an accelerating voltage of 25 kV, a grid voltage of 93%, and a pulse delay extraction of 700 ns, in the 15,000-50,000 m/z mass range. Calibration was performed in a close external mode using a protein calibration mixture (LaserBio Labs).
Western Blot Experiments-Protein samples (20-25 μg for each sample) were resolved by SDS-PAGE on 12.5% polyacrylamide gels, immobilized onto PVDF membranes, and analyzed by immunoblot using the ECL blotting system (Amersham Biosciences) and monoclonal anti-RGS6His or anti-5His antisera (Qiagen, France), according to the manufacturer's instructions.
Eukaryotic Cell Culture, Bacterial Infections, γH2AX Staining, and Cell Cycle Analysis-Cell assays were performed as described previously with minor modifications (3). For bacterial infections, overnight LB-ampicillin cultures of bacteria were diluted in interaction medium (DMEM, 5% FCS, 25 mM HEPES, 100 μg/ml ampicillin). For experiments with ClbP homologs, tetracycline was added to overnight cultures and interaction medium to induce their expression. Approximately 50% confluent HeLa cell cultures (ATCC CCL2) were then infected with a multiplicity of infection (number of bacteria per eukaryotic cell) of 100. Cells were washed 4 h after inoculation and incubated in DMEM 10% FCS, 200 μg/ml gentamicin until analysis. Cell morphology was examined by Giemsa staining at 72 h after infection. For γH2AX staining, cells were fixed at 4 or 20 h after infection in 4% formaldehyde and incubated with anti-phospho-H2AX (Ser-139) antibodies (JBW301, Upstate) followed by FITC- or rhodamine-conjugated secondary antibodies. DNA was stained with DAPI or TO-PRO-3 (Invitrogen) and images were acquired with a Zeiss LSM 510 Meta or Olympus IX70 laser scanning confocal microscope, and the confocal aperture was set to achieve a z optical thickness of ~0.5 μm. For cell cycle analysis, cellular suspensions were fixed and permeabilized with 70% ethanol at 48 h after infection and then treated with 50 μg/ml propidium iodide and 250 μg/ml RNase. DNA content data were acquired with an SLR II or FACSCalibur flow cytometer (Becton Dickinson) and analyzed with FlowJo software (Tree Star).
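The multiplicity of infection fixes the inoculum once the number of eukaryotic cells is known. The following sketch shows the arithmetic; the function names, the cell count per well, and the bacterial culture density are hypothetical values chosen for illustration and are not reported in this study.

```python
def bacteria_required(n_cells, moi):
    """Number of bacteria needed to reach a given multiplicity of infection."""
    return n_cells * moi

def inoculum_volume_ml(n_cells, moi, bacteria_per_ml):
    """Volume of bacterial suspension (ml) that delivers the required bacteria."""
    if bacteria_per_ml <= 0:
        raise ValueError("bacterial density must be positive")
    return bacteria_required(n_cells, moi) / bacteria_per_ml

# Hypothetical well: 2 x 10^5 HeLa cells, MOI of 100, culture at 10^9 bacteria/ml.
print(bacteria_required(200_000, 100))
print(inoculum_volume_ml(200_000, 100, 1e9))
```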
Bioinformatics-The search for ClbP homologous proteins and analysis of their topology were performed using BLAST (15) and the InterProScan server. Multiple sequence alignment of peptidase domains and phylogenetic tree construction were performed using COBALT and TOPALi software (version 2.5) (16,17). Analysis of the genetic context of ClbP- and FmtA-like-encoding genes (approximately 30 kb on both sides of the genes) was performed manually from the NCBI genome database. The analysis of PK and NRP megasynthase domains and their substrate specificity was performed using the SEARCH NRPS-PKS program.

RESULTS
In Silico Analysis of ClbP-The clbP open reading frame was predicted to encode a 504-amino acid periplasmic inner membrane-anchored protein comprising three parts: (i) a C-terminal domain of three transmembrane helices (residues 390-412, 433-455, and 465-485), (ii) an N-terminal signal sequence (ALA-30-QE-31), and (iii) a large periplasmic domain (positions 31-375). According to primary structure comparisons, the last domain exhibits the two conserved motifs (⁹⁵SMS⁹⁸K versus SxxK and ¹⁸⁶YAS versus YxN) that characterize the catalytic center of the MEROPS S12 enzyme family (18) and is therefore designated hereafter as ClbPpep. The S12 family is a heterogeneous group of active site serine enzymes, of which the substrates are peptides or closely related compounds such as β-lactams. The activity and biological function of a few of these enzymes have been described, such as class C β-lactamases, R61 D-Ala-D-Ala carboxypeptidase B, DmpB aminopeptidase, Pab87 peptidase, D-amino acid amidase, and EstB esterase (19-24). However, the activity and the function of numerous enzymes of the MEROPS S12 family remain unassigned, notably the FmtA-like proteins (25), which have been previously reported to be related to ClbP (3).
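As a minimal sketch of how such consensus motifs can be located in a primary sequence, the snippet below scans a sequence for the SxxK and YxN patterns with regular expressions. The function name `find_motif` and the toy sequence are hypothetical and serve only as illustration; they are not the ClbP sequence or the tools used in this study.

```python
import re

def find_motif(sequence, pattern):
    """Return (1-based position, matched text) for every occurrence of a motif.

    `pattern` is a regular expression in which '.' stands for any residue (x).
    A lookahead is used so that overlapping occurrences are also reported.
    """
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

# Toy sequence (hypothetical, for illustration only).
toy = "AASMSKGGYANLL"
print(find_motif(toy, "S..K"))    # SxxK consensus
print(find_motif(toy, "Y.[NS]"))  # YxN consensus, also allowing a YxS variant
```

Allowing Ser at the third position of the second pattern reflects the YAS variant described for ClbP; a strict YxN search would miss it.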
Structure Determination of ClbP Peptidase Domain-To investigate the function and activity of ClbP, the structure of the ClbPpep domain was determined by x-ray diffraction. The structure was refined against diffraction data extending to 2.4 Å resolution. The electron density map (supplemental Fig. 1) showed well defined density throughout most of the structure, with the exception of the six N-terminal residues. Diffuse density indicated disorder in the loop harboring residues 241-249. The stereochemical parameters of the model were satisfactory (supplemental Table 5); a Ramachandran plot showed no residues in disallowed regions of φ/ψ space. The final model included three ClbPpep monomers and 277 water molecules.
The three-dimensional structure of ClbPpep consists of two structural regions, all-α and α/β (Fig. 1A). The α/β region is formed by residues 41-112 and 206-375, and the domain folds as a seven-stranded anti-parallel β-sheet (b1, b2, b9, b10, b11, b12, and b13) with six α-helices and three β-strands (b3, b4, and b7) packed on both faces of the sheet. The all-helical region (residues 113-205) contains four helices and loops. The conserved motifs ⁹⁵SMSK and ¹⁸⁶YAS, which correspond to the S12 enzyme active site, are located in a large groove between the two structural domains, near the N terminus of the first helix of the all-α domain.
As expected, the most pronounced homology using the DALI program was obtained with the structure of S12 family members (supplemental Table 6), especially E. coli AmpC β-lactamase and Pab87 peptidase (26). Although their overall fold was clearly similar, there are significant differences affecting the accessibility of the active site from the surrounding solvent. In comparison with AmpC β-lactamases, the upper part of the catalytic pocket was largely open, because of the absence of two helices (h9 and h10) (Fig. 1, A and E), as observed in Pab87 peptidase and R61 carboxypeptidase (21,27). Moreover, the remaining structural element of ClbP in this area, the loop between residues 295-315, b9, and b10 strands (residues 310 to 311 and 316-318, respectively), moved away from the catalytic pocket, resulting in a widening of the catalytic groove, which was unusually large compared with that of known S12 enzyme structures. In the bottom part of the catalytic pocket, ClbPpep and Pab87 peptidases harbored the additional helix h4′ (residues 166-170) in contrast with AmpC enzymes (Fig. 1, A and E). This helix slightly obstructed accessibility to the ClbPpep catalytic pocket. However, the entrance of the binding site was more open in ClbPpep than in S12 peptidases, which harbor one, two, or several bulky helices in this area. Of note, between Cys-337 and Cys-367, ClbP harbored a disulfide bond, which links the b12 strand (residues 335-341) with the C-terminal helix h11. To our knowledge, this disulfide bond is not observed in other known structures of S12 enzymes. Overall, the ClbPpep structure was closely similar to that of S12 peptidases. However, ClbPpep had noticeable conformational differences in the active site, especially in the two extremities of the catalytic groove, which was unusually large (Fig. 1, B-D). This atypical shape suggests an adaptation of the ClbP catalytic pocket to a specific substrate.
OCTOBER 14, 2011 • VOLUME 286 • NUMBER 41

Structure and Functionality of ClbP Peptidase
Catalytic Residues and Implications for Catalysis-Striking structural similarities were observed between the active site of ClbPpep, class C β-lactamases, and other S12 enzymes such as Pab87 peptidase (21,28,29), in particular, for the motif ⁹⁵SxxK and the Tyr-186 residue. The positioning of this last residue and the presence of His-327 make the ClbPpep active site very similar to that of Pab87 peptidase (Fig. 2, A and B). In addition, the location and geometry of the residues Ser-95, Lys-98, and Tyr-186 suggest a direct role in the catalytic mechanism mediated by ClbP (31,32).
To confirm the importance of these residues in ClbP, substitutions S95A, K98T, and Y186G were introduced by site-directed mutagenesis into His-tagged ClbP. The resulting mutants were assessed for their ability to restore the cytopathic activity of the pks-positive E. coli ΔclbP isogenic mutant. Infected HeLa cells were analyzed for megalocytosis, G2/M cell cycle arrest, and phosphorylation of histone H2AX (3). In contrast with the wild-type ClbP protein, none of these mutants was able to restore the cytopathic activity of the pks-positive E. coli ΔclbP isogenic mutant (Fig. 2, C and D). These results confirm the identification of the active site and the importance of the ClbP main domain for the functionality of the pks island. They also suggest that these enzymes and other S12 serine active-site enzymes share a catalytic mechanism.
Residues Surrounding Catalytic Residues and Substrate Binding-The motif YxN, which is observed in most S12 enzymes, was replaced by the ¹⁸⁶YAS motif in ClbP (Fig. 2B). Ser-188 in ClbP was hydrogen bonded to the Lys-98 residue of motif SxxK (Fig. 2A). In class C β-lactamases, the structural analog of Ser-188 is Asn-152, which interacts with both the substrates and Lys of the motif SxxK (19). His-257 of ClbP, which points at the vicinity of Ser-188, may also play a role in substrate binding. Opposite the ¹⁸⁶YAS motif, the catalytic pocket of ClbP exhibits the HGG motif. The alignment of ClbP with class C β-lactamases revealed that the HGG motif is a structural analog of the KTG box (Fig. 2B), which is involved in substrate binding because of Lys and Thr (29). In β-lactamases, Lys can be replaced by the closely related residues His or Arg, and Thr may be replaced by Ser. A positively charged side chain followed by one bearing a hydroxyl group has therefore been suggested to be universally conserved. The second residue in the ClbP motif is an atypical residue Gly-328 instead of the canonical residues Ser or Thr, which are usually directly involved in substrate recognition via a hydrogen bond (29). However, in the vicinity of Gly-328 (Figs. 1E and 2A), positions 330 and 331 of ClbP harbor the unusual residues Gln and Asn, respectively, which point toward the catalytic pocket and may establish hydrogen bonds with substrates.
The residues Glu-159 and Phe-316 of ClbPpep are structural analogs of positions involved in the binding of the N-terminal and C-terminal parts of peptides in S12 amino- and carboxypeptidases, respectively. These positions determine the carboxypeptidase or aminopeptidase activity of S12 peptidases (30). Previous studies suggest that the residues Glu-159 and Phe-316 of ClbP (Fig. 2A) should favor aminopeptidase activity because the negatively charged residue Glu-159 may accommodate the positively charged N-terminal part of peptides, and the absence of a positively charged residue in position 316 may alter the accommodation of the negatively charged C-terminal part of peptides.
Overall, the six positions 159, 188, 257, 316, 328, and 331 can therefore participate in substrate binding in ClbP because the side chains of these residues pointed toward the catalytic pocket (Fig. 2A). The hydrogen bond donors Ser-188 and His-257 of ClbP-His were replaced by Ala, which is not able to accept or donate hydrogen bonds. The substitutions E159K and F316R were introduced into ClbP-His to modify the charges in the critical areas of the catalytic pocket. We introduced the substitution G328S to check whether a small residue is required in the second position of the motif HGG to accommodate the ClbP substrate. The substitution C337A was also introduced into ClbP-His to check the importance of the disulfide bond Cys-337-Cys-367 for stability or folding. No substitution altered the trans-complementing activity of these ClbP variants in the pks-positive E. coli ΔclbP isogenic mutant. ClbP activity may have been only weakly modified by these substitutions. Unfortunately, efficient biochemical tests are not yet available to monitor ClbP activity. However, cellular tests are a good representation of the physiological context, and the results are evidence of substrate tolerance of ClbP.
ClbP Is a Peptidase and NRP-recognizing Protein-As ClbP presents an S12 enzyme scaffold and catalytic residue positioning close to that of Pab87 peptidase, peptidase activity was investigated using the reporters Gly-, D-Ala-, L-Ala-, and L-Ala-L-Ala-p-nitroanilides. The catalytic efficiencies kcat/Km of ClbPpep against these substrates were 0.005, 0.018, 0.020, and 0.040 M⁻¹ min⁻¹, respectively. This peptidase activity is weak in comparison with that of enzymes genuinely adapted to these substrates, such as S12 aminopeptidases (30). However, a ClbPpep mutant harboring the substitution Y186E showed no activity, which confirms the involvement of the ClbP binding site in peptidase activity.
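The four catalytic efficiencies can be compared directly by normalizing them to the best substrate. The sketch below uses only the values reported above; the dictionary keys and the helper name `relative_to_best` are illustrative labels, not nomenclature from the study.

```python
# Catalytic efficiencies (kcat/Km) as reported in the text, in the reported units.
efficiencies = {
    "Gly-pNA": 0.005,
    "D-Ala-pNA": 0.018,
    "L-Ala-pNA": 0.020,
    "L-Ala-L-Ala-pNA": 0.040,
}

def relative_to_best(values):
    """Express each efficiency as a fraction of the highest one."""
    best = max(values.values())
    return {name: value / best for name, value in values.items()}

for substrate, fraction in relative_to_best(efficiencies).items():
    print(f"{substrate}: {fraction:.3f} of the best substrate")
```

On these numbers the dipeptide reporter is the preferred substrate, with an efficiency about eightfold that of the Gly reporter, although all four values remain low in absolute terms.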
Imipenem is an NRP-type compound belonging to the chemical group of β-lactams, which efficiently bind members of the S12 enzyme family, such as carboxypeptidase and AmpC β-lactamases (33,34). ClbPpep incubated with imipenem was analyzed by mass spectrometry (supplemental Fig. 2). Imipenem induced a mass shift consistent with the formation of an imipenem acyl-enzyme complex of ClbPpep, as observed for β-lactam-binding proteins (35). Despite a high concentration of imipenem (ratio imipenem/ClbPpep > 1/1000), part of ClbPpep exhibited a mass corresponding to the native protein. The results show that ClbP is able to bind NRP-type compounds, has a moderate β-lactam-recognizing activity, and is devoid of significant β-lactamase activity. The kcat value (<0.001 M⁻¹ s⁻¹) was ~1 million-fold lower than typical values of AmpC enzymes (36). Consequently ClbP, in contrast to AmpC, was not able to increase the resistance level of E. coli to β-lactam antibiotics.
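The size of the mass shift expected for an acyl-enzyme complex can be estimated from the ligand formula. The sketch below rests on two assumptions not stated in the text: that imipenem has the molecular formula C12H17N3O4S, and that acylation by β-lactam ring opening adds the entire ligand mass to the protein with no leaving group. The native protein mass used in the example is a placeholder, not a measured value.

```python
# Standard average atomic weights (rounded).
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

# Assumed molecular formula of imipenem (C12H17N3O4S).
IMIPENEM = {"C": 12, "H": 17, "N": 3, "O": 4, "S": 1}

def average_mass(formula):
    """Average molecular mass (Da) from an element-count dictionary."""
    return sum(ATOMIC_WEIGHTS[element] * count for element, count in formula.items())

def expected_acyl_enzyme_mass(native_mass, ligand_formula):
    """Expected mass if the whole ligand is covalently added to the protein."""
    return native_mass + average_mass(ligand_formula)

native = 38_000.0  # placeholder protein mass in Da, for illustration only
print(round(average_mass(IMIPENEM), 2))
print(round(expected_acyl_enzyme_mass(native, IMIPENEM), 2))
```

A shift of roughly 300 Da is small relative to a protein of this size, which is one reason linear-mode MALDI-TOF spectra of such adducts are usually interpreted alongside a native control.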
Identification of ClbP Homologs across Prokaryotic Lineages-We therefore screened public protein and genomic databases to identify proteins homologous to the ClbP peptidase domain. Seventy-seven homologs were obtained, including the protein ZmaM, which may be involved in the synthesis of the PK-NRP antibiotic zwittermicin A (ZmA) (37). These proteins only shared 29.3% (ZP_04059476.1) to 42.7% (ZP_04215168.1) identity and all belonged to the MEROPS S12 enzyme family. The amino acid sequences and the predicted topology of both ClbP and the homologs were compared with those of representative members of the S12 family that have been assigned to a biological function and/or a biochemical activity. The resulting phylogenetic tree and the topologies revealed that the proteins clustered in three major branches supported by high bootstrap values (≥64%) (supplemental Fig. 3). One branch harbored the known representative members of the S12 family, which were subclustered according to their biochemical activity (β-lactamases, peptidases, esterases, and amidases). The second harbored the FmtA-like proteins (n = 17) of the Staphylococcus species (25) that were phylogenetically closer to ClbP than the other members of the S12 family and that exhibited a predicted topology identical to that of ClbP in contrast with the other S12 family members. The third branch exhibited the longest phylogenetic distance, which clearly defined a subgroup containing ClbP and its most closely related proteins (n = 60). These proteins, designated ClbP-like, harbored a predicted topology identical to that of ClbP. Some (n = 14), such as ZmaM, possessed in addition a putative half-size ABC-type exporter domain in the C terminus. Except for ClbP and its closest homolog (ClbP-like of H. chejuensis), which are produced by different γ-proteobacteria, all of the other ClbP-like proteins were identified in taxa of the Firmicutes phylum (Bacillus and Clostridium genera), which, together with the Actinobacteria phylum, is the main bacterial source of PK, NRP, and hybrid PK-NRP compounds (38).
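Percent identity values such as those quoted above are derived from pairwise alignments. The sketch below shows one common convention, counting identical residues over columns where both sequences carry a residue; the function name and toy sequences are hypothetical, and alignment programs may use other denominators, so the convention behind the reported figures is not asserted here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared

# Toy aligned pair (hypothetical, for illustration only).
print(percent_identity("ACDE-", "ACDFG"))
```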
ClbP-like Proteins Display Functional Promiscuity-No clues for the presence of megasynthase catalytic unit-encoding genes were revealed in the vicinity of any fmtA-like and 17 clbP-like genes. In contrast, putative NRPS- and PKS-encoding genes were observed in the vicinity of 18 clbP-like genes (supplemental Fig. 4). Moreover, putative NRPS-encoding genes were present in all of these genomic islands, unlike PKS-encoding genes, suggesting that the cognate products contain a peptide moiety that may be related to the peptidase activity of ClbP-like proteins. To test whether these enzymes are functionally equivalent, we analyzed the ability of six representative ClbP-like proteins (originating from different phyla, different genera, and encoded by genes inserted or not in NRP biosynthesis clusters) and one FmtA-like protein to xeno-complement the cytopathic activity of the pks-positive E. coli ΔclbP isogenic mutant. Megalocytosis, G2/M cell cycle arrest, and histone H2AX phosphorylation were restored by all ClbP-like proteins, whether their gene is located in an NRPS-encoding genomic island (n = 4) or not (n = 2) (Fig. 3). In contrast, the FmtA-like protein did not restore cytopathic activity despite its full expression (Fig. 3). The efficient xeno-complementations thus support the functional promiscuity of ClbP-like proteins. Moreover, these results are evidence of substrate tolerance of NRP-associated peptidases and confirm data obtained in experiments that have investigated residues of the ClbP catalytic pocket potentially involved in substrate binding.

DISCUSSION
The aim of this study was to identify the structure and function of the ClbP protein, which is essential to the genotoxic activity of the pks island. The crystal structure of the ClbP enzymatic domain together with mutagenesis experiments and biochemical assays revealed the same α/β fold observed in the members of the MEROPS S12 family and a serine-reactive active site conferring NRP-binding and peptidase activities necessary to colibactin bioactivity. ClbP presented an unusually wide and deep catalytic groove and an active site with original structural features. We extended investigations to ClbP-like proteins, which are the closest related proteins of ClbP, and are clustered on a distinct phylogenetic branch in the MEROPS S12 enzyme family. ClbP and some ClbP-like-encoding genes belong to NRPS-encoding gene clusters. The ClbP-like proteins displayed functional promiscuity and shared a particular cellular localization and topology according to in silico analyses, namely a large N-terminal extracytoplasmic peptidase domain anchored to the inner membrane by three C-terminal transmembrane helices. Hence, ClbP and the closest related ClbP-like proteins form a functional peptidase subgroup of S12 serine enzymes, which is probably involved in NRP or PK-NRP biosynthesis. We propose to designate this subgroup the "NRP-associated peptidases."
Bioinformatics and biochemical analyses of the ZmA biosynthesis enzymes performed by Kevany et al. (37) suggest that the compound is initially biosynthesized as part of a large metabolite that is processed by a peptidase, resulting in the formation of ZmA. The only candidate enzyme encoded by the ZmA biosynthesis gene cluster to catalyze such a peptidase activity is ZmaM, a ClbP-like protein. This observation and our results suggest that NRP-associated peptidases can process NRP or PK-NRP compounds.
ClbP encoded by the pks island and the other NRP-associated peptidases may process a procolibactin and other NRP compounds into biologically active molecules by cleaving a nonribosomal peptide bond. The ClbP-like peptidase domain of ZmaM has been suggested to cleave a "pre-ZmA" metabolite between the D-Asn and D-Ser, releasing ZmA and a fatty acyl D-Asn (37). Interestingly, in silico analysis of NRPS substrate specificity showed that all gene clusters producing NRP-associated peptidases encode NRPS able to insert the closely related residues Asn and/or Gln into putative NRP compounds, suggesting that these residues may be involved in substrate recognition by NRP-associated peptidases. Of note, ClbP-like proteins were the only proteins encoded by these islands that harbored a predicted translocation signal sequence and a predicted periplasmic catalytic domain, in contrast to other enzymes involved in biosynthesis of PK, NRP, and mixed compounds that are cytoplasmic proteins. Accordingly, NRP-associated peptidases may act outside of the cytoplasm.
Several NRP-associated peptidases harbor in the C terminus a typical half-size ABC exporter domain such as ZmaM (supplemental Figs. 3 and 4). These ABC export systems are involved in the extrusion of noxious substances and the export of extracellular toxins (39). It has already been suggested that these ABC exporters transport NRP compounds (39). ClbP and other NRP-associated peptidases did not harbor such an export domain, but their cognate gene clusters encode a Multidrug and Toxic Compound Extrusion transporter (such as ClbM in the pks island) or other efflux pumps. After the translocation of an inactive NRP compound from cytoplasm to periplasm via efflux pumps, the periplasmic location of the NRP-associated peptidase domain may allow the maturation of the fully active toxin away from a potential cytoplasmic target of its producer cell. NRP-associated peptidases and efflux pumps can be seen therefore as a two-component resistance mechanism against the possible toxic activity generated by the cognate NRPS-encoding gene cluster.
NRP-associated peptidases genetically linked to NRPS gene clusters display a functional promiscuity with those genetically unlinked to NRPS gene clusters. These latter peptidases may also participate in the synthesis of a bioactive NRP compound because we found one to three putative NRP biosynthetic clusters in 15 of 16 genomes possessing a NRP-associated peptidase genetically unlinked to a NRPS gene cluster. It has already been suggested that enzymes required for NRP and PK biosynthesis such as phosphopantetheinyl transferases are encoded in a distinct region of the main biosynthetic gene cluster in bacterial genomes (40).
In conclusion, ClbP is a prototype of a new subfamily of extracytoplasmic serine-reactive peptidases probably acting as maturating enzymes for NRP compounds. The essential role of ClbP in the production of the biological effect makes this protein a key target for the control of the bioactivity of the PK-NRP-producing pks gene cluster, which may affect commensalism and/or pathogenicity and constitute a predisposing factor for the development of colorectal cancer (7). The engineering of NRP compounds should take into account these new catalytic units, which could be interesting tools for modifying the structures and bioactivity of NRP and PK-NRP compounds and be a target for identifying bioactive molecules.