The Two-step Biosynthesis of Cyclic Peptides from Linear Precursors in a Member of the Plant Family Caryophyllaceae Involves Cyclization by a Serine Protease-like Enzyme*

Background: In the Caryophyllaceae, cyclic peptides (CP) are biosynthesized from linear precursors via an unknown pathway.
Results: Two protease-like enzymes are involved in precursor processing.
Conclusion: A serine protease-like enzyme was recruited for the cyclization step in CP biosynthesis.
Significance: This represents a very significant advance in our understanding of the mode and evolution of CP biosynthesis in plants.
Caryophyllaceae-type cyclic peptides (CPs) of 5–12 proteinogenic amino acids occur in 10 plant families. In Saponaria vaccaria (Caryophyllaceae), they have been shown to be formed from linear peptide precursors derived from ribosomal translation. There is also evidence for such precursors in other members of the Caryophyllaceae, Rutaceae, and Linaceae families. The biosynthesis of CP in the developing seeds of S. vaccaria was investigated with respect to the enzymes involved in precursor processing. Through biochemical assays with seed extracts and synthetic peptides, an enzyme named oligopeptidase 1 (OLP1) was found that catalyzes the cleavage of intermediates at the N terminus of the incipient CP. A second enzyme, peptide cyclase 1 (PCY1), which was separated chromatographically from OLP1, was found to act on the product of OLP1, giving rise to a cyclic peptide and concomitant removal of a C-terminal flanking sequence. PCY1 was partially purified, and using the methods of proteomics, a full-length cDNA clone encoding an enzyme matching the properties of PCY1 was obtained. The substrate specificity of purified recombinant PCY1, believed to be the first cloned plant enzyme whose function is peptide cyclization, was tested with synthetic peptides. The results are discussed in the light of CP biosynthetic systems of other organisms.

Recently, the name orbitide has been suggested for the Caryophyllaceae-type CP (9). Many CP are known to have interesting biological activities (4,10). For example, segetalin F has been reported to have vasorelaxant activity (11). CP can be formed with a variety of chemical linkages beyond the peptide bond. In plants, however, CP tend to be homodetic, in the sense that at least one ring of amino acids involves peptide bond linkages exclusively. There is also a tendency in plants for CP to include only proteinogenic amino acids, or simple derivatives of them. The four major classes of CP in plants include the Caryophyllaceae (orbitide), kalata, PawS, and knottin types (9). These differ in typical size and degree of disulfide bridging, for example.
Given the physiological roles of CP and the technological and pharmaceutical interest in them (12-14), understanding the natural and engineered processes for CP production is currently an active area of research. For many CP in eubacteria and fungi, biosynthesis involves the linkage and modification of amino acids and other compounds with the help of nonribosomal peptide synthetases (15,16). More recently, it has become clear that in cyanobacteria, fungi, and especially in plants, CP biosynthesis can occur via the processing of ribosomally produced precursors (17-26). Indeed, for plant CP, ribosome-derived precursors are involved in essentially all cases for which there is sufficient evidence. There are also examples from cyanobacteria and fungi in which CP are formed from ribosome-derived precursors (24,26).
The Caryophyllaceae-type CP occur in members of the Caryophyllaceae, Linaceae, Rutaceae, and seven other plant families. They typically consist of 5-12 proteinogenic amino acids in a single ring formed from peptide bonds. In some cases, nonproteinogenic amino acids are included in the structure. Within the Caryophyllaceae, Saponaria vaccaria (syn. Vaccaria hispanica) has been investigated, and eight CPs called segetalins (A to H) have been isolated (10,27). More recently, evidence for two additional segetalins (J and K) has been reported (17).
It appears that these segetalins are biosynthesized via precursors of ~35 amino acids that have conserved N- and C-terminal flanking regions and a variable region representing the sequence of the mature segetalin (see Fig. 1) (17). Expression of cDNAs encoding the presegetalins in transformed roots of S. vaccaria gives rise to CP. In this paper, we describe efforts to understand the process by which precursors of Caryophyllaceae-type CPs, such as presegetalins, are processed to CP. In an important advance in our understanding of CP biosynthesis, a combination of partial enzyme purification, mass spectrometry, and bioinformatics has led to the elucidation of a biosynthetic process involving two peptidase-like enzymes and the molecular cloning and preliminary characterization of the enzyme that catalyzes the cyclization step.

EXPERIMENTAL PROCEDURES
Chemicals-Presegetalin A1 (purity ≥ 75%; see Table 1) and presegetalin A1 [14,32] (purity > 75%) were chemically synthesized at the Sheldon Biotechnology Centre of McGill University (Montreal, Canada). The presegetalin A1 was further purified by HPLC fractionation with a C18 column using a water/acetonitrile gradient (with trifluoroacetic acid as a modifier). All other linear peptides (purity, >90%; see Table 1) were chemically synthesized at Bio Basic Canada Inc. (Markham, Canada). Segetalin A isolated from S. vaccaria seed by the method of Morita et al. (28) was obtained from John Balsevich (National Research Council of Canada, Saskatoon, Canada).
Plant Material and RNA Isolation-Seeds of S. vaccaria White Beauty were obtained from CN Seeds Ltd. (Pymoor, UK). The plants were grown under greenhouse conditions with a daily regime of 12 h of light and 12 h of dark at 23°C. S. vaccaria developing seeds at stages 1 and 2 were harvested according to the following scheme: stage 1, seed white, pod green; stage 2, seed tan, pod green; stage 3, seed copper, pod partially desiccated; and stage 4, seed dark brown, pod desiccated.
For total RNA isolation from S. vaccaria developing seeds, the protocol of Gambino et al. (29) was modified. For the rapid cetyltriethylammonium bromide-based procedure, 0.6 ml of extraction buffer containing 2% cetyltriethylammonium bromide, 2.5% polyvinylpyrrolidone (Mr = 40,000), 2 M NaCl, 100 mM Tris-HCl, pH 8.0, 25 mM EDTA, and 2% β-mercaptoethanol (added just before use) was heated to 65°C in a microcentrifuge tube. 150 mg of developing seeds were ground in liquid nitrogen and added to the extraction buffer, and the tube was incubated at 65°C for 10 min. The sample was extracted two times with chloroform/isoamyl alcohol (24:1, v/v), and 0.25 volumes of 3 M LiCl was added. The mixture was kept on ice for 30 min and centrifuged at 20,000 × g for 20 min at 4°C. The pellet was dissolved in 0.5 ml of SSTE buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA, 1% SDS, 1 M NaCl) and extracted with 0.5 ml of chloroform/isoamyl alcohol (24:1, v/v), followed by precipitation with isopropanol.
cDNA Sequencing-A collection of S. vaccaria developing seed ESTs based on Roche 454 sequencing technology was developed as follows. First strand cDNA was synthesized from S. vaccaria stage 1 developing seed total RNA using the Omniscript reverse transcription kit (Qiagen) according to the manufacturer's instructions. ESTs were generated from cDNA prepared from the isolated RNA using Roche Applied Science GS-FLX Titanium Technology at the McGill and Genome Quebec Innovation Centre (Montreal, Canada) according to the manufacturer's instructions. Within the MAGPIE software system (30), the sequences were assembled using MIRA (31), and contigs were annotated based on BLASTX searches of GenBank™.
Protein Assay-For enzyme assays and purification, the protein was measured using a modified micro BCA protein assay (Pierce). 10-μl samples were mixed with 100 μl of BCA working reagent and incubated at 60°C for 30 min, after which optical density at 562 nm was recorded.
Assay of Recombinant PCY1-Unless otherwise noted, the activity of recombinant PCY1 was determined in an optimized assay as follows. The assay contained 20 mM Tris (pH 8.5), 100 mM NaCl, 5 mM DTT, 0.2 mg BSA, and 15 μg/ml of presegetalin A1 [14,32] or other synthetic peptides. The reaction was initiated by the addition of 0.3 μg of recombinant PCY1, in a total reaction volume of 100 μl. The assay was incubated at 30°C for up to 1 h and stopped by the addition of 0.9 ml of methanol. The samples were centrifuged, and the supernatants were evaporated and resuspended in 50% (v/v) methanol in water. The samples were then analyzed by ion trap LC/MS (see below). Controls were performed either by omitting the enzyme or by stopping the reaction at 0 h. Where necessary, 0-h peak areas were subtracted from 1-h peak areas for the linear product.
Enzyme Purification-Eight grams of stage 2 developing seeds from S. vaccaria (var. White Beauty) were homogenized manually with a plastic pestle in 1.5-ml low protein-binding microcentrifuge tubes. The seeds were ground for 2 min in 20 aliquots of 500 μl of 20 mM Tris buffer (pH 8) on ice followed by centrifugation at 13,000 × g for 10 min. The supernatants were removed, additional 250-μl aliquots of buffer were added, and the grinding and centrifugation were repeated. The supernatant fractions were pooled to a total volume of 17 ml and filtered (0.2-μm, 25-mm cellulose acetate membrane; VWR International, Mississauga, Canada). The resulting crude filtrate was used for enzyme assays and chromatographic purification.
Three separate applications of 5 ml each of the crude filtrate (see above) were made to an anion exchange column (Mono Q 10/100; GE Healthcare) connected to an Agilent 1100 series HPLC equipped with an auto injector, diode array detector, and fraction collector cooled to 4°C. The column was held at 4°C and pre-equilibrated with 20 mM Tris-HCl (pH 8.0). The column was eluted with 180 ml of a linear gradient of NaCl (0-1 M) in 20 mM Tris-HCl (pH 8.0) at a flow rate of 3 ml/min. 6-ml fractions were collected, desalted with PD-10 columns (GE Healthcare), and then concentrated to ~200 μl in Amicon Ultra centrifugal filters (30-kDa cutoff; Millipore, Billerica, MA). These fractions were assayed for both OLP1 and PCY1 activities. Fractions containing PCY1 activity were combined in a loading buffer containing 3 M ammonium sulfate in 20 mM Tris-HCl (pH 8.0) and applied to a hydrophobic interaction perfusion chromatography column (PerSeptive POROS 20 HP2; Bio-Rad) pre-equilibrated with 3 M ammonium sulfate in 20 mM Tris-HCl (pH 8.0). The column was eluted with a 60-ml decreasing linear gradient of ammonium sulfate (3-0 M in 20 mM Tris-HCl, pH 8.0) at a flow rate of 4 ml/min. 4-ml fractions were collected, desalted with PD-10 columns, and then concentrated to ~200 μl in Amicon Ultra centrifugal filters and assayed for PCY1 activity. Fractions containing PCY1 activity were combined, concentrated to 100 μl, and applied to a Superose™ 6 10/300 gel filtration column (GE Healthcare) that had been pre-equilibrated with 20 mM Tris-HCl, 100 mM NaCl (pH 8.0). The proteins were eluted with 20 mM Tris-HCl, 100 mM NaCl (pH 8.0) over 30 ml at a flow rate of 0.2 ml/min. 1-ml fractions were collected and concentrated to ~200 μl with Amicon Ultracel-10K membrane centrifugal filter units (Millipore) and assayed for PCY1 activity.
Ion Trap LC/MS-For ion trap LC/MS analysis of enzyme assays, an Agilent 6320 ion trap LC/MS system was used under default Smart Parameter settings. The analyzer and ion optics were adjusted to achieve optimal resolution (Agilent installation guide G2440-90105) using the ESI Tuning Mix (Agilent installation guide G2431A). The mass spectrometer was scanned in the m/z range of 50-2200 at 8100 mass units/s with an expected peak width of ≤ 0.35 mass units. For automated MS/MS, the trap isolation width was 4 atomic mass units. The associated Agilent 1200 LC was fitted with a Zorbax 300 EXTEND-C18 column (150 × 2.1 mm, 3.5-μm particle size) maintained at 35°C. The binary solvent system consisted of 90:10 (v/v) water/acetonitrile containing 0.1% formic acid and 0.1% ammonium formate (solvent A) and 10:90 (v/v) water/acetonitrile containing 0.1% formic acid and 0.1% ammonium formate (solvent B). The separation gradient was 90:10 A/B to 50:50 A/B in 3 ml over 20 min. The detection of segetalin A in assay samples has been described previously (17).
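The separation gradient above can be restated numerically. The following is a minimal sketch, assuming a strictly linear change in solvent B from 10% to 50% over 20 min and ideal mixing of solvents A and B (no volume contraction); the function names are hypothetical and are not part of any instrument software.

```python
def percent_b(t_min, start_b=10.0, end_b=50.0, duration_min=20.0):
    """Percent solvent B at time t for a linear gradient, clamped to the gradient window."""
    t = min(max(t_min, 0.0), duration_min)
    return start_b + (end_b - start_b) * t / duration_min


def percent_acetonitrile(pct_b, acn_in_a=10.0, acn_in_b=90.0):
    """Overall % acetonitrile (v/v) in the mixed eluent, assuming ideal mixing.

    Solvent A is 90:10 water/acetonitrile and solvent B is 10:90 water/acetonitrile.
    """
    return acn_in_a * (100.0 - pct_b) / 100.0 + acn_in_b * pct_b / 100.0


# 3 ml of eluent delivered over the 20-min gradient implies this flow rate.
flow_ml_per_min = 3.0 / 20.0

if __name__ == "__main__":
    print(f"flow rate: {flow_ml_per_min:.2f} ml/min")
    for t in (0, 10, 20):
        b = percent_b(t)
        print(f"t = {t:>2} min: B = {b:.0f}%, acetonitrile = {percent_acetonitrile(b):.0f}%")
```

Under these assumptions the eluent runs from 18% to 50% acetonitrile, because solvent A itself already contains 10% acetonitrile.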
MALDI-TOF MS-MALDI-TOF MS of enzyme assays was performed on samples that were purified by adsorption onto and elution from C18 Empore high performance disk material (3M, Minneapolis, MN) using the "stage tip" method (32). Stage tips were prepared by removing the beveled tip from a 20-gauge syringe needle with a tubing cutter. Empore disk material was then cut, cookie cutter style, with this needle and packed into the narrow end of a 10-μl disposable pipette tip with a piece of fused silica tubing. Methanol (10 μl) was applied to the tip and expelled slowly with a 1.25-ml syringe. Aqueous TFA (0.1%) was then passed through the tip, followed by assay sample (20 μl). The disk material was washed with 20 μl of 0.1% TFA, and the peptides were then eluted with 20 μl of acetonitrile:aqueous 0.1% TFA (1:1). Analysis of the peptides was carried out using an AB Sciex 4800 MALDI TOF-TOF™ analyzer (Applied Biosystems, LLC., Frederick, MD). The mass spectrometer was operated in positive ion reflectron mode scanning m/z values from 500 to 4000. The default calibration was updated with a standard mixture of peptides containing des-Arg1-bradykinin (m/z 904.468), Glu1-fibrinopeptide B (m/z 1570.677), and three adrenocorticotropic hormone fragments corresponding to amino acids 1-17 (m/z 2093.087), 18-39 (m/z 2465.199), and 7-38 (m/z 3657.929). All of the samples and calibrants (0.5 μl) were mixed on the MALDI plate with the matrix α-cyano-4-hydroxycinnamic acid (0.5 μl). The data were collected and averaged from 800 laser irradiation events. Monoisotopic mass lists were generated with Data Explorer (Applied Biosystems) and copied into the BioLynx program in MassLynx 4.0 (Waters). Matches to subsequences of presegetalin A1 were investigated using the Find Mass program with an allowed mass deviation of 0.5 Da. Masses within 0.2 Da were considered to be matching.
Quadrupole Time of Flight LC/MS-Quadrupole time-of-flight (Q-TOF) LC/MS analysis of tryptic peptides was performed using a Q-TOF Global Ultima mass spectrometer (Micromass, Manchester, UK) equipped with a nano-electrospray (ESI) source and a nanoACQUITY UPLC solvent delivery system (Waters, Milford, MA). The mobile phase was a binary solvent system consisting of solvent C (0.2% formic acid and 3% acetonitrile) and solvent D (0.2% formic acid and 95% acetonitrile). The peptides were desalted with an in-line solid phase trap column (180 μm × 20 mm) packed with 5-μm resin (Symmetry C18; Waters) and separated on a capillary column (100 μm × 100 mm; Waters) packed with BEH130 C18 resin (1.7 μm; Waters) using a column temperature of 35°C. 5-μl samples were introduced into the trap column at a flow rate of 15 μl/min for 3 min, using C:D 99:1, and flow was diverted to waste. After desalting, the flow was routed through the trap column to the analytical column with a linear gradient of 1-10% solvent D (400 nl/min, 16 min), followed by a linear gradient of 10-45% solvent D (400 nl/min, 30 min). Unless otherwise stated, Q-TOF parameter settings consisted of a capillary voltage of 3,850 V, a cone voltage of 120 V, and a source temperature of 80°C.
The samples were analyzed using data-dependent acquisition, which consisted of the detection of multiply charged positive ions (z = 2-4) from an MS survey scan. The scan range was from m/z values of 400-1900, with a scan time of 1 s. Up to three MS/MS scans were triggered (collision energy ranged from 20 to 80 eV, depending on charge state and precursor m/z) from each MS scan event with a peak detection window of 4 m/z units (signal intensity threshold was 16 counts/s). In MS/MS experiments, the data were acquired in continuum mode with a scan time of 1.9 s, and dynamic exclusion of previously detected precursors was set at 2 min. Peptide signals corresponding to trypsin and keratin were also excluded from MS/MS data collection. To obtain high mass accuracy, the reference compound leucine enkephalin (80 nM in 1:1 acetonitrile:0.2% aqueous formic acid; Environmental Resource Associates, Arvada, CO; m/z = 556.2771) was continuously introduced to a second ESI source and used for the mass calibration.
The data were processed with ProteinLynx Global Server 2.4 (Waters) using RAW files from Q-TOF LC/MS. The resulting PKL files were submitted to MASCOT (version 2.3.02; Matrix Science Ltd., London, UK) for peptide searches against the NCBI nr database (version 011110) hosted by the National Research Council of Canada and local databases containing the sequence information from Sanger (33) and Roche 454 (this work) sequencing of S. vaccaria developing seed cDNA. MASCOT search parameters allowed for a maximum of 1 miscleavage for tryptic digestion and a mass tolerance for precursor peptide ions of ±50 ppm and for fragment ions up to ±0.2 Da. Carbamidomethylation of cysteine was selected as a fixed modification, and oxidation of methionine was used as a variable modification.
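The search constraints described above (tryptic cleavage with at most one missed cleavage, and a precursor tolerance expressed in ppm) can be illustrated with a short sketch. It is not the MASCOT implementation: it assumes the common rule that trypsin cleaves after Lys or Arg except when the next residue is Pro, and the function names are hypothetical. The presegetalin A1 sequence reported in this paper is used only as a convenient input.

```python
import re


def tryptic_peptides(seq, max_missed=1):
    """In silico digest: cleave after K or R unless followed by P (assumed rule),
    then join neighbouring pieces to allow up to max_missed missed cleavages."""
    # Zero-width split points (requires Python 3.7+); drop any empty trailing piece.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]
    peptides = []
    for i in range(len(pieces)):
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a symmetric tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


if __name__ == "__main__":
    presegetalin_a1 = "MSPILAHDVVKPQGVPVWAFQAKDVENASAPV"
    for peptide in tryptic_peptides(presegetalin_a1, max_missed=1):
        print(peptide)
    low, high = ppm_window(556.2771, 50)
    print(f"50 ppm window at m/z 556.2771: {low:.4f} to {high:.4f}")
```

In this input the Lys at position 11 is followed by Pro, so under the assumed rule the only cleavage site is after Lys 23; the ±50 ppm window at the leucine enkephalin reference mass is roughly ±0.028 m/z units.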
Isolation of a Full-length Pcy1 cDNA Clone from S. vaccaria-Fractions that catalyzed the formation of segetalin A from presegetalin A1 [14,32] from the various stages of chromatography were mixed 1:1 with SDS-PAGE Laemmli sample buffer (200 mM Tris-HCl, pH 6.8, 4% SDS, 0.2% bromphenol blue, 200 mM dithiothreitol, 40% glycerol) and heated at 99°C for 5 min. The samples were subjected to SDS-PAGE under denaturing conditions using electrophoresis buffer (25 mM Tris-HCl, pH 7.5, 250 mM glycine, 0.1% SDS) and a 10% Ready Gel precast polyacrylamide mini-gel (Bio-Rad). Precision Plus Protein™ molecular weight standards (Bio-Rad) were loaded on the same gel. The gel was stained with Oriole™ fluorescent gel stain (Bio-Rad) for 15 h. The protein bands were visualized by UV illumination, and the most prominent bands, corresponding to the last chromatographic step, were excised from the gel and digested with trypsin using the MassPrep II proteomics work station (Micromass UK Ltd.) following a procedure described previously (34). Tryptic peptides were analyzed by Q-TOF LC/MS, and the resulting data were used for MASCOT searches of sequence databases (see above).
A DNA plasmid clone containing the full-length open reading frame of contig c272 (from Roche 454 sequencing; NCBI accession SRX202186), identified by MASCOT searches as representing a peptide cyclase candidate cDNA, was obtained as follows. First strand cDNA was synthesized from S. vaccaria developing seed stage 1 total RNA using the Omniscript reverse transcription kit (Qiagen) according to the manufacturer's instructions. The contig c272 ORF was amplified by PCR (denaturation at 95°C for 4 min; 35 cycles of 95°C for 20 s, annealing at 54°C for 30 s, and extension at 72°C for 2.3 min; followed by 10 min at 72°C) using forward (ATG GCG ACT TCA GGA TTC TCG) and reverse (TCA GTC TAT CCA AGG AGC TTC AAG C) oligonucleotide primers and Platinum Taq DNA polymerase high fidelity (Invitrogen). The resulting PCR product was gel-purified and recombined with pCR8/GW/TOPO using a TA Cloning kit (Invitrogen). After transformation of ONE SHOT TOP 10 competent Escherichia coli cells (Invitrogen) using spectinomycin for selection, a plasmid clone (named pCB006) containing the contig c272 ORF was identified by colony PCR and DNA sequencing. The corresponding gene was named Pcy1. To allow E. coli expression of an N-terminal His6-tagged version of PCY1, the PCY1 ORF of pCB006 was recombined with the Gateway expression vector pDEST17 (Invitrogen) and used to transform competent E. coli BL21-AI™. The resulting plasmid and E. coli strain were called pCB008 and pCB008/BL21-AI, respectively.
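As a quick consistency check on the primer design described above, the sketch below confirms that the forward primer begins with a start codon and that the reverse primer, read on the sense strand, ends with a stop codon. It also estimates the amplicon length under the assumption (made here, not stated in the text) that the primers span exactly the ORF, using the 725-residue length reported for PCY1. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


# Primer sequences as given in the text, with spaces removed.
forward = "ATGGCGACTTCAGGATTCTCG"
reverse = "TCAGTCTATCCAAGGAGCTTCAAGC"

starts_with_atg = forward.startswith("ATG")
sense_3prime = reverse_complement(reverse)      # 3' end of the ORF on the sense strand
ends_with_stop = sense_3prime[-3:] in STOP_CODONS

# Assumption: amplicon = 725 codons plus one stop codon.
orf_bp = 725 * 3 + 3
# Polymerase rate needed to finish within the 2.3-min extension step.
required_bp_per_min = orf_bp / 2.3

if __name__ == "__main__":
    print("forward primer starts with ATG:", starts_with_atg)
    print("sense-strand 3' end:", sense_3prime)
    print("ends with stop codon:", ends_with_stop)
    print(f"expected amplicon: {orf_bp} bp ({required_bp_per_min:.0f} bp/min required)")
```

The 2.3-min extension time is therefore consistent with an amplicon of about 2.2 kb at roughly 1 kb/min.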
E. coli Expression and Purification of PCY1-An overnight 1-ml LB culture of pCB008/BL21-AI was used to inoculate 100 ml of overnight autoinduction medium (35) that was incubated at 37°C with shaking until an A600 of 0.4 was reached. Arabinose was then added to a concentration of 0.2% (w/v), and culture growth was continued at 16°C with agitation overnight. The cultures were centrifuged at 2,000 × g at 4°C for 10 min, and the resulting cell pellets were frozen at −20°C. The pellets were resuspended in 500 μl of chilled B-Per bacterial protein extraction reagent (Pierce) and then transferred to two 1.5-ml Eppendorf tubes for cell lysis at room temperature for 20 min. Lysis was promoted with ultrasonication. The lysates were then centrifuged (12,000 × g, 4°C, 8 min), and the supernatants were mixed with equal volumes of resin binding equilibration/wash buffer and added to 250 μl of HisPur™ cobalt resin (Pierce) for purification of PCY1 by batch adsorption and elution at 150 and 300 mM imidazole, respectively, according to the manufacturer's recommendations. Each eluate was concentrated to 150 μl and desalted by spin dialysis (Amicon Ultra-15 devices; Millipore) following the manufacturer's protocol and subsequently assayed for peptide cyclase activity. The purity of recombinant PCY1 was judged by SDS-PAGE to be ~90%.

Detection of Separate Oligopeptidase and Peptide Cyclase Activities Involved in Cyclic Peptide Biosynthesis-In previous work (17), it was shown that extracts of the developing seeds of S. vaccaria were capable of catalyzing the conversion of the 32-amino acid linear precursor presegetalin A1 to the six-membered CP, segetalin A (Fig. 1, overall reaction). In a further effort to understand the enzyme or enzymes involved in cyclic peptide formation in the Caryophyllaceae, S. vaccaria seed extracts were subjected to separation by ion exchange chromatography. The products of assays (see "Experimental Procedures," Fig. 2, and Ref. 17) using presegetalin A1 as a substrate were then analyzed by MALDI-TOF MS (see "Experimental Procedures" and Fig. 3). Assay of fractions eluting at ~0.35 M NaCl (Fig. 2) showed significant loss of substrate and formation of prominent MS peaks corresponding to peptide masses of 1434.8 and 1984.0, which in turn correspond to linear peptides with the sequences MSPILAHDVVKPQ and GVPVWAFQAKDVENASAPV, respectively (Fig. 3). Daughter ion analysis of the peaks was consistent with the above cleavage products (supplemental Fig. S1). This suggests that cleavage of the QG peptide bond in presegetalin A1 is an important reaction in the biosynthesis of segetalin A. The formation of MSPILAHDVVKPQ and GVPVWAFQAKDVENASAPV from presegetalin A1 was confirmed by ion trap LC/MS (Fig. 4, a and b). Thus, the data are consistent with a peptide with the sequence GVPVWAFQAKDVENASAPV (called presegetalin A1 [14,32]) being an intermediate in segetalin A biosynthesis. Consequently, the pathway from presegetalin A1 to segetalin A shown in Fig. 1 is hypothesized. Presegetalin A1 is suggested to be cleaved initially after position 13, giving rise to presegetalin A1 [1,13] and presegetalin A1 [14,32]. The latter is then processed, giving rise to segetalin A.
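The assignment of the two observed peaks to a single cleavage site can be sketched as a search over all possible single cleavages of presegetalin A1. This is an illustrative calculation, not the Find Mass program: it uses standard monoisotopic residue masses, the 0.2-Da matching criterion from "Experimental Procedures," and, because the text does not state whether the reported values are neutral or protonated masses, it checks both forms. That choice and all function names are assumptions of this example.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

PRESEGETALIN_A1 = "MSPILAHDVVKPQ" + "GVPVWAFQAKDVENASAPV"


def neutral_mass(peptide):
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def cleavage_matches(seq, observed, tol=0.2):
    """List (site, fragment, form, calculated mass) for every single-cleavage
    fragment whose neutral (M) or protonated ([M+H]+) mass is within tol of
    an observed value. 'site' is the number of residues before the cut."""
    hits = []
    for site in range(1, len(seq)):
        for fragment in (seq[:site], seq[site:]):
            m = neutral_mass(fragment)
            for form, calc in (("M", m), ("[M+H]+", m + PROTON)):
                if any(abs(calc - obs) <= tol for obs in observed):
                    hits.append((site, fragment, form, round(calc, 3)))
    return hits


if __name__ == "__main__":
    for hit in cleavage_matches(PRESEGETALIN_A1, [1434.8, 1984.0]):
        print(hit)
```

With these inputs, the only cleavage site that accounts for both observed values is the one after position 13, in line with the QG bond cleavage proposed above.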
Given the above results, fractions obtained from ion exchange chromatography were tested with synthetic presegetalin A1 [14,32], and indeed, specific fractions were found to catalyze the formation of segetalin A, as well as the linear form of segetalin A (Figs. 1, 2, and 4, c and d). This activity was distinguishable from that giving rise to the initial cleavage of presegetalin A1 to presegetalin A1 [14,32], being highest in fractions eluting at ~0.2 M NaCl (Fig. 2). This indicates that presegetalin A1 [14,32] is a true intermediate formed during CP processing and that at least two enzymes are required for CP processing: one for initial cleavage of presegetalin A1 to form the intermediate presegetalin A1 [14,32] (named oligopeptidase 1) and a second enzyme for further processing and cyclization of the intermediate to form segetalin A (named peptide cyclase 1; Fig. 1).
Partial Purification and Cloning of Peptide Cyclase 1 (PCY1)-With a view toward cloning and characterization of the peptide cyclase PCY1, further purification of ion exchange fractions was pursued using hydrophobic interaction and size exclusion chromatography. Fig. 5 shows the results of SDS-PAGE analysis of the active fractions from each stage of the purification. After the final purification step, bands from SDS-PAGE gels were excised and treated with trypsin. The resulting peptides were each analyzed by Q-TOF LC/MS to give peptide mass and fragment ion information. These data were used in MASCOT searches of the predicted tryptic peptides from two EST databases derived from S. vaccaria by Sanger sequencing (36) (GenBank™ accession numbers LIBEST_028081 and LIBEST_028082) and by Roche 454 sequencing (this work; NCBI Short Read Archive accession SRX202186). When the MS data from an excised gel band corresponding to a Mr of ~83,000 were used in searches, matches were found to two Sanger sequencing-derived ESTs (SVAR04NG UP 023 E08 and SVAR03NG UP 028 E03) and a set of contiguous cDNA sequences obtained from Roche 454 sequencing called contig c272 (from the SVASD1PC 454 EST collection). The search yielded 13 unique peptide matches with the predicted amino acid sequence of contig c272 (supplemental Table S1). Given the similarity of the predicted amino acid sequence of c272 to serine proteases (see below) and the well documented propensity for such proteases to catalyze transpeptidation and cyclization reactions (37-39), the corresponding gene product was investigated as a peptide cyclase. The gene corresponding to c272 was named Pcy1. A full-length cDNA clone was obtained for Pcy1 (see "Experimental Procedures"), which encodes a polypeptide of 725 amino acids with a predicted Mr of 82,400 (GenBank™ accession number KC588970).
Sequence Analysis and Homology Modeling of S. vaccaria PCY1-A BLASTP search of GenBank™ with the predicted amino acid sequence of Pcy1 revealed the greatest sequence identity with members of the esterase lipase superfamily (COG1505) (40). In particular, PCY1 shows the highest amino acid sequence identity to predicted gene products from Vitis vinifera (GenBank™ accession number CAN70125) and Populus trichocarpa (GenBank™ accession number XP_002306966; see supplemental Fig. S2 for alignment). Further sequence analysis strongly suggests placement of PCY1 within the S9A family of serine peptidases. This family includes porcine (Sus scrofa) muscle prolyl oligopeptidase (POP; Protein Data Bank code 1QFS; GenBank™ accession number AAA31110), which shows 49% sequence identity to S. vaccaria PCY1 and for which a crystal structure has been determined in the presence of a covalently bound inhibitor (41). The structure of porcine muscle POP was used as a template for the construction of a homology model of PCY1 by MODELLER (version 9.10; Fig. 6) (42). The structural alignment of PCY1 and porcine muscle POP showed a root mean square deviation of Cα positions of 0.26 Å. Given the excellent agreement of the model with the structure of porcine muscle POP, PCY1 appears to possess two domains homologous to those of porcine muscle POP: a catalytic α/β hydrolase domain and an unusual β-propeller domain. Also, the model is consistent with the identity of Ser562, Asp653, and His695 (PCY1 numbering) as members of a serine peptidase-like catalytic triad, which shows excellent three-dimensional alignment with that of porcine muscle POP (Fig. 6). The putative catalytic triad faces the β-propeller domain. Ser562 occurs within the sequence GGSNGG, which is conserved among α/β hydrolases and likely forms a "nucleophilic elbow" (43). Despite the alignment of catalytic amino acids, there is a notable single amino acid insertion/deletion near His695 (supplemental Fig. S2).
Recombinant PCY1 Shows Peptide Cyclase Activity-Given that serine peptidases have a well documented propensity to catalyze transpeptidation and cyclization reactions (37,38) and the similarity of PCY1 to oligopeptidases, recombinant PCY1 was investigated as a candidate for involvement in CP biosynthesis in S. vaccaria. For functional characterization of PCY1, the recombinant enzyme was purified from E. coli cells harboring the plasmid pCB008, which consists of the Pcy1 ORF fused with an N-terminal His6 tag. The activity of immobilized metal ion affinity chromatography-purified PCY1 was assayed with presegetalin A1 [14,32] using ion trap LC/MS. As with partially purified plant extracts, purified recombinant PCY1 catalyzed the formation of segetalin A and smaller amounts of linear segetalin A in the presence of presegetalin A1 [14,32] (Fig. 4, c-e). Evidence for the formation of the C-terminal fragment presegetalin A1 [20,32] was also found by ion trap LC/MS/MS (data not shown). The broad pH optimum of PCY1 was centered between 8.5 and 9.0 (supplemental Fig. S3). Optimal production of segetalin A was dependent on the presence of DTT (supplemental Fig. S4). On the other hand, the production of linear segetalin A was significant in the absence of DTT. The estimated turnover number with presegetalin A1 [14,32] is notably low at ~1 h⁻¹.
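To put a turnover number of ~1 h⁻¹ in the context of the standard assay, the molar amounts involved can be estimated as follows. This is a rough sketch under stated assumptions: the enzyme is treated as pure with the predicted Mr of 82,400 (the His6 tag and the ~90% purity are ignored), the substrate molar mass is taken as the 1984.0 value reported above, and peptide purity and counter-ions are ignored. The function names are hypothetical.

```python
def pmol(mass_ug, molar_mass_g_per_mol):
    """Convert a mass in micrograms to picomoles."""
    return mass_ug * 1e-6 / molar_mass_g_per_mol * 1e12


def turnover_per_h(product_pmol, enzyme_pmol, hours):
    """Turnover number: moles of product per mole of enzyme per hour."""
    return product_pmol / enzyme_pmol / hours


# Standard assay: 0.3 ug of PCY1 and 15 ug/ml substrate in a 100-ul reaction.
enzyme_pmol = pmol(0.3, 82_400)
substrate_pmol = pmol(15 * 0.1, 1984.0)

# Product expected after 1 h at a turnover number of 1 per hour.
product_pmol_1h = 1.0 * enzyme_pmol * 1.0
fraction_converted = product_pmol_1h / substrate_pmol

if __name__ == "__main__":
    print(f"enzyme:    {enzyme_pmol:.2f} pmol")
    print(f"substrate: {substrate_pmol:.0f} pmol")
    print(f"product after 1 h at 1 per h: {product_pmol_1h:.2f} pmol "
          f"({fraction_converted:.2%} of substrate)")
```

Under these assumptions the substrate is in roughly 200-fold molar excess over the enzyme, and a turnover number of ~1 h⁻¹ corresponds to conversion of well under 1% of the substrate in a 1-h assay.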
Substrate Specificity of PCY1-A variety of synthetic linear peptides were tested as substrates for PCY1. First, putative native peptide cyclase substrates from S. vaccaria and Dianthus caryophyllus (carnation) were tested (Table 1). Evidence for the expected CP formation was found for the putative A class precursors for segetalins B, D, G, H, and L; for the F class precursors of segetalins F and J; and for the putative D. caryophyllus precursor of cyclo(GPIPFYG). No evidence for CP production from the putative D. caryophyllus precursor GYKDCCVQAKDLENAAVPV (peptide 11) was found. Thus, PCY1 appears to be able to catalyze the production of a number of native S. vaccaria CP. On the other hand, the data also suggest important differences among peptide cyclases within the Caryophyllaceae. Interestingly, a P20Q substitution in presegetalin D1 [14,31] led to formation of an alternate truncated product, suggesting that position 20 may be important in determining the amino acid positions to be included in the CP product.
Variants of presegetalin A1 [14,32] differing in both the C-terminal region and that of the incipient cyclic peptide were tested as substrates with recombinant PCY1. Variants representing C-terminal deletions of 1, 2, 4, 8, 12 and 13 (linear segetalin A) amino acids did not give rise to detectable levels of segetalin A or linear segetalin A in the presence of PCY1 (supplemental Table S1). This suggests that the majority of the C-terminal amino acid sequence of presegetalin A1 [14,32] is important for segetalin A production. This was investigated further in what is essentially an alanine scanning study of the substrate specificity of PCY1 with respect to the C terminus (substitution with Val was made for two positions, which are Ala in the wild type substrate; supplemental Table S1 and Fig. 7). All except three of the variants show a very significant drop in production of segetalin A when assayed with PCY1. On the other hand, a conservative substitution of Val with Ala at position 25 resulted in a more than 2-fold enhancement of the production of both cyclic and linear segetalin A. The substitutions E26A and S29A both had relatively little effect on both cyclic and linear product formation. With these results in hand, deletion of the amino acids at positions 25 and 26 in presegetalin A1 [14,32] (peptide 18 in Table 1) was tested and found to allow the formation of segetalin A at a rate ~170-fold lower than for the wild type substrate.
Alanine scanning was also performed for the sequence of the incipient cyclic peptide in presegetalin A1 [14,32] (Ala 19 was substituted with Val). Given the low turnover number of PCY1 and the difficulties in obtaining multiple cyclic peptide standards, the ability of PCY1 to form linear and cyclic products was scored qualitatively, based on the presence of the relevant mass spectrometric signals (supplemental Table S2). In every case except the A19V substitution, it was possible to detect such signals, suggesting that such substitutions are tolerated, at least qualitatively, in the incipient CP sequence.
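As an illustration of the scan over the incipient CP sequence, the following minimal Python sketch enumerates the single-substitution variants and the [M+H]+ values at which linear and cyclic products would be expected. It is not the procedure used in this work. It assumes that the segetalin A sequence at positions 14-19 is GVPVWA (taken from the general literature on segetalin A, not stated in full here), uses standard monoisotopic residue masses, and introduces the function names purely for illustration.

```python
# Standard monoisotopic residue masses (Da) for the residues used here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "V": 99.06841,
    "P": 97.05276, "W": 186.07931,
}
WATER = 18.01056   # lost on head-to-tail cyclization
PROTON = 1.00728   # added for the singly protonated ion


def scan_variants(seq, first_pos):
    """Yield (label, variant): each residue -> Ala, and Ala -> Val."""
    for i, wild_type in enumerate(seq):
        sub = "V" if wild_type == "A" else "A"
        label = f"{wild_type}{first_pos + i}{sub}"
        yield label, seq[:i] + sub + seq[i + 1:]


def mh_linear(seq):
    """[M+H]+ of the linear peptide (residues plus one water)."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER + PROTON


def mh_cyclic(seq):
    """[M+H]+ of the head-to-tail cyclic peptide (one water less)."""
    return mh_linear(seq) - WATER


SEGETALIN_A = "GVPVWA"  # assumed sequence of positions 14-19
variants = dict(scan_variants(SEGETALIN_A, 14))

for label, variant in variants.items():
    print(f"{label}: {variant}  linear {mh_linear(variant):.3f}"
          f"  cyclic {mh_cyclic(variant):.3f}")
```

The linear and cyclic forms of any variant differ by the mass of one water molecule, which is what allows the two products to be distinguished by MS in a qualitative screen of this kind.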
Given the pharmaceutical (44) and nanotechnological (45,46) applications for cyclic peptides, it was of interest to further test the range of CP that could be produced by PCY1. Two presegetalin A1 [14,32] variants representing insertions of A and AAA in the incipient CP sequence were tested (Table 1). Both of these variants were found to act as substrates. Also, a "D-amino acid scan" of the segetalin A sequence was performed by testing variants of presegetalin A1 [14,32] with single D-amino acid substitutions. These were tolerated by PCY1 at all chiral positions except for the C-terminal Ala (Table 1). Given this, single and double substitutions were made in presegetalin A1 [14,32] to test the production of cyclo(Gly-Val-Pro-Val-D-Ala-Ala), cyclo(Gly-Val-D-Pro-Val-D-Ala-Ala), and cyclo(Gly-Val-D-Ala-Val-D-Ala-Ala). Indeed, all three products were detectable by MS of enzyme assay samples (data not shown). This is particularly significant in that even-numbered CP with alternating D-amino acids are known to form nanotubes (45).
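The design of a D-amino acid scan can be sketched in a few lines of Python. This is a hypothetical illustration rather than the procedure used here. It follows the convention that lowercase letters denote D-amino acids, skips Gly because it is achiral, and assumes GVPVWA as the segetalin A sequence. The function names are introduced only for this example.

```python
# Three-letter codes for the residues that appear in this example.
THREE_LETTER = {"G": "Gly", "A": "Ala", "V": "Val", "P": "Pro", "W": "Trp"}


def d_scan(seq):
    """Return all single D-substituted variants (D-residues in lowercase)."""
    variants = []
    for i, residue in enumerate(seq):
        if residue == "G":  # glycine is achiral, so it has no D-form
            continue
        variants.append(seq[:i] + residue.lower() + seq[i + 1:])
    return variants


def cyclic_notation(seq):
    """Write a one-letter sequence as cyclo(...), marking D-residues."""
    parts = []
    for residue in seq:
        prefix = "D-" if residue.islower() else ""
        parts.append(prefix + THREE_LETTER[residue.upper()])
    return "cyclo(" + "-".join(parts) + ")"


single_d_variants = d_scan("GVPVWA")  # assumed segetalin A sequence
print(single_d_variants)
print(cyclic_notation("GVPVaA"))
```

Because D- and L-residues have identical masses, a scan of this kind changes the stereochemistry of the expected product but not the m/z at which it is detected.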

DISCUSSION
Prior to 2011, very little was known about the biosynthesis of Caryophyllaceae-type CPs. There had been a preliminary report that the linear form of heterophyllin B could undergo cyclization in the presence of plant extracts (47); however, this does not appear to have been followed up. Thus, it was still unclear whether ribosomes were involved in the biosynthesis of Caryophyllaceae-type CP. In a previous report (17), we provided evidence for the existence in the Caryophyllaceae and Rutaceae of genes encoding CP precursors, which are formed by translation on ribosomes and processed to shorter CP. In this paper, we have obtained biochemical evidence allowing the elaboration of the process by which CP precursors are converted to CP in S. vaccaria. This appears to be a two-step process involving two separate enzymes. The first enzyme, which we have called OLP1, is essentially an oligopeptidase that recognizes and cleaves presegetalin A1 at the KPQ-GVP sequence corresponding to the junction between the N-terminal region and the incipient CP sequence. Further work is required to clone and characterize OLP1.

Table 1 legend: The synthetic linear peptides indicated were used in 1-h assays with PCY1 (see "Experimental Procedures"). CP and linear peptide production was determined by ion trap LC/MS and confirmed by ion trap LC/MS/MS (see supplemental Table S2). Expected CP sequences are underlined. Variant amino acids are indicated in reverse type. D. caryophyllus precursor sequences are derived from GenBank™ accession numbers AW697819 and CF259478 (17). Quantitation of segetalin A and linear segetalin A production from selected substrates is shown in Fig. 7. Lowercase letters denote D-amino acids.
The second enzyme involved in CP precursor processing in S. vaccaria is PCY1, which we have cloned and characterized. Briefly, PCY1 is a serine protease-like enzyme that recognizes the highly conserved C-terminal region of a number of CP intermediates and catalyzes the excision and cyclization of the incipient CP sequence.
It is interesting, particularly from an evolutionary point of view, to compare and contrast CP biosynthesis in the Caryophyllaceae with that in other organisms. Within plants, the biosynthetic routes to the other three classes of CP, the kalata, PawS, and cyclic knottin types, show some common features. All appear to be biosynthesized from ribosome-derived precursors, which differ greatly in arrangement and sequence. In many cases where evidence is available for ribosome-derived biosynthesis of CP in plants, the general pattern appears to be cleavage at the N terminus of the incipient CP sequence followed by cleavage and cyclization (2,20,48). For the kalata, PawS, and cyclic knottin types, the precursor sequences generally share certain features including Gly and Asx at the N and C termini, respectively, of the incipient CP sequence (48). Thus, the evidence suggests an asparaginyl endopeptidase is responsible for the final cyclization step for these CP types. In S. vaccaria, there is also a tendency for Gly at the N terminus of the incipient CP sequence. This appears to extend to carnation (17). However, the so-called F class precursors in S. vaccaria do not follow this rule, having the rather bulky Phe at that position. In flax (L. usitatissimum), recently reported multidomain precursors of Caryophyllaceae-type CP show a strong tendency for Met at the N terminus of the incipient CP sequence (49). At the C-terminal position of incipient CP sequences of S. vaccaria, Citrus spp., and L. usitatissimum (17,49), the amino acid is not strictly conserved. Consequently, asparaginyl endopeptidases are not implicated in the processing of Caryophyllaceae-type CP precursors.
Outside of plants, both ribosome-dependent and ribosome-independent modes of CP biosynthesis are prevalent (5,26). In mammals, cyclic defensins appear to be produced by the ligation of two linear peptides (8). In cyanobacteria and fungi, there are examples of CP produced from ribosome-derived precursors. Among the fungi, Amanita mushrooms produce precursors to amanita toxins. These have a conserved Pro upstream of the C terminus of the incipient CP and at the N terminus of it. Although a prolyl oligopeptidase has been implicated in the processing of the precursors, the mechanism of cyclization remains unclear (26). For cyanobacteria, the biosynthesis of patellamides and related CP has been well studied (19,50). Indeed, the biosynthetic pathway seems to be very similar to the biosynthesis of segetalins in S. vaccaria. Precursors containing one or more CP sequences are flanked by conserved regions. A protease cleaves at the N terminus of the incipient CP sequences, and a separate protease-like enzyme cyclizes the peptide with removal of the C-terminal sequence. For Prochloron spp., the latter enzyme is encoded by patG (51); it belongs to the S8 family of serine proteases and is not closely related to PCY1. It is notable that the products of patG and its homologues and that of Pcy1 are the only cloned enzymes whose main function appears to be peptide cyclization. Thus, among a wide range of taxa, there are some commonalities in the mode of biosynthesis of ribosome-derived CP. For systems in which the pathway is reasonably well characterized, it appears to be common for a precursor with N- and C-terminal flanking regions to be first processed to remove the N terminus. This is followed by a combined cleavage and cyclization reaction catalyzed by an acyl-intermediate-forming (cysteine or serine type) protease homologue. This is consistent with the known transpeptidation reactions, which cysteine and serine proteases are capable of catalyzing (37,38).
This certainly suggests a pattern of convergent evolution in which different proteases were recruited in different lineages, giving rise to diverse CP. Further work is required to determine whether the evolution of the cyclases in question was minimal relative to the ancestral proteases or whether there was significant specialization involved.
The comparison of PCY1 with porcine muscle POP offers some insights into the structure and function of PCY1. Porcine muscle POP is a serine oligopeptidase with an unusual β-propeller domain. This domain is thought to limit access of larger polypeptides to the active site, and this is consistent with porcine muscle POP acting on oligopeptide substrates. Indeed, an engineered disulfide "latch" on the β-propeller lid abolishes even oligopeptidase activity (52). This is consistent with the notion that PCY1 also acts on oligopeptides as shown in this paper, although further work is required to determine the limits of its substrate size. The close agreement of PCY1 with the porcine muscle POP structure suggests that the structural differences that affect the function of the two enzymes may be quite subtle. On the other hand, the amino acid insertion/deletion near the active site histidine may prove to be important. Clarifying this will likely require a high-resolution experimental structure. This is in contrast with the cyanobacterial patG homologues whose products have distinctive "capping helices" (reminiscent of the β-propeller lid), which are not found in related proteases (53,54).
As discussed above, the identity of PCY1 as a serine protease homologue is relevant to the mechanism of the cyclization reaction. Serine proteases are known to catalyze proteolysis in a multistep reaction, which includes the early formation of an acylserine intermediate formed by nucleophilic attack of the serine at the carbonyl of the scissile peptide bond, with release of the C terminus of the substrate. This is typically followed by hydrolysis of the acyl intermediate, freeing the N terminus of the substrate and regenerating the unmodified form of the enzyme. However, in some enzymes or under some conditions, aminolysis of the acylserine intermediate can compete with hydrolysis. Where the amine in question is at the N terminus of a separate peptide, transpeptidation results. If the amine is at the distal end of the acyl group of the intermediate, then cyclization results. Furthermore, Berkers et al. (38) point out that "if protein complexation or conformation positions the amine-donating component involved in transpeptidation in close proximity to the acyl ester component, this will facilitate its nucleophilic attack on the acyl-enzyme intermediate and hence favor aminolysis." Given this last point, it is difficult to imagine how this might occur for segetalin formation. The putative acyl intermediates expected for the reactions represented in Table 1 are short but quite variable in sequence and length. On the other hand, the C terminus of the PCY1 substrates is quite highly conserved, and amino acid substitutions can have a strong negative effect on cyclization. Although, in principle, the C terminus of the substrate could be released from the enzyme prior to cyclization, the data presented suggest the possibility that the C terminus remains bound to the enzyme even after acyl intermediate formation and somehow directs the aminolysis (cyclization). Further work is required to investigate this. It is notable, however, that Agarwal et al. (54) propose a similar retention of the C-terminal fragment for patG homologues.
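The competition between hydrolysis and aminolysis of the acyl intermediate can be summarized with a simple branching expression. This is only a sketch and not a kinetic model derived in this work. It assumes a single acylserine intermediate that partitions irreversibly between intramolecular aminolysis and hydrolysis (with water in large excess). The rate constants k_cyc and k_hyd are illustrative symbols, not measured quantities.

```latex
\[
f_{\mathrm{cyclic}} = \frac{k_{\mathrm{cyc}}}{k_{\mathrm{cyc}} + k_{\mathrm{hyd}}},
\qquad
f_{\mathrm{linear}} = 1 - f_{\mathrm{cyclic}}
                    = \frac{k_{\mathrm{hyd}}}{k_{\mathrm{cyc}} + k_{\mathrm{hyd}}}
\]
```

Under these assumptions, the ratio of cyclic to linear product equals k_cyc/k_hyd and does not depend on how quickly the acyl intermediate is formed.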
Beyond the insights that the cloning and characterization of PCY1 offer into the biosynthesis of CP in plants, the enzyme may also have applications in biotechnology. There is considerable interest in cyclic peptide drugs and vaccines (for example, see Refs. 55 and 56). Biotechnological production of CP allows both the generation of CP libraries for screening and the synthesis of individual CP for testing or production. The relatively relaxed substrate specificity of PCY1 may be useful in this regard, and further investigation of the applications of PCY1 and related enzymes is warranted.