Substrate Specificity of the Adenylation Enzyme SgcC1 Involved in the Biosynthesis of the Enediyne Antitumor Antibiotic C-1027*

C-1027 is an enediyne antitumor antibiotic composed of a chromophore with four distinct chemical moieties, including an (S)-3-chloro-4,5-dihydroxy-β-phenylalanine moiety that is derived from L-α-tyrosine. SgcC4, a novel aminomutase that requires no added cofactor, catalyzes formation of the first intermediate, (S)-β-tyrosine. SgcC1, which is homologous to adenylation domains of nonribosomal peptide synthetases, was subsequently identified as specific for the SgcC4 product and did not recognize any α-amino acids. To definitively establish the substrate for SgcC1, a full kinetic characterization of the enzyme was performed, using the amino acid-dependent ATP-[32P]PPi exchange assay to monitor amino acid activation and electrospray ionization-Fourier transform mass spectrometry to follow loading of the activated β-amino acid onto the peptidyl carrier protein SgcC2. The data establish (S)-β-tyrosine as the preferred substrate. SgcC1 shows promiscuous activity toward aromatic β-amino acids such as β-phenylalanine, 3-chloro-β-tyrosine, and 3-hydroxy-β-tyrosine, but each was processed >50-fold less efficiently. A putative active site mutant, P571A, targeting the residue adjacent to the aspartic acid that is invariant in all α-amino acid-specific adenylation domains known to date, was prepared as a preliminary attempt to probe the substrate specificity of SgcC1; however, the mutation abolished activity with all substrates except (S)-β-tyrosine, which was processed 142-fold less efficiently than by the wild-type enzyme. In total, SgcC1 is now confirmed to catalyze the second step in the biosynthesis of the (S)-3-chloro-4,5-dihydroxy-β-phenylalanine moiety of C-1027, presenting downstream enzymes with an (S)-β-tyrosyl-S-SgcC2 thioester substrate, and it is the first β-amino acid-specific adenylation enzyme to be characterized biochemically.

C-1027 is an enediyne antitumor antibiotic isolated from the fermentation broth of Streptomyces globisporus (1,2). It is produced as a chromoprotein complex consisting of a binding protein (annotated as CagA) and the reactive C-1027 chromophore, which contains a conjugated enediyne within a nine-membered ring (see Fig. 1). Upon release from CagA, the enediyne core of the C-1027 chromophore readily undergoes Bergman cycloaromatization to yield a transient biradical species capable of abstracting hydrogen atoms from DNA, which in the presence of molecular oxygen can ultimately lead to single- and double-stranded DNA breaks (3). C-1027 and the entire enediyne family share this mode of action, and as a family, their potent cytotoxicity rivals that of any natural product discovered to date (4).
The biosynthetic gene cluster for C-1027 was previously cloned and sequenced, and analysis of its open reading frames (ORFs) provided genetic evidence sufficient to propose pathways for the biosynthesis, transport, resistance, and regulation of C-1027 (5). Comparison of the gene cluster with those of other members of the enediyne family, including calicheamicin (6) and neocarzinostatin (7), revealed a unified biosynthetic approach among the enediynes and supported a convergent biosynthesis of C-1027 from four components: a deoxy aminosugar, a benzoxazolinate, a β-amino acid, and an enediyne core, derived from glucose-1-phosphate, chorismic acid, L-α-tyrosine, and acyl-CoAs, respectively (Fig. 1). The pathway to the C-1027 deoxy aminosugar was the first to be targeted for biochemical analysis, and the first reaction, catalyzed by SgcA1, has been characterized as that of an α-D-glucopyranosyl-1-phosphate thymidylyltransferase (8). Of the remaining moieties, only the initial step toward the β-amino acid moiety, (S)-3-chloro-4,5-dihydroxy-β-phenylalanine, has been identified: a novel aminomutase reaction catalyzed by SgcC4 that converts L-α-tyrosine to (S)-β-tyrosine (9).
Among the 56 ORFs identified within the boundaries of the C-1027 biosynthetic gene cluster, three exhibit high homology to nonribosomal peptide synthetase (NRPS) domains. NRPSs are typically large modular proteins that catalyze the synthesis of a wide range of biologically active peptides.
* This work is supported in part by National Institutes of Health Grants GM067725 (to N. L. K.) and CA78747 (to B. S.).
Within the C-1027 biosynthetic gene cluster, two ORFs upstream of and in opposite orientation to sgcC4 were found to encode proteins with sequence similarity to condensation (SgcC5) and PCP (SgcC2) domains (Fig. 2A). An ORF located immediately downstream of sgcC4 was also identified; it encodes SgcC1, an 881-amino acid protein with sequence homology to adenylation domains. Although SgcC1 is most closely related to adenylation domains that activate L-α-tyrosine, preliminary characterization revealed that SgcC1 recognizes (S)-β-tyrosine rather than any of the standard α-amino acids, including either enantiomer of α-tyrosine or α-phenylalanine or their analogs (11). These results support the conclusion that SgcC4 is the initial enzyme in the pathway (catalyzing the stereospecific formation of (S)-β-tyrosine from L-α-tyrosine) and that SgcC1 catalyzes the subsequent step (Fig. 2B); however, they contradicted the prediction, based on the so-called "nonribosomal codes" (12,13), that SgcC1 would be an α-amino acid-specific adenylation domain (11).
As a follow-up to the initial characterization of SgcC1, we now provide definitive experimental data assigning its substrate specificity. Using amino acid-dependent ATP-[32P]PPi exchange assays and electrospray ionization-Fourier transform mass spectrometry (ESI-FTMS), chlorinated and hydroxylated tyrosine and phenylalanine analogs were tested, revealing that (S)-β-tyrosine is the preferred substrate for SgcC1. The results unambiguously establish that SgcC1, following the SgcC4 aminomutase reaction, catalyzes the second step in the biosynthesis of the (S)-3-chloro-4,5-dihydroxy-β-phenylalanine moiety from L-α-tyrosine by forming the (S)-β-tyrosyl-S-SgcC2 intermediate. Subsequent steps involve halogenation (SgcC3), hydroxylation (SgcC), and incorporation of the fully modified β-amino acid unit into the enediyne core (SgcC5), although the precise timing of the last steps awaits further validation (Fig. 2B). As a naturally occurring, discrete protein, SgcC1 could therefore provide a relatively simple platform for identifying the elements of adenylation domains that confer specificity for β-amino acids, whereas the current nonribosomal codes are based solely on a single structure of the excised, L-α-phenylalanine-specific adenylation domain of GrsA (PheA) (12,13). As an initial attempt to determine the structural elements of SgcC1 necessary for proper β-amino acid selection, a truncated protein and a single point mutant of SgcC1 were generated, and the results are reported here.

EXPERIMENTAL PROCEDURES
Chemicals and Instrumentation-Unless otherwise mentioned, chemicals and instruments were identical to those previously reported (11). Cinnamic acid, p-hydroxycinnamic acid, and 3-amino-3-phenylpropionic acid (β-phenylalanine) were from Fisher-Acros (Pittsburgh, PA). (R)-3-Amino-(4-hydroxy-phenyl)-propionic acid ((R)-β-tyrosine) was from PepTech Corporation (Burlington, MA). Restriction enzymes were from New England Biolabs (Beverly, MA) or Invitrogen, and expression vectors were from Novagen (Madison, WI). Unless explicitly stated, the compounds used were racemic mixtures. Protein analysis was performed with position-specific iterated BLAST and BL2SEQ using the San Diego Supercomputer Center Biology WorkBench, version 3.2, and the adenylation domain active sites were extracted using the NRPS Predictor software from Universität Tübingen.
Synthesis of Racemic 3-Chloro-β-tyrosine-3-Chloro-β-tyrosine was synthesized following the method reported by Weaver and co-workers (14). 3-Chloro-4-hydroxybenzaldehyde (76 mg) was refluxed with 1 equivalent of malonic acid (51 mg) and 2 equivalents of ammonium acetate (76 mg) in ethanol (5 ml) for 2 days. The reaction mixture was adjusted to approximately pH 4 with 1 N HCl and applied to a strong acid cation exchange column (Dowex 50WX8), from which 3-chloro-β-tyrosine was eluted with 1% ammonium hydroxide. After the solvent was removed by evaporation under reduced pressure, the residue was dissolved in distilled water and further purified on a C-18 reverse phase column (Alltima, 10 × 250 mm, 5 μm; Grace Davison Discovery Sciences).
Synthesis of Racemic 3-Chloro-5-hydroxy-β-tyrosine-3-Chloro-5-hydroxy-β-tyrosine was synthesized by the same procedure as 3-chloro-β-tyrosine, starting from 3-chloro-4,5-dihydroxybenzaldehyde, which was prepared from 3-chloro-4-hydroxy-5-methoxybenzaldehyde following the method reported by Perchellet and co-workers (15).
DNA Manipulations-Cloning and construction of the sgcC1 (pBS1033) and sgcC2 (pBS1034) expression vectors were previously described (11). A truncated form of SgcC1 was prepared by PCR amplification using pBS1005 (5) as the template, a forward primer of 5′-GGGAATTCCATATGGGCGCTCTGCCGCTGGAC-3′ (NdeI site, CATATG), and a reverse primer of 5′-GGCAAGCTTGCGGGTGAGCCGGGAGCG-3′ (HindIII site, AAGCTT). The amplified 1629-base-pair fragment was cloned into the same sites of pET29a to yield pBS1037. After the DNA sequence was confirmed, the NdeI/HindIII fragment was excised and cloned into pET28a to yield pBS1038.
Whereas the former construct produces SgcC1 as a C-terminal His6-tagged protein, the latter produces SgcC1 with both N- and C-terminal His6 tags; in both, the 338 N-terminal amino acids of SgcC1 are deleted.
OCTOBER 6, 2006 • VOLUME 281 • NUMBER 40
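As a quick consistency check on the truncation primers and fragment size given above, the sketch below locates the NdeI (CATATG) and HindIII (AAGCTT) recognition sites in the two primers and multiplies the number of retained residues (881 − 338) by three. Whether the reported 1629 bp counts the primer's ATG or flanking bases is not stated, so the match is treated only as a sanity check; the helper names are illustrative.

```python
# Sanity checks on the SgcC1 truncation primers (helper names are illustrative).
RESTRICTION_SITES = {"NdeI": "CATATG", "HindIII": "AAGCTT"}

FORWARD = "GGGAATTCCATATGGGCGCTCTGCCGCTGGAC"
REVERSE = "GGCAAGCTTGCGGGTGAGCCGGGAGCG"


def find_sites(primer: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognition site found in the primer."""
    hits = {}
    for enzyme, site in RESTRICTION_SITES.items():
        positions = [i for i in range(len(primer) - len(site) + 1)
                     if primer[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


FULL_LENGTH_AA = 881  # SgcC1 length stated in the text
DELETED_AA = 338      # N-terminal residues removed in pBS1037/pBS1038
retained_aa = FULL_LENGTH_AA - DELETED_AA
coding_nt = retained_aa * 3

print(find_sites(FORWARD))     # {'NdeI': [8]}
print(find_sites(REVERSE))     # {'HindIII': [3]}
print(retained_aa, coding_nt)  # 543 1629
```

The 543 retained residues also sit close to the ~550-residue span proposed for an adenylation domain.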

A P571A point mutant of SgcC1 was generated by PCR amplification of the template pBS1033 (11) using the Expand Long Template PCR System (Roche Applied Science). Reactions were performed in the manufacturer-supplied Buffer 2 with 5% Me2SO, using the primer 5′-GTCTCCCCGGAGCACGACGCGGCGCTGGCCGAGGTC-3′ and its reverse complement, which carry the Ala codon, and a PCR program consisting of an initial hold at 94°C for 2 min followed by 20 cycles of 94°C for 10 s, 56°C for 30 s, and 68°C for 7 min. The template DNA was then digested with 10 units of DpnI for 1 h at 37°C, heated to 90°C for 5 min, and cooled to room temperature before transformation. Introduction of the correct point mutation and the fidelity of the entire gene, including 250 bp upstream and downstream, were confirmed by DNA sequencing, yielding pBS1039.
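The underlining that marked the Ala codon did not survive in this text, so the short sketch below translates the forward mutagenic primer and builds its reverse complement. It assumes, without confirmation from the source, that the primer is in frame from its first nucleotide; under that assumption it encodes an Asp codon (GAC) immediately followed by an Ala codon (GCG), consistent with Asp-570 followed by the introduced Ala-571.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

MUTAGENIC_PRIMER = "GTCTCCCCGGAGCACGACGCGGCGCTGGCCGAGGTC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercase ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate in frame 1, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


peptide = translate(MUTAGENIC_PRIMER)
print(peptide)                # VSPEHDAALAEV
print(peptide.index("DA"))    # 5 -> Asp codon, then the Ala codon
print(reverse_complement(MUTAGENIC_PRIMER))
```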
Amino Acid-dependent ATP-[32P]PPi Exchange Assays-The substrate specificity of the SgcC1 adenylation enzyme was assessed as described previously (11). A pH profile for SgcC1 was generated using a three-buffer system of 52/52/100 mM Tris-ACES-ethanolamine as described previously (16). Assays were run at 30°C in 1× buffer with 5 mM MgCl2, 0.1 mM EDTA, 5 mM ATP, 0.9 μM [32P]PPi, 1 mM (S)-β-tyrosine, and 10 nM SgcC1 under initial velocity conditions (<10% isotopic conversion into ATP). Each data point represents at least four replicate end-point assays. Because the pH profile was used only to identify conditions for optimal activity, the data were fitted to a smooth curve rather than to a specific rate equation, which would require further kinetic analysis at varied pH.
The steady-state kinetic parameters for SgcC1 were determined from activity assays carried out at 30°C in 100 mM Tris-HCl (pH 9.0), 5 mM MgCl2, 0.1 mM EDTA, 5 mM ATP, 1.0 μM [32P]PPi, and varied co-substrate concentrations as follows: 25-1600 μM for 3-chloro-β-tyrosine, 3-hydroxy-β-tyrosine, and β-phenylalanine; 5-370 μM for (R)-β-tyrosine; and 0.5-200 μM for (S)-β-tyrosine. The enzyme concentration ranged from 25 to 70 nM, depending on the substrate, to maintain initial velocity conditions. Single time points were analyzed between 1 and 5 min; each data point represents at least four replicates with an S.D. of <10%. Kinetic constants were extracted by nonlinear regression analysis using Kaleidagraph software (Adelbeck Software, Reading, PA).
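Nonlinear regression of this kind can be sketched in a few lines. The example below is a minimal, unweighted Gauss-Newton least-squares fit of v0/[E] = kcat[S]/(Km + [S]); it is not the Kaleidagraph routine used here. Its input is noise-free synthetic data generated from the (S)-β-tyrosine constants reported in the Results (kcat = 2.2 s−1, Km = 3.2 μM) over the 0.5-200 μM range, not measured rates, and it omits the weighting and error estimation that real data would need.

```python
def mm_model(s: float, kcat: float, km: float) -> float:
    """Michaelis-Menten rate normalized to enzyme: v0/[E] = kcat*S/(Km + S)."""
    return kcat * s / (km + s)


def fit_michaelis_menten(s_values, rates, max_iter=100, tol=1e-12):
    """Unweighted least-squares fit of (kcat, Km) by Gauss-Newton iteration."""
    # Initial guesses: kcat from the largest observed rate, Km from the
    # substrate concentration whose rate is closest to half of that value.
    kcat = max(rates)
    km = min(zip(s_values, rates), key=lambda p: abs(p[1] - kcat / 2))[0]
    for _ in range(max_iter):
        # Accumulate the 2x2 normal equations (J^T J) step = J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for s, y in zip(s_values, rates):
            denom = km + s
            d_kcat = s / denom               # partial derivative w.r.t. kcat
            d_km = -kcat * s / denom ** 2    # partial derivative w.r.t. Km
            resid = y - kcat * s / denom
            a11 += d_kcat * d_kcat
            a12 += d_kcat * d_km
            a22 += d_km * d_km
            b1 += d_kcat * resid
            b2 += d_km * resid
        det = a11 * a22 - a12 * a12
        step_kcat = (a22 * b1 - a12 * b2) / det
        step_km = (a11 * b2 - a12 * b1) / det
        kcat += step_kcat
        km += step_km
        if abs(step_kcat) < tol and abs(step_km) < tol:
            break
    return kcat, km


# Synthetic, noise-free rates (s^-1) at substrate concentrations (uM).
conc = [0.5, 1, 2, 5, 10, 25, 50, 100, 200]
rates = [mm_model(s, 2.2, 3.2) for s in conc]
kcat_fit, km_fit = fit_michaelis_menten(conc, rates)
print(f"kcat = {kcat_fit:.3f} s^-1, Km = {km_fit:.3f} uM")
```

Because the synthetic data carry no noise, the fit should recover the generating constants; with real replicates the same routine returns the unweighted least-squares estimates.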
Activity of SgcC1 with SgcC2-Preparation of recombinant SgcC1 and SgcC2, including overproduction in Escherichia coli, purification, and in vitro 4′-phosphopantetheinylation of apo-SgcC2, was as described previously (11). Acylation with (S)-β-tyrosine was analyzed as described previously (11). Briefly, a quenched sample was loaded onto a Jupiter C4 column (Phenomenex, La Jolla, CA) to purify and desalt SgcC2 species. Mass spectrometric analysis was performed on a custom 8.5-T ESI-FTMS instrument equipped with a front-end quadrupole. Samples were introduced using a Nanomate 100 for automated nanospray (Advion Biosciences, Ithaca, NY), and typically 500 ms of ion accumulation per scan was used. The instrument was externally calibrated using ubiquitin (Sigma).
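For orientation when reading ESI-FTMS spectra of the loaded carrier protein, the sketch below computes the monoisotopic mass increase expected when a β-amino acid is attached to holo-SgcC2 as a thioester. It assumes the net change is the amino acid minus one water, counts only the most abundant isotope of each element (35Cl for the chlorinated analog), and uses hand-written elemental compositions; the dictionary and function names are illustrative.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "Cl": 34.96885268}
WATER = {"H": 2, "O": 1}

# Elemental compositions of the free amino acids (illustrative names).
SUBSTRATES = {
    "(S)-beta-tyrosine": {"C": 9, "H": 11, "N": 1, "O": 3},
    "beta-phenylalanine": {"C": 9, "H": 11, "N": 1, "O": 2},
    "3-chloro-beta-tyrosine": {"C": 9, "H": 10, "Cl": 1, "N": 1, "O": 3},
    "3-hydroxy-beta-tyrosine": {"C": 9, "H": 11, "N": 1, "O": 4},
}


def monoisotopic_mass(composition: dict) -> float:
    """Sum of monoisotopic element masses weighted by atom counts."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


def thioester_mass_shift(composition: dict) -> float:
    """Mass added to holo-PCP when the acid is loaded as a thioester (net loss of H2O)."""
    return monoisotopic_mass(composition) - monoisotopic_mass(WATER)


for name, comp in SUBSTRATES.items():
    print(f"{name}: +{thioester_mass_shift(comp):.4f} Da")
```

For (S)-β-tyrosine this gives an expected shift of about +163.063 Da on holo-SgcC2.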

Organization of NRPS within C-1027 Biosynthetic Gene Cluster-The gene cluster for C-1027 biosynthesis contains three ORFs with sequence homology to the condensation (SgcC5), adenylation (SgcC1), and PCP (SgcC2) domains found in modular NRPSs. The ORFs are located in close proximity, interrupted only by sgcC4, whose gene product has been characterized as catalyzing the initial step in the biosynthesis of the β-amino acid moiety of C-1027 by forming (S)-β-tyrosine (Fig. 2A). Therefore, these three ORFs, which together constitute a minimal NRPS module, were hypothesized to process the SgcC4 product to maturity and to mediate its subsequent incorporation into the C-1027 chromophore (Fig. 2B).
pH Profile of SgcC1-The optimal pH for SgcC1 activity was determined using the amino acid-dependent ATP-PPi exchange assay with (S)-β-tyrosine as the substrate. A three-buffer system was employed to maintain constant ionic strength (16). SgcC1 was active between pH 5 and 11, and the pH profile exhibited two maxima, at pH 7.5 and 9.0 (Fig. 5). When 100 mM Tris-HCl was substituted, no change in activity was observed (data not shown). Because SgcC4 has a pH optimum of ~9.0, subsequent experiments were performed in 100 mM Tris-HCl at pH 9.0.
Single-substrate Kinetics of SgcC1-Kinetic parameters were determined for the five substrates that showed significant activity with SgcC1. All displayed typical Michaelis-Menten kinetics in the presence of saturating ATP, as exemplified by (S)- and (R)-β-tyrosine (Fig. 6). The best substrate, (S)-β-tyrosine, had a kcat of 2.2 ± 0.5 s−1 and a Km of 3.2 ± 0.6 μM, whereas the other substrates had similar but slightly lower kcat values and significantly higher Km values (Table 1). Overall, SgcC1 catalysis was >50-fold more efficient with (S)-β-tyrosine than with any substrate analog tested, except the (R)-enantiomer, which was 25-fold less efficient.

JOURNAL OF BIOLOGICAL CHEMISTRY 29637
SgcC1 Mutation Analysis-Adenylation domains have been proposed to span ~550 amino acids within an NRPS module (10). Therefore, the wild-type SgcC1 protein (881 amino acid residues) was truncated by deleting 338 amino acids from its N terminus, leaving only the putative C-terminal AMP-forming domain. Of the constructs prepared, only the dual His6-tagged version was slightly soluble, and it showed no activity.
A P571A mutant of SgcC1 was prepared to examine possible changes in substrate specificity and/or catalysis. Pro-571 neighbors the invariant Asp-570, whose carboxylate side chain interacts with the α-amino group to lock the orientation of an L-α-amino acid in the substrate binding pocket (Fig. 3). The mutation abolished activity with all substrates except (S)-β-tyrosine. Kinetic analysis of SgcC1(P571A) yielded a Km of 13 ± 4 μM and a kcat of 0.07 ± 0.02 s−1, a 142-fold loss in catalytic efficiency compared with wild-type SgcC1 (Table 2).
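The fold changes quoted here follow from the specificity constant kcat/Km. The short calculation below uses only the rounded constants stated in the text; with these values the ratio comes out near 128 rather than exactly 142, presumably because the reported figure was derived from unrounded constants (Table 2 itself is not reproduced in this section). It also splits the loss into its kcat and Km contributions.

```python
def specificity_constant(kcat_per_s: float, km_um: float) -> float:
    """Catalytic efficiency kcat/Km in uM^-1 s^-1."""
    return kcat_per_s / km_um


# Rounded constants for (S)-beta-tyrosine as stated in the text.
wild_type = specificity_constant(2.2, 3.2)   # wild-type SgcC1
p571a = specificity_constant(0.07, 13.0)     # SgcC1(P571A)

fold_loss = wild_type / p571a
kcat_decrease = 2.2 / 0.07   # fold decrease in kcat
km_increase = 13.0 / 3.2     # fold increase in Km

print(f"WT kcat/Km = {wild_type:.3f} uM^-1 s^-1")
print(f"P571A kcat/Km = {p571a:.5f} uM^-1 s^-1")
print(f"fold loss = {fold_loss:.1f} (kcat: {kcat_decrease:.1f}x, Km: {km_increase:.2f}x)")
```

The roughly 31-fold drop in kcat versus the roughly 4-fold rise in Km matches the statement that the loss is mostly manifested in kcat.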

DISCUSSION
Adenylation domains of NRPSs catalyze the ATP-dependent activation of amino acids or carboxylic acids and their subsequent loading onto a PCP domain to form an aminoacyl-S-PCP thioester. SgcC1 has significant sequence homology to numerous NRPS adenylation domains and, as a result, was hypothesized to be involved in the formation, activation, and incorporation of the (S)-3-chloro-4,5-dihydroxy-β-phenylalanine moiety into C-1027, although the precise timing of these steps cannot be predicted a priori (Fig. 2). Sequence analysis predicted L-α-tyrosine as the probable substrate for SgcC1, on the basis of its closest sequence identity to domains that activate L-α-tyrosine, such as TycC3, NovH, and SimH, and comparison with the nonribosomal codes of NRPSs likewise pointed to L-α-tyrosine (Fig. 3) (12,13). However, our previous results showing that SgcC4 converts L-α-tyrosine to (S)-β-tyrosine, which SgcC1 then activates, were inconsistent with this proposal (11), and therefore, a full biochemical characterization of SgcC1 was undertaken.
As previously reported, when L-α-tyrosine was tested as a substrate, SgcC1 was active only in the presence of SgcC4, which converts L-α-tyrosine to (S)-β-tyrosine in situ (11). In contrast, authentic (S)-β-tyrosine, the established product of SgcC4 (9), was recognized and loaded onto SgcC2 to form an (S)-β-aminoacyl-S-SgcC2 thioester intermediate. When a full spectrum of substrates was tested, including substituted α- and β-phenylalanine analogs that could hypothetically be generated during biosynthesis of the (S)-3-chloro-4,5-dihydroxy-β-phenylalanine moiety of C-1027, only β-amino acids were activated, all at relatively high rates under the conditions employed, prompting a thorough kinetic analysis. Although all five aromatic β-amino acids examined were recognized and activated by SgcC1, (S)-β-tyrosine was clearly preferred, by >50-fold over every analog except its (R)-enantiomer. (R)-β-Tyrosine is also a substrate for SgcC1 but is processed 25-fold less efficiently; this lack of strict stereospecificity is likely compensated for by the stereospecificities of SgcC4 and of downstream enzymes such as SgcC5. Interestingly, condensation domains have recently been shown to contribute to substrate specificity by discriminating between isomers (21).
Rational engineering of adenylation domain specificity without adverse effects on the structural integrity of the NRPS biosynthetic machinery could be one of the most powerful combinatorial biosynthesis strategies for generating natural product diversity. Central to this strategy is the set of general rules for substrate recognition by NRPS adenylation domains, known as the nonribosomal codes, which has provided a molecular basis for the engineered biosynthesis of novel peptides (12,13,22). Based on the structure of PheA (23), 10 amino acid residues that line the amino acid binding pocket were identified as the general specificity-conferring code of NRPS adenylation domains (Fig. 3). According to this model, Asp-235 and Lys-517 interact with the α-amino and carboxylate groups, respectively, to lock the orientation of the L-α-amino acid upon activation. This configuration projects the side chain of the amino acid into the binding pocket formed by the 10 residues within a radius of ~5.5 Å, so the identities of these residues in a given adenylation domain can be used to predict the nature of the side chain and, thereby, the amino acid specificity. This model rests on the structural conservation of the binding pocket, as reflected by the relatively high sequence similarity among adenylation domains, and has been supported both by predicting adenylation domain specificities from the nonribosomal code followed by experimental verification (12,13) and by re-engineering adenylation domain specificity followed by isolation of products incorporating the targeted amino acid (22). An important feature of NRPSs is their ability to incorporate nonproteinogenic amino acids, such as β-amino acids, whose presence in the resulting metabolites contributes critically to their biological activities and therapeutic value.
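Code-based prediction of this kind amounts to comparing the 10 pocket residues of a query domain against signatures of domains with known specificity. A minimal sketch follows. The only reference signature filled in is the well-known GrsA PheA code (Asp-235, Ala-236, Trp-239, Thr-278, Ile-299, Ala-301, Ala-322, Ile-330, Cys-331, Lys-517); the query signature in the usage line is hypothetical, and dedicated tools rely on much larger curated reference sets.

```python
# Positions (GrsA PheA numbering) that make up the 10-residue specificity code.
CODE_POSITIONS = (235, 236, 239, 278, 299, 301, 322, 330, 331, 517)

# Reference signatures with known specificity; only PheA is filled in here.
REFERENCE_CODES = {
    "L-Phe (GrsA PheA)": "DAWTIAAICK",
}


def code_identity(query: str, reference: str) -> int:
    """Number of the 10 code positions at which two signatures agree."""
    if len(query) != len(CODE_POSITIONS) or len(reference) != len(CODE_POSITIONS):
        raise ValueError("signatures must have exactly 10 residues")
    return sum(q == r for q, r in zip(query, reference))


def predict(query: str) -> tuple[str, int]:
    """Return the closest reference signature and its identity score."""
    best = max(REFERENCE_CODES, key=lambda name: code_identity(query, REFERENCE_CODES[name]))
    return best, code_identity(query, REFERENCE_CODES[best])


# Hypothetical query signature, for illustration only.
print(predict("DALTIAAVCK"))  # ('L-Phe (GrsA PheA)', 8)
```

Because every signature begins with the Asp that anchors an α-amino group, a high identity score cannot by itself flag a β-amino acid-specific domain such as SgcC1, which is the limitation the text goes on to discuss.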
Because the current model is based on the structure of PheA, it is essentially limited to L-α-amino acids and cannot be used to predict adenylation domain specificity for β-amino acids or for several other nonproteinogenic substrates. The model depends on the key interaction between the α-amino group and Asp-235, which locks the configuration of the amino acid; in β-amino acids this interaction no longer applies, and the predictive model breaks down. The β-amino acid specificity of SgcC1 reported here exposes this limitation of the current model and provides an outstanding opportunity to decipher the nonribosomal codes for β-amino acids.
Among the common amino acids, Pro is unusual in its ability to modulate the stability, structure, and function of a protein (24). Pro introduces kinks into α-helices, and its torsional angle is highly coupled to that of the preceding residue, which in SgcC1 is the invariant Asp whose carboxylate side chain is critical to the current nonribosomal codes for α-amino acids (12,13). A survey of L-tyrosine- or L-phenylalanine-specific adenylation domains showed that Ala (or, in a couple of cases, Gly) occupies this position, and 32 of more than 1250 adenylation domains (25) contain a Pro (Fig. 3). Of these 32, 17 have known specificity, and in all of these cases only three substrates are utilized, none of which is a proteinogenic α-amino acid or is used to form an amide bond following the classical NRPS paradigm. AcvA homologs activate the δ-carboxylic acid of L-α-aminoadipic acid for penicillin biosynthesis (26), MycE homologs activate the γ-carboxylic acid of L-α-glutamic acid for microcystin biosynthesis (27), and CepC homologs activate dihydroxyphenylglycine in vancomycin and teicoplanin biosynthesis (28) (Fig. 3). In addition, the enediyne maduropeptin contains an aromatic moiety similar to that of C-1027, and its gene cluster likewise contains homologs of the entire set of SgcC to SgcC5. Its adenylation domain (MdpC1) is proposed to catalyze the same reaction as SgcC1, or at least to be unable to process α-amino acids, and it also contains a Pro adjacent to the invariant Asp. Assuming a common fold and mechanism for all adenylation domains, as observed in structural analyses of three sequence-unrelated enzymes, firefly luciferase (29), PheA (23), and DhbA (30), Pro-571 of SgcC1 was initially hypothesized to help reposition the preceding Asp-570 to interact appropriately with the β-amino group, and a similar effect could be envisaged for the other adenylation domains that carry a Pro at this position.
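The survey described above amounts to tallying which residue follows the invariant Asp in each domain. Assuming each domain has already been aligned so that the index of its invariant Asp is known (the alignment step is not shown), the tally takes a few lines. The fragments and indices below are hypothetical toy inputs, not the domains of the actual survey.

```python
from collections import Counter


def residue_after_anchor(sequence: str, asp_index: int) -> str:
    """Residue immediately C-terminal to the invariant Asp (0-based index)."""
    if sequence[asp_index] != "D":
        raise ValueError(f"expected Asp at index {asp_index}, found {sequence[asp_index]}")
    return sequence[asp_index + 1]


def tally_following_residues(domains: dict) -> Counter:
    """Count the residues found after the invariant Asp across (sequence, index) pairs."""
    return Counter(residue_after_anchor(seq, idx) for seq, idx in domains.values())


# Hypothetical aligned fragments and Asp positions, for illustration only.
toy_domains = {
    "domain_1": ("WEHDALAN", 3),
    "domain_2": ("WEHDAVAN", 3),
    "domain_3": ("WEHDGLAN", 3),
    "domain_4": ("WEHDPLAN", 3),
}

counts = tally_following_residues(toy_domains)
print(counts)
print(f"fraction with Pro: {counts['P'] / sum(counts.values()):.2f}")
```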
However, the P571A mutant lost all activity except with (S)-β-tyrosine, and even with this substrate catalytic efficiency dropped 142-fold, mostly because of a decrease in kcat (Table 2). Structural analysis of SgcC1, which is currently in progress, should reveal the features of SgcC1 that modulate catalysis and specificity for a β-amino acid, and subsequent structural comparisons with other adenylation domains should ultimately allow the nonribosomal codes of NRPSs to be expanded to cover nonstandard amino acids such as β-amino acids.
The C-1027 gene cluster has many unusual features, including three ORFs that constitute a minimal NRPS module. We foresee this as an excellent opportunity to study the features of NRPS domains responsible for high-fidelity molecular recognition and catalysis while maintaining native protein structural integrity (i.e. naturally occurring discrete proteins versus domains artificially excised from the context of native, giant multidomain NRPS enzymes). The facts that SgcC1 is rather promiscuous toward aromatic β-amino acids and that genetic engineering has shown downstream processes to be quite forgiving (5, 11) also present an opportunity to produce novel C-1027 analogs by a precursor-directed biosynthesis approach.