Domain Analysis of the Molecular Recognition Features of Aromatic Polyketide Synthase Subunits*

Bacterial aromatic polyketide synthases (PKSs) are a family of homologous multienzyme assemblies that catalyze the biosynthesis of numerous polyfunctional aromatic natural products. In the absence of direct insights into their structures, the use of gene fusions can be a powerful tool for understanding the structural basis for their properties. A series of truncated and hybrid proteins were constructed and analyzed within a family of PKS subunits, designated aromatases/cyclases (ARO/CYCs). When expressed alone, neither the N-terminal nor the C-terminal domain of the actinorhodin (act) or the griseusin (gris) ARO/CYC exhibited substantial aromatase activity. However, in the presence of each other, the half proteins were active. Furthermore, analysis of a set of hybrid proteins derived from the act and gris ARO/CYCs allowed us to localize the chain length dependence of this aromatase activity to their N-terminal domains. Unexpectedly, however, when the C-terminal domain of the gris ARO/CYC was expressed in a context where aromatase activity was absent, it could modulate the chain length specificity of the tetracenomycin (tcm) minimal PKS, leading to the formation of a novel 18-carbon product in addition to the expected 20-carbon one. It was also found that monodomain ARO/CYCs such as tcmN cannot substitute for the N-terminal domain of didomain ARO/CYCs, even though they exhibit high sequence similarity with the N-terminal domain. Together, these results illustrate the utility of protein engineering approaches for dissecting the structure-function relationships of PKS subunits and for the generation of mutant alleles with novel biosynthetic properties.

have demonstrated that PKSs are structurally and mechanistically related to fatty acid synthases. Both classes of synthases are multifunctional enzymes that catalyze repeated decarboxylative condensations between acylthioesters (usually acetyl, propionyl, malonyl, or methylmalonyl). Unlike typical fatty acid synthases, PKSs introduce structural variability into the product by varying the extent of a reductive cycle comprising a ketoreduction, dehydration, and enoylreduction on each β-keto group of the polyketide chain. Furthermore, PKSs also control chain folding by catalyzing one or more regiospecific cyclizations in the nascent polyketide chain (1-7).
Bacterial aromatic PKSs (2,8) produce a broad range of polycyclic aromatic natural products such as the carbon chain precursors of doxorubicin and tetracycline. Inspired by the structural diversity and pharmaceutical relevance of these products, the molecular recognition features of PKSs have been targets of intensive analysis and manipulation via genetic engineering. The recent development of a host vector system in Streptomyces coelicolor enabled the efficient construction and expression of recombinant PKSs (9) and provided a means to decipher the function(s) and specificity of each subunit (9-18). In the process, a series of heuristics have been proposed and used to engineer recombinant PKS gene clusters to generate novel polyketides in a predictive manner (17,19). Unfortunately, even with this design capability, our understanding of the mechanisms by which aromatic PKSs control their molecular recognition features is extremely limited and remains hampered by the absence of structural information on these protein complexes. The situation could potentially be ameliorated by taking advantage of the existence of numerous allelic forms of every PKS subunit, each of which shares a high degree of sequence similarity with the others (1). Here we used sequence comparisons to design and study a series of truncated and fusion proteins.
Each aromatic PKS (for example, see Fig. 1) contains a set of three essential subunits referred to as the minimal PKS (15). In addition, most aromatic PKSs also include auxiliary subunits such as a ketoreductase (KR), an aromatase/cyclase (ARO/CYC), and other cyclases. ARO/CYCs are a particularly interesting family of subunits, since they exhibit the following features: (a) They occur in two architectural forms. Members of one subset, hereafter referred to as didomain ARO/CYCs, consist of ~300 amino acid residues; whereas members of the other subset, the monodomain ARO/CYCs, consist of ~150 residues. The N-terminal halves of didomain ARO/CYCs exhibit a high degree of sequence similarity to the monodomain proteins (20,21) and to a lesser extent to their C-terminal halves (22) (Fig. 3). (b) Didomain ARO/CYCs catalyze the aromatization of the first six-membered carbocyclic ring derived from polyketide

* This work was supported by the Stanford-National Institutes of Health Graduate Training Program in Biotechnology (to R. J. X. Z.) and by Grant MCB-9417419 from the National Science Foundation, a National Science Foundation Young Investigator Award, and a David and Lucile Packard Fellowship in Science and Engineering (to C. K.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

In an attempt to gain insights into the structural basis for these multifarious properties of the ARO/CYC family of proteins, truncated and fusion proteins were constructed and analyzed.

EXPERIMENTAL PROCEDURES
Bacterial Strains, Culture Conditions, and DNA Manipulations-DNA manipulations were performed using standard in vitro techniques with Escherichia coli XL1-Blue as a host organism. Expression plasmids were passaged through E. coli ET12567 (dam, dcm, hsdS, Cmr) (24) to generate unmethylated DNA for introduction into S. coelicolor CH999 by transformation (9). E. coli strains were grown under standard conditions (25), and S. coelicolor strains were grown on R2YE agar plates (26).
Construction of Genes Encoding Hybrid ARO/CYCs-Plasmids pRZ50, pRZ51, pRZ52, pRZ54, pRZ56, pRZ60, pRZ64, and pRZ67 (see Tables I, II, and III) are derivatives of pSEK21 (15). Plasmids pRZ53, pRZ55, pRZ57, pRZ61, pRZ65, pRZ71, pRZ73, pRZ74, and pRZ77-80 (see Tables I, II, and III) are derivatives of pSEK23 (15). The 5′ and 3′ halves of the act and gris ARO/CYC genes were amplified separately via polymerase chain reaction. Based on sequence similarities (Fig. 2), the N-terminal domain for the act ARO/CYC was defined to consist of residues 1-145; whereas the C-terminal domain was defined to consist of residues 146-317. Likewise, the N-terminal domain of the gris ARO/CYC consisted of residues 1-148, and the C-terminal domain consisted of residues 149-320. The monodomain portion of tcmN was defined as residues 1-169, as in an earlier study (21). All segments were amplified with the following designed sequences (5′ to 3′): a PstI restriction site, a ribosome binding site (AGGAGGA), an NdeI restriction site overlapping the start codon, the natural coding sequence, a stop codon, and an EcoRI restriction site. 5′ segments also included an NdeI restriction site immediately upstream of the stop codon. Thus, each fragment represented a gene by itself. In addition, wild-type or hybrid genes could be constructed by fusing different 5′ and 3′ fragments together at the NdeI site. In turn, these genes were inserted into either pSEK21 (which carries the act minimal PKS and the act ketoreductase) or pSEK23 (which carries the tcm minimal PKS and the act ketoreductase) immediately downstream of the minimal PKS gene set as described previously (15).

FIG. 1. Proposed biosynthetic pathway catalyzed by the actinorhodin (act) PKS (biosynthetic intermediates are purely hypothetical).
After biosynthesis of the full polyketide chain by the "minimal" PKS, which includes a ketosynthase/putative acyl transferase (KS/AT), chain length factor (CLF), and acyl carrier protein (ACP), the nascent octaketide chain undergoes ketoreduction (catalyzed by the KR), aromatization of the first ring (catalyzed by the didomain ARO/CYC), and a second cyclization (catalyzed by the second ring cyclase (CYC2)). In the absence of some of these subunits, shunt products are produced.

FIG. 2. Function of the didomain ARO/CYCs.
Both the actinorhodin (act) and griseusin (gris) didomain ARO/CYCs are essential for the aromatization of the first ring of polyketides, which have been reduced at the C-9 carbonyl.
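The cassette layout described under "Construction of Genes Encoding Hybrid ARO/CYCs" can be sketched as simple string assembly. This is only an illustrative model, not the primer design actually used: the recognition sequences are the standard ones for PstI, NdeI, and EcoRI, but the choice of stop codon, the absence of spacer nucleotides, the placeholder coding sequences, and the function names `build_cassette` and `fuse_at_ndei` are all assumptions introduced here.

```python
# Standard recognition sequences for the enzymes named in the text.
PSTI = "CTGCAG"
NDEI = "CATATG"    # the ATG inside the site doubles as the start codon
ECORI = "GAATTC"
RBS = "AGGAGGA"    # ribosome binding site quoted in the text
STOP = "TGA"       # assumption: the text does not say which stop codon was used


def build_cassette(coding_after_atg: str, five_prime_half: bool) -> str:
    """Assemble PstI - RBS - NdeI(ATG) - coding - [NdeI] - stop - EcoRI.

    Only 5' halves carry the extra NdeI site upstream of the stop codon.
    Spacer nucleotides between elements are omitted for simplicity.
    """
    parts = [PSTI, RBS, NDEI, coding_after_atg]
    if five_prime_half:
        parts.append(NDEI)
    parts += [STOP, ECORI]
    return "".join(parts)


def fuse_at_ndei(five_prime: str, three_prime: str) -> str:
    """Model a fusion at the NdeI site by joining the two cassettes there."""
    cut5 = five_prime.rfind(NDEI)   # NdeI site just upstream of the stop codon
    cut3 = three_prime.find(NDEI)   # NdeI site at the start codon
    return five_prime[:cut5] + three_prime[cut3:]


# Hypothetical placeholder coding sequences (three Ala and three Gly codons).
n_half = build_cassette("GCTGCTGCT", five_prime_half=True)
c_half = build_cassette("GGTGGTGGT", five_prime_half=False)
hybrid = fuse_at_ndei(n_half, c_half)
print(hybrid)
```

In this model each half cassette is a complete gene on its own, and the fused gene keeps a single CATATG at the junction, which reads in frame as a His-Met pair between the two domains when the upstream coding sequence is a whole number of codons.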

Verification of the Didomain Structure of the act and gris ARO/CYC-To evaluate the function of the N- and C-terminal halves of the didomain ARO/CYCs and to test whether they do indeed fold into separate domains, recombinant gene clusters were constructed in which the putative domains of the act and gris ARO/CYCs were expressed as separate polypeptides in conjunction with either the act (octaketide producer) or tcm (decaketide producer) minimal PKS and the act KR (Table I).
Earlier studies have established that the presence of SEK34 (2) or SEK43 (6) is indicative of first ring aromatase activity in a 16- or 20-carbon polyketide-producing strain, respectively (16,17). Neither the N-terminal (CH999/pRZ50, pRZ77, and pRZ78) nor the C-terminal (CH999/pRZ51, pRZ73, and pRZ74) domains of the act or gris ARO/CYC showed substantial aromatase activity when expressed alone. The extract from CH999/pRZ50 (N-terminal domain of act ARO/CYC) exhibited an HPLC peak characteristic of SEK34, but the product yields were inadequate for NMR analysis. An in vitro assay of this strain with 14C-labeled malonyl-CoA also supported the production of SEK34. 2 (Unexpectedly, the gris C-terminal domain alone led to the formation of a novel product, RZ53 (7) (see below).) However, when the N- and C-terminal domains were co-expressed as separate polypeptides (CH999/pRZ64, pRZ79, and pRZ80), SEK34 (2) was produced by CH999/pRZ64 and pRZ79, whereas SEK43 (6) was produced by CH999/pRZ80, suggesting that the domains can associate productively with themselves and/or the PKS complex. It should be noted that the biosynthesis of comparable amounts of mutactin by CH999/pRZ64 and RM20b by CH999/pRZ80 implies that domain disconnection leads to only partial aromatase activity in these two strains.
Dissecting Chain Length Hierarchy Using Hybrid Didomain ARO/CYCs-As summarized above, didomain ARO/CYCs exhibit an interesting hierarchy with respect to their recognition of polyketide chain length. To dissect the structural basis for this property, a set of recombinant PKS gene clusters containing hybrid genes derived from the act and gris ARO/CYCs was constructed. The results described in Table II show that ARO/CYC proteins containing the N-terminal domain of the gris subunit (CH999/pRZ54, pRZ55, and pRZ61) catalyzed the aromatization of both octaketide and decaketide substrates. In contrast, those containing the N-terminal domain of the act subunit (CH999/pRZ52, pRZ53, and pRZ71) recognized only octaketide substrates. Thus, chain length recognition appears to reside exclusively within the N-terminal domain of didomain ARO/CYCs.
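The conclusion of the hybrid analysis can be restated as a small rule: whether a didomain hybrid aromatizes a given chain depends only on the origin of its N-terminal domain. The sketch below encodes just the outcomes stated in the text; the names `RECOGNIZED_BY_N_DOMAIN` and `aromatizes` are hypothetical, and the rule is a summary of these observations rather than a predictive model.

```python
# Acetate units per chain; each unit contributes two backbone carbons.
ACETATE_UNITS = {"octaketide": 8, "decaketide": 10}

# Chain lengths (in acetate units) aromatized, keyed by the origin of the
# N-terminal domain, as stated in the text for the Table II hybrids.
RECOGNIZED_BY_N_DOMAIN = {"act": {8}, "gris": {8, 10}}


def aromatizes(n_domain: str, c_domain: str, substrate: str) -> bool:
    """Return True if a hybrid with this N-terminal domain aromatizes the chain.

    c_domain is accepted for completeness but deliberately ignored, which
    mirrors the observation that chain length recognition tracks the
    N-terminal domain only.
    """
    return ACETATE_UNITS[substrate] in RECOGNIZED_BY_N_DOMAIN[n_domain]


for n_dom, c_dom in [("act", "gris"), ("gris", "act")]:
    for chain in ACETATE_UNITS:
        carbons = 2 * ACETATE_UNITS[chain]
        print(n_dom, c_dom, chain, carbons, aromatizes(n_dom, c_dom, chain))
```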
Replacement of the N-terminal Domain of Didomain ARO/CYCs with a Monodomain ARO/CYC-As mentioned above, the N-terminal domains of the didomain ARO/CYCs exhibit significant sequence similarity to the monodomain ARO/CYCs, which suggested that the monodomain ARO/CYC may be able to replace the N-terminal domain. To investigate this hypothesis, hybrid proteins where the monodomain ARO/CYC portion of the tcmN protein was inserted in place of the N-terminal domain of the act and gris ARO/CYCs were constructed. As summarized in Table III,

Mutactin (1) is produced by the actinorhodin (act) "minimal" PKS plus KR (27). SEK34 (2) is generated when the act minimal PKS is expressed along with the KR and either the act or griseusin (gris) didomain ARO/CYC (16). RM20 (3), RM20b (4), and RM20c (5) are produced by the minimal tetracenomycin (tcm) PKS plus KR (9,12). (RM20b (4) and RM20c (5) are stereoisomers with opposite configurations at C-7.) SEK43 (6) is produced by the minimal tcm PKS plus KR and gris didomain ARO/CYC (17). RZ53 (7) is produced by the minimal tcm PKS in conjunction with the KR and any hybrid ARO/CYC (other than the natural gris ARO/CYC), which includes either the gris C-terminal domain or the act ARO/CYC.

TABLE I act and gris ARO/CYC domains expressed as separate proteins
Polyketides produced by combinations of the N- and C-terminal domains of the actinorhodin (act) and griseusin (gris) didomain ARO/CYC expressed as separate polypeptides with either the act or tcm minimal PKS and KR proteins. The minimal PKS consists of the ketosynthase/putative acyltransferase and chain length factor from the gene cluster indicated and the acyl carrier protein from the act PKS. In the context of the act minimal PKS, mutactin (1) production signifies a nonfunctional ARO/CYC (15); whereas SEK34 (2) production signifies a functional ARO/CYC (16). In the context of the tcm minimal PKS, production of RM20 (3), RM20b (4), and RM20c (5) signifies a nonfunctional ARO/CYC (12), SEK43 (6) production signifies a functional ARO/CYC (17), and RZ53 (7) production signifies that the ARO/CYC domain(s) influence the chain length of the polyketide backbone.

Length-During the course of the above studies it was observed that, although the C-terminal domain of the gris ARO/CYC could not catalyze first ring aromatization by itself, its coexpression with the tcm minimal PKS and the act ketoreductase (CH999/pRZ74) led to the production of a relatively abundant new metabolite that was not formed in the presence of the gris N-terminal domain alone (CH999/pRZ78) or the act N- or C-terminal domains alone (CH999/pRZ50 and pRZ51, respectively). Indeed, production of this metabolite was also observed in strains containing the act/gris and tcm/gris hybrid proteins (CH999/pRZ53 and pRZ65, respectively). Likewise, because this metabolite was also produced by CH999/pRZ80, which contains partial aromatase activity, but not in CH999/pRZ61, which exhibits full aromatase activity, its production is the result of the presence of the gris C-terminal domain in a context where it is unable to participate in first ring aromatase activity.
(Upon more sensitive analysis, this new metabolite was also observed in strains CH999/pRZ71 and pRZ57 (>1 wt% of total polyketide products); however, its production levels in these strains were more than an order of magnitude lower than the production levels in their counterpart strains, CH999/pRZ53 and pRZ65 (~10 wt% of total polyketide products), respectively.) The structure of this new metabolite, designated RZ53 (7) (Fig. 4), was solved using a combination of NMR (Table IV), mass spectroscopy, and isotope labeling analysis. The chemical shifts were remarkably similar to those of corresponding atoms in RM20 (3) (9) except for the absence of a methylene and a carbonyl signal. Sodium [1,2-13C]acetate feeding experiments confirmed that the carbon chain of RZ53 was derived from nine acetate units. The coupling constants calculated from the 13C NMR spectrum of the enriched RZ53 sample also facilitated peak assignment. High resolution fast atom bombardment gave a molecular weight of 282.0897, which is consistent with C17H14O4 (MW 282.0892). Thus, RZ53, a novel "unnatural" natural product, is the first known nonaketide to be produced by the tcm PKS.
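The agreement between the measured mass and the proposed formula can be checked with a short calculation. The sketch below sums standard monoisotopic masses for C17H14O4 and reports the deviation of the observed value in parts per million; the helper names `monoisotopic_mass` and `ppm_error` are introduced here for illustration, and the comparison treats the reported value as a neutral monoisotopic mass, which is an assumption about how the number was reported.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes: 12C, 1H, 16O.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C17H14O4'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(observed: float, calculated: float) -> float:
    """Relative deviation of the observed mass in parts per million."""
    return (observed - calculated) / calculated * 1e6


calculated = monoisotopic_mass("C17H14O4")
print(f"{calculated:.4f}")                      # 282.0892
print(f"{ppm_error(282.0897, calculated):.1f}")  # 1.7
```

A deviation of under 2 ppm is well within what is usually accepted as support for an elemental composition from a high resolution measurement.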

DISCUSSION
In the absence of direct structural information, some insight into the structural basis of several properties of the didomain ARO/CYC has been gained via domain truncation and replacement. Our studies demonstrate that the N-terminal domain is responsible for the variability observed in the polyketide chain length recognition by the ARO/CYC. This variability could either arise due to differences in the substrate binding pockets of individual subunits, or it may reflect differential affinities for the minimal PKS subunits themselves. Given the availability of cell-free systems for polyketide biosynthesis (28,29), further in vitro analysis could shed light on this question.
In the course of these studies, we also obtained structural insights into yet another interesting attribute of these enzymes, their ability to influence the chain length specificity of the minimal PKS within a window of one acetate unit. Specifically, we have established that it is the C-terminal domain of the didomain ARO/CYC that is responsible for this trait. This is not the first report of auxiliary subunits influencing the chain length specificity of the minimal PKS. Recent studies of the frenolicin (fren) PKS showed that both the act KR and tcmN can modulate the relative distribution of 16-carbon and 18-carbon polyketide products (19). A similar phenomenon has also been reported in the case of the whiE PKS (30), which produces 22-carbon and 24-carbon backbones. However, until now it has been assumed that these observations are peculiar to PKSs that exhibit relaxed chain length specificity in nature, because at least the frenolicin producer, S. roseofulvus, is known to produce both 16- and 18-carbon natural products with very similar structures (31,32). In contrast, neither the naturally occurring PKS from S. glaucescens nor any engineered PKS containing the core tcm subunits has yielded an 18-carbon product thus far. Our new findings therefore provide the clearest evidence that certain auxiliary PKS subunits can not only control the folding, reduction, and aromatization of the nascent chain but can also secondarily influence the chain length of the final product by plus or minus one acetate unit.

TABLE II act and gris ARO/CYC hybrids
Polyketides produced by various combinations of hybrid didomain ARO/CYCs between the actinorhodin (act) and griseusin (gris) didomain ARO/CYCs and either the act or tetracenomycin (tcm) minimal polyketide synthase proteins and act KR. For hybrids combined with the act minimal PKS plus KR, mutactin (1) production indicates a nonfunctional hybrid (15); whereas SEK34 (2) production indicates a functional hybrid (16). For hybrids combined with the tcm minimal PKS plus KR, production of RM20 (3), RM20b (4), and RM20c (5) signifies a nonfunctional hybrid (12), production of SEK43 (6) represents a functional hybrid (17), and generation of RZ53 (7) signifies that the hybrid influences the chain length of the polyketide.

a Carbons are labeled according to their number in the polyketide backbone (Fig. 4).
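The way products are read as phenotypes in the legends to Tables I and II amounts to a lookup from the minimal PKS and the observed product to an interpretation. A minimal sketch of that lookup follows; the dictionary and function names are hypothetical, and the example product lists are illustrative inputs rather than data taken from the tables.

```python
# Interpretation of each product, keyed by the minimal PKS it was made with,
# following the legends to Tables I and II.
READOUT = {
    "act": {
        "mutactin": "nonfunctional ARO/CYC",
        "SEK34": "functional ARO/CYC",
    },
    "tcm": {
        "RM20": "nonfunctional ARO/CYC",
        "RM20b": "nonfunctional ARO/CYC",
        "RM20c": "nonfunctional ARO/CYC",
        "SEK43": "functional ARO/CYC",
        "RZ53": "chain length influenced",
    },
}


def interpret(minimal_pks: str, products: list[str]) -> set[str]:
    """Collect the interpretations implied by a strain's product profile."""
    return {READOUT[minimal_pks][product] for product in products}


# Illustrative profiles: a mixed profile yields more than one interpretation,
# which is how partial aromatase activity shows up in this scheme.
print(sorted(interpret("act", ["mutactin", "SEK34"])))
print(sorted(interpret("tcm", ["RM20b", "RZ53"])))
```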
The hypothesized didomain nature of the act and gris ARO/CYCs (22) was verified by expressing the proposed domains as separate proteins and demonstrating that they can associate productively to manifest the properties attributed to an intact didomain ARO/CYC. However, because neither domain exhibited significant aromatase activity when expressed alone, it appears that both domains must work in concert to aromatize the first ring. Furthermore, fusion proteins between the monodomain ARO/CYC portion of tcmN and the C-terminal domains of the act or the gris ARO/CYC disclosed that although there is good sequence homology between monodomain ARO/CYCs and N-terminal domains, monodomain ARO/CYCs are incapable of executing the role(s) of the N-terminal domain in the aromatization of the reduced first ring.
In summary, the utility of deletion and domain swapping analysis between different alleles of the aromatic PKS subunits has been demonstrated. In conjunction with in vitro studies, higher resolution gene shuffling should yield further insights into the mechanistic basis for ARO/CYC function and specificity and may even give rise to additional ARO/CYC subunits with novel properties.