Understanding substrate specificity of polyketide synthase modules by generating hybrid multimodular synthases.

Modular polyketide biosynthesis can be harnessed to generate rationally designed complex natural products through bioengineering. A detailed understanding of the features that govern transfer and processing of polyketide biosynthetic intermediates is crucial to successfully engineer new polyketide pathways. Previous studies have shown that substrate stereochemistry and protein-protein interactions between polyketide synthase modules are both important factors in this process. Here we investigated the substrate tolerance of different polyketide modules and assessed the relative importance of inter-module chain transfer versus chain elongation activity of some of these modules. By constructing a variety of hybrid modular polyketide synthase systems and assaying their ability to generate polyketide products, it was determined that the substrate tolerance of each individual ketosynthase domain is an important parameter for the successful recombination of polyketide synthase modules. Surprisingly, however, failure by a module to process a candidate substrate was not due to its inability to bind to it. Rather, it appeared to result from a blockage in carbon-carbon bond formation, suggesting that proper orientation of the initially formed acyl thioester in the ketosynthase active site was important for the enzyme-catalyzed decarboxylative condensation reaction.

Modular polyketide biosynthetic pathways are usually made up of several large polyketide synthase proteins, each containing up to 20 catalytically active domains. All the domains required for the addition and tailoring of a single acetate or propionate unit onto the growing polyketide chain are grouped together to form a module (Fig. 1). Each module must receive the incoming acyl chain from the previous module (onto its ketosynthase domain) and load the correct extender unit (onto the acyl carrier protein domain via the acyltransferase domain). The extender unit is then added onto the growing acyl chain in a ketosynthase-catalyzed reaction, and the product is reductively tailored by optional ketoreductase, dehydratase, and enoylreductase domains. Finally the module must transfer the acyl chain to the next module for further processing (7,8).
The modular arrangement of these biosynthetic pathways facilitates evolutionary change (9), accounting for the large degree of structural and functional variation found in the small molecule polyketide products. Recombination of the order, type, and number of modules in a given biosynthetic pathway presumably maintains the core function of each module while allowing for change in the structure and function of the final products. The adaptive but robust nature of modular PKSs suggests that they can be harnessed to generate rationally designed complex polyketides through bioengineering (10-12).
To rationally design modular PKS pathways it is necessary to understand what features govern processing of the growing polyketide chain. Previous studies (13-16) have shown that the first step, transfer of the growing polyketide chain, is promoted by linker sequences flanking each module. These inter-modular linkers facilitate chain transfer between adjacent covalently or non-covalently associated modules. The specificity of the ketosynthase (KS) domains for the stereochemistry of the incoming acyl chain (15, 17-19) is also important in determining module specificity and tolerance.
Understanding of these inter-module and enzyme-substrate interactions has primarily emerged from analysis of the 6-deoxyerythronolide B synthase (DEBS) (15, 17, 18). To test the generality of these observations, we have evaluated the ability of modules from three different PKS pathways to tolerate more complex substrates. We selected a panel of modules from the erythromycin (1, 2), rifamycin (3, 4), and picromycin polyketide synthase pathways (20, 21) (Fig. 2) and assayed their ability to accept and process di-, tri-, and pentaketides. For a subset of these modules, we also investigated the relative importance of chain transfer versus chain elongation in the processing of a specific radiolabeled substrate.
ARC, Inc. (2S,3R)-[1-14C]2-Methyl-3-hydroxypentanoic acid N-acetylcysteamine thioester (14C-labeled product 4, 55.0 mCi/mmol) was obtained by custom synthesis from Amersham Biosciences. All other chemicals were purchased from Sigma. DNA manipulations were performed in Escherichia coli XL1 Blue (Stratagene) using standard procedures (22). Restriction enzymes were from New England Biolabs. PCRs were carried out using Pfu polymerase (Stratagene) as recommended by the manufacturer.
Engineering of Module Cassettes and Module Nomenclature-By using module boundaries as defined earlier (13), modular (typically BsaBI-SpeI) cassettes were engineered for a selected set of modules including modules 2, 3, 5, and 6 of DEBS (hereafter referred to as eryM2, eryM3, eryM5, and eryM6, respectively), modules 2, 3, and 6 of the picromycin synthase (hereafter referred to as picM2, picM3, and picM6, respectively), and modules 2, 5, 7, and 8 of the rifamycin synthetase (hereafter referred to as rifM2, rifM5, rifM7, and rifM8, respectively). The 5′ end of each cassette could be fused in a modular fashion to either the short N-terminal linker segment preceding eryM3 (M3N) or eryM5 (M5N), or an additional module followed by an intra-polypeptide linker (15). Similarly, the 3′ end of each cassette could be fused to either the C-terminal linker segment found after eryM2 (M2C) or eryM4 (M4C) (15) or to a thioesterase (TE) domain (eryTE or picTE) capable of catalyzing chain release.
In Vivo Production of Polyketides-E. coli BAP1 (25) was used as host for polyketide production. [1-14C]Propionic acid feeding was performed as follows. Individual transformants were inoculated into 10 ml of Luria-Bertani (LB) media in the presence of carbenicillin (50 μg/ml) and kanamycin (25 μg/ml) at 37°C and 250 rpm. Cultures were grown to mid-log phase (A600 = 0.6-0.8), cooled on ice for 10 min, and then centrifuged. The cell pellets were resuspended in 1 ml of the remaining supernatant and induced with 100 μM isopropyl-β-D-thiogalactopyranoside, and [1-14C]propionic acid was added to a final concentration of 0.27 mM. The culture was stirred for an additional 20 h at 22°C. At this point the culture was centrifuged, and 100 μl of the supernatant was extracted twice with ethyl acetate (300 μl each time). The extract was dried in vacuo and subjected to TLC analysis.
Unlabeled or [1-13C]propionic acid feeding experiments were performed as follows. A single transformant was used to start a 5-ml LB culture with carbenicillin (100 μg/ml) and kanamycin (50 μg/ml) at 37°C and 250 rpm. The starter culture was used to inoculate 200 ml of LB media at the same antibiotic concentrations as above. These cultures were grown at 250 rpm and 37°C to mid-log phase (A600 = 0.6-0.8), cooled to 4°C for 10 min, and induced with 100 μM isopropyl-β-D-thiogalactopyranoside. Unlabeled or [1-13C]propionic acid was added to a final concentration of 250 mg/liter, and the cultures were incubated at 22°C for 20 h. The sample was then centrifuged and the supernatant extracted twice with 600 ml of ethyl acetate. The sample was dried in vacuo, resuspended in CDCl3, analyzed via high pressure liquid chromatography (Beckman System Gold, C18 reverse-phase column, 4.6 mm × 25 cm; Beckman Coulter, CH3CN/H2O gradient), and analyzed by atmospheric pressure chemical ionization- or electrospray ionization-mass spectrometry (Thermo-Finnigan LCQ).
In Vitro Production of Individual Modules-Individual modules fused with a TE domain were expressed and purified as described previously (15). In vitro analysis was performed in a solution of 400 mM NaH2PO4, 1 mM EDTA, 2.5 mM DTT, 5 mM NaCl, 20% glycerol, and 1.5% Me2SO, pH 7.2, in the presence of 1 μM protein, 20 mM diketide (product 4), 800 μM [2-14C]methylmalonyl-CoA (or [2-14C]malonyl-CoA), and 4 mM NADPH. After 1 h, the reactions were quenched by the addition of ethyl acetate and vortexing. Extraction twice with ethyl acetate removed the triketide lactone product from the aqueous layer. Triketide product was detected using radio-TLC and compared with an authentic reference sample. In vitro reactions were also carried out as described above except without NADPH, and the β-keto products were compared with authentic β-keto triketide reference samples.
Detection and Characterization of Polyketide Products-Polyketides were considered to be produced if 14C-labeling experiments produced a detectable spot at the correct Rf on a radio-TLC assay. The lower limit of sensitivity for this assay is 0.1 pmol of product for the in vitro experiments. In the in vivo experiments, polyketides were considered detectable if quantification of the correct spot showed it to be >1% of the intensity of the product from DEBS1 and eryM3+TE (Fig. 3B, product 6). The polyketide products 5, 6, 8, and 11 were characterized by LC/MS and matched with authentic samples from previous studies (17, 26, 27). Both 13C-labeled and unlabeled polyketide products 9 and 12 were characterized by LC/MS. Administration of sodium [1-13C]propionate gave product 9 labeled at C-1, C-3, C-5, and C-7 consistent with incorporation of four intact propionate units and gave product 12 labeled at C-3 and C-7 consistent with incorporation of two intact propionate units and decarboxylated product.
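The in vivo detection criterion above can be expressed as a simple comparison. The following is only a sketch: the function name, argument names, and arbitrary intensity units are hypothetical, and only the 1% cutoff relative to the DEBS1 and eryM3+TE reference product comes from the text.

```python
def is_detected(spot_intensity: float,
                reference_intensity: float,
                threshold: float = 0.01) -> bool:
    """Return True if a radio-TLC spot counts as detectable.

    spot_intensity and reference_intensity are quantified spot
    intensities in the same (arbitrary) units. The reference is the
    product of DEBS1 and eryM3+TE; threshold=0.01 is the 1% cutoff.
    """
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    # Strictly greater than 1% of the reference, as stated in the text.
    return spot_intensity > threshold * reference_intensity


# Hypothetical intensities, for illustration only.
print(is_detected(5.0, 100.0))   # spot at 5% of the reference
print(is_detected(0.5, 100.0))   # spot at 0.5% of the reference
```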
Protein Labeling Experiments-Protein labeling experiments were performed by preincubating individual proteins (20 μg) for 5 min at room temperature with or without 5 mM cerulenin. Thereafter 14C-labeled product 4 was added, and the sample was incubated at room temperature for an additional 15 min. The reactions were terminated by the addition of SDS-PAGE sample buffer (5 μl), and proteins were separated by SDS-PAGE (4-15%; Bio-Rad). The gel was stained by Coomassie Brilliant Blue, and the labeling pattern was visualized by exposure to a PhosphorImager (Packard Instrument Co.) for 24 h.

RESULTS
In Vitro Production of Triketide Lactone-We tested the ability of various KS domains to accept and process a simple diketide substrate in vitro (4, Fig. 3A). Modules were fused to a thioesterase domain and a hexa-His tag and expressed in E. coli under the control of the T7 promoter. Purification by affinity chromatography afforded the individual proteins in high purity. The individual proteins were then incubated in vitro with the N-acetylcysteamine thioester of the diketide (product 4), 14C-labeled methylmalonyl-CoA (or malonyl-CoA), and NADPH and assayed for the production of radiolabeled chain elongation products. Because of the availability of authentic standards of the expected triketide products, we were able to determine with certainty in a radio-TLC based assay whether the expected compounds were produced. The high sensitivity of the radio-TLC assay also allowed us to probe for very low levels of product formation (0.1 pmol, which corresponds to a product concentration of 5 nM under typical assay conditions). We found that four modules from DEBS (eryM2, eryM3, eryM5, and eryM6), one module from the rifamycin synthetase (rifM5), and one module from the picromycin synthase (picM6) could elongate 4 as evidenced by a detectable spot at the appropriate Rf on the radio-TLC image. Our assays also showed that eryM4, rifM2, rifM7, rifM8, and picM3 could not elongate 4 since there was no detectable spot on the radio-TLC image at the appropriate Rf for the β-keto or β-hydroxy triketide lactone products.
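The stated equivalence between 0.1 pmol and 5 nM fixes the reaction volume through V = n/c. The short calculation below makes this explicit; the resulting 20 μl is inferred from those two numbers and is not a volume stated in the text, and the function name is hypothetical.

```python
import math


def implied_volume_ul(amount_pmol: float, conc_nM: float) -> float:
    """Volume (in microliters) implied by an amount and a concentration.

    V = n / c, with n converted from pmol to mol and c from nM to mol/L.
    """
    moles = amount_pmol * 1e-12        # pmol -> mol
    molar = conc_nM * 1e-9             # nM   -> mol/L
    liters = moles / molar
    return liters * 1e6                # L    -> microliters


volume = implied_volume_ul(0.1, 5.0)
print(f"implied assay volume: {volume:.1f} ul")
```

Under this reading, the detection limit corresponds to a very small fraction of the 20 mM diketide supplied in the assay.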
Dissecting Barriers to Chain Elongation by Individual Modules-We investigated the ability of selected individual modules from the above experiment to accept substrate (product 4) and a methylmalonyl extender unit. To determine whether the substrate was accepted by the protein, 1-14C-labeled product 4 was incubated in separate experiments with eryM5 and eryM6 (which catalyze elongation of 4), and rifM7 and picM3 (which do not catalyze elongation of 4). To verify that the substrate was acylating the correct active site of these multifunctional proteins, the KS domain, incubations were performed in the presence and absence of cerulenin, a selective KS inhibitor (28). SDS-PAGE autoradiography (Fig. 4, A and B) revealed that the purified modules were acylated and that cerulenin inhibited acylation. Similar observations were also made with other modules that do not catalyze elongation of 4 (data not shown). Separate incubation of the above purified proteins with either [14C]methylmalonyl-CoA or [14C]malonyl-CoA (29) showed that the proteins were selectively acylated by methylmalonyl-CoA, the predicted extender unit (data not shown).
FIG. 3. Three systems test the substrate selectivity of various KS domains for a di-, tri-, and pentaketide. Various modules (module X) were assayed with three different substrates (4, 7, and 10). All of the substrates were intermediates in the biosynthesis of erythromycin (see Fig. 2A). Six products are expected in A, five in B, and only one in C based on the native activity of the modules (see Fig. 2). Only the experimentally observed products are shown. PKS proteins DEBS1 and DEBS2 were used to generate substrates 7 and 10 in situ and in vivo (B and C). Assays were performed using 14C-labeled extender units, and product formation was monitored by radio-TLC analysis. Products 5, 6, 8, and 11 were characterized by comparison to authentic standards. Compound 9 was characterized by LC/MS analysis since an authentic standard was not available. For modules tested and results see Table I.
In Vivo Tolerance of Modules for Triketide Substrates-We also tested the ability of various KS domains to accept and process a triketide substrate (7, Fig. 3B) by developing a two-plasmid in vivo expression system. Because the synthetic triketide N-acetylcysteamine thioester spontaneously cyclizes, we constructed a system that generated the triketide intermediate in situ. By using the expression construct for the protein DEBS1, we were able to generate the desired triketide as an acyl-enzyme intermediate (product 7). We then created a set of plasmids with a complementary resistance marker to the above construct. This set of nine plasmids contained individual modules fused to a TE domain (Fig. 3B). To facilitate the transfer of the acyl intermediate between the two proteins, the native C terminus of DEBS1 and the N terminus of DEBS2 were incorporated into our two-plasmid system.
By co-expressing the DEBS1 expression plasmid with a plasmid containing either eryM3 or picM6, we observed by radio-TLC and LC/MS the expected tetraketide product 8, which matched an authentic sample from previous studies (26). For the reaction of DEBS1 with eryM6 and rifM5, the expected tetraketide product 9 was isolated and characterized by radio-TLC and LC/MS. When [1-13C]propionate was fed to these systems, the mass of the product increased as expected by four m/z units, confirming the tetraketide character of the compound. When eryM5, rifM2, rifM7, rifM8, and picM3 were assayed by co-expression with DEBS1 and [14C]propionate feeding, the expected products could not be detected by radio-TLC analysis (<1% as active as eryM3).
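The four-unit mass increase follows from one 13C label per propionate-derived unit. As a rough check, the sketch below computes the expected shift from the 13C/12C mass difference (standard isotope masses, rounded); the assumption of a singly charged ion and the function name are ours, not from the text.

```python
# Standard isotope masses in Da, rounded; 12C is exactly 12 by definition.
MASS_C12 = 12.000000
MASS_C13 = 13.003355


def label_shift(n_labels: int, charge: int = 1) -> float:
    """Expected m/z shift when n_labels carbons are 13C instead of 12C.

    Assumes full enrichment at each labeled position and an ion of the
    given charge (singly charged by default, which is an assumption).
    """
    return n_labels * (MASS_C13 - MASS_C12) / charge


# A tetraketide built from four [1-13C]propionate-derived units.
shift = label_shift(4)
print(f"expected shift: +{shift:.3f} m/z (nominally +{round(shift)})")
```

By the same reasoning, a product carrying two labeled propionate units would be expected to shift by a nominal two m/z units.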
In Vivo Tolerance of Modules for Pentaketide Substrates-We further extended our in vivo system by testing individual modules toward pentaketide substrates (10, Fig. 3C). Since pentaketide substrates present a significant challenge to synthesize chemically, we developed an in situ method for their production. We used a construct that expresses the first two proteins from erythromycin biosynthesis, DEBS1 and DEBS2. This system produced the pentaketide as an acyl-enzyme intermediate (product 10). We then constructed a small library of plasmids containing our modules of interest fused to a TE domain. To facilitate the transfer of the pentaketide intermediate between proteins, we added the N-terminal linker sequence from DEBS3 to the N terminus of each module in our plasmid library (Fig. 3). We selected the modules eryM5, eryM6, and rifM5 to study with this system because we had authentic references of the product. Moreover, the above studies showed that eryM6 and rifM5 were relatively tolerant toward unnatural substrates, whereas eryM5 was the natural recipient of the pentaketide product of DEBS1 and DEBS2. When each individual member of our library was co-expressed with DEBS1 and DEBS2, we were able to detect the formation of compound 11 by comparison to an authentic reference on radio-TLC. We were also able to confirm product formation by mass spectrometry.
In Vivo Tolerance of Modules eryM3 and picM3 for Alternative Triketides-We tested the ability of two different KS domains from eryM3 and picM3 to accept and process two different triketide substrates 7 or 12 (Fig. 3B and Fig. 5). This experiment was based on the two-plasmid system described under "In Vivo Tolerance of Modules for Triketide Substrates." From the above work we had three of the four needed expression vectors. The remaining plasmid was generated from DEBS1 by replacing eryM2 with picM2. Expression of the vector containing eryM2 produced compound 7 as an acyl-enzyme intermediate, and expression of the vector containing picM2 generated 12 as an acyl-enzyme intermediate. When eryM3 was co-expressed with the acyl-enzyme intermediate 7 we were able to detect the formation of the expected product by both radio-TLC and LC/MS analysis (Table I, entry 2B). However, when eryM3 was co-expressed with intermediate 12, none of the expected product could be detected (Fig. 5). We were also able to detect the expected product when picM3 was co-expressed with intermediate 12 (Fig. 5) but not with intermediate 7 (Table I, entry 10B). Thus eryM3 and picM3 appeared to have orthogonal recognition for intermediates 7 and 12, respectively.
Role of Linkers in Engineered Multimodular PKSs-We evaluated the importance of having appropriately matching C- and N-terminal linker sequences for the generation of polyketide products. These linker sequences are important for selective intermolecular transfer of the growing polyketide chain to the next protein (13, 14). By using selected constructs we evaluated the relative contributions of matched and mismatched linker sequences. When DEBS1 was co-expressed with eryM6+eryTE fused to the M5N linker (Fig. 3B), the titer of product 5 was substantially (~10 times) reduced relative to the analogous construct in which eryM6+eryTE was fused to the M3N linker. In contrast, when co-expressed with DEBS1 and DEBS2, substantially greater quantities of hexaketide product 11 were obtained from the former eryM6+eryTE construct relative to the latter construct.

DISCUSSION
In this study we have investigated the tolerance of individual modules from erythromycin, picromycin, and rifamycin PKSs for substrates of increasing structural complexity. Each substrate examined was a biosynthetic intermediate in the 6-deoxyerythronolide B pathway (Fig. 3, substrates 4, 7, and 10). We identified trends in the substrate specificity of individual modules based on the substitution and stereochemistry at the α and β positions. These results provide the first comparative glimpse into relative substrate tolerance of diverse modules, setting the stage for the rational design of hybrid multimodular PKSs.
TABLE I. Ability of modules to accept and process three substrates. The module X column shows which modules were tested. Columns A-C refer to the different systems shown in Fig. 3. A plus sign indicates successful production of polyketide as determined by radio-TLC, and the minus sign indicates no product detected. Modules not tested are left blank.

Entry    Module X    Product observed (Fig. 3)
         picM6       +    +

Our initial experiments assayed the ability of various purified modules to accept and elongate a diketide substrate (4) in vitro. Whereas some modules successfully generated the desired triketide products, others did not. The recombinant proteins were not misfolded or inactivated during purification since all of them were able to discriminate between malonyl- and methylmalonyl-CoA in labeling studies. We therefore interpret the absence of product to indicate intolerance of these modules to either accept or elongate substrate 4.
Substrate selectivity can be achieved through three different mechanisms: (i) a barrier against chain transfer to the KS domain, (ii) inability of the KS to catalyze decarboxylative chain elongation, or (iii) inability of the TE domain to release the product of chain elongation. Since the TE domain has been shown to release equivalent reduced or unreduced products in control experiments with other modules, we can rule out blockage at this step of the catalytic cycle. Unexpectedly, our labeling studies have shown that modules incapable of producing triketide products can be acylated by diketide (4) at the active site cysteine of their KS domains. Thus, their failure to process the substrate lies not in the chain transfer step but in the condensation reaction (29,30). We speculate that either the electrophilic acyl chain binds in a different binding pocket during the acylation step as compared with the condensation step, or more likely that the requirements for accurate positioning of the electrophile are more stringent during catalysis of the decarboxylative condensation reaction as opposed to the inter-module acyl transfer reaction.
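The process of elimination used in this argument can be written out as a small sketch. The function and label names below are hypothetical; the three candidate barriers and the two observations that exclude them (KS acylation by the diketide, and TE competence in control experiments) are taken from the paragraph above.

```python
def remaining_barriers(ks_acylated: bool, te_competent: bool) -> set:
    """Return the candidate barriers not excluded by the observations.

    ks_acylated:  labeling shows the diketide acylates the KS domain,
                  which excludes a block at chain transfer (i).
    te_competent: control experiments show the TE releases comparable
                  products, which excludes a block at release (iii).
    """
    candidates = {"chain_transfer", "condensation", "release"}
    if ks_acylated:
        candidates.discard("chain_transfer")
    if te_competent:
        candidates.discard("release")
    return candidates


# Observations reported for non-elongating modules such as rifM7 and picM3.
print(remaining_barriers(ks_acylated=True, te_competent=True))
```

With both observations in hand, only the condensation step remains, which is the conclusion drawn in the text; if either observation were missing, more than one candidate would survive.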
Comparison of the ability of differing modules to accept and process the three substrates studied (Table I) provides us with some insight into the substrate selectivity of these different modules. Based on these results, we can propose a tentative relationship between the native substrate a module processes and the range of non-native substrates that it will accept and elongate. We have studied previously the substrate tolerance of individual modules from erythromycin biosynthesis for different diketide substrates (15, 17-19). These studies found that the relative stereochemistry of the native substrate at the α and β positions needs to be conserved for the substrate to be processed. We identified a similar trend. Most of our substrates contained syn α-methyl-β-hydroxy substituents. These substrates were uniformly accepted by modules whose native substrates contain syn α-methyl-β-hydroxy substituents, regardless of their absolute stereochemical configuration (Table I, entries 1A, 2A and B, 5A-C, and 11A and B). Also, modules whose native substrate possessed an anti stereochemical arrangement did not accept the syn substrates used in our study (Table I, entries 9A and B). There was, however, an exception to this trend. rifM7, whose native substrate contains syn α-methyl-β-hydroxy substituents, would not process any of the substrates we tested (Table I, entries 8A and B). We also observed that modules whose native substrates contain no β-substituent and an α-methyl group were highly likely to accept and process our substrates (Table I).
In addition to stereochemical preferences, our results suggest that maintaining the same level of hybridization as the native substrate is important for the KS domain to accept and process non-native substrates. Modules whose native substrates contain sp2-hybridized carbons (β-keto or α,β-unsaturated) would not process our sp3-hybridized substrates (entries 3A and 10A and B).
To test this hypothesis further, we examined two tetraketide-forming systems. In these systems triketide substrates with sp2- (α,β-unsaturated) and sp3- (syn α-methyl-β-hydroxy) hybridized centers were assayed to determine whether they could be accepted and elongated by eryM3 or picM3. As we anticipated, eryM3, whose native substrate is sp3-hybridized, could not accept and process the sp2-hybridized substrate but could accept and process the sp3-hybridized substrate. Similarly, picM3, whose native substrate is sp2-hybridized, could not accept and process the sp3-hybridized substrate; however, it could accept and process the sp2-hybridized substrate. Thus the inability of eryM3 to process the sp2-hybridized substrate and picM3 to process the sp3-hybridized substrate could be the result of substrate selectivity exerted by the KS domain. These results are also consistent with the work of Reynolds and co-workers (31), where a triketide, rather than the expected 12- and 14-membered macrolides, was produced when PikA1 was replaced by DEBS1.
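The hybridization-matching hypothesis for this pair of modules can be sketched as a lookup and compared against the four reported outcomes. This is only an illustration of the rule as stated, not a general predictor: the dictionary and function names are hypothetical, the assignment of substrate 7 as sp3 and substrate 12 as sp2 is inferred from the descriptions above, and exceptions such as rifM7 show that matching alone does not guarantee elongation.

```python
# Hybridization of the native substrate for each module (from the text).
NATIVE_HYBRIDIZATION = {"eryM3": "sp3", "picM3": "sp2"}

# Hybridization of each test triketide (inferred: 7 from eryM2, 12 from picM2).
SUBSTRATE_HYBRIDIZATION = {"7": "sp3", "12": "sp2"}

# Outcomes reported for the two tetraketide-forming systems.
OBSERVED = {
    ("eryM3", "7"): True,
    ("eryM3", "12"): False,
    ("picM3", "7"): False,
    ("picM3", "12"): True,
}


def predicted_to_elongate(module: str, substrate: str) -> bool:
    """Predict elongation when substrate and native hybridization match."""
    return NATIVE_HYBRIDIZATION[module] == SUBSTRATE_HYBRIDIZATION[substrate]


for (module, substrate), seen in OBSERVED.items():
    predicted = predicted_to_elongate(module, substrate)
    status = "agrees" if predicted == seen else "disagrees"
    print(f"{module} + substrate {substrate}: predicted {predicted}, {status}")
```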
An important aspect of designing hybrid modular polyketide systems is ensuring the correct protein-protein interactions such that the growing acyl chain is transferred and processed to provide the correct final product. Acyl chain transfer has been shown to be facilitated by the C- and N-terminal regions of these multifunctional proteins (13-16). The C-terminal peptide sequence of one PKS subunit binds to the N-terminal region of the next subunit in the multienzyme complex (32). In the experiments presented here, we use C and N termini that are known to interact and successfully create functional hybrid multienzyme PKS systems. When C- and N-terminal regions that do not interact are used, a substantial decrease in the efficiency of the hybrid PKS systems is observed. These results reinforce the importance of maintaining protein-protein interactions for optimal performance of hybrid PKS systems. However, the growing acyl chain can still be transferred to the next PKS protein, even in systems where the C- and N-terminal regions do not interact. This observation further supports our hypothesis that the substrate specificity of the KS domains is the critical determinant for generating functional hybrid PKS systems.
In conclusion, combinatorial shuffling of PKS modules to generate diverse polyketide products requires not only an understanding of the inter-module acyl chain transfer requirements but also an understanding of the intrinsic substrate specificity of the individual modules themselves. Understanding the mechanistic and structural basis of substrate specificity will allow us to engineer broad substrate tolerance and simplify module shuffling based on biosynthetic engineering.