An Active Enzyme Constructed from a 9-Amino Acid Alphabet*

Nature employs a set of 20 amino acids to produce a repertoire of protein structures endowed with sophisticated functions. Here, we combined design and selection to create an enzyme composed entirely from a set of only 9 amino acids that can rescue auxotrophic cells lacking chorismate mutase. The simplified protein captures key structural features of its natural counterpart but appears to be somewhat less stable and more flexible. The potential of a dramatically reduced amino acid alphabet to produce an active catalyst supports the notion that primordial enzymes may have possessed low amino acid diversity and suggests that combinatorial engineering strategies, such as the one used here, may be generally applied to create enzymes with novel structures and functions.

Natural evolution produces complex protein folds with a 20-amino acid alphabet. Primordial protein synthesis, however, is believed to have involved only a handful of amino acids (1). What is the minimal number of building blocks required for protein structure and function? Several studies have demonstrated that considerably reduced amino acid alphabets are sufficient to encode native-like proteins (2-5). For instance, a de novo designed protein constructed from a 7-amino acid alphabet adopts a well defined four-helix bundle fold (5). Nevertheless, such simplified proteins are generally devoid of function.
Simplifying existing proteins without impairing their normal function is challenging, too, given the precise identity and positioning of residues required for binding and catalysis. Selection strategies in combination with design have been successfully applied to search sequence space for active proteins (6-8). For example, phage display has been exploited to obtain functional SH3 domain variants in which 70% of the wild-type sequence was replaced with a five-letter amino acid alphabet while retaining key binding site residues (6).
In a previous study, we exploited selection in vivo to replace all secondary structure units in an AroQ chorismate mutase (CM) with simple binary-patterned modules of 4 polar and 4 apolar residues (9). However, the total number of amino acid types present in the resulting CMs was 14 since the highly conserved active site amino acids and the loop residues were held constant in the original design. Here, we have extended this work and shown that fully functional proteins can be constructed entirely from a 9-amino acid set. These represent the most severely simplified enzymes reported to date.

MATERIALS AND METHODS
Reagents-Restriction enzymes were from New England Biolabs. T4 DNA ligase was from Fermentas. Pfu Turbo polymerase was from Stratagene. Protein concentration was determined with the Coomassie Plus protein assay reagent (Pierce), using bovine serum albumin as the calibration standard.
Selection-Library plasmid pools were transformed into the KA12/pKIMP-UAUC selection strain with efficiencies greater than 10⁶ clones/μg of library DNA (10). Transformed cells were washed in minimal medium and plated onto M9c minimal medium plates (lacking Tyr and Phe). Plates were typically incubated for 3 days at 30°C or 6 days at 25°C.
Protein Production and Purification-The gene encoding the simplified 9-amino acid CM was subcloned as a 282-bp NdeI/XhoI fragment into the T7 promoter-based expression vector pET-22b-pATCH, which appends a His6 tag at the C terminus, and overexpressed in Escherichia coli strain KA13 (11). Expression from the T7 promoter was induced with isopropyl-1-thio-β-D-galactopyranoside. The enzyme was purified by affinity chromatography on nickel-nitrilotriacetic acid-agarose resin (Qiagen) followed by preparative fast protein liquid chromatography on a Superdex 75 (26/60) gel filtration column (Amersham Biosciences). MjCM was produced as described previously (12). Protein integrity was confirmed by mass spectrometry.
* This work was supported by the Swiss Federal Institute of Technology and the Schweizerischer Nationalfonds. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The on-line version of this article (available at http://www.jbc.org) contains supplemental material.
Kinetic Assays-CM activity measurements were performed at 20°C in acetate buffer (pH 5) as reported previously (12). The inhibition assay was carried out under the same conditions in the presence of a transition state analog (compound 1, 0-100 μM) (13).
H/D Exchange-H/D exchange experiments were performed as described previously (14). The number of protected labile H atoms was calculated by subtracting the observed mass from the mass of the fully deuterated protein.
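The mass bookkeeping described above can be illustrated with a minimal sketch. It is not the analysis used in reference 14: the function name and the example masses are hypothetical, and dividing by the deuterium/protium mass difference (rather than treating each site as exactly 1 Da) is an assumption added here.

```python
# Mass gained when one protium is replaced by one deuterium, in Da
# (difference of the standard isotope masses 2.014102 and 1.007825).
DEUTERIUM_SHIFT = 2.014102 - 1.007825


def protected_hydrogens(mass_fully_deuterated, mass_observed, shift=DEUTERIUM_SHIFT):
    """Estimate how many labile sites still carry H instead of D.

    Follows the subtraction described in the text (fully deuterated mass
    minus observed mass); the division by `shift` is an added assumption
    that converts the mass gap in Da into a number of sites.
    """
    return (mass_fully_deuterated - mass_observed) / shift


# Hypothetical masses in Da, chosen only to exercise the function.
example = protected_hydrogens(12150.0, 12100.0)
print(f"protected labile hydrogens: {example:.1f}")
```

Because the shift per site is close to 1 Da, the plain mass difference and the corrected value differ by less than 1%.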
CD Spectroscopy-Thermal unfolding curves of a 16 μM protein solution (in the presence and absence of 320 μM inhibitor) were monitored by CD at 222 nm, increasing the temperature from 10 to 95°C in 1°C steps.

RESULTS
CMs catalyze the rearrangement of chorismate to prephenate and are essential for the biosynthesis of the aromatic amino acids Tyr and Phe in bacteria, fungi, and plants. The binary-patterned mutase that served as our starting point is a domain-swapped homodimer (9); each subunit is composed of three helices (H1, H2, and H3) connected by two loops (L1 and L2) (Fig. 1A). With the exception of a glutamine (Gln88) and three arginines (Arg11, Arg28, Arg51) in the active site and 4 loop residues (Gly43, Pro45, His66, Val68), all its building blocks belong to the eight-letter alphabet of 4 polar (Asp, Glu, Asn, Lys) and 4 apolar (Phe, Ile, Leu, Met) residues (Fig. 2A). Arginine residues are invariant constituents in the active sites of AroQ CMs (12, 15) (Fig. 1B) and could not be replaced, but the other noncanonical catalytic and loop residues were systematically substituted to produce 9-amino acid enzymes.
First, the active site glutamine (Gln88) was replaced by the set of 4 polar amino acids using site-directed mutagenesis. The resulting library was introduced into the auxotrophic E. coli strain KA12/pKIMP-UAUC (10) and subjected to in vivo selection. Three of the four protein variants complemented the CM deficiency of host cells under selective conditions (Supplemental Table 1), although the growth rates differed significantly (Glu > Asn > Asp). The size and charge of the amino acid side chain seem to play an important role at this position. Second, we replaced His66 and Val68 in loop L2 of the active Gln88Xaa mutants with the polar and apolar amino acid subsets, respectively. Of the 48 possible combinations introduced by combinatorial mutagenesis, only 4 yielded active enzymes (Supplemental Table 1). Third, 2 residues in loop L1 (Gly43 and Pro45) in the complementing L2 variants were replaced with either the polar or the apolar set of amino acids or with both sets simultaneously. However, no complementing clones were observed after 20 days at 25°C. Consequently, all 4 L1 loop residues (Gly43-Ile44-Pro45-Ile46) were simultaneously randomized. Because the nature of the genetic code did not allow assignment of a single degenerate codon encoding the simplified 9-amino acid alphabet (Asp, Glu, Asn, Lys/Phe, Ile, Leu, Met plus Arg), a new degenerate codon DWS (D = A, G, T; W = A, T; S = C, G) was employed to construct a library encoding the 8 binary-pattern design residues plus Val, Tyr, and the amber stop codon (UAG); Arg, whose codons all carry G at the second position, is not encoded by DWS. Based on the experimental transformation efficiencies, >95% of the 58,564 possible variants were evaluated in the selection system (16); only five complemented the CM deficiency. All active clones contain Asp45, aliphatic residues at positions 44 and 46, and polar residues at position 43 (Supplemental Table 1); neither Val nor Tyr was found in any functional sequence.
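The content of the DWS library can be checked by enumerating its codons under the standard genetic code. The sketch below is illustrative only. Two readings in it are assumptions rather than statements from the text: that the 48 combinations arise as 3 active Gln88 variants x 4 polar x 4 apolar residues, and that 58,564 arises as 4 parental L2 variants x 11 outcomes at each of 4 positions. The Poisson coverage estimate, which treats all DNA sequences as equally likely, is likewise a common approximation and not necessarily the method of reference 16.

```python
import math
from itertools import product

# IUPAC degenerate bases as defined in the text for the DWS codon.
DEGENERATE = {"D": "AGT", "W": "AT", "S": "CG"}

# Standard genetic code, restricted to the codons reachable from DWS.
# "*" marks the amber stop codon TAG (UAG in mRNA).
CODON_TABLE = {
    "AAC": "Asn", "AAG": "Lys", "ATC": "Ile", "ATG": "Met",
    "GAC": "Asp", "GAG": "Glu", "GTC": "Val", "GTG": "Val",
    "TAC": "Tyr", "TAG": "*", "TTC": "Phe", "TTG": "Leu",
}

# The 9-amino acid design alphabet named in the text.
DESIGN_ALPHABET = {"Asp", "Glu", "Asn", "Lys", "Phe", "Ile", "Leu", "Met", "Arg"}


def expand(degenerate_codon):
    """Return every concrete codon matched by an IUPAC degenerate codon."""
    choices = (DEGENERATE[symbol] for symbol in degenerate_codon)
    return ["".join(bases) for bases in product(*choices)]


def transformants_for_coverage(n_sequences, coverage):
    """Clones needed so that each of n equally likely sequences is sampled
    at least once with the given probability (Poisson approximation)."""
    return math.ceil(-n_sequences * math.log(1.0 - coverage))


codons = expand("DWS")                                # 12 codons
encoded = {CODON_TABLE[codon] for codon in codons}    # 10 amino acids + stop
covered = DESIGN_ALPHABET & encoded                   # design residues reached
extras = encoded - DESIGN_ALPHABET                    # off-alphabet outcomes

# Assumed readings of the library sizes quoted in the text.
step2_combinations = 3 * 4 * 4               # 48
l1_library_size = 4 * len(encoded) ** 4      # 58564

# DNA-level library size and clones for 95% coverage (hypothetical estimate).
dna_sequences = 4 * len(codons) ** 4
clones_needed = transformants_for_coverage(dna_sequences, 0.95)

print("design residues encoded:", sorted(covered))
print("additional outcomes:", sorted(extras))
print("library sizes:", step2_combinations, l1_library_size, dna_sequences)
print("clones for 95% coverage:", clones_needed)
```

The enumeration gives eight of the nine design residues plus Val (encoded twice), Tyr and the amber stop, so each randomized position has 11 distinct outcomes.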
Despite dramatic simplification relative to natural mutases, the selected 9-amino acid enzymes are fully functional in vivo, replacing wild-type CM in bacterial metabolism (Fig. 2B). Bacterial cells encoding the simplified catalysts grew only somewhat more slowly than those with wild-type enzymes from E. coli (EcCM) and Methanococcus jannaschii (MjCM), presumably due to the Q88E active site mutation. In natural AroQ CMs, Gln88 serves to stabilize the transition state during catalysis, acting as a hydrogen bond donor (15), and replacing this residue with glutamic acid shifts the pH optimum to approximately pH 5 (17). Under physiological conditions (pH 7), glutamic acid should be largely deprotonated, which would be deleterious for substrate binding and catalysis (Fig. 1B).
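The protonation argument can be made semi-quantitative with the Henderson-Hasselbalch relation. In this rough sketch the side-chain pKa of 4.25 is an assumed textbook value for free glutamic acid; the pKa of Glu88 in the active site is not given in the text and may differ substantially.

```python
def fraction_protonated(ph, pka):
    """Fraction of a monoprotic acid in its protonated form
    (Henderson-Hasselbalch relation)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumed side-chain pKa of free glutamic acid; NOT a measured value
# for Glu88 in the simplified enzyme.
ASSUMED_GLU_PKA = 4.25

for ph in (5.0, 7.0):
    share = fraction_protonated(ph, ASSUMED_GLU_PKA)
    print(f"pH {ph:.0f}: {100 * share:.2f}% protonated")
```

Under this assumption roughly 15% of the carboxylate is protonated at pH 5 but less than 0.2% at pH 7, in line with the loss of a hydrogen bond donor under physiological conditions.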
FIGURE 1. AroQ structure and active site (15). A, the homodimeric EcCM is shown with a transition state analog inhibitor (13) bound at its active sites; the two identical polypeptide chains are colored blue and pink for clarity. B, proposed interactions between residues in the evolved active site of the simplified enzyme and the transition state analog inhibitor, compound 1 (red), based on the x-ray structure of EcCM (15). Residues Gln88 and Ser84 in EcCM are substituted with Glu88 and Asn84 in the 9-amino acid enzyme. Residue numbers are referenced to EcCM.
One of the simplified CM variants displaying high activity in vivo (Fig. 2A) was overproduced for detailed characterization. A His6 tag was attached at its C terminus to facilitate purification. The addition of this tag does not alter the behavior of the protein in vivo (Fig. 2B). The overproduced enzyme was purified by affinity chromatography (nickel-nitrilotriacetic acid) and gel filtration, typically yielding 6 mg of soluble protein/liter of liquid culture. Although the activity of the wild type remains constant over a broad pH range (pH 5-9) (12), the simplified enzyme displays optimal activity at pH 5 as a result of the Q88E mutation (Fig. 1B, Supplemental Fig. 1). Under these conditions, its kcat value (0.9 s⁻¹) is only 3-fold lower than that of MjCM (3.1 s⁻¹), whereas its Km value (830 μM) is 40-fold higher than that of the wild type (20 μM). In line with its lower specific activity, the simplified CM binds the transition state analog inhibitor (compound 1, Fig. 1B) 50-fold less tightly (Ki = 10 μM) than the wild-type enzyme (Ki = 0.21 μM).
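The fold changes quoted above can be reproduced directly from the reported constants. The sketch below reads the concentrations as micromolar and also computes the catalytic efficiency kcat/Km, a derived quantity that the text does not report; the dictionary layout and variable names are illustrative.

```python
# Kinetic constants quoted in the text (pH 5, 20 degrees C).
# kcat in s^-1; Km and Ki in micromolar (units as read here).
SIMPLIFIED = {"kcat": 0.9, "Km_uM": 830.0, "Ki_uM": 10.0}
WILD_TYPE = {"kcat": 3.1, "Km_uM": 20.0, "Ki_uM": 0.21}


def efficiency(enzyme):
    """Catalytic efficiency kcat/Km in M^-1 s^-1."""
    return enzyme["kcat"] / (enzyme["Km_uM"] * 1e-6)


kcat_fold = WILD_TYPE["kcat"] / SIMPLIFIED["kcat"]      # about 3-fold lower
km_fold = SIMPLIFIED["Km_uM"] / WILD_TYPE["Km_uM"]      # about 40-fold higher
ki_fold = SIMPLIFIED["Ki_uM"] / WILD_TYPE["Ki_uM"]      # about 50-fold weaker
efficiency_fold = efficiency(WILD_TYPE) / efficiency(SIMPLIFIED)

print(f"kcat ratio (wild type / simplified): {kcat_fold:.1f}")
print(f"Km ratio (simplified / wild type):   {km_fold:.1f}")
print(f"Ki ratio (simplified / wild type):   {ki_fold:.1f}")
print(f"kcat/Km ratio (wild type / simplified): {efficiency_fold:.0f}")
```

On these numbers most of the loss in catalytic efficiency comes from weaker substrate binding (higher Km) rather than from slower turnover.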
The simplified enzyme is highly helical as judged by far UV CD spectroscopy (Supplemental Fig. 2A) and elutes as a dimer from a size-exclusion chromatography column (Supplemental Fig. 2B) like typical AroQ mutases (12). Its ¹H NMR spectrum exhibits similar peak dispersion to wild-type MjCM (Supplemental Fig. 3, A and B) but generally contains fewer peaks due to the reduced amino acid diversity. The addition of the transition state analog inhibitor causes only a slight increase in NMR peak dispersion (Supplemental Fig. 3, C and D) and in helical content of the enzyme (Supplemental Fig. 2A). However, some of the biophysical properties differ somewhat from wild-type CMs. For instance, the minimized mutase undergoes slightly faster H/D exchange than its natural counterpart (Fig. 3A). An additional 10-14 backbone amides (out of ~100) appear to exchange more rapidly in the simplified enzyme, suggesting greater fraying of its helices (18). Addition of the transition state analog tightens both structures to a similar extent (≈3-5%) (14). Chemical denaturation studies showed that the minimized protein is less stable (ΔG_U(H2O) = 9.5 kcal mol⁻¹) than MjCM (ΔG_U(H2O) = 24.0 kcal mol⁻¹) (12). Its thermal unfolding is reversible but noncooperative, although the addition of inhibitor induces modest cooperativity (Tm = 55°C) (Fig. 3B). For comparison, the wild type unfolds with a high degree of cooperativity, even in the absence of ligand (Tm = 88°C) (12). Finally, the protein binds the hydrophobic dye ANS (19) to a greater extent than MjCM, leading to enhancement of fluorescence and a blue shift of the emission maximum (Fig. 3C). Addition of the transition state analog impairs ANS binding, probably by tightening the packing of the hydrophobic core.

DISCUSSION
The fact that less than half of the proteinogenic amino acids are sufficient to construct a chorismate mutase that is metabolically competent, conferring near wild-type levels of cell growth to an auxotrophic host, supports the hypothesis that primitive enzymes may have contained a reduced set of building blocks (1). Nevertheless, dispensing with 11 of the 20 standard amino acids significantly reduces the diversity of favorable internal packing interactions, leading to destabilization of the overall protein structure. As a consequence, the simplified enzyme displays certain properties (noncooperative thermal unfolding, ANS binding) reminiscent of the molten globule state (20,21). Interestingly, it appears to be less molten than a previously engineered CM monomer (14,22), probably due to additional stabilizing interactions in the hydrophobic region located between its two active sites. In terms of enzyme evolution, these two proteins can be viewed as models for primitive catalysts in that both are less complex than a modern day enzyme such as MjCM with respect to primary sequence or quaternary structure.
FIGURE 2. Strategy for engineering simplified CMs. A, amino acid sequences of the binary patterned parent protein (14 building blocks) and simplified variants (13, 11, and 9 building blocks). Polar and apolar residues are shown in red and blue, respectively. Residues that do not belong to the 8-amino acid alphabet (black) were replaced by selection in three steps. Arginine residues were held constant. B, in vivo complementation. The ability of the minimized enzyme to complement the CM deficiency of KA12/pKIMP-UAUC cells (10) was evaluated by streaking clones on minimal medium (M9c) in the absence (left) and presence (right) of Tyr and Phe. Cells bearing the simplified protein without (i) and with (ii) a His6 tag grow similarly. Cells containing wild-type EcCM (iii) and MjCM (iv) or the vector (v) are included as positive and negative controls.
The reduced amino acid alphabet we used to create the minimized enzyme was arbitrarily chosen to suit a design strategy based on two degenerate codons encoding, respectively, a set of 4 polar and a set of 4 apolar residues. Notably, neither small (Ala, Cys, Gly), hydroxylated (Ser, Thr), nor sterically constrained (Pro) residues are required to obtain an active helical bundle structure, although they occur frequently in AroQ CMs (12). Some of these residues (especially Ala and Gly) have been generally considered indispensable building blocks for the formation of helices and turns in severely simplified proteins (23, 24). Given the restriction (no small amino acids) and redundancy (Asp/Glu and Ile/Leu) of the 9-amino acid alphabet, more extensive simplification of the mutase may well be feasible. The CM selection system, like selection systems more generally (25), is ideally suited to explore alternative amino acid alphabets to identify the best minimal set, as well as to pinpoint residues that are critical for folding and function by re-expanding these alphabets via additional rounds of directed evolution. The optimal building blocks for producing active α-helical and β-sheet proteins are unlikely to be the same, and analogous experiments on structurally distinct CMs, which include all-α-helical (15), pseudo-α/β-barrel (26), and all-β-sheet architectures (27), promise valuable insights into the unique structural requirements of different scaffolds.
In this context, recent advances in computation, which have made possible the design of novel proteins with atomic level accuracy (28-30) as well as the redesign of natural proteins to create novel binding (31) or catalytic sites (32), can be expected to reinforce experimental efforts to identify viable alternative alphabets for specific folds. In turn, simplified alphabets may facilitate computational searching by reducing the number of sequences that need to be sampled and should bring us closer to the still unrealized goal of designing new enzymes from scratch.