A structural model of the plant acyl-acyl carrier protein thioesterase FatB comprises two helix/4-stranded sheet domains, the N-terminal domain containing residues that affect specificity and the C-terminal domain containing catalytic residues.

Plant acyl-acyl carrier protein thioesterases (TEs) terminate the acyl-acyl carrier protein track of fatty acid biosynthesis and play an essential role in determining the amount and composition of fatty acids entering the storage lipid pool. A combination of bioinformatics tools was used to predict a three-dimensional model for Arabidopsis FatB (AtFatB), which comprises a fold similar to that of Escherichia coli TEII, an enzyme that is functionally similar to plant TEs but lacks significant sequence similarity and displays different inhibitor sensitivity. The catalytic residues in AtFatB, Cys-264 and His-229, localize to the same region of the model as catalytic residues found in other enzymes with helix/multi-stranded sheet motifs (hot dog folds). Based on the model, we identified Asn-227 as a possible third member of the proposed papain-like catalytic triad. The conversion of Asn-227 to Ala resulted in a loss of detectable activity (>200-fold reduction), similar to the result seen for the equivalent mutation in papain. Mapping of plant TE specificity-affecting mutations onto the structural model showed that these mutations all cluster around the catalytic triad. Also, superposition of the crystallographically determined structures of the complexes of 4-hydroxybenzoyl-CoA TE with substrate and beta-hydroxydecanoyl thiol ester dehydrase with inhibitor onto the AtFatB model showed that the substrate and inhibitor localize to the same region as the AtFatB catalytic triad in their respective structures. Together these data corroborate the structural model and show that the hot dog fold is common to enzymes from both prokaryotes and eukaryotes and that this fold supports at least three different catalytic mechanisms.

and ACP. Their activity represents the terminal step in the plastidial fatty acid biosynthesis pathway. The resulting free fatty acids enter the cytosol where they are esterified to coenzyme A and further metabolized into membrane lipids and/or storage triacylglycerols. Acyl-ACP TEs have characteristic chain-length specificities that vary from 8 to 18 carbons, and the substrate preferences of individual TEs have been shown to play a key role in determining the composition of storage lipids (1)(2)(3). Because of this role, several studies have focused on engineering TEs with altered substrate specificities as a strategy for tailoring specialty seed oils (4). Although partially successful, these efforts have been hampered by the lack of structural information regarding the plant TEs.
Plant TEs are a class of enzymes considered different from those of animals and bacteria because of their lack of sequence similarity (2) and differences in inhibitor sensitivities. Specifically, plant TEs exhibit sensitivity to thiol inhibitors, whereas animal and bacterial TEs are sensitive to serine-reactive reagents (5)(6)(7), suggesting that the plant enzymes employ a cysteine in catalysis. Conversion of the only conserved cysteine in plant TEs to serine resulted in an enzyme that retained ~60% activity and converted inhibitor sensitivity from thiol reagents to serine reagents (8). Based on these results and the identification of a conserved histidine within all plant TE sequences that is required for catalysis, the plant enzymes have been proposed to contain a papain-like catalytic triad (8) containing Cys-264 and His-229 and an unidentified asparagine or aspartic acid.
Plant acyl-ACP TEs are nuclear-encoded, plastid-targeted globular proteins (2) that are functional as dimers (9,10). Based on amino acid sequence alignments, the plant TEs have been shown to cluster into two families: FatAs, which show marked preference for 18:1 ACP with minor activity toward 18:0 and 16:0 ACPs, and FatBs, which hydrolyze primarily saturated acyl-ACPs with chain lengths that vary between 8 and 18 carbons (3,11,12). FatAs and FatBs both contain predicted ~60-amino acid transit peptides; however, FatBs have an additional conserved hydrophobic 18-residue domain that can be removed without affecting activity and that has been proposed to form a helical transmembrane anchor (4). With the exception of two short regions that are unique to each class, the FatA and FatB sequences contain a core region of ~210 residues that show dispersed sequence similarity throughout.
In the absence of a crystal structure of either a plant TE or sequence homolog, we employed the program 3D-Jury metapredictor (13) to compile the output of several individual predictive servers and generate a consensus secondary structural map of FatB. 3D-Jury then compared the FatB map to those of proteins with structures in the protein data bank and identified potential templates for modeling a three-dimensional structure of the plant acyl-ACP TEs. The FatB secondary structural map consisted of the 210-residue core that contained two repeats of a helix and multistranded sheet fold common to the so-called hot dog fold proteins. A search was performed to identify matches to the first and second domains individually. Models of these elements were subsequently manually assembled into a single structural model using the crystallographically determined structure of the Escherichia coli TEII protein as an alignment guide. The model is consistent with the existence of a papain-like catalytic triad and contains a conserved asparagine that our mutagenesis experiments show is important for activity. The model is also consistent with previous results from site-directed mutagenesis and chimeragenesis experiments on plant TEs.

EXPERIMENTAL PROCEDURES
Sequences-All sequences were obtained from NCBI. Multiple sequence alignments and phylogenetic trees were produced using Vector NTI Align (InforMax, Inc., Bethesda, MD). Three-dimensional coordinates for known structures were obtained from the PDB. All numbering of AtFatB residues is relative to the mature AtFatB protein sequence (see Fig. 1) (14).
Structure Prediction and Analysis-The structure of the plant TE was predicted using the program 3D-Jury metapredictor (13). 3D-Jury provides both the secondary structure consensus and a list of scored proteins that, based on similarities between secondary structure, could serve as templates for molecular modeling. For each query sequence, 3D-Jury also provides BLAST scores from a search for trivial homologs in the PDB as well as PDB-BLAST scores from a more in-depth PSI-BLAST-based homolog search against the PDB. In CASP5, 3D-Jury showed an 86% correlation between the score output and correctly positioned residues in LiveBench 6 (15). Once the most common high scoring protein was identified from the ranked 3D-Jury list, the Arabidopsis FatB (AtFatB) sequence was threaded onto that three-dimensional structure using the homology-modeling program ESyPred3D (16), which is based on a strategy using neural networks to evaluate sequence alignments. ESyPred3D uses the program MODELLER to build the final structural model. To assemble the final structure, each portion of the plant TE was modeled individually and then manually aligned using the backbone of E. coli TEII (PDB code 1C8U) as a guide with the use of Rasmol molecular graphics (17). Images of the structures were produced using DeepView (18). Ramachandran plots were produced and evaluated using the program RAMPAGE (19).
Testing of the Model-Based on the location of the proposed active sites of the plant TE and multiple sequence alignments of this region of various plant TEs, we identified Asn-227 as the prime candidate for the third member of the catalytic triad. We used overlap extension PCR to design the AtFatB-N227A mutant and tested its activity in an E. coli expression system (20,21).
E. coli Expression System-The coding sequence of the mature AtFatB was amplified from plasmid TE3-2 (14) with primers 5′SpeFatBf (5′-GACTAGTTTACCTGACTGGAGCATGCTTCTTGC-3′) and FatB(X)R (5′-CGGCTCGAGGGTAGTAGCAGATATAGTT-3′) and cloned into the pBC expression plasmid using XhoI and SpeI restriction sites. The final plasmid construct pBC(AtFatB-parent) contains three amino acid residue differences (I176L, E178D, L202S) as compared with the GenBank sequence (accession number Z36911). Primers for the overlap extension PCR to build the N227A mutant were NAF (5′-TGACCTAGATGTTGCACAGCATGTGAAT-3′) and NAR (5′-ATTCACATGCTGTGCAACATCTAGGTCA-3′). Plasmids were transformed into the K27 strain of E. coli (CGSC5478), which has a mutation in the FadD enzyme of fatty acid biosynthesis that prevents uptake of free fatty acid from the medium. Thus, when an acyl-ACP TE is expressed in this system, the free fatty acid product of the thioesterase reaction accumulates in the medium (20). This assay has been used and validated for FatBs and FatAs in several studies (12,14,20,21).
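Overlap extension PCR relies on a pair of mutagenic primers that anneal to each other. The minimal sketch below checks that NAF and NAR, as listed above (with any line-break hyphens stripped), are exact reverse complements. It also translates NAF in one reading frame; the frame offset is an assumption made here for illustration, not something stated in the text, and the codon table covers only the codons that occur in this primer.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


# Primer sequences as printed in the text; hyphens from line breaks are removed.
NAF = "TGAC-CTAGATGTTGCACAGCATGTGAAT".replace("-", "")
NAR = "ATTCACAT-GCTGTGCAACATCTAGGTCA".replace("-", "")

# The two mutagenic primers should anneal to each other over their full length.
primers_are_complementary = reverse_complement(NAF) == NAR

# ASSUMPTION: the coding frame starts at the second nucleotide of NAF.
# Partial codon table: only the codons present in this primer.
CODONS = {
    "GAC": "D", "CTA": "L", "GAT": "D", "GTT": "V", "GCA": "A",
    "CAG": "Q", "CAT": "H", "GTG": "V", "AAT": "N",
}
frame = NAF[1:]
peptide = "".join(CODONS[frame[i:i + 3]] for i in range(0, len(frame) - 2, 3))

print(primers_are_complementary, peptide)
```

In the assumed frame, an alanine codon (GCA) sits two codons upstream of a histidine codon (CAT), which would be consistent with the N227A substitution lying two residues before His-229.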
Fatty Acid Analysis-Fatty acid content of the medium from various cell cultures was determined by the production and measurement of fatty acid methyl esters. Briefly, 22 μl of glacial acetic acid and 1 ml of 1:1 (v:v) chloroform:methanol containing 18:3 as an internal standard were added to 0.5 ml of medium from pelleted cells corrected to give equivalent cell density based on A600. After mixing by inversion, phases were separated by centrifugation, and the lower phase was transferred to a fresh glass tube. The chloroform was evaporated by N2 stream, 1 ml of 2% H2SO4 in methanol was added, and the samples were incubated at 90°C for 1 h. Samples were extracted once with 1 ml of 0.9% NaCl and 2 ml of hexane. The organic phase was transferred to a fresh tube and dried under N2 and then resuspended in 50 μl of hexane. 3-μl samples were analyzed with the use of a Hewlett-Packard 6890 gas chromatograph equipped with a 5973 mass selective detector and a J&W DB-23 capillary column (60 m × 250 μm × 0.25 μm). The injector was held at 225°C, the oven temperature was varied (100-160°C at 25°C/min, then 10°C/min to 240°C), and a helium flow of 1.1 ml/min was maintained.
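As a quick arithmetic check on the oven program described above, the sketch below computes how long each temperature ramp takes. Hold times at the start or end of the program are not given in the text, so only the ramp segments are counted; the function and variable names are illustrative.

```python
def ramp_minutes(start_c: float, end_c: float, rate_c_per_min: float) -> float:
    """Time in minutes for a linear temperature ramp."""
    return (end_c - start_c) / rate_c_per_min


# (start °C, end °C, rate °C/min) for the two ramps in the oven program.
ramps = [(100, 160, 25), (160, 240, 10)]

segment_times = [ramp_minutes(*ramp) for ramp in ramps]
total_ramp_time = sum(segment_times)  # excludes any hold times (not stated)

print(segment_times, total_ramp_time)
```

The first ramp takes 2.4 min and the second 8 min, so the ramps alone account for about 10.4 min per injection.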

RESULTS
Secondary Structure-The amino acid sequences of 10 plant TEs were submitted to 3D-Jury (Table I). The secondary structures predicted by each of the three programs included in the metaserver analysis were similar for all of the TEs (Fig. 1). Following an initial region of variable secondary structure, each of the plant TEs contained a tandem repeat of a helix/4-stranded sheet motif separated by a linker region of variable length and structure.
Fold Recognition-For each of the ten plant TEs evaluated, the 3D-Jury and PDB-BLAST scores were significant as defined by the 3D-Jury program (Table I). The most common high scoring match for each was the structure for 4-hydroxybenzoyl-CoA TE (4HBT) from Pseudomonas sp. strain CBS-3 (PDB code 1BVQ). The active 4HBT is a homotetramer that comprises a dimer of dimers (22,23). The 4HBT structure contains a long helix packed against a 5-stranded anti-parallel β-sheet (24,25). The active site is composed of residues from both monomers (22). 3D-Jury predicts a tandem repeat of domains equivalent to the 4HBT structure within a single FatB monomer (Fig. 1), the first domain with 21.2% identity and the second with 17.3% identity as calculated by ESyPred3D. Within the FatA monomer, 3D-Jury only identifies a single equivalent region to the 4HBT structure, although the secondary structures of FatA and FatB are very similar overall, suggesting that FatA, too, will comprise a similar structure. Other common high scoring secondary structural matches for the plant TE secondary structural maps were 1S5U (E. coli hypothetical protein Ec709; Swiss-Prot accession no. P08999) and 1NJK (E. coli hypothetical protein Ybaw; Swiss-Prot accession no. P77712), both of which also share the hot dog motif. All other predicted matches also shared this motif and included 1Q4S (Arthrobacter sp. strain Su 4-hydroxybenzoyl CoA TE), 1J1Y (Thermus thermophilus Hb8 Paai), 1PSU (E. coli PaaI), and 1VH5 (Ydii putative TE). The ribbon structure of 1BVQ chain A and the predicted structure of the N-terminal domain of AtFatB are strikingly similar (Fig. 2), as are the structures of high scoring matches 1NJK and 1S5U.
TABLE I. The most common match in every case was 4HBT (PDB code 1BVQ). Other high scoring matches were E. coli hypothetical protein Ybaw (1NJK) and E. coli hypothetical protein Ec709 (1S5U). All 10 3D-Jury scores for the most common match (1BVQ) are significant (i.e. >50 as defined by the 3D-Jury program (13)).
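The identity figures quoted above (21.2% and 17.3%) are percent identities over a pairwise alignment. The sketch below shows one common way to compute such a value: identical residues divided by the number of aligned, gap-free columns. The exact definition used by ESyPred3D is not given in the text, so this denominator is an assumption, and the two short sequences are hypothetical toy data rather than FatB or 4HBT sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment (not taken from the paper).
toy_a = "ACDE-FG"
toy_b = "ACQEKFG"
toy_identity = percent_identity(toy_a, toy_b)
print(round(toy_identity, 1))
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, which gives lower values when gaps are frequent.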
FIG. 1. Secondary structure prediction of the mature AtFatB protein and sequence alignment with 4HBT (1BVQ). The two repeats of 4HBT sequence that align with FatB are underlined. Residues that are 100% conserved in plant TEs are marked with circles, specificity mutants with asterisks, catalytic residues with plus signs, and the two inactive mutants with Xs. The boxed sequence contains a region of 1BVQ sequence that is missing in the secondary structure prediction and omitted from the alignment (NYFikcglppwrqtvVERGIvgt-pIVSC; lowercase letters are missing residues). Standard one-letter amino acid codes are used. For the secondary structure code, C is coil, H is α-helix, and E is β-strand.
Because 3D-Jury predicted that AtFatB contains two domains with similarity to the 4HBT structure, we searched the PDB for proteins with a tandem repeat of the 4HBT core domain. The E. coli TEII protein has a tandem repeat of the helix/4-stranded sheet motif as part of a structure consisting of 12 strands and six helices (26). The active site of TEII is located near the interface between the two individual hot dog domains and contains a catalytic triad of Asp, Gln, and Thr, which orients a water molecule that initiates a nucleophilic attack on the substrate (26). Apart from the core secondary structural motif (i.e. tandem repeats of helix/4-stranded sheet (HEEEE) separated by a linker region), the secondary structure of AtFatB is quite different from that of TEII, particularly in the regions immediately preceding each of the repeated domains. For example, the N terminus of TEII preceding the first repeat contains a helix followed by two short sheets and a coil region (26), whereas the N terminus of AtFatB contains a long helix followed by a long coil region and a long sheet (Fig. 1). Given these differences in secondary structure, it is not surprising that TEII was not among the structures identified by 3D-Jury.
Furthermore, because of the differences in secondary structure and sequence between TEII and AtFatB, we were unable to thread AtFatB directly onto the TEII structure. However, we were able to build separate models of the two individual domains of AtFatB based on their individual highest secondary structure matches. The model of the N-terminal domain of AtFatB was based on the structure of the monomeric protein 4HBT; the model of the C-terminal domain of AtFatB was based on the monomeric protein E. coli Ybaw (PDB code 1NJK). The two individual AtFatB domains were assembled using the model of TEII (Fig. 3) as a guide. Specifically, the Cα atoms of the β-sheets of the two individual AtFatB models were used to align the overall model of AtFatB with respect to the corresponding Cα atoms in TEII. A series of close contacts, primarily at the interface between the two manually aligned domains, were identified with the use of DeepView. The majority of these close contacts were resolved by performing a single round of manual refinement with DeepView to minimize steric hindrance. The AtFatB model was deposited at PDB under accession number 1XXY.
Quality of the Three-dimensional Model-The Cα atoms of the β-sheets of FatB and E. coli TEII overlap to within an average 1 Å root mean square deviation; for 132 Cα atoms, the root mean square deviation is 1.39 Å. The stereochemistry of the model was evaluated using RAMPAGE (19). The Ramachandran plot shows that in the predicted AtFatB model, 91.3% of the residues are in the favored region, 7.3% in the allowed region, and 1.4% in the outlier region (Fig. 3b). The three outlier residues are Pro-99, Glu-178, and Trp-316. Residues 178 and 316 are in loops near the end of the modeled regions. For comparison, the crystallographically determined 4HBT structure contains 97.1% of the residues in the favored region and 2.9% in the allowed region (22), whereas E. coli TEII contains 91.3% favored and 8.7% allowed (26).
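The root mean square deviation quoted above summarizes how far paired Cα atoms lie from each other once two structures have been superposed. A minimal sketch of that calculation follows; it assumes the coordinates are already superposed and paired one-to-one (no fitting step is performed), and the two-atom coordinate sets are hypothetical toy values, not coordinates from the AtFatB model or TEII.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (same units as input) between paired 3D points.

    Assumes the two coordinate sets are already superposed and listed in
    matching order; no rotation or translation is applied here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical toy coordinates in Å: each pair of atoms is exactly 1 Å apart.
toy_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_reference = [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0)]
toy_rmsd = rmsd(toy_model, toy_reference)
print(toy_rmsd)
```

For real structures, the paired Cα coordinates would first be optimally superposed (for example with a least-squares fit) before this formula is applied.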

Modular Organization of the FatB Active Site and Relationship to Active Sites of Other TEs-The catalytic Cys-264 and His-229 (8) of the plant TEs are present in the C-terminal domain (Fig. 1) and are located adjacent to each other in the FatB model (Figs. 3 and 4). They localize to the same region of the structural model as catalytic residues found in the structures of 4HBT (22,25), β-hydroxydecanoyl thiol ester dehydrase (24), and E. coli TEII (26) (Fig. 4a). Additionally, the AtFatB equivalents of all the known mutations in plant TEs that affect substrate specificity (Met-197, Arg-199, and/or Thr-231 in Umbellularia californica FatB1 (8); Gly-108, Ser-111, and Val-193 in Garcinia mangostana FatA1 (27)) localized to the N-terminal domain (Fig. 1) and mapped to the region around the active site residues Cys-264 and His-229 in the structural model (Fig. 4b). Thus, the predicted structure of the AtFatB shows a modular organization of the active site in that residues that affect specificity occur within the N-terminal domain, whereas the catalytic residues occur within the C-terminal domain.
Interestingly, residues that affect substrate specificity are located adjacent to 4HBT residue Ser-91, which is the only side chain that participates directly in ligand binding (25), a position occupied by a 100% conserved Gly residue in plant TEs. Finally, loop Asn-122-Leu-127 of 4HBT has been implicated in inhibitor binding both because it is located near the pyrophosphate moiety of the inhibitor and because it changes structure upon ligand binding (25). The corresponding loop in AtFatB is longer (residues 168-178) and contains at least two residues involved in substrate specificity, suggesting that it, too, is part of the plant TE active site.
It has been suggested that the plant TE active site will be similar to that of papain because they share catalytically active Cys and His residues (8). Papain also contains a catalytic Asn residue (28) as the third member of the catalytic triad that also contributes to the structural integrity of the active site (29). Mutation of the papain Asn-175 residue to an alanine resulted in an ~150-fold reduction in activity (29). Previous attempts to identify an active site Asp or Asn residue in the plant TEs were unproductive because there were seven conserved Asp or Asn residues within the protein sequence (8). However, with the inclusion of recently deposited plant TE sequences in alignment, the number of conserved residues is reduced to three Asn and a single Asp residue (Asn-75, Asp-223, Asn-227, and Asn-232) (Fig. 1). We identified Asn-227 as the most likely candidate for the third member of a plant TE catalytic triad because it is in close proximity to Cys-264 and His-229 in the AtFatB model (Fig. 3a) and to the catalytic Asp-17 from 4HBT and the catalytic His-70 from β-hydroxydecanoyl thiol ester dehydrase in the superposition of the structural models (Fig. 4a). An AtFatB-N227A site-directed mutant showed no detectable activity as determined by fatty acid methyl esters prepared from culture medium (Fig. 5). Based on our estimate of the assay sensitivity (0.35 nmol/ml), the activity of the AtFatB-N227A mutant is >200-fold lower than the AtFatB-parent. We identified a mutation, N232Y, in one of the other conserved Asn residues, but it had little effect on enzyme activity (data not shown). A highly impaired mutant (R171M, i.e. in a completely conserved residue in the plant TEs), which we isolated in the process of cloning the wild-type AtFatB, provided further evidence for that region of the protein being important for activity (Fig. 4b).
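When a mutant shows no detectable product, the fold reduction can only be stated as a lower bound set by the assay's detection limit. The sketch below illustrates that reasoning with the two numbers reported above (0.35 nmol/ml and >200-fold). The parent activity is not stated in this section, so the value derived here is an inference from those two numbers under the assumption that the bound was computed as parent activity divided by detection limit.

```python
def minimum_fold_reduction(parent_activity: float, detection_limit: float) -> float:
    """Lower bound on fold reduction when mutant product is below the detection limit."""
    return parent_activity / detection_limit


def implied_parent_activity(fold_reduction: float, detection_limit: float) -> float:
    """Parent activity that would yield the given lower-bound fold reduction."""
    return fold_reduction * detection_limit


DETECTION_LIMIT = 0.35   # nmol/ml, assay sensitivity reported in the text
REPORTED_FOLD = 200.0    # lower bound on the reduction reported in the text

# Inferred, not reported: parent product level implied by the two figures above.
implied_parent = implied_parent_activity(REPORTED_FOLD, DETECTION_LIMIT)
print(implied_parent)
```

Under that assumption, a >200-fold bound corresponds to the parent enzyme releasing more than roughly 70 nmol/ml of fatty acid into the medium.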
Finally, an inactive mutant of FatA was isolated that contained the single active site amino acid substitution K175E, supporting the idea that plant FatA and FatB TEs share similar folds (Fig. 4b). Interestingly, a mutation at the equivalent position in U. californica FatB from Thr to Lys, when present in combination with two additional mutations, resulted in an altered substrate specificity (8).

DISCUSSION
The analysis presented here provides evidence for an expanded family of TEs that share similar folds and employ several distinct catalytic mechanisms. This conclusion is supported by the following observations. 1) The predicted secondary structural map of the FatB TE contains two repeats of the helix/4-stranded secondary structure motif. 2) The best match for the secondary structural maps of plant TEs is Pseudomonas 4HBT, an enzyme with a similar catalytic function. 3) The three-dimensional model constructed by manual alignment of the two plant TE domains using the E. coli TEII backbone had a root mean square deviation of only 1.39 Å over 132 Cα atoms. The quality of the FatB model also compares well with other published predicted models (30,31), which typically show 1.6-13% of residues in disallowed regions of Ramachandran plots compared with the 1.4% for the model presented here. 4) The model is consistent with previous experimental data, in that all the residues previously demonstrated to affect either activity (His-229 and Cys-264) or substrate specificity (Met-197, Arg-199, and Thr-231 in U. californica FatB1 (8); Gly-108, Ser-111, and Val-193 in G. mangostana FatA1 (27)) mapped close to our predicted active site. 5) Using the model, we were able to identify Asn-227, a potential third member of a papain-like catalytic triad that, when mutated to Ala, showed a >200-fold reduction in activity, suggesting that it may play a structural/functional role in the plant TE triad that is similar to the role Asn-175 plays in the papain catalytic triad. 6) The active site residues in the FatB three-dimensional model (Figs. 3a and 4a) occur with relative spacing and orientation similar to those seen in papain (32). 7) The final measure of support for our conclusion is that the modular organization of AtFatB, with the residues that contribute to substrate specificity occurring in the N-terminal domain and the residues that constitute the proposed catalytic triad occurring in the C-terminal domain, is consistent with previous observations (4,33). Modular architecture such as this is particularly well suited to rapid diversification of enzymatic function (7,34). Although the predicted structural model is consistent with the available biochemical data (as described above), the sequence similarity between AtFatB and the threading templates (~20% identity) is lower than that commonly employed for homology modeling. Thus, we caution against overinterpreting details of the model.
4HBT and TEII were discounted as structural homologs of the plant acyl-ACP TEs based on a lack of primary sequence similarity and the identification of conserved catalytic residues that differed between the plant and bacterial enzymes (8). It was only with the use of the bioinformatics tools employed in this study that the similarities among the secondary structures of 4HBT, TEII, and the plant TEs were discovered, independent of any primary sequence or functional similarities, which allowed us to construct a structural model for FatB. It is interesting that so many of the plant TE secondary structure matches are also thioesterases, linking the families by secondary structure and catalytic function, albeit with a diversity of catalytic mechanisms. Although 3D-Jury identified structural similarities between plant TEs and other hot dog fold-containing TEs, it did not identify similarity with other thioesterase families, such as LuxD (representative of the α/β hydrolase superfamily (35)), nor were we able to thread a plant TE onto the LuxD structure using ESyPred3D.
ACP, a small acidic protein, and coenzyme A, an adenosine-3′-phosphate derivative, have quite different structural properties, so it might seem surprising that the same architecture of enzyme would recognize both acyl carriers. However, the helix/4-stranded sheet architecture is conserved among enzymes that recognize either acyl-coenzyme A or acyl-ACP-bound substrates (23,24,26,36). Indeed, plant acyl-ACP TEs have been shown to recognize acyl-coenzyme A-bound substrates (9). We note that the interchangeability of ACP and coenzyme A adducts applies also to diiron-containing 4-helix bundle proteins such as the plant acyl-ACP desaturases that have also been shown to be active with acyl-coenzyme A (9).
The identification of a common architecture linking TEs from several kingdoms that contain different catalytic residues raises questions as to the origin of these enzymes. Because plant acyl-ACP TEs are found in plastids that are believed to have arisen from symbiotic cyanobacteria, it seems likely that the plant enzymes evolved from an ancestral bacterial thioesterase. Supporting this view is the presence of two 4HBT-like sequences in the Synechocystis proteome (37) as well as the general structural and functional similarities of the fatty-acid synthase machinery between plants and bacteria (38). That the plant TEs have a fold as well as a function similar to the other hot dog fold-containing TEs suggests that they share a common lineage and is consistent with the previous proposal that this family of enzymes diverged long ago (25). The closer structural similarity between the bacterial TEII from E. coli and AtFatB as compared to the plant FatAs supports Voelker's proposal that plant FatBs predate FatAs (12). The observation that the bacterial β-hydroxydecanoyl thiol ester dehydrase, which catalyzes dehydration and double bond isomerization of 10-carbon thiol esters of ACP (24), shares a common architecture with plant and bacterial TEs raises the possibility that these classes of fatty acid-metabolizing enzymes arose from a common progenitor enzyme. However, the highly divergent amino acid sequences and the differences in catalytic mechanisms preclude us from ruling out a convergent evolutionary origin. In addition to the evolutionary implications described above, the structural model of AtFatB provides a conceptual framework for both interpreting existing and planning future structure-function studies on the plant acyl-ACP TEs.