Structural and Functional Evolution of Positively Selected Sites in Pine Glutathione S-Transferase Enzyme Family

Background: The functional significance of positive selection in enzyme families is largely unknown.
Results: The structural and biochemical properties of positively selected sites in the pine GST enzyme family were characterized.
Conclusion: The positively selected sites significantly affect enzyme activity and specificity.
Significance: This study sheds light on the selective aspects of the functional and structural divergence of a plant enzyme family.
Phylogenetic analyses have identified positive selection as an important driver of protein evolution, both structural and functional. However, the lack of appropriate combined functional and structural assays has generally hindered attempts to elucidate patterns of positively selected sites and their effects on enzyme activity and substrate specificity. In this study we investigated the evolutionary divergence of the glutathione S-transferase (GST) family in Pinus tabuliformis, a pine that is widely distributed from northern to central China, including cold temperate and drought-stressed regions. GSTs play important roles in plant stress tolerance and detoxification. We cloned 44 GST genes from P. tabuliformis and found that 26 of the 44 belong to the largest (Tau) class of GSTs and are differentially expressed across tissues and developmental stages. Substitution models identified five positively selected sites in the Tau GSTs. To examine the functional significance of these positively selected sites, we applied protein structural modeling and site-directed mutagenesis. We found that four of the five positively selected sites significantly affect enzyme activity and specificity; thus their variation broadens the substrate spectrum of the GST family. In addition, positive selection has mainly acted on secondary substrate binding sites or on sites close to (but not directly at) the primary substrate binding site; thus their variation enables the acquisition of new catalytic functions without compromising the primary biochemical properties of the protein. Our study sheds light on selective aspects of the functional and structural divergence of the GST family in pine and other organisms.

Enzymes, as biological catalysts, are essential components of every biological system. The key functional characteristics of an enzyme are its catalytic activity toward different substrates and its substrate specificity. An enzyme family is a group of enzymes with similar amino acid sequences and protein folds, encoded by a group of genes created by gene duplication. Numerous studies (1-3) have shown that different members of enzyme families have distinct substrate specificities or catalytic activities, which are shaped by selective pressures during the course of evolution (2,4,5). Phylogeny-based analyses of codon substitution have shown that many protein families have been subject to positive selection (6-8), and positive selection is now recognized as an important driver of protein evolution (9-11). However, the lack of appropriate combined functional and structural assays has generally hindered attempts to elucidate patterns of positively selected sites and their effects on enzyme activity and substrate specificity.
Glutathione S-transferases (EC 2.5.1.18) comprise a large, ubiquitous gene family of multifunctional proteins that mainly catalyze the conjugation of intracellular glutathione (GSH) to a wide variety of electrophilic, cytotoxic, and genotoxic molecules of endogenous or exogenous origin. Detoxification of xenobiotics is considered to be the main function of plant GSTs, but other functions include protecting cells from diverse biotic and abiotic stressors, such as pathogens, heavy metal toxins, oxidative agents, and UV radiation (12-15). Because of the diversity of potential xenobiotics and stressors, functional divergence of this enzyme family has major adaptive significance. Accordingly, the GST supergene family displays extensive functional diversification in gene expression, enzymatic activities, and substrate specificities (2,3). However, although much is known about the biochemistry of GSTs, little attention has been paid to the patterns of selective pressures that have directed their evolutionary divergence and acquisition of novel activities.
The molecular mechanisms that create structural modifications of protein copies and the subsequent evolutionary genetic processes that fix mutations, alter functions, and improve adaptation should differ among plant lineages due to differences in selective pressures associated with differences in their life history traits. Gymnosperms are a large group of plants with a long evolutionary history. Many are important coniferous components of northern ecosystems and valuable forestry trees. Unlike annual plants, the perennial conifers have long generation times, large effective population sizes, and wide distributions over heterogeneous mountainous regions. These life history traits could theoretically favor mutations that facilitate adaptation to changing environments. However, due to the lack of genome information, knowledge of the functional, structural, and selective aspects of natural enzyme family evolution in conifers is limited. In this study we used a combined evolutionary and experimental approach to elucidate evolution and selection in the GST gene family in Pinus tabuliformis, a pine that is widely distributed from northern to central China, including cold temperate and drought-stressed regions. It grows from 0 to 2700 m above sea level and forms extensive forests (16). Through analyses of substitution and gene expression patterns, enzyme assays, site-directed mutagenesis, and structural comparisons, we examined 1) the selective context and functional divergence in the GST family in the pine and 2) the structural and functional importance of positively selected sites.

EXPERIMENTAL PROCEDURES
Molecular Cloning and Nomenclature-Primers, based on putative Pinus taeda GST gene sequences identified from the GenBank EST database and listed in supplemental Table S1, were designed to amplify mRNA and genomic DNA of P. tabuliformis. Total RNA was isolated from fresh needles of P. tabuliformis using an Aurum Total RNA kit (Bio-Rad), and first-strand cDNA was synthesized using a TaKaRa RNA PCR kit (avian myeloblastosis virus) Version 3.0 (TaKaRa). PCR was performed in a 25-µl reaction mixture containing 1-2 µg of template DNA, 0.1 units of Taq DNA polymerase (Invitrogen), 200 µM each dNTP (Invitrogen), 1.5 mM MgCl2, and 10 pmol of each primer. Optimized PCR conditions consisted of a 3-min initial denaturation at 94°C, then 35 cycles of 30 s at 94°C, 40 s at 55°C, and 60 s at 72°C, followed by a 3-min final extension step at 72°C. PCR products were separated and recovered from a 1% agarose gel using a GFX PCR DNA and Gel Band Purification kit (Amersham Biosciences), then cloned into the pGEM-T Easy Vector (Promega) and sequenced in both directions to verify the gene sequences and structures. The sequences were further analyzed with the NCBI conserved domain database (www.ncbi.nlm.nih.gov) to confirm the presence of typical GST N- and C-terminal domains in the putative encoded proteins.
Following Edwards et al. (17), we assigned a univocal name to each P. tabuliformis GST gene (supplemental Table S2), consisting of Pta followed by a letter denoting the subfamily class (e.g. GSTU, -F, -T, -Z, and -L corresponding to Tau, Phi, Theta, Zeta and Lambda, respectively) then a progressive number for each gene of that class (e.g. PtaGSTU1).
Expression of GST Genes in P. tabuliformis Tissues-To investigate expression patterns of GST gene family members during growth under normal conditions, we sampled five mature P. tabuliformis trees in the Beijing Botanical Garden and five 3-month-old P. tabuliformis seedlings grown in a greenhouse. We isolated total RNA from top bud, needle, root tip, and phloem (stem and root) tissues of each mature P. tabuliformis tree and young needle, stem, and root tissues of each seedling. We then amplified GST sequences in the samples using PCR, as described above, and primers listed in supplemental Table S3.
Phylogenetic and Molecular Evolution Analyses-The GST protein sequences were aligned using the MUSCLE program (18), with subsequent manual adjustment using BioEdit (19). A phylogenetic tree of Pinus GSTs was then reconstructed using a maximum likelihood (ML) procedure by PHYML (20) with the JTT (Jones, Taylor, and Thornton) amino acid substitution model. For analysis of Tau-class GSTs, members of the sister class Lambda of GSTs were used as an outgroup. One thousand bootstrap replicates were performed in each analysis to calculate the support for identified clades and nodes.
We identified two major clades of P. tabuliformis Tau GSTs (Fig. 1A). To evaluate variation in selective pressure between them, we used CODEML branch models in PAML (21) to estimate the ratio of nonsynonymous to synonymous substitutions (ω = dN/dS) under two a priori assumptions: a one-ratio model in which one ω value was assumed for the entire tree and a two-ratio model in which ω values were allowed to vary between the two major clades (Fig. 1A). To verify which of the models best fitted the data, likelihood ratio tests were applied by comparing twice the difference in log-likelihood values between pairs of the models against a χ² distribution, with the degrees of freedom equal to the difference in the number of parameters between the models (22). The branch model tests indicated that selective pressures significantly differed between the two major clades of P. tabuliformis Tau GSTs and that one clade (clade A in Fig. 1A) has undergone relaxed negative selection. Thus, we used two branch-site models (A0 and A) to test whether positive selection has occurred at some amino acid sites in clade A. Clades A and B were assigned as foreground and background branches, respectively. Model A allowed ω to vary among sites in foreground branches, whereas Model A0 did not allow for positive selection; instead, ω was fixed at a value of 1. The positive selection (A) and neutral (A0) branch-site models were evaluated using likelihood ratio tests, then the Bayes Empirical Bayes method (23) was applied to the results from Model A to identify candidate positively selected sites.
Expression and Purification of GST Proteins-To investigate the enzymatic functions of P. tabuliformis Tau GST proteins, the P. tabuliformis Tau GST genes were subcloned into pET30a expression vectors (Novagen) to obtain an N-terminal His6 tag. The primers used to construct the GST expression vectors are listed in supplemental Table S4. Colonies containing appropriate inserts were identified by sequencing.
The likelihood ratio test and Bayes Empirical Bayes analysis identified five positively selected sites with posterior probabilities >0.99. To investigate whether these sites are important for catalysis, we selected the protein PtaGSTU17 as a model, subcloned PtaGSTU17 cDNA into the expression vector pET30a (Novagen), then performed site-directed mutagenesis using the pET30a/PtaGSTU17 plasmid as a template, the mutagenesis primers listed in supplemental Table S4, and methods described by Zeng and Wang (24). The resultant plasmids were used to transform Escherichia coli BL21 and verified by sequencing. Expression and purification of the wild-type and mutant PtaGSTU17 proteins followed the procedures presented by Zeng et al. (25).
1,3-diazole (NBD-Cl), as described by Ricci et al. (27); and the diphenyl ether fluorodifen and cumene hydroperoxide (Cum-OOH), as described by Edwards and Dixon (28). These substrates are frequently used to determine the enzymatic characteristics of plant GSTs (28). All assays were carried out at 25°C. Protein concentrations in the enzyme preparations were determined by measuring their absorbance at 280 nm. The fluorescent dye ANS is a sensitive probe for detecting and analyzing structural changes in proteins (29). To detect protein-folding differences between the mutant and wild-type enzymes, we obtained fluorescence measurements using a HITACHI F-4500 FL spectrophotometer after adding 100 µl of 2 mM ANS to a final concentration of 0.05 mg/ml enzyme (sodium phosphate buffer, pH 7.4) in 1-ml reaction mixtures. A total of three scans each for blank and sample were collected and averaged for each enzyme. Reported spectra are the means from at least three independent experiments.
Homology Modeling-Three-dimensional structures of P. tabuliformis Tau GSTs were constructed using the InsightII software package (Accelrys) with the x-ray structure of a soybean Tau GST (Protein Data Bank accession number 2VO4) as the template. The P. tabuliformis Tau GSTs showed 34-50% protein sequence identity to the template protein. Initially, the sequences were aligned using the Align 2D program of the InsightII Homology module; structures were then automatically built using the Modeler module, and the resulting models were evaluated with the Profile-3D program. Substrate binding sites were detected as cavities present on the surface of the protein. In this study the binding sites were identified by grid analysis using the Binding-site Analysis module in InsightII.
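The sequence identity figure quoted above can be illustrated with a minimal sketch. It assumes identity is counted over aligned columns in which neither sequence has a gap; the InsightII alignment tools may use a different denominator, so the function name `percent_identity` and this convention are illustrative assumptions rather than the procedure used in the study.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption (not from the paper): columns where either sequence has a
    gap are skipped, and identity is matches / compared columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real GST sequences)
identity = percent_identity("MKT-LV", "MRTALV")
```

With the toy fragments, five ungapped columns are compared and four match.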

RESULTS
Pinus GST Gene Family Contains Many Diverse Members-Based on the P. taeda EST database, we cloned 44 full-length genes encoding putative GST proteins from P. tabuliformis (supplemental Table S2). This list is most likely incomplete due to gaps in the currently available genome information. The predicted proteins encoded by these 44 genes were initially classified based on the NCBI conserved domain analysis, which divided them into eight classes: Tau, Phi, Theta, Zeta, DHAR, Lambda, TCHQD, and EF1Bγ. The Tau GSTs were most numerous (26 members), followed by the Phi and Lambda classes (seven and three, respectively), then the Zeta, DHAR, and EF1Bγ classes (two members each), and finally the Theta and TCHQD classes (just one member each). Phylogenetic analysis of these P. tabuliformis GSTs together with the GST family from Populus trichocarpa showed that GSTs of each subfamily class from both genomes grouped as a single clade with high bootstrap support (data not shown), which further supports the subfamily designations among the 44 P. tabuliformis GSTs.
Expression Patterns of GST Gene Family in Pine-The expression patterns of the 44 P. tabuliformis GST genes were examined in the radicle, seedlings, and mature trees under normal conditions. The seven Phi GSTs and all minor-class GSTs (PtaGSTL1, -2, and -3, PtaGSTZ1 and -2, PtaGSTT1, PtaDHAR1 and -2, PtaEF1Bγ1 and -2, and PtaTCHQD1) were expressed in all tissues or organs examined (Fig. 1B). However, expression divergence was observed among the Tau GSTs. The phylogenetic tree showed that the 26 Tau GSTs were divided into two distinct clades (clades A and B, Fig. 1A). The six genes in clade B (PtaGSTU1, -14, -25, -24, -17, and -26) were expressed in all of the tissues examined. In contrast, of the 20 Tau GSTs in clade A, only nine were expressed in all tissues. Another two (PtaGSTU11 and -19) were not detectably expressed in any tissue under normal growth and development conditions, possibly because they are expressed at sub-detectable levels. Alternatively, they may only be induced in response to treatments and/or in tissues not examined in our study, or they may be pseudogenes. The other nine Tau GSTs in clade A were selectively expressed in specific tissues (Fig. 1B). These results indicate that functional divergence is more pronounced among the Tau GST genes in clade A than in clade B.
Molecular Evolution of Pinus Tau GSTs-Tau class GSTs are the most numerous in the plant GST gene family. Our previous study showed that the moss Physcomitrella patens did not contain any Tau GSTs, whereas the lycophyte Selaginella moellendorffii had 47 Tau GSTs (30). The GST gene families in Oryza sativa, Arabidopsis thaliana, and P. trichocarpa have been previously characterized (2,31,32), and they reportedly contain 52, 28, and 58 Tau members, respectively. Our search of the NCBI EST database identified 35 Tau GSTs from four additional gymnosperm species: P. taeda, Gnetum gnemon, Ginkgo biloba, and Cycas rumphii (supplemental Table S5). Genome-wide phylogenetic analysis of Tau GST genes in the above nine land plant species revealed two major clades (A and B, Fig. 2). Clade A contained Tau GSTs of the eight seed plant species but not of the lycophyte, indicating that this group of GSTs might have been lost in the lycophyte (Figs. 2 and 3). The 20 P. tabuliformis Tau GSTs of clade A in Fig. 1A grouped with those of other gymnosperms and formed a separate group (A2b, Fig. 2) that diverged from the angiosperm GSTs. Clade B can be divided into two major groups, B1 and B2. B1 contained GSTs from all species, whereas B2 contained only lycophyte and gymnosperm GSTs (Fig. 2). This phylogeny suggests that the B2 GSTs might have been lost in angiosperms (Fig. 3). All six P. tabuliformis Tau GSTs of clade B in Fig. 1A were placed in clade B of Fig. 2.
To test for significant differences in selective pressures between clades A and B in Fig. 1A, we performed a two-ratio branch model test using PAML, in which different ω (= dN/dS) values were assigned to the two clades (Table 1). The log likelihood values under the one-ratio and two-ratio models were ln L = -9394.796577 and ln L = -9378.129047, respectively. A likelihood ratio test indicated that the one-ratio model should be rejected (p < 0.001); hence, selective pressures differed between the two clades. The mean ω values for clades A and B were 0.43 and 0.20, respectively, indicating that clade B has been under stronger purifying selection, whereas some amino acid changes in clade A might have been preserved by positive selection. To test this hypothesis, we applied a branch-site test (33) to identify target sites that are potentially under positive selection in clade A (Table 1). For this, we assigned clades A and B as foreground and background branches, respectively, and obtained log likelihood values under the positive selection model A and the null model A0 of ln L = -9186.073757 and ln L = -9188.425166, respectively. The likelihood ratio test indicated that the null model should be rejected (p < 0.03), corroborating the hypothesis that some amino acid sites in clade A have been under positive selection. Further analysis using a Bayes empirical Bayes procedure identified five sites (alignment positions 17, 43, 87, 129, and 183; Fig. 4) that are under positive selection with posterior probabilities >0.99. The two clades differed markedly in amino acid composition at these five sites, and at two of the five sites they shared no common residues (Fig. 4). Further PAML evaluation that included the Tau GSTs of P. taeda also supported positive selection on these five sites (data not shown).
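The likelihood ratio tests reported above can be reproduced from the quoted log likelihoods with a short sketch. It assumes one degree of freedom for each comparison (one extra ω parameter in the two-ratio model; ω fixed at 1 in model A0) and a plain χ² reference distribution; the helper name `lrt_chi2_df1` is hypothetical, and PAML itself only reports the log likelihoods.

```python
import math


def lrt_chi2_df1(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (2 * delta lnL, p-value). For a chi-square variable with
    1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # clamp tiny negative values
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Branch models: one-ratio (null) versus two-ratio (alternative)
branch_stat, branch_p = lrt_chi2_df1(-9394.796577, -9378.129047)

# Branch-site models: A0 (null, omega fixed at 1) versus A (alternative)
site_stat, site_p = lrt_chi2_df1(-9188.425166, -9186.073757)

print(f"branch test:      2*dlnL = {branch_stat:.3f}, p = {branch_p:.2e}")
print(f"branch-site test: 2*dlnL = {site_stat:.3f}, p = {site_p:.4f}")
```

Some authors compare the branch-site statistic with a 50:50 mixture of a point mass at zero and χ² with one degree of freedom, which would halve the p-value; the sketch uses the plain χ² distribution described in the methods.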
Gene conversion or recombination can contribute to sequence divergence in some gene families (34). In this study we used GENECONV (35) and the Recombination Detection Program (RDP) (36) to detect gene conversion and recombination events among P. tabuliformis Tau GST genes. Only two gene conversion events and two recombination events were detected among all pairwise GST gene comparisons (data not shown). This result suggests that gene conversion and recombination have not played significant roles in GST evolution in pine. Similar results have been observed in other gene families, e.g. the Arabidopsis MADS-box and vertebrate arylamine N-acetyltransferase gene families (37,38).

Structural Distribution of the Putative Positively Selected Sites-We modeled the three-dimensional structures of the 26 P. tabuliformis Tau GSTs using the x-ray structure of a soybean Tau GST (PDB accession number 2VO4) as a reference. The simulated structures of the 26 Tau GSTs were superimposed to evaluate the goodness of fit of the overall topologies (Fig. 5A). This examination showed that all members shared the same conformation of the structural elements of α-helices and β-sheets, but structural modifications are present in loop regions. Among the five putative positively selected sites, four are located in loop regions and one in an α-helix of the C-terminal domain (Figs. 5A and 4).
GST proteins typically have two catalytically active sites: the primary GSH substrate binding site (G-site) and a secondary hydrophobic substrate binding site (H-site). The G-site within each GST class is formed by a group of highly conserved amino acid residues in the N-terminal domain of the protein, and the H-site is formed by residues in the C-terminal domain that are generally less conserved (32). Using the x-ray structure of the soybean Tau GST as a reference, we identified five sites (alignment positions 19, 48, 62, 74, and 75 in Fig. 4; pink arrows in Fig. 5A) in P. tabuliformis Tau GSTs that correspond to G-site residues. To identify H-site residues, we used PtaGSTU17 as a model to predict possible ligand binding cavities in the protein with the Binding Site Analysis program in InsightII. One large cavity located between the N- and C-terminal domains was identified as the substrate binding pocket (blue dots, Fig. 5B). Residues Lys-119 and Trp-171 of PtaGSTU17 are located at the entrance and on the wall of this hydrophobic substrate-binding pocket, respectively (Fig. 5B), indicating that these two sites are likely H-site residues.
Substrate Specificities and Activities of Pinus Tau GSTs-Among the 44 P. tabuliformis GSTs, the Tau GSTs were the most numerous. Seventeen Tau GSTs with differing evolutionary relationships were selected for protein expression and purification. All of these GSTs were expressed as soluble proteins in E. coli (Table 2), except for four Tau GSTs (PtaGSTU18, -22, -23, and -25), which were expressed as inclusion bodies. In this study only soluble proteins were used to investigate their biochemical functions.
Thirteen purified Tau GSTs listed in Table 2 showed specific activities toward the substrate fluorodifen. Of the 13 purified Tau GSTs, 11 showed specific activity toward CDNB and NBD-Cl, 9
FIGURE 5 (legend fragment). Table 1 are indicated by white arrows; the GSH binding sites are indicated by purple arrows. In B, positions of Lys-119 and Trp-171 residues of PtaGSTU17 are shown in purple.

TABLE 2. Specific activities of the P. tabuliformis Tau GSTs and mutants toward six substrates (mean ± S.D. obtained from at least three independent determinations). ND, no activity detected.

shared similar substrate spectrum. Although these five GSTs showed similar substrate spectra, their enzymatic activities toward each substrate were very different, e.g. PtaGSTU4 and -5, which are closely related paralogs (Fig. 1A), showed 1.6- to 9.6-fold differences in their enzymatic activities toward each substrate. These results indicate that the P. tabuliformis Tau GSTs have diverged in substrate specificities and activities.
Functional Significance of the Putative Positively Selected Sites-To investigate the functional significance of the five putative positively selected sites, we constructed three sets of mutants for biochemical assays. First, we selected PtaGSTU17 of clade B (Fig. 1A; background branch in the branch-site model test) and mutated the corresponding sites in PtaGSTU17 to the amino acid residues present in clade A (foreground branch in the branch-site model test). There were 6-10 variants at each of these five sites in the 20 GSTs of clade A (Fig. 4). If a residue variant was represented by >15% of the sequences in clade A, we replaced the corresponding amino acid residue in PtaGSTU17 with this common clade A residue. Based on this criterion, we constructed and purified 11 PtaGSTU17 mutants (R12I, R12L, R12T, N37D, N37T, P80G, P80D, K119T, K119R, W171L, and W171F) to represent the major variants at the five positively selected sites. Second, we arbitrarily mutated each of the five sites in PtaGSTU17 to an amino acid not present in clade A GSTs, in this case to Ala. The enzymatic activities of the purified recombinant proteins were then explored using six GST substrates: CDNB, DCNB, NBC, Cum-OOH, NBD-Cl, and fluorodifen. Significant differences in enzyme activity (p < 0.05, Mann-Whitney U test) were found between the mutants and wild type (WT) toward all six substrates, except for mutants P80D and P80G toward the substrate DCNB (Table 2). The mutants differing at the same residue site also differed markedly in activities toward the substrates, except for the two Pro-80 mutants (Fig. 6); thus the variants generally displayed a broad range of activities.
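The >15% criterion used to pick clade A residues for mutagenesis can be sketched as a simple frequency filter over one alignment column. The column string below is invented for illustration and is not taken from Fig. 4; the function name `common_variants` and the choice to use the total number of sequences as the denominator are assumptions.

```python
from collections import Counter


def common_variants(column: str, threshold: float = 0.15, gap: str = "-") -> list[str]:
    """Residues found in more than `threshold` of the sequences in a column.

    `column` holds one character per sequence at a single alignment
    position. Gaps are never reported, but they still count toward the
    total number of sequences (an assumption of this sketch).
    """
    total = len(column)
    if total == 0:
        raise ValueError("empty alignment column")
    counts = Counter(residue for residue in column if residue != gap)
    return sorted(r for r, n in counts.items() if n / total > threshold)


# Hypothetical column for 20 clade A sequences (made-up residues)
example_column = "IIIIILLLLTTTTVVAAKSQ"
selected = common_variants(example_column)
print(selected)
```

In this made-up column only the residues present in at least four of the 20 sequences pass the filter, since three of 20 is exactly 15% and the criterion is strict.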

Third, we selected a GST of clade A, PtaGSTU7 (Fig. 1A, foreground branch in the branch-site model test), and replaced its amino acids at the five selected sites with the corresponding residues in clade B (background branch in the branch-site model test). Altogether, seven PtaGSTU7 mutants were constructed, and their enzymatic activities toward the six GST substrates were measured (Table 2). To examine the activity changes between these mutants and the natural GSTs in clade A and clade B in Fig. 1A, we conducted a multiple response permutation procedure (MRPP) test. The MRPP is a nonparametric test that tolerates unequal sample sizes and violations of normality assumptions. The MRPP test showed a significant difference between the PtaGSTU7 mutants and the GSTs in clade A (p < 0.02) but not between the mutants and clade B (p > 0.05). This result lends further support to the functional significance of the putative positively selected sites and their contribution to the divergence of the pine GST family.
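For readers unfamiliar with MRPP, the idea can be sketched as follows. Each enzyme is treated as a point (for example, a vector of specific activities toward the six substrates), the test statistic is the group-size-weighted mean of within-group average pairwise distances, and significance is estimated by shuffling group labels. The Euclidean distance, the n_i/N weighting, the Monte Carlo permutation scheme, and all names and data here are assumptions of this sketch, not details reported in the study.

```python
import itertools
import math
import random


def mrpp_delta(points, labels):
    """Weighted mean of within-group average pairwise Euclidean distances."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    total = len(points)
    delta = 0.0
    for members in groups.values():
        if len(members) < 2:
            raise ValueError("each group needs at least two observations")
        pairs = list(itertools.combinations(members, 2))
        mean_dist = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
        delta += (len(members) / total) * mean_dist  # weight n_i / N
    return delta


def mrpp(points, labels, n_perm=999, seed=0):
    """Return (observed delta, Monte Carlo permutation p-value)."""
    rng = random.Random(seed)
    observed = mrpp_delta(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small delta means tight groups; count permutations at least as tight
        if mrpp_delta(points, shuffled) <= observed + 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical two-dimensional profiles for two well-separated groups
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
toy_labels = ["mutant"] * 4 + ["clade"] * 4
delta, p_value = mrpp(toy_points, toy_labels)
```

A small p-value indicates that the observed grouping is tighter than expected under random label assignment, which is how a difference between mutants and a clade would show up.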
The first two positively selected sites (corresponding to Arg-12 and Asn-37 of PtaGSTU17) were located very close to the G-site in the spatial conformation of the protein. Thus, substitution of the residues at these selected sites in clade A could have caused structural changes in the GSH binding pocket and hence pronounced diversification in biochemical function (Table 2 and Fig. 6). The third positively selected site is located in the linker region between the N- and C-terminal domains of the protein. Mutants at this position (P80G and P80D) were less functionally divergent both from each other and from WT. They showed less dispersed enzyme activities than mutants with variants at the other four positively selected sites, indicating that changes in this region have less dramatic effects on enzyme function (Table 2 and Fig. 6).
The last two positively selected sites, corresponding to Lys-119 and Trp-171 of PtaGSTU17, are located in the C-terminal domain and are predicted H-site residues. When Lys-119 of PtaGSTU17 was mutated to Thr, Arg, or Ala, the resulting mutants (K119T, K119R, and K119A) showed significantly lower activities than WT toward all six substrates (p < 0.05, Mann-Whitney U test), except for K119A toward fluorodifen (Table 2). When Trp-171 was replaced by Leu, Phe, or Ala, the mutants W171L, W171F, and W171A showed decreased activity toward four substrates (CDNB, DCNB, NBC, and Cum-OOH) but significantly increased (1.2-4.4-fold) activity toward fluorodifen (Table 2). In PtaGSTU17, the side chain of Trp-171 formed part of the wall of the hydrophobic substrate binding pocket (Fig. 5B). The Trp residue has a large indole (benzopyrrole) side chain. When Trp-171 was replaced by Leu, Phe, or Ala, which have smaller side chains, the hydrophobic substrate binding pocket of the three mutants could become larger than that of WT, thereby increasing accessibility for large substrates such as fluorodifen. In ANS fluorescence probe binding analysis, all three mutants showed marked increases in fluorescence intensity compared with the wild-type enzyme (Fig. 5C), indicating that more hydrophobic surface was exposed in them. Positive selection at this amino acid site may have driven adaptation of the enzyme to large molecular substrates by increasing the size of the hydrophobic substrate binding pocket. This finding suggests that the substrate specificity of individual GSTs is influenced by the architecture of their access tunnels.

DISCUSSION
Mechanisms that expand protein functional diversity are clearly key evolutionary issues. Large enzyme families with promiscuous functions often acquire specific activities through divergent evolution involving structure- and function-altering amino acid substitutions (39). In nature, advantageous amino acid substitutions may be promoted by positive selection (5,40), but the general structural distributions of positively selected mutations and their effects on protein function remain poorly examined.
Conventional theory emphasizes changes in function produced by amino acid replacements in active sites (41). GSTs have two characteristic catalytic active centers: the primary GSH substrate binding site (G-site) and the secondary hydrophobic substrate binding site (H-site). The G-site defines the primary biochemical function of all GST proteins and thus is highly conserved within each GST class across divergent groups of plants (42,43). The H-site is less highly conserved, enabling the enzymes to interact with diverse substrates and facilitating functional diversification of the GST enzyme family. In this study we detected five positively selected sites in P. tabuliformis Tau GSTs: three are not located in active sites, two are located in the H-site, and none is located in the G-site of the protein.
In plant GSTs, highly conserved G-site residues are critical for GSH binding and catalytic activity. In particular, one almost absolutely conserved Ser residue (alignment position 19 in Fig. 4) in all plant Tau GSTs is responsible for stabilization of the thiolate anion of enzyme-bound GSH (42,44). We have previously shown that replacement of this Ser residue with Ala abolishes the catalytic activity of the enzyme toward various hydrophobic substrates (25,45), demonstrating that mutations in the critical active G-site disrupt the essential function of the protein. This strong functional constraint is reflected in the absence of G-site residues among the five putative positively selected sites. In contrast, two of the five positively selected sites are H-site residues. H-sites are located in the C terminus of the protein and are highly variable both in sequence and topology within the Tau GST class (43). In the present study, substitution of the two H-site residues resulted in significant changes in enzymatic activities toward all six tested substrates (Table 2). These functional alterations could be interpreted in terms of the specific structural requirements for enclosing different substrate molecules. Variation at the H-site could provide an effective way for the GST family to acquire new functions toward environmental stimulants while maintaining the primary GSH binding property through conservation of the G-site. In accordance with this hypothesis, an H-site mutation in a human mu class GST, M2-2, elicited a 1000-fold increase in specific activity toward the substrate trans-stilbene oxide (46).
Our study further showed that mutations outside the active centers may also have marked effects on protein function, although mutations close to the active sites seem to be more effective than distant ones. In pine Tau GSTs, three of the five positively selected sites are non-active sites, of which two are adjacent to the G-site and one (Pro-80 in PtaGSTU17) is distant from both active centers. The mutations at Pro-80 in PtaGSTU17 had milder effects on enzyme function than mutations at the other four positively selected sites (Fig. 6). Studies on Candida antarctica lipase, aspartate aminotransferase, and biphenyl dioxygenase (47) suggest that mutations at sites <10 Å from a protein active site could alter substrate selectivity or catalytic activity very strongly. This is probably because mutations close to the active site are more likely to induce subtle conformational changes in the substrate binding pocket than distant mutations. Such a pattern supports the notion that strong evolutionary constraints in some regions of proteins can potentially be altered by mutations elsewhere in the structure, i.e. changing the amino acid at one position may modify the "landscape" of evolutionary constraints at neighboring positions (48). Thus, positive selection acting on residues adjacent to, rather than directly at, a critical active site is likely a general mechanism for functional diversification of enzyme families. Accordingly, positive selection for amino acid replacements outside the active site of the Drosophila JGW protein produced a novel dehydrogenase with altered substrate specificity compared with the ancestral protein (40).
Protein structural and functional properties are defined by rigid and flexible regions. Rigidity is an important factor for the integrity of a protein native folded structure, whereas conformational flexibility is important for binding and catalysis of diverse substrates (49). Protein loop regions are relatively flexible compared with the α-helices and β-strands they connect. Four of the five putative positively selected sites in pine Tau GSTs identified in this study are located in loop regions, indicating that loop regions of the GST protein can tolerate moderate structural modifications. Furthermore, large numbers of x-ray structures of plant and animal enzyme proteins have revealed that active site residues are often in loop regions (50). This suggests that adaptive functional diversification of enzyme families is achieved through modifications in flexible regions more often than modifications of rigid structural elements, as they allow the acquisition of new functions without disrupting native folding structures.
Plant Tau GSTs play important roles in defense responses against both biotic and abiotic stresses by detoxifying xenobiotics and combating oxidative stress (51). Due to their sessile nature, perennial plants are constantly exposed to a multitude of environmental stresses. Hence, possessing GSTs with diverse activities and high evolutionary flexibility is likely to have high adaptive value, enabling perennial plants to respond to the diverse environmental challenges they are likely to encounter. Secondary metabolism plays key roles in a multitude of plant ecological interactions, so the evolution of efficient, flexible enzyme systems should greatly enhance their fitness. In natural systems, adaptive mutations that generate advantageous new traits (e.g. beneficial shifts in substrate specificity) are particularly difficult to identify, because they may be obscured by the accumulation of subsequent neutral mutations. In addition, as pointed out by Camps et al. (11), whereas adaptive mutations drive positive selection, biological adaptation involves numerous, concomitant compensatory mutations. These compensatory mutations may also be positively selected to suppress deleterious pleiotropic effects of adaptive mutations in highly constrained regions (11). Distinguishing between adaptive and compensatory mutations remains a challenging task. Phylogenetic approaches alone are insufficient to elucidate the mechanisms that govern the adaptive molecular evolution of enzyme families. Relationships between the structures and functions of proteins are often unclear even when x-ray structures are available. In this study we combined molecular evolutionary analyses, protein structural modeling, and site-directed mutagenesis to define the structural and functional properties of the positively selected sites in Tau GSTs in the P. tabuliformis genome.
The use of site-directed mutagenesis to complement molecular evolutionary analyses is a powerful approach that can provide insights into the evolution of enzyme functional divergence. The finding that most of the identified positively selected sites (four of five) in pine Tau GSTs have significant effects on substrate specificity suggests that adaptive evolution of this gene family has been partly driven by selective pressures that have broadened its substrate spectrum and activities. We propose that amino acid replacements accumulated by GST have widened its functions as a consequence of adaptive evolution. Our results provide new insights into the adaptive molecular evolution of the GST enzyme family in perennial plants. They have interesting implications for our understanding of how enzyme families evolve and contribute to our understanding of the selective context leading to biological divergence.