Identification of the allosteric site for neutral amino acids in the maize C4 isozyme of phosphoenolpyruvate carboxylase: The critical role of Ser-100

The isozymes of photosynthetic phosphoenolpyruvate carboxylase from C4 plants (PEPC-C4) play a critical role in their atmospheric CO2 assimilation and productivity. They are allosterically activated by phosphorylated trioses or hexoses, such as D-glucose 6-phosphate, and inhibited by L-malate or L-aspartate. Additionally, PEPC-C4 isozymes from grasses are activated by glycine, serine, or alanine, but the allosteric site for these compounds remains unknown. Here, we report a new crystal structure of the isozyme from Zea mays (ZmPEPC-C4) with glycine bound at the monomer-monomer interfaces of the two dimers of the tetramer, making interactions with residues of both monomers. This binding site is close to, but different from, the one proposed to bind glucose 6-phosphate. Docking experiments indicated that D/L-serine or D/L-alanine could also bind to this site, which does not exist in the PEPC-C4 isozyme from the eudicot plant Flaveria, mainly because of a lysyl residue at the equivalent position of Ser-100 in ZmPEPC-C4. Accordingly, the ZmPEPC-C4 S100K mutant is not activated by glycine, serine, or alanine. Amino acid sequence alignments showed that PEPC-C4 isozymes from the monocot family Poaceae have either serine or glycine at this position, whereas those from Cyperaceae and eudicot families have lysine. The size and charge of the residue equivalent to Ser-100 are not only crucial for the activation of PEPC-C4 isozymes by neutral amino acids but also affect their affinity for the substrate phosphoenolpyruvate and their allosteric regulation by glucose 6-phosphate and malate, accounting for the reported kinetic differences between PEPC-C4 isozymes from monocot and eudicot plants.

During photosynthesis, the ATP and NADPH generated in the light reactions are used in the carbon-fixation reactions to fix and reduce carbon and to synthesize simple sugars. In many plant species, known as the C3 type, these reactions occur exclusively in the chloroplast by means of the C3 pathway, also known as the Calvin-Benson cycle, in which the starting and ending compound is ribulose 1,5-bisphosphate. In this pathway, the carboxylation step is catalyzed by ribulose 1,5-bisphosphate carboxylase/oxygenase (Rubisco). The majority of angiosperms are of the C3 type, but in a group of plants called the C4 type the Calvin-Benson cycle is not the only pathway used in the carbon-fixation reactions. This group includes agronomically important crops, such as maize (Zea mays), sorghum (Sorghum bicolor), sugarcane (Saccharum officinarum), and common millet (Panicum miliaceum), as well as most weeds. In the C4 plants, two metabolic pathways are involved in CO2 assimilation: the C4 and the C3 pathways (1). The C4 pathway provides a CO2-concentrating mechanism that ensures high CO2 levels in the environment of the C3 pathway enzymes, greatly favoring the carboxylating activity of Rubisco over its oxygenating activity. This results in an important reduction in photorespiration, a process that in C3 plants causes significant losses in photosynthetic efficiency, as well as in a better use of water and nitrogen. As a consequence, C4 plants exhibit an increased productivity, especially in sunny, dry, and hot environments.
The initial and highly regulated step in the C4 pathway takes place in the cytosol of mesophyll cells and consists of the irreversible carboxylation of 2-phosphoenolpyruvate (PEP) by the C4 photosynthetic isozyme of phosphoenolpyruvate carboxylase (orthophosphate:oxaloacetate carboxylyase (phosphorylating), EC 4.1.1.31), yielding oxaloacetate and Pi. The carboxylating reactions of PEPC-C4 and Rubisco take place in different cell types in the leaves of those plants that have the two-cell C4 pathway (also called Kranz type because of the anatomy of their leaves), which is by far the most widespread, whereas these reactions take place in different intracellular compartments in those plants that have the single-cell C4 pathway (2,3). In addition to transport of some of the pathway intermediates between cells or intracellular compartments, in the two kinds of C4 pathways, the three essential steps are the initial fixation of CO2 by a PEPC-C4 isozyme to form a C4 acid, the decarboxylation of the C4 acid to release CO2 in the Rubisco environment, and the regeneration of the primary CO2 acceptor PEP.
Because of the increasing need for food to satisfy the requirements of the growing human population, much effort has been, and still is, devoted to improving the photosynthetic performance, and therefore the yield, of C3 plants, particularly those of agronomically and economically important crops, such as wheat (Triticum aestivum), rice (Oryza sativa), barley (Hordeum vulgare), soybean (Glycine max), and potato (Solanum tuberosum). This goal has been attempted by introducing C4 pathway genes, particularly that coding for the PEPC-C4 isozyme from maize, using conventional transgenic techniques so that these plants could acquire the CO2-concentrating mechanism of C4 plants.
The importance of the reaction catalyzed by PEPC-C4 isozymes in the photosynthetic metabolism of C4 plants is underscored by their allosteric regulation (for reviews, see Refs. 4-6), which is particularly complex in the case of isozymes from some monocot C4 plants, such as that from maize (ZmPEPC-C4) (Fig. 1). It has long been known that, at physiological pH, ZmPEPC-C4 is activated by phosphorylated trioses and hexoses, particularly D-glucose 6-phosphate (G6P) (7-13), by its substrate PEP in its free form, i.e. not complexed with Mg2+ ions (13,14), as well as by the neutral amino acids glycine, serine, and alanine (8-11, 15-19). In addition, it is allosterically inhibited by L-malate (20). Whereas the activation by G6P and the inhibition by L-malate or L-aspartate are common to most PEPC enzymes biochemically characterized to date, the activation by neutral amino acids has only been found in PEPC-C4 isozymes from monocot grasses (Poaceae family) (8,9,11,15,16,19). As exemplified by ZmPEPC-C4, the activation by neutral amino acids of the PEPC-C4 isozymes from grasses appears to be crucial for their activity under conditions close to those assumed to be physiological given that malate inhibition is significantly reverted by these compounds but not by G6P (18). The structural basis of these kinetic differences between monocot and eudicot PEPC-C4 isozymes is not known at present.
Figure 1. Schematic representation of the allosteric regulation of ZmPEPC-C4. In C4 plants, the first carboxylation reaction that incorporates the atmospheric CO2 in an organic compound is catalyzed by PEPC-C4 isozymes located in the mesophyll cells. The product of this reaction, oxaloacetate, is then reduced to malate, as in maize shown here, or converted to aspartate in other plants; both compounds are important allosteric inhibitors of most PEPC enzymes. During the illumination period, L-malate accumulates to high concentrations in the mesophyll cells from where it passively diffuses to the bundle-sheath cells where it is decarboxylated to yield CO2, which is then incorporated in the Calvin-Benson cycle by Rubisco, and pyruvate, which moves back to the mesophyll cells, regenerating PEP. The allosteric activators trioses-P and hexoses-P are mainly produced in the bundle-sheath cells from the 3-phosphoglycerate (PGA) formed in the Rubisco-catalyzed reaction but also may be produced in mesophyll cells given that 3-phosphoglycerate may diffuse from the bundle-sheath cells. Neutral amino acids (glycine and serine) are intermediates of the photorespiration pathway; they may also be transported to the mesophyll cell where they activate PEPC-C4 isozymes from monocot grasses, thus counteracting malate inhibition.

The location in the three-dimensional structure of different PEPC enzymes of the allosteric sites for carboxylic acids and for phosphorylated sugars was predicted by site-directed mutagenesis and confirmed by X-ray crystallography (for a review, see Ref. 6). To understand the mechanism underlying allosteric regulation, it is of pivotal importance to identify the protein residues that interact with the allosteric effector. The allosteric site for malate was unequivocally identified in the three-dimensional structure of the PEPC-C4 isoenzyme from Flaveria trinervia (FtPEPC-C4), which was crystallized bound to an aspartate molecule (Ref. 21; Protein Data Bank code 3ZGE). This allosteric site appears to be highly conserved in PEPC enzymes because that of FtPEPC-C4 is very similar to the allosteric site previously found in the PEPC enzyme from Escherichia coli (Refs. 22-24; Protein Data Bank codes 1QB4, 1FIY, and 1JQN). The ZmPEPC-C4 allosteric site for G6P was proposed to be located between the two monomers of a dimeric unit of the native tetrameric PEPC structure because a sulfate anion was found bound at this site (24). This suggested location was in agreement with previous site-directed mutagenesis studies on residues assumed to be involved in binding G6P (25,26). Moreover, this is a conserved region in other plant PEPCs as shown in the crystal structures of the PEPC-C3 isozymes from Flaveria pringlei (Ref. 21; Protein Data Bank code 3ZGB) and Arabidopsis thaliana (Protein Data Bank code 5FDN). However, Schlieper et al. (27) have recently questioned the relevance of this site for activation. With regard to the allosteric site for neutral amino acids, its existence has been predicted on the basis of biochemical data, which indicate that it is distinct from the allosteric site for G6P (11,13,17,18,28), but its position remains unknown. Site-directed mutagenesis experiments on ZmPEPC-C4 suggested that it might be formed by residues of the monomer-monomer interface of the dimeric units of the native tetramer, contiguous to the proposed G6P allosteric site (26), and by residues in a loop at the C-terminal region of the protein (29).
With the aim of unequivocally locating and characterizing the allosteric site for neutral amino acids, and of determining the structural differences between monocot and eudicot PEPC-C4 isozymes that account for their kinetic differences, especially in regard to their sensitivity to neutral amino acid activation, we obtained a new crystal structure of the ZmPEPC-C4 isozyme. This structure showed, for the first time, a glycine molecule bound to its allosteric site at the predicted monomer-monomer interface, close to the proposed G6P allosteric site. In addition, we found that a serine residue (Ser-100), which was not previously identified by site-directed mutagenesis and does not directly interact with the bound glycine, is critical for allowing binding of the neutral amino acids, whereas a lysine at this position, as found in the FtPEPC-C4 isozyme, would impede their binding. Accordingly, the ZmPEPC-C4 S100K mutant was insensitive to activation by neutral amino acids. Multiple alignment studies of the known PEPC-C4 amino acid sequences indicated that a serine or glycine is present at this position in PEPC-C4 isozymes from the Poaceae family, but a lysine is present at this position in the isozymes from the monocot sedges (Cyperaceae family) and eudicot families, thus explaining the reported lack of activation by neutral amino acids of eudicot PEPC-C4 isozymes. In addition, Ser-100 appears to play an important role in determining other kinetic properties of ZmPEPC-C4, and presumably of every PEPC-C4 from grasses, that differ from those of the eudicot PEPC-C4 isozymes.

Crystallographic evidence of the allosteric site for neutral amino acids in ZmPEPC-C4
The structure of the recombinant ZmPEPC-C4 in complex with glycine was determined at 3.3 Å resolution. The atomic coordinates and structure factors have been deposited in the Protein Data Bank under accession code 5VYJ. All data collection and refinement statistics of the structure are summarized in Table 1.
Figure 2 (continued). The red arrow marks the position of the glycine activator in the 5VYJ ZmPEPC-C4 crystal structure reported here. E, differences in the conformation of critical residues of the allosteric site for amino acids between the 5VYJ ZmPEPC-C4 in complex with glycine (green carbon atoms) and the 1JQO ZmPEPC-C4 structure (magenta carbon atoms). Hydrogen bonds are depicted as black dashed lines (C) or green and magenta (E); cutoff is 3.0 Å. In B, C, and E, the side chains of relevant protein residues are shown as sticks with oxygen atoms in red, nitrogen in blue, and carbons in green, yellow, or magenta. Glycine and acetate molecules are depicted as spheres colored similarly but with black carbons. A, C, and E were generated using PyMOL. B and D were generated using the UCSF Chimera package (52).

This new crystal structure consists of an αβ-barrel with eight central β-strands surrounded by at least 44 α-helices (Fig. 2A), very similar to the other known PEPC three-dimensional structures, including those of the enzyme from E. coli (22- 5FDN). The asymmetric unit of the crystal reported here contained the four subunits that constitute the biological unit of the enzyme, which can be considered a dimer of dimers, thus allowing the observation of the dimer-dimer interface for the first time. The asymmetric units of the previously reported PEPC crystals were either a dimer (1JQO) or two monomers belonging to different dimeric units of the tetramer (1QB4, 1JQN, 4BXH, 4BXC, 3ZGE, 3ZGB, and 5FDN). Also, for the first time, in each monomer of the 5VYJ crystal, a clear electron density was observed in crevices that open to the solvent at the monomer-monomer interface of the dimeric units of the tetramer, adjacent to the G6P allosteric site. A glycine molecule with an occupancy of 1 accounted for the extra electron density, as indicated by a simulated annealing omit map (Fo − Fc) at 3.0 σ (Fig. 2, B and C). Modeling other low-molecular-weight compounds present in the crystallization medium (glycerol, acetate, or ethylene glycol) gave no satisfactory results (Fig. S1).

The glycine molecule makes hydrogen bonds through its α-carboxyl group with the side chains of Arg-334 and Trp-333 of one subunit and through its α-amino group with the side-chain carboxyl group of Glu-229 of the neighboring subunit. Glu-229 in turn is hydrogen-bonded to the side chains of Ser-100 and His-104 of its own subunit. In addition to its binding to the α-carboxyl of the bound glycine, Arg-334 participates in a web of interactions with side chains of residues of the opposing subunit that contribute to the stabilization of the monomer-monomer interface. Of particular relevance appear to be the interactions of Arg-334 with Glu-928 and Arg-226, which is in turn bound to the side-chain carboxyl of Asp-428 of the same subunit as Arg-334. In the ZmPEPC-C4 5VYJ crystal structure, Arg-334 and Glu-928 close the allosteric site, acting as a lid that should be open to allow the entrance of the amino acid activator. The "open" conformation of this allosteric site can actually be observed in the previously reported 1JQO ZmPEPC-C4 structure, in which no activator molecule is bound, as shown in Fig. 2D. In the empty allosteric site for neutral amino acids of the 1JQO crystal, the amino acid activator has free access because of the "outside" conformations of both Arg-334 and Glu-928. The high B-factor values of the side chains of these two residues in the 1JQO structure (96 for Arg-334 and 132 for Glu-928, much higher than the average B-factor value) compared with those in the 5VYJ crystal (22 and 20, respectively, similar to the average B-factor value) indicate their mobility in the absence of the activator glycine and the flexibility of this region needed to allow the entry and binding of the activator in the allosteric site. Indeed, the conformation of the residues surrounding the α-amino group of glycine is similar in both structures except that of Arg-334, whose side chain moves about 5 Å from a position exposed to the solvent in the 1JQO structure toward the α-carboxyl group of glycine in the 5VYJ structure reported here (Fig. 2E). This movement, most likely induced by the binding of glycine, makes room for the side chain of Glu-928, whose carboxyl group also moves about 11 Å from its position on the protein surface toward the guanidinium group of Arg-334 of the neighboring subunit, as shown in Fig. 2E. Also, in the 5VYJ structure, the imidazole ring of His-53 rotates about 25° from its position in the 1JQO structure, interacting with Glu-928 of its own subunit (not shown in the figure for clarity).
Therefore, the binding of glycine significantly contributes to the stabilization of the monomer-monomer interface of the ZmPEPC-C4 isozyme, not only by bridging the two monomers through the hydrogen bonds made by its α-amino and α-carboxyl groups but also by promoting the formation of several hydrogen bonds not present when the activator is not bound. This may account, at least in part, for the activation effect of glycine.
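The bridging role described above rests on a simple geometric criterion: the figure legends depict hydrogen bonds using a 3.0 Å distance cutoff. The following minimal Python sketch illustrates how such a distance-only criterion could be applied to list ligand-protein contacts. It is not the procedure used to prepare the figures; the function names and the coordinates are hypothetical placeholders (they are not taken from the 5VYJ entry), and a full hydrogen-bond analysis would also consider donor-acceptor angles.

```python
import math

# Distance cutoff quoted in the figure legends for depicting hydrogen bonds (in Å).
HBOND_CUTOFF_ANGSTROM = 3.0


def within_hbond_cutoff(atom_a, atom_b, cutoff=HBOND_CUTOFF_ANGSTROM):
    """Return True if two atoms (x, y, z tuples in Å) lie within the cutoff."""
    return math.dist(atom_a, atom_b) <= cutoff


def bridging_contacts(ligand_atoms, protein_atoms, cutoff=HBOND_CUTOFF_ANGSTROM):
    """List (ligand atom, protein atom) name pairs that satisfy the cutoff.

    Both arguments map an atom label to its (x, y, z) coordinates.
    """
    return [
        (lig_name, prot_name)
        for lig_name, lig_xyz in ligand_atoms.items()
        for prot_name, prot_xyz in protein_atoms.items()
        if within_hbond_cutoff(lig_xyz, prot_xyz, cutoff)
    ]


if __name__ == "__main__":
    # Placeholder coordinates for illustration only (NOT from any PDB entry).
    ligand = {"ligand N": (0.0, 0.0, 0.0), "ligand O": (2.4, 0.0, 0.0)}
    protein = {
        "subunit A acceptor": (0.0, 0.0, 2.8),
        "subunit B donor": (5.2, 0.0, 0.0),
        "distant atom": (9.0, 9.0, 9.0),
    }
    for pair in bridging_contacts(ligand, protein):
        print(pair)
```

A ligand whose contacts include atoms from both subunits would, under this simplified criterion, be counted as bridging the two monomers.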
The 5VYJ crystal reported here also shows how the allosteric site for neutral amino acids and the proposed allosteric site for G6P are connected in the ZmPEPC-C4 isozyme, thus explaining the synergistic effects of both kinds of allosteric activators observed previously (11,12,17,18). This connection is established by Asp-228, whose side-chain carboxyl group makes hydrogen bonds with the side chain of Arg-232 (Fig. 2C), a conserved residue in PEPC-C4 isozymes (see below). This is in agreement with the previous observations that the affinity for glycine decreased when Asp-228 was changed to an asparagine and that the affinity for both glycine and G6P decreased when Arg-232 was changed to a glutamine (26). Therefore, the interactions of Asp-228 with Arg-232 appear to play an important role in the activation by both kinds of activators, neutral amino acids and G6P.
To explore whether L-serine or L-alanine, the other two neutral amino acids that have been reported as activators of ZmPEPC-C4 (11,15,18), could also bind in the same site as glycine, and to learn their possible mode of binding, we constructed in silico energy-minimized models with L-alanine or L-serine docked in this region. We also docked D-serine and D-alanine inside the Gly-binding site given that they were also found to be activators of the ZmPEPC-C4 isozyme (11). In the models, shown in Fig. S2, the side chains of L/D-alanine or L/D-serine fit well into this site without producing any steric clash, and their α-carboxyl and α-amino groups interact with protein residues in the same manner as do the glycine groups in the crystal structure reported here. The results of these simulations support the conclusion that the site where glycine was observed bound in the 5VYJ crystal is indeed the allosteric site for neutral amino acids.

Comparison of ZmPEPC-C4-Gly and FtPEPC-C4 crystal structures
To determine the structural differences between the PEPC-C4 isozymes that are allosterically activated or nonactivated by neutral amino acids, we compared the region of the allosteric site for neutral amino acids of the 5VYJ crystal structure with the equivalent region in the reported crystal structures of FtPEPC-C4. Although every residue that interacts with glycine in the 5VYJ ZmPEPC-C4 crystal structure is conserved in the FtPEPC-C4 isozyme (some of them in a different conformation), we observed a critical change in a nearby residue that does not make a direct interaction with the bound glycine: Ser-100 in the maize enzyme is changed to a lysine (Lys-96) in the Flaveria enzyme. Interestingly, the side-chain amino group of Lys-96 occupies a position similar to that of the α-amino group of the bound glycine in the ZmPEPC-C4 5VYJ crystal (Fig. 3A), but it is hydrogen-bonded to the carboxyl group of Asp-223 (equivalent to Asp-228 in the maize enzyme) instead of being bonded to the carboxyl group of Glu-224 (equivalent to Glu-229 in the maize enzyme). Moreover, the side-chain carboxyl of Asp-223 has a position comparable with that of the α-carboxyl of the glycine bound in the maize enzyme and forms hydrogen bonds with the same tryptophanyl (Trp-328 in FtPEPC-C4, equivalent to Trp-333 in ZmPEPC-C4) and arginyl (Arg-329 in FtPEPC-C4, equivalent to Arg-334 in ZmPEPC-C4) residues as the activator molecule in the maize enzyme. Consequently, in the Flaveria PEPC-C4 crystals, the neutral amino acid allosteric site does not exist because the side chains of Lys-96 and Asp-223 totally fill up the cavity where glycine binds in the ZmPEPC-C4 5VYJ crystal (Fig. 3B).
The different position of Asp-223 in the FtPEPC-C4 crystals with respect to that of the equivalent Asp-228 in the two known ZmPEPC-C4 structures is due to a −95° change in the angle of this aspartate, resulting in the movement of its side chain toward the region equivalent to the neutral amino acid allosteric site. Also, in the FtPEPC-C4 crystals, Glu-224 (equivalent to Glu-229 in the maize enzyme) has a conformation different from that in the ZmPEPC-C4 crystals. Its side chain is outside the Gly-binding site, interacting with Asn-362 of the opposing subunit (Fig. 3C), whereas in the maize crystals the asparagine at the equivalent position (Asn-368) is exposed to the solvent. Interestingly, the interactions that in the maize enzyme are made between the activator molecule and protein residues, and that stabilize the monomer-monomer interface in the region of the neutral amino acid allosteric site, are made between protein residues in the Flaveria enzyme. Finally, the side chain of Glu-922 of FtPEPC-C4 (equivalent to ZmPEPC-C4 Glu-928, a residue that interacts with Arg-334 in the 5VYJ crystal) is only observed in two of the FtPEPC-C4 crystals, where it is exposed to the solvent (Fig. S3).

Biochemical evidence supporting the identification of the allosteric site for neutral amino acids in the ZmPEPC-C4-Gly crystal structure
To prove the critical role of the residue at position 100 of ZmPEPC-C4 in its allosteric activation by neutral amino acids, we substituted Ser-100 with a lysine by site-directed mutagenesis and compared the kinetics of activation by neutral amino acids of the S100K mutant with those of the WT ZmPEPC-C4 enzyme (Fig. 4). The assays were conducted at pH and concentrations of substrates believed to be close to those prevailing in vivo under illumination conditions (pH 7.4, 0.1 mM bicarbonate, 3 mM PEP, 0.4 mM free Mg2+, 20 mM L-malate) as discussed previously by Tovar-Méndez et al. (18). As shown in Fig. 4, A and B, glycine greatly increases the WT recombinant enzyme activity even in the presence of the high L-malate concentrations characteristic of the day period, in a manner similar to that observed in the enzyme purified from maize leaves (18). In contrast, glycine was not able to activate the S100K mutant either in the absence or presence of the inhibitor, thus confirming the importance of the size and charge of the residue at position 100 for the amino acid activator to bind to its allosteric site. Nor was the S100K mutant enzyme activated by L-serine or L-alanine (Fig. 4, C and D). The kinetic parameters estimated from the neutral amino acid saturation data are given in Table 2.
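As an illustration of what an activator saturation curve expresses, the sketch below evaluates a generic hyperbolic activation model, in which the rate rises from a basal value toward a maximal value as the activator concentration increases. This model is an assumption made here for illustration; the text does not state which equation was fitted to obtain the parameters in Table 2, and the function and parameter names (`v0`, `v_max`, `a_half_mM`) are hypothetical.

```python
def activated_rate(activator_mM, v0, v_max, a_half_mM):
    """Rate under a generic hyperbolic activation model (assumed, not the authors').

    v0        -- rate in the absence of activator
    v_max     -- rate at saturating activator
    a_half_mM -- activator concentration giving half of the maximal increase
    """
    if activator_mM < 0:
        raise ValueError("activator concentration cannot be negative")
    if a_half_mM <= 0:
        raise ValueError("a_half_mM must be positive")
    return v0 + (v_max - v0) * activator_mM / (a_half_mM + activator_mM)


def fold_activation(activator_mM, v0, v_max, a_half_mM):
    """Rate relative to the rate without activator (requires v0 > 0)."""
    if v0 <= 0:
        raise ValueError("v0 must be positive to express fold activation")
    return activated_rate(activator_mM, v0, v_max, a_half_mM) / v0


if __name__ == "__main__":
    # Arbitrary illustrative parameters, not measured values.
    for conc in (0.0, 1.0, 2.0, 10.0):
        print(conc, activated_rate(conc, v0=1.0, v_max=3.0, a_half_mM=2.0))
```

In this simplified description, an enzyme that does not respond to the activator, as reported above for the S100K mutant, corresponds to `v_max` equal to `v0`, which gives a flat curve at every activator concentration.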
The ZmPEPC-C4 S100K mutant also showed other significant kinetic differences from the ZmPEPC-C4 WT (Fig. 5 and Table 3). These differences mainly consist of a higher affinity for the substrate and activator (Fig. 5, A and B) and a lower affinity for the inhibitor (Fig. 5C). Interestingly, G6P counteracted the inhibition by L-malate of the S100K mutant (Fig. 5D), whereas it did not in the WT enzyme as reported previously (18). Thus, the residue at position 100 of ZmPEPC-C4 appears to be a determinant not only of its activation by neutral amino acids but also of its general kinetic behavior. Interestingly, the global kinetic properties of the S100K mutant resemble those of the PEPC-C4 isozymes from eudicot plants (19) more than those of the WT ZmPEPC-C4. These kinetic differences could not be attributed to changes in the association state, folding, or stability of the mutant enzyme. Both the mutant and WT recombinant enzymes were obtained in the soluble fraction of the E. coli cell extracts with a similar yield, indicating that both recombinant proteins were properly folded. Also, the native structure of both proteins is tetrameric (Fig. S4A), and both exhibit a similar thermal stability in thermal-shift experiments with apparent transition temperatures (Tm) of 48.8°C for the WT and 47.75°C for the S100K mutant (Fig. S4B).

Figure 3 (continued). Amino acid side chains are depicted as sticks with carbon atoms in orange or gray (depending on the monomer of the dimeric unit), oxygen atoms in red, and nitrogen atoms in blue. B, surface representation of the monomer-monomer interface of the FtPEPC-C4 dimer, showing that the allosteric site for neutral amino acids does not exist because the position of the above mentioned residues does not leave a cavity between the two monomers. The side chains of relevant protein residues are shown as sticks with oxygen atoms in red, nitrogen in blue, and carbons in black. Of the three reported crystal structures of FtPEPC-C4, the one shown here is Protein Data Bank code 3ZGE. C, differences in the conformation of critical residues of the allosteric site for amino acids between the 5VYJ ZmPEPC-C4 (green carbon atoms) and FtPEPC-C4 3ZGE (orange carbon atoms) crystal structures. Hydrogen bonds are depicted as dashed lines, black in A and green (maize isozyme) or orange (Flaveria isozyme) in C; cutoff is 3.0 Å. A and C were generated using PyMOL, and B was generated using the UCSF Chimera package (52).

Residue conservation at the allosteric site for neutral amino acids in plant PEPC-C4 isozymes
Because PEPC-C4 isozymes from monocots are activated by neutral amino acids, whereas isozymes from eudicots are not, we considered it of interest to investigate the degree of conservation at the amino acid positions of the allosteric site for neutral amino acids found in the ZmPEPC-C4 crystal structure (Protein Data Bank 5VYJ). Using the proposed criterion to identify PEPC-C4 isozymes, that is, the presence of serine at a position equivalent to 774 of the FtPEPC-C4 isozyme (30) or equivalent to position 780 of the ZmPEPC-C4 isozyme, a total of 172 nonredundant protein sequences (allelic forms excluded) were initially identified as PEPC-C4 in the RefSeq protein database of the National Center for Biotechnology Information (NCBI), and two were identified in the Phytozome version 12.1.5 database. Of these, only 138 sequences are from C4 plants (118 from monocots and 20 from eudicots), and thus we consider them to be true PEPC-C4 isozymes (Table S1); the rest were from eudicot C3 plants (three sequences) or CAM plants (31 sequences) (Table S2). The latter finding is consistent with the proposed shared origin of some C4 and CAM PEPC isozymes (31). Another PEPC sequence that possesses a Ser at position 780 is that encoded by the gene EEF27881 (GenBank accession number) from Ricinus communis; it is a bacterial-like PEPC, similar to those previously found in Arabidopsis and rice (32), and therefore was not included in our analysis. So far, the only known sequence for a PEPC-C4 isozyme from a C4 plant from the Hydrocharitaceae family (that from Hydrilla verticillata) does not have a serine at this position. Hence, the presence of Ser-780 is not a sufficient criterion to distinguish PEPC-C4 from other PEPC isozymes, as previously pointed out by others (33). Our sequence analysis was restricted because most of the reported PEPC-C4 sequences were partial, lacking the N- and/or C-terminal regions (only 13 sequences from monocots and five sequences from eudicots were complete), and because, in the case of eudicots, only sequences from four of the 14 families known to contain at least one C4 member have been reported to date. Nevertheless, after multiple amino acid sequence alignments, we prepared sequence logos of selected residues of the monocot and eudicot sequences showing their degree of conservation (Fig. S5).
Of the 16 PEPC-C4 sequences from grasses that show the residue at position 100, in 11 of them this residue is serine, as in ZmPEPC-C4, and in the other 5 the residue at this position is glycine, which is a conservative change from the point of view of size and charge. The H. verticillata PEPC-C4 sequence also has a serine at the position equivalent to Ser-100 of ZmPEPC-C4, but in the Hydrilla enzyme the possible activation by neutral amino acids has not been tested. The two PEPC-C4 sequences reported from sedges that include this position have lysine, as do the only eight PEPC-C4 sequences from different eudicot families that include the residue at this position (Fig. S5A). The PEPC-C4 isozymes from sedges have not been biochemically characterized yet, so we do not know whether they are activated by neutral amino acids, although on the basis of the presence of lysine at position 100 we would predict that they are not. In regard to the PEPC-C4 isozymes from eudicots, those studied in this respect (Amaranthus retroflexus, Amaranthus tricolor, Amaranthus hypochondriacus, and Portulaca oleracea) are insensitive to neutral amino acids (15,19). The evolutionary conservation of these two different kinds of residues, small and neutral or large and positively charged, at this critical position strongly supports the importance of this trait. We also found that the residues directly involved in the binding of the activator glycine (Glu-229, Trp-333, and Arg-334; ZmPEPC-C4 numbering) are absolutely conserved in both monocot and eudicot PEPC-C4 sequences (Fig. S5B).
The residues involved in the web of hydrogen bonds that stabilizes the conformation of the neutral amino acid allosteric site are also highly conserved (Fig. S5C). These are His-104, Arg-226, Asp-428, and Glu-928 (residues that interact with those that bind the activator molecule), together with Asp-228 and Arg-232 (the residue that links the neutral amino acid and the proposed G6P allosteric sites, as mentioned above). All of them are totally conserved except His-104, which is conservatively changed to asparagine in some monocot sequences or to glutamine in some eudicot sequences, and Glu-928, which is aspartate in some of the eudicot sequences.
The conservation of other ZmPEPC-C4 residues that have been related to the allosteric site for neutral amino acids by previous site-directed mutagenesis studies (Lys-927, Glu-932, Lys-934, Gly-937, and Lys-940) (29) as well as the two Pro residues (Pro-915 and Pro-949) that flank the highly flexible loop in which these residues are included was also investigated (Fig. S5D). The polar nature of the charged residues is highly conserved, and the two prolines are absolutely conserved, suggesting that this exposed loop exists in every PEPC-C4 isozyme and that the flexibility of the loop flanked by these prolines is important for their activity and/or allosteric regulation. The most conspicuous change is at the position of Gly-937: glycine is the most frequent residue in the monocot PEPC-C4 sequences, although some have aspartate or glutamate, whereas the eudicot PEPC-C4 sequences have glutamate or, less often, aspartate.

Identification of the allosteric site for neutral amino acids
To understand the mechanism underlying allosteric regulation, it is of pivotal importance to identify the protein residues that interact with the allosteric effectors. The crystal structure of ZmPEPC-C4 in complex with glycine reported here reveals the position in the three-dimensional structure and the structural details of the allosteric site for neutral amino acids, which had remained unknown. The comparison of the ZmPEPC-C4 5VYJ crystal structure with that of the FtPEPC-C4 isozyme already known indicated that the main structural reason for the lack of activation by amino acids in PEPC-C4 isozymes from eudicot plants is the presence of a lysine at the position equivalent to position 100 of the maize enzyme instead of the serine or glycine residue that the isozymes from grasses have. The crystallographic data show that there is no room for the binding of the neutral amino acids in the Flaveria enzyme because of the large and positively charged side chain of this lysine and the interactions it makes with a nearby aspartate residue conserved in both monocot and eudicot PEPC-C4 isozymes. In fact, the cavity between the two monomers that forms the allosteric site for neutral amino acids in ZmPEPC-C4 does not exist in the FtPEPC-C4 enzyme. Moreover, despite the small number of complete sequences reported to date, the conservation of serine or glycine at position 100 in the PEPC-C4 sequences from grasses and the presence of lysine at the equivalent position in those from sedges and eudicots strongly support the critical role played by this residue in determining the different susceptibility to activation by neutral amino acids of these two kinds of PEPC-C4 isozymes, which was confirmed by the lack of activation by these compounds of the ZmPEPC-C4 S100K mutant. Because other kinetic properties were also significantly affected in this mutant, we propose that in this respect there are two different kinds of PEPC-C4 isozymes: the Ser/Gly-100 type and the Lys-100 type. The available PEPC-C4 sequences indicate that, of the three monocot families that include species with C4 metabolism (Cyperaceae, Hydrocharitaceae, and Poaceae), the isozymes from plants of the Poaceae family (grasses), which are economically the most important, and probably the only PEPC-C4 from the Hydrocharitaceae family are of the Ser/Gly-100 type, whereas those from plants of the Cyperaceae family (sedges) are of the Lys-100 type. Likewise, PEPC-C4 sequences from eudicot plants appear to be of the Lys-100 type according to the absolute conservation of this residue in the known sequences that show this position.
Ser-100 was not identified as an important residue for glycine activation in the previously reported site-directed mutagenesis studies, which concluded that Arg-226, Asp-228, Glu-229, and Arg-232 are the important residues (26). The ZmPEPC-C 4 5VYJ crystal structure reported here with a glycine molecule bound shows that Glu-229 actually participates in the binding of the activator molecule and that the other three residues are involved in maintaining the proper conformation of the allosteric site by means of a web of hydrogen bonds (Fig. 2C). Later, it was suggested that Lys-927 and Gly-937 form the glycine-binding site of the maize enzyme and that other charged residues in the loop to which they belong are also important for binding of the activator (29). However, the ZmPEPC-C 4 5VYJ crystal structure reported here shows that neither Lys-927 nor Gly-937 participates in the binding of the activator. The residues mentioned in the referenced work belong to an exposed loop that extends from Pro-915 to Pro-949 in ZmPEPC-C 4 and from Pro-909 to Pro-945 in FtPEPC-C 4 . This loop is so flexible that a great part of it could not be observed in the reported crystal structures (Fig. S3A). Lys-927 is exposed to the solvent in every plant PEPC structure so far reported, and Lys-940 is also exposed to the solvent in the maize crystal structures but is not observed in the Flaveria crystal structures. The residues equivalent to Glu-932 and Lys-934 are not observed in any of the reported PEPC-C 4 crystal structures, whereas the residue at the position equivalent to Gly-937 is not observed in the FtPEPC-C 4 crystal structures.
Gly-937 was proposed to be critical for the ability of monocot PEPC-C 4 isozymes to be activated by neutral amino acids, in contrast to the eudicot isozymes, because it was assumed that the presence of glycine at this position is a distinct feature of monocot PEPC-C 4 isozymes that distinguishes them from the eudicot PEPC-C 4 isozymes, where there is an aspartate or glutamate at this position (29). However, our sequence analysis indicated that not all monocot sequences have Gly-937; some of them, even of the Poaceae family, have aspartate or glutamate (Table S1). This finding suggests that the residue at position 937 is not as critical for neutral amino acid activation as is the residue at position 100. Nevertheless, to determine the possible structural reason for the reported effect that the change of Gly-937 of the maize enzyme to an aspartate had on the response to glycine (29), we performed the in silico mutations G937D and G937E (Fig. S6). Using the ZmPEPC-C 4 5VYJ crystal structure with glycine bound, we found that either an aspartate or a glutamate at the position equivalent to Gly-937 would be exposed to the solvent, not interacting with other protein residues in any of their possible rotamers (Fig. S6, A and B). But when we simulated the possible exposed conformations of Arg-334 and Glu-928 that could occur when the allosteric site for neutral amino acids is empty, as indicated by the ZmPEPC-C 4 1JQO structure, we found rotamers of Arg-334 and of Asp/Glu-937 that can interact (Fig. S6, C and D). This interaction would stabilize the exposed conformation of Arg-334, opposing its movement toward the bound activator molecule to form the critical hydrogen bond with the α-carboxyl group as well as the other hydrogen bonds with other protein residues that stabilize the activator-bound conformation of the allosteric site.
Therefore, it appears that the presence of aspartate/glutamate at position 937 in PEPC-C 4 isozymes from monocot plants could decrease the affinity for neutral amino acids by limiting the movement of Arg-334 into the allosteric site, where this residue contributes importantly to the binding of the carboxyl group of the activator. Additionally, the substitution of charged residues for neutral residues in the Pro-915 to Pro-949 loop could affect the dynamics of this loop, which in turn may negatively affect the allosteric transition induced by the binding of neutral amino acids. This dynamic effect could explain the decreased sensitivity to glycine caused by the amino acid substitutions in this loop.

Implications for the understanding of the kinetics and allosteric regulation of the PEPC-C 4 isozymes
In the absence of the inhibitor L-malate, neutral amino acids activate ZmPEPC-C 4 roughly to the same extent as G6P (8,10,11,18), but when this enzyme is assayed in the presence of malate, glycine or serine counteracts the inhibition of ZmPEPC-C 4 by L-malate to a much higher degree than G6P does (17,34). This is particularly true when the assays are carried out under conditions close to those prevailing during the day, i.e. high L-malate concentrations (35,36) and low CO 2 concentration (37), when G6P is unable to significantly reverse the inhibition caused by L-malate, whereas glycine or L-serine offsets this inhibition, producing an enzyme almost as active as that in the absence of the inhibitor (Refs. 18 and 19 and results in the present work shown in Fig. 4B). At first sight, this result contradicts previous reports of G6P offsetting malate inhibition of ZmPEPC-C 4 isozymes (10, 17) and other PEPC-C 4 isozymes such as that from Hydrilla (38) and those from eudicot plants reported by Rosnow et al. (39). In the case of the eudicot isozymes, the reports are consistent with the results obtained with the ZmPEPC-C 4 S100K mutant presented here. In the case of the monocot PEPC-C 4 isozymes, it has to be taken into account that these previous results were obtained at nonphysiologically high Mg 2+ ion and bicarbonate concentrations. Under those experimental conditions, the concentration of the preferred substrate, the complex Mg-PEP (13), is almost saturating when the total PEP concentration is kept at 3 mM, which greatly increases the affinity for the activator G6P and greatly diminishes that for the inhibitor L-malate. But under conditions near those estimated to prevail during the day, when the concentration of Mg-PEP is well below saturation, the activation by glycine and/or L-serine, not by G6P, appears to be crucial for achieving appreciable levels of ZmPEPC-C 4 activity and, therefore, a significant rate of CO 2 assimilation (18). Given this, we believe that it is of pivotal importance that PEPC-C 4 isozymes are tested in vitro at near-physiological concentrations of their substrates to correctly evaluate the role played by their allosteric effectors.
If the activation by neutral amino acids is so critical, the question arises of how those C 4 plants whose PEPC-C 4 isozyme is not activated by neutral amino acids contend with L-malate inhibition. Notably, the kinetic properties of the ZmPEPC-C 4 S100K mutant suggest that it would be more active than the WT enzyme at low PEP concentrations in the absence of the activator, regardless of the absence or presence of L-malate. Also of particular relevance in this respect is that G6P is able to reverse to a significant extent the inhibition of the S100K mutant by malate, in contrast to what occurs with the WT enzyme. These kinetic properties of the S100K mutant are, at least qualitatively, more similar to those of the PEPC-C 4 isozyme from the eudicot plant A. hypochondriacus (AhPEPC-C 4 ) than to those of the WT ZmPEPC-C 4 . Our findings suggest that these kinetic differences between the Ser/Gly-100-type and the Lys-100-type PEPC-C 4 isozymes could explain why the PEPC-C 4 isozymes from eudicot plants, and likely also those from sedges, although they have not been kinetically characterized to date, do not require activation by neutral amino acids to work efficiently under physiological conditions.

Conclusion
The crystallographic and site-directed mutagenesis data reported here unequivocally identify the location of the allosteric site for neutral amino acids in the ZmPEPC-C 4 three-dimensional structure and point out the importance of the residue at position 100 of ZmPEPC-C 4 for neutral amino acid activation and other kinetic properties. Residue conservation analysis supports our proposal that the nature of the residue at this position plays a similarly critical role in the catalytic and allosteric properties of every PEPC-C 4 isozyme. We believe that our findings are of great biotechnological interest given the current efforts toward the molecular engineering of C 3 grasses, such as rice, wheat, and barley, to improve their productivity.

DNA constructs and site-directed mutagenesis
The ppc-C 4 gene coding for ZmPEPC-C 4 with GenBank accession number CAD60555 cloned into plasmid pTM94 was a kind gift from Prof. Izui (Kyoto University). The protein encoded by this gene differs from that encoded by the gene with GenBank accession number CAA33317 in only two amino acid residues: Pro-482 and Asp-509 in CAD60555 are Ser-482 and Glu-509, respectively, in CAA33317. These are polymorphic changes that seem not to affect either the structure or function of the PEPC-C 4 isozyme. The gene was transferred from plasmid pTM94 to the pET32a(+) (Novagen) expression vector by ligating the NcoI-HindIII-digested insert into the plasmid previously linearized with the same restriction enzymes. Because this gene contains an internal NcoI restriction site, a partial NcoI digestion was used to isolate the full-length gene prior to its insertion into pET32a(+). The plasmid was purified from transformed TOP-10 E. coli cells selected by growing the cells at 100 µg/ml ampicillin and analyzed by restriction analysis, PCR, and sequencing of the insert. The ZmPEPC-C 4 sequence within pET32a(+) did not contain any mutations and was in-frame to be expressed as a fusion protein with a thrombin-cleavable double tag of thioredoxin and His 6 at the N terminus of the ZmPEPC-C 4 protein. Once the construct was verified, it was used to transform BL21-CodonPlus(DE3)-RIL E. coli cells (Novagen) to overexpress the gene.
To generate the ZmPEPC-C 4 S100K mutant, the forward and reverse primers 5′-GCCATCCTCGTGGCGAAGTCCATCCTGCAC-3′ and 5′-GTGCAGGATGGACTTCGCCACGAGGATGGC-3′, respectively, were used to substitute serine at position 100 by lysine. The ZmPEPC-C 4 gene cloned into the pET32a(+) plasmid was used as the template for site-directed mutagenesis by PCR using the QuikChange II site-directed mutagenesis kit (Agilent Technologies) following the manufacturer's instructions. The DNA was sequenced to confirm that the desired mutation was present and that no other nucleotide changes occurred. Finally, the mutant protein was overexpressed in BL21-CodonPlus(DE3)-RIL E. coli cells.
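Because this kind of PCR mutagenesis uses a pair of fully complementary primers, the two oligonucleotides above should be exact reverse complements of each other. The following minimal Python sketch checks that property for the sequences as printed, written without line-break hyphens; the helper names are illustrative and are not part of any kit software.

```python
# Primer sequences as given in the text (5'->3'), without line-break hyphens.
FORWARD = "GCCATCCTCGTGGCGAAGTCCATCCTGCAC"
REVERSE = "GTGCAGGATGGACTTCGCCACGAGGATGGC"

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def are_complementary_primers(fwd: str, rev: str) -> bool:
    """True when rev is the exact reverse complement of fwd."""
    return reverse_complement(fwd) == rev


if __name__ == "__main__":
    print(len(FORWARD), are_complementary_primers(FORWARD, REVERSE))
```

A check of this kind is a cheap way to catch transcription errors in primer sequences before ordering oligonucleotides.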

ZmPEPC-C 4 expression, purification, and assay
Overexpression of both the recombinant WT and S100K ZmPEPC-C 4 proteins was achieved by inducing cells grown in Luria-Bertani (LB) broth with 100 µg ml −1 ampicillin at an A 600 of 0.6 by the addition of 0.1 mM isopropyl β-D-thiogalactoside followed by incubation for 7 h at 25°C. Cells were harvested by centrifugation at 6,000 × g for 10 min. The pellet was suspended in 10 ml of 50 mM HEPES-KOH buffer, pH 7.5, containing 2 mM 2-mercaptoethanol, 50 mM KCl, 1 mM phenylmethanesulfonyl fluoride (PMSF), and 10% (v/v) glycerol (Buffer A) and sonicated for 20 min. Cell debris was removed by centrifugation at 15,000 rpm for 20 min. The ZmPEPC-C 4 protein was purified using Protino nickel-tris(carboxymethyl)ethylene diamine resin (Macherey-Nagel, Düren, Germany) from which it was eluted with 150 mM imidazole in Buffer A. Imidazole was then removed by centrifugal concentration using Amicon Ultra 30 (Millipore). The concentrated enzyme was digested for 2 h at 25°C with enterokinase (New England Biolabs), then applied to a Mono Q chromatography column, and eluted with a gradient from 0 to 400 mM potassium phosphate, pH 7.4. Protein concentrations were determined spectrophotometrically by A 280 using an extinction coefficient of 111,730 M −1 cm −1 predicted from the amino acid sequence using ExPASy ProtParam (40).
ZmPEPC-C 4 activity was assayed spectrophotometrically at 30°C in a coupled enzymatic assay with malate dehydrogenase by monitoring NADH oxidation at 340 nm (ε 340 = 6,220 M −1 cm −1 ) as described (28). The standard assay medium consisted of 100 mM HEPES-KOH buffer, pH 7.3, containing 0.1 mM NaHCO 3 , 0.2 mM NADH, 3 mM total PEP, 0.4 mM free Mg 2+ , and 5 units of malate dehydrogenase. Enzyme activity in the presence of allosteric effectors was assayed using the standard assay with the indicated effector concentrations. Each assay was performed in duplicate and with at least two different enzyme preparations. One unit is defined as the amount of enzyme needed to catalyze the formation of 1 µmol of oxaloacetate/min under our experimental conditions.
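The conversion from the recorded absorbance change to enzyme units follows from the Beer-Lambert law. The sketch below is illustrative only: the 1-cm path length and 1-ml assay volume are assumptions not stated in the text, the function name is hypothetical, and one unit is taken as 1 µmol of oxaloacetate formed per minute, with one NADH oxidized per oxaloacetate reduced by malate dehydrogenase.

```python
EPSILON_NADH_340 = 6220.0  # M^-1 cm^-1, value given in the text


def pepc_units(delta_a340_per_min: float,
               assay_volume_ml: float = 1.0,          # assumed, not from the text
               path_length_cm: float = 1.0) -> float:  # assumed, not from the text
    """Convert the rate of A340 change (per min) into µmol of NADH oxidized per min."""
    # Beer-Lambert: concentration change (mol/L per min) = |dA/dt| / (epsilon * l)
    molar_rate = abs(delta_a340_per_min) / (EPSILON_NADH_340 * path_length_cm)
    # (mol/L per min) * L gives mol/min; multiply by 1e6 to express in µmol/min
    return molar_rate * (assay_volume_ml / 1000.0) * 1e6


if __name__ == "__main__":
    print(pepc_units(-0.0622))
```

Under these assumed conditions, a decrease of 0.0622 absorbance units per minute corresponds to about 0.01 units in the cuvette.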

Crystallization, X-ray data collection, structure solution, and refinement
Crystals of ZmPEPC-C 4 with glycine bound were grown at 18°C by the hanging-drop vapor-diffusion method, mixing equal volumes of protein and reservoir solution, which consisted of 100 mM Tris-HCl, pH 8.5, 200 mM sodium acetate trihydrate, 100 mM potassium/sodium tartrate 4-hydrate, and 15% (w/v) PEG 4000. Protein concentration was 10 mg ml −1 in 10 mM HEPES-KOH buffer, pH 7.5, 1 mM DTT, 10 mM MgCl 2 , and 100 mM glycine. The crystals were cryoprotected in reservoir solution plus 20% (v/v) glycerol and cryocooled in N 2 . X-ray diffraction data of the ZmPEPC-C 4 -Gly were collected at 100 K at the Advanced Photon Source, beamline 23 ID-B (GM/CA), at the Argonne National Laboratory, Chicago, IL.
The data were integrated using XDS (41) and scaled and truncated with programs from the CCP4 suite (42). Because of anisotropy, the data set was submitted to the Diffraction Anisotropy Server (http://services.mbi.ucla.edu/anisoscale) (43) to perform ellipsoidal truncation and anisotropic scaling. The ellipsoidally truncated data set, with the upper resolution limit cut at 3.7, 3.3, and 2.7 Å along a*, b*, and c*, respectively, was used in later stages of the model refinement. The initial phases were obtained by molecular replacement with the program Phaser (44) using the coordinates of the reported ZmPEPC-C 4 structure (Protein Data Bank code 1JQO) as a starting model. Alternating cycles of automatic and manual refinement were carried out with the standard protocols of Phenix (45), monitoring the R work and R free split during the whole process. Refinement in Phenix included atomic positions; group atomic displacement parameters; and translation, libration, and screw-rotation displacement. The simulated annealing omit map was calculated with Phenix following simulated annealing refinement at 3,000 K. The program Coot (46) was used to analyze the electron density maps (2F o − F c and F o − F c ). Structural alignments were performed with Coot and PyMOL.

Kinetic data analysis
Kinetic data were analyzed by nonlinear regression calculations. Initial velocity data obtained varying the concentration of activator at constant concentration of substrates were fitted to Equation 1, where v a and v 0 are the experimentally determined initial velocities in the presence and absence of activator, respectively; v a max is the estimated maximum activity at saturating activator concentrations; [A] is the activator concentration; A 0.5 is the concentration of activator that gives half-maximum activation at fixed concentrations of substrate; and h is the Hill number indicative of the degree of cooperativity in the binding of the activator.
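Equation 1 itself is not reproduced in the text above. A Hill-type expression that is consistent with the symbol definitions given (it returns v 0 when [A] = 0, tends to v a max at saturating activator, and lies halfway between the two when [A] = A 0.5 ) would be the following; it should be read as an assumed reconstruction rather than as the authors' exact equation.

```latex
v_a = v_0 + \frac{\left(v_{a\,\max} - v_0\right)[A]^{h}}{A_{0.5}^{h} + [A]^{h}}
```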
Kinetic data obtained at varied concentrations of substrate PEP were fitted to a Hill equation (Equation 2),

$$v_0 = \frac{V_{\max}[S]^{h}}{S_{0.5}^{h} + [S]^{h}}$$

where v 0 is the experimentally determined initial velocity, V max is the estimated maximum velocity, [S] is the concentration of the substrate, S 0.5 is the concentration of substrate that gives half-V max , and h is the Hill number indicative of the degree of cooperativity in the binding of the substrate.
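As a quick illustration of how the Hill parameters behave, the following sketch evaluates the Hill rate law in plain Python. The function name and the numerical values are hypothetical and are not data from this work.

```python
def hill_velocity(s: float, v_max: float, s_half: float, h: float) -> float:
    """Initial velocity predicted by the Hill equation at substrate concentration s."""
    if s < 0 or s_half <= 0:
        raise ValueError("s must be non-negative and s_half must be positive")
    s_h = s ** h
    return v_max * s_h / (s_half ** h + s_h)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the shape of the curve.
    for s in (0.5, 1.0, 2.0, 8.0):
        print(s, round(hill_velocity(s, v_max=10.0, s_half=2.0, h=2.0), 3))
```

By construction the velocity equals half of V max when [S] = S 0.5 , whatever the value of h; values of h above 1 make the curve sigmoidal.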
When the concentration of L-malate was varied at constant concentration of substrates, the experimental data were fitted to Equation 3, where v i and v 0 are the experimentally determined initial velocities in the presence and absence of inhibitor, respectively; v i min is the estimated residual velocity at saturation of the inhibitor; [I] is the inhibitor concentration; I 50 is the concentration of inhibitor that gives half-maximum inhibition at fixed concentrations of substrates; and h is the Hill number indicative of the degree of cooperativity in the binding of the inhibitor.
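Equation 3 is likewise not reproduced in the text. An expression consistent with the definitions above (it gives v 0 when [I] = 0, tends to v i min at saturating inhibitor, and lies halfway between them when [I] = I 50 ) would be the following, offered as an assumed reconstruction rather than as the authors' exact equation.

```latex
v_i = v_{i\,\min} + \frac{\left(v_0 - v_{i\,\min}\right) I_{50}^{h}}{I_{50}^{h} + [I]^{h}}
```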

Retrieval and sequence analysis of PEPC-C 4 orthologs
Plant genomes typically contain several sequences coding for members of the PEPC family. Some of them were considered alleles given the high similarity of their nucleotide and amino acid sequences or because they mapped to a single locus (when mapping information was available), but others were considered true isozymes because they are located in different loci. To clarify this, we searched for all available PEPC orthologs from plants by performing a blastp search on the RefSeq collection of the NCBI protein database (https://www.ncbi.nlm.nih.gov/protein) using as query the amino acid sequence of ZmPEPC-C 4 (GenBank accession number CAA33317), which corresponds to the PEPC isozyme involved in C 4 photosynthesis in maize (47) encoded by the ppc-C 4 gene (48). To not exclude any candidate plant PEPC-C 4 , the E-value threshold was set at 10; the scoring matrix used was BLOSUM62 with gap opening and gap extension costs of 11 and 1, respectively. All retrieved sequences belong to Viridiplantae. To include sequences for which we would have mapping, allelism, and paralogy information, we also performed blastp searches on version 12.1.5 of the Phytozome database (https://phytozome.jgi.doe.gov/), which contains data for 63 wholly sequenced Viridiplantae genomes. Only hits with an E-value higher than 10e−3 were excluded. The resulting sequences were aligned with ClustalX2 (49), and the alignment was refined by hand, in some cases with the help of structural alignments using homology models, and stripped of redundant sequences.
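The E-value cut-off described above can be applied programmatically when hits are available in tabular form. The sketch below assumes the 12-column tab-separated layout commonly produced by BLAST+ (`-outfmt 6`), in which the E-value is the eleventh field; the searches described here were run through web interfaces, so this layout, the function name, and the example hit lines are all assumptions made for illustration.

```python
from typing import Iterable, List

E_VALUE_CUTOFF = float("10e-3")  # threshold as written in the text (equals 0.01)


def keep_significant_hits(lines: Iterable[str],
                          cutoff: float = E_VALUE_CUTOFF) -> List[str]:
    """Return subject IDs of hits whose E-value does not exceed the cutoff."""
    kept: List[str] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id, e_value = fields[1], float(fields[10])
        if e_value <= cutoff and subject_id not in kept:
            kept.append(subject_id)  # keep first occurrence, preserve order
    return kept


if __name__ == "__main__":
    # Invented hit lines: query, subject, %identity, length, mismatches,
    # gap opens, q.start, q.end, s.start, s.end, E-value, bit score.
    example = [
        "query\thitA\t85.0\t960\t140\t2\t1\t960\t1\t958\t0.0\t1700",
        "query\thitB\t30.1\t120\t80\t4\t10\t129\t5\t120\t2.5\t31.2",
    ]
    print(keep_significant_hits(example))
```

Removing duplicate subject IDs at this stage is only a convenience; it does not replace the manual removal of redundant sequences after alignment.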
For the construction of the sequence logos that show the conservation of residues at positions related to the neutral amino acid allosteric site, plant PEPC-C 4 sequences were chosen based on the criterion of having serine at the position equivalent to residue 780 of ZmPEPC-C 4 (30). Only sequences belonging to a C 4 plant were considered; PEPC sequences from C 3 or CAM plants that have this residue were excluded. Sequence logos were constructed with WebLogo3 (50, 51).