The Chemistry of the Reaction Determines the Invariant Amino Acids During the Evolution and Divergence of Orotidine 5'-monophosphate Decarboxylase*

Orotidine 5'-phosphate decarboxylase has the largest rate enhancement for any known enzyme. For an average protein of 270 amino acids from more than 80 species, only eight amino acids are invariant, and seven of these correspond to ligand-binding residues in the crystal structures of the enzyme from four species. It appears that the chemistry required for catalysis determines the invariant residues for this enzyme structure. A motif of three invariant amino acids at the catalytic site (DXKXXD) is also found in the enzyme hexulose-phosphate synthase. Although the core of OMP decarboxylase is conserved, it has undergone a variety of changes in subunit size or fusion to other protein domains, such as orotate phosphoribosyltransferase, during evolution in different kingdoms. The phylogeny of OMP decarboxylase shows a unique subgroup distinct from the three kingdoms of life. The enzyme subunit size almost doubles from Archaea (average Mr of 24.5 kDa) to certain fungi (average Mr of 41.7 kDa). These observed changes in subunit size are produced by insertions at 12 sites, largely in loops and on the exterior of the core protein. The consensus for all sequences has a minimal size of <20 kDa.

been sequenced from over 80 species. We present an analysis of this diverse set of sequences that demonstrates that even while the subunit size has almost doubled in some species, a core set of important amino acids has remained invariant or highly conserved.
This apparent structural flexibility is of interest because this enzyme has the greatest rate enhancement currently observed, a factor of 10^17 when defined as kcat/knon (2). The remarkable catalytic proficiency of OMP decarboxylase results from the exceptional stability of the carboxyl group of OMP: the uncatalyzed decarboxylation of OMP has a half-life (t1/2) of 78 million years. This makes it all the more significant that the enzyme has no cofactors (3), and makes interpretation of the catalytic mechanism a challenge.
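As a rough check on these magnitudes, the following sketch converts the quoted half-life into a first-order rate constant and applies the 10^17 enhancement. The function names and the Julian-year conversion are illustrative assumptions, not part of the original analysis, and the enhancement is used only as an order-of-magnitude factor.

```python
import math

# Assumed convention: one Julian year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def first_order_rate_constant(half_life_s):
    """Rate constant k (s^-1) of a first-order process from its half-life."""
    return math.log(2) / half_life_s


def implied_kcat(k_non, enhancement):
    """k_cat implied by a rate enhancement defined as k_cat / k_non."""
    return k_non * enhancement


# 78 million years is the half-life quoted in the text.
half_life_s = 78e6 * SECONDS_PER_YEAR
k_non = first_order_rate_constant(half_life_s)
# 1e17 is the order-of-magnitude enhancement quoted in the text.
k_cat = implied_kcat(k_non, 1e17)

print(f"k_non ~ {k_non:.1e} s^-1")
print(f"k_cat ~ {k_cat:.0f} s^-1")
```

Under these assumptions the uncatalyzed rate constant is of order 10^-16 s^-1, and the implied turnover number is of order tens per second.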
Four high quality crystal structures of OMP decarboxylase were recently produced: one from Archaea (M. thermoautotrophicum (8)), two from the Eubacteria B. subtilis (9) and E. coli (10), and one from a eukaryote, S. cerevisiae (11). These enzymes were bound to the product UMP (9) or to tight binding transition state analogs (8,10,11). This enables a comparison of the amino acid residues in these structures shown to be necessary for ligand binding with the amino acids that are found to be invariant or highly conserved in all available sequences. Consistent with the demands dictated by the chemistry for this proficient reaction, the amino acid residues that have been maintained as invariant across all kingdoms of life are almost completely those that are essential for catalysis. Furthermore, the recent abundance of OMP decarboxylase sequence data has made it practical to define the phylogeny of this essential enzyme in comparison to earlier phylogenetic analyses. Work by Woese and colleagues led to the widely used ribosomal RNA sequences as a reference standard (4,5), since this molecule is an essential and presumably original component of the oldest organisms. Efforts have also been made to use proteins for this purpose. Since no single protein had been sequenced in enough organisms, Doolittle et al. (6) chose to use 57 different proteins, for each of which sequences existed from at least four species. Their results show that phylogenetic relationships are not too dissimilar from relationships based on RNA, but their time scale for important divergence points was significantly different from that for trees based on RNA studies.
An element of uncertainty has entered the recently developed paradigm for a tree of life with the findings that genes may be transferred laterally between species.
Comparison of the complete genomes of Methanococcus jannaschii, Escherichia coli, Synechocystis 6803, and Saccharomyces cerevisiae suggested that lateral transfer may have been on a large scale (7). The authors detected a pattern for two classes of genes: informational (maintenance and expression of DNA, and signaling), and operational (enzymes of general metabolism). For yeast, informational genes were most closely related to Methanococcus jannaschii (Archaea), while operational genes were closer to Escherichia coli (Bacteria).
If transfer of genes between very different species is at all extensive, then no single gene (molecule) may serve as a unique reference standard to define any tree of life. It may then be necessary to have more such reference standards. Proteins that have an essential function, such as OMP decarboxylase, are good candidates for this role. The enzyme activity has been widely detected (1), and this implies that new sequences will continue to become available to help phylogenetic analyses.

EXPERIMENTAL PROCEDURES
Sequence Analysis: The amino acid sequences for all defined OMP decarboxylases were obtained from standard databases. A majority of the sequences are available in Swiss-Prot; a few were found by BLAST searches of GenBank and then translated. These latter sequences could be verified as true OMP decarboxylase sequences by the location of a critical motif, DxKxxDIxxT, where the capital letters represent invariant residues, 'x' is any residue, and the two italic letters (I and T) are almost invariant, being present in all but two or three of 82 sequences (see Fig. 1).
Sequences were initially aligned with the program Pileup of the University of Wisconsin Genetics Computer Group (GCG9). Since nine of these sequences are contained in a bifunctional UMP synthase, the OMP decarboxylase portion was identified by comparison to the consensus for all the monofunctional OMP decarboxylase sequences.
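The motif check described above can be sketched as a regular-expression search. This is a minimal illustration, not the procedure actually used: the function name, the strict and relaxed patterns, and the toy sequences are all assumptions introduced here. The relaxed pattern keeps only the three invariant residues and, being short, could match unrelated proteins by chance, so a hit should be treated as a candidate rather than a confirmation.

```python
import re

# Strict form of the signature: D-x-K-x-x-D-I-x-x-T.
STRICT_MOTIF = re.compile(r"D.K..DI..T")
# Relaxed form keeps only the three invariant residues, because the
# I and T positions differ in a few sequences.
RELAXED_MOTIF = re.compile(r"D.K..D")


def find_signature(sequence):
    """Return (start, matched_text, kind) for the first motif hit, or None.

    start is a 0-based index into the ungapped one-letter protein sequence.
    The strict pattern is tried first, then the relaxed one.
    """
    seq = sequence.upper()
    for kind, pattern in (("strict", STRICT_MOTIF), ("relaxed", RELAXED_MOTIF)):
        match = pattern.search(seq)
        if match:
            return match.start(), match.group(), kind
    return None


# Made-up fragments for illustration only; not real database entries.
examples = {
    "toy_strict": "MAGLVDLKFHDIPNTVAG",
    "toy_relaxed": "MAGLVDAKLADVGSSVAG",
    "toy_none": "MAGLVEAKLAEVGSSVAG",
}
for name, seq in examples.items():
    print(name, find_signature(seq))
```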
For phylogenetic analyses a final alignment was obtained with the program CLUSTAL X (12,13). An evolutionary tree was constructed using the PHYLIP fitch program (14) from evolutionary distances calculated with the PHYLIP protdist program. This alignment also illustrates that there are 11 consensus sequence segments, which define the secondary structure elements of the α/β barrel core structure (8-11). Inserts of varying sizes occur throughout the sequence, and always at positions where there are loops in the protein structures.
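To illustrate the kind of input a distance-based tree method consumes, the sketch below builds a pairwise distance matrix from an alignment. It uses the uncorrected p-distance (fraction of differing residues), which is a deliberate simplification swapped in here; protdist estimates distances under a substitution model, and this code does not reproduce it. The function names and the toy alignment are assumptions.

```python
def p_distance(a, b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns shared by the two sequences")
    return differing / compared


def distance_matrix(alignment):
    """Map every ordered pair of sequence names to its p-distance."""
    names = list(alignment)
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m in names
        for n in names
    }


# Made-up aligned fragments for illustration only.
toy_alignment = {
    "sp_A": "MKD-LKAADI",
    "sp_B": "MKDVLKAADI",
    "sp_C": "MRDVLRSADL",
}
dm = distance_matrix(toy_alignment)
for pair, d in sorted(dm.items()):
    print(pair, round(d, 3))
```

A matrix of this shape is what a least-squares method such as fitch takes as input, although the distances themselves would normally come from a model-based estimate.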

Global Sequence Alignment and Identification of Signature Sequences
In the key signature sequence at residues 145-154, the D, K, and D are absolutely invariant, suggesting that they were likely to be involved in the catalytic mechanism.
Also invariant are an aspartate at position 59, a lysine at position 110, a glycine at 399, a glutamine at 422, and an arginine at 446. The position numbers are based on the alignment in Figure 1, but all the individual enzymes are actually smaller, as shown next to their species name. Since alignment programs allow some gaps to optimize the alignment, the result shown in Figure 1 was visually adjusted to have Glutamine422 become invariant, aided by the information from the four crystal structures. These are yeast OMP decarboxylase bound to barbiturate monophosphate (BMP) (11), the Bacillus subtilis enzyme bound to UMP (9), and the enzymes from Methanobacterium thermoautotrophicum and Escherichia coli bound to 6-azauridine monophosphate (8,10).
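The relationship between alignment positions and the smaller residue numbers of an individual enzyme can be sketched as follows. This is an illustrative example only: the function names and the toy alignment are assumptions, and the code does not reproduce the visual adjustment described above.

```python
def invariant_columns(alignment, gap="-"):
    """Return {1-based alignment column: residue} for columns identical in all sequences."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    result = {}
    for col in range(length):
        residues = {s[col] for s in seqs}
        # A column counts as invariant only if every sequence has the same
        # residue there and none of them has a gap.
        if len(residues) == 1 and gap not in residues:
            result[col + 1] = residues.pop()
    return result


def residue_number(aligned_seq, column, gap="-"):
    """Map a 1-based alignment column to the 1-based residue number in one sequence.

    Returns None if that sequence has a gap at the column.
    """
    if aligned_seq[column - 1] == gap:
        return None
    return sum(1 for ch in aligned_seq[:column] if ch != gap)


# Made-up aligned fragments for illustration only.
toy_alignment = {
    "toy_1": "M--DAKLLDG",
    "toy_2": "MSTDLKVVDG",
    "toy_3": "M-ADIKAADG",
}
invariant = invariant_columns(toy_alignment)
print(invariant)
for col in invariant:
    print(col, {name: residue_number(seq, col) for name, seq in toy_alignment.items()})
```

In the toy data the invariant D at alignment column 4 is residue 2 of the shortest sequence, which mirrors why the alignment numbering exceeds the length of every individual enzyme.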
All four OMP decarboxylase proteins show an α/β barrel core structure, with 8 central β strands surrounded by 9 helices (8-11). In each of these structures 9 to 11 amino acids make important hydrogen bonds to the ligand at the catalytic site; the nine consensus residues are indicated in Fig. 1 by symbols for hydrogen bonding (closed or open circles). Up to six other amino acids bind to a few of these amino acids that bind the nucleotide ligand directly, and help to stabilize them. Only one of these is at a consensus position (Asp145). Thus, of these ten consensus residues shown to be necessary in the four crystal structures, seven of them are at an invariant position in Fig. 1 (shown in bold), and the other three are at very conserved residues (in italic). The invariant amino acids tend to be at the ends of β strands, or in loops. Only Thr154 and Arg446 are in a helix. These results show the importance of the overall core α/β barrel structure, as well as the few essential amino acids therein.
While nine of the ten residues shown by the four crystal structures to participate in binding were predicted in the sequence alignment, each of the structures also shows one or more additional amino acids that have a secondary role at the catalytic site, but these are not at the same position in the sequence alignment. This limited variety may be due to actual variations in the separate structures, and/or the somewhat different nucleotide ligands being bound.
The usefulness and limits of such an alignment analysis become apparent. Figure 2 depicts the yeast protein structure, and the specific residues involved in binding to the substrate analog BMP (11). Of seven amino acids whose side chains bind directly to BMP, five are invariant for more than 80 different species (Asp59, Lys147, Asp150, Gln422, Arg446) and one is highly conserved (Thr154). But, while five additional amino acids make side chain contacts to stabilize other amino acids that bind BMP, only two of these are invariant (Lys110, Asp145) while the other three are not even conserved (#57, #176, #311).
Of the many loops in the crystal structures, three are specifically identified in Fig. 1, and are known to participate at the catalytic site. OMP decarboxylases in the four protein structures, as well as in man and mouse, are normally dimeric (1). An important structural feature that could not be anticipated from the sequence alignment is that the catalytic site is formed by segments of the two adjoining subunits in the four enzyme structures. Thus, Loop A on one subunit containing Asp150-Thr154 connects across the dimer interface to complete the catalytic site made by residues Asp145 and Lys147 plus other amino acids shown in Figure 3. Therefore, one half of the signature motif (Asp145 X Lys147 X X Asp150 Ile151 X X Thr154) contributes to the catalytic site of one subunit in the functional dimer, while the second half of this motif contributes to the catalytic site of the adjoining subunit. Given the close spacing of these key residues in the primary sequence, it was not anticipated that they could contribute to separate catalytic sites on different subunits. However, this significant result for the active site architecture corroborates earlier kinetic studies which showed that only the dimer form of OMP decarboxylase was catalytically competent (15). Thus the recently obtained crystal structures (8-11) are entirely consistent with these earlier kinetic studies.
Two loops (Loops B and C, Fig. 4) tend to be poorly defined in the apoenzyme, but clearly close over the catalytic pocket when a ligand is bound, and each loop contains a residue needed for binding (Fig. 1). Loops A and B appear to be constant in size, while Loop C is clearly variable, as defined in Table I. It is evident that this loop has a minimal size in Archaea, but is much larger in other species. Since the structure from M. thermoautotrophicum (Archaea) is the only structure showing water molecules at the active site (8), it appears that this loop functions to exclude water where this loop is large enough (11). Figure 1 also shows three highly conserved amino acids that are not found to be involved in binding a ligand, and therefore may only be needed for part of the structure: Ile151, Pro398 and Gly399. Thus, of 12 amino acids identified in the sequence alignment, nine of these are shown to be involved in the active site of the crystal structures. And of the nine amino acids shown in the structures to be specifically involved in binding a ligand, seven of these were apparent in the sequence alignment presented in Figure 1.

Phylogeny Based on OMP Decarboxylase
A phylogenetic analysis of the OMP decarboxylase sequences (Fig. 4) is in general fairly consistent with earlier evolutionary trees. Four large clusters are evident. Three of these correspond to the well established taxonomic kingdoms Archaea, Eubacteria, and Eukaryota. However, a quite distinct fourth group is evident; designated as "Mixed", this group consists mainly of four mycobacteria, two members of the Thermus/Deinococcus group and one myxobacterium. More surprisingly, included in this group is one eukaryote, Trypanosoma cruzi. Since all other eukaryotes group well together, and at quite a distance from the mycobacterial subgroup, it may well be that the inclusion of T. cruzi in this subgroup is an example of lateral gene transfer.
Among the eukaryotes, the multicellular species C. elegans and D. melanogaster are farther outliers from the main cluster than the simple slime mold D. discoideum. Also of interest is that the many fungi in this data set show as much or more diversity than all the multicellular eukaryotes.
In examining the subunit sizes of OMP decarboxylases from the different species, it was evident that the Archaea had a distinctly smaller protein than the eukaryotes.
When all proteins were analyzed for their size versus evolutionary distance from A. pernix (Archaea), it became evident that the subunit size varied almost two-fold from Archaea to the Pyrenomycetes fungi, and there appeared to be a modest correlation of increasing protein size with evolutionary distance from A. pernix (Fig. 5A). Since no unique functions or benefits have been described for the larger proteins, the data were reanalyzed by subtracting for each protein sequence the variable extensions at the N- and C-termini (Fig. 1, Inserts 1, 2, and 12), as well as the single large insert of the Pyrenomycetes fungi, an example of which is Insert 7 for N. crassa at residues 195-299 (Fig. 1). The replot for this set of truncated sequences now shows much less variation in size (Fig. 5B). The horizontal line suggests an average subunit size for the core domain of about 228 amino acids, with a variation of about 25 amino acids from this mean. This variation represents the remaining smaller inserts in the different species (see Table I). If only those residues for all species that are consistently present at any given position were used for this plot (the minimal consensus), the data set would reduce to about 180 amino acids. This would then be the smallest core domain, with an Mr <20 kDa.
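The link between residue counts and subunit mass can be checked with a back-of-the-envelope conversion. The sketch below assumes an average residue mass of 110 Da, a common rule of thumb that is not taken from the original analysis; the function names are likewise illustrative.

```python
# Rule-of-thumb average mass of one amino acid residue in a protein.
# This value is an assumption, not a figure from the paper.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues, residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Approximate protein mass in kDa from its number of residues."""
    return n_residues * residue_mass_da / 1000.0


def approx_residues(mass_kda, residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Approximate number of residues from a protein mass in kDa."""
    return round(mass_kda * 1000.0 / residue_mass_da)


# Residue counts quoted in the text.
for label, n in (
    ("minimal consensus", 180),
    ("average core domain", 228),
    ("average full-length protein", 270),
):
    print(f"{label}: {n} residues ~ {approx_mass_kda(n):.1f} kDa")

# Average subunit masses quoted in the text, converted back to residues.
for label, mass in (("Archaea", 24.5), ("certain fungi", 41.7)):
    print(f"{label}: {mass} kDa ~ {approx_residues(mass)} residues")
```

With this assumed residue mass, a minimal consensus of about 180 residues corresponds to just under 20 kDa, in line with the size stated for the smallest core domain.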
A detailed analysis of all 12 insert positions for the major phylogenetic groups is listed in Table I. Insert numbers correspond to those in Fig. 1. Inserts 1 and 2, where they occur in the four structures, are separate helices, while Insert 12 extends the final helix in these proteins (Fig. 3). Insert 2 is the immediate N-terminal addition found in almost all sequences. Since a few species have no Insert 2, and since for Eubacteria Insert 2 is fairly small, it is interpreted here as an early extension at the N-terminus.
Only the eukaryotes have an additional N-terminal extension, Insert 1, which is generally in the 16-23 amino acid range (see Table I). The average size for most inserts is quite small. This reflects the fact that for a given subgroup, only a few member species actually have an insert at that position. The inclusion of standard deviation values in Table I shows that these values are often zero, indicating no variation in insert size, and this suggests that the function of many loops at which inserts occur may place a constraint on the size of the insert at a given position.
An additional level of complexity in the evolution of OMP decarboxylase is that the gene for this enzyme became fused with a second gene in all multicellular eukaryotes studied (Fig. 6). The second gene always codes for orotate phosphoribosyltransferase (orotate PRTase), the enzyme that immediately precedes OMP decarboxylase in the pathway for the de novo synthesis of UMP, so that the fused gene now codes for a bifunctional protein designated as UMP synthase (1). Since the arrangement of the fused genes, or of their protein domains, is commonly with the orotate PRTase preceding OMP decarboxylase, and since it is found in the slime mold D. discoideum, this fusion presumably occurred once at the beginning of the metazoan expansion, and has been stably maintained thereafter.

The analysis of insert variations and their effect on subunit size is summarized in
While T. cruzi also has a bifunctional UMP synthase, the domains are linked in the reverse order. The ready ability of the OMP decarboxylase to fuse with an orotate PRTase domain for a stable bifunctional UMP synthase is evident in the four protein structures. Each one shows that both the N-terminus and C-terminus extend side-by-side on one surface of the protein. These termini are evident at the upper left of the subunit structure in Fig. 3, or at the lower corners in Fig. 4. Therefore linkage at either of these termini with orotate PRTase leads to a fairly similar overall structure for the bifunctional UMP synthase.
This unusual pattern is again consistent with the origin of the T. cruzi OMP decarboxylase and UMP synthase being completely separate from all other eukaryotes.
Since it has been proposed above that the acquisition of the OMP decarboxylase gene was by lateral transfer for this parasite, then the unusual reverse fusion with orotate PRTase (Fig. 6)
It is very likely that the two enzymes diverged from a common ancestor, since of the eight invariant amino acids in OMP decarboxylase five are also found in hexulose-phosphate synthase at the corresponding position (last two sequence entries, Fig. 1).
This enzyme from various methanophile bacteria is always an oligomer, though it may be a dimer (16,17), a tetramer (18), or a hexamer (19). Since this enzyme also has the same unique motif found at the junction of the two subunits of OMP decarboxylase in forming the active site, then it would be consistent with the observed oligomer forms of hexulose-phosphate synthase for it to have a similar architecture for the catalytic site.
Kinetic studies have emphasized that for hexulose-phosphate synthase divalent metals appear to be essential for activity, with Mg2+ and Mn2+ being equally effective, while other divalent cations had modest or no benefit (16,18,19). However, it was noted that this enzyme was somewhat unstable, and that the presence of metals improved stability for long term storage at -60 °C or short exposure to 60 °C (16,18,19

DISCUSSION
The results of such an extensive sequence analysis for a single gene support our emerging understanding of the conservation of protein structure during evolution.
OMP decarboxylases from three kingdoms have now been defined as having an α/β barrel structure. By comparing those amino acids that are invariant or highly conserved across more than 80 species to the amino acids shown to be functional in the crystal structures, we see that through evolutionary divergence only the most structurally and functionally essential amino acids are conserved, and that amino acids identified by such an alignment are highly significant and most likely to be involved directly in catalysis.
However, the validity of the above judgment is highly dependent on the size of the data base. Two papers analyzed OMP decarboxylase sequences available earlier, and found 48 invariant residues for a set of 17 sequences (21), which became reduced to 10 invariant residues for a set of 20 sequences (22). It must again be noted that on average such sequences code for 270 amino acids. In the present analysis only eight of these amino acids remain as invariant for 82 sequences, and seven of these invariant residues participate in binding a transition state analog. Such a result is consistent with the hypothesis that invariant residues are chiefly specified by their essential role in the actual chemical reaction.
What may be essential for this enzyme is the core α/β barrel structure, which is most likely very similar in all species. It is worth noting that inserts that occur at different positions of the sequence, and in different species, almost always occur in loops between elements of secondary structure (see Fig. 3). This suggests that such inserts are easily tolerated if they are on the outside of the core structure, and do not alter this core structure or sterically impede access of the ligands to the catalytic site.
Thus the single 100 amino acid insert shown for N. crassa (Insert 7) may be viewed as a small domain attached to the side of the main α/β barrel core. Since at least seven species of fungi have retained this large insert, it may have a function yet to be discovered.
The segments of consensus sequence generally have well defined boundaries (Fig. 1), and therefore defining the size of an insert between two such consensus segments is not difficult. In a few cases, such boundaries are not as well defined, and we therefore set the boundary so as to accommodate the minimum sequence for any given species within such a segment. This may clearly influence the insert size shown for some species.
Only the mycobacteria (Mixed group, Table I)
Table I suggests two possible patterns for such alterations in the protein's size. For the majority of inserts, the standard deviation is quite small. This would be consistent with a single insertion event at a given position, followed by divergence of species with minimal changes at that site. Where the standard deviation is quite large, as for Insert 2 with Archaea, this represents an N-terminal extension that may have occurred separately in the different species (by alteration of the start codon), and therefore shows considerable diversity in the size of this extension.
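The two patterns can be made concrete with a small summary of insert sizes per group. The sketch below is illustrative only: the insert lengths are invented, the function name is hypothetical, and the choice of the sample standard deviation is an assumption, since the text does not state which estimator was used for Table I.

```python
from statistics import mean, stdev


def summarize_insert(sizes):
    """Mean and sample standard deviation of insert lengths for one group.

    A group with a single member is reported with a standard deviation of 0.0,
    since the sample estimator is undefined for one value.
    """
    avg = mean(sizes)
    sd = stdev(sizes) if len(sizes) > 1 else 0.0
    return avg, sd


# Invented insert lengths (in residues) for illustration; not data from Table I.
toy_groups = {
    "single insertion event, size retained": [6, 6, 6, 6],
    "independent N-terminal extensions": [0, 4, 11, 25],
}
for label, sizes in toy_groups.items():
    avg, sd = summarize_insert(sizes)
    print(f"{label}: mean = {avg:.1f}, SD = {sd:.1f}")
```

A standard deviation of zero corresponds to the first pattern, an insert of fixed size shared by all members, while a standard deviation comparable to the mean corresponds to the second, where the extension differs widely between species.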
In addition to its ability to handle smaller inserts or additions to the core structure, the core domain itself is easily joined to at least two other protein domains (Fig. 6). With the demonstration that T. cruzi can form UMP synthase in the alternate configuration (23) (Fig. 6), it becomes evident that in such gene fusions there must be adequate linker DNA to code for the connecting polypeptide between the two domains. This assumption is based on the fact that the OMP decarboxylase from yeast is only functional as a dimer. Kinetic studies obtained the same result for this enzyme activity in the mammalian UMP synthase, which adds further evidence that only the dimer is functional (15,24). Furthermore, the recent crystal structure (25)
That this fusion of the same two genes occurred at least twice also suggests that some benefit is associated with the coupling of these two protein domains. Although the two domains of UMP synthase catalyze sequential metabolic steps, evidence for the strict channeling of the common metabolite (OMP) between the two domains was not observed (26). However, with the ability to separately clone and express the two domains for the human UMP synthase, it was shown that the bifunctional protein has much greater stability than either of the independent catalytic domains (27), and such a benefit may explain two different gene fusion events to produce UMP synthase. This latter finding also implies that some interaction must occur between the two different domains in UMP synthase to provide this observed stability.

Fig. 6. In most multicellular eukaryotes OMP decarboxylase is fused with, and C-terminal to, orotate phosphoribosyltransferase. In T. cruzi, this linkage is reversed, and in D. radiodurans fusion is with some unidentified protein.