Identification of the human methylmalonyl-CoA racemase gene based on the analysis of prokaryotic gene arrangements. Implications for decoding the human genome.

In this report, we identify the human DL-methylmalonyl-CoA racemase gene by analyzing prokaryotic gene arrangements and extrapolating the information obtained to human genes by homology searches. Sequence similarity searches were used to identify two groups of homologues that were frequently arranged with prokaryotic methylmalonyl-CoA mutase genes, and that were of unknown function. Both gene groups had homologues in the human genome. Because methylmalonyl-CoA mutases are involved in the metabolism of propionyl-CoA, we inferred that conserved neighbors of methylmalonyl-CoA mutase genes and their human homologues were also involved in this process. Subsequent biochemical studies confirmed this inference by showing that the prokaryotic gene PH0272 and its human homologue both encode DL-methylmalonyl-CoA racemases. To our knowledge this is the first report in which the function of a eukaryotic gene was determined based on the analysis of prokaryotic gene arrangements. Importantly, such analyses are rapid and may be generally applicable for the identification of human genes that lack homologues of known function or that have been misidentified on the basis of sequence similarity searches.

Prokaryotes frequently cluster genes of related function. Hence, if the function of one gene in a conserved cluster is known, the remaining genes in that cluster can be inferred to function in the same metabolic process (1)(2)(3)(4)(5). Knowledge of a gene's metabolic role can lead to rapid assignment of specific function by focusing biochemical studies. Although such analyses appear inapplicable to eukaryotes (gene clustering is rare in these organisms), it should be possible to extrapolate the functional information obtained from the analysis of prokaryotic gene arrangements to eukaryotes by homology searches (Fig. 1). To determine whether such analysis can in fact be used to rapidly determine the function of human genes, we investigated genes involved in propionyl-CoA metabolism.
Both prokaryotes and eukaryotes metabolize propionyl-CoA by a coenzyme B12-dependent pathway (6) (Fig. 2). In humans, inborn errors in the methylmalonyl-CoA mutase (mcm) gene lead to methylmalonyl aciduria, a rare but severe inherited disease (7). Defects in the CblABCDF complementation groups also lead to this disease (7). These complementation groups are thought to encode enzymes needed for the conversion of hydroxy-B12 to coenzyme B12, the required cofactor for methylmalonyl-CoA mutase (MCM). In addition, defects in the DL-methylmalonyl-CoA racemase gene may also lead to methylmalonyl aciduria, as MCM is specific for the L-isomer of methylmalonyl-CoA (8). Although the mcm gene is known, the genes corresponding to the CblABCDF complementation groups and to the DL-methylmalonyl-CoA racemase have not been identified.
To identify additional human genes involved in propionyl-CoA metabolism, prokaryotic gene arrangements were examined. By these analyses, we identified the human DL-methylmalonyl-CoA racemase gene as well as a second human gene likely to be involved in propionyl-CoA metabolism, but of unknown function. To our knowledge, this is the first report of a eukaryotic gene that has been identified based on the analysis of prokaryotic gene arrangements. Importantly, the method used is rapid and may be generally applicable for the identification of eukaryotic genes that lack homologues of known function or that have been misidentified on the basis of sequence similarity searches.

MATERIALS AND METHODS
General Protein and Molecular Methods-Electrophoresis, bacterial transformation, restriction digestions, and other routine molecular and biochemical methods were performed as described previously (9).
Cloning of the Human DL-Methylmalonyl-CoA Racemase Coding Sequence for High Level Expression-PCR was used to amplify the portion of the human racemase cDNA corresponding to its prokaryotic homologues. The primers used were GTGTGGAACCTGGGTCGACTGAACC and TCAAGCTTGCTCCAGTTCCACAAGGA, and the template used was a Marathon-ready human liver cDNA library (CLONTECH, Palo Alto, CA). The enzyme used for amplification was Advantage 2 DNA polymerase mix (CLONTECH). The PCR product obtained was treated with the Klenow fragment of DNA polymerase to create blunt-ended DNA (10) and cloned into the SmaI site of pGEM4z (Promega, Madison, WI). Plasmid DNA isolated from one transformant was shown to have the expected DNA sequence. Subsequently, this clone was used as template for a second PCR amplification that employed primers GCCGCCAGATCTGGATGACGACGACAAGATGTGGAACCTGGGTCGACTC and GCCGCCAAGCTTTCAAGCTTGCTCCAGTTCCACAAGGA. These primers introduced an enterokinase site as well as BglII and HindIII restriction sites that allowed cloning into the protein expression vector, pET41a (Novagen, Cambridge, MA). The BglII site was positioned such that after cloning the DL-methylmalonyl-CoA racemase would be fused to N-terminal glutathione S-transferase and 6×His tags encoded by the expression vector. The enterokinase site was positioned such that the portion of the human racemase homologous to its prokaryotic counterparts would be precisely released upon proteolysis (Fig. 3). The DNA sequence of one clone derived from the second PCR product was verified, and the plasmid carrying this clone was transformed into the expression strain Escherichia coli BL21 DE3 RIL (Stratagene). One isolate obtained from this transformation (TA1002) was used for high level expression of the putative human racemase enzyme.
Cloning the Pyrococcus horikoshii DL-Methylmalonyl-CoA Racemase Coding Sequence for High Level Expression-PCR was used to amplify the P. horikoshii DL-methylmalonyl-CoA racemase coding sequence. The primers used were GCGATCAGATCTCATATGATATGGATGTTTAAGAGGATAGACCATGTTGG and GCGGAATTCGACGTTACTCTTTCCTTTCACATAGCTCCAGGAGC. These primers provided the BglII and EcoRI sites used for cloning the PCR product. The BglII site was positioned such that subsequent to cloning the native DL-methylmalonyl-CoA racemase was expressed via a T7 promoter and an idealized ribosome-binding site present on the expression vector. The template used for amplification was P. horikoshii genomic DNA obtained from the American Type Culture Collection, and the enzyme used was Pfu DNA polymerase (Stratagene). The PCR product was cloned into the BglII and EcoRI sites of expression vector pTA925 (9). Plasmid DNA isolated from one transformant was shown to have the expected DNA sequence. This DNA was transformed into the expression strain E. coli BL21 DE3 RIL. One isolate obtained from this transformation (TA1015) was used for high level expression of the native form of the putative P. horikoshii racemase enzyme.
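The tailed primers described in the two cloning sections above can be checked computationally. The following is a minimal Python sketch and not part of the original methods. It confirms that each tailed primer (with line-break hyphens removed) contains the restriction site named in the text, and that the forward primer for the human construct encodes an enterokinase recognition sequence (Asp-Asp-Asp-Asp-Lys) in frame with the ATG that follows it. The function names, dictionary keys, and data layout are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Recognition sequences of the restriction enzymes named in the text.
SITES = {"BglII": "AGATCT", "HindIII": "AAGCTT", "EcoRI": "GAATTC"}

# Tailed primers as given in the text; the dictionary keys are hypothetical labels.
PRIMERS = {
    "human_fwd": "GCCGCCAGATCTGGATGACGACGACAAGATGTGGAACCTGGGTCGACTC",
    "human_rev": "GCCGCCAAGCTTTCAAGCTTGCTCCAGTTCCACAAGGA",
    "ph_fwd": "GCGATCAGATCTCATATGATATGGATGTTTAAGAGGATAGACCATGTTGG",
    "ph_rev": "GCGGAATTCGACGTTACTCTTTCCTTTCACATAGCTCCAGGAGC",
}

# Coding sequence for Asp-Asp-Asp-Asp-Lys as it appears in the human forward primer.
ENTEROKINASE_DNA = "GATGACGACGACAAG"


def sites_in(primer):
    """Return the sorted names of the listed restriction sites found in a primer."""
    return sorted(name for name, site in SITES.items() if site in primer)


def translate(dna):
    """Translate DNA in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


for label, seq in PRIMERS.items():
    print(label, sites_in(seq))

# Read the human forward primer from the start of the enterokinase site onward.
start = PRIMERS["human_fwd"].find(ENTEROKINASE_DNA)
print(translate(PRIMERS["human_fwd"][start:]))
```

Because the three recognition sequences are palindromic, a plain substring search suffices here and no reverse-complement step is needed.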
Growth of Expression Strains and Preparation of Cell Extracts-Protein expression strains were grown on LB supplemented with 25 μg/ml kanamycin at 30°C with shaking at 250 rpm. Cells were grown to an optical density of 0.6-0.8 at 600 nm. Then, expression of the target protein was induced by the addition of isopropyl-β-D-thiogalactopyranoside to a final concentration of 1 mM. Cultures were incubated at 30°C with shaking at 250 rpm for an additional 3 h. Cells were collected by centrifugation, resuspended in 3 ml of 50 mM sodium phosphate, pH 7, 300 mM NaCl, and broken using a French Pressure Cell (SLM Aminco, Urbana, IL). The cell lysate was centrifuged for 30 min at 31,000 × g (max) using a Beckman JA20 rotor. The supernatant from this centrifugation was the crude cell extract used for protein purification.
DL-Methylmalonyl-CoA Racemase Assays-DL-Methylmalonyl-CoA racemase activity was measured using a coupled assay. In this assay, racemase converted D-methylmalonyl-CoA to L-methylmalonyl-CoA, which in turn was converted to succinyl-CoA by MCM, which is specific for the L-isomer (6). Racemase activity was quantified by following the disappearance of methylmalonyl-CoA by HPLC under conditions described previously (11). Assay mixtures contained 50 mM potassium phosphate, pH 7, 25 mM NaCl, 2 mM MgCl2, 75 μM DL-methylmalonyl-CoA, and 5.6 μg/ml holo-MCM. The holo-MCM was prepared by incubating the following mixture for 30 min at 4°C in the dark: 0.56 μg/μl purified apo-MCM, 0.63 μM coenzyme B12, and 2 mM dithiothreitol. The specific activity of the holo-MCM used was >2 μmol/min/mg of protein.
Assays were initiated by addition of holo-MCM and incubated for 5 min at 37°C. This depleted the L-methylmalonyl-CoA from the assay mixtures. After this initial 5-min incubation, a source of DL-methylmalonyl-CoA racemase was added, and incubation at 37°C was continued for an additional 5 min. Reactions were terminated by addition of 75 μl of 1 M acetic acid. Samples were then frozen using a dry ice ethanol bath and stored at −20°C until analyzed. To prevent photolysis of coenzyme B12, all manipulations were carried out in dim light.
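As a worked illustration of how a specific activity can be derived from an endpoint assay of this kind, the sketch below converts the methylmalonyl-CoA consumed after racemase addition into μmol/min/mg of protein. Only the 5-min racemase incubation and the 75 μM DL-methylmalonyl-CoA starting concentration come from the protocol above. The assay volume, the amount of protein, the HPLC readings, and the assumption that the DL substrate is an equal mixture of isomers (leaving 37.5 μM D-isomer after MCM has depleted the L-isomer) are hypothetical placeholders. The calculation also assumes that MCM is in sufficient excess that substrate disappearance is limited by the racemase.

```python
def specific_activity(conc_before_uM, conc_after_uM, volume_ml, minutes, protein_mg):
    """Return activity in umol of substrate consumed per min per mg of protein.

    conc_before_uM / conc_after_uM: methylmalonyl-CoA concentration (uM)
    measured by HPLC before and after the racemase incubation.
    """
    if minutes <= 0 or protein_mg <= 0 or volume_ml <= 0:
        raise ValueError("minutes, protein_mg and volume_ml must be positive")
    # uM x ml = nmol; divide by 1000 to express the amount in umol.
    consumed_umol = (conc_before_uM - conc_after_uM) * volume_ml / 1000.0
    return consumed_umol / (minutes * protein_mg)


# Hypothetical example: 0.5-ml assay, 20 ng of racemase, and a drop from
# 37.5 uM to 17.5 uM during the 5-min incubation.
example = specific_activity(37.5, 17.5, volume_ml=0.5, minutes=5, protein_mg=0.00002)
print(f"{example:.1f} umol/min/mg")
```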
Purification of the Human DL-Methylmalonyl-CoA Racemase-Crude cell extract, prepared as described above, was filtered through a 0.45-μm pore size filter, and 1 ml of filtered extract (32 mg of protein) was applied to a 1-ml Ni-NTA column (Qiagen, Chatsworth, CA). The column was used according to the manufacturer's instructions except that prior to elution, the column was washed with 10 ml of equilibration buffer containing 40 mM imidazole. The eluate from the Ni-NTA column was further processed using a 2-ml column of glutathione-Sepharose 4B (Amersham Pharmacia Biotech, Uppsala, Sweden) according to the manufacturer's instructions. The partially purified racemase from the glutathione affinity column was concentrated and exchanged into buffer containing 20 mM Tris·HCl, pH 7.4, 50 mM NaCl, and 2 mM CaCl2 using a Vivaspin 4 centrifugal concentrator (Viva Science, Binbrook, UK). The concentrated racemase preparation (500 μl, 385 μg of protein) was combined with 14.5 units of recombinant enterokinase (Novagen) and incubated at 20°C for 16 h. Recombinant enterokinase was removed using enterokinase capture agarose (Novagen). Then, the racemase preparation was applied to a 1-ml Ni-NTA column, which was eluted with 5 ml of equilibration buffer containing 20 mM imidazole. The fusion tags and the uncut fusion protein remained bound to the column. The purified racemase was found in the column pass-through and eluate, which were concentrated, exchanged into 20 mM HEPES, pH 7, 50 mM NaCl, 10 mM KCl, and stored at 4°C until analyzed.
Accession numbers-The GenBank™ accession number for Homo sapiens methylmalonyl-CoA racemase cDNA is AF364547. The GenBank™ accession number for P. horikoshii methylmalonyl-CoA racemase coding sequence is AF364548.

RESULTS
Identification of Conserved Neighbors of Prokaryotic mcm Genes-Prokaryotes often cluster genes involved in the same metabolic process (1)(2)(3)(4)(5). Therefore, to identify additional genes involved in propionyl-CoA metabolism, we searched for genes that frequently cluster with prokaryotic mcm genes (Fig. 1). The first step was to identify prokaryotes that had both mcm homologues and completed genome sequences. Blast searches (12) of the NCBI nonredundant (nr) data base identified 16 prokaryotes that had homologues (Expect values ≤7 × 10^-11) of the human mcm gene. For eight of these, complete genome sequences were available. These included three Bacteria (Deinococcus radiodurans, Mycobacterium tuberculosis, and E. coli) and five Archaea (Archaeoglobus fulgidus, P. horikoshii, Pyrococcus abyssi, Aeropyrum pernix, and Halobacterium sp. NRC-1). Next, the chromosomal contexts of the mcm genes in these eight prokaryotes were determined. Blastp software was used to identify the homologues of ten genes on each side of the mcm gene. Then, it was determined whether these homologues also clustered with mcm genes by examining the protein tables available through Entrez Genomes (www.ncbi.nlm.nih.gov/PMGifs/Genomes/micr.html). By these analyses, two conserved neighbors of prokaryotic mcm genes were identified: genes with homology to lactoylglutathione (LGSH) lyases were neighbors of mcm genes in 5/8 prokaryotic genomes and in 4/5 Archaeal genomes examined, and genes with homology to lysine/arginine/ornithine (LAO) transport proteins were found near mcm homologues in 8/8 cases. These gene arrangements suggested that certain proteins with homology to LGSH lyases and LAO transporters were misidentified by homology searches and actually encode proteins involved in propionyl-CoA metabolism.

FIG. 1. Extrapolation of the functional information inherent in prokaryotic gene arrangements to eukaryotes. Prokaryotic genes A, A′, and A″ are homologues. Genes B, B′, and B″ are also homologues, etc. Since ABCD and their homologues are found in conserved gene groups within prokaryotic genomes, they are likely to function in the same metabolic process. Importantly, it can be inferred that the eukaryotic homologues of genes ABCD (eAeBeCeD) might be involved in the same metabolic process.

FIG. 2. Coenzyme B12-dependent propionyl-CoA metabolism. Propionyl-CoA carboxylase catalyzes formation of the D-isomer of methylmalonyl-CoA. DL-Methylmalonyl-CoA racemase is needed to catalyze the conversion of D-methylmalonyl-CoA to L-methylmalonyl-CoA. Coenzyme B12-dependent MCM is specific for the L-isomer of methylmalonyl-CoA and catalyzes its conversion to succinyl-CoA. In humans, defects in the coenzyme B12-dependent MCM lead to methylmalonyl aciduria; hence, defects in the racemase and in enzymes needed for the production of coenzyme B12 are expected to result in a similar dysfunction.
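The neighbor analysis in this section amounts to counting, for each homologue family, the number of genomes in which a member lies within ten genes of an mcm gene. The Python sketch below illustrates that tally under simplifying assumptions. Each genome is represented as an ordered list of family labels (as would be assigned by Blastp homology), chromosomes are treated as linear rather than circular, and the genome names and gene orders are toy data invented for illustration, not the arrangements examined in the study.

```python
from collections import Counter


def conserved_neighbors(genomes, anchor="mcm", window=10):
    """Count, per gene family, the genomes in which that family occurs
    within `window` genes on either side of an anchor gene."""
    counts = Counter()
    for families in genomes.values():
        nearby = set()
        for i, family in enumerate(families):
            if family != anchor:
                continue
            lo = max(0, i - window)
            nearby.update(families[lo:i])                     # upstream genes
            nearby.update(families[i + 1:i + 1 + window])     # downstream genes
        nearby.discard(anchor)   # a second anchor copy is not a "neighbor"
        counts.update(nearby)    # each family counted once per genome
    return counts


# Toy gene orders (hypothetical), with a small window to keep the example short.
TOY_GENOMES = {
    "genome_A": ["x1", "LGSH_lyase_like", "mcm", "LAO_like", "x2"],
    "genome_B": ["LAO_like", "mcm", "x3", "x4"],
    "genome_C": ["x5", "mcm", "LGSH_lyase_like", "LAO_like"],
}

tally = conserved_neighbors(TOY_GENOMES, window=2)
for family, n in tally.most_common():
    if n >= 2:
        print(f"{family}: near mcm in {n}/{len(TOY_GENOMES)} genomes")
```

Families that recur next to the anchor in many genomes, like the two toy families printed here, are the candidates that would then be carried forward to homology searches against the human genome.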
Identification of the Human Homologues of Conserved Neighbors of Prokaryotic mcm Genes-Next we sought to identify the human homologues of prokaryotic LGSH lyases and LAO transporters. The P. horikoshii protein gi7448629 (an LGSH lyase homologue) was used to query the human genome with tBlastn software (12). This prokaryotic sequence aligned with peptides encoded by two regions of chromosome II that likely corresponded to exons of the homologous human gene. However, the 5′ exon identified did not include the expected ATG, suggesting that one or more exons remained to be detected. Hence, we compared the two exons that were found to the expressed sequence tag data base using Blastn software. This allowed the identification of several apparent full-length human cDNAs that included appropriate ATGs, Kozak consensus sequences for ribosome binding, and triplets corresponding to stop codons. Subsequent analyses showed that the full-length human cDNA (AF364547) corresponded to three exons on chromosome II and encoded a protein homologous to the P. horikoshii LGSH lyase homologue with an Expect value of 2 × 10^-21 (Fig. 3). Thus, a human LGSH lyase homologue was identified. Because the P. horikoshii LGSH lyase homologue is a conserved neighbor of mcm genes, we inferred that the P. horikoshii gene and its human homologue are both involved in propionyl-CoA metabolism.
The full-length human cDNA identified above encoded a protein with about 40 N-terminal amino acids not found in its prokaryotic homologues (Fig. 3). In humans, propionyl-CoA metabolism occurs in the mitochondrion (7), and the additional amino acids of the human protein would be needed for mitochondrial targeting. MitoProt II and Predotar protein localization prediction software supported mitochondrial localization for the human LGSH lyase homologue with scores of 0.95 and 0.99, respectively (www.inra.fr/Internet/Produits/Predotar/) (15). Thus, these results are consistent with the prediction that the human homologue of LGSH lyase genes was misidentified by the homology searches employed and in fact encoded a protein involved in propionyl-CoA metabolism.
To identify the human homologue of prokaryotic LAO transport proteins, a procedure similar to that described above for identification of the human LGSH lyase homologue was employed. The longest human cDNA identified (gi: 12914481) was a partial sequence that represented four exons on chromosome IV. The protein encoded by this cDNA aligned with the M. tuberculosis LAO transporter (gi: 3915555) over the greater portion of its length (263/334) with an Expect value of 3 × 10^-33 using Blastx software. Because the human cDNA was incomplete at the 5′ end, the possibility of a mitochondrial targeting sequence could not be examined. Nonetheless, the cDNA identified represents a probable human LAO transporter homologue.
Thus, two human genes were identified that were homologous to conserved neighbors of prokaryotic mcm genes. We inferred that these human genes were involved in propionyl-CoA metabolism. Unidentified genes thought to function in this process include those that encode DL-methylmalonyl-CoA racemases as well as those that encode enzymes needed to convert hydroxy-B12 to coenzyme B12. Accordingly, we focused further studies on determining whether the human cDNAs identified encoded such enzymes. A DL-methylmalonyl-CoA racemase was previously purified from rat liver and was found to be composed of two 16-kDa polypeptides (16). Given that the LGSH lyase homologues have similar predicted molecular masses (15 kDa for the P. horikoshii homologue), we hypothesized that LGSH lyase homologues that cluster with mcm genes were misidentified by homology searches and actually encode DL-methylmalonyl-CoA racemase. Furthermore, we predicted that the human LGSH lyase homologue was also a DL-methylmalonyl-CoA racemase.
Cloning and Expression of the Predicted Human DL-Methylmalonyl-CoA Racemase cDNA-To determine whether the human LGSH lyase homologue identified above was in fact a DL-methylmalonyl-CoA racemase, cDNAs corresponding to the full-length predicted human racemase and to the portion of the racemase homologous to its prokaryotic counterparts were cloned into a T7 expression system that allows high level expression of proteins in E. coli. Both cDNA clones were prepared, because some enzymes are known to be inactive until their targeting sequences are proteolytically cleaved. Neither clone expressed significant amounts of a protein of the expected molecular mass. Subsequently, the clone encoding the portion of the human racemase homologous to the prokaryotic protein was cloned into a T7 expression plasmid so as to provide N-terminal glutathione S-transferase and 6×His tags (Fig. 3, bottom). The resulting plasmid (pTA1002) expressed high levels of a fusion protein of the expected molecular mass (~45 kDa) and was used for further studies.
The protein produced from plasmid pTA1002 was shown to have DL-methylmalonyl-CoA racemase activity. Following nickel affinity chromatography, extracts derived from cells containing the expression plasmid with the cDNA clone (strain TA1002, Fig. 4, lane 3) contained high levels of racemase activity (60 μmol/min/mg of protein). In contrast, no racemase activity was observed in similarly prepared cell extracts derived from cells containing vector without insert (strain TA1004). This indicated that the cloned cDNA encoded a functional DL-methylmalonyl-CoA racemase. The racemase reaction requirements were investigated. As expected, no racemase activity was observed without the addition of partially purified racemase, methylmalonyl-CoA, MCM, or coenzyme B12. MCM and coenzyme B12 are required components, because the racemase assay involves a coupled enzymatic reaction.
Purification of the Human DL-Methylmalonyl-CoA Racemase-The predicted human racemase was then purified using several affinity chromatography steps as well as enterokinase cleavage of the racemase fusion protein. The site of enterokinase cleavage is shown in Fig. 3. The progress of the purification was monitored by SDS-polyacrylamide gel electrophoresis followed by Coomassie staining (Fig. 4). The specific activity of the racemase increased at each step of the purification (Table I). The final specific activity of the highly purified racemase was 833 μmol/min/mg of protein. This activity is within the range previously reported for DL-methylmalonyl-CoA racemases purified from other sources. DL-Methylmalonyl-CoA racemase previously purified from Propionibacterium shermanii had a specific activity of 33.4 or 607.5 μmol/min/mg of protein depending on the purification protocol used (17,18), and DL-methylmalonyl-CoA racemase purified from rat liver had a specific activity of 8400 μmol/min/mg of protein (16). Thus, the findings presented above constitute strong evidence that one of the two human genes identified as homologues of conserved neighbors of prokaryotic mcm genes encodes a DL-methylmalonyl-CoA racemase rather than an LGSH lyase as would be suggested by homology searches.
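The comparison above is simple arithmetic, and the short Python sketch below makes it explicit using only the specific activities quoted in the text (in μmol/min/mg of protein). The variable and function names are illustrative. The fold enrichment is computed relative to the activity measured after nickel affinity chromatography, since that is the earliest value stated in the text; the per-step values of Table I are not reproduced here.

```python
# Specific activities quoted in the text, in umol/min/mg of protein.
AFTER_NICKEL_AFFINITY = 60.0
FINAL_PURIFIED = 833.0
LITERATURE = {
    "P. shermanii (protocol 1)": 33.4,
    "P. shermanii (protocol 2)": 607.5,
    "rat liver": 8400.0,
}


def fold_purification(final, initial):
    """Ratio of final to initial specific activity."""
    if initial <= 0:
        raise ValueError("initial specific activity must be positive")
    return final / initial


def within_reported_range(value, reported):
    """True if value lies between the lowest and highest reported values."""
    return min(reported.values()) <= value <= max(reported.values())


fold = fold_purification(FINAL_PURIFIED, AFTER_NICKEL_AFFINITY)
print(f"Fold enrichment after the nickel affinity step: {fold:.1f}")
print("Within literature range:", within_reported_range(FINAL_PURIFIED, LITERATURE))
```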
Expression of the P. horikoshii DL-Methylmalonyl-CoA Racemase-To obtain further evidence that certain LGSH lyase homologues are in fact DL-methylmalonyl-CoA racemases, the predicted P. horikoshii racemase gene was cloned into a T7 expression vector. This clone mediated expression of a large amount of protein near the expected molecular mass, 15 kDa (not shown). Partially purified cell extracts were prepared by anion exchange chromatography, and those derived from cells containing the expression plasmid with insert (strain TA1015) contained racemase enzyme with a specific activity of 93 μmol/min/mg of protein. In contrast, no racemase activity was observed in cell extracts derived from cells containing vector without insert (strain BE119). This indicated that the cloned P. horikoshii gene encoded a DL-methylmalonyl-CoA racemase. These findings provided evidence that LGSH lyase homologues neighboring mcm genes were misidentified by homology searches and actually encode DL-methylmalonyl-CoA racemases. In addition, they provide confirmatory evidence that the human clone described above also encodes a DL-methylmalonyl-CoA racemase.

DISCUSSION
In this report, we sought to determine whether the analysis of prokaryotic gene arrangements could be used to help determine the function of human genes. Sequence similarity searching identified two genes of unknown function that were conserved neighbors of prokaryotic mcm genes and that had human homologues. Because MCMs are involved in propionyl-CoA metabolism, we inferred that conserved neighbors of mcm genes and their human homologues were also involved in this process, and we used this knowledge to guide biochemical tests. Subsequently, we showed that one conserved neighbor (PH0272) and its human homologue both encoded DL-methylmalonyl-CoA racemases. The purified human racemase had a specific activity of 833 μmol/min/mg of protein, which is comparable with the specific activity of DL-methylmalonyl-CoA racemases purified from other sources (see above) (16-18). These findings constitute strong evidence that the human gene corresponding to cDNA AF364547 is indeed a DL-methylmalonyl-CoA racemase gene. Thus, a previously unknown human gene was rapidly identified based on the analysis of prokaryotic gene arrangements. The human DL-methylmalonyl-CoA racemase gene has been implicated in the inherited disorder methylmalonyl aciduria (although this is uncertain, since a bypass pathway of D-methylmalonyl-CoA racemization may exist in mammals) (8). Thus, the analysis of prokaryotic gene arrangements has the potential to allow the rapid identification of human genes involved in inherited metabolic disorders. Importantly, the prediction that a given gene is involved in a particular inherited disease does not require the determination of its biochemical function. Such predictions require only knowledge that a gene is involved in a particular metabolic process, and this information can be obtained directly from the bioinformatic analysis of gene arrangements.
We also identified a second gene of unknown function that was a conserved neighbor of mcm genes and that had homology to a human gene. This gene had homology to those that encode LAO transport proteins. The function of this gene has not yet been determined. We suspect that it is not involved in lysine/arginine/ornithine transport. It may have a role in the conversion of hydroxy-B12 to coenzyme B12, or in the reactivation of MCM, as genes encoding such enzymes have been shown to cluster with genes encoding other coenzyme B12-dependent enzymes (9,13,14,19). Attempts to determine the function of this gene with biochemical tests were hampered by the fact that overexpression resulted in inclusion body formation under all conditions tested. Nonetheless, the potential involvement of this gene in methylmalonyl aciduria is testable given its DNA sequence even without knowledge of its specific biochemical function.
It was expected that several genes involved in the conversion of vitamin B12 to hydroxy-B12 would be identified by the gene arrangement analyses reported herein. As mentioned above, only one candidate was found and its function was not verified. There are several possible reasons why the genes for such enzymes were not found. We feel the most likely reason is that some of the genes involved in the conversion of hydroxy-B12 to coenzyme B12 have functions in addition to propionyl-CoA metabolism that dictate alternative gene arrangements.
Last, we point out that the analysis of prokaryotic gene arrangements may have broad application for the identification of human genes. Potentially, the method is applicable to any physiological process that is shared between prokaryotes and eukaryotes. This includes many aspects of metabolism, transcription, translation, DNA replication, and DNA repair. Furthermore, the method may prove valuable for the identification of genes involved in processes currently thought to be distinctively eukaryotic.