Complete Reconstitution of the Human Coenzyme A Biosynthetic Pathway via Comparative Genomics*

The biosynthesis of CoA from pantothenic acid (vitamin B5) is an essential universal pathway in prokaryotes and eukaryotes. The CoA biosynthetic genes in bacteria have all recently been identified, but their counterparts in humans and other eukaryotes remained mostly unknown. Using comparative genomics, we have identified human genes encoding the last four enzymatic steps in CoA biosynthesis: phosphopantothenoylcysteine synthetase (EC 6.3.2.5), phosphopantothenoylcysteine decarboxylase (EC 4.1.1.36), phosphopantetheine adenylyltransferase (EC 2.7.7.3), and dephospho-CoA kinase (EC 2.7.1.24). Biological functions of these human genes were verified using a complementation system in Escherichia coli based on transposon mutagenesis. The individual human enzymes were overexpressed in E. coli and purified, and the corresponding activities were experimentally verified. In addition, the entire pathway from phosphopantothenate to CoA was successfully reconstituted in vitro using a mixture of purified recombinant enzymes. Human recombinant bifunctional phosphopantetheine adenylyltransferase/dephospho-CoA kinase was kinetically characterized.

CoA is an indispensable cofactor in all living organisms, where it functions as an acyl carrier and carbonyl-activating group in a number of central biochemical transformations, including the tricarboxylic acid cycle and fatty acid metabolism. It has been estimated that ~4% of all known enzymes utilize CoA as an obligate cofactor (1). Many bacteria as well as plants and yeast are capable of de novo CoA biosynthesis from aspartate and ketovalerate via pantothenic acid. In contrast, animals and some pathogenic microbes lack a de novo route, and they are totally dependent on scavenging exogenous pantothenic acid (pantothenate, vitamin B5). Dietary vitamin B5 deficiency in mammals causes many systemic effects such as impaired motor response, depressed heme synthesis, altered growth and maturation of the small intestines during neonatal and prenatal periods, and increased prenatal mortality. Levels of CoA are affected during metabolic stress such as starvation, alcoholism, and diabetes as well as in certain tumors (2). Recently, the first gene of the universal CoA biosynthetic pathway was linked with the neurodegenerative Hallervorden-Spatz syndrome in humans (3) and with aberrant mitosis and meiosis in flies (4), setting a precedent for the importance of this pathway in many aspects of cellular and metabolic development.
The biosynthesis of CoA from pantothenate comprises five universal steps (see Fig. 1). Pantothenate is first phosphorylated by pantothenate kinase (EC 2.7.1.33). The resulting 4′-phosphopantothenate is conjugated with cysteine by phosphopantothenoylcysteine synthetase (PPCS; EC 6.3.2.5) and then converted to 4′-phosphopantetheine by phosphopantothenoylcysteine decarboxylase (PPCDC; EC 4.1.1.36). The final two steps are adenylation by phosphopantetheine adenylyltransferase (PPAT; EC 2.7.7.3) to form dephospho-CoA and phosphorylation by dephospho-CoA kinase (DPCK; EC 2.7.1.24) to form CoA. Mechanistic details of CoA biosynthesis were studied in bacteria, yeast, and mammals, and most of the corresponding enzymes were at least partially purified and characterized (for recent reviews, see Refs. 1 and 2). However, as neither CoA nor any of its phosphorylated precursors can be transported across the cell membrane, classical auxotrophic mutant analysis methods could not be used, and the identification of CoA biosynthetic genes was significantly delayed.
Pantothenate kinase, the first gene in this pathway, was initially identified in Escherichia coli (coaA) (5). In the last 2 years, E. coli genes were identified for the last four enzymatic steps: coaBC (previously dfp), encoding a bifunctional PPCS/PPCDC (6,7); coaD (previously kdtB), encoding PPAT (8); and coaE (previously yacE), encoding DPCK (9). Orthologs of all of these genes are preserved in all bacteria, except some obligate intracellular parasites such as Mycoplasma, Rickettsia, and Chlamydia, suggesting that these organisms have developed a specialized CoA (or dephospho-CoA) transport system.
When we began this study, only the gene for pantothenate kinase, which revealed no sequence similarity to bacterial CoaA, had been identified in the mammalian CoA biosynthetic pathway (10,11). Our goal was to identify, via a comparative genomics approach, human genes encoding the four remaining enzymatic steps in CoA biosynthesis to enable further genetic and expression studies of this critically important pathway in health and disease.
We report here the identification and experimental verification of human genes encoding PPCS, PPCDC, PPAT, and DPCK, thereby completing the CoA biosynthetic pathway in humans. The human PPAT domain was identified based on biochemical evidence of a bifunctional PPAT/DPCK protein and reveals no sequence similarity to the previously described bacterial CoaD enzyme family. This is the first representative of the novel PPAT family present in eukaryotes and archaea. For this reason and the previous implication that PPAT/DPCK is involved in regulation of CoA biosynthesis (2), we have studied this protein in more detail. Steady-state kinetic parameters were obtained, and the possibility of transcriptional regulation of PPAT/DPCK gene expression was addressed using Northern blot analysis of various human healthy tissues and cancer cell lines.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence reported in this paper has been submitted to the DDBJ/GenBank™/EBI Data Bank with accession number AF453478.

FIG. 2. … Amino acid residues identical to the human proteins are shown as black bands, and the overall score (P-score) is shown below each protein or domain. B, construction of a full-length cDNA and a predicted open reading frame (ORF) for human PPAT/DPCK. Boxes represent protein coding sequences, and black lines represent nucleotide sequences. All numbers indicate nucleotide numbers, with 1 representing the first base pair of the complete reconstructed human cDNA encoding PPAT/DPCK. Labels indicate identifiers in GenBank™ with their respective annotations. C, comparison of the predicted human three-domain PPAT/DPCK protein with representatives of the other three kingdoms as described for A. The highly conserved nucleotidyltransferase (NTase) motif (HXXH) region is shown. No comparison could be made with archaeal DPCK, as this gene, as well as the gene for pantothenate kinase, remains missing in all archaea.

EXPERIMENTAL PROCEDURES
Strains, Plasmids, and Other Reagents-E. coli strains DH5α and BL21 (Stratagene, La Jolla, CA) were used for cloning and protein overexpression, and DH10B (Invitrogen) was used as a host strain for complementation analysis by transposon mutagenesis. For expression of all genes in E. coli, the vector pPROEX-HTa (Invitrogen), containing the trc promoter, a His6 tag, and a tobacco etch virus protease cleavage site, was used. Human brain cDNA and multiple-tissue Northern blots were from CLONTECH (Palo Alto, CA). The DNA probe radiolabeling kit was from Invitrogen, and Microspin columns for removal of unincorporated nucleotides were from Amersham Biosciences. Enzymes for PCR and DNA manipulations were from New England Biolabs Inc. (Beverly, MA), MBI Fermentas (Vilnius, Lithuania), and CLONTECH. Plasmid purification kits were from Promega (Madison, WI). PCR purification kits and nickel-nitrilotriacetic acid resin were from QIAGEN Inc. (Valencia, CA). Oligonucleotides for PCR and sequencing were synthesized by MWG Biotech (High Point, NC) and Integrated DNA Technologies (Coralville, IA). Substrates for the enzyme assays (4′-phosphopantothenate, 4′-phosphopantothenoylcysteine, and 4′-phosphopantetheine) were a generous gift of Dr. Tadhg Begley (Cornell University). All other chemicals, including the assay components hexokinase, glucose-6-phosphate dehydrogenase, lactate dehydrogenase, pyruvate kinase, glucose, phosphoenolpyruvate, NADP, NADH, ATP, inorganic pyrophosphate (PPi), dephospho-CoA, and pyrophosphate reagent, were from Sigma.
Comparative Genome Analysis-Most of the comparative analysis was performed using ERGO (previously WIT), a genomic data base and set of tools for comparative analysis (available by subscription from Integrated Genomics, Inc.) as previously described (12). Additional public bioinformatics resources used in this study were GenBank™; the Saccharomyces Genome Database; BLAST, PSI-BLAST, and Genomic BLAST; PFAM; and SCOP.
PCR Amplification and Cloning-The three predicted complete coding regions were amplified using the three sets of primers indicated below. Each primer carries an introduced restriction site (NcoI or BspHI at the 5′-end and SalI at the 3′-end); nucleotides not present in the original sequence are shown in lowercase. PPCS (gi 13638573): no mutations introduced, 5′-gggccATGGCGGAAATGGATCCGGTAGCC and 3′-ggggtcgacTCAGTTTCTGTCACCTATAAAAGCTGTGTGTCGAG; PPCDC (gi 14042206): AGT (Ser) codon inserted after the first codon, 5′-gggtcatgagtGAACCAAAGGCCTCCTGTCCAGCTG and 3′-ggggtcgacTCAACTCTGCTGGAAGCCACTGTGCTG; and PPAT/DPCK (gi 17981024): second codon mutated from GCC → aCC (Ala → Thr), 5′-gggtcatgaCCGTATTCCGGTCGGGTCTCCTG and 3′-ggggtcgacTCAGTCGAGGGCCTGATGAGTCTTGG.
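As a quick consistency check on the primer design, the following minimal Python sketch scans each listed primer for the recognition sequences of the three enzymes named in the text. The recognition sites (NcoI CCATGG, BspHI TCATGA, SalI GTCGAC) are the standard ones; the dictionary layout and the function name `sites_in` are illustrative assumptions and not part of the original procedure.

```python
# Recognition sequences of the restriction enzymes named in the text.
SITES = {"NcoI": "CCATGG", "BspHI": "TCATGA", "SalI": "GTCGAC"}

# Primers as listed in the text (lowercase = nucleotides absent from the cDNA).
PRIMERS = {
    "PPCS": ("gggccATGGCGGAAATGGATCCGGTAGCC",
             "ggggtcgacTCAGTTTCTGTCACCTATAAAAGCTGTGTGTCGAG"),
    "PPCDC": ("gggtcatgagtGAACCAAAGGCCTCCTGTCCAGCTG",
              "ggggtcgacTCAACTCTGCTGGAAGCCACTGTGCTG"),
    "PPAT/DPCK": ("gggtcatgaCCGTATTCCGGTCGGGTCTCCTG",
                  "ggggtcgacTCAGTCGAGGGCCTGATGAGTCTTGG"),
}

def sites_in(primer):
    """Return the names of all listed enzymes whose site occurs in the primer."""
    seq = primer.upper()
    return sorted(name for name, site in SITES.items() if site in seq)

for gene, (fwd, rev) in PRIMERS.items():
    print(gene, "5' primer:", sites_in(fwd), "3' primer:", sites_in(rev))
```

BspHI and NcoI both leave a 5′-CATG overhang, so a BspHI-cut insert can be ligated into a vector cleaved with NcoI.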
PCR amplification was performed using human brain cDNA and the Advantage cDNA PCR kit (both from CLONTECH). The PCR fragments matching their predicted sizes (950 bp (PPCS), 632 bp (PPCDC), and 1709 bp (full-length PPAT/DPCK)) were cloned into expression vector pPROEX-HT3a cleaved by NcoI and SalI. All constructs were verified by DNA sequencing, including a predicted full-length coding region for the three-domain PPAT/DPCK protein (gi 17981024).
Verification by Complementation in E. coli-We performed functional complementation analysis in E. coli based on a genetic footprinting technique that has been recently described. This technique, based on saturating transposon mutagenesis in E. coli, allows the identification of essential genes. The genes essential for E. coli survival were identified in the generated population of 1 × 10⁵ random transposon mutants as those open reading frames that do not contain transposon insertions after outgrowth. By this analysis, all of the E. coli genes in the universal CoA biosynthetic pathway (coaA, coaBC, coaD, and coaE) were revealed as essential. Here, we performed the same analysis in the presence and absence of expression plasmids containing the predicted human CoA biosynthetic genes. Functional expression of human CoA biosynthetic genes was expected to complement the loss of function that would result from transposon insertion into the corresponding E. coli genes as revealed by PCR mapping.
Following dialysis against 10 mM Tris acetate (pH 7.5) and 1 mM EDTA on 0.05-µm filters (Millipore Corp., Bedford, MA), samples were transformed by electroporation into DH10B and grown in enriched LB medium with kanamycin (10 µg/ml) overnight. Chromosomal DNA was isolated using a Bio-Rad miniprep kit. Detection of transposon inserts was performed using pairs of nested primers as shown in Fig. 3A. Each primer pair contained one transposon-specific primer and one gene-specific primer. Two pairs of nested transposon-specific primers were used to detect transposons inserted in both orientations. Two consecutive PCR amplifications were performed, and the products were analyzed by agarose (0.65%) gel electrophoresis. Image capture and analysis were performed with 1D Image Analysis software (Eastman Kodak Co.). PCR products above a threshold relative intensity of 0.05 were used for insert mapping within a genome sequence. The same analysis was performed under identical conditions with E. coli DH10B: 1) transformed by the pPROEX plasmid (control); 2) cotransformed by two pPROEX-derived plasmids, one containing human PPCDC and the other containing human PPCS and a kanamycin resistance marker in place of the ampicillin resistance marker; and 3) transformed by a pPROEX-derived plasmid containing human PPAT/DPCK.
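The scoring logic of this complementation test can be sketched in a few lines of Python. This is a simplified illustration, assuming that a gene is scored as carrying inserts when at least one PCR product exceeds the 0.05 relative-intensity threshold given in the text; the function names and the band intensities are hypothetical and are not data from this study.

```python
THRESHOLD = 0.05  # relative band intensity cutoff stated in the text

def has_inserts(band_intensities, threshold=THRESHOLD):
    """True if at least one PCR product exceeds the intensity threshold."""
    return any(i > threshold for i in band_intensities)

def classify(control_bands, plasmid_bands):
    """Compare one E. coli gene with the empty vector vs. a human cDNA plasmid."""
    if has_inserts(control_bands):
        return "nonessential in control"
    if has_inserts(plasmid_bands):
        return "essential, complemented by plasmid"
    return "essential, not complemented"

# Hypothetical relative intensities, for illustration only.
example = {
    "coaD": ([0.01, 0.02], [0.30, 0.12, 0.08]),
    "coaE": ([0.00], [0.04, 0.03]),
}
results = {gene: classify(ctrl, plas) for gene, (ctrl, plas) in example.items()}
print(results)
```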
Protein Overexpression and Purification-Recombinant proteins were overexpressed as an N-terminal fusion with a His6 tag and a tobacco etch virus protease cleavage site in E. coli strain BL21. Cells were grown to A600 = 0.8-1.0 at 37°C; isopropyl-β-D-thiogalactopyranoside was added to 0.8 mM; and harvesting was performed after ~12 h of shaking at 20°C. All proteins were purified from 50-ml cultures using nickel-nitrilotriacetic acid agarose minicolumns as described previously (12). Protein size, expression level, distribution between soluble and insoluble forms, and extent of purification were monitored by SDS-PAGE. In all three cases, a significant yield of soluble purified proteins was obtained (>0.5 mg for a 50-ml culture). A three-domain PPAT/DPCK protein was additionally produced at larger scale (6 liters) and subjected to two-step purification by nickel-nitrilotriacetic acid agarose chromatography and gel filtration on a HiLoad Superdex 200 16/60 column (Amersham Biosciences) for more detailed kinetic analysis and crystallization trials. Truncated versions of this protein (domains I and II (amino acids 1-358) and domains II and III (amino acids 180-565)) were PCR-amplified using a corresponding set of primers and cloned in the same expression vector. The C-terminally truncated version (domains I and II) was expressed with a comparable yield of the soluble protein, preserved PPAT activity, and no DPCK activity. The second, N-terminally truncated version produced inactive protein in the form of inclusion bodies.
Coupled Enzyme Assays-All coupled assays were performed in 500 µl of 50 mM Tris (pH 8.0) and 2 mM MgCl2 using a Beckman DU-640 spectrophotometer to monitor the change in absorbance at 340 nm in a six-cuvette assembly thermostatted at 37°C. Additionally, the results of CoA biosynthetic enzyme assays were verified by direct HPLC analysis as described below, monitoring formation and/or consumption of ATP, ADP, AMP, dephospho-CoA, and CoA at 254 nm.
PPCS Activity Assay-The release of PPi was coupled to the oxidation of NADH to NAD and detection at 340 nm as described (13), with modifications. Briefly, 1.5 mM 4′-phosphopantothenate was incubated with 5 mM cysteine, 1 mM ATP or CTP, ~0.5 µg/ml partially purified human PPCS, and 200 µl of pyrophosphate reagent.
PPCDC Activity Assay-An assay was developed in which formation of 4′-phosphopantetheine was coupled to consecutive enzymatic CoA formation and release of PPi, which was detected utilizing the same technique as described above. 4′-Phosphopantothenoylcysteine was incubated with 50 µg/ml partially purified PPCDC, 25 µg/ml purified PPAT/DPCK, 5 mM ATP, and 200 µl of pyrophosphate reagent. Control samples lacking PPCDC showed no signal above the background rate.
Forward PPAT Activity Assay-PPAT activity was verified by detecting the release of PPi as described above. Briefly, 5-500 µM 4′-phosphopantetheine was incubated with 5 mM ATP and 0.15-0.3 µg/ml purified human PPAT/DPCK in the presence of pyrophosphate reagent detection mixture.
Reverse PPAT Activity Assay-The reverse PPAT reaction assay was performed by coupling the release of ATP to the reduction of NADP to NADPH and detection at 340 nm as described (14). Briefly, 5-500 µM dephospho-CoA was incubated with 2 mM PPi, 5 mM glucose, 1 mM NADP, 2 units of hexokinase, 1 unit of glucose-6-phosphate dehydrogenase, and 0.5-1.0 µg/ml purified human PPAT/DPCK.
DPCK Activity Assay-The DPCK activity of the full-length PPAT/DPCK protein was determined by a standard technique: coupling the release of ADP to the oxidation of NADH to NAD as described (12). Briefly, 5-100 µM dephospho-CoA was incubated with 1 mM ATP, 2 mM phosphoenolpyruvate, 0.3 mM NADH, 2.5 units of lactate dehydrogenase, 1.25 units of pyruvate kinase, and 0.15-0.3 µg/ml purified human PPAT/DPCK.
To obtain kinetic parameters for the bifunctional PPAT/DPCK enzyme, the apparent Km of one substrate was determined by varying the concentration of that substrate while keeping the second substrate at a constant saturating concentration. An NADH or NADPH extinction coefficient of 6.22 mM⁻¹ cm⁻¹ was used for rate calculations.
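The rate calculation and the estimation of an apparent Km can be illustrated with a short Python sketch. It assumes a 1-cm path length by default and uses a Hanes-Woolf linearization purely for illustration; the text does not state which fitting procedure was used. The parameter `nadh_per_product` is an assumption that covers coupled systems in which more than one NADH is oxidized per molecule of product, and the substrate and rate values are synthetic.

```python
EPSILON_MM = 6.22  # mM^-1 cm^-1 for NADH/NADPH at 340 nm, as stated in the text

def rate_uM_per_min(delta_a340_per_min, path_cm=1.0, nadh_per_product=1):
    """Convert an absorbance slope (A340/min) into a product rate in uM/min."""
    mM_per_min = abs(delta_a340_per_min) / (EPSILON_MM * path_cm)
    return 1000.0 * mM_per_min / nadh_per_product

def hanes_woolf(substrate_uM, rates):
    """Estimate apparent Km and Vmax from the line S/v = S/Vmax + Km/Vmax."""
    xs = list(substrate_uM)
    ys = [s / v for s, v in zip(xs, rates)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax

# Synthetic, noise-free data generated with Km = 20 uM and Vmax = 10 uM/min.
S = [5, 10, 25, 50, 100]
v = [10 * s / (20 + s) for s in S]
km, vmax = hanes_woolf(S, v)
print(round(km, 3), round(vmax, 3))
```

With real, noisy data a direct nonlinear fit of the Michaelis-Menten equation is usually preferable to any linearization.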
Northern Blot Analysis-Blots with 2 µg of mRNA from normal human tissues and human cancer cell lines were analyzed by hybridization with a radiolabeled DNA probe following the manufacturer's protocol (CLONTECH). Briefly, the DNA probe (PCR fragment corresponding to the central domain (PPAT) of human PPAT/DPCK) was radiolabeled with [α-32P]dATP using a commercially available kit (Invitrogen), and unincorporated nucleotides were removed using Microspin G-50 columns. The normal cell multiple-tissue blot and multiple-cancer cell line blot (CLONTECH) were preincubated for 30 min in ExpressHyb solution supplied by the manufacturer. The labeled DNA probe was mixed with fresh ExpressHyb solution and incubated with the membranes for 1 h at 68°C. The blots were rinsed several times as suggested by the manufacturer and then exposed to x-ray film at -70°C. The blots were subsequently washed and rehybridized with radiolabeled β-actin to normalize for mRNA loading levels.

RESULTS
Prediction of Human CoA Biosynthetic Genes Using Comparative Genomics-The universal CoA biosynthetic pathway is presented in Fig. 1. Previous biochemical analysis established the activity of the CoA biosynthetic enzymes in humans (for review, see Ref. 2). However, before we began this study, only the gene for human pantothenate kinase was known (3). The bacterial genes for the CoA biosynthetic enzymes have all been recently elucidated (for review, see Ref. 1). Given the conservation at the functional level of this pathway between humans and bacteria, we attempted to project the genes from bacteria to humans using comparative genomics.
PSI-BLAST searches allowed us to identify three proteins in the human cDNA sequence data base available from GenBank™ as strong homologs of E. coli CoA biosynthesis enzymes. One homolog was found for PPCDC (gi 14042206, P-score ~10⁻¹²), and two homologs were found for DPCK (gi 13623688 and gi 13376838, P-score ~10⁻²⁷ and 10⁻¹⁵, respectively). No reliable homologs could be found for E. coli PPCS or PPAT.

FIG. 4. In vitro reconstitution of CoA biosynthesis from phosphopantothenate using purified human enzymes. All samples contained 4′-phosphopantothenate, cysteine, ATP, partially purified human PPCS, and partially purified human PPCDC (~5 µg/ml each). In addition, the sample in B contained 10 µg/ml purified human PPAT (lacking the DPCK domain and corresponding activity), and the sample in C contained 1 µg/ml purified human PPAT/DPCK. Samples were incubated for 2 h, and protein was removed by ultrafiltration. Following dilution in fresh dithiothreitol solution, samples were subjected to isocratic separation. Locations corresponding to standards (AMP, ADP, ATP, dephospho-CoA (dPCoA), and CoA) incubated and separated under the same conditions are indicated on the chromatograms.

The predicted human PPCDC appeared to be a monofunctional enzyme. This is in contrast to most bacteria, in which PPCDC is fused with PPCS, forming a bifunctional CoaBC protein. Early biochemical data indicate that human PPCDC and PPCS activities in fact reside in two separate proteins (15,16). Enzymatic activity was recently verified for the plant PPCDC ortholog (17), which was previously described as a halotolerance protein (18) and characterized at the three-dimensional structure level (19). Homologs of human PPCDC were found in all analyzed eukaryotic genomes.
Among prokaryotic genomes, only streptococci and enterococci contain monofunctional PPCDC genes. Bacterial monofunctional PPCDC from these organisms shows the highest sequence similarity to human monofunctional PPCDC. In the same bacterial genomes, PPCS is also monofunctional and is found in the same operon with PPCDC. Using this unique monofunctional PPCS from Streptococcus pneumoniae (gi 14972712), we were able to identify a candidate for human monofunctional PPCS (gi 13638573) with a reliable similarity (P-score ~10⁻⁷). The marginal similarity of PPCS domains of bacterial bifunctional proteins to putative human monofunctional PPCS (P-score ~0.02 for the E. coli PPCS domain) was insufficient to predict this candidate with any degree of certainty. Homologs of the putative human PPCS are readily identified in other eukaryotes, including Saccharomyces cerevisiae (YIL083C) and Arabidopsis thaliana (hypothetical protein T7H20.130). The YIL083C gene was found to be essential for yeast viability in a systematic gene knockout study (Saccharomyces Genome Database) (see Ref. 20), providing additional support for this functional prediction. A comparison of eukaryotic (human and yeast), archaeal (Methanococcus jannaschii), and bacterial (E. coli) PPCS/PPCDC structural organizations is schematically presented in Fig. 2A.
At this stage of analysis, the only remaining gene "missing" in human CoA biosynthesis was PPAT. Previous biochemical analysis in rat and pig suggested the existence of a non-dissociable complex, potentially a bifunctional fusion protein, of PPAT and DPCK (14). As mentioned above, uncharacterized DPCK homologs can be found in the human genome and in all analyzed eukaryotic genomes. Based on the biochemical evidence of PPAT/DPCK fusion, additional searches in the human expressed sequence tag data base were performed, revealing that one of the predicted human DPCK open reading frames (gi 13376838) was potentially 5′-truncated. A larger contiguous cDNA of 2340 base pairs encoding a 565-amino acid putative protein (Fig. 2B) was assembled, amplified, cloned, and verified by sequence analysis (gi 17981024). Comparative analysis of this putative human protein revealed the presence of three distinct domains as determined by comparison with monofunctional forms of PPAT and DPCK in other eukaryotic genomes: (i) an N-terminal domain of unknown function (amino acids 1-179), (ii) a central domain (amino acids 180-358) encoding the putative nucleotidyltransferase, and (iii) a C-terminal domain (amino acids 359-565) encoding DPCK. The second human DPCK candidate (gi 13623688) could not be extended from the 5′ terminus and is thus hypothesized to be a monofunctional DPCK.
Homologs of the human PPAT/DPCK protein were found in mouse (gi 12836393) and fly (gi 10728128). The central domain in all of these proteins contains the conserved HXXH motif, which, in combination with a predicted Rossmann fold, is characteristic of the nucleotidyltransferase superfamily (21,22). Homologs of this central domain were found in the form of a monofunctional protein (Fig. 2C) in many organisms such as S. cerevisiae, A. thaliana, and M. jannaschii in which a gene encoding PPAT had remained uncharacterized. Some of these proteins are annotated as "putative nucleotidyltransferase" or "predicted cytidylyltransferase" based on the weak similarity to bacterial glycerol-3-phosphate cytidylyltransferase. The corresponding gene in yeast (YGR277C) is essential for viability, consistent with its expected functional role in the CoA biosynthetic pathway (Saccharomyces Genome Database). Bacterial PPAT belongs to the same superfamily (23), but PSI-BLAST revealed no overall sequence similarity between any representative of the bacterial PPAT family and eukaryotic or archaeal homologs of the predicted human PPAT. In the absence of sequence similarity, we were able to use the biochemical evidence of a human bifunctional protein to identify a novel PPAT family in eukaryotes and archaea.
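A search for the HXXH signature can be expressed as a simple regular expression. The following Python sketch is only an illustration of such a motif scan, not the comparative analysis performed here; the function name is hypothetical and the test sequence is an invented toy peptide, not a real PPAT sequence.

```python
import re

# H, any two residues, H; the lookahead lets overlapping matches be reported.
HXXH = re.compile(r"(?=(H..H))")

def find_hxxh(protein):
    """Return (1-based position, tetrapeptide) for every HXXH occurrence."""
    return [(m.start() + 1, m.group(1)) for m in HXXH.finditer(protein.upper())]

# Toy sequence invented for illustration; not a real PPAT sequence.
toy = "MAVLGGTFDRLHNGHKILLSHAAH"
print(find_hxxh(toy))
```

A four-residue pattern of this kind occurs by chance in many proteins, so a hit is meaningful only together with other evidence such as the predicted fold mentioned in the text.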
The analysis of publicly available human genomic data allows us to establish chromosomal localization of all genes encoding the final four steps of CoA biosynthesis. Most of them exist as single copies, such as PPCS on chromosome 1, PPCDC on chromosome 15, and PPAT/DPCK as well as monofunctional DPCK with unconfirmed function on chromosome 17. Additionally, fragments of PPCS were detected on chromosome 6.
Verification by Complementation in E. coli-Functional verification of the cloned human PPCS (gi 13638573) and PPCDC (gi 14042206) genes and the predicted full-length PPAT/DPCK gene (gi 17981024) was performed using complementation analysis based on genetic footprinting in E. coli. Due to the essentiality of the universal CoA biosynthetic genes in E. coli, in vivo inactivation of these genes by transposon mutagenesis is possible only in the presence of a complementing functional analog provided on an expression plasmid. E. coli strains transformed with expression plasmids containing individual human CoA biosynthetic genes were mutagenized by random insertion of transposons throughout the genome. After selective outgrowth, the total chromosomal DNA from the population of surviving mutants was analyzed for the presence of transposon inserts in E. coli CoA biosynthetic genes by PCR mapping. The results of this analysis are illustrated in Fig. 3. As can be seen, genes coaBC, coaD, and coaE do not contain transposon inserts in the presence of the pPROEX vector used as a control. Multiple transposon insertions appear in these three genes when corresponding human functional analogs are introduced on the expression plasmids. This experiment provided evidence of the biological activity of the predicted human CoA biosynthetic genes. This method of "complementation by genetic footprinting" can be extended to address the biological activity of various uncharacterized genes.
Expression and Characterization of Human Recombinant CoA Biosynthetic Enzymes-The three human genes were overexpressed in E. coli, and the corresponding recombinant proteins were purified using standard affinity tag techniques. Human recombinant PPCDC is a bright yellow protein, with a UV spectrum having two maxima at 382 and 458 nm, characteristic of flavins. Although earlier results had classified mammalian PPCDC as a pyruvoyl-dependent enzyme (24), our results are consistent with those for the E. coli bifunctional PPCS/PPCDC protein, where the tight binding of FMN was directly proven by mass spectrometry (7) and led to the proposal of a novel redox mechanism of decarboxylation (25). Our data indicating approximately equimolar FMN binding in the evolutionarily distant human monofunctional PPCDC support the mechanistic importance of the FMN cofactor for the whole family.
Using coupled enzyme assays, we have confirmed the predicted activity of all four human recombinant enzymes. In agreement with previous biochemical data (15), we observed that human PPCS can utilize ATP for the activation of substrate in the ligation reaction four times more efficiently than CTP. In contrast, E. coli PPCS shows a strong preference for CTP over ATP (7). This difference in the cosubstrate specificity may account for the low sequence similarity between human and E. coli PPCS. The biological implications of this difference are yet to be understood.
Due to the structural dissimilarity between human PPAT/DPCK and the previously characterized bacterial PPAT and the implication that PPAT/DPCK is a point of regulation in CoA biosynthesis, we chose to study this protein in more depth. Kinetic parameters were obtained for both the forward PPAT reaction (4′-phosphopantetheine + ATP → dephospho-CoA + PPi) and the reverse reaction (dephospho-CoA + PPi → 4′-phosphopantetheine + ATP) as well as the DPCK reaction (Table I). The PPAT reaction equilibrium state is established at roughly equimolar concentrations of all substrates and products, beginning from either direction, as revealed by HPLC analysis after extensive incubation with C-terminally truncated human protein depleted of DPCK activity. In the presence of full-length PPAT/DPCK, the adenylyltransferase reaction becomes irreversible due to the consecutive phosphorylation of dephospho-CoA and formation of the final product, CoA. All attempts to express an isolated DPCK domain failed to produce any soluble protein, and the kinase activity was characterized using the full-length PPAT/DPCK protein and dephospho-CoA as a substrate. Moreover, although the function of the N-terminal domain remains unknown, it is absolutely required for the proper expression and folding of the protein when overexpressed in E. coli.
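An equilibrium reached at roughly equimolar concentrations of substrates and products implies an apparent equilibrium constant near 1 and a standard free energy change near zero for the PPAT step. The short Python sketch below works through that arithmetic; the function names are illustrative, the temperature is taken as the 37°C assay temperature, and the equimolar input is an idealization of "roughly equimolar".

```python
import math

R = 8.314   # gas constant, J mol^-1 K^-1
T = 310.15  # 37 degrees C in kelvin

def k_eq(ppant, atp, dpcoa, ppi):
    """Apparent equilibrium constant for PPant + ATP <-> dephospho-CoA + PPi."""
    return (dpcoa * ppi) / (ppant * atp)

def delta_g_kj(keq, temperature=T):
    """Standard free energy change (kJ/mol) implied by an equilibrium constant."""
    return -R * temperature * math.log(keq) / 1000.0

# Equimolar concentrations at equilibrium (any common unit) give Keq = 1.
keq = k_eq(1.0, 1.0, 1.0, 1.0)
print(keq, delta_g_kj(keq))
```

A step this close to equilibrium is pulled forward only when its product is removed, which is consistent with the observation that the reaction becomes effectively irreversible once DPCK activity is present.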
The ultimate verification of the human CoA biosynthetic pathway was achieved by the successful in vitro reconstitution of the four-step biochemical transformation of the committed precursor 4′-phosphopantothenate to the CoA cofactor. The reaction mixture contained 4′-phosphopantothenate, cysteine, ATP, and 0.25-2.5 µg/ml concentrations of all three purified recombinant proteins: PPCS, PPCDC, and PPAT/DPCK. The conversion to CoA approached 100%, as monitored by HPLC (Fig. 4). When the bifunctional PPAT/DPCK enzyme was replaced with a C-terminally truncated enzyme with only PPAT activity, the reaction proceeded only to formation of dephospho-CoA. Incubation with only the first two enzymes quantitatively yielded 4′-phosphopantetheine, as evidenced by AMP formation.
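The logic of this reconstitution experiment can be summarized as a walk along an ordered list of steps that stops at the first missing activity. The Python sketch below is an idealized model that assumes each available step runs to completion; the data structure and function name are illustrative and not taken from the original work.

```python
# Ordered steps from 4'-phosphopantothenate to CoA: (activity, product).
STEPS = [
    ("PPCS", "4'-phosphopantothenoylcysteine"),
    ("PPCDC", "4'-phosphopantetheine"),
    ("PPAT", "dephospho-CoA"),
    ("DPCK", "CoA"),
]

def end_product(activities, start="4'-phosphopantothenate"):
    """Follow the pathway until the first missing activity; return the product."""
    product = start
    for activity, next_product in STEPS:
        if activity not in activities:
            break
        product = next_product
    return product

print(end_product({"PPCS", "PPCDC"}))                  # first two enzymes only
print(end_product({"PPCS", "PPCDC", "PPAT"}))          # truncated PPAT
print(end_product({"PPCS", "PPCDC", "PPAT", "DPCK"}))  # full-length PPAT/DPCK
```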
Northern Blot Analysis of Human PPAT/DPCK-We used Northern blot analysis of mRNAs from various human healthy tissues and cancer cell lines to assess the possibility of PPAT/DPCK regulation at the level of transcription (Fig. 5). The overall level of transcription of PPAT/DPCK in normal tissues varied between different cell types within a range of ~10-fold (as normalized by actin), being lowest in peripheral blood leukocytes and highest in kidney and liver, supporting potential transcriptional regulation of PPAT/DPCK activity. One may also notice a general tendency of tumor cells to have a consistently higher overall transcriptional activity of this gene, such that all cancerous tissues examined had higher expression of the PPAT/DPCK mRNA than any of the normal tissues, with the exception of skeletal muscle, liver, and kidney. In examples where tissues may be compared, such as normal lung tissue versus lung carcinoma and normal colon tissue versus colorectal adenocarcinoma, the cancerous tissues expressed 2-4 times the PPAT/DPCK mRNA of the normal tissues.
Two PPAT/DPCK transcripts ~2200 and 2600 nucleotides long were detected in all examined tissues, with the possible exception of blood leukocytes. Both transcripts are significantly larger than the full-length coding region (1692 nucleotides). In healthy tissues, the mass distribution between the two transcripts varied from almost 1:1 in the majority of tissues with a relatively low overall expression level to an ~3-fold predominance of the smaller transcript in kidney and liver and up to an ~2-fold bias toward the larger transcript in skeletal muscle and placenta. In all tumor cell lines, the larger transcript was clearly the predominant and in some cases the only detectable form of PPAT/DPCK mRNA.

DISCUSSION
With the sequencing of the human genome complete, the next great challenge has become the identification of genes involved in key cellular pathways. On the basis of comparative genome analysis, we have identified human genes responsible for the last four enzymatic reactions in CoA biosynthesis. The biological functions of these human enzymes were verified by a high-throughput complementation technique based on transposon mutagenesis and genetic footprinting in E. coli. Capitalizing on the previously observed essentiality of E. coli genes involved in these last steps of CoA biosynthesis, we used the cloned human genes (cDNAs) to complement the loss of the endogenous enzymatic functions in E. coli. Introduction of the complementing human CoA biosynthetic genes into the expression vector made the corresponding E. coli genes nonessential as revealed by the appearance of transposon insertions. This approach can be systematically applied for preliminary functional analysis of uncharacterized biosynthetic genes in a number of pathways shared between E. coli and humans.
The individual human CoA biosynthetic enzymes were overexpressed in E. coli and purified, and the predicted activities were experimentally confirmed. Moreover, we have shown that CoA is quantitatively formed in vitro from the committed precursor 4′-phosphopantothenate, ATP, and L-cysteine by incubation with a mixture of all four human recombinant enzymes.
Based on sequence similarity, human PPCS, PPCDC, and DPCK belong to the same families as recently described bacterial enzymes (6,7,9). Human PPAT reveals no sequence similarity to the bacterial CoaD family, establishing the existence of two structurally distinct families of nucleotidyltransferases performing adenylation of 4′-phosphopantetheine. The prediction and verification of a novel PPAT in humans allowed us to project this function to orthologs from other eukaryotic and archaeal genomes. Identification of human PPAT in the form of a fusion protein with DPCK is in agreement with previous biochemical data (14). The function of the third (N-terminal) domain of this protein is unclear. Based on our experimental data, this domain may be involved in protein folding. Additional roles in intracellular compartmentalization to the mitochondrial matrix as well as interactions with other CoA biosynthetic enzymes or regulatory factors may be hypothesized.
Due to the structural dissimilarity between human PPAT/DPCK and previously described bacterial enzymes and the implication of PPAT as a possible point of regulation in human CoA biosynthesis, we chose to study this enzyme in more depth. The two half-reactions of the PPAT/DPCK enzyme (transfer of the adenylyl moiety and subsequent phosphorylation) reveal comparable catalytic efficiencies as estimated by kcat/Km(app), suggesting that in vitro both steps are partially rate-limiting (Table I). Previous biochemical data suggested PPAT as a rate-limiting step based on the absence of dephospho-CoA accumulation in vivo (10). The two observations may be reconciled if the proximity of the two active sites in the bifunctional enzyme supports an at least partially non-dissociable mechanism (tunneling). Further kinetic and structural studies are required to assess this possibility. Alternatively, the additional monofunctional DPCK (gi 13623688), which was identified by our homology searches, may be responsible for increasing the in vivo efficiency of dephospho-CoA to CoA conversion. Tentative identification of this monofunctional DPCK is in agreement with biochemical data indicating that although both PPAT and DPCK activities are associated with the mitochondrial matrix, only DPCK is present in the intermembrane space and in the outer membrane fraction (26).
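The comparison of catalytic efficiencies for the two half-reactions can be illustrated with a short calculation. This is a sketch under stated assumptions: the kinetic constants below are hypothetical placeholders and are not the values reported in Table I, and the 10-fold threshold used to call two efficiencies "comparable" is an arbitrary choice made for illustration.

```python
# Minimal sketch: compare catalytic efficiencies (kcat/Km) of two
# half-reactions. All kinetic constants are HYPOTHETICAL placeholders,
# not the values reported in Table I.

def catalytic_efficiency(k_cat_per_s, k_m_molar):
    """Return kcat/Km in M^-1 s^-1 from kcat (s^-1) and apparent Km (M)."""
    if k_cat_per_s <= 0 or k_m_molar <= 0:
        raise ValueError("kcat and Km must be positive")
    return k_cat_per_s / k_m_molar

def is_comparable(efficiency_a, efficiency_b, max_fold=10.0):
    """True if the two efficiencies differ by no more than max_fold.

    The max_fold cutoff is an assumption chosen for illustration only.
    """
    if efficiency_a <= 0 or efficiency_b <= 0:
        raise ValueError("efficiencies must be positive")
    fold = max(efficiency_a, efficiency_b) / min(efficiency_a, efficiency_b)
    return fold <= max_fold

# Hypothetical constants for the adenylyltransferase and kinase steps.
ppat_efficiency = catalytic_efficiency(k_cat_per_s=2.0, k_m_molar=10e-6)
dpck_efficiency = catalytic_efficiency(k_cat_per_s=3.0, k_m_molar=20e-6)

print(f"PPAT kcat/Km: {ppat_efficiency:.3g} M^-1 s^-1")
print(f"DPCK kcat/Km: {dpck_efficiency:.3g} M^-1 s^-1")
print("comparable:", is_comparable(ppat_efficiency, dpck_efficiency))
```

When the two ratios fall within the same order of magnitude, neither step clearly dominates the overall rate in vitro, which is the sense in which both steps are described as partially rate-limiting.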
To assess the possibility of regulation of the CoA biosynthetic pathway at the level of PPAT/DPCK gene expression, we have analyzed mRNA levels in both normal human tissues and human cancer cell lines using Northern blot hybridization (Fig. 5). The overall expression of the PPAT/DPCK transcript and the distribution between the larger and smaller transcripts varied significantly among the tissues analyzed. As we found only a single copy of a PPAT/DPCK gene in the human genome, the observed microheterogeneity may reflect alternative splicing. Although the physiological role of such differences in PPAT/DPCK gene expression cannot be understood without further studies, they suggest a role for this enzyme in regulating tissue-specific processes. Additionally, there is an overall tendency of tumor cells to have a greater abundance of PPAT/DPCK mRNA compared with normal tissues. Differential expression of the CoA biosynthetic genes may play a significant developmental role because the level of CoA consumption may vary significantly in various types of cells. One may expect higher production and consumption of CoA in rapidly growing cells, including tumor cells, where fatty acid synthase is known to be significantly up-regulated.
On the basis of our characterization of four enzymes in human CoA biosynthesis and the previously characterized mammalian pantothenate kinase (10) and human sodium-dependent pantothenate transporter (28), the minimal set of human genes required for the uptake and utilization of vitamin B5 is presently known. Completion of the human CoA biosynthetic pathway will allow further analysis of its role in health and disease.
The conservation of this pathway in all three kingdoms of life allowed the efficient use of comparative genome analysis to reveal previously uncharacterized human genes. Fig. 2 illustrates remarkable patterns of "mosaicism" in the evolutionary relationships among CoA biosynthetic genes in various taxa. These patterns reflect conservation, diversification, and multiple gene fusion and "unfusion" events, which may not be easily reconciled with the "tree of life," such as that built on the basis of 16 S rRNA (29). For example, the first two enzymes (PPCS and PPCDC) in the four-step transformation of 4′-phosphopantothenate to CoA are consistently fused in bacteria (with the exception of streptococci and enterococci) and archaea, but "unfused" in all eukaryotes. Remarkably, one of these two components (PPCDC) is significantly more conserved between eukaryotes and prokaryotes than the other component (PPCS). A similar pattern is observed for the last two steps, with DPCK being significantly more conserved than PPAT. However, with respect to PPAT sequence conservation, archaea cluster with eukaryotes, and not with bacteria. In contrast to the PPCS/PPCDC case, a fusion event for the last two enzymes in the pathway appears to be a relatively modern invention, as it occurs in mammals and flies, but not in plants, fungi, or prokaryotes. Interestingly, in humans, which have both a bifunctional PPAT/DPCK and a monofunctional DPCK, the latter reveals a higher sequence similarity to bacterial monofunctional DPCK (P-score ~10^-42 with DPCK from Bacillus subtilis) than to the DPCK domain of human bifunctional PPAT/DPCK (P-score ~10^-19). The observed evolutionary mosaicism underscores the importance of using a diverse collection of organisms to increase the predictive power of comparative genome analysis. In particular, only monofunctional PPCS proteins present in a small group of bacteria (streptococci and enterococci) allowed the prediction of a human PPCS, whereas the sequence similarity to PPCS domains present in other bacterial genomes was insufficient to do so.
Despite significant variations in the sequences of individual enzymatic components, the overall topology of the universal CoA biosynthetic pathway appears to be preserved in all taxa. Previous biochemical data prompted the assertion that an alternate CoA biosynthetic pathway exists in S. cerevisiae (30). However, the identification of yeast orthologs for all of the human CoA biosynthetic enzymes described here, in combination with the essentiality of the corresponding S. cerevisiae genes, suggests that the CoA biosynthetic route in yeast is similar to that in E. coli and humans and that the earlier interpretation needs to be revised.
With the growing number of sequenced genomes, it is becoming apparent that numerous pathways and individual components of central metabolism, such as cofactor biosynthesis, amino acid biosynthesis, fatty acid metabolism, glycolysis, etc., have a remarkable tendency to be conserved across the three kingdoms. This tendency, along with the wealth of genomic sequencing data and advanced tools for comparative analysis, has significantly improved our ability to accurately identify entire biosynthetic pathways in complex organisms, as illustrated here for human CoA biosynthesis.