Genomic Organization of the Mouse and Human Genes Encoding the ATP Sulfurylase/Adenosine 5′-Phosphosulfate Kinase Isoform SK2

Mammalian ATP sulfurylase/adenosine 5′-phosphosulfate (APS) kinase consists of kinase and sulfurylase domains, and catalyzes two sequential reactions to synthesize the universal sulfate donor, phosphoadenosine phosphosulfate (PAPS). In simpler organisms, the ATP sulfurylase and APS kinase reactions are catalyzed by separate enzymes encoded by two or three genes, suggesting that the bifunctional enzyme arose through fusion of separate genes during evolution. We have characterized the genomic structure of the PAPS synthetase SK2 isoform genes for mouse (MSK2) and human (HSK2) and analyzed the possible fusion region. The MSK2 and HSK2 genes share a common structure of 13 exons, including a 15-nucleotide alternatively spliced exon 8. Enzyme activities of several bacterially expressed exon assemblages showed that exons 1–6 encode APS kinase, while exons 6–13 encode ATP sulfurylase. The MSK2 construct lacking the exon 6-encoded peptide showed neither kinase nor sulfurylase activity, demonstrating that exon 6 encodes sequences required for both activities. Exon 1 and its 5′-flanking sequence are highly divergent between the two species, and intron 1 of the HSK2 gene contains a region similar to the MSK2 promoter sequence, suggesting that it may be the remnant of a now-superseded regulatory region. The HSK2 promoter contains a GC-rich region that is absent from the mouse promoter, and it shares few transcription factor binding sites with MSK2. These differences between the two promoter regions suggest that species-specific mechanisms regulate expression of the SK2 isoform.

Biosynthesis of the universal sulfate donor, phosphoadenosine phosphosulfate (PAPS), requires two sequential enzymatic reactions: the transfer of sulfate to ATP to form APS, catalyzed by ATP sulfurylase (EC 2.7.7.4), followed by the transfer of phosphate to APS to yield PAPS, catalyzed by APS kinase (EC 2.7.1.25) (1). The ATP sulfurylase and APS kinase reactions are catalyzed by separate proteins in organisms such as bacteria (2-4), fungi (5,6), yeast (7,8), and plants (9-11). In contrast, the catalytic units for the ATP sulfurylase and APS kinase reactions are fused in animals; cDNAs encoding a single bifunctional protein have been isolated from the marine worm Urechis caupo (12), Drosophila melanogaster (13), and several mammalian species (14-17). Recombinant individual ATP sulfurylase or APS kinase domains of a mouse PAPS synthetase retain the ability to catalyze the respective reactions (18), implying that the ancestral PAPS synthetase gene was formed by fusion of separate genes encoding the independent functional units.
Variations in the structural organization of monofunctional and bifunctional PAPS synthetase genes in simpler organisms and animals, respectively, likely reflect differing histories of gene fusion/duplication events in evolution. For instance, the sulfate activation operon in Escherichia coli encodes ATP sulfurylase (cysD and cysN), followed by APS kinase (cysC) (3). In the symbiotic nitrogen-fixing bacterium Rhizobium meliloti there are only two genes: nodP is homologous to cysD and encodes a subunit of ATP sulfurylase, while nodQ is homologous to both cysN and cysC, encoding a subunit of ATP sulfurylase as well as APS kinase (4). In the filamentous fungus Penicillium chrysogenum, the ATP sulfurylase gene encodes the ATP sulfurylase domain plus a nonfunctional APS kinase-like domain at the C terminus (5), reminiscent of the cysN-cysC fusion in R. meliloti nodQ. A functional APS kinase is also encoded by a separate gene in P. chrysogenum. In plants and yeast, the functional catalytic units are encoded by two separate genes with no combination of domains. In the fused PAPS synthetases, the order of the two functional domains is the reverse of the nodQ arrangement, with APS kinase N-terminal to ATP sulfurylase, suggesting an unrelated origin of the gene fusion. Our previous activity and stability studies with rearranged recombinant enzymes showed that reversing the domain order resulted in either diminished activity or a thermally unstable enzyme (18). Such a structural or functional advantage could have been a selective factor favoring gene fusion in a specific order. Thus far, the complete genomic structure of the fused PAPS synthetase genes has not been determined, and this information is needed to gain insight into the mechanism of the gene fusion that brought two separate genes together into a single gene encoding a bifunctional PAPS synthetase.
The PAPS synthesis pathway plays an important role in normal cartilage and skeletal development in mammals, as demonstrated by the recent identification of PAPS synthetase isoforms (14,16) and of mutations in one member of the PAPS synthetase gene family, SK2, in murine brachymorphism and human spondyloepimetaphyseal dysplasia (16,17). Brachymorphism is characterized by a dome-shaped skull, a short thick tail, and shortened but not widened limbs (19), and is associated with limited PAPS availability due to reduced PAPS synthetase activity (20-22). Interestingly, severely reduced PAPS synthetase activity was found in brachymorphic cartilage and liver, while no reduction was observed in brachymorphic brain and skin (23). Spondyloepimetaphyseal dysplasia (Pakistani type) is characterized by short and bowed lower limbs, enlarged knee joints, and early onset of degenerative joint disease in the hands and knees (24). The presence of two PAPS synthetase genes and the tissue-specific defects in PAPS synthesis in mammalian mutants suggest that expression of the PAPS synthetase genes is controlled by coordinated regulatory mechanisms.
To gain insight into the origin and regulation of the PAPS synthetase gene, we determined the mouse and human genomic structures of one PAPS synthetase gene family member, SK2, whose importance in development is highlighted by the human and murine growth disorders caused by mutations in this gene. Recombinant monofunctional ATP sulfurylase and APS kinase proteins, and a fused protein with an internal deletion, were produced to determine the distribution of exons for each catalytic domain. We also analyzed the 5′-flanking region sequences of the genes for potential transcription factor binding sites to guide future functional expression studies.

MATERIALS AND METHODS
Analysis of the Mouse and Human SK2 Genomic Structure. A mouse C57BL/6 genomic BAC library (GenomeSystem, St. Louis, MO) was screened with a 32P-labeled mouse SK2 cDNA fragment following the manufacturer's instructions. A human genomic BAC library, release I (GenomeSystem), was screened by PCR with a set of human SK2 primers derived from the cDNA sequence (accession no. AF173365, reported in the present study). Two mouse BAC clones, BACSK2#1 and BACSK2#5, and one human BAC clone, BAChSK2, were isolated, and various plasmid subclones were prepared. Direct sequencing and PCR amplification with cDNA-derived primers were performed to determine the genomic structure, including exon-intron boundaries and intron sizes. A search of publicly available human genomic sequences identified two BAC clones, CIT-HSP-306M19 (Caltech Genome Research Laboratory) and AC006191 (Human Chromosome 10 Sequencing Group, Sanger Center), both containing exons 1 through 9 of the HSK2 gene and at least 8 kb of the upstream 5′-flanking region. The sequence data obtained from BAChSK2, CIT-HSP-306M19, and AC006191 were used to determine the complete human genomic structure. The TESS (Transcription Element Search Software) program was used to search the 5′-flanking sequences for high quality matches to a data base of position-weighted nucleotide distribution matrices.
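In general terms, matching a sequence against position-weighted nucleotide distribution matrices means scoring every window of the sequence against a matrix and keeping windows above a threshold. The sketch below illustrates only that idea; it is not the TESS algorithm. The count matrix is a hypothetical GC-box-like site (consensus GGGCGG, the core commonly associated with Sp1), and the pseudocount, uniform background, and threshold are assumptions chosen for illustration, not TESS or TRANSFAC values.

```python
import math

# Hypothetical base counts for a 6-bp GC-box-like site (consensus GGGCGG).
# Illustrative values only; NOT a TESS or TRANSFAC matrix.
COUNTS = [
    {"A": 1, "C": 1, "G": 8, "T": 0},
    {"A": 0, "C": 1, "G": 9, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 8, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 1, "G": 8, "T": 1},
]


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Convert per-position counts into log2(p / background) scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({base: math.log2((n + pseudocount) / total / background)
                       for base, n in column.items()})
    return matrix


def scan(sequence, matrix, threshold):
    """Score every window on the given strand; keep windows scoring >= threshold."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(column[base] for column, base in zip(matrix, window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


matrix = log_odds_matrix(COUNTS)
print(scan("ATATGGGCGGATAT", matrix, threshold=6.0))  # [(4, 'GGGCGG', 9.79)]
```

This scans one strand only; a realistic search would use curated matrices, scan both strands, and calibrate thresholds against background sequence.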
Alternatively Spliced Mouse and Human SK2 Open Reading Frame cDNA. Reverse transcription-PCR was performed on total RNA extracted from the cartilage of 3-day-old C57BL mice. The first strand was synthesized with SuperScript II Reverse Transcriptase (Life Technologies, Inc.) using an antisense primer (SK5, 5′-GCAATTGGATACAGAGCAGC-3′) complementary to the 3′-untranslated region of SK2 mRNA. The complete open reading frame cDNA was amplified with a 5′ end sense primer containing an NdeI site (SK30, 5′-AGAGAGTTCCATATGTCTGCAAATTCCAAAATGAACCATAAAAGAGACCAGC-3′) and a 3′ end antisense primer containing an XhoI site (SK31, 5′-GAGAGAGATTCGAGCTAGTTGGTCTTCTCCAGAGACCTGTAGTAATCTGTCAACAC-3′) using Expand polymerase (Roche Molecular Biochemicals). Reverse transcription-PCR of total cartilage RNA yielded a single band of approximately 1.9 kb. The fragment was digested with NdeI and XhoI and cloned into the corresponding restriction sites of the bacterial expression vector pET-15b (Novagen). The sequence was determined using the dRhodamine terminator cycle sequencing kit (Perkin-Elmer) and an ABI model 377 DNA sequencer (Applied Biosystems), and was compared with the previously reported mouse SK2 cDNA (accession no. AF052453) prepared from liver RNA.
To isolate a human SK2 cDNA, the deduced amino acid sequence of mouse SK2 (accession no. AF052453) was used as a BLAST query against the human EST data base maintained by the National Center for Biotechnology Information at the National Library of Medicine. Nucleotide sequences from several EST clones suspected to contain human SK2 cDNA were used to design oligonucleotide primers. The 5′ end sequence of a human SK2 cDNA was obtained by the 5′-inverse PCR method (25) using human fetal liver poly(A)+ RNA (CLONTECH). The complete open reading frame of a liver HSK2 cDNA was amplified from Marathon-ready human fetal liver cDNA (CLONTECH) with a 5′ end sense primer containing an NdeI site (HSK2-16, 5′-AGAGAGTTCCATATGTCGGGGATCAAGAAGC-3′) and a 3′ end antisense primer containing a BglII site (HSK2-4, 5′-GAGAGAGATCTCGAGTTAGTTCTTCTCCAGGGACCTGTAATAATCTG-3′) using Expand polymerase (Roche Molecular Biochemicals). The sequence was compared with a previously reported HSK2 cDNA (accession no. AF091242) prepared from cartilage RNA (17).
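In each sense primer, the ATG within the NdeI site (CATATG) serves as the initiator codon, so translating from that ATG is a quick in-frame check against the expected N terminus. The minimal sketch below assumes the standard genetic code and uses the SK30 and HSK2-16 sequences given in this section; the function names are illustrative. The resulting peptides begin with MSANSKMNHKR and MSGIKK, consistent with the exon 1-encoded N termini of MSK2 and HSK2.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

NDEI_SITE = "CATATG"  # NdeI recognition sequence; its ATG doubles as the start codon

SK30 = "AGAGAGTTCCATATGTCTGCAAATTCCAAAATGAACCATAAAAGAGACCAGC"
HSK2_16 = "AGAGAGTTCCATATGTCGGGGATCAAGAAGC"


def translate(dna):
    """Translate complete codons only; trailing bases are ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, len(dna) - 2, 3))


def peptide_from_ndei_primer(primer):
    """Translate a sense primer starting at the ATG inside its first NdeI site."""
    site = primer.upper().find(NDEI_SITE)
    if site < 0:
        raise ValueError("no NdeI site in primer")
    return translate(primer[site + 3:])  # skip 'CAT' so reading starts at ATG


print(peptide_from_ndei_primer(SK30))     # MSANSKMNHKRDQ
print(peptide_from_ndei_primer(HSK2_16))  # MSGIKK
```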

RESULTS
Genomic Structure of the MSK2 and HSK2 Genes. The genomic structure of one PAPS synthetase gene family member, SK2, was determined in both mouse and human. One mouse BAC clone, BACSK2#1, contained exons 2-13, while another, BACSK2#5, contained mouse exon 1 and the 5′-flanking region. One human BAC clone, BAChSK2, contained the entire gene. Two additional BAC clones, CIT-HSP-306M19 (Caltech Genome Research Laboratory) and AC006191 (Human Chromosome 10 Sequencing Group, Sanger Center), contained exons 1-9 of the HSK2 gene. The sequence data obtained from these five clones were used to determine the complete genomic structures of the MSK2 and HSK2 genes.
The MSK2 and HSK2 genes have identical exon organizations, consisting of 13 exons (Fig. 1A) with a 15-nt alternatively spliced exon 8 (Fig. 2; see below). The minimum sizes of the genes are 40 and 85 kb for the MSK2 and HSK2 genes, respectively. Nucleotide sequences of the intron/exon boundaries show that all introns in the MSK2 and HSK2 genes begin with a 5′-GT dinucleotide and end with a 3′-AG dinucleotide (Table I). Exon sizes and splicing phases are also conserved between the species, with the exception of exon 1. The deduced N-terminal amino acid sequences encoded by exon 1, as well as the nucleotide sequences of the 5′-flanking regions, differ significantly between MSK2 and HSK2 (Fig. 4). By scanning the intron 1 sequence of the HSK2 gene, we identified an 800-bp sequence 25 kb downstream from exon 1 that is similar to the MSK2 5′-flanking region sequence (68% identity). However, this region does not appear to encode a peptide corresponding to the MSK2 N-terminal sequence or to contain the transcription factor binding motifs found in the MSK2 promoter region, and thus may be the remnant of a former promoter.
Comparison of the exon structures with the functional domains of PAPS synthetases reveals that the APS kinase domain is encoded by exon 1 through exon 6, base 39, and the ATP sulfurylase domain by exon 6, base 40, through exon 13. The intervening sequence present in the functional APS kinase domain is encoded by exon 5, base 48, through exon 6, base 39. An alternatively spliced 15-nt sequence constitutes a single exon, designated exon 8 (Fig. 2).
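The exon/base coordinates used in this section reduce to modular arithmetic on coding-sequence positions. An intron's phase is the number of coding bases upstream of it modulo 3; a 39-base segment that starts at a codon boundary (intron 5 is described as phase 0) holds exactly 13 codons, matching the 13 kinase residues attributed to exon 6; and a 15-nt exon such as exon 8 can be skipped without a frameshift, removing 5 residues. The sketch below assumes positions are counted from the A of the initiator ATG; the helper names are hypothetical.

```python
def codon_and_phase(coding_nt_upstream):
    """Map an exon-exon junction to (codon number, intron phase).

    coding_nt_upstream: coding bases 5' of the intron, counted from the A of
    the initiator ATG (an assumption of this sketch).
    Phase 0: the intron lies between codons (the codon returned is the first
    one downstream). Phase 1 or 2: the intron interrupts the returned codon
    after its first or second base.
    """
    return coding_nt_upstream // 3 + 1, coding_nt_upstream % 3


def complete_codons(exon_bases, start_phase=0):
    """Codons lying entirely within the first `exon_bases` bases of an exon."""
    # Bases that finish a codon interrupted by the upstream intron are skipped.
    offset = (3 - start_phase) % 3
    return max(exon_bases - offset, 0) // 3


def skipping_keeps_frame(exon_length_nt):
    """Skipping an exon preserves the downstream frame only if its length is a multiple of 3."""
    return exon_length_nt % 3 == 0


# An intron after the first base of codon 50 has 49 * 3 + 1 = 148 coding bases upstream.
print(codon_and_phase(148))                   # (50, 1)
print(complete_codons(39, start_phase=0))     # 13
print(skipping_keeps_frame(15), 15 // 3)      # True 5
```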
We also analyzed the distribution of several functional motifs among the exons. The GXXGXGK motif essential for ATP binding (P-loop) (29) is found in the APS kinase domain (residues 50-56, GLSGAGK). The first base of the Gly50 codon is encoded by exon 2, and the rest of the P-loop codons reside in exon 3, so that intron 2 is a phase 1 intron.
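The consensus notation used for these motifs (e.g., GXXGXGK, with X standing for any residue) maps directly onto a regular expression. The sketch below is a hypothetical helper, not a published tool: it reports 1-based residue numbers, checks the P-loop segment reported here (GLSGAGK, beginning at residue 50), and uses a zero-width lookahead so overlapping matches are not missed. The HXXH test string in the checks is synthetic.

```python
import re


def consensus_to_regex(consensus):
    """Translate a simple consensus string (X = any residue) into a regex."""
    return "".join("." if ch == "X" else re.escape(ch) for ch in consensus)


def find_motif(peptide, consensus, first_residue=1):
    """Return (residue number, matched segment) for every match, including overlaps."""
    # The capture group sits inside a lookahead, so each match consumes no text
    # and the next search can start one residue later.
    regex = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start() + first_residue, m.group(1)) for m in regex.finditer(peptide)]


# P-loop segment as given in the text, numbered from residue 50.
print(find_motif("GLSGAGK", "GXXGXGK", first_residue=50))  # [(50, 'GLSGAGK')]
```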
Our previous mutational analysis of the P-loop in mouse SK1 showed that mutation of Gly50 to alanine did not affect any of the enzyme activities (30), indicating that the functionally essential residues of the P-loop are encoded entirely by exon 3. Another functional unit, a PAPS-dependent enzyme motif, KAXAXXXXFTG, is found in the kinase portion of mouse SK2 as residues 166-178 (KRARAGEIKGFTG). The first base of the Gly175 codon is encoded by exon 4, and the rest of the motif is encoded by exon 5, so that intron 4 is also a phase 1 intron. The FISP motif (31), the HXXH motif (phosphodiester cleavage) (32), and the PP-loop (IVGRDPAG, pyrophosphate binding) (33) are each encoded entirely by exons 3, 11, and 12, respectively. Functionally essential residues in each of these motifs have been identified.
Alternatively Spliced SK2 Forms Isolated from Mouse and Human. In previous studies of mouse and human SK2 cDNAs (16,17), the deduced amino acid sequences showed high sequence identity (93%). However, HSK2 lacked five amino acids in the ATP sulfurylase domain, and the first 9 or 11 N-terminal residues were dissimilar between MSK2 and HSK2. Amplification of a partial MSK2 cDNA generated from cartilage RNA identified an MSK2 cDNA variant, referred to as MSK2Δ8 (Fig. 2). This cartilage cDNA lacks 15 nt at positions 869-883, predicting a deletion of five amino acids, GVVPR (residues 290-294), in the ATP sulfurylase domain. Analysis of the genomic sequence shows that this 5-amino acid segment is encoded by an exon with typical flanking donor and acceptor sequences (Table I). PCR analysis of MSK2 gene expression showed that MSK2Δ8 is the major form in cartilage, while the original MSK2 is the major form in liver (data not shown). We prepared liver HSK2 cDNA and compared its sequence with that of the HSK2 cDNA isolated from a cartilage cDNA library (accession no. AF091242).
The liver HSK2 cDNA is equivalent to MSK2 and contains the 15-nt exon 8 sequence, while the cartilage HSK2 protein is equivalent to MSK2Δ8, lacking the 5-amino acid sequence GMALP (residues 289-293) encoded by exon 8. Both MSK2 and MSK2Δ8 proteins were bacterially expressed, and their ability to catalyze the APS kinase, ATP sulfurylase, and overall PAPS synthetase reactions in vitro was compared. No significant difference between the two proteins was observed in any of the three assays (see below).
Recombinant Enzyme Assays. To determine the distribution of exons encoding each catalytic unit, several expression constructs with different exon compositions were prepared, and the expressed proteins were subjected to enzyme assays (Fig. 3). Three assays were used to assess the expression levels of the recombinants as well as the functionality of each mutant: the reverse sulfurylase assay, the forward kinase assay, and the overall reaction, which measures APS and PAPS simultaneously. The full-length proteins MSK2 and MSK2Δ8 were soluble, and after purification and dialysis they were assayed for the three activities (see "Materials and Methods" for details). MSK2 exhibited normal sulfurylase (2.35 μmol of ATP/min·mg), kinase (227.9 pmol of PAPS/min·mg), and overall (13.6 nmol of PAPS/min·mg) activities. The alternatively spliced isoform MSK2Δ8 exhibited all three activities at levels comparable to MSK2. The APS kinase constructs APSK2 1-5.5 and APSK2 1-5 were both soluble in IMAC5 buffer, but APSK2 1-5 bound less well to His·Bind resin. APSK2 1-5 also failed to exhibit kinase activity, whereas the APS kinase construct carrying the C-terminal amino acids encoded by exon 6 (APSK2 1-5.5) showed APS kinase activity comparable to MSK2. This result shows that the first 13 residues of the linker sequence encoded by exon 6 are required for APS kinase activity. The ATP sulfurylase construct ATPS2 6-13 was soluble, whereas ATPS2 7-13 was found in the bacterial pellet and had to be solubilized with 6 M urea followed by dialysis into phosphate buffer. When tested for sulfurylase activity, ATPS2 6-13 showed normal activity, while ATPS2 7-13 was inactive. To ensure that the urea treatment did not cause irreversible denaturation, ATPS2 6-13 was also subjected to urea treatment and re-equilibration.
Urea-solubilized ATPS2 6-13 was initially inactive, but after dialysis into phosphate buffer its activity was restored to a level comparable to that of the non-urea-treated enzyme preparations. The MSK2Δ6 construct was designed to test whether additional sequences at the C terminus of APSK2 1-5 and the N terminus of ATPS2 7-13 would increase enzyme solubility and restore the kinase and sulfurylase activities, respectively. MSK2Δ6 was insoluble, and following urea extraction and dialysis into phosphate buffer, the recombinant protein was still unable to catalyze either the ATP sulfurylase or the APS kinase reaction. This demonstrates a requirement for exon 6 sequences similar to that observed for the separate domain constructs.
Sequence Analysis of the 5′-Flanking Regions of the MSK2 and HSK2 Genes. The MSK2 (1 kb) and HSK2 (2+ kb) 5′-flanking sequences were examined for the presence of promoter elements. The TESS (Transcription Element Search Software) program was used to search the sequences for high quality matches to a data base of position-weighted nucleotide distribution matrices. Analysis of the 1 kb of the MSK2 gene immediately flanking the translation start site detected potential binding motifs for a variety of transcription factors (Fig. 4A). These include response elements for progesterone receptor, EBP-1, NF-κB, MyoD, PU.1, CREB, Pit-1a, CCAAT-binding protein, NF-Y, Evi-1, TIN-1, and GRαβ. The 5′-flanking sequences of the HSK2 gene differ significantly from those of the MSK2 gene (Fig. 4B). The HSK2 proximal region (2000 bp) has a subregion with a high G+C content (73% in the proximal 500 bp) and at least nine potential Sp-1 binding sites, a feature of many constitutive gene promoters. Although the list is not exhaustive, potential binding sites in the HSK2 5′-flanking region include NF-E2, LyF-1, MEF-2, Pit-1a, AP-1, TFE3-S, USF, CAC-binding protein, NFAT-1, and CCAAT-binding protein. Only a few of the detected motifs (Pit-1a, CCAAT-binding protein) were common to both the MSK2 and HSK2 5′-flanking regions.
FIG. 3. Schematic diagram of bacterially expressed mouse SK2 constructs and enzyme activities. The APS kinase domain is depicted as a white box, the ATP sulfurylase domain as a stippled box, and the 5-amino acid insert as a black box. The striped boxes represent the interdomain region. The intact sulfurylase/kinase (MSK2 1-13) is encoded by exons 1-13, while MSK2Δ8 lacks the 5 amino acid residues encoded by exon 8. APSK2 1-5.5 includes the APS kinase domain and the first 13 amino acids encoded by exon 6, while APSK2 1-5 is an APS kinase domain encoded by exons 1-5 only. ATPS2 6-13 is an ATP sulfurylase domain encoded by exons 6-13, while ATPS2 7-13 lacks the region encoded by exon 6. MSK2Δ6 is MSK2 without the exon 6-encoded region. The results of in vitro enzyme assays for each construct are shown at the left. Y indicates the presence of the activity, and N its absence. An asterisk (*) indicates expressed protein that was not released into the reaction buffer and had to be solubilized as described.
DISCUSSION
A number of eukaryotic multifunctional enzymes have been shown to consist of several monofunctional catalytic domains, each of which is encoded by a separate gene in simpler organisms (18, 34-38). When expressed as monofunctional domains, each domain often retains its activity, suggesting that the genes encoding each catalytic unit were fused during the course of evolution (18,39,40). It is also common for these present-day multidomain enzymes to have interdomain segments that are not homologous to any protein sequence and to show no clear boundary between domains (14,36,37,41,42). In addition, structural comparisons and phylogenetic analyses of multidomain proteins in different organisms reveal a striking variety of gene fusion events. In PAPS synthetases, the APS kinase domain is located at the N terminus and ATP sulfurylase at the C terminus, linked by a short non-homologous intervening sequence, and the boundary between the two functional catalytic units has been established (18). The organization of the genes encoding APS kinase and ATP sulfurylase in E. coli and fungi suggests simple fusion of adjacent genes, while the reverse domain order in present-day bifunctional PAPS synthetases implies a historically unrelated gene fusion event. Elucidation and analysis of the genomic structures of the mammalian and Caenorhabditis elegans PAPS synthetase genes suggest a possible mechanism by which these bifunctional enzymes evolved.
The MSK2 and HSK2 genes are structurally very similar, consisting of 13 exons with some variation in intron sizes. Exon sizes and intron phases are also conserved between the species, with the exception of exon 1. The exact sizes of the genes have not been determined, but the minimum sizes are 45 and 80 kb for the MSK2 and HSK2 genes, respectively. We have identified two splice variants in mouse and human SK2 cDNAs, and the alternatively spliced 15 nt was identified as an exon in both genomes (Figs. 1 and 2). The SK2Δ8 proteins lack 5 amino acids in the ATP sulfurylase domain, GVVPR in MSK2 and GMALP in HSK2. A similar 15-nt/5-amino acid insertion, MDGSY, is found in D. melanogaster, and a larger 19-22-amino acid insertion is found at the same location in C. elegans PAPS synthetase. The consistent occurrence of a splice variant at this position among different species implies some functional significance; however, no differences were detected in the ability of MSK2 and MSK2Δ8 to catalyze the ATP sulfurylase, APS kinase, or overall PAPS synthetase reactions. Additionally, no known motif is apparent in the inserted sequences; therefore, the insert may not influence enzyme function directly.
With respect to exon distribution, the APS kinase domain is encoded by exons 1-6, while the ATP sulfurylase domain is encoded by exons 6-13. Based on recombinant monofunctional constructs of the mouse SK1 protein, the first 236 amino acid residues constitute a functional APS kinase domain, while the ATP sulfurylase domain consists of residues 237-624 (18). The functional boundary between APS kinase and ATP sulfurylase is thus located in exon 6 of the MSK2 gene; exon 6 encodes only the last 13 C-terminal amino acid residues of the APS kinase. Initially, because of the proximity of the functional boundary to the exon 5/6 boundary and the phase 0 insertion of intron 5, we suspected that the exon 5/6 boundary might be the ancestral site at which the two separate genes were fused. However, the bacterially expressed protein encoded by exons 1-5, APSK2 1-5, failed to exhibit APS kinase activity, and activity was restored only when the additional 13 amino acid residues encoded by exon 6 were included in the APSK2 1-5.5 protein. The failure of the MSK2Δ6 protein to catalyze either the ATP sulfurylase or the APS kinase reaction further supports the contention that the exon 5/6 boundary does not reflect the ancestral fusion site.
Examination of the C. elegans PAPS synthetase gene, which consists of only six exons, with the large exon 4 encoding part of both the APS kinase and ATP sulfurylase domains, further suggests random insertion of introns. Because the intron positions in the C. elegans PAPS synthetase gene do not correspond to those in the mammalian gene (Fig. 1B), it is plausible that introns were inserted randomly and independently into an intron-less ancestral fused PAPS synthetase gene in different organisms. Alternatively, pre-existing introns could have been lost. In other bi- or multifunctional enzymes, intron positions likewise do not necessarily coincide with the boundaries of catalytic units (43,44); introns may lie close to the functional boundary, as in PAPS synthetases, or far from it, as in the UMP synthetase gene (43). Although introns appear to have been introduced randomly, it is interesting that most of the highly conserved functional motifs of this complex bifunctional enzyme, including the FISP, phosphodiester (HXXH), and PP-loop motifs, are each contained within a single exon. An exception is Gly50 of the P-loop, whose codon is split between exons 2 and 3, while the rest of the motif is encoded by exon 3. However, we have shown that Gly50 is not essential for kinase activity, and it therefore may not be part of this common motif in PAPS synthetases.
Another interesting finding from the genomic sequence data for the two species is the lack of sequence similarity in exon 1 and its 5′-flanking region. For instance, nine amino acid residues, MSGIKKQKT, are encoded by HSK2 exon 1, while 11 residues, MSANSKMNHKR, are encoded by MSK2 exon 1. The 5′-flanking region of the HSK2 gene is GC-rich, characteristic of a housekeeping-type promoter with multiple Sp-1 binding sites, while that of the MSK2 gene contains several potential recognition sites for tissue-specific transcription factors, such as progesterone receptor, MyoD, PU.1, GC1, Pit-1a, Evi-1, TIN-1, and GRαβ. Although the significance of these transcription factor binding motifs must await functional analysis, the highly divergent promoter sequences of the two orthologous genes imply different origins and modes of regulation. An 800-bp stretch in intron 1 of the HSK2 gene is 68% identical to the proximal promoter region and exon 1 of the MSK2 gene but lacks a translatable or spliceable exon, suggesting that it is a former exon/promoter that once served as an alternative or sole promoter. Many genes have more than one promoter, and some promoters are used preferentially in a tissue-specific manner. We therefore cannot exclude SK2 transcript variants with different 5′ ends arising from alternative promoter usage in either species. It is possible that the mammalian SK2 gene had two promoters, and that the HSK2 gene uses the upstream promoter while the MSK2 gene uses the downstream promoter as its major promoter, giving rise to the dissimilar SK2 N termini identified so far in human and mouse. Alternatively, the ancestral mammalian gene may have had only one promoter, and an additional or new promoter was acquired in the primate lineage.
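The GC-rich character of the HSK2 proximal promoter (73% G+C over the proximal 500 bp, as reported in Results) is a simple base-composition measure. The minimal sketch below computes it for the proximal window and along a sliding window. It assumes the flanking sequence is written 5′ to 3′ and ends immediately upstream of the reference site, and it ignores ambiguity codes such as N; no HSK2 sequence is reproduced here, and the example strings are synthetic.

```python
def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / counted


def proximal_gc(flank, window=500):
    """G+C fraction of the `window` bases at the 3' end of a 5'-flanking sequence."""
    return gc_fraction(flank[-window:])


def sliding_gc(seq, window, step):
    """(0-based start, G+C fraction) for each full window along the sequence."""
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


# Synthetic example: 100 A's followed by 500 bases of GC repeats.
print(proximal_gc("A" * 100 + "GC" * 250))  # 1.0
print(sliding_gc("AAAAGGGG", 4, 4))         # [(0, 0.0), (4, 1.0)]
```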
A bacterially expressed mouse SK1 construct starting from the second methionine codon, which is equivalent to SK2 without the exon 1-encoded sequence, showed enzyme activity equivalent to that of the SK1 construct starting from the first methionine (30). Thus, the sequence encoded by exon 1 is likely catalytically nonessential, which would have allowed this region to diverge. The dissimilarity of the promoter sequences of the orthologous mouse and human genes also suggests different spatio-temporal expression patterns of the gene in the two species. Although no data are currently available for the SK1 isoform, the failure of SK1 to fully complement the SK2 mutation in the brachymorphic mouse (16) and in human spondyloepimetaphyseal dysplasia (17) suggests differential roles for the members of this gene family as well. Analysis of the expression patterns of the PAPS synthetase genes and functional analysis of their promoters should clarify the specific roles of each PAPS synthetase in each species. In sum, we have provided the first report on the genomic structure, intron/exon mapping, alternatively spliced forms, correlation between functional protein domains and genomic organization, and initial promoter analysis for the PAPS synthetase gene family.