Molecular Cloning and Characterization of a Dermatan-specific N-Acetylgalactosamine 4-O-Sulfotransferase

We have identified and characterized an N-acetylgalactosamine-4-O-sulfotransferase designated dermatan-4-sulfotransferase-1 (D4ST-1) (GenBank™ accession number AF401222) based on its homology to HNK-1 sulfotransferase. The cDNA predicts an open reading frame encoding a type II membrane protein of 376 amino acids with a 43-amino acid cytoplasmic domain and a 316-amino acid luminal domain containing two potential N-linked glycosylation sites. D4ST-1 has significant amino acid identity with HNK-1 sulfotransferase (21.4%), N-acetylgalactosamine-4-O-sulfotransferase 1 (GalNAc-4-ST1) (24.7%), N-acetylgalactosamine-4-O-sulfotransferase 2 (GalNAc-4-ST2) (21.0%), chondroitin-4-O-sulfotransferase 1 (27.3%), and chondroitin-4-O-sulfotransferase 2 (22.8%). D4ST-1 transfers sulfate to the C-4 hydroxyl of …

The oligosaccharide moieties on glycoproteins, glycolipids, and proteoglycans are frequently selectively modified with sulfate. The addition of sulfate is temporally and spatially regulated, indicating that the presence of sulfate in many cases confers biological function to these saccharides (1)(2)(3). Chondroitin sulfate proteoglycans consist of repeating units of (GlcUAβ1,3GalNAcβ1,4)n that are sulfated on the 4-hydroxyl of GalNAc (chondroitin sulfate A), the 6-hydroxyl of GalNAc (chondroitin sulfate C), or both the 4- and 6-hydroxyls of GalNAc (chondroitin sulfate E). Variable amounts of GlcUA in chondroitin are converted to IdoUA by glucuronyl C5-epimerase to produce dermatan sulfate, also referred to as chondroitin sulfate B, a glycan consisting of (IdoUAα1,3GalNAcβ1,4)n and (GlcUAβ1,3GalNAcβ1,4)n disaccharides (4,5). The proportion of IdoUA and GlcUA in dermatan varies considerably in tissues and is influenced by the level of free sulfate in the medium of cultured cells (6). In addition to sulfation of the GalNAc on the C-4 hydroxyl, the GalNAc in dermatan can also be sulfated at the C-6 hydroxyl (7), and the IdoUA can be sulfated at the C-2 hydroxyl (8). While chondroitin and dermatan glycans are selectively expressed on a number of proteoglycans, many aspects of the regulation of their synthesis remain to be elucidated. In light of the many functions attributed to proteoglycans bearing dermatan and chondroitin oligosaccharides (9,10), developing an understanding of the regulation of their synthesis will yield information about the biologic significance of these complex sulfated structures.

Isolation of a cDNA Encoding Human Dermatan 4-O-Sulfotransferase 1-A 1131-bp open reading frame (ORF) encoding a novel sulfotransferase, designated D4ST-1, was identified using FGENES (version 1.6) (17). A single exon gene on the genomic clone RP11-64K12 with the GenBank™ accession number AC013356 (see "Results") was predicted. PCR was used to amplify this 1131-bp ORF from human brain cDNA (see below). Filtering of repetitive elements (Repeat Masker, version 07/16/00, University of Washington Genome Center) and BLASTN (version 2.1.3) searches against the databases of GenBank™, EMBL, and DDBJ expressed sequence tag divisions (dbEST) at the NCBI were carried out on this genomic clone to verify exonic sequences of D4ST-1 and to locate exons of neighboring genes. CpG islands as defined by Gardiner-Garden and Frommer (18) were located in genomic sequences 5′ of the D4ST-1 exon using the WWWCPG program (19).
5′- and 3′-rapid amplification of cDNA ends using Marathon Ready™ cDNA (adult kidney; BD Biosciences-CLONTECH, Heidelberg, Germany) as a template was performed to obtain a full-length sequence. The 3′-region was obtained by amplifying the cDNA using a primer consisting of nucleotides 130-153 (D4ST-1-GSP1; 5′-ATG CTG ATG TTT GCG GTG ATC GTG-3′) and the AP1 anchor primer (CLONTECH), followed by a nested PCR (30 cycles) using a primer consisting of nucleotides 473-499 (D4ST-1-NGSP1; 5′-CCT GCT CTA ACT GGA AGC GGG TGA TGA-3′) and the AP2 anchor primer (CLONTECH). The DNA products were gel-purified (Life Technologies, Inc.), cloned into pCR-TOPO 2.1 (Invitrogen, Karlsruhe, Germany), and sequenced on both strands. Amplified sequences were assembled using the Lasergene (DNASTAR Inc., Madison, WI) software suite. The 5′-UTR could not be amplified using a number of different primer combinations.
The protein sequences of HNK-1 sulfotransferase family members were analyzed using the ClustalW (version 1.4) algorithm (20) implemented in the Bioedit suite to perform multiple alignments. Pairwise alignments were executed using the Smith-Waterman algorithm (21) implemented in the European Molecular Biology Open Software Suite at the European Bioinformatics Institute.
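The pairwise comparison described above can be illustrated with a minimal sketch of Smith-Waterman local alignment followed by a percent-identity calculation. The scoring scheme used here (match 2, mismatch -1, linear gap -2) is an assumption chosen for brevity; the software named in the text would normally use an amino acid substitution matrix and affine gap penalties, and the function names are hypothetical. Percent identity is computed over the aligned region, which is one of several conventions in use.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # dynamic-programming score matrix
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical positions as a percentage of the aligned length (gaps included)."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    # Toy peptides, not real sulfotransferase sequences.
    score, x, y = smith_waterman("MKLVTAGH", "MKLCTAGH")
    print(score, x, y, percent_identity(x, y))
```

Because a local alignment reports only the best-matching region, identities obtained this way can be higher than those read from a multiple alignment of the full-length sequences.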
Amplification of ORF of D4ST-1 and Construction of pcDNA3.1-D4ST-1-The ORF of D4ST-1 was amplified from a SuperScript™ human fetal brain cDNA library (Life Technologies) by PCR using 1) the 5′-specific primer 5′-ATA TGA ATT CGC CAC CAT GTT CCC CCG CCC GCT GAC CC-3′ containing an EcoRI site, the consensus Kozak sequence GCCACC (22), and a start codon and 2) the 3′-specific primer 5′-ATA TTC TAG ATC ACT GCT GAC ACG CCT CCT TGG TGA CA-3′ containing a stop codon and an XbaI site. PCRs were carried out using PfuTurbo® HotStart polymerase (Stratagene, Amsterdam, The Netherlands) with an initial denaturation for 2 min at 96°C, followed by 35 cycles of a reaction consisting of 30-s denaturation at 96°C, 60-s annealing at 60°C, and 120-s elongation at 72°C. The PCR fragment containing the ORF with the expected length of 1131 bp was directionally subcloned into the eukaryotic expression vector pcDNA3.1 (Invitrogen), completely sequenced on both strands, and designated pcDNA3.1-D4ST-1.
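The design features stated for the two D4ST-1 primers can be checked directly from their sequences. The following sketch locates the EcoRI site (GAATTC), the Kozak sequence, and the start codon in the forward primer, and the XbaI site (TCTAGA) and the stop codon in the reverse primer. The recognition sequences are standard; the function names are hypothetical and introduced only for illustration.

```python
# Primer sequences copied from the text (spaces removed programmatically).
FORWARD = "ATA TGA ATT CGC CAC CAT GTT CCC CCG CCC GCT GAC CC".replace(" ", "")
REVERSE = "ATA TTC TAG ATC ACT GCT GAC ACG CCT CCT TGG TGA CA".replace(" ", "")

ECORI, XBAI, KOZAK = "GAATTC", "TCTAGA", "GCCACC"


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def forward_features(primer):
    """Positions (0-based) of the EcoRI site and Kozak sequence, and whether ATG follows."""
    kozak = primer.find(KOZAK)
    return {
        "ecori_at": primer.find(ECORI),
        "kozak_at": kozak,
        "atg_follows_kozak": kozak >= 0 and primer[kozak + 6:kozak + 9] == "ATG",
    }


def reverse_features(primer):
    """Position of the XbaI site and the coding-strand codon that immediately follows it."""
    site = primer.find(XBAI)
    codon = primer[site + 6:site + 9] if site >= 0 else ""
    return {"xbai_at": site, "stop_codon_on_coding_strand": revcomp(codon)}


if __name__ == "__main__":
    print(forward_features(FORWARD))
    print(reverse_features(REVERSE))
```

The reverse primer anneals to the coding strand, so the stop codon appears in the primer as its reverse complement (TCA) directly after the XbaI site.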
Based on the sequence of GenBank™ entry AJ289131, the primers 5′-CCC GGG GCA GGA TGA CCA-3′ and 5′-TCC AGG CAC GCG AGA AAA AG-3′ were used to amplify (40 cycles) the complete ORF of human C4ST-2 including adjacent untranslated regions from Marathon Ready cDNA (CLONTECH) derived from human adult testis. The DNA product was gel-purified (Life Technologies) and used as a template for a nested PCR with the primers 5′-ATA TGA ATT CGC CAC CAT GAC CAA GGC CCG GCT GTT-3′ and 5′-ATA TCT CGA GTC AGT CTC GGA GGA GGT TTT CGG-3′, containing restriction enzyme sites (EcoRI and XhoI, respectively) and, in the case of the forward primer, the consensus Kozak sequence in front of the start codon. The amplified sequence was directionally subcloned into the eukaryotic expression vector pcDNA3.1 (Invitrogen), completely sequenced on both strands, and designated pcDNA3.1-C4ST-2.
Transient Expression of Human D4ST-1, C4ST-1, and C4ST-2-CHO/Tag cells were transfected with 13 μg of pcDNA3.1-D4ST-1, pcDNA3.1-C4ST-1, pcDNA3.1-C4ST-2, or pcDNA3.1 using 35 μg of LipofectAMINE (Life Technologies) in serum-free medium for 6 h according to the manufacturer's protocol. Sixty hours after transfection, the cells and medium were collected separately for analysis. Cells were lysed with 200 μl of 20 mM HEPES buffer, pH 7.4, 5 mM MgCl2, 175 mM KCl, 2% Triton X-100, and protease inhibitors (23 millitrypsin inhibitor units of aprotinin and 4 μg each of leupeptin, antipain, pepstatin, and chymostatin) per 100-mm diameter culture plate. The homogenate was mixed by rotation for 1 h and sedimented at 12,000 × g for 20 min. The supernatant was designated as the cell extract. The culture medium was pooled and sedimented at 12,000 × g for 20 min. The culture supernatant was adjusted to a final concentration of 20 mM HEPES, pH 7.4, and protease inhibitors were added as noted above.
Northern Blot and Expression Array Analysis-Human Multiple Tissue Northern (MTN®) blots and Human Multiple Tissue Expression (MTE™) arrays were purchased from CLONTECH. They were hybridized with 5-15 × 10^6 cpm of a specific α-32P-labeled cDNA probe ([α-32P]dCTP and Megaprime™ labeling kit purchased from Amersham Pharmacia Biotech) and washed according to the manufacturer's specifications. The membranes were exposed to Biomax MS films (Eastman Kodak Co.) for 2-5 days at −80°C with intensifying screens. The 425-bp probe used for the labeling reactions corresponds to nt 673-1097 of the D4ST-1 cDNA (GenBank™ accession number AF401222).
FIG. 2. Multiple amino acid sequence alignment of human D4ST-1, GalNAc-4-ST1, GalNAc-4-ST2, C4ST-1, C4ST-2, and HNK-1 ST. Alignment was performed using the ClustalW algorithm. Introduced gaps are shown as hyphens, and aligned amino acids are boxed (black for identical residues and dark gray for similar residues). Putative binding sites for the 5′-phosphosulfonate group (5′-PSB) and 3′-phosphate group (3′-PB) of PAPS and three additional highly conserved domains (III, IV, and V) are indicated.
Further screening of the nonredundant data base at the NCBI using the deduced amino acid sequence of human HNK-1 ST (GenBank™ accession no. AF033827) identified a 177-kilobase pair genomic clone, GenBank™ accession number AC013356, that included a sequence related to HNK-1 ST (E value 1e−23). We identified an ORF with a length of 1131 bp encoded by a single predicted exon in the region displaying homology. This ORF was amplified from human brain cDNA and sequenced (Fig. 1A). 5′- and 3′-rapid amplification of cDNA ends was carried out to obtain a full-length sequence of D4ST-1 but only succeeded in elongation of the 3′-end (see "Experimental Procedures"). The amplified sequences contained the 3′-UTR with a polyadenylation signal and a poly(A) tail. Fig. 1A shows the 1960-bp cDNA that includes the 1131-bp ORF encoding a protein of 376 amino acid residues with two potential N-glycosylation sites and a calculated molecular mass of 43 kDa. The deduced protein, designated D4ST-1 (GenBank™ accession number AF401222), is a typical type II transmembrane protein with a transmembrane region close to the N terminus (see the Kyte-Doolittle hydrophobicity profile (29) in Fig. 1B).
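A Kyte-Doolittle hydrophobicity profile of the kind referred to above is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the 19-residue window is an assumption (the window used for Fig. 1B is not stated here), the function names are hypothetical, and the test sequence is a toy peptide rather than the D4ST-1 sequence.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along seq (empty if seq is too short)."""
    if window > len(seq):
        return []
    scores = [KD[residue] for residue in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def most_hydrophobic_window(seq, window=19):
    """(start index, mean hydropathy) of the most hydrophobic window."""
    profile = hydropathy_profile(seq, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, profile[start]


if __name__ == "__main__":
    # Toy sequence: a leucine stretch flanked by charged residues.
    toy = "DDDD" + "L" * 19 + "DDDD"
    print(most_hydrophobic_window(toy))
```

For a type II membrane protein, the most hydrophobic window would be expected near the N terminus, after a short cytoplasmic segment.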
Multiple alignment of the protein sequence of D4ST-1 with other members of the HNK-1 sulfotransferase family was performed using the ClustalW algorithm (Fig. 2). The alignment indicates that D4ST-1 is 27.3% identical to C4ST-1, 22.8% identical to C4ST-2, 24.7% identical to GalNAc-4-ST1, 21.0% identical to GalNAc-4-ST2, and 21.4% identical to HNK-1 ST (all of the protein sequences shown are of human origin). The regions with the highest degree of identity are the putative 5′-phosphosulfate binding site (5′-PSB), the putative 3′-phosphate binding site (3′-PB), and three regions of unknown function designated III, IV, and V that are carboxyl-terminal to the 3′-phosphate binding site (Fig. 2). Identical and similar amino acids are shaded if they occur at a specific position in at least five of the six sequences shown in the multiple sequence alignment in Fig. 2. While this emphasizes key features of a protein family, relationships between individual members are under-emphasized, especially in the highly variable N-terminal regions of this particular enzyme family. Pairwise alignments using the Smith-Waterman algorithm, however, confirmed that D4ST-1 is more closely related to C4ST-1 (32.2% identical amino acids) and GalNAc-4-ST1 (31.1% identical amino acids) than to C4ST-2 (28.7% identical amino acids), HNK-1 ST (26.1% identical amino acids), and GalNAc-4-ST2 (24.6% identical amino acids).
Identification of a Murine ORF Encoding the Mouse Ortholog of D4ST-1-The human D4ST-1 cDNA sequence was used to query the nonredundant data base at the NCBI. A mouse cDNA clone 2600016L03 (GenBank™ accession number AK011230) was identified, and the predicted ORF (nt 93-1223), which has the same length as the human ORF (1131 bp), was amplified by PCR from adult mouse lung and kidney cDNA (data not shown). The mouse ortholog of D4ST-1 is 92.7% identical to human D4ST-1 at the protein level (89% identical at the nucleotide level).
In contrast to the coding regions, the 3′-UTRs of human and mouse D4ST-1 show a decreased level of identity (59%). The 92 nt upstream of the start codon of the mouse D4ST-1 transcript, probably a part of the 5′-UTR, were used as a query sequence to search the nonredundant and High Throughput Genomic Sequences data bases at the NCBI. Regions showing significant homology to this stretch could not be detected in human genomic DNA, although most of the sequences of chromosome 15 have been released and the complete sequence of the gap between the D4ST-1 and the KIAA0945 (see below) locus is known. Since coding regions are typically more highly conserved between orthologs than their UTRs, the absence of human genomic sequences homologous to the 5′-end of the mouse cDNA clone indicates that the query sequence represents the murine 5′-UTR and that we have cloned the complete human ORF.
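The sequence coordinates and lengths quoted for the ORFs and the hybridization probe are mutually consistent, as the following arithmetic sketch shows. It assumes 1-based inclusive nucleotide coordinates and an ORF length that counts the stop codon; both are conventions inferred from the numbers themselves, and the helper names are hypothetical.

```python
def span_length(start, end):
    """Length in bp of a 1-based, inclusive nucleotide interval."""
    return end - start + 1


def residues_encoded(orf_bp):
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    codons, remainder = divmod(orf_bp, 3)
    if remainder:
        raise ValueError("ORF length is not a multiple of 3")
    return codons - 1  # the final codon is the stop codon


if __name__ == "__main__":
    print(span_length(93, 1223))    # mouse ORF, nt 93-1223
    print(residues_encoded(1131))   # residues encoded by the 1131-bp ORF
    print(span_length(673, 1097))   # Northern probe, nt 673-1097
    print(span_length(1, 92))       # mouse sequence upstream of the start codon
```

The mouse interval gives 1131 bp, the ORF gives 376 residues, and the probe interval gives 425 bp, matching the values reported in the text.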
Genomic Organization and Chromosome Localization of D4ST-1-The genomic BAC clone RP11-64K12 (GenBank™ accession number AC013356) initially identified as encoding D4ST-1 is annotated as mapping to human chromosome 15q14. Comparison of the genomic sequence with the amplified D4ST-1 cDNA indicates that the ORF and the 3′-UTR of D4ST-1 are encoded by a single exon (Fig. 3). Only 2.96 kilobase pairs of sequence separate the 3′-UTR of the gene encoding the KIAA0945 protein (GenBank™ accession number AB023162) (30) from the start codon of the D4ST-1 gene. Downstream of the D4ST-1 locus, we identified a copy of the gene encoding the ribosomal protein L9 (GenBank™ accession number BC004156) that probably represents a pseudogene (31). An analysis of splice sites directly upstream of the start codon of D4ST-1 with a minimum score of 0.9 determined the nearest 3′/acceptor splice site to be 532 bp upstream (analysis performed with Splice Site Prediction as available at the Berkeley Drosophila Genome Project Web site). The start codon of the 1131-bp D4ST-1 ORF in Fig. 1 is the first potential site of initiation downstream of the nearest potential splice site in the genomic sequence that is in the correct reading frame and preceded by a stop codon. We were not, however, able to amplify the predicted sequences upstream of the start codon from any human cDNA.
The genomic sequence was also examined for the presence of CpG islands as defined by Gardiner-Garden and Frommer (18). A CpG island was identified extending from 620 bp upstream to 880 bp downstream of the D4ST-1 start codon. Such CpG islands have been detected in 82% of analyzed genes showing widespread expression and are indicative of the presence of a promoter region (32,33). The high G + C content of this region may have contributed to the difficulty of amplifying the complete cDNA.
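The CpG island criteria can be made concrete with a short sketch. The thresholds used below (length of at least 200 bp, G + C fraction of at least 0.5, and an observed/expected CpG ratio of at least 0.6) are the values commonly attributed to the Gardiner-Garden and Frommer definition; they are stated here as an assumption, since the text does not list them, and the function names are hypothetical. The expected CpG count is taken as (number of C × number of G) / sequence length.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")             # CpG dinucleotides on this strand
    gc_fraction = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0     # expectation from base composition
    ratio = observed / expected if expected else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three assumed thresholds to a candidate region."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and ratio >= min_obs_exp


if __name__ == "__main__":
    # Synthetic sequences, not the D4ST-1 genomic region.
    print(looks_like_cpg_island("CG" * 100))
    print(looks_like_cpg_island("AT" * 100))
```

In practice such a test is applied in sliding windows along the genomic sequence, and adjacent qualifying windows are merged into a single island.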
The results summarized in Table I indicate that D4ST-1, C4ST-1, and C4ST-2 have distinct specificities for chondroitin and dermatan. Although D4ST-1 has a hydrophobic sequence near its amino terminus that is predicted to act as a transmembrane domain, only 3% of the activity is retained by the transfected cells as compared with that released into the culture medium. The D4ST-1 released into the medium may reflect a proteolytic clip, as has been seen for a number of transferases including GalNAc-4-ST1 and GalNAc-4-ST2 (15,16,34,35); however, this possibility has not been formally addressed for D4ST-1. While D4ST-1 is able to transfer sulfate to both dermatan and chondroitin, the rate of transfer is nearly 10-fold greater to dermatan than to chondroitin when the same substrate concentrations are used (Table I). Like D4ST-1, C4ST-1 transfers sulfate to both chondroitin and dermatan; however, C4ST-1 transfers 5-fold more sulfate to chondroitin than to dermatan at the same substrate concentrations. While a large percentage of C4ST-1 activity is also found in the medium, roughly 25% of the sulfotransferase activity produced is retained by the transfected cells. C4ST-2 is also found predominantly in the medium following transfection. In contrast to either D4ST-1 or C4ST-1, C4ST-2 transfers sulfate to dermatan and chondroitin at similar rates; however, the rates are one-tenth of those obtained with either D4ST-1 or C4ST-1.
The relative specificities of D4ST-1, C4ST-1, and C4ST-2 for desulfated chondroitin and dermatan were further examined by comparing their rates of transfer at multiple concentrations of chondroitin and dermatan as shown in Fig. 4, A and B, respectively. D4ST-1 only displays a significant transfer of sulfate to chondroitin at relatively high substrate concentrations, raising the possibility that the chondroitin is contaminated with small amounts of dermatan and/or that occasional iduronic acid residues are present in the chondroitin and utilized by the D4ST-1. C4ST-1 transfers sulfate to (GlcUAβ1,3GalNAcβ1,4)n sequences (11,12). Since dermatan is known to contain GlcUAβ1,3GalNAcβ1,4 sequences interspersed with the IdoUAα1,3GalNAcβ1,4 sequences (36), the transfer of sulfate to both chondroitin and dermatan by C4ST-1 is expected. However, the rate of transfer to dermatan from porcine intestine (90% IdoUA, 10% GlcUA) should be lower than to chondroitin due to the low content of GlcUA (Fig. 4B). C4ST-2 also transfers sulfate to both chondroitin and dermatan (12); however, it does not demonstrate a marked preference for chondroitin or dermatan and requires relatively high concentrations of acceptor to obtain significant levels of sulfate transfer (Fig. 4).
The 35S-sulfated chondroitin and dermatan products were digested with chondroitinase AC I and chondroitinase B. Digestion of both the [35S]SO4-chondroitin and [35S]SO4-dermatan products obtained with C4ST-1 yielded a single peak that comigrated with D-glucono-4-enepyranoside β-1,3-GalNAc-4-SO4 (ΔDi-4S) (Fig. 5, A and B). The location of the sulfate on the C-4 hydroxyl of the GalNAc was confirmed by the release of free sulfate upon digestion of this product with chondroitin-4-sulfatase (not shown). Chondroitinase AC I digestion of the [35S]SO4-labeled chondroitin and dermatan products produced by D4ST-1 produced a minor peak (peak a) that comigrated with ΔDi-4S and was sensitive to digestion with chondroitin-4-sulfatase and a major peak (peak b) that is more acidic when examined by HPLC on a MicroPak AX-5 column (Fig. 5, C and D). The D4ST-1 products released by chondroitinase AC I digestion were isolated and analyzed by gel filtration. In both instances peak a comigrated with authentic disaccharide, and peak b comigrated with authentic tetrasaccharide on Superdex 30 (Fig. 6, A and B). … is adjacent to an IdoUA. Further, since chondroitinase AC I cleaves the GalNAcβ1,4GlcUA but not GalNAcβ1,4IdoUA, the tetrasaccharide was predicted to arise from an [IdoUA-GalNAc] flanked by [GlcUA-GalNAc] as shown in Fig. 7 and to have the sequence ΔUAβ1,3GalNAcβ1,4IdoUAα1,3GalNAc. The structure of the tetrasaccharide and the location of the sulfate were established by the experiments shown in Fig. 8. The tetrasaccharide was treated with Hg(OAc)2 to remove the unsaturated uronic acid (ΔUA). When analyzed on a MicroPak AX-5 column, the product eluted at an earlier time (Fig. 8A) due to the reduction in negative charge associated with loss of the unsaturated uronic acid. Following digestion with β-hexosaminidase, the sulfated product migrated just ahead of ΔDi-4S (Fig. 8B) consistent with the structure IdoUAα1,3GalNAcol-4-SO4. Digestion with α-L-iduronidase generated a product that migrated at the position of SO4-4-GalNAc (Fig. 8C).
Digestion of the D4ST-1 [35S]SO4-dermatan product with chondroitinase B yielded two [35S]SO4-labeled products when analyzed by HPLC on a MicroPak AX-5 column (Fig. 5F). The minor product, peak a, migrated at the location of ΔDi-4S, whereas the major peak, peak c, was more acidic than the tetrasaccharide produced by digestion with chondroitinase AC I and migrated at the position of a hexasaccharide when examined on Superdex 30 (Fig. 9B). No product was obtained when the D4ST-1 [35S]SO4-chondroitin product was digested with chondroitinase B (Fig. 5E). This suggests that the major product of chondroitinase B digestion is ΔUAβ1,3GalNAc-4-SO4-β1,4GlcUAβ1,3GalNAcβ1,4GlcUAβ1,3GalNAc as shown in Fig. 7. The absence of either ΔDi-4S or this hexasaccharide in the chondroitinase B digestion product of [35S]SO4-chondroitin produced by D4ST-1 indicates that the chondroitin is not contaminated with dermatan but does have an occasional IdoUAα1,3GalNAcβ1,4 sequence that is a substrate of the D4ST-1. Digestion of the hexasaccharide with chondroitinase AC I converted the [35S]SO4-hexasaccharide (Fig. 9A, peak b, and Fig. 9B) to a [35S]SO4-tetrasaccharide when analyzed by gel filtration on Superdex 30 (Fig. 9D). While less acidic than the hexasaccharide, the tetrasaccharide did not comigrate with the tetrasaccharide produced by chondroitinase AC I digestion (compare Fig. 9C with peak b of Fig. 5D). The difference in elution time reflects the difference in the location of the sulfate and the presence of GlcUA as opposed to IdoUA (see Fig. 7).
As was seen with the tetrasaccharide product of chondroitinase AC I digestion of the [35S]SO4-dermatan, treatment of the [35S]SO4-hexasaccharide produced by chondroitinase B digestion with Hg(OAc)2 to remove the unsaturated uronic acid shifted the [35S]SO4-labeled material (peak a) to a less acidic position (peak b) when examined by anion exchange on MicroPak AX-5 (Fig. 10A). The shift in elution time was similar to that seen upon removal of the unsaturated uronic acid from the tetrasaccharide released by chondroitinase AC I digestion (Fig. 6A). Digestion of the [35S]SO4-labeled product obtained by removal of the unsaturated uronic acid (Fig. 10A) with β-hexosaminidase did not alter its elution time (Fig. 10B, peak d), indicating that the GalNAc was not released. In contrast, digestion with N-acetylgalactosamine-4-O-sulfatase (arylsulfatase B) resulted in the appearance of free [35S]SO4 (Fig. 10B, peak c), confirming the location of the sulfate on the GalNAc exposed by the removal of the unsaturated uronic acid.
The high yield of [35S]SO4-labeled products from both the chondroitinase AC I and the chondroitinase B digestions indicates that the structure shown in Fig. 7 is the predominant sequence modified by D4ST-1 in the dermatan substrate. The origin of the ΔDi-4S, peak a, following digestion of the D4ST-1 product with chondroitinase AC I is not at present clear. The ratio of peak b to peak a was typically higher upon chondroitinase AC I digestion of the dermatan than chondroitin product; however, there was considerable variation in this ratio. The presence of ΔDi-4S in the chondroitinase AC I digests may indicate that D4ST-1 can transfer sulfate to a GalNAc flanked by GlcUA on both sides; however, further analyses with more highly defined substrates are in progress to more fully address this issue.
Expression Pattern of D4ST-1-Northern blot and array analyses were used to determine the expression pattern for D4ST-1 in human tissues (Figs. 11 and 12, respectively). A single 2.4-kilobase pair transcript was detected by Northern blot analysis in heart, placenta, liver, and pancreas with significantly lower levels in skeletal muscle and kidney (Fig. 11). Array analysis indicated that nearly every tissue represented expressed at least modest levels of the D4ST-1 message. The strongest signals were seen in pituitary (D3), placenta (B8), uterus (D8), thyroid (D9), fetal lung (G11), and the colorectal adenocarcinoma cell line SW480 (G10) (Fig. 12).
… were released from dermatan by chondroitinase B indicating that they originated from IdoUA-GalNAc-4-SO4-IdoUA-GalNAc, the vast majority of IdoUA-GalNAc sequences that were sulfated by D4ST-1 in vitro were flanked on both sides by GlcUA-GalNAc (see Fig. 7). While such sequences have been described in dermatan (37,38), they represent minor components in the mature proteoglycan. The selective sulfation of IdoUA-GalNAc that is flanked by GlcUA-GalNAc suggests that sulfation of GalNAc in dermatan immediately follows the epimerization of GlcUA to IdoUA. Following sulfate addition to GalNAc, the glucuronyl C5-epimerase is no longer able to convert IdoUA back to GlcUA (39). This raises the possibility that chondroitin sequences are converted to dermatan by the processive action of the glucuronyl C5-epimerase and D4ST-1.
Previous studies by a number of investigators would support such a mechanism. For example, the amount of IdoUA in dermatan produced by cultured fibroblasts is enhanced by increasing the levels of sulfate in the culture medium (6). This supported the hypothesis that there is a relationship between epimerization and sulfation. Furthermore, the addition of PAPS to microsomes from fibroblasts enhances the in vitro epimerization of GlcUA to IdoUA (40,41), again suggesting that epimerization and sulfation are linked. Since the D-gluco configuration is favored over the L-ido configuration at equilibrium in the presence of the epimerase by a ratio of 85 to 15 (39), selective sulfation of the GalNAc attached to the reducing terminus of the IdoUA by D4ST-1 would have the effect of shifting the equilibrium toward IdoUA, since it could no longer equilibrate back to GlcUA. Such a mechanism would also result in very efficient sulfation of IdoUA-GalNAc disaccharides in dermatan. If this is the case, the levels of PAPS, D4ST-1, and glucuronyl C5-epimerase would each have the potential to determine the extent to which GlcUA-GalNAc disaccharides are converted to IdoUA-GalNAc in a particular tissue or cultured cell.
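The proposed coupling of epimerization and sulfation can be illustrated with a minimal kinetic sketch: a reversible GlcUA ⇌ IdoUA step whose rate constants reproduce the 85:15 equilibrium, followed by an irreversible sulfation step that removes IdoUA-GalNAc from the equilibrium. Only the 85:15 ratio comes from the text; the absolute rate constants, the time step, and the first-order treatment of sulfation are hypothetical choices made for illustration, not measured values.

```python
def simulate(k_sulf, k_fwd=0.15, k_rev=0.85, dt=0.01, steps=20000):
    """Euler integration of GlcUA <-> IdoUA -> sulfated IdoUA-GalNAc.

    k_fwd / k_rev = 15 / 85 reproduces the reported equilibrium;
    k_sulf is a hypothetical first-order sulfation rate constant.
    Returns the final fractions (glcua, idoua_unsulfated, idoua_sulfated).
    """
    glc, ido, sulfated = 1.0, 0.0, 0.0   # start from pure chondroitin
    for _ in range(steps):
        epimerized = (k_fwd * glc - k_rev * ido) * dt   # net GlcUA -> IdoUA
        trapped = k_sulf * ido * dt                     # IdoUA locked by sulfation
        glc -= epimerized
        ido += epimerized - trapped
        sulfated += trapped
    return glc, ido, sulfated


if __name__ == "__main__":
    for k in (0.0, 0.1, 1.0):
        glc, ido, sulfated = simulate(k)
        print(f"k_sulf={k}: total IdoUA fraction = {ido + sulfated:.3f}")
```

Without sulfation the IdoUA fraction settles at 0.15, whereas any nonzero sulfation rate drives the total IdoUA fraction above that value, which is the qualitative behavior the mechanism predicts.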
Proteoglycans that bear dermatan sequences are widely expressed by a variety of different cells in the body (42). Thus, the nearly ubiquitous presence of message in tissues is expected. It is apparent from both the Northern blot and array analyses that the levels of D4ST-1 message expressed vary widely. Tissues such as heart, placenta, liver, pancreas, uterus, pituitary, and thyroid gland express much higher levels than other tissues. Since D4ST-1 and glucuronyl C5-epimerase may individually or collectively determine the IdoUA to GlcUA ratio for dermatan, it will be of interest to determine how expression of D4ST-1 is regulated and whether the levels of D4ST-1 expression correlate with the extent of GlcUA to IdoUA conversion on dermatan chains present on proteoglycans.
D4ST-1 is structurally related to the other members of the HNK-1 family of sulfotransferases. The highest percentage of sequence identity to other family members is seen in the regions hypothesized to bind the 5′-phosphosulfate and the 3′-phosphate of PAPS as well as the three regions of unknown function. The regions with the lowest percentage of identical amino acids among the HNK-1 sulfotransferase family include the stem region, the transmembrane domain, and the cytosolic domain. D4ST-1 has the highest percentage of identical amino acids when compared with C4ST-1 (27.3%) in the multiple sequence alignment. D4ST-1 and C4ST-1 transfer sulfate to the C-4 hydroxyl of GalNAc in internal IdoUA-GalNAc and GlcUA-GalNAc disaccharide sequences, respectively. The specificity of C4ST-2, which has 22.8% identical amino acid residues with D4ST-1 on the multiple sequence alignment, differs from that of either D4ST-1 or C4ST-1. The relatively high concentrations of desulfated dermatan and chondroitin that are required to detect activity raise the possibility that the native substrate has not yet been identified. More extensive characterization of the product may reveal if this is the case. C4ST-2 is, however, reported to be a GalNAc-4-sulfotransferase that transfers sulfate to internal UA-GalNAc sequences (12). In contrast to D4ST-1, C4ST-1, and C4ST-2, GalNAc-4-ST1 and GalNAc-4-ST2 both transfer sulfate to GalNAcβ1,4GlcNAcβ- sequences found at the termini of N- and O-linked oligosaccharides (15,16). Since deletion of the stem region from both GalNAc-4-ST1 and GalNAc-4-ST2 converts these terminal sulfotransferases to forms that will transfer sulfate to internal GlcUA-GalNAc sequences as well as to terminal GalNAcβ1,4GlcNAcβ- sequences, the high percentage of identical amino acids, 24.7 and 21.0% respectively, is not surprising. HNK-1 ST has 21.4% identical amino acids yet transfers sulfate to the C-3 hydroxyl of terminal GlcUAβ1,3Galβ1,4GlcNAcβ- (13,14). The percentage of identical amino acids does not allow a prediction as to the specificity for internal versus terminal β1,4-linked GalNAc or terminal β1,3-linked GlcUA.
While D4ST-1 requires an IdoUA-GalNAc for transfer, it also appears to be sensitive to the character of the flanking uronic acid moieties, favoring surrounding sequences that contain GlcUA-GalNAc over those with IdoUA-GalNAc. This suggests that the binding site may accommodate more than just the IdoUA-GalNAc. It will be of interest to determine the minimum features of the saccharide sequence that are required for recognition by the various members of the HNK-1 family of sulfotransferases and how these relate to the structural features of the sulfotransferases.
D4ST-1 is located on human chromosome 15q14, while HNK-1 ST is on chromosome 2 (GenBank™ accession number AC012493), GalNAc-4-ST1 is on chromosome 19q13.1 (15), GalNAc-4-ST2 is on chromosome 18q11.2 (16), C4ST-1 is on chromosome 12q23 (12), and C4ST-2 is on chromosome 7p22 (12). D4ST-1, like C4ST-2 (12), appears to be encoded by a single exon. In contrast, GalNAc-4-ST1 and GalNAc-4-ST2 are encoded by multiple exons. The additional exons in GalNAc-4-ST1 and GalNAc-4-ST2 encode the cytosolic domain, transmembrane domain, and much of the stem region. The stem region appears to confer the specificity for terminal rather than internal GalNAc on GalNAc-4-ST1 and GalNAc-4-ST2 (15,16). While 5′-rapid amplification of cDNA ends and attempts to amplify predicted sequences did not yield the entire 5′-UTR for the D4ST-1 cDNA, there are a number of indications that the entire coding sequence for the ORF is contained in the cDNA sequence we have obtained. This conclusion is based on the following. 1) There is a hydrophobic sequence near the N terminus that is predicted to function as a transmembrane domain and would make D4ST-1 a type II transmembrane protein as has been seen for both saccharide-specific sulfotransferases and for glycosyltransferases. 2) When expressed in CHO cells, D4ST-1 is appropriately directed to the lumen of the endoplasmic reticulum. 3) Despite the high percentage of sequence identity in the coding regions of mouse and human D4ST-1 and the lower but still significant percentage of sequence identity in the 3′-UTRs of the human and mouse sequences, searches of available data bases with the putative mouse 5′-UTR did not reveal any human target sequence with significant homology. The orthologous mouse and human ORFs are of identical size, and all of the encoded protein domains of D4ST-1 have been conserved. 4) There is a CpG island flanking the start codon of the D4ST-1 ORF we have described, indicative of a promoter region. This finding further suggests that D4ST-1 has a short 5′-UTR with a complex structure due to the high G + C content of this region.
The HNK-1 family of sulfotransferases is proving to be extensive. While related structurally, the various members of this family display significant differences in their specificities, in the structures that are synthesized, in the regulation of their expression, and in their biological functions. D4ST-1 and C4ST-1 play key roles in the sulfation of dermatan and chondroitin, respectively. The role of C4ST-2 in sulfation of dermatan and chondroitin should become clear as a more detailed understanding of its specificity is developed. In contrast to D4ST-1, C4ST-1, and C4ST-2, HNK-1 ST, GalNAc-4-ST1, and GalNAc-4-ST2 are more highly restricted in their expression, and the structures they produce are only found on limited numbers of glycoproteins. HNK-1-bearing structures are involved in cell interactions in the nervous system (43,44), whereas N-linked oligosaccharides with terminal β1,4-linked GalNAc-4-SO4, the product of GalNAc-4-ST1, on glycoprotein hormones such as luteinizing hormone play a critical role in vivo by regulating its circulatory half-life (45)(46)(47). Future structure-function studies will provide insights into the biology as well as the specificity of this novel family of sulfotransferases.