Purification and cDNA cloning of a human UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase.

A UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase (GalNAc-transferase) from human placenta was purified to apparent homogeneity using a synthetic acceptor peptide as affinity ligand. The purified GalNAc-transferase migrated as a single band with an approximate molecular weight of 52,000 on reducing sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Based on partial amino acid sequence, the cDNA encoding the transferase was cloned and sequenced from a cDNA library of a human cancer cell line. The cDNA sequence has a 571-amino acid coding region indicating a protein of 64.7 kDa with a type II domain structure. The deduced protein sequence showed significant similarity to a recently cloned bovine polypeptide GalNAc-transferase (Homa, F. L., Hollanders, T., Lehman, D. J., Thomsen, D. R., and Elhammer, Å. P. (1993) J. Biol. Chem. 268, 12609-12616). A polymerase chain reaction construct encoding the soluble form of the transferase was expressed in insect cells using a baculovirus vector. The Northern blot expression pattern in eight human tissues differed clearly from that reported for the bovine GalNAc-transferase. Polymerase chain reaction cloning and sequencing of the human version of the bovine transferase are presented; it showed 98% similarity to the bovine enzyme at the amino acid level. The data suggest that the purified human GalNAc-transferase is a novel member of a family of polypeptide GalNAc-transferases, and the nomenclature GalNAc-T1 and GalNAc-T2 is introduced to distinguish the members.

Mucin-type O-linked glycosylation is one of the dominant forms of glycosylation of glycoproteins. Mucin-type glycosylation is initiated by the addition of the monosaccharide N-acetylgalactosamine to the hydroxyl group of serine and threonine residues (GalNAcα1-O-Ser/Thr). GalNAc O-glycosylation is found on a variety of glycoproteins but is most prominent on high molecular weight secretory glycoproteins such as mucins, where O-glycans may constitute up to 80% of the total mass. O-Linked glycosylation also appears to have a role in the conformation and protease resistance of the "stalk regions" of membrane proteins, which are necessary for correct exposure and accessibility of the functional domains of membrane-bound proteins (Jentoft, 1990; Sadler, 1984; Lis and Sharon, 1993; Varki, 1993). Recently, O-glycans have also been implicated in carbohydrate-lectin-mediated cell adhesion (Springer, 1994).
One pertinent question has been whether one or several GalNAc-transferases are involved in the initiation of mucin-type O-linked glycosylation. The main evidence supporting multiple transferases has been the lack of activity toward serine substrates in purified transferase preparations (Wang et al., 1992; O'Connell et al., 1991). Elhammer et al. (1993) have, however, recently identified serine transferase activity, albeit apparently 35-fold lower than threonine transferase activity, in purified bovine colostrum transferase using the peptide sequence PPASSSAPG. Furthermore, Wang et al. (1993) demonstrated both activities in a purified porcine GalNAc-transferase preparation. Further evidence for multiple transferases is the identification of an apparently unique GalNAc-transferase activity in fetal and tumor tissue that is responsible for the synthesis of the oncofetal fibronectin epitope defined by monoclonal antibodies FDC6 and 5C10 (Matsuura et al., 1988, 1989). However, we have not been able to demonstrate GalNAc-transferase activity toward the reported fibronectin-derived acceptor peptide VTHPGY (Mandel et al., 1994). There is thus no clear evidence to date for more than one transferase.
Here we report the purification of a novel human placenta GalNAc-transferase using a defined synthetic acceptor peptide as affinity ligand. Based on partial amino acid sequences, cDNA

* This work was supported by the Danish Medical Research Council, the Lundbeck Foundation, and Ingeborg Roikjer's Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Materials
Human placentas were collected 6-24 h postdelivery, after informed consent, at Herlev Hospital, Copenhagen, Denmark. UDP-GalNAc was purchased from Sigma, and UDP-[14C]GalNAc (Amersham, United Kingdom) was mixed with it to the final specific activities indicated. Peptides were synthesized by CarlBiotech (Copenhagen, Denmark); their purity and sequence were confirmed by HPLC and amino acid analysis. The Muc2 peptide was PTTTPISTTMVTPTPTPTC (Gum et al., 1989), and the hCG-β peptide was PRFQDSSSSKAPPPSLPSPSRL (Birken et al., 1981). The cDNA library of the human gastric cancer cell line MKN45 (Yamamoto et al., 1990) was prepared on a custom basis by Clontech. Oligonucleotides were synthesized on an Applied Biosystems synthesizer (model 380B) at the State Serum Institute (Copenhagen, Denmark). Polymerase chain reaction (PCR) was performed on a Perkin-Elmer thermocycler model 480 using Taq polymerase (Cetus). Reverse transcription PCR (RT-PCR) was performed using an RT-PCR kit (Perkin-Elmer), and all sequencing reactions used the Ampli-Taq sequencing kit (Perkin-Elmer). Restriction enzymes were purchased from Promega. All other chemicals were of the highest purity available and were purchased from Sigma. [α-32P]dCTP (3,000 Ci/mmol), [γ-32P]dATP (5,000 Ci/mmol), and UDP-[14C]GalNAc were supplied by Amersham. The Prep-A-Gene kit was from Bio-Rad, pT7T3U19 was from Pharmacia Biotech Inc., T4 ligase was from Boehringer Mannheim, and polynucleotide kinase was from New England BioLabs. Sequence analysis was performed using DNASIS/PROSIS software (Hitachi).

Protein Determination
Protein concentrations were determined by the method of Bradford (Bio-Rad protein assay) using bovine serum albumin as standard.

Polypeptide GalNAc-transferase Assay
The GalNAc-transferase activity was determined in reaction mixtures containing 10 mM Tris buffer (pH 7.4), 5 mM MnCl2, 5 mM CDP-choline, 0.25% Triton X-100, and 0.05 mM UDP-[14C]GalNAc (4,000 cpm/nmol) in a total volume of 100 µl including enzyme. Product was routinely quantified by scintillation counting after Dowex 1 (formate cycle) chromatography, but initially all transferase preparations beyond step 1 were also monitored for proteolytic activity toward the different acceptor substrate peptides by C-18 reverse phase chromatography (PC 3.2/3 or RPC C2/C18 SC 2.1/10, Pharmacia) and scintillation counting of peak fractions.
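The arithmetic linking a scintillation count from this assay to enzyme units can be sketched as follows. It assumes the unit definition given in the Table I footnote (1 unit transfers 1 µmol of GalNAc per min) and the donor specific activity stated above (4,000 cpm/nmol); the blank value, incubation time, and protein amount in the example are hypothetical, since they are not given here.

```python
# Minimal sketch: convert assay counts to specific activity.
# Assumes 1 unit = 1 umol GalNAc transferred per min (Table I footnote)
# and a donor specific activity of 4,000 cpm/nmol (assay mixture above).
CPM_PER_NMOL = 4000.0


def specific_activity(sample_cpm, blank_cpm, minutes, protein_mg,
                      cpm_per_nmol=CPM_PER_NMOL):
    """Return specific activity in units/mg (umol GalNAc/min/mg)."""
    net_cpm = sample_cpm - blank_cpm             # remove background counts
    nmol_galnac = net_cpm / cpm_per_nmol         # counts -> nmol transferred
    umol_per_min = nmol_galnac / 1000.0 / minutes
    return umol_per_min / protein_mg


# Hypothetical reaction: 8,200 cpm product, 200 cpm blank,
# 30-min incubation, 0.5 ug (0.0005 mg) enzyme protein.
print(round(specific_activity(8200, 200, 30, 0.0005), 3))  # 0.133
```

Keeping the donor specific activity as a parameter makes it easy to rerun the calculation when UDP-[14C]GalNAc is mixed to a different specific activity.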
Placenta tissue stored frozen at −20°C was thawed for 1-3 days at 4°C and homogenized twice in 3 volumes of deionized water in a Waring blender (Waring Products Division), with centrifugation at 10,000 × g between homogenizations. The pellets were collected and extracted overnight in buffer A containing 1.5% Triton X-100. Supernatants obtained after centrifugation at 10,000 × g for 1 h were processed as follows.
Step 1: Cibacron Blue 3GA Chromatography-The Triton X-100 extract was applied (100 ml/h) to a column of Cibacron blue 3GA-agarose (1.5 liter of extract/20 ml of resin) equilibrated in buffer A. Enzyme was eluted by one-step elution with 1.5 M KCl in buffer A. The eluted enzyme-containing fractions (200 ml) were pooled and dialyzed overnight against buffer B without detergent.
Step 2: DEAE Chromatography-The dialyzed enzyme fractions from step 1 were applied (200 ml/h) to a column of DEAE-Sephacel (200 ml) equilibrated in buffer B. Most of the enzyme activity passed unretarded through the column where it was collected and the pH adjusted to 6.5 with BisTris.
Step 3: S-Sepharose Chromatography-The unretarded and pH-adjusted fractions from step 2 were applied (50 ml/h) to a column of S-Sepharose (20 ml) equilibrated in buffer A. After washing with buffer A, the enzyme was eluted in one step with buffer A containing 750 mM NaCl. The eluted enzyme (50 ml) was diluted 1:4 in buffer C.
Step 4: Muc2 Affinity Chromatography-The acceptor-substrate affinity chromatography was modified from Sugiura et al. (1982) and Elhammer and Kornfeld (1986). The affinity ligand was prepared by coupling the Muc2 peptide (200 mg) to cyanogen bromide-activated Sepharose 4B (20 ml of swollen gel) according to the manufacturer's instructions (Pharmacia). The diluted enzyme eluate from step 3 was applied (20 ml/h) to this Muc2 peptide-Sepharose column (20 ml) equilibrated in buffer D. After the column was washed with 50 ml of buffer D, elution was carried out in buffer E, and the enzyme-containing fractions (20 ml) were dialyzed and concentrated 10-fold against buffer E containing 2 mM EDTA.
Step 5: Mono S Ion Exchange Chromatography-The dialyzed and concentrated eluate from step 4 was diluted 1:4 in BisTris (pH 6.2) and applied to a Mono S PC 1.6/5 column (SMART System, Pharmacia) equilibrated in buffer F at a flow rate of 100 µl/min. Washing in buffer F for 5-10 min after complete application removed enough Triton X-100 that UV absorbance at 280 and 214 nm became measurable. Elution was carried out with a 20-min gradient to buffer F containing 0.5 M NaCl. Fractions (50 µl) were collected, assayed for activity, and evaluated by SDS-PAGE (0.5 µl/fraction) (Phast System, Pharmacia).
Step 6: S-12 Superose Chromatography-For analytical purposes, pooled Mono S peak fractions were used for protein sequencing, either directly or after SDS-PAGE and blotting. Peak fractions (5-10 µl) from step 5 were applied directly to an S-12 PC 3.2/30 Superose column (SMART System) equilibrated and run in buffer A without detergent. Fractions (50 µl) were collected, assayed for activity, and evaluated by SDS-PAGE (10 µl/fraction after acetone precipitation).

Protein Sequence Analysis
Protein was hydrolyzed at 110°C for 24 h or 74 h in 6 N HCl under vacuum and analyzed on an amino acid analyzer (Hitachi L-8500). The amino-terminal sequence of the protein was obtained directly from the Mono S-purified preparation (step 5) and from Coomassie Blue-stained protein from the same step on a PVDF blot. Approximately 10 µg of GalNAc-transferase was reduced and pyridylethylated (Rüegg and Rudinger, 1977; Friedman et al., 1970) and purified further by gel permeation chromatography on two SynChropak GPC G300 columns (4.6 × 250 mm each) in 10 mM sodium phosphate (pH 6.0) containing 6 M guanidine hydrochloride. Reduced and pyridylethylated protein was digested with Achromobacter lyticus protease I, a lysyl bond-specific endopeptidase (Aebersold et al., 1987), in 50 mM Tris-HCl (pH 9.0) containing 2 M urea at 37°C for 15 h. Alternatively, protein blotted on nitrocellulose was digested (Masaki et al., 1981) with A. lyticus protease I. The resulting peptides were separated by reverse phase HPLC on an RP300 column (2.1 × 100 mm, Perkin-Elmer) in 0.09% trifluoroacetic acid at a flow rate of 200 µl/min with a linear gradient of 0-60% acetonitrile over 20 min.

Preparation of PCR Probe
Based on the combined amino-terminal sequence (see Table II), the following degenerate oligonucleotide primers were synthesized (I indicates inosine). EBHC23:

Total RNA from the gastric tumor cell line MKN45 was extracted by standard procedures (Chomczynski and Sacchi, 1987), and polyadenylated RNA was purified using a Promega PolyATtract mRNA purification kit. cDNA was synthesized using EBHC25 and Moloney murine leukemia virus reverse transcriptase. PCR was performed on 5 µl of the reverse transcriptase reaction in a 25-µl solution containing 0.5 µM each of EBHC25 and EBHC24, 2 mM MgCl2, 50 mM KCl, 125 µM dNTP, and 2.5 units of Taq polymerase (Cetus), using the following thermocycling conditions: 95°C, 45 s; 43°C, 5 s; 72°C, 15 s, for 40 cycles. The products were digested with EcoRI and ligated into the EcoRI site of pT7T3U19 (Pharmacia). Competent Escherichia coli SURE cells (Stratagene) were transformed with the ligated constructs and plated onto LB plates containing ampicillin, tetracycline, isopropyl 1-thio-β-D-galactopyranoside, and 5-bromo-4-chloro-3-indolyl β-D-galactoside for colorimetric selection. White colonies were isolated and sequenced. Plasmid from one clone, TEB1, was selected and used as probe.
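Degenerate primers such as those described above are mixtures of sequences, and their complexity is usually estimated as the product of the number of bases allowed at each position. The sketch below assumes standard IUPAC ambiguity codes and counts inosine as a single species, because one inosine-containing oligonucleotide pairs with several bases instead of adding variants to the mixture. The primers in the example are hypothetical; the actual EBHC sequences are not reproduced here.

```python
# Minimal sketch: degeneracy of a primer written with IUPAC codes.
# Inosine ("I") counts as 1: it is one base analog, not a mixture.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Return the number of distinct oligonucleotides in the primer mix."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_CHOICES[base]  # KeyError flags an invalid symbol
    return total


print(degeneracy("ATGGARAAYTGGCAR"))  # hypothetical primer -> 8
print(degeneracy("ATGGAICCIGAYTT"))   # hypothetical, inosine at two positions -> 2
```

Replacing an N with I at a position lowers that position's contribution from 4 to 1, which is why inosine is attractive in highly degenerate codon positions.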

Cloning of the Human GalNAc-transferase (GalNAc-T2)
A λgt10 cDNA library was constructed from poly(A)+ RNA of the human gastric cancer cell line MKN45 (kindly donated by Dr. T. Suzuki) using size-selected, random-primed cDNA. Approximately 0.75 million plaques were screened with the TEB1 probe. Of the 40 initially positive clones, eleven were plaque purified and subcloned into the pT7T3U19 plasmid vector for sequencing.

Expression of GalNAc-T2 in Sf9 Cells
Based on the cDNA sequences obtained, the soluble form of GalNAc-T2 was cloned by RT-PCR using primers EBHC75 (5′-TCGAATTCAAAAAGAAAGACCTTCATCAC-3′) and EBHC68 (5′-TCGAATTCCTACTGCTGCAGGTTGAGC-3′). The amino-terminal end of the expressed protein would include the three lysine residues (amino acids 52-54) identified by amino-terminal sequencing of the purified transferase (see Fig. 3). The PCR product was cloned into the expression vector pAcGP67 (Pharmingen). The expression construct was sequenced to verify the sequence and correct insertion into the cloning site. Sf9 cells were cotransfected with pAcGP67-GalNAc-T2-sol and BaculoGold DNA according to the manufacturer's instructions (Pharmingen).
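As a quick consistency check on the two expression primers, the sketch below confirms the EcoRI site (GAATTC) in each 5′ tail and translates the gene-specific portions with the standard genetic code. Treating the first 8 nucleotides (TC plus GAATTC) as the tail is an assumption, as is reading the reverse complement of EBHC68 in the frame that ends on its TAG stop codon.

```python
# Minimal sketch: EcoRI check and translation of the expression primers.
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ("*" = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")


def translate(seq):
    """Translate all complete codons of a DNA sequence."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        a, b, c = (BASES.index(base) for base in seq[i:i + 3])
        protein.append(AMINO[16 * a + 4 * b + c])
    return "".join(protein)


def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


EBHC75 = "TCGAATTCAAAAAGAAAGACCTTCATCAC"  # sense primer
EBHC68 = "TCGAATTCCTACTGCTGCAGGTTGAGC"    # antisense primer
TAIL = 8                                  # assumed: "TC" clamp + GAATTC

for primer in (EBHC75, EBHC68):
    assert primer[2:TAIL] == "GAATTC"     # EcoRI recognition site

print(translate(EBHC75[TAIL:]))               # KKKDLHH
print(translate(revcomp(EBHC68[TAIL:])[1:]))  # LNLQQ*
```

Under these assumptions the sense primer begins with three lysine codons, matching the lysines at positions 52-54 noted above, and the antisense primer ends the insert with a stop codon.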

PCR Cloning of Human Version of GalNAc-T1
Based on the reported sequence of bovine GalNAc-T1 (Homa et al., 1993), the PCR primer pairs EBHC104 (nucleotide positions −2 to +20)/EBHC101 (positions 1237 to 1257) and EBHC100 (positions 946 to 965)/EBHC107 (positions 1661 to 1681) were used to amplify two overlapping fragments covering the entire coding region. Amplification was performed on 1 µg of MKN45 cDNA using the following conditions: 95°C, 45 s; 53°C, 1 min; 72°C, 2 min, for 35 cycles. Products were cloned into pT7T3U19 and sequenced in both directions using primers spaced 200-300 bp apart.
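The two PCR fragments can be checked for overlap with simple interval arithmetic. The sketch reads the abbreviated primer positions "1237 to 57" and "946 to 65" as 1237-1257 and 946-965 (an assumption), so fragment 1 spans −2 to 1257 and fragment 2 spans 946 to 1681 in bovine GalNAc-T1 coordinates.

```python
# Minimal sketch: overlap of two inclusive nucleotide intervals.
def overlap(a, b):
    """Number of positions shared by inclusive intervals a and b (0 if none)."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return max(0, end - start + 1)


# Assumed coordinates (bovine GalNAc-T1 numbering, see lead-in).
fragment_1 = (-2, 1257)   # EBHC104 ... EBHC101
fragment_2 = (946, 1681)  # EBHC100 ... EBHC107

print(overlap(fragment_1, fragment_2))  # 312
```

An overlap of roughly 300 bp leaves ample shared sequence for assembling the two fragments into one contiguous sequence.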

DNA Sequencing
All sequencing reactions were performed by the dideoxy chain termination method using the Ampli-Taq sequencing kit and γ-32P-labeled (kinased) sequencing primers. Double-stranded DNA was sequenced using oligonucleotide primers spaced approximately 300 bp apart.

Purification to Homogeneity of a Polypeptide GalNAc-transferase from Human Placenta
Step 1: Table I summarizes the purification of the GalNAc-transferase from 2.5 kg of human placenta tissue. The GalNAc-transferase was solubilized in 1.5% Triton X-100, yielding a specific activity of 0.00052 unit/mg with the Muc2 peptide substrate. Little or no activity was found in the first water extraction, and the specific activity of the crude detergent homogenate was approximately 0.00002-0.00010 unit/mg. However, activity measurements in crude homogenates were quite inconsistent, and the activity was highly labile. Purification factors are therefore calculated relative to the soluble Triton X-100 extract, excluding the approximately 5-fold purification achieved in the preceding extraction.
Step 2: Cibacron Blue Chromatography-This step yielded material suitable for subsequent affinity chromatography on the peptide acceptor substrate. Although it gave only 3.3-fold purification, it was a simple, fast, and highly concentrating step. Approximately 50% of the transferase activity measured with the Muc2 peptide bound to the column. Attempts to show differences in substrate specificity among the detergent extract and the pass-through and eluate of the Cibacron column with the different synthetic acceptor peptide substrates were unsuccessful (not shown). Attempts to improve the purification by NaCl gradient elution resulted in a significantly lower yield, so one-step elution with 1.5 M KCl was retained as optimal.
Step 3: S-Sepharose-This step was employed to reduce the volume applied to the subsequent affinity chromatography in the presence of UDP, but it was not required for binding. In separate experiments, S-Sepharose chromatography was also used to exchange Triton X-100 for octyl glucoside so that the Muc2 peptide affinity column could be run in octyl glucoside. This detergent exchange was necessary to purify further the apparent membrane-bound form of the transferase, as described below.
Step 4: Muc2 Peptide Acceptor Substrate Chromatography-The peptide column was run successfully both with peptide coupled through its carboxyl-terminal cysteine residue to TNB-thiol Sepharose (Pierce) and with peptide coupled to cyanogen bromide-activated Sepharose 6B (Pharmacia). The purification reported in Table I used cyanogen bromide-coupled peptide run in Triton X-100. The peptide column bound only a minor fraction (approximately 5-10%) of the total transferase activity measured with the Muc2 peptide substrate, but re-running the column eluate yielded essentially no further binding.
Further binding to the Muc2 column could be achieved when the detergent was exchanged from Triton X-100 to n-octyl glucoside. The exchange was performed on S-Sepharose using the pass-through of the first Muc2 chromatography (step 4), and more than 50% of the residual transferase activity then bound to a second Muc2 peptide column run in n-octyl glucoside. Importantly, however, the enzyme eluted in n-octyl glucoside lost its activity when the detergent was removed, in contrast to the enzyme eluted from the Muc2 column run in Triton X-100. Thus, n-octyl glucoside-purified enzyme could only be purified further by Mono S chromatography in the presence of detergent; omission of detergent at this step resulted in a complete loss of enzyme activity in the eluted fractions. Furthermore, S-12 gel filtration of the enzyme activity eluted from the Mono S column run in n-octyl glucoside spread the activity throughout the eluate, suggesting tight interaction between protein and detergent (not shown). The purification and protein analyses reported here were all performed on the enzyme purified by Muc2 chromatography run in Triton X-100, even though this constituted a minor fraction.

Table I, footnote a: One unit of enzyme is defined as the amount of enzyme that transfers 1 µmol of GalNAc in 1 min in the standard reaction mixture described under "Experimental Procedures," with 25 µg of Muc2 peptide as acceptor substrate.
Table I, footnote b: Approximately 5-fold purification was reached in the Triton X-100 extract; this is not included in the calculations because of variability in assaying the crude transferase homogenate.
Step 5: Mono S Cation Exchange-The eluate of the first affinity chromatography run in Triton X-100, with a specific activity of 0.126 unit/mg, was essentially pure as estimated by SDS-PAGE, but further concentration and purification, as well as evaluation of purity, were achieved by Mono S chromatography (Fig. 1). Essentially one major band of approximately 52,000 by SDS-PAGE under reducing conditions was obtained, and it coincided with the transferase activity. It is important to note that although the affinity chromatography was run in Triton X-100, the Mono S chromatography was run without detergent and without loss of transferase activity. Variable amounts of a small peptide of approximately 10,000 eluted at higher ionic strength (see Fig. 1B), and this was identified as lysozyme by amino-terminal amino acid sequencing (not shown). Peak fractions 17 and 18 (50-µl fractions) (Fig. 1) were pooled and subjected to protein sequence analysis, either directly from the pooled fractions or after SDS-PAGE and blotting to PVDF membrane.

S-12 Gel Filtration Chromatography
Analytical assessment of the purity of the GalNAc-transferase was obtained by S-12 gel filtration chromatography run in buffer without detergent. Ten µl from the pooled peak fractions of the Mono S chromatography, containing approximately 1 µg of protein with a specific activity of 0.7 unit/mg, were applied. One major peak of approximately 50,000-55,000, coinciding with the transferase activity, was obtained, and SDS-PAGE of the peak fraction also showed the 52-kDa band (Fig. 2). Careful measurement of the A214 peak by integration against a bovine serum albumin standard gave a calculated specific activity of 0.76 unit/mg, essentially identical to that of the Mono S fractions. The ability to run both the Mono S and the S-12 chromatography without detergent, with almost quantitative recovery of transferase activity and a single distinct peak of protein and activity, strongly indicated that the transferase was in a soluble form. The residual transferase activity obtained by detergent exchange to n-octyl glucoside is believed to represent the membrane-bound form, which requires detergent for solubility.
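Purification factors relative to the Triton X-100 extract follow directly from the specific activities quoted in this section. The sketch uses only those values (0.00052, 0.126, 0.7, and 0.76 unit/mg); Table I itself, with volumes and yields, is not reproduced here, so yields are not computed, and the roughly 5-fold gain in the extraction step is excluded as in the Table I footnote.

```python
# Minimal sketch: fold purification from specific activities (units/mg).
specific_activity = {
    "Triton X-100 extract": 0.00052,
    "Muc2 affinity eluate": 0.126,
    "Mono S pool": 0.7,
    "S-12 peak (A214 estimate)": 0.76,
}
reference = specific_activity["Triton X-100 extract"]

fold = {stage: sa / reference for stage, sa in specific_activity.items()}
for stage, factor in fold.items():
    print(f"{stage:<26} {factor:7.0f}-fold")
```

On this basis the Mono S and S-12 material is roughly 1,300- to 1,500-fold purified over the detergent extract.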

Properties of the Purified GalNAc-transferase
In the accompanying paper (Sørensen et al., 1995), the kinetic parameters of the soluble transferase with different peptide substrates and the structural characterization of the products are presented. Importantly, the pure transferase was capable of glycosylating both the Muc2 peptide, which contains nine threonine sites, and the hCG-β peptide, which contains exclusively serine sites (Fig. 2). Table II summarizes the amino acid sequences obtained from the purified transferase. The amino-terminal region of the GalNAc-transferase was obtained by direct sequencing of the pooled peak of the Mono S chromatography (Fig. 1A). Sequencing was also performed directly on the PVDF-blotted 52,000 band derived from Mono S chromatography, yielding essentially the same sequence. However, a major sequence (approximately 60%) starting with EEK and a minor sequence (approximately 40%) starting with SNG were obtained, differing by 10 and 7 residues, respectively (Table II). This difference most likely reflects batch-to-batch variation in proteolytic degradation, caused by the varying tissue thawing times used before extraction to increase the proportion of soluble GalNAc-transferase. Approximately 25 peaks were separated from the A. lyticus protease I digest by reverse phase HPLC, and selected pools were subjected to automated Edman degradation. As seen in Table II, 14 sequences were interpreted, although many of these included some unidentified or weakly assigned residues. Note that the asparagine of a putative N-glycosylation site in internal peptide K12 was positively identified by amino acid sequencing, suggesting that this site may not be glycosylated.

FIG. 1. Panel A, NaCl gradient elution of GalNAc-transferase from the Mono S cation exchange column (step 5). The eluate from the Muc2 peptide column (step 4), diluted and pH-adjusted, was applied to a cation exchange (Mono S PC 1.6/5) column. The column was equilibrated in buffer F and washed with 1 ml of the same buffer, followed by elution with a gradient of 0-1 M NaCl over 30 min at a flow rate of 50 µl/min. Elution was monitored at A214 (and A280, not shown), and 50-µl fractions were assayed for transferase activity using the Muc2 peptide substrate. Fractions are 1 min each, and fraction 14 starts at 79.4 min. Panel B, SDS-PAGE of fractions 14-20 stained with Coomassie Blue. Prestained molecular weight markers (std) are shown with molecular weights indicated in the margin (phosphorylase b, 106,000; serum albumin, 80,000; ovalbumin, 49,500; carbonic anhydrase, 32,500; trypsin inhibitor, 27,500; lysozyme, 18,500). Fractions 17 and 18 contained the peak of the transferase activity and one major protein band with an apparent molecular weight of 52,000.
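The acceptor peptides differ in the type of hydroxy amino acid they offer, which is what allows threonine- and serine-directed activity to be distinguished. The sketch below simply lists candidate Ser and Thr positions in the two peptide sequences given under Materials; it does not predict which sites are actually glycosylated.

```python
# Minimal sketch: candidate O-glycosylation (Ser/Thr) positions in the acceptor peptides.
PEPTIDES = {
    "Muc2": "PTTTPISTTMVTPTPTPTC",
    "hCG-beta": "PRFQDSSSSKAPPPSLPSPSRL",
}


def hydroxy_positions(peptide, residue):
    """1-based positions of a given residue ('S' or 'T') in a peptide."""
    return [i for i, aa in enumerate(peptide, start=1) if aa == residue]


for name, seq in PEPTIDES.items():
    thr = hydroxy_positions(seq, "T")
    ser = hydroxy_positions(seq, "S")
    print(f"{name}: {len(thr)} Thr {thr}, {len(ser)} Ser {ser}")
```

The count reproduces the nine threonines of the Muc2 peptide (which also carries a single serine) and shows that the hCG-β peptide offers only serines.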

Cloning of the Purified Human GalNAc-transferase
The predicted amino-terminal sequence obtained by combining overlapping peptide sequences was used for the design of sense and antisense oligonucleotide primers. The PCR-generated clone TEB1 was sequenced and found to correspond to the full amino-terminal amino acid sequence of the GalNAc-transferase. The TEB1 fragment was used to screen a random primed human cancer cell line cDNA library from which 40 positive clones were obtained from 0.75 million plaques screened. Eleven of these were subcloned and sequenced.
Two clones (2782 and 5551) were selected for complete sequence analysis. Clone 2782 contained 2.5 kb, including 3′- and 5′-untranslated sequences (approximately 900 and 50 bp, respectively). Clone 5551 contained 2.3 kb, including the 5′ end of the coding region, but was 100 bp short of the 3′ end. The combined nucleotide sequence of the GalNAc-transferase cDNA clones is shown in Fig. 3. Overlapping parts of the two clones were identical in sequence apart from a one-nucleotide "insertion" near the 3′ end of clone 2782 (position 1534, Fig. 3). Because several internal peptide sequences mapped out of frame downstream of the premature stop codon introduced by this insertion, the purified GalNAc-transferase must be longer than the protein predicted by clone 2782. The apparent insertion in clone 2782 was concluded to be a library artifact, as clone 5551 lacked it and, more importantly, RT-PCR of this region using a variety of total RNA sources, including MKN45 RNA, yielded only the sequence found in clone 5551 (not shown). Furthermore, sequencing of cloned genomic PCR products covering this position yielded only the sequence found in clone 5551. The remaining nine cDNA clones were partially sequenced; all contained 5′ intron sequences, and their coding sequences were limited to the 5′ TEB1 probe region.
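The reasoning about clone 2782 rests on a simple property of the genetic code: a single inserted nucleotide shifts every downstream codon, so downstream peptides no longer match the reading frame and a premature stop codon usually appears. The toy sequence below is hypothetical and only illustrates the effect.

```python
# Minimal sketch: a one-nucleotide insertion shifts the reading frame.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")  # standard code


def translate_to_stop(seq):
    """Translate from the first base until the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        a, b, c = (BASES.index(base) for base in seq[i:i + 3])
        aa = AMINO[16 * a + 4 * b + c]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


reference = "ATGGCAAGTGAAACTGGC"             # hypothetical ORF fragment
mutant = reference[:6] + "A" + reference[6:]  # one-base insertion after codon 2

print(translate_to_stop(reference))  # MASETG
print(translate_to_stop(mutant))     # MAK (premature stop)
```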

Primary Structure of the Human GalNAc-transferase (GalNAc-T2)
The GalNAc-T2 coding sequence has 1713 bp and encodes a protein of 571 amino acids (64,729 Da). Initiation is presumed to occur at the ATG codon at position 1, as an adenosine at position −3 fulfills the consensus requirement for an initiation codon.
GalNAc-T2 is predicted to be a type II transmembrane protein with a strongly hydrophobic segment (amino acids 7-24) that is not flanked by charged amino acids (PROSIS software, Hitachi).
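A deduced molecular mass such as 64,729 Da is conventionally obtained by summing average residue masses and adding one water for the free termini. The sketch below assumes that convention and standard average residue masses (rounded to four decimals); because the full GalNAc-T2 sequence is not reproduced in this text, it is applied to the Muc2 acceptor peptide instead. It ignores glycosylation and other modifications.

```python
# Minimal sketch: average molecular mass of an unmodified polypeptide.
AVG_RESIDUE_MASS = {  # average residue masses in Da (amino acid minus H2O)
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the N- and C-termini


def average_mass(sequence):
    """Average mass (Da) of a linear, unmodified peptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence) + WATER


print(round(average_mass("PTTTPISTTMVTPTPTPTC"), 1))  # Muc2 peptide, 1947.2
print(1713 // 3)  # codons in a 1,713-bp coding sequence -> 571
```

For scale, 64,729 Da over 571 residues corresponds to about 113 Da per residue.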
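The transmembrane prediction above was made with the PROSIS software, whose exact algorithm is not described here. As a swapped-in, commonly used alternative, the sketch below applies a Kyte-Doolittle sliding-window hydropathy scan (19-residue window; mean scores above about 1.6 are often taken to suggest a membrane-spanning segment). The sequence is hypothetical, built to resemble a type II amino terminus: a short charged cytoplasmic tail followed by a hydrophobic anchor.

```python
# Minimal sketch: Kyte-Doolittle hydropathy scan (swapped-in method).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def best_window(seq, window=19):
    """Return (start, end, mean) of the most hydrophobic window, 1-based."""
    best = None
    for i in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[i:i + window]) / window
        if best is None or mean > best[2]:
            best = (i + 1, i + window, mean)
    return best


# Hypothetical N-terminus: charged tail, 19-residue hydrophobic stretch, stem.
toy = "MRRRSR" + "LLVLAVLFALGIVLVAFLG" + "SRDEKKNQSPEGSTKDE"
start, end, mean = best_window(toy)
print(start, end, round(mean, 2), "candidate anchor" if mean > 1.6 else "none")
```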
As shown in Fig. 3, all of the peptide sequences obtained (Table II) are accounted for in the coding sequence, with minor discrepancies at weakly assigned residues. In agreement with the proposed soluble nature of the isolated transferase protein, the amino-terminal sequence obtained from this protein is found at amino acid position 51, carboxyl-terminal to the putative transmembrane anchoring domain. The predicted amino acid sequence of human GalNAc-T2 has one consensus sequence (-Asn-Xaa-Ser/Thr-) for N-glycosylation (amino acid 516). N-Glycanase digestion of the soluble transferase did not alter its SDS-PAGE mobility (not shown), and because the sequence of one of the A. lyticus protease I-released peptides included this site and gave positive identification of asparagine, this site appears not to be utilized, or to be only partially utilized. The molecular mass of the deduced amino acid sequence of the soluble transferase is 59,205 Da, which is slightly higher than the experimentally determined molecular weight but within the limits of accuracy of the SDS-PAGE and gel filtration systems used.

FIG. 2. S-12 gel filtration chromatography of the GalNAc-transferase. Ten µl of the peak fraction from a Mono S chromatography, containing approximately 1 µg of protein with a specific activity of 0.76 unit/mg, was applied to an S-12 gel filtration column (S-12 3.2/30) and run in phosphate-buffered saline at 40 µl/min. Elution was monitored at A214 (and A280, not shown). Fractions (100 µl) were collected and assayed for transferase activity using the Muc2 substrate as well as the hCG-β peptide. SDS-PAGE of fractions revealed a faint band with an apparent molecular weight of 52,000 corresponding to the peak transferase activity (not shown).
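Candidate N-glycosylation sites of the -Asn-Xaa-Ser/Thr- type can be located with a regular expression. The sketch adds the commonly applied rule that Xaa is not proline (an assumption beyond the consensus stated above) and uses a lookahead so that overlapping sites are reported. As the data above show for GalNAc-T2, a matching sequon does not mean the site is occupied. The example sequence is hypothetical.

```python
import re

# Asn, any residue except Pro, then Ser or Thr; lookahead allows overlaps.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein):
    """1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


toy = "MKNGSLNPTQNVTA"          # hypothetical sequence
print(sequon_positions(toy))    # [3, 11]; N7 is skipped because X = Pro
```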

Expression of GalNAc-transferase (GalNAc-T2)
Expression of the pAcGP67-GalNAc-T2-sol construct resulted in an almost 50-fold increase in GalNAc-transferase activity toward the Muc2 substrate peptide in the culture medium of infected cells, compared with uninfected controls and with cells infected with the blood group A glycosyltransferase (Bennett et al., 1995) (Table III). Similarly, a 50-fold increase was observed with the hCG peptide. Background levels were always higher in uninfected cell medium than in control infected cell medium, probably because uninfected cultures contained more cells and therefore more endogenous Sf9 GalNAc-transferase activity.

Distant Homology with Bovine GalNAc-transferase (GalNAc-T1)
Fig. 4 shows an amino acid sequence comparison of GalNAc-T1 and -T2 (DNASIS, Hitachi) (Higgins and Sharp, 1988). The overall similarity is 44%. The amino-terminal sequences of the two enzymes show no similarity, whereas more significant similarity is found in the central region of the putative catalytic domains. The two enzymes are quite similar in size. The bovine GalNAc-transferase cDNA indicates a 560-amino acid coding region, and the amino terminus of the purified soluble enzyme was found at amino acid 41, yielding a soluble protein of 520 amino acids. The human GalNAc-transferase described here is 11 amino acids longer in total, and the amino terminus of the soluble enzyme was found at amino acid 52, thus also yielding a soluble protein of 520 amino acids. The discrepancy between the reported molecular weights of the bovine colostrum transferase (approximately 70,000) and the human transferase reported here (approximately 52,000) must be due partly to differences in glycosylation. N-Glycanase treatment of the soluble bovine colostrum transferase reduced its SDS-PAGE molecular weight to approximately 64,000 (predicted molecular mass, 59,358 Da), suggesting that some of its three N-linked glycosylation consensus sites are utilized, in contrast to the human placenta transferase, which appears not to be N-glycosylated. Fig. 4 also shows that both transferases have an abundance of cysteine residues (GalNAc-T2 has 13; bovine and human GalNAc-T1 each have 16), 12 of which are aligned (Homology plot, Clustal). Some of these aligned cysteines are positioned in highly similar regions, whereas others are in more divergent sequences. The apparently conserved number and spacing of cysteines indicate that the GalNAc-transferases are extensively disulfide-bonded proteins with a similar tertiary structure. Interestingly, Wang et al. (1992) noted that a reducing agent increased the activity of porcine submaxillary gland GalNAc-transferase, and we observed the same for the human placenta transferase. Drickamer (1993) originally observed a similar conserved pattern of cysteine residues in sialyltransferases with generally limited sequence similarity (Livingston and Paulson, 1993). It therefore seems likely that the two genes (T2 and T1) possess homologous domains originating from a common ancestral gene.
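Figures such as overall similarity and the number of aligned cysteines are read off a pairwise alignment. The sketch below computes percent identity and aligned cysteines from two gapped, equal-length aligned strings. It is a simplification under stated assumptions: the DNASIS "similarity" score may also credit conservative substitutions, and conventions for counting gapped columns differ. The aligned sequences are hypothetical.

```python
# Minimal sketch: identity and aligned cysteines from a pairwise alignment.
def alignment_stats(a, b):
    """Summarize two equal-length aligned sequences ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    identical = sum(x == y for x, y in pairs)
    return {
        "aligned_positions": len(pairs),          # gapped columns excluded
        "percent_identity": 100.0 * identical / len(pairs),
        "aligned_cysteines": sum(x == y == "C" for x, y in pairs),
    }


# Hypothetical aligned fragments.
print(alignment_stats("MCKLC-DGWCAE", "MCRLCADGFC-E"))
# {'aligned_positions': 10, 'percent_identity': 80.0, 'aligned_cysteines': 3}
```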

PCR Cloning of the Human Version of GalNAc-T1
To verify that the purified and cloned GalNAc-T2 was distinct from the previously cloned bovine transferase, we cloned the human equivalent of the bovine enzyme. Human GalNAc-T1 was PCR amplified and sequenced as two overlapping fragments. Because the PCR primers were placed at the ends of the coding region (5′ primer, positions −2 to +20; 3′ primer, positions 1661 to 1681), the sequence of the 15-20 bp at the initiation and termination codons is unknown. The DNA sequence of the remaining major part of the gene was 95% similar to the bovine sequence, giving rise to only six amino acid substitutions (99% similarity) (Fig. 4).

Northern Analysis
Northern analysis of the human gastric cancer cell line MKN45 using a PvuII/HindIII fragment of clone 2782 (TEB2) as probe revealed a discrete band of 4.5 kb as well as multiple smaller bands of 2-3 kb in the poly(A)+ mRNA lane (Fig. 5). It is unknown which of these represents the mature mRNA. No hybridization was observed with total RNA, most likely because of the lower representation of the target RNA.
Analysis of a human multiple tissue blot from Clontech showed hybridization to a 4.5-kb mRNA in all tissues (Fig. 6). As in the MKN45 cell line, several of the tissues also expressed the smaller 2-3-kb mRNAs. The same Clontech Northern blot was analyzed by Homa et al. (1993) with a bovine GalNAc-T1 probe: a 4.2-kb mRNA was detected in all tissues but kidney, and the level of expression differed significantly from that found in Fig. 6.

DISCUSSION

Mucin-type GalNAc-transferase activity has previously been purified to apparent homogeneity from bovine colostrum and porcine submaxillary glands using either apomucin acceptor substrate chromatography or 5-mercury-UDP-GalNAc donor substrate chromatography as the principal affinity purification step (Elhammer and Kornfeld, 1986; Wang et al., 1992; O'Connell et al., 1991). Here we report the purification of a unique GalNAc-transferase from human placenta using acceptor substrate chromatography based on a defined synthetic peptide derived from the human intestinal mucin Muc2 (Gum et al., 1989). The rationale for this strategy was the hypothesis that multiple GalNAc-transferases exist. Initially, we attempted to use acceptor peptides with few threonine or serine residues to limit the number of different substrate sites, but to date only the Muc2 sequence, with its multiple threonine residues, has yielded significant purification. As shown in the following paper, our affinity chromatography apparently separated at least two distinct GalNAc-transferase activities (Sørensen et al., 1995). The identity of the isolated putative GalNAc-transferase cDNA was established by functional expression in the baculovirus system (Table III). The cloned GalNAc-transferase is predicted to be a type II transmembrane protein, consistent with all other cloned mammalian glycosyltransferases (Paulson and Colley, 1989; Kleene and Berger, 1993).
The enzyme purified with Triton X-100 in the affinity chromatography steps was found by gel filtration analysis to be a soluble form, and this was confirmed by comparing the amino-terminal amino acid sequence of the purified protein with the coding region predicted from the cloned cDNA (Fig. 3). Purification of other glycosyltransferases with Triton X-100 in the affinity chromatography step has likewise resulted in selective isolation of their soluble forms (Weinstein et al., 1987; Clausen et al., 1990; Sarkar et al., 1991). An exception is the purification of the Galβ1-3GalNAc α2-3-sialyltransferase, in which both the membrane-bound and soluble forms were isolated (Gillespie et al., 1992). The soluble forms of glycosyltransferases such as GalNAc-T2 appear to be monomeric. Recently, the apparent membrane-bound form of β1-4-galactosyltransferase was purified and found to be a high molecular weight, probably multimeric, complex (Bendiak et al., 1993).
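The comparison of an amino-terminal sequence with the deduced coding region amounts to locating the Edman read in the translated cDNA and checking whether it starts beyond the predicted signal anchor. A minimal sketch follows; the function names, the treatment of "X" as an unassigned sequencing cycle, and the toy sequence are assumptions for illustration, not data from this study.

```python
from typing import Optional

def locate_n_terminus(deduced: str, edman: str) -> Optional[int]:
    """Return the 1-based residue number in `deduced` where the Edman-derived
    amino-terminal sequence begins, or None if it is not found.
    An 'X' in the Edman read (unassigned cycle) matches any residue."""
    deduced, edman = deduced.upper(), edman.upper()
    for start in range(len(deduced) - len(edman) + 1):
        window = deduced[start:start + len(edman)]
        if all(e == "X" or e == d for e, d in zip(edman, window)):
            return start + 1
    return None

def lacks_anchor(n_term_residue: int, anchor_end: int) -> bool:
    """True if the purified protein starts after the last residue of the
    predicted signal anchor, i.e. it is a cleaved, soluble form."""
    return n_term_residue > anchor_end

# Hypothetical deduced sequence: tail (1-6), anchor (7-25), stem from 26 on
deduced = "MKRRSQLLVVALLIAFLGLVAVLSADEKNQRSTPG"
start = locate_n_terminus(deduced, "DEKNX")
print(start, lacks_anchor(start, 25))
```

Here the read begins at residue 26, just past the hypothetical anchor ending at residue 25, which is the pattern expected for a soluble form released from its membrane anchor.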
We found that exchanging the detergent to n-octyl glucoside prior to the final affinity chromatography (step 4) apparently resulted in purification of the membrane-bound form of the GalNAc-transferase. We initially purified the placental transferase by this method because significantly more activity was recovered. However, validation of the purification was hampered by difficulties with further purification steps, including ion exchange and/or gel filtration. Fractions isolated by SDS-PAGE with a molecular weight of approximately 60,000 were amino-terminally blocked and could not be sequenced, and gel filtration failed to yield distinct peaks (not shown). Because multiple GalNAc-transferases might have been copurified, we chose to focus on the soluble enzyme. Allowing the placental tissue to thaw for 3-4 days at 4 °C prior to extraction appeared to increase the relative amount of soluble enzyme.
Most of the glycosyltransferases characterized to date are N-linked glycoproteins (Kleene and Berger, 1993), although the β1-2-N-acetylglucosaminyltransferases I and II appear to be exceptions (Sarkar et al., 1991; Kumar et al., 1990; Kleene and Berger, 1993). Bovine GalNAc-T1 has three N-linked glycosylation consensus sites (human GalNAc-T1 has four), and some of these appear to be used (Homa et al., 1993). GalNAc-T2 has one consensus site, but both N-glycanase digestion (not shown) and amino acid sequencing (Table II) indicate that the soluble enzyme lacks N-linked glycosylation, or at least that the site is only partially occupied.
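N-linked glycosylation consensus sites are conventionally identified as the sequon Asn-X-Ser/Thr, where X is any amino acid except proline. A minimal Python scan is sketched below; the function name and test peptides are hypothetical, and, as the GalNAc-T2 data illustrate, finding a sequon says nothing about whether the site is actually occupied.

```python
import re

# N-X-S/T sequon with X != P; the zero-width lookahead lets overlapping
# sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def n_glyc_sites(protein: str) -> list[int]:
    """1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]

print(n_glyc_sites("MNASKNPSLNNST"))
```

Some prediction rules additionally disfavor a proline immediately after the Ser/Thr; that refinement is left out of this sketch.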
The isolated and cloned human GalNAc-transferase is predicted to be a novel member of a family of polypeptide GalNAc-transferases. The similarity between this human GalNAc-transferase and the bovine GalNAc-transferase (44% at the amino acid level) is close to that found between different members of the sialyltransferase family (Wen et al., 1992). The same glycosyltransferase from different species has generally been found to be 95-98% similar at the amino acid level (Kleene and Berger, 1993), and here we show that the human counterpart of the bovine GalNAc-transferase (Homa et al., 1993) is 99% similar to it at the amino acid level (Fig. 4). We therefore propose a nomenclature to identify the GalNAc-transferases: GalNAc-T1 and GalNAc-T2, where the latter is the novel human transferase cloned in the present paper. Comparison of the two amino acid sequences reveals two to three regions of high similarity (≥80%) that could serve as targets for a PCR cloning strategy to identify additional members of this family of transferases. Such an approach has been applied successfully to the sialyltransferase family, which to date comprises six members sharing a highly conserved 55-amino acid segment in the putative catalytic domain (Livingston and Paulson, 1993). Our preliminary results indicate that this strategy will also be successful for GalNAc-transferases.
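Percent figures of this kind, and the search for ≥80% regions as PCR targets, can be computed from a pairwise alignment. The sketch below assumes the two sequences are already aligned (gaps as "-"), counts identity over columns where neither sequence has a gap, and slides a fixed window along the alignment. The function names, the 10-column window, and the gap convention are assumptions; "similarity" scores that also credit conservative substitutions would come out higher than the identity computed here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over aligned columns without gaps ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def conserved_windows(a: str, b: str, width: int = 10,
                      cutoff: float = 80.0) -> list[int]:
    """0-based start columns of alignment windows at or above `cutoff` %."""
    return [i for i in range(len(a) - width + 1)
            if percent_identity(a[i:i + width], b[i:i + width]) >= cutoff]

# Toy alignment: a fully conserved stretch followed by a divergent one
print(percent_identity("ACDE", "ACDF"))
print(conserved_windows("A" * 10 + "C" * 5, "A" * 10 + "D" * 5))
```

Adjacent qualifying windows would then be merged into candidate regions for degenerate primer design.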
Homology among different glycosyltransferases is generally very limited; exceptions are the α1-3/4-fucosyltransferases (FUT3, FUT5, and FUT6) (Weston et al., 1992) and the human blood group A and B transferases together with the α1-3-galactosyltransferase (Joziasse et al., 1991). Drickamer (1993) showed that, despite the lack of overall homology among three members of the sialyltransferase family, their cysteine residues were conserved. Our finding that 12 of the 13 cysteine residues in GalNAc-T2 align with cysteines in GalNAc-T1, and that these are distributed throughout the proteins, suggests that the two enzymes have a similar overall structure. Any disulfide bonds are expected to be intramolecular, as indicated by the molecular weight of GalNAc-T2 estimated by gel filtration (Fig. 2). Interestingly, GalNAc-transferase activity appears to increase 2-3-fold in the presence of reducing agents (Wang et al., 1992), possibly suggesting that some "opening" of the predicted globular catalytic domain favors substrate accessibility and catalytic activity, at least as measured in vitro with peptide substrates.
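A count such as "12 of 13 cysteines align" is a simple tally over a pairwise alignment. The sketch below assumes the alignment has already been produced by some pairwise aligner and that both sequences are given as equal-length strings with "-" for gaps; the function name and toy sequences are hypothetical.

```python
def cysteine_conservation(a: str, b: str) -> tuple[int, int]:
    """Return (conserved, total): the number of cysteines in aligned
    sequence `a`, and how many of them face a cysteine in `b` in the
    same alignment column."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    cys_columns = [i for i, x in enumerate(a) if x == "C"]
    conserved = sum(1 for i in cys_columns if b[i] == "C")
    return conserved, len(cys_columns)

# Toy alignment: three cysteines in the first sequence, two conserved
print(cysteine_conservation("AC-CGC", "ACACGA"))
```

Returning the column indices as well would show whether the conserved cysteines are spread along the whole alignment, which is the observation the structural argument rests on.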
The finding that multiple GalNAc-transferases are involved in O-glycosylation is important for understanding this predominant post-translational modification. Detailed studies of the fine specificity of the individual transferases are needed to define the role of each enzyme, and a preliminary study in the accompanying paper (Sørensen et al., 1995) clearly shows that distinct differences in substrate specificity may be expected. Here we show that Northern blotting with our human cDNA (Figs. 5 and 6) revealed a slightly larger mRNA and a different organ distribution than those published by Homa et al. (1993) for GalNAc-T1, although most organs express both transcripts. Differential expression of GalNAc-transferases among cells and organs may lead to differences in GalNAc O-glycosylation between cells and species. This could significantly affect interpretations of the peptide specificity of O-glycosylation inferred from glycosylation sites identified in vivo (O'Connell et al., 1991; Elhammer et al., 1993) and may eventually lead to more precisely defined consensus sequences for the acceptor sites of individual transferases.
In conclusion, the present data provide evidence that mucin-type GalNAc O-glycosylation is controlled by at least two distinct GalNAc-transferases that are differentially expressed in cells and organs. Further studies of the substrate specificity of these and possibly additional enzymes are required, but this understanding can be expected to advance our insight into O-glycosylation processing and to be of practical use, for example, in designing appropriate mammalian expression systems for recombinant glycoproteins intended for therapeutic use.