Cloning of a Human UDP-N-Acetyl-α-D-Galactosamine:Polypeptide N-Acetylgalactosaminyltransferase That Complements Other GalNAc-Transferases in Complete O-Glycosylation of the MUC1 Tandem Repeat*

A fourth human UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase, designated GalNAc-T4, was cloned and expressed. The genomic organization of GalNAc-T4 is distinct from that of GalNAc-T1, -T2, and -T3, which contain multiple coding exons, in that the coding region is contained in a single exon. GalNAc-T4 was placed at human chromosome 12q21.3-q22 by in situ hybridization and linkage analysis. GalNAc-T4 expressed in Sf9 cells or in a stably transfected Chinese hamster ovary cell line exhibited a unique acceptor substrate specificity. GalNAc-T4 transferred GalNAc to two sites in the MUC1 tandem repeat sequence (Ser in GVTSA and Thr in PDTR) using a 24-mer glycopeptide with GalNAc residues attached at sites utilized by GalNAc-T1, -T2, and -T3 (TAPPAHGVTSAPDTRPAPGSTAPPA, GalNAc attachment sites underlined). Furthermore, GalNAc-T4 showed the best kinetic properties with an O-glycosylation site in the P-selectin glycoprotein ligand-1 molecule. Northern analysis of human organs revealed a wide expression pattern. Immunohistology with a monoclonal antibody showed the expected Golgi-like localization in salivary glands. A single base polymorphism, G1516A (Val to Ile), was identified (allele frequency 34%). The function of GalNAc-T4 complements that of other GalNAc-transferases in O-glycosylation of MUC1, showing that glycosylation of MUC1 is a highly ordered process and that changes in the repertoire or topology of GalNAc-transferases will result in an altered pattern of O-glycan attachments.

A family of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases (GalNAc-transferases) (EC 2.4.1.41) controls the initiation of mucin-type O-linked protein glycosylation, in which N-acetylgalactosamine is transferred to serine and threonine amino acid residues (1). Four members of the animal GalNAc-transferase family have been reported (1-5). The GalNAc-transferase gene family in animals contains several additional members, and it was recently reported that a number of GalNAc-transferase homologues exist in Caenorhabditis elegans (6). The GalNAc-transferases characterized so far have distinct acceptor substrate specificities (4,5,7,8) and show different patterns of expression in human cells and organs (4,5,9). The chromosomal localization and genomic organization of human GalNAc-T1, -T2, and -T3 (GALNT1, GALNT2, and GALNT3) are different; however, a number of conserved intron/exon boundaries confirm their evolutionary relationships (1,10,11). Taken together, these features strongly suggest that each GalNAc-transferase has distinct functions, and that the large number of members of this gene family has evolved as a consequence of the need for O-glycosylation of different sequences. The fine substrate specificities of GalNAc-transferases may represent a major determining factor for sites of O-glycan attachment.
O-Glycosylation of MUC1 has attracted attention because it is altered in cancer cells, which carry smaller and fewer O-glycans (12-14). The change in O-glycosylation leads to the exposure of cancer-associated peptide epitopes within the tandem repeat region of MUC1, although the exact molecular nature of this phenomenon is still unknown (15,16). Analysis of the in vitro O-glycosylation properties of various GalNAc-transferase preparations, including purified recombinant GalNAc-T1, -T2, and -T3, suggested that only three of the five potential sites in the repeat were glycosylated (AHGVTSAPDTRPAPGSTAPPA, in vitro glycosylation sites underlined). However, Muller et al. (17) recently established that all five sites can be occupied in purified milk MUC1. Two explanations for this discrepancy are: 1) that in vitro O-glycosylation assays fail to reflect the in vivo function of GalNAc-transferases, or 2) that additional GalNAc-transferases have substrate specificity for the last two sites in the MUC1 repeat. The present study presents evidence supporting the latter explanation.
* This work was supported by the Danish Cancer Society, the Mizutani Foundation, the Velux Foundation, the Danish Medical and Natural Science Research Councils, the Novo Nordisk Foundation, National Institutes of Health Grant 1 RO1 CA66234, funds from the EU Biotech 4th Framework, and the Dutch Cancer Society. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™
Previously, a sequence motif shared between GalNAc-T1 and -T2 was used to identify and clone GalNAc-T3 (4). Two candidate cDNA sequences containing the GalNAc-transferase motif were identified: one was the GalNAc-T3 gene. Here, we report the cloning and expression of the second identified cDNA, which was designated GalNAc-T4 (1). A putative murine orthologue, also designated GalNAc-T4, was recently reported by Hagen et al. (5). The recombinant human GalNAc-T4 had a highly restricted acceptor substrate specificity, which is distinct from that of previously reported GalNAc-transferases. Thus, GalNAc-T4 was found to complement GalNAc-T1, -T2, and -T3 in O-glycosylation of the MUC1 tandem repeat by transferring GalNAc to two sites in the MUC1 repeat not utilized by other known GalNAc-transferases. Importantly, GalNAc-T4 showed preference for the MUC1 glycopeptide, GalNAc4TAP24, which was previously glycosylated with GalNAc-T2.
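The site accounting for the MUC1 repeat can be illustrated with a short sketch. It uses only the 21-residue repeat sequence and the two motifs (GVTSA and PDTR) quoted in the text; the function and variable names are illustrative assumptions and are not part of the original work.

```python
from __future__ import annotations

# Illustrative sketch: enumerate potential O-glycosylation sites (Ser/Thr)
# in the MUC1 tandem repeat peptide quoted in the text.
MUC1_REPEAT = "AHGVTSAPDTRPAPGSTAPPA"  # sequence as given in the text


def ser_thr_sites(peptide: str) -> list[tuple[int, str]]:
    """Return 1-based positions and residues of every Ser/Thr in the peptide."""
    return [(i, aa) for i, aa in enumerate(peptide, start=1) if aa in "ST"]


def site_in_motif(peptide: str, motif: str, residue: str) -> int:
    """Return the 1-based position of `residue` inside the first occurrence of `motif`."""
    start = peptide.index(motif)    # raises ValueError if the motif is absent
    offset = motif.index(residue)   # first occurrence of the residue in the motif
    return start + offset + 1


all_sites = ser_thr_sites(MUC1_REPEAT)

# The two sites attributed in the text to GalNAc-T4.
extra_sites = {
    site_in_motif(MUC1_REPEAT, "GVTSA", "S"),
    site_in_motif(MUC1_REPEAT, "PDTR", "T"),
}

# The remaining Ser/Thr positions, i.e. the three sites not assigned to GalNAc-T4.
remaining_sites = [pos for pos, _ in all_sites if pos not in extra_sites]

print(all_sites)
print(sorted(extra_sites), remaining_sites)
```

The sketch only counts Ser and Thr residues; it does not predict which residues a given enzyme will use, which is precisely the question the experiments address.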

EXPERIMENTAL PROCEDURES
Identification and Cloning of cDNA for GalNAc-T4 -A cDNA sequence (TE4) with extensive similarity to the GalNAc-transferase motif was previously identified following reverse transcriptase-PCR of MKN45 mRNA using degenerate primers (4). The PCR product was cleaved by BstNI to remove GalNAc-T1- and -T2-derived products. Uncleaved product was isolated using the prep-A-gene kit (Bio-Rad) and cloned into the pT7T3U19 vector (Pharmacia). A rapid cDNA library screening strategy was performed as described previously (4). A human salivary gland λgt11 library (CLONTECH) was amplified and aliquoted into 40 sublibraries, which were screened to identify phage clones containing TE4. Fourteen sublibraries found to contain TE4 λgt11 clones were further assayed by PCR using EBHC201 (5′-GCGGATCCGCGGGCACCATATGCTCG) or EBHC203 (5′-GAGCAGTETTCTGTAGGAAATTGG) primers combined with primers based on flanking λgt11 vector sequence to estimate lengths of cDNA inserts for selection of sublibraries with the largest 3′ or 5′ sequences. Amplifications were: 35 cycles of 95°C for 45 s; 53°C for 1 s; 72°C for 2 min. One sublibrary (number 22) generated a 3′ PCR product (EBHC201/λgt11 REV) of approximately 750 bp, and one sublibrary (number 34) generated a 5′ PCR product (EBHC203/λgt11 REV) of approximately 800 bp. Additional 5′ sequence was obtained by 5′ RACE using a 5′ Ready-RACE Lung cDNA Kit (CLONTECH) in combination with antisense primer EBHC309 (5′-TCAAAAGAACTGCAGGAGAAG): 35 cycles of 95°C for 45 s; 60°C for 15 s; 72°C for 3 min. The RACE products were blunt-end cloned into pT7T3U19 and multiple clones were sequenced. The longest insert was approximately 550 bp and did not contain the full 5′ coding sequence. Further 5′ RACE failed, and the remaining 5′ sequence was obtained by sequencing P1 genomic clones.
In Situ Hybridization to Metaphase Chromosomes-Fluorescence in situ hybridization was performed on normal human lymphocyte metaphase chromosomes, using procedures described previously (18). Briefly, P1 DNA was labeled with digoxigenin-14-dUTP (Boehringer Mannheim) using the BioNICK labeling system (Life Technologies). The labeled DNA was precipitated with ethanol in the presence of herring sperm DNA. A total of 200 ng of P1 DNA was precipitated with 50× human Cot 1 DNA (Life Technologies, Inc.) and dissolved in 12 ml of hybridization solution (2× SSC, 10% dextran sulfate, 1% Tween 20, and 50% formamide, pH 7.0). Prior to hybridization the probe was heat-denatured at 80°C for 10 min, chilled on ice, and incubated at 37°C to allow re-annealing of highly repetitive sequences. After denaturation of the slides, probe incubations were carried out under an 18 × 18-mm coverslip in a moist chamber for 45 h. Immunochemical detection of the probe was achieved using sheep anti-digoxigenin (fluorescein isothiocyanate) (Boehringer) and donkey anti-sheep (fluorescein isothiocyanate) (Jackson Laboratories) antibodies. For evaluation of the chromosomal slides, a Zeiss epifluorescence microscope equipped with appropriate filters for visualization of fluorescein isothiocyanate was used. Hybridization signals and 4,6-diamidino-2-phenylindole-counterstained chromosomes were transformed into pseudo-colored images using image analysis software. For precise localization and chromosome identification, 4,6-diamidino-2-phenylindole-converted banding patterns were generated using the BDS-image™ software package (ONCOR).
Linkage Analysis of GalNAc-T4 with the Base Pair G1516A Substitution-Seven normal families with an average of 10 children from the Copenhagen Family Bank (19) were analyzed for the G/A polymorphism at bp position 1516 using the PCR assay with EBHC300/EBHC307. To confirm the in situ localization on 12q21.3-q22, a set of DNA markers with known physical localization was used. The order and distances for these were drawn from the Généthon Linkage map (20): D12S90-(18.8)-D12S80-(10.8)-D12S81-(7.5)-D12S101. The physical locations of these markers are: D12S80, 12p13.2-q21.1; D12S81, 12q21; and D12S101, 12q22. Lod scores were calculated with the LIPED software (21).
Expression of GalNAc-T4 in Sf9 Cells-Expression constructs designed to contain amino acid residues 32-578 of the coding sequence of the putative GalNAc-T4 gene were prepared by genomic PCR on the two different P1 clones using the primer pair EBHC318 (5′-AGCGGATCCTTTTCATGCCTCCGCAGGAGCCG) and EBHC307 with BamHI restriction sites (Fig. 1). These PCR products were cloned into a BamHI site of the expression vector pAcGP67 (Pharmingen), and the expression constructs were sequenced. The constructs, pAcGP67-GalNAc-T4 506V sol and pAcGP67-GalNAc-T4 506I sol, were designed to yield a putative soluble form of the GalNAc-T4 protein with an N-terminal end positioned immediately C-terminal to the potential transmembrane domain and including the entire sequence expected to contain the catalytic domain. Control constructs pAcGP67-GalNAc-T1-sol, pAcGP67-GalNAc-T2-sol, pAcGP67-GalNAc-T3-sol, and pAcGP67-O2sol were prepared as described previously (3,4,22).
A full coding expression construct was prepared by PCR of P1 clone P1-6213 (GalNAc-T4 506V) using primer pair EBHC320 (5′-AGCGGATCCCACCATGGCGGTGAGGTGGACTTGG)/EBHC307. The product was cloned into BamHI sites of the expression vector pVL1392 (Pharmingen). Co-transfection of Sf9 cells with pAcGP67-constructs or pVL-constructs and Baculo-Gold™ DNA was performed according to the manufacturer's specifications. Briefly, 0.4 μg of construct was mixed with 0.1 μg of Baculo-Gold DNA and co-transfected in Sf9 cells in 24-well plates. Ninety-six hours post-transfection, recombinant virus was amplified in 6-well plates at dilutions of 1:10 and 1:50. Titer of amplified virus was estimated by titration in 24-well plates with monitoring of GalNAc-transferase activities. Initial transferase assays were performed on supernatants of Sf9 cells in 6-well plates infected with first or second amplified virus. Titers representing end point dilutions giving optimal enzyme activities were used. Transferase assays of the full coding expression construct were performed by extracting washed cells in 1% Triton X-100 as described previously (7).
Purification of GalNAc-T4 and Production of Monoclonal Antibodies-pAcGP67-GalNAc-T4 506V sol was expressed in High Five™ cells grown in serum-free medium in upright roller bottles shaking at 140 rpm in 27°C waterbaths. GalNAc-T4 506V was purified essentially as described previously (8), using successive chromatography on Amberlite (IRA95, Sigma), DEAE-Sephacel (Pharmacia), and S-Sepharose Fast Flow (Pharmacia). Final purification was performed on Mini-S™ (PC 3.2/3, Pharmacia) using the Smart System (Pharmacia), and peak fractions were used for immunizing BALB/c mice as described previously (5-10 μg/subcutaneous injection three times with one intravenous boost) (9). Monoclonal antibodies were selected and characterized by immunocytology on Sf9 cells infected with various GalNAc-transferase expression constructs and by selective immunoprecipitation of active GalNAc-T4. Immunofluorescence staining of frozen sections of submandibular glands was performed as described previously (9), and monoclonal antibody PANH 2, reacting with MUC5B (23), was used as a marker of mucous cells.
Stable Expression of Secreted GalNAc-T4 in Chinese Hamster Ovary Cells-The insert of pAcGP67-GalNAc-T4 506V sol was excised and cloned into a modified pCDNA3 vector (Invitrogen), which includes 19 amino acids of the γ-interferon secretion signal sequence (24). CHO-K1 (ATCC) was transfected using 0.2 μg of DNA and 5 μl of LipofectAMINE (Invitrogen) in subconfluent 6-well plates according to the manufacturer's protocol. After 48 h the medium was changed and 400 μg/ml G418 was added. At 72 h, 10-20% of the wells were trypsinized and the percentage of cells expressing GalNAc-T4 was evaluated by immunocytology as described previously (9), using a novel anti-GalNAc-T4 monoclonal antibody, UH6. Based on the frequency of positive cells, the residual transfectant cells were trypsinized and plated in 96-well plates. Two rounds of screening and cloning by limiting dilution using immunoreactivity with UH6 were performed, and clones reaching over 50% positive cells were selected and tested for the level of secreted enzyme in the supernatant of confluent cultures.
Structure Determination-Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectra were acquired on either Voyager-DE or Voyager-Elite mass spectrometers (PerSeptive Biosystems Inc.) equipped with delayed extraction. The MALDI matrix was a 9:1 mixture of 2,5-dihydroxybenzoic acid (2,5-DHB) 25 g/liter and 2-hydroxy-5-methoxybenzoic acid 25 g/liter (Aldrich) dissolved in a 2:1 mixture of 0.1% trifluoroacetic acid in water and acetonitrile. Samples dissolved in 0.1% trifluoroacetic acid to a concentration of approximately 2 pmol/ml were prepared for analysis by placing 1 μl of sample solution on a probe tip followed by 1 μl of matrix.

RESULTS
Identification and Cloning of cDNA for GalNAc-T4 -Previously, reverse transcriptase-PCR was performed on mRNA from a variety of human organs and cell lines, using a pair of degenerate primers (EBHC100/EBHC106) corresponding to sequences flanking a putative GalNAc-transferase motif. Two novel sequences were identified, TE3 and TE4, with approximately 80% similarity in sequence to GalNAc-T1 and -T2 (4). The reverse transcriptase-PCR product obtained from the human gastric carcinoma cell line MKN45 was cleaved with BstNI (known to cleave GalNAc-T1 and -T2 sequences at nonconserved restriction sites), and the remaining uncleaved product was subcloned and sequenced. Eight out of 40 independent clones contained sequences similar but not identical to GalNAc-T1 and -T2. Six of these clones were derived from GalNAc-T3 (TE3) (4), and two clones contained a novel sequence designated TE4.
Cloning and sequencing of the complete coding sequence of GalNAc-T4 was achieved by a combination of PCR screening of 40 sublibraries from a human salivary gland λgt11 library, 5′ RACE, and genomic P1 cloning. The combined sequences contained an open reading frame of 1734 bp (GenBank accession number Y08564) (Fig. 1). The entire coding sequence was confirmed by sequencing of P1 clones in both directions. The deduced sequence of GalNAc-T4 is predicted to be a type II transmembrane protein with a hydrophobic retention signal in residues 11-31. A BLAST search of the EST database (GenBank/NCBI) with the coding region of GalNAc-T4 detected only one human GalNAc-T4 EST, in contrast to several other GalNAc-transferases that are highly represented by ESTs.
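The relation between the reported open reading frame and the residue numbering used for the expression constructs can be checked with simple arithmetic. The sketch below assumes that the 1734-bp figure counts coding codons only (no stop codon), which is an interpretation chosen because it matches the residue range 32-578 given under "Experimental Procedures"; the variable names are illustrative.

```python
# Illustrative arithmetic relating ORF length to protein length.
ORF_BP = 1734      # open reading frame length reported in the text
CODON_BP = 3

# Assumption: the 1734 bp exclude the stop codon, so the ORF encodes
# exactly ORF_BP / 3 amino acid residues.
n_residues = ORF_BP // CODON_BP

# Soluble construct: residues 32-578 inclusive (from "Experimental Procedures").
sol_first, sol_last = 32, 578
n_soluble = sol_last - sol_first + 1

# Predicted hydrophobic retention signal: residues 11-31 inclusive.
tm_first, tm_last = 11, 31
n_hydrophobic = tm_last - tm_first + 1

print(n_residues, n_soluble, n_hydrophobic)
```

Under this assumption the soluble construct begins at the residue immediately following the hydrophobic segment, consistent with the construct design described in the text.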
Genomic Organization and Chromosomal Localization of GalNAc-T4 -Three P1 clones each covering the entire coding sequence of GalNAc-T4 were isolated. Sequencing of all three P1 clones showed that the entire coding region of GalNAc-T4 was contained in a single exon.
Fluorescence in situ hybridization revealed that the GalNAc-T4 gene resides at human chromosome 12q21.3-q22 (Fig. 2). No specific hybridization signals were observed at other chromosomal sites. A total of 20 cells in metaphase were analyzed. Further confirmation of the chromosomal location was achieved by linkage analysis using the PCR assay for the polymorphism at position 1516 combined with chromosome 12 microsatellite markers (Table I). Analysis of 10 families yielded significant Lod scores (Z > 3) between D12S80, D12S81, D12S101, and the GalNAc-T4 polymorphism. One recombination to the marker D12S81 was detected in an intercross mating (Z = 10.50 at θ(M=F) = 0.02). A marker D12S7 (12q14-q24.1) showed no recombination with GalNAc-T4, but the marker is not on the Généthon map. The suggested order according to the Généthon map was as follows: D12S90-D12S80-D12S81-("GalNAc-T4," D12S7)-D12S101.
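For readers unfamiliar with lod scores, the following minimal sketch shows how a two-point lod score is computed in the simplest textbook case of phase-known, fully informative meioses. It is not the LIPED algorithm used in the study, which handles general pedigrees; the function name and the example counts are hypothetical and are not data from this work.

```python
import math


def lod_score(n_meioses: int, n_recombinants: int, theta: float) -> float:
    """Two-point lod score for phase-known, fully informative meioses.

    Z(theta) = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**k * (1 - theta)**(n - k),
    n is the number of meioses and k the number of recombinants.
    """
    n, k = n_meioses, n_recombinants
    if not 0 <= k <= n:
        raise ValueError("n_recombinants must lie between 0 and n_meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0:
        # A single recombinant excludes complete linkage.
        return float("-inf") if k > 0 else n * math.log10(2.0)
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)  # free recombination, theta = 0.5
    return log_l_theta - log_l_null


# Hypothetical example: 40 informative meioses with 1 recombinant.
for theta in (0.01, 0.025, 0.05, 0.1):
    print(theta, round(lod_score(40, 1, theta), 2))
```

A lod score above 3 corresponds to odds of more than 1000:1 in favor of linkage at the tested recombination fraction, which is the conventional threshold referred to in the text.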
The identified polymorphism at bp position 1516 in GalNAc-T4 was common. Gene frequencies of the alleles were p(G) = 0.663 and p(A) = 0.337 (Table II). There was no significant deviation from the Hardy-Weinberg expectation (χ²(1) = 0.012; 0.05 < p < 0.90).
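The Hardy-Weinberg expectation referred to above can be sketched as follows. The allele frequencies are those reported in the text; the genotype counts needed for the actual test are in Table II and are not reproduced here, so the chi-square helper is shown without observed data. Function names are illustrative assumptions.

```python
from __future__ import annotations


def hardy_weinberg_expected(p: float, q: float) -> dict[str, float]:
    """Expected genotype frequencies for a biallelic locus (alleles G and A)."""
    if abs(p + q - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return {"GG": p * p, "GA": 2.0 * p * q, "AA": q * q}


def chi_square(observed: dict[str, float], p: float, q: float) -> float:
    """Pearson chi-square of observed genotype counts against Hardy-Weinberg expectation."""
    n = sum(observed.values())
    expected = hardy_weinberg_expected(p, q)
    return sum(
        (observed[g] - n * expected[g]) ** 2 / (n * expected[g]) for g in expected
    )


# Allele frequencies reported in the text.
P_G, P_A = 0.663, 0.337
expected = hardy_weinberg_expected(P_G, P_A)
for genotype, freq in expected.items():
    print(genotype, round(freq, 4))
```

With two alleles and allele frequencies estimated from the same sample, the test has one degree of freedom, matching the χ²(1) notation in the text.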
Expression of GalNAc-T4 506V -Expression of pAcGP67-GalNAc-T4 506V sol resulted in GalNAc-transferase activity in the culture medium of infected cells that was greater than background values obtained with uninfected controls or cells infected with the histo-blood group O2 gene (not shown). Activities measured with the mucin-derived substrates, Muc7, EA2, and Muc2 (see Table III for structures), were only 2-7-fold over background values. Very low activities with a few other substrates were also observed, but for many of the "mucin-like" substrates a relatively high background with Sf9 cells infected with irrelevant constructs made assessment difficult. GalNAc-T4 was expressed in High Five™ cells and purified to near homogeneity; however, very low endogenous activity was detected with some substrates in the same fractions when medium from cells infected with irrelevant expression constructs was used in the same purification strategy. The activities presented in Table III were obtained with purified GalNAc-T4 and -T2, and are expressed as milliunits/mg (the concentration of enzyme protein was estimated by SDS-PAGE using bovine serum albumin as standard). The highest activities with GalNAc-T4 were found (as expected from assays of Sf9 medium) with the mucin peptides Muc7, EA2, and Muc2; however, the efficiencies with these substrates were considerably lower than those of GalNAc-T2.

GalNAc-T4 Complements Other GalNAc-Transferases in Complete O-Glycosylation of the MUC1 Tandem Repeat-GalNAc-T4 showed poor activities with MUC1-derived peptides, Muc1a, Muc1b, and TAP24 (Table III). This is consistent with the substrate specificity of immunopurified murine GalNAc-T4, where the only efficient substrate identified was the EA2 sequence from rat submandibular glands (5). MUC1-derived substrates, Muc1a (19-mer) and Muc1b (15-mer) at 500 μM, yielded negligible activity with murine GalNAc-T4. The glycopeptide, GalNAc4TAP24 (Table III), was tested because it has all the acceptor sites for GalNAc-T1, -T2, and -T3 occupied (8), and therefore is not a substrate for these enzymes. Surprisingly, a low but significant activity with this substrate was observed with purified GalNAc-T4 (Tables III and IV). Detailed analysis of the activity revealed a strikingly low apparent Km of 90 μM, although Vmax was low (Table IV). Analysis of the reaction products by capillary zone electrophoresis, as described previously (8), indicated that a total of 2 mol of GalNAc were incorporated when the reaction was run to completion (not shown). Analysis of the HPLC-purified product of this reaction by mass spectrometry confirmed the presence of 6 mol of GalNAc (Fig. 3). Sites of glycosylation of both GalNAc4TAP24 and GalNAc6TAP24 were determined by a strategy that combined PFPA hydrolysis and mass spectrometry as described elsewhere (footnote 3). The two extra sites occupied in GalNAc6TAP24 were identified as Ser in GVTSA and Thr in PDTR (not shown).
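The mass-spectrometric readout of GalNAc incorporation rests on a fixed mass increment per added sugar. The sketch below shows that bookkeeping, assuming the standard monoisotopic residue mass of an N-acetylhexosamine (C8H13NO5, 203.0794 Da); the peptide masses in the example are placeholders, not values from Fig. 3, and the function names are illustrative.

```python
# Monoisotopic mass (Da) added per GalNAc (HexNAc) residue, C8H13NO5.
HEXNAC_RESIDUE_MONO = 203.0794


def galnac_mass_shift(n_added: int) -> float:
    """Expected monoisotopic mass increase for n additional GalNAc residues."""
    return n_added * HEXNAC_RESIDUE_MONO


def residues_added(mass_before: float, mass_after: float, tol: float = 0.5) -> int:
    """Infer how many GalNAc residues account for an observed mass difference.

    Raises ValueError if the difference is not within `tol` Da of an
    integer multiple of the HexNAc residue mass.
    """
    delta = mass_after - mass_before
    n = round(delta / HEXNAC_RESIDUE_MONO)
    if abs(delta - n * HEXNAC_RESIDUE_MONO) > tol:
        raise ValueError("mass difference is not a whole number of GalNAc residues")
    return n


# Going from four to six GalNAc residues (GalNAc4TAP24 -> GalNAc6TAP24):
print(round(galnac_mass_shift(2), 4))

# Placeholder masses (hypothetical, for illustration only).
print(residues_added(3000.0, 3000.0 + galnac_mass_shift(2)))
```

The tolerance parameter is a design choice for the sketch; an appropriate value depends on the mass accuracy of the instrument and on whether average or monoisotopic masses are compared.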
GalNAc-T4 O-Glycosylates Thr57 in PSGL-1-Two peptide designs of the NH2-terminal sequence of mature PSGL-1 were tested (Table III). In a recent study, Thr57 of PSGL-1 was identified as the carrier site of an O-glycan required for P-selectin binding (29). GalNAc-T1, -T2, and -T3 exhibited poor activities with both peptide designs, but HPLC analysis of prolonged reactions indicated that both Thr44 and Thr57 served as substrates, although the reaction did not go to completion for either site (not shown, Tables III and IV). GalNAc-T4, in contrast, showed low activity with the PSGL-1b peptide containing both Thr44 and Thr57, but not with PSGL-1a without the Thr57 site. The reaction went to completion and the apparent Km for this peptide was very low (Tables III and IV). Analysis of the exhaustively glycosylated PSGL-1b peptide by mass spectrometry confirmed that a single GalNAc was incorporated (not shown). A strong substrate inhibition with the negatively charged PSGL-1b peptide was observed at concentrations greater than 1 mM. The Km of purified GalNAc-T4 for UDP-GalNAc was 160 μM using the Muc7 acceptor substrate. No incorporation with UDP-Gal or UDP-GlcNAc was found using the same peptide.
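To give a feel for what an apparent Km means in practice, the following sketch evaluates the fraction of Vmax reached at a given substrate concentration under simple Michaelis-Menten kinetics. It deliberately ignores the substrate inhibition reported for PSGL-1b above 1 mM, and it uses only the Km values quoted in the text; Vmax values are in Table IV and are not assumed here. The function name is illustrative.

```python
def fractional_velocity(substrate_um: float, km_um: float) -> float:
    """Return v / Vmax for simple Michaelis-Menten kinetics.

    v / Vmax = [S] / (Km + [S]), with both concentrations in micromolar.
    """
    if substrate_um < 0:
        raise ValueError("substrate concentration must be non-negative")
    if km_um <= 0:
        raise ValueError("Km must be positive")
    return substrate_um / (km_um + substrate_um)


KM_UDP_GALNAC_UM = 160.0    # Km for UDP-GalNAc (Muc7 acceptor), from the text
KM_GALNAC4_TAP24_UM = 90.0  # apparent Km for GalNAc4TAP24, from the text

# Half-maximal velocity is reached when [S] equals Km.
print(fractional_velocity(KM_UDP_GALNAC_UM, KM_UDP_GALNAC_UM))

# Fraction of Vmax at an illustrative acceptor concentration of 500 uM.
print(round(fractional_velocity(500.0, KM_GALNAC4_TAP24_UM), 3))
```

A low Km means the enzyme approaches its maximal rate at low substrate concentrations, but the overall efficiency still depends on Vmax, which the text notes was low for GalNAc-T4.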
Establishment of a Stably Transfected CHO Line Secreting High Levels of GalNAc-T4 -GalNAc-T4 enzyme from insect cells was found to have poor kinetic properties compared with other GalNAc-transferases even with acceptor substrates that exhibited low Km values. A stable CHO transfectant was produced to obtain a stable source of GalNAc-T4, and to determine if another cell type produced GalNAc-T4 with better kinetic properties. A monoclonal antibody was developed to the enzyme to facilitate screening and selection of a high expressing variant, as the GalNAc-T4 enzyme assay is time consuming, costly, and insensitive. Mice were immunized with a purified GalNAc-T4 preparation that gave a single band of approximately 58,000 on an SDS-PAGE Coomassie-stained gel (not shown). UH6 reacted with Sf9 cells infected with pAcGP67-GalNAc-T4 506V sol, pAcGP67-GalNAc-T4 506I sol, and pVL-GalNAc-T4 506V full, but not with cells infected with constructs for soluble or full-length GalNAc-T1, -T2, -T3, or irrelevant genes (not shown). Furthermore, UH6 selectively immunoprecipitated GalNAc-T4 and not other GalNAc-transferases (not shown).
3 E. Mirgorodskaya and P. Roepstorff, manuscript in preparation.
Table footnotes: a Peptide concentration in a 50-μl assay as described under "Experimental Procedures." b One unit of enzyme is defined as the amount of enzyme that transfers 1 μmol of GalNAc in 1 min using the standard reaction mixture as described under "Experimental Procedures." c NT, not tested. d GalNAc4TAP24 represents the TAP24 peptide terminally glycosylated with GalNAc-T2, and GalNAc attachment sites are underlined (8). e ND, not detectable, indicates that no incorporation was observed with substrate even after prolonged incubation (24 h).
One clone, CHO/GalNAc-T4/21A1, was selected, and culture medium of confluent T-flasks contained up to 0.95 milliunits/ml activity measured with Muc7 in the standard assay. A stable CHO clone, CHO/GalNAc-T3/H3-6, secreting 5 milliunits/ml GalNAc-T3 (using Muc1a acceptor substrate) has also been established using the same immunoscreening procedure with an anti-GalNAc-T3 monoclonal antibody (not shown). GalNAc-T3 has a specific activity of 0.5 unit/mg with Muc1a, and GalNAc-T4 appears to have a specific activity of 0.053 unit/mg with Muc7 (Table III); thus the secretion level of CHO/GalNAc-T4/21A1 is comparable to or better than that of CHO/GalNAc-T3/H3-6. Since there is no detectable endogenously secreted GalNAc-transferase activity in the medium of wild-type CHO, this enzyme source is valuable for studies of substrates with low efficiency.
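The comparison of secretion levels follows from dividing the activity in the medium by the specific activity of each enzyme. The sketch below reproduces that arithmetic with the numbers quoted in the text; the function name is illustrative, and the result is only as good as the assumption that the specific activities measured for the purified enzymes also apply to the secreted CHO products.

```python
def secreted_enzyme_mg_per_ml(activity_mu_per_ml: float,
                              specific_activity_u_per_mg: float) -> float:
    """Estimate secreted enzyme protein (mg/ml) from medium activity.

    activity_mu_per_ml: activity in the culture medium, milliunits/ml.
    specific_activity_u_per_mg: specific activity of the enzyme, units/mg.
    """
    if specific_activity_u_per_mg <= 0:
        raise ValueError("specific activity must be positive")
    return (activity_mu_per_ml / 1000.0) / specific_activity_u_per_mg


# Values quoted in the text (note the different acceptor substrates).
t4_mg_per_ml = secreted_enzyme_mg_per_ml(0.95, 0.053)  # GalNAc-T4, Muc7
t3_mg_per_ml = secreted_enzyme_mg_per_ml(5.0, 0.5)     # GalNAc-T3, Muc1a

print(f"GalNAc-T4: {t4_mg_per_ml * 1000:.1f} ug/ml")
print(f"GalNAc-T3: {t3_mg_per_ml * 1000:.1f} ug/ml")
```

Because the two activities were measured with different acceptor peptides, the estimate supports the qualitative statement "comparable to or better than" rather than a precise ratio.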
Expression of GalNAc-T4 506I -Expression of pAcGP67-GalNAc-T4 506I sol did not result in detectable GalNAc-transferase activity in the culture medium of infected cells that was greater than background values obtained with uninfected controls or cells infected with irrelevant constructs (not shown). However, as outlined above, the activity detectable with pAcGP67-GalNAc-T4 506V sol directly in the culture medium of infected Sf9 cells was only a few fold over background. The expression of pAcGP67-GalNAc-T4 506I sol as well as pAcGP67-GalNAc-T4 506V sol and pVL-GalNAc-T4 506V full in Sf9 cells was monitored by immunocytological reactivity with UH6. Cells infected with pAcGP67-GalNAc-T4 506I sol were clearly positive, but the number of positive cells and the staining intensity were lower than for cells infected with pAcGP67-GalNAc-T4 506V sol and pVL-GalNAc-T4 506V full (not shown). The expression of GalNAc-T4 506V sol was equivalent (in terms of number of positive cells and intensity) to that previously reported for GalNAc-T1, -T2, and -T3 constructs using specific monoclonal antibodies to these enzymes (9). Expression of pAcGP67-GalNAc-T4 506I sol in High Five cells followed by purification (as described for pAcGP67-GalNAc-T4 506V sol) produced a catalytically active enzyme fraction with activities essentially similar to those of the GalNAc-T4 506V sol variant. Unfortunately, comparison of the specific activities was not possible with the available amount of pAcGP67-GalNAc-T4 506I sol (not shown).
Expression of GalNAc-T4 506V full-Expression of pVL-GalNAc-T4 506V full in Sf9 cells did not produce significant GalNAc-transferase activities in homogenates with the same substrates as found for the secreted construct (not shown); however, the assay is influenced by a very high endogenous GalNAc-transferase background. The expression level was evaluated by immunocytology with UH6, and cells infected with the full coding construct stained more strongly than those infected with the pAcGP67-GalNAc-T4 506V sol construct. When a Triton X-100 homogenate of infected Sf9 cells was incubated with GalNAc4TAP24 for 24 h with one addition of UDP-GalNAc and extra enzyme, no incorporation was detected by MALDI-TOF of HPLC-purified product.
Northern Blot Analysis of Human Organs-Northern blots with mRNA from 24 human adult and 5 human fetal organs were probed with GalNAc-T4 using blots similar to those previously used for probing GalNAc-T1, -T2, and -T3 (4). GalNAc-T4 hybridized to a single mRNA of approximately 6 kilobases with varying intensity in all organs (Fig. 4), suggesting a ubiquitous expression pattern similar to that of GalNAc-T1 and -T2. This pattern differs from the expression pattern found in Northern analysis of murine organs with the putative murine GalNAc-T4 orthologue, where a more restricted expression pattern was observed (5).
Immunolocalization of GalNAc-T4 with UH6 -Hagen et al. (5) showed by Northern analysis that the putative murine GalNAc-T4 orthologue was strongly expressed in sublingual and not submandibular salivary glands. Human mRNA from these organs was not available for this study. Immunohistology on frozen sections of human submandibular glands showed that GalNAc-T4 is strongly expressed in mucous cells, whereas staining of serous cells was very weak (Fig. 5). Thus, the failure of Northern analysis to detect expression in submandibular glands is likely to be the result of the low number of mucous cells in this gland combined with low expression in serous cells (5).
GalNAc-T4 contains a hydrophobic signal sequence which is predicted to provide retention in Golgi (30). The observed subcellular staining pattern with supranuclear reactivity suggestive of Golgi localization was identical to the staining pattern found with antibodies to GalNAc-T1, -T2, and -T3 (9). Immuno-EM localization of these three latter GalNAc-transferases confirmed Golgi localization (31,32).

DISCUSSION
In the present study, a fourth human GalNAc-transferase gene, designated GalNAc-T4, was cloned and expressed. GalNAc-T4 was shown to have polypeptide GalNAc-transferase activity with a unique acceptor substrate specificity distinct from other human GalNAc-transferases. We have previously identified unique substrates for the first three human GalNAc-transferases (4,8). GalNAc-T4 was exceptional in that it had no or only poor activities with acceptor substrates derived from several mucin tandem repeats, which represent some of the best substrates for other GalNAc-transferases. In contrast, GalNAc-T4 showed unique activities for new acceptor sites in the MUC1 tandem repeat and in the P-selectin ligand, PSGL-1. Thus, in vitro studies of the activities of recombinant GalNAc-transferases provide increasing evidence that each GalNAc-transferase serves a different role in O-glycosylation (8,33). The finding that each enzyme has unique acceptor sites may suggest that evolution of this large family of GalNAc-transferases was driven by the need to glycosylate a diverse set of protein sequences.
The molecular processes governing the specificity and kinetics of O-glycosylation remain obscure. It is clear that a single consensus peptide sequence does not exist for O-glycosylation (34-36). The extent to which the primary amino acid sequence determines O-glycosylation has long been debated, and the failure to identify consensus motifs has indicated that secondary structure and surface accessibility play a larger role than primary sequence context in specifying acceptor sites for O-glycosylation. This hypothesis was initially supported by an in vivo study of O-glycosylation of a single acceptor site, where extensive amino acid substitutions in flanking sequences had little or no influence on the observed O-glycosylation (37). The in vivo result was in striking contrast to the effects of similar substitutions in an in vitro study with the same acceptor site using purified bovine GalNAc-T1, in which GalNAc-T1 was particularly inhibited by flanking residues with negative charge (38). It is important to note, however, that while in vitro studies can assess the specificity of a single GalNAc-transferase, in vivo studies mostly evaluate the accumulated activities of the entire repertoire of GalNAc-transferases expressed in the cell type chosen for study. An exception to this is the strategy developed by Rottger et al. (32), where a single GalNAc-transferase is relocated to the endoplasmic reticulum. Nevertheless, the data indicate that the in vitro assessed specificity of GalNAc-transferases failed to reflect their in vivo activity. In agreement with this, Muller et al. (17) recently showed that five potential sites in the MUC1 tandem repeat peptide sequence can be glycosylated in vivo, while previous in vitro studies indicated that only a total of three sites in the repeat were utilized by various GalNAc-transferase preparations (39-41) as well as by recombinant GalNAc-T1, -T2, and -T3 (8).
An important point is that most observed discrepancies between in vitro and in vivo O-glycosylation patterns have been examples where in vitro glycosylation is less effective than in vivo glycosylation. Since the complete repertoire of GalNAc-transferases and their accumulated substrate specificities are still unknown, it is possible that all sites identified in vivo can be accommodated by these enzymes. Furthermore, it may not be possible to evaluate the O-glycosylation capacity of a given cell or organ by merely analyzing the total activities in extracts. Studies addressing these issues in detail indicate that the primary amino acid sequence of acceptor sites may be the major determining factor of O-glycosylation. Previously, a unique acceptor site for human GalNAc-T3 was identified in the HIV IIIB V3-loop sequence (4). In a recent study, Nehrke et al. (33) used an in vivo model to show that in vivo glycosylation of the HIV peptide sequence was dependent on the co-expression of GalNAc-T3 in the host cell. Furthermore, it was demonstrated that the unique specificity of GalNAc-T3 for this substrate was directed by a single charged residue at position +3 of the acceptor site. As shown here, the previous failure to achieve complete O-glycosylation of the MUC1 tandem repeat peptides in in vitro studies may be explained by the unique acceptor specificity of GalNAc-T4 and its kinetic properties. Both human and murine GalNAc-T4 showed poor activity with various peptide designs from the MUC1 tandem repeat (Table III) (5), whereas the partially glycosylated repeat peptide, GalNAc4TAP24, was one of the best substrates identified for the human homologue. The observed poor catalytic efficiency of GalNAc-T4 may explain why this enzyme activity cannot be measured in total extracts when the starting substrate is the naked peptide.
This is further supported by the finding that the full coding construct of GalNAc-T4, although expressed in Sf9 cells as monitored by immunocytology, failed to yield activities significantly above background values in extracts. Elucidating the mechanism for this requires further studies. It is likely that the failure to identify a single consensus acceptor sequence for O-glycosylation may be explained by the existence of a large number of GalNAc-transferases that each have different acceptor specificities.
GalNAc-T4 exhibited unique acceptor substrate specificities different from other studied GalNAc-transferases. Thus, the acceptor site identified in PSGL-1 that exhibited the lowest Km is flanked by negatively charged residues (-ETE-), which is a sequence context rarely found in glycoproteins (36). Since P-selectin binding to PSGL-1 is dependent on the presence of an O-glycan at this site (29), identification of the GalNAc-transferase responsible for O-glycosylation is important. PSGL-1 is O-glycosylated at Thr57 in HL-60 cells and in CHO cells (42), which may suggest that GalNAc-T4 is expressed in these cell lines. The repertoire of GalNAc-transferases in HL-60 cells includes GalNAc-T1, -T2, and -T3 (9), and immunocytology with UH6 also demonstrated weak expression of GalNAc-T4 (not shown). One of the unique acceptor sites identified in MUC1 (-PDTR-) is also flanked by charged residues. The finding that GalNAc-T4 showed strong preference for the partially glycosylated MUC1 repeat demonstrates that prior O-glycosylation may induce new acceptor sites or at least enhance the activity with these sites. Previously, a negative influence of prior glycosylation was shown with MUC2-derived glycopeptides using GalNAc-T1 (43). Presently, it is not clear which of the three sites in the MUC1 repeat glycosylated by GalNAc-T1, -T2, and -T3 are required to induce the GalNAc-T4 activity. The order in which GalNAc-T4 processes the last two sites is also not clear. Nevertheless, it is evident that complete O-glycosylation of the MUC1 tandem repeat requires several GalNAc-transferases, perhaps acting in a specific order. The reduced number of O-glycans of MUC1 in cancer (14) may be explained by specific changes in GalNAc-transferase repertoire and/or topology in cancer cells.
A putative mouse orthologue of GalNAc-T4, with 91% amino acid sequence similarity to human GalNAc-T4, was previously identified (5). The murine GalNAc-T4 was found to exhibit unique specificity for the EA2 peptide, whereas no glycosylation was observed with several other mucin-related acceptor sequences including MUC1-derived peptides. Human GalNAc-T4 also utilized the EA2 peptide, but the tandem repeat sequence of human MUC7 was a better substrate. The expression pattern of murine GalNAc-T4 was restricted, with high expression found only in salivary glands, colon, and gastric mucosa (5). In the organs tested for the murine enzyme, the expression pattern of human GalNAc-T4 was correlated with that of the mouse, although the same high level of expression in colon and stomach was not found (Fig. 4). A wider expression pattern was found in human since more organs were tested. The finding that organs of the immune and endocrine systems expressed GalNAc-T4 is potentially highly significant. Since the human and mouse GalNAc-T4 are similar in sequence, and their substrate specificities and expression patterns appear to be similar, it is likely that they represent orthologous genes. However, it is important to note that several of the GalNAc-transferases exist in paired copies with high sequence similarity and with identical genomic organization. A close homologue of GalNAc-T3, designated GalNAc-T5,⁴ has an identical genomic structure with 10 coding exons (11), and its kinetic properties are identical to those of GalNAc-T3. It may be difficult to identify orthologous GalNAc-transferase genes, and an example of this was the recent attempt to generate a knock-out of murine GalNAc-T1 based on the bovine sequence (44). Apparently, a close homologous gene, which is not yet fully cloned and characterized, was targeted, and the resultant lack of phenotype is therefore still unexplained (45).
Studies of the genomic organization of the GalNAc-transferase family have revealed features that suggest a close evolutionary link between GalNAc-T1, -T2, and -T3, such as conserved intron/exon boundaries (11). Characterization of the GalNAc-T4 gene showed an organization entirely different from other GalNAc-transferase genes. This may shed light on the origin and function of GalNAc-T4. The organization of the GalNAc-T4 gene leads us to propose two possibilities regarding its origin. The first is that GalNAc-T4 arose earlier in evolution than the human GalNAc-T1, -T2, and -T3 genes, if introns are considered late introductions into these genes (46). We recently characterized the genomic organization of human GalNAc-T1, -T2, and -T3 (11). These three genes are encoded in multiple exons (10-16). Several intron/exon boundaries within the coding regions of these genes are conserved, suggesting that the genes arose through gene duplication from a common ancestral gene containing these introns. The genomic organization of several C. elegans homologues has been characterized and they all contain multiple coding exons (6), and at least one of these genes contains an intron/exon boundary conserved with human GalNAc-T1, -T2, and -T3 (11). This strongly suggests that the GalNAc-transferase family arose early in evolution. A second possibility regarding the origin of GalNAc-T4 is that it arose by retrotransposition from one of the GalNAc-transferases. A retrotransposed pseudogene with high similarity to GalNAc-T1 has been identified; however, it is not expressed and the gene contains inserted sequences, multiple frameshift mutations, and repeat sequences characteristic of transposons (10).
Several findings are contrary to this hypothesis: it is unusual to find expressed and functional pseudogenes; common features of primate retroposons and pseudogenes, such as direct repeats and A-rich boxes, were not found in the 5′ and 3′ sequences flanking the coding region (47); and GalNAc-T4 does not exhibit substantial identity to any of the identified and characterized GalNAc-transferases.
The GalNAc-transferase genes are not clustered at one locus. All genes reported thus far are located on different chromosomes (1,11), as are two additional putative GalNAc-transferase genes that we are currently investigating, including a gene that shows a high degree of similarity to GalNAc-T3.⁴ The significance of this is presently unknown, but assignment of chromosomal localization and identification of marker microsatellites will allow correlation with diseases linked to particular chromosomal sites. Thus, GalNAc-T3, which is selectively and strongly expressed in pancreas, co-localizes with a recently identified diabetes susceptibility gene on 2q24-31 (11).
A single missense polymorphism, G1516A, with high incidence was identified in GalNAc-T4. The GalNAc-T4 506I variant expressed less well than GalNAc-T4 506V in insect cells, and activity was only demonstrable after partial purification. Preliminary analysis of the kinetic properties of purified GalNAc-T4 506I revealed no differences, suggesting that the mutation has no functional effect. Mutations have been found in several glycosyltransferase genes. The majority are associated with the ABO, H/SE, and Lewis histo-blood groups, where structural defects in glycosyltransferases result in polymorphism in carbohydrate antigen expression (48,49). Disease-causing mutations in glycosyltransferase genes have been identified in two cases involving central steps in N-linked glycosylation and the biosynthesis of the glycophosphatidylinositol anchor (50,51).

In summary, the identified GalNAc-T4 gene may represent an ancestral gene of the GalNAc-transferase family. The acceptor substrate specificity of GalNAc-T4 was unique, and suggests that GalNAc-T4 plays important roles in glycosylation of PSGL-1 and MUC1.
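The relation between the nucleotide change G1516A and the residue number 506 in the polymorphism described above follows from simple codon arithmetic. The sketch below is illustrative only and assumes that nucleotide numbering starts at the first base of the initiation codon; the helper name `codon_of` is hypothetical, and the codon sets are those of the standard genetic code.

```python
def codon_of(nt_pos):
    """Map a 1-based coding-sequence nucleotide position to
    (codon number, position within the codon), both 1-based."""
    index = nt_pos - 1
    return index // 3 + 1, index % 3 + 1


# Assumption: position 1 is the A of the ATG initiation codon.
codon, offset = codon_of(1516)
print(codon, offset)

# Standard genetic code: Val is encoded by GTN, Ile by ATT, ATC, or ATA.
VAL_CODONS = {"GTT", "GTC", "GTA", "GTG"}
ILE_CODONS = {"ATT", "ATC", "ATA"}

# Val codons that become an Ile codon when the first base changes G -> A.
compatible = sorted(c for c in VAL_CODONS if "A" + c[1:] in ILE_CODONS)
print(compatible)
```

Under this numbering assumption, nucleotide 1516 is the first base of codon 506, and a G to A change at that position converts a Val codon into an Ile codon unless the original codon is GTG, which would give ATG (Met) instead.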