Originally published In Press as doi:10.1074/jbc.M202684200 on March 29, 2002

J. Biol. Chem., Vol. 277, Issue 25, 22623-22638, June 21, 2002

Functional Conservation of Subfamilies of Putative UDP-N-acetylgalactosamine:Polypeptide N-Acetylgalactosaminyltransferases in Drosophila, Caenorhabditis elegans, and Mammals

ONE SUBFAMILY COMPOSED OF l(2)35Aa IS ESSENTIAL IN DROSOPHILA*

Tilo Schwientek (a, b, c), Eric P. Bennett (a, c, d), Carlos Flores (e), John Thacker (f), Martin Hollmann (g), Celso A. Reis (h), Jane Behrens (a), Ulla Mandel (a), Birgit Keck (a), Mireille A. Schäfer (g), Kim Haselmann (i), Roman Zubarev (i), Peter Roepstorff (i), Joy M. Burchell (j), Joyce Taylor-Papadimitriou (j), Michael A. Hollingsworth (k), and Henrik Clausen (a, l)

From the (a) School of Dentistry, University of Copenhagen, Nørre Alle 20, 2200 Copenhagen N, Denmark, (e) University of Wisconsin, Laboratory of Genetics, Madison, Wisconsin 53706, (f) Medical Research Council, Radiation and Genome Stability Unit, Oxfordshire OX11 0RD, United Kingdom, (g) Zoologisches Institut-Entwicklungsbiologie, Abt. Molekulare Entwicklungsgenetik, Georg-August-Universität Göttingen, Humboldtallee 34A, 37073 Göttingen, Germany, (h) Institute of Molecular Pathology and Immunology of the University of Porto, IPATIMUP, 4200 Porto, Portugal, the (i) Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense University, Odense 5230, Denmark, (j) Imperial Cancer Research Fund, Breast Cancer Biology Group, 3rd Floor, Thomas Guy House, Guy's Hospital, London SE1 9RT, United Kingdom, and the (k) Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, Nebraska 68198

Received for publication, March 20, 2002, and in revised form, March 28, 2002

    ABSTRACT

The completed fruit fly genome was found to contain up to 15 putative UDP-N-acetyl-α-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase (GalNAc-transferase) genes. Phylogenetic analysis of the putative catalytic domains of the large GalNAc-transferase enzyme families of Drosophila melanogaster (13 available), Caenorhabditis elegans (9 genes), and mammals (12 genes) indicated that distinct subfamilies of orthologous genes are conserved in each species. In support of this hypothesis, we provide evidence that distinctive functional properties of Drosophila and human GalNAc-transferase isoforms were exhibited by evolutionarily conserved members of two subfamilies (dGalNAc-T1 (l(2)35Aa) and GalNAc-T11; dGalNAc-T2 (CG6394) and GalNAc-T7). dGalNAc-T1 and novel human GalNAc-T11 were shown to encode functional GalNAc-transferases with the same polypeptide acceptor substrate specificity, and dGalNAc-T2 was shown to encode a GalNAc-transferase with a GalNAc glycopeptide substrate specificity similar to that of GalNAc-T7. Previous data suggested that the putative GalNAc-transferase encoded by l(2)35Aa had a lethal phenotype (Flores, C., and Engels, W. (1999) Proc. Natl. Acad. Sci. U. S. A. 96, 2964-2969), and this was substantiated by sequencing of three lethal alleles l(2)35AaHG8, l(2)35AaSF12, and l(2)35AaSF32. The finding that subfamilies of GalNAc-transferases with distinct catalytic functions are evolutionarily conserved stresses that GalNAc-transferase isoforms may serve unique biological functions rather than providing functional redundancy, and this is further supported by the lethal phenotype of l(2)35Aa.

    INTRODUCTION

The family of UDP-N-acetyl-α-D-galactosamine:polypeptide N-acetylgalactosaminyltransferases (GalNAc-transferases¹; EC 2.4.1.41), which transfer GalNAc to serine and threonine acceptor sites and initiate mucin-type O-glycosylation, is old in evolutionary terms. Hagen and Nehrke (1) originally identified and presented a preliminary characterization of the GalNAc-transferase gene family in Caenorhabditis elegans, consisting of seven distinct genes and additional isoforms derived from alternative splicing. After completion of the C. elegans genome, two additional homologous genes were identified, which comprised a gene family of nine distinct putative polypeptide GalNAc-transferases (2). So far, three distinct members of the C. elegans gene family have been shown to encode functional enzymes using one peptide substrate (1). Preliminary phylogenetic analysis of the available mammalian GalNAc-transferases (8 genes) and C. elegans homologs (9 genes) has suggested evolutionary conservation of individual subfamilies of putative orthologous genes (2). However, a comparative analysis of the kinetic properties of C. elegans and mammalian enzymes has not been presented.

The homologous mammalian GalNAc-transferase family is predicted to be composed of up to 16-18 distinct genes (2). To date, functions of eight distinct members of the human or rodent GalNAc-transferase family have been characterized (3-12).² It is predicted that most of the mammalian GalNAc-transferase isoforms have different functions. Analysis of mammalian polypeptide GalNAc-transferases has demonstrated that different isoforms have different kinetic properties, and unique substrate specificities have been identified for many isoenzymes (2, 13). Some isoforms can be classified into subfamilies of closely homologous GalNAc-transferases, and as shown for the human subfamily composed of GalNAc-T3 and -T6, such isoforms share unique kinetic properties (10). Our understanding of the enzymatic functions of individual GalNAc-transferase isoforms is mainly derived from in vitro studies. The limited number of in vivo studies available have confirmed the in vitro results (14-16). In addition to different enzymatic properties, each isoform has different cell and tissue expression patterns. Despite these findings, it is still not possible to define the in vivo catalytic functions of individual GalNAc-transferase isoforms with respect to characteristic acceptor sequence motifs, range of acceptor proteins, and in vivo biological function. Thus, the specific in vivo functions of individual GalNAc-transferase isoforms remain to be defined. It is presently unclear why such a large enzyme family evolved early and to what degree the diversity and number of genes reflect redundancy rather than functional requirements.

The existence of large and similarly numerous GalNAc-transferase gene families in nematodes, insects, and mammals suggests that there is a profound functional requirement for multiple enzymatic isoforms and implies that important biological functions are associated with individual isoforms. This hypothesis is supported by studies of mannosyl-O-glycosylation in yeast, where multiple polypeptide O-mannosyltransferases were found to be essential for growth and survival of yeast cells (17). The first gene ablation of a putative GalNAc-transferase in mice did not lead to significant phenotypic changes, although it is still unproven that a functional GalNAc-transferase gene was targeted (18). A recent preliminary report on targeted inactivation of the murine GalNAc-T1 gene found evidence of minor changes in O-glycosylation (19). One explanation for the finding that there are only minor phenotypes associated with inactivation of single polypeptide GalNAc-transferases may be deduced from phylogenetic analyses of the GalNAc-transferase gene family, which indicates the existence of subfamilies of closely related genes that are hypothesized to provide functional redundancy (2, 10). It remains to be determined whether individual GalNAc-transferase isoforms have unique functions in vivo that are conserved through evolution, as suggested by sequence analysis indicating conservation of subfamilies.

Recent genetic studies of the fruit fly Drosophila melanogaster have extended our understanding of glycan synthesis and functions in developmental biology. The signaling molecule Fringe is a glycosyltransferase that determines dorso-ventral cell interactions during wing formation through modulation of Notch receptor activity (20-22). A developmental disorder resulting in rotated abdomen of adult flies is assigned to a defect in the putative polypeptide O-mannosyltransferase encoded by the rt gene (23, 24). Flores and Engels (25) have reported that a gene homologous to mammalian polypeptide GalNAc-transferases, the l(2)35Aa gene, is likely to be essential for development of D. melanogaster. Transgenic flies expressing an ectopic 5-kb fragment containing the l(2)35Aa gene were found to be viable and fertile (25). The 5-kb rescue fragment contains essentially only one predicted intact open reading frame, and this encodes a protein with homology to GalNAc-transferases. In this report, sequence analysis of three lethal l(2)35Aa alleles showed alterations in the putative GalNAc-transferase coding region of l(2)35Aa. Combined with data showing that the l(2)35Aa gene in fact encodes a functional GalNAc-transferase, this represents the first example of a polypeptide GalNAc-transferase gene conclusively shown to play an essential role in normal development of an organism.

Why is a single GalNAc-transferase isoform essential to development in an organism having 13-15 putative GalNAc-transferase genes? To address this question, we have classified the phylogenetic traits of GalNAc-transferase families in the nematode, fly, and mammals. A survey of the completed genome of Drosophila led to identification of 13-15 putative GalNAc-transferase genes. Phylogenetic analysis of the putative catalytic domains of the homologous mammalian, C. elegans, and Drosophila genes indicated a common horizontal pattern of evolution of GalNAc-transferases with a variable degree of species-specific gene duplications and divergence. The prediction of evolutionary conservation of subfamilies of GalNAc-transferase genes was confirmed by functional analysis of members of two distinct subfamilies found in Drosophila and humans. In both cases, the subfamilies were shown to encode enzymes with unique and identical functions. These data provide strong evidence for an evolutionary conservation of distinct kinetic properties and substrate specificities within GalNAc-transferase subfamilies. The dGalNAc-T1 fly gene l(2)35Aa is the first GalNAc-transferase gene shown to have a required function in vivo. The orthologous human GalNAc-T11 is predicted to have important in vivo functions. Human GalNAc-T11 displayed a unique expression pattern that was largely limited to kidney.

    EXPERIMENTAL PROCEDURES

Genome Survey of the Drosophila Polypeptide GalNAc-transferase Family-- tBLASTn analysis with human GalNAc-transferases was used to search the entire D. melanogaster genome released by the Berkeley Drosophila Genome Project (26). The genome annotation data base GadFly,³ based on the published D. melanogaster genome sequence, was used to identify genes annotated to the sequences, and cDNA sequences of identified genes were obtained from GenBank™ and the Drosophila Gene Collection (DGC Release 1 and 2 (27)). cDNA sequence information of putative GalNAc-transferase genes not represented in DGC and GenBank™ was obtained by identification of EST clots and single ESTs in the Berkeley Drosophila Genome Project and NCBI EST data bases. cDNA clones of ESTs with the longest inserts were obtained from Research Genetics Inc. and sequenced.

Sequence Alignment and Phylogenetic Analysis-- Amino acid sequences representing the central evolutionarily conserved domain of polypeptide:N-acetylgalactosaminyltransferases from C. elegans (1) and Homo sapiens (28) were aligned with the orthologous D. melanogaster proteins using ClustalX 1.8 with Gonnet 250 protein weight matrix and default gap penalties (29). Multiple sequence alignments were revised for maximal residue conservation and minimal gaps in conserved amino acid sequence motifs. Distance analyses of the amino acid alignments were performed using the PHYLIP software package with PAM250 substitution matrix of PROTDIST, and phylograms were generated using neighbor joining and least square algorithms with 1000 bootstrap replicates (51).
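The distance-plus-bootstrap logic described above can be illustrated with a minimal sketch. It is not the PHYLIP workflow used in this study: a simple p-distance (fraction of differing residues) stands in for the PAM250-based PROTDIST distances, the three short sequences are invented for illustration, and the function names are hypothetical. The sketch only shows how alignment columns are resampled with replacement to produce the replicate distance matrices from which bootstrap support is later derived.

```python
import random


def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence is gapped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances for a dict of equal-length aligned sequences."""
    names = sorted(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m in names for n in names}


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


# Invented toy alignment; real input would be the catalytic-domain alignment.
toy = {"fly": "MKTLLDHC-W", "human": "MKSLLDHCAW", "worm": "MRTILDHCAW"}

rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [distance_matrix(bootstrap_alignment(toy, rng)) for _ in range(1000)]
```

Each replicate matrix would then be passed to a tree-building step (neighbor joining or least squares), and the fraction of replicate trees containing a given clade gives its bootstrap value.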

Cloning and Characterization of l(2)35Aa (dGalNAc-T1) and CG6394 (dGalNAc-T2) cDNA-- The l(2)35Aa gene was originally identified and sequenced in a detailed analysis of the genomic Adh region of D. melanogaster chromosome 2 (30). EST clones LD24449 and SD08641 representing l(2)35Aa cDNA (5'-EST GenBank™ accession numbers AA941044 and AI542350) were obtained from Research Genetics. An open reading frame of 1896 base pairs as well as the sequence of the 5'- and 3'-untranslated regions of l(2)35Aa mRNA was represented in both clones and fully sequenced. The coding region of l(2)35Aa was previously found to be organized in one exon (30). Confirmatory sequencing was performed on a genomic clone obtained by PCR (35 cycles at 95 °C for 10 s, 55 °C for 15 s, and 68 °C for 2 min 30 s) on D. melanogaster (Canton S) genomic DNA (CLONTECH) with the sense primer TSHC1 (5'-AGCGGATCCATGATGCAAATCAAGC-3') and the antisense primer TSHC3 (5'-AGCGGATCCTCGCTCACAGCGCCTTGTCCG-3'). The gene CG6394 was identified and sequenced in the D. melanogaster genome sequence project. Embryonal D. melanogaster cDNA was generated using the SMART RACE cDNA Amplification Kit with embryonal poly(A)+ RNA (CLONTECH) according to the manufacturer's instructions. cDNA for dGalNAc-T2 was obtained by PCR (25 cycles at 95 °C for 10 s, 54 °C for 15 s, and 72 °C for 2 min 30 s) with primer pair TSHC254 (5'-CTGGCCACCTAGACACGTC-3') and TSHC263 (5'-GATCCATTCCTTGATCCTC-3'). An open reading frame of 1773 base pairs as well as partial sequences of the 5'- and 3'-untranslated regions of CG6394 mRNA were cloned into pCR4-TOPO (Invitrogen) and fully sequenced.

Cloning and Characterization of Human and Murine GALNT11-- The predicted mammalian orthologs of dGalNAc-T2, human and rodent GalNAc-T7, were reported previously (9, 11). A novel human gene designated GALNT11, predicted to represent an ortholog of dGalNAc-T1, was originally identified in the course of a positional cloning approach to isolate the DNA repair gene XRCC2 (31). A cDNA clone (cDNA25L) containing a partial open reading frame with sequence similarity to human polypeptide GalNAc-transferases was found. Based on the sequence of cDNA25L, primers EBHC600 (5'-AGCGGATCCAACACACAACTCTTTGCGGCTG-3') and EBHC602 (5'-AGCGGATCCGACAAAAATGGGTTGTTGGGG-3') were designed and used in a PCR strategy to screen a gastric λgt11 cDNA library (MKN45) using the rapid cDNA library screening procedure previously described (5). In brief, using a combination of λ-vector primers and EBHC600 or EBHC602, 20 aliquots derived from a MKN45 cDNA library were screened by PCR. Three sublibraries (numbers 19, 7, and 13) were found to contain clones with 2-, 1.5-, and 0.8-kbp inserts, respectively. PCR products were blunt end-cloned and sequenced. The 3' sequence contained an in-frame stop codon, and the 5' sequences contained a potential open reading frame, including both a hydrophobic transmembrane domain and an in-frame initiating methionine codon that was downstream of a Kozak sequence. Using the human sequence as a query for BLASTn analysis of the dbEST data base, one single murine EST clone (GenBank™ accession AA104499), predicted to represent an orthologous isoform, was identified, and the full coding region was sequenced. The murine open reading frame predicted a protein sequence with 87% identity to the human protein, which is in agreement with other rodent-human orthologs. The genomic organization of human GALNT11 was determined from a P1 clone DMPC-HFF#1-1321(E8) (P114808) isolated from a human foreskin genomic library (DuPont Merck Pharmaceutical Co. Human Foreskin Fibroblast P1 Library) using the primer pair EBHC616 (5'-GCGAATTCAGTGAAGTGACTCAGCCAC-3')/EBHC619 (5'-CACGACTGTCTATCACATCGTC-3') for screening. The entire open reading frame, as compiled from the cDNA sequence, was sequenced. During the course of this study, a human genomic PAC clone, RP5-98107 (GenBank™ accession number AC006017), was sequenced, and an open reading frame for a homologous putative GalNAc-transferase was identified (GenBank™ accession number AAD45821). This sequence is identical to the sequence reported here as well as to a recently deposited cDNA sequence (GenBank™ accession number AK025287) derived from colon as part of a cDNA sequencing project.

Expression of dGalNAc-T1, dGalNAc-T2, and GalNAc-T11 in Insect Cells-- An expression construct encoding amino acid residues 40-632 of dGalNAc-T1 was prepared by PCR using D. melanogaster genomic DNA and the primer pair TSHC2 (5'-AGCGGATCCTCGCTCACAGCGCCTTGTCCG-3') and TSHC3 with BamHI restriction sites. A dGalNAc-T2 expression construct encoding amino acid residues 39-591 was prepared by PCR using pCR4-CG6394 and the primer pair TSHC264 (5'-CAGCGGATCCAGGAGGCGTATCACACG-3') and TSHC265 (5'-GAGCGAATTCTTACCAGCGTGGGCGTATC-3') with BamHI and EcoRI restriction sites, respectively. Expression constructs of the secreted forms (amino acids 32-608) of the putative human and murine GALNT11 genes were prepared using the following primer set for the human construct: EBHC629 (5'-GCGAATTCGTGAAGTGACTCAGCCACTTAAG-3') and EBHC631 (5'-GCGGAATTCCACCTTAACCTTCCAAATGC-3'); and the following for the murine construct: EBHC711 (5'-GCGGAATTCGTGAAGTGACTCAGCCCCTTAGG-3') and EBHC710 (5'-GCGGAATTCAATTATGTTGTGTCCAGCCAGGG-3'). MKN45 mRNA was used as template for human RT-PCR, whereas cDNA from the identified EST clone (GenBank™ accession number AA104499) was used as template for the murine PCR. The products were cloned into the vector pAcGP67 (PharMingen) and fully sequenced.
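As a simple consistency check on the cloning primers listed above, the presence of the stated restriction sites can be confirmed by a substring search. The sketch below assumes only the standard recognition sequences of BamHI (GGATCC) and EcoRI (GAATTC); both are palindromic, so a search on the primer strand as written is sufficient. The helper name `sites_in` is hypothetical.

```python
# Standard recognition sequences (both palindromic).
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}

# Primer sequences as given in the text, written 5' to 3'.
PRIMERS = {
    "TSHC264": "CAGCGGATCCAGGAGGCGTATCACACG",
    "TSHC265": "GAGCGAATTCTTACCAGCGTGGGCGTATC",
    "EBHC629": "GCGAATTCGTGAAGTGACTCAGCCACTTAAG",
    "EBHC631": "GCGGAATTCCACCTTAACCTTCCAAATGC",
}


def sites_in(primer):
    """Return the names of the restriction sites found in a primer sequence."""
    seq = primer.upper()
    return sorted(name for name, site in SITES.items() if site in seq)


for name, seq in PRIMERS.items():
    print(name, sites_in(seq))
```

For the dGalNAc-T2 pair this reports a BamHI site in TSHC264 and an EcoRI site in TSHC265, in agreement with the description in the text.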

Plasmids pAcGP67-dGalNAc-T1-sol, pAcGP67-dGalNAc-T2-sol, and pAcGP67-GalNAc-T11-sol were cotransfected with Baculo-Gold™ DNA (PharMingen), and recombinant baculovirus was obtained after two successive amplifications in Sf9 cells, as described previously (5).

Polypeptide GalNAc-transferase Assays-- Assays were performed with media harvested 3 days after infection of Sf9 cells or, when indicated, with enzymes purified from infected High Five™ cells grown in serum-free media (Invitrogen). Purification was performed essentially as described previously (32), with consecutive ion exchange chromatographies as detailed below. Enzyme protein concentration was quantified by comparison on Coomassie-stained SDS-PAGE gels using bovine serum albumin and transferrin as standards.

Enzyme activities were analyzed by two methods: (i) initial velocity assays where consumption of substrates was limited to less than 10% and (ii) a product development assay where the reactions were exhaustive/terminal using two additions of enzyme and sugar nucleotides in order to evaluate the final number of acceptor sites utilized by the enzymes. Standard reaction mixture (50-µl final volume) contained 25 mM cacodylate (pH 7.4), 10 mM MnCl₂, 0.25% Triton X-100, 100-200 µM UDP-[¹⁴C]GalNAc (2000 cpm/nmol) (Amersham Biosciences), and 0.006-800 µM acceptor peptides. Products were quantified by scintillation counting after chromatography on Dowex 1X8. In assays designed to determine acceptor sites in peptides used by the various isoforms, the reaction mixtures were modified to include 1.7 mM cold UDP-GalNAc (Sigma). Reactions were incubated for 12 h, followed by the addition of 50% more enzyme and UDP-GalNAc and a 4-h incubation. The glycosylation as determined by moles of GalNAc incorporated into peptide was monitored at 2-4-h intervals by MALDI-TOF mass spectrometry as previously described (33). Peptides were synthesized by ourselves and by Neosystems (Strasbourg), and structures were ascertained by amino acid analysis and mass spectrometry. GalNAc glycopeptides were prepared as previously described using human GalNAc-T1 and -T2 (7, 11). The kinetic properties were determined with partially purified enzymes (secreted forms) expressed in High Five™ cells. Partial purification was performed by consecutive chromatography on Amberlite IRA-95, DEAE-Sephacryl, and SP-Sepharose essentially as described (32). The purification scheme does not involve affinity chromatography, and a low level endogenous activity secreted from insect cells can be found, depending on substrates.
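The arithmetic behind the initial velocity criterion can be sketched as follows. The specific activity (2000 cpm/nmol), reaction volume (50 µl), and donor concentration (100 µM) are taken from the standard reaction mixture above; the counts, the background value, and the function names are hypothetical and serve only to illustrate the conversion from scintillation counts to nmol of GalNAc transferred and the check that donor consumption stays below 10%.

```python
def nmol_transferred(cpm, background_cpm=0.0, specific_activity=2000.0):
    """Convert product counts (cpm) to nmol GalNAc using cpm per nmol of donor."""
    return max(cpm - background_cpm, 0.0) / specific_activity


def fraction_donor_consumed(nmol_product, donor_uM=100.0, volume_ul=50.0):
    """Fraction of UDP-GalNAc consumed in a reaction of the given volume."""
    donor_nmol = donor_uM * volume_ul / 1000.0  # µM x µl = pmol; /1000 gives nmol
    return nmol_product / donor_nmol


# Hypothetical reading: 700 cpm in the product fraction, 100 cpm background.
product = nmol_transferred(700.0, background_cpm=100.0)
consumed = fraction_donor_consumed(product)
print(f"{product:.2f} nmol transferred, {consumed:.0%} of donor consumed")
print("within initial velocity range" if consumed < 0.10 else "donor consumption too high")
```

With these illustrative numbers, 0.30 nmol of GalNAc is transferred from a 5-nmol donor pool, i.e. 6% consumption, which would satisfy the less-than-10% criterion.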

Determination of Acceptor Sites Utilized by GalNAc-transferase Isoforms-- MALDI mass spectrometry of glycosylated peptides was performed on a Voyager-Elite MALDI time-of-flight mass spectrometer (PerSeptive Biosystems Inc., Framingham, MA), equipped with delayed extraction. The MALDI matrix was 2,5-dihydroxybenzoic acid 10 g/liter (Aldrich) dissolved in a 2:1 mixture of 0.1% trifluoroacetic acid in 30% aqueous acetonitrile (Rathburn). Samples dissolved in 0.1% trifluoroacetic acid to a concentration of ~2 pmol/ml were prepared for analysis by placing 1 µl of sample solution on a probe tip followed by 1 µl of matrix. All mass spectra were obtained in the linear mode. Data processing was carried out using GRAM/386 software. Acceptor sites utilized were determined in MUC1-, MUC4-, and EA2-derived peptides as follows: Glycopeptide products of exhaustive reactions were purified by high pressure liquid chromatography and dissolved in a water/methanol/acetic acid mixture (49:49:2, v/v/v) to a concentration of ~10⁻⁵ M. An aliquot of 4 µl was loaded into a precoated nanoelectrospray needle (MDS Protana, Odense, Denmark). A 4.7-Tesla Ultima (Ionspec, Irvine, CA) Fourier transform ion cyclotron resonance mass spectrometer was used to perform electron capture dissociation. External accumulation in the hexapole of the ESI source (Analytica, Branford, MA) was followed by gated trapping. Selection of desired charge states (MS/MS) was made by applying a preprogrammed waveform (34). For the glycosylated 60-mer peptide, three charge states (6+, 7+, and 8+) were selected. For the glycosylated TAP25 peptide, the most intense charge state (between 3+ and 5+) was selected. Selected cations were irradiated by <0.2-eV electrons from a heated tungsten electron ionization filament for 9 s. Between 50 and 200 scans were accumulated.
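The way GalNAc incorporation is read from a mass spectrum can be illustrated with a short calculation. Each added GalNAc residue (C8H13NO5) increases the average mass of the peptide by about 203.19 Da, and an ion carrying z protons appears at (M + z x 1.00728)/z. The peptide masses, the tolerance, and the function names below are hypothetical; the sketch is not the data processing used in this study.

```python
HEXNAC_AVG = 203.19   # average residue mass of GalNAc (C8H13NO5), in Da
PROTON = 1.00728      # mass of a proton, in Da


def galnac_count(observed_mass, peptide_mass, tol=1.0):
    """Number of GalNAc residues implied by the mass shift (neutral average masses)."""
    shift = observed_mass - peptide_mass
    n = round(shift / HEXNAC_AVG)
    if n < 0 or abs(shift - n * HEXNAC_AVG) > tol:
        raise ValueError("mass shift is not a whole number of GalNAc residues")
    return n


def mz(neutral_mass, charge):
    """m/z of a positive ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


# Hypothetical peptide of 2000.0 Da observed at 2609.6 Da after glycosylation.
print(galnac_count(2609.6, 2000.0))

# Expected m/z values of a hypothetical 6000.0-Da glycopeptide at 6+, 7+, and 8+.
for z in (6, 7, 8):
    print(z, round(mz(6000.0, z), 3))
```

In the hypothetical example the shift of 609.6 Da corresponds to three GalNAc residues; the same bookkeeping applied to fragment ions is what allows individual acceptor sites to be assigned.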

Sequence Analysis of the l(2)35Aa Alleles-- The DNA sequence of l(2)35Aa was determined for three mutant alleles and one wild-type allele. Drosophila stocks with the EMS-induced mutant alleles l(2)35Aa1 (AKA l(2)35AaHG8), l(2)35Aa3 (AKA l(2)35AaSF12), and l(2)35Aa4 (AKA l(2)35AaSF32) (35) were crossed to a stock carrying Df(2L)b84h1, a deletion that removes all of GalNAc-T1 and the surrounding genes. Since this gene is essential for adult survival, we used a translocation of the Tubby gene to follow the genotypes at the larval stage. Progeny that carried the mutant allele over the deletion were identified among the progeny during the third larval instar and selected for sequencing. In the same way, a wild-type allele of dGalNAc-T1 from another stock, b1 Adhn4 l(2)35BfSF18, was isolated for sequencing. This stock was chosen because there was a possibility that it still carried the progenitor allele from which mutants l(2)35Aa3 were derived. The dGalNAc-T1 gene was amplified by PCR from DNA extracts from each of these larvae and sequenced using BigDyeTM fluorescent dideoxyterminator reactions according to the manufacturer's protocol (ABI, Applied Biosystems). The primers used for PCR and sequencing are GalN1 (5'-ATTGGTTTTTGCTTGGGCATC-3'), GalN2 (5'-GTGATGCTAACGTTGGGTTGG-3'), GalN3 (5'-ATACCATGGCATTCGCACAAA-3'), GalN4 (5'-CGAATGCCATGGTATGCAAAT-3'), GalN5 (5'-GCAGGCTGTGCGAGTAGAAAA-3'), GalN6 (5'-CAGGATGGAGGATCGACGAAG-3'), GalN7a (5'-CACCACACCCAACAGTTCCAG-3'), GalN8 (5'-CAGGCCTCCATAGTCATGTGC-3'), GalN9 (5'-GCGTTCCAGCACGGTCTTAAT-3'), GalN10 (5'-GTCAATCAGCAGTGGCTGGAG-3'), GalN11 (5'-AGTGGACTGGGCGTGTACTCA-3'), GalN12 (5'-TCCCGTGTCGGTCACATATTC-3'), GalN13 (5'-TCGTTGCGGAATATGTGACC-3'), GalN14 (5'-CATCTTTCAGCCTTGGCACTC-3'), GalN15 (5'-TCCAGAAACCCTTCACCTTGG-3'), GalN16 (5'-AACGCCAACAGTCCCGTCTAC-3'), GalN17 (5'-CGTTGTTGGACTTGGAGCACAGAT-3'), GalN19 (5'-ACCCACCATTGGCCATTAATCA-3'), and GalN21 (5'-CCACCAGATAGTCGGTATTGAAA-3').
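The comparison of mutant and wild-type coding sequences amounts to a codon-by-codon scan for changes and their predicted effect on the protein. The sketch below uses the standard genetic code and two invented nine-nucleotide sequences; it does not reproduce any l(2)35Aa sequence, and the function name `coding_changes` is hypothetical. Only substitutions are handled, which matches the typical outcome of EMS mutagenesis but would miss insertions or deletions.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code built from the conventional TCAG ordering.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def coding_changes(wild_type, mutant):
    """List (codon number, wt codon, wt residue, mutant codon, mutant residue)."""
    wild_type, mutant = wild_type.upper(), mutant.upper()
    if len(wild_type) != len(mutant) or len(wild_type) % 3:
        raise ValueError("sequences must be in-frame and of equal length")
    changes = []
    for start in range(0, len(wild_type), 3):
        wt_codon = wild_type[start:start + 3]
        mut_codon = mutant[start:start + 3]
        if wt_codon != mut_codon:
            changes.append((start // 3 + 1, wt_codon, CODON_TABLE[wt_codon],
                            mut_codon, CODON_TABLE[mut_codon]))
    return changes


# Invented example: a G-to-A transition converts a Trp codon into a stop codon.
WT = "ATGTGGCGA"
MUT = "ATGTGACGA"
print(coding_changes(WT, MUT))
```

A change reported as `*` in the mutant residue column indicates a premature stop codon, whereas two different amino acid letters indicate a missense substitution.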

Expression Pattern of l(2)35Aa-- Northern analysis was performed with poly(A)+ mRNA isolated directly from different hand-dissected imago tissues using oligo(dT)-linked Dynabeads (Dynal). Five µg of mRNA were separated by electrophoresis in formaldehyde gels, transferred to a nylon membrane (Hybond-N; Amersham Biosciences), and cross-linked by ultraviolet light. The filter was hybridized with α-³²P-labeled in vitro transcripts of l(2)35Aa cDNA and exposed for autoradiography. Whole mount in situ hybridization to mRNA in ovarian tissue was performed following the protocol of D. Tautz and C. Pfeifle (36) with modifications by E. Knust and K. H. Glätzer.⁴ For in situ hybridizations to egg chambers, the protocols were modified as follows. Ovaries were partially dissected into ovarioles and fixed with heptane-saturated 4% formaldehyde in PBT (0.2% Tween 20 in phosphate-buffered saline) and 10% Me₂SO for 20 min, rinsed with 90% methanol and 50 mM EGTA, and digested with proteinase K after several washes in PBT. A digoxigenin-labeled (Roche Biochemicals) cDNA fragment of l(2)35Aa encoding soluble dGalNAc-T1 (base pairs 118-1896) was used as a probe for hybridization.

Cell and Tissue Expression of Human GalNAc-T11 by Northern Analysis and Immunohistochemistry-- Multiple tissue Northern blots were obtained from CLONTECH. A mouse multiple tissue blot was prepared with 20 µg of total RNA isolated from C57B mouse organs using the guanidinium thiocyanate procedure. Total RNA was separated by 1% formaldehyde gel electrophoresis and transferred to Hybond N+ membrane. The soluble cDNA expression cassettes (nucleotides 91-1827, human and murine) were used as probes on corresponding human and murine blots. Probes were random prime-labeled using [α-³²P]dCTP (Amersham Biosciences) and an oligolabeling kit (Amersham Pharmacia Biotech). Blots were probed as described previously (5) and washed 5 times at 42 °C with 2× SSC, 0.1% SDS; once with 0.5× SSC, 0.1% SDS; and once at 55 °C with 0.1× SSC, 0.1% SDS in a minihybridization oven (Hybaid).

A murine anti-human GalNAc-T11 monoclonal antibody (IgG1 isotype), designated UH8 (1B2), was produced and characterized essentially as described previously (37). Hybridomas were selected by immunocytology on air-dried, acetone-fixed Sf9 cells infected with baculovirus containing various secreted GalNAc-transferase constructs. Immunocytology was performed with a series of human cancer cell lines (A704, HT29, MKN45, HeLa, WI38, HL60, Colo205, AGS, A431, MiaPaca, SUIT2, ASPC1, MTSV1, T47D, MCF7, and IMR32). Cells were grown to subconfluence in the appropriate media as recommended by American Type Culture Collection. Cells were fixed in ice-cold acetone for 10 min and then kept at -70 °C before staining. Fresh frozen human tissue samples from skin (n = 2), oral mucosa (n = 5), salivary glands (minor n = 4, submandibular n = 3, parotid n = 1), small intestine (n = 1), colon (n = 2), and kidney (n = 2) were obtained as previously described (37). The experiments were approved by the local Human Investigations Committee (in Denmark; J#KF 03-004/95), and the use of mice was authorized by the Danish Animal Inspectorate. Processing of frozen tissue sections and fluorescence immunohistology were performed as previously described (37) and examined in a Zeiss fluorescence microscope using epi-illumination.

    RESULTS

The Drosophila Polypeptide GalNAc-transferase Family-- The genome survey of D. melanogaster revealed 13 homologous putative GalNAc-transferase genes and two related but apparently truncated genes. Individual members of the GalNAc-transferase gene family are found on chromosomes X, 2, and 3 of the four chromosomes contained in the Drosophila genome (Table I). The algorithms used to generate GadFly predict that coding regions for the identified genes are organized in 2-8 exons. One gene, l(2)35Aa, is contained in a single exon (30). Evaluation of ESTs derived from libraries listed in Table I indicates expression of 12 of these genes in cultured cell lines, at different stages of fly development, and in adult flies. Expression of putative GalNAc-transferase genes not represented in the EST data base was confirmed by RT-PCR analysis of embryonal, larval, or adult D. melanogaster mRNA (data not shown).

                              
Table I
Members of the D. melanogaster polypeptide:N-acetylgalactosaminyltransferase gene family

The putative reading frames of the Drosophila GalNAc-transferase genes are predicted to encode proteins similar to most mammalian and C. elegans GalNAc-transferases, ranging from 557 to 667 amino acids (Fig. 1). All members of the Drosophila GalNAc-transferase gene family are predicted by the TMpred algorithm⁵ to encode type II transmembrane proteins. The predicted amino-terminal cytoplasmic domains range from 6 (CG3254, CG8845a) to 20 amino acids (CG9152), and the predicted transmembrane domains include a hydrophobic region of 17-22 amino acids near the N terminus (Fig. 1). All members contain a ricin-like lectin domain of ~130 residues in the C-terminal region. Conserved sequence motifs among Drosophila proteins resemble those previously identified for mammalian (2, 28, 38) and C. elegans proteins (1). The central catalytic domain of GalNAc-transferases was previously defined as containing the GT1 and Gal/GalNAcT motifs (38), and a multiple sequence alignment of the putative catalytic domains (ClustalX 1.8) of 13 Drosophila GalNAc-transferases is shown in Fig. 2. An expanded data set including the putative catalytic domains of C. elegans and human GalNAc-transferases was analyzed by protein distance methods with statistical confidence ensured by bootstrap analysis. A consensus tree generated by neighbor joining and least square algorithms is shown in Fig. 3. The tree is unrooted, since no common ancestral GalNAc-transferase has been identified to enable the definition of a phylogenetic outgroup. The phylogram indicates 10 statistically significant clades of homologous GalNAc-transferases. The majority of the identified clades contain at least one enzyme each from fly, human, and worm, suggesting that there are essential functions associated with the genes grouped in these clades. Groups of GalNAc-transferases represented only in Drosophila (CG7304, CG7579, CG7297, CG10000) or human (GalNAc-T8 and -T9; GalNAc-T5) may represent isoenzymes with unique species-specific functions (Fig. 3). Additional uncharacterized mammalian GalNAc-transferases have been identified that would increase the human family to 16-18 members, but these are not included in the present analysis. The analysis clearly indicates a horizontal evolutionary relationship among members of the GalNAc-transferase genes of nematodes, fly, and mammals in distinct subfamilies. The most likely explanation for this finding is that individual subfamilies serve distinct functions. The existence of multiple members in some subfamilies in a single species may represent simple redundancy as well as further diversification of functions by refinements in enzyme properties or differential regulation of expression. The question of function and expression of members of a subfamily has so far only been reported for human GalNAc-T3 and -T6 (10), but similar findings have been made for at least one additional human subfamily.⁶


Fig. 1.   Depicted domain structure of D. melanogaster homologous putative polypeptide GalNAc-transferases. A, schematic representation of polypeptide GalNAc-transferase protein domains. CT, cytoplasmic tail; TM, transmembrane region. B, structure of the putative D. melanogaster polypeptide GalNAc-transferase family members. Protein domains were predicted based on the available cDNA or genomic DNA sequence. The proteins were aligned at four conserved cysteine residues in the catalytic domains, as shown in Fig. 2. Conserved cysteine residues (C) are indicated by dotted lines. The position of the DXH motif is indicated. Potential N-glycosylation sites are indicated by trees. CLD and QXW repeats of the ricin-like lectin domain are indicated by filled circles and open squares. The N-terminal portion of CG10000 with uncertain topology is depicted in dotted form.


Fig. 2.   Multiple sequence analysis of the catalytic domains of predicted D. melanogaster polypeptide GalNAc-transferases. ClustalX 1.8 was used as a progressive sequence alignment tool. Alignments were revised for maximal residue conservation and minimal gaps in the conserved sequence motifs. Introduced gaps are shown as hyphens, and aligned identical residues are boxed (black for all 13 sequences, dark gray for 10-12 sequences, and light gray for eight or nine sequences). The positions of four conserved cysteine residues are indicated by asterisks. The arrow indicates the amino acid position affected by the l(2)35AaSF32 mutation. Conserved GT1 and Gal/GalNAcT domains are indicated above the sequences.


Fig. 3.   Phylogram of polypeptide GalNAc-transferases in D. melanogaster, H. sapiens, and C. elegans. The consensus tree from protein distance analyses of predicted catalytic GalNAc-transferase domains is based on progressive sequence alignments similar to Fig. 2. Bootstrap percentage values from 1000 replicates are indicated above the nodes. Putative D. melanogaster polypeptide GalNAc-transferases are indicated by their GadFly annotation. GalNAc-T and GLY indicate polypeptide GalNAc-transferases identified in humans and C. elegans, respectively. Phylogenetic subfamilies are indicated by background shading. Human GalNAc-T8 and -T9 have been reported without confirmation of enzymatic functions and may hence not represent genuine GalNAc-transferase genes (46, 47). The recently reported rat GalNAc-T9, which was functionally characterized and shown to display glycopeptide substrate specificity, has been redesignated rGalNAc-T10 (12). GalNAc-T12* refers to a novel genuine human polypeptide GalNAc-transferase (E. P. Bennett and H. Clausen, unpublished results).

The predicted CG2103 protein exhibits high amino acid sequence similarity (60% identity in the catalytic domain; Fig. 2) to C. elegans GLY-10 and forms a strongly supported phylogenetic group with CG8845a, CG8845b, and rat GalNAc-T10. CG6394, encoding the dGalNAc-T2 protein, is closely related to C. elegans GLY-7 and human GalNAc-T7 with a respective sequence identity of 62 and 54% in the catalytic domain (1, 11). These orthologous GalNAc-transferases form a strongly supported phylogenetic group and share similar enzymatic properties (Table III). The CG6394 gene is organized in three exons and is the only putative D. melanogaster GalNAc-transferase found on chromosome X (Table I). CG10000, together with CG7304, CG7579, and CG7297, forms a subgroup of putative GalNAc-transferases found only in Drosophila. The CG8182 protein is most closely related to the recently described C. elegans putative GalNAc-transferase GLY-9 (1). The worm and fly proteins exhibit 52% amino acid sequence identity in the catalytic domain. Among the putative D. melanogaster GalNAc-transferases, CG9152 is most closely related to the mammalian GalNAc-T1 (3). CG9152 shares 72% amino acid sequence identity with the catalytic domain of human GalNAc-T1 and forms a strongly supported clade with CG4445, GLY-3, and GLY-6. Five of seven introns of the CG9152 locus are located at positions similar to the human GALNT1 gene and four at positions similar to gly-6, suggesting conservation from a common ancestral gene (39). The putative GalNAc-transferase CG3254 exhibits close sequence similarity to human GalNAc-T2 (77% amino acid identity in the catalytic domain). Lower but significant sequence similarity is found to the C. elegans GalNAc-transferase GLY-4 (1). CG3254, GalNAc-T2, and GLY-4 form a separate phylogenetic cluster with maximal (100%) bootstrap support.

Two phylogenetic clades were selected for further studies in order to test the hypothesis that orthologous relationships of subfamilies were consistent with conserved unique functions. The lethal l(2)35Aa gene was previously suggested to encode a GalNAc-transferase and was chosen because of its importance in fly development. The l(2)35Aa gene was predicted to be orthologous with a single, novel, putative human GalNAc-transferase gene, here designated GalNAc-T11. The CG6394 gene was selected for analysis because the predicted orthologous human gene, GalNAc-T7, has been extensively characterized to encode a GalNAc-transferase with unique GalNAc glycopeptide substrate specificity (9, 11).

GalNAc-transferase Subfamily dGalNAc-T1(l(2)35Aa)/Human GalNAc-T11: Isoforms with Unique Polypeptide Substrate Specificity-- The l(2)35Aa gene is organized in one exon (30), and cloned cDNA as well as genomic DNA was found to contain an open reading frame of 1896 base pairs encoding a protein of 632 amino acids with three potential N-linked glycosylation sites (Fig. 1). TMpred predicted a type II domain structure with an N-terminal cytoplasmic domain of 10 amino acids, a transmembrane segment of 19 amino acids, and a stem region and catalytic domain of 603 residues.

Human GALNT11 was identified during positional cloning of the human DNA repair gene XRCC2 on chromosome 7q36.1 and was subsequently identified by the human genome project as a predicted gene with sequence similarity to GalNAc-transferases. In the study presented here, a partial human cDNA sequence was obtained from the gastric carcinoma cell line MKN45. The sequence was confirmed by sequencing of P1 clones covering the entire coding sequence. Sequencing of a P1 clone revealed that the coding region of the GALNT11 gene was contained in 10 exons, and the intron/exon boundaries identified were identical to the boundaries predicted from sequencing of a human genomic PAC clone RP5-98107 (GenBank™ accession number AC006017). GALNT11 is localized at 7q36.1 between D7S2450 and D7S550 (148.6-180.8 centimorgans) and represented in EST cluster Hs.97056. Comparison of positions of the nine intron/exon boundaries in the coding region of GALNT11 with those of other human GalNAc-transferases showed conservation of the positioning of four boundaries among GALNT1, -T2, -T3, and -T6, and two additional boundaries were conserved only in GALNT3 and -T6 (10, 39). The coding region of human GalNAc-T11 predicts a type II transmembrane protein of 608 amino acids with two potential N-linked glycosylation sites (GenBank™ accession number Y12434). The putative transmembrane hydrophobic signal sequence is ~20 residues flanked by only one charged residue at the N-terminal border. The C-terminal border of the hydrophobic signal sequence is flanked by a putative N-glycosylation site (residue 29), and another is found in the putative catalytic domain (residue 428).

The murine Galnt11 (GenBank accession number Y12435) encodes a protein of identical size to the human. The two N-glycosylation sites are conserved, and an additional putative N-glycosylation site is found close to the conserved site in the catalytic domain (residue 423). The N-glycosylation site at residue 428 is also found in GalNAc-T5. The amino acid sequence identity with human GalNAc-T11 is 87%, which is in agreement with other rodent-human orthologs. Murine Galnt11 is localized to mouse chromosome 5 in a region syntenic with human 7q36.

Comparison of sequences of human and murine GalNAc-T11 with dGalNAc-T1 reveals a few common characteristics. dGalNAc-T1 encodes a protein of 632 amino acids, which is 24 amino acids longer than the mammalian sequences, and most of the difference is due to an extended C-terminal end. dGalNAc-T1 has three N-linked glycosylation sites, but these are not conserved with the mammalian proteins. The sequences of both GalNAc-T11 and dGalNAc-T1 differ significantly from other reported mammalian GalNAc-transferases in that they contain an insert of ~18-20 residues in the N-terminal region immediately preceding the ricin-like lectin domain found in the C-terminal ends of most GalNAc-transferases. The lectin domain of ~130 amino acids was shown to be functional in at least one isoform, GalNAc-T4, where it directs the GalNAc glycopeptide substrate specificity (33). The GalNAc-transferase lectin domain is characterized by six cysteine residues with three -CL(D/E)- motifs and three -QXW- motifs. The first -CLD- motif of GalNAc-T4 was previously shown to be important for the lectin-mediated glycopeptide specificity by substitution of -CLD459- to -CLH459-, since the substitution destroyed the lectin-mediated function without affecting the general catalytic function of GalNAc-T4 (33). The three -CLD- motifs in the GalNAc-T11 lectin domain have the sequences -CLV495-, -CLD538-, and -CLR580-, where the first motif -CLV495- corresponds to -CLD459- of GalNAc-T4. In the lectin domain, dGalNAc-T1 has sequence motifs very similar to those of GalNAc-T11: -CAA495-, -CLE541-, and -CLR582-. The first two QXW motifs are conserved in both genes, whereas only mammalian GalNAc-T11 has the last QXW sequon. The patterns of CLD and QXW motifs are different from other GalNAc-transferase isoforms. This may suggest that the lectin domains of GalNAc-T11/dGalNAc-T1 have functions different from that previously established for GalNAc-T4 (33).

Expression of secreted constructs of dGalNAc-T1 and human/murine GalNAc-T11 in Sf9 cells resulted in GalNAc-transferase activity in the culture medium of infected cells that was significantly above background values for uninfected cells or cells infected with irrelevant constructs (not shown). dGalNAc-T1 and human GalNAc-T11 were purified to near homogeneity and analyzed in further detail. The purified enzymes exhibited strict donor substrate specificity for UDP-GalNAc and did not utilize UDP-galactose, UDP-N-acetylglucosamine, UDP-glucose, or UDP-xylose. Purified dGalNAc-T1 exhibited an apparent Km of 8.5 ± 1.2 µM for UDP-GalNAc (EA2 acceptor peptide at 500 µM) and an apparent Km value of 0.35 ± 0.13 mM for EA2 (UDP-GalNAc at 100 µM). These values are comparable to but lower than those detected for human GalNAc-T11 (Km of 25.3 ± 1.6 µM for UDP-GalNAc and 0.67 ± 0.11 mM for the EA2 peptide substrate). dGalNAc-T1 and human GalNAc-T11 exhibited no activity with GalNAc glycopeptides (GalNAc4TAP25 and GalNAc1-2EA2) (not shown).
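To give a feel for what the apparent Km values above imply, the following Python sketch computes the fraction of maximal rate, v/Vmax = [S]/(Km + [S]), at the fixed co-substrate concentrations quoted in the text. It assumes simple hyperbolic Michaelis-Menten behavior for each varied substrate, which is an illustrative assumption and not a kinetic model established in this study; the function and variable names are hypothetical.

```python
def fractional_rate(substrate_um, km_um):
    """v / Vmax for one substrate under simple Michaelis-Menten kinetics."""
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("concentration must be >= 0 and Km must be > 0")
    return substrate_um / (km_um + substrate_um)


# Apparent Km values from the text, converted to micromolar.
apparent_km_um = {
    "dGalNAc-T1": {"UDP-GalNAc": 8.5, "EA2": 350.0},
    "GalNAc-T11": {"UDP-GalNAc": 25.3, "EA2": 670.0},
}

# Fixed concentrations used when the other substrate was varied (from the text).
fixed_um = {"UDP-GalNAc": 100.0, "EA2": 500.0}

for enzyme, kms in apparent_km_um.items():
    for substrate, km in kms.items():
        frac = fractional_rate(fixed_um[substrate], km)
        print(f"{enzyme}: {substrate} at {fixed_um[substrate]:.0f} uM "
              f"gives {frac:.2f} of Vmax")
```

Under this assumption the fixed EA2 concentration of 500 µM is well below saturation for both enzymes, which is one reason the reported constants are described as apparent values.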

Comparative analyses of activities of purified dGalNAc-T1 and human GalNAc-T11 are presented in Table II. The data in Table II were produced with partially purified enzyme preparations where it was not possible to clearly quantify the specific enzyme protein concentrations, and assays were normalized by activity with the EA2 substrate at 200 µM. Subsequent preparations of enzymes were quantified by SDS-PAGE Coomassie staining with protein standards, and specific activities for two substrates, EA2 and Muc4.1, were analyzed using the same assay conditions as in Table II (200 µM peptide). dGalNAc-T1 and human GalNAc-T11 showed nearly identical relative activity profiles with the panel of peptide substrates tested. dGalNAc-T1 exhibited a specific activity of 1.9 units/mg with EA2 and 0.27 units/mg with Muc4.1, whereas GalNAc-T11 showed 0.88 with EA2 and 0.48 units/mg with Muc4.1. The differences in specific activities with EA2 are minor and correlate with the differences in apparent Km for this peptide. All of the acceptor substrates identified for the two enzymes contain multiple potential acceptor sites and are derived from mucin tandem repeats. Since peptide substrates derived from mucin tandem repeat sequences often function as substrates for multiple GalNAc-transferase isoforms, it is in many cases important to identify actual acceptor sites utilized by individual isoforms to be able to identify differences in substrate specificity. One substrate that has been very successful in characterizing distinct functions of GalNAc-transferase isoforms has been the MUC1 tandem repeat sequence. Several GalNAc-transferases, albeit with different kinetics, utilize three of five potential sites (HGVT*SAPDTRPAPGS*T*APPA; asterisks indicate GalNAc attachment sites) (32), whereas GalNAc-T4 is the only isoform identified that can utilize the two remaining acceptor sites (7). 
Importantly, extensive analyses with different designs and length of peptide substrates up to 200-mer peptides have clearly established that the site specificity is primarily directed by primary peptide sequence.

                              
Table II
Acceptor substrate specificities of human GalNAc-T11 and Drosophila dGalNAc-T1

Detailed comparative analysis of the activities of human GalNAc-T11 and dGalNAc-T1 with MUC1 tandem repeat peptides revealed a preference for the short peptide designs Muc1a over Muc1b, which is similar to the relative activities observed with GalNAc-T1 and -T3/6 but different from GalNAc-T2 (32). Structural analysis of terminal products formed with 25- and 60-mer MUC1 peptides revealed that both GalNAc-T11 and dGalNAc-T1 only incorporated 2 mol of GalNAc residues/mol of tandem repeat sequence (Fig. 4, Table I), and the sites utilized were identified as Thr in -VTSA- and -GSTA- (HGVT*SAPDTRPAPGST*APPA) (Table II). GalNAc-T11 and dGalNAc-T1 are the only GalNAc-transferase isoforms characterized to date that exhibit this pattern of acceptor site specificity with MUC1 peptides, and this may therefore be regarded as a common distinguishing feature.


Fig. 4.   MALDI-TOF analysis of terminal glycosylation reactions of MUC1 tandem repeat derived 25-mer peptide substrate (TAP25) with human GalNAc-T11 and dGalNAc-T1. A, unglycosylated TAP25 peptide; B, TAP25 glycosylated by GalNAc-T11; C, TAP25 glycosylated by GalNAc-T11 and then GalNAc-T2; D, TAP25 glycosylated by dGalNAc-T1; E, TAP25 glycosylated by dGalNAc-T1 and then GalNAc-T2; F, TAP25 glycosylated by dGalNAc-T1 followed by GalNAc-T11. Asterisks indicate sites of quantitative transfer of GalNAc residues, determined as described under "Experimental Procedures." GalNAc-T11 and dGalNAc-T1 attach only 2 mol of GalNAc even in combined reactions. The products formed with both GalNAc-T11 and dGalNAc-T1 can be further glycosylated by GalNAc-T2, previously shown to utilize four sites in this peptide design (32).

Another striking example of unique substrate specificity was the finding that human (and murine) GalNAc-T11 and dGalNAc-T1 only incorporated one GalNAc residue in the 21-mer peptide derived from the MUC4 tandem repeat sequence, which contains a total of seven potential O-glycosylation sites (CPLPVTDTSSASTGHAT*PLPV) (Fig. 5). Human GalNAc-T1 incorporated three residues into this substrate, whereas other GalNAc-transferase isoforms attached multiple residues. The sites of incorporation with other isoforms were not determined. Using a series of overlapping smaller peptides, the acceptor site for both GalNAc-T11 and dGalNAc-T1 was narrowed to the C-terminal region by the peptide design Muc4.3. Structural analysis revealed that for both peptide designs, the acceptor site catalyzed by both enzymes was the Thr in -ATP- (TGHAT*PLPV).
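The site counts quoted for the MUC4 and MUC1 peptides can be checked with a short Python sketch that reads the asterisk notation used in the text (an asterisk follows each residue that receives GalNAc). Every Ser or Thr is counted as a potential O-glycosylation site, which is the convention implied by the text; the function names are hypothetical.

```python
def potential_sites(peptide):
    """1-based positions of all Ser/Thr residues (asterisks are ignored)."""
    plain = peptide.replace("*", "")
    return [i for i, aa in enumerate(plain, start=1) if aa in "ST"]


def marked_sites(peptide):
    """1-based positions of residues that are followed by '*'."""
    sites = []
    pos = 0
    for ch in peptide:
        if ch == "*":
            if pos == 0:
                raise ValueError("asterisk must follow a residue")
            sites.append(pos)
        else:
            pos += 1
    return sites


# Peptide strings as written in the text for GalNAc-T11 and dGalNAc-T1.
muc4 = "CPLPVTDTSSASTGHAT*PLPV"
muc1 = "HGVT*SAPDTRPAPGST*APPA"

for name, pep in (("MUC4", muc4), ("MUC1", muc1)):
    print(name, "potential:", potential_sites(pep), "used:", marked_sites(pep))
```

For the MUC4 21-mer this gives seven Ser/Thr positions with one marked site (Thr17), and for the MUC1 20-mer five Ser/Thr positions with two marked sites (Thr4 and Thr16), in line with the counts given in the text.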


Fig. 5.   MALDI-TOF analysis of terminal glycosylation reactions of MUC4 tandem repeat derived 21-mer peptide substrate (Muc4.1) with human GalNAc-T11 and dGalNAc-T1. A, unglycosylated Muc4.1 peptide; B, Muc4.1 glycosylated by GalNAc-T11; C, Muc4.1 glycosylated by dGalNAc-T1; D, Muc4.1 glycosylated by GalNAc-T1. Asterisks in parentheses indicate sites of partial transfer of GalNAc residues, determined as described under "Experimental Procedures."

Because human GALNT11 was found to be selectively expressed in kidney, we analyzed a peptide substrate derived from erythropoietin, which is produced in kidney and contains a single O-glycosylation site (Ser126). Peptide substrates containing this site have been found to represent quite inefficient in vitro substrates for GalNAc-transferases (32, 40). It therefore remains possible that a unique GalNAc-transferase with particularly good kinetic properties for the erythropoietin substrate exists. As shown in Table II, GalNAc-T11 and dGalNAc-T1 have no detectable activity with the Ser-containing erythropoietin peptide substrate and exhibit specificity for the Thr-substituted sequence similar to other GalNAc-transferases characterized to date. Thus, despite the remarkable selective expression of GalNAc-T11 in kidney, this isoform does not appear to be involved in glycosylation of erythropoietin. Similar to other GalNAc-transferases (32), GalNAc-T11 and dGalNAc-T1 can transfer GalNAc to adjacent Thr residues. Characterization of the two sites utilized in the EA2 peptide derived from rat submaxillary gland mucin (Table II) showed that GalNAc was incorporated into adjacent Thr residues (PTTDST*T*PAPTTK).

The findings that GalNAc-T11 and dGalNAc-T1 exhibited activities identical with the panel of peptides evaluated here and that they produced the same unique glycoform products with three substrates (MUC1, MUC4.1, and EA2) clearly indicate that they overall possess identical catalytic properties. Importantly, the panel of peptides applied in this study is more exhaustive than what has previously been used to distinguish catalytic properties of other human GalNAc-transferase isoforms (10). The analysis does not provide specific information that can be used to predict e.g. a shared acceptor sequence motif, and we only conclude that a unique and distinguishable property of the catalytic functions has been conserved.

GalNAc-transferase Subfamily dGalNAc-T2 (CG6394)/Human GalNAc-T7: Isoforms with Unique GalNAc Glycopeptide Substrate Specificity-- The CG6394 gene was cloned based on embryonal Drosophila cDNA and found to contain an open reading frame of 1773 base pairs encoding a protein of 591 amino acids with two potential N-linked glycosylation sites (Fig. 1). TMpred predicted a type II domain structure with an N-terminal cytoplasmic domain of 10 amino acids, a transmembrane segment of 19 amino acids, and a stem region and catalytic domain of 562 residues. The lectin domains of human GalNAc-T7 and dGalNAc-T2 share the same CLD (first and third) and QXW (third) motifs, which are nearly identical to the patterns found in GalNAc-T4 as well as in rat GalNAc-T10 (GalNAc-T10 also has the second QXW motif). Both GalNAc-T4 and GalNAc-T10 exhibit GalNAc glycopeptide substrate specificity; however, several other isoforms with polypeptide substrate specificity also share this pattern of motifs in the lectin domain (GalNAc-T1 and -T2).
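The sizes reported for the two Drosophila proteins can be cross-checked with simple arithmetic, sketched below in Python. The check assumes that the quoted open reading frame lengths exclude the stop codon, which is what the numbers imply (three nucleotides per codon), and that the cytoplasmic, transmembrane, and stem/catalytic segments together span the whole protein. The names used are hypothetical.

```python
def codons_in_orf(orf_bp):
    """Number of codons in an open reading frame given its length in base pairs."""
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_bp // 3


def is_consistent(entry):
    """True if ORF length, protein length, and segment lengths all agree."""
    return (codons_in_orf(entry["orf_bp"]) == entry["protein_aa"]
            and sum(entry["segments_aa"]) == entry["protein_aa"])


# Values as reported in the text: ORF length, protein length, and
# (cytoplasmic tail, transmembrane segment, stem + catalytic region).
reported = {
    "dGalNAc-T1": {"orf_bp": 1896, "protein_aa": 632, "segments_aa": (10, 19, 603)},
    "dGalNAc-T2": {"orf_bp": 1773, "protein_aa": 591, "segments_aa": (10, 19, 562)},
}

for name, entry in reported.items():
    print(name, codons_in_orf(entry["orf_bp"]), "codons, consistent:",
          is_consistent(entry))
```

Both entries are internally consistent: 1896 bp corresponds to 632 codons and 1773 bp to 591 codons, and in each case the three predicted segments sum to the full protein length.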

Recombinant dGalNAc-T2 exhibited no activity with the panel of peptides listed in Table II, including the peptide EA2; however, significant enzymatic activity was found with the GalNAc-EA2 glycopeptide (Table III). So far, three mammalian GalNAc-transferase isoforms, GalNAc-T4 (7), -T7 (9, 11), and -T10 (12), have been shown to selectively or exclusively function with GalNAc glycopeptides, and the available data suggest that the substrate specificities of the three isoforms are different, similar to what is found for different isoforms functioning with polypeptide substrates. Thus, the activities of human GalNAc-T4 and -T7 were shown to be distinguishable using a panel of GalNAc glycopeptides derived from the tandem repeats of human MUC1, MUC2, MUC5AC, MUC7, and the rat submandibular mucin (EA2 peptide) (Table III) (11). GalNAc-T4 activity is characteristic in utilizing GalNAc-MUC1 and GalNAc-MUC5AC but not GalNAc-EA2. In contrast, GalNAc-T7 selectively utilized GalNAc-EA2 and showed no activity with the two former substrates used by GalNAc-T4. Both isoforms were active with GalNAc-MUC2. The GalNAc peptide activities of rat GalNAc-T7 and -T10 are less well characterized, but they were also shown to be distinguishable using a single glycopeptide derived from the tandem repeat of human MUC5AC (12). Nevertheless, as shown in Table III, dGalNAc-T2 exhibited the exact same pattern of activity with the panel of GalNAc glycopeptides previously used to define specificity of human GalNAc-T7. dGalNAc-T2 and human GalNAc-T7 are thus clearly distinguishable from GalNAc-T4. Rat GalNAc-T7 and -T10 have activity with a different peptide design of the tandem repeat of MUC5AC (GTTPSPVPTTSTTSAP); however, neither dGalNAc-T2 nor human GalNAc-T7 shows activity with the GalNAc-MUC5AC peptide design used in this study (Table III). 
The results thus clearly indicate that dGalNAc-T2 and human GalNAc-T7 overall possess identical catalytic properties that with present knowledge are distinguishable from other isoforms.

                              
Table III
Acceptor substrate specificities of Drosophila dGalNAc-T2 and human GalNAc-T4 and -T7

The finding that dGalNAc-T2 as predicted exhibited GalNAc glycopeptide substrate specificity indicates that the lectin domain-mediated glycopeptide specificity of some GalNAc-transferase isoforms developed early in evolution. Recently, a GalNAc-transferase homologous gene in the parasite Toxoplasma gondii was also determined to encode a GalNAc-transferase with GalNAc glycopeptide substrate specificity (41).

Lethal l(2)35Aa Alleles Have Inactivating Mutations in the dGalNAc-T1 Coding Region-- Previously, Flores and Engels (25) searched for the candidate gene of the lethal phenotype associated with l(2)35Aa and demonstrated that a 5-kb rescue fragment contained an open reading frame encoding a protein with similarity to polypeptide GalNAc-transferases. Analysis of the 5-kb rescue fragment reveals only one conventional open reading frame that predicts a protein with similarity to known proteins and from which ESTs have been mapped (Fig. 6). A group of 5' ESTs from a single library (pOTB7 AT library from Drosophila adult male testes and seminal vesicles) was recently described, which could originate from a potential open reading frame on the minus strand that partly overlaps and is complementary to dGalNAc-T1 on the plus strand. However, this unannotated open reading frame has the unusual feature of ~1000 bp in frame before the first ATG, and the group of 5' ESTs, which potentially could originate from the dGalNAc-T1 complementary strand, may be inverted and represent 3' ESTs from the 3'-untranslated region of dGalNAc-T1. In order to further substantiate that it is the dGalNAc-T1 open reading frame that directs the l(2)35Aa phenotype, three lethal alleles were sequenced (Fig. 6). Two of these contained single nucleotide nonsense mutations (l(2)35AaHG8 and l(2)35AaSF12), introducing a stop at codons 89 and 195 of the dGalNAc-T1 open reading frame, respectively; both truncations exclude the majority of the catalytic domain and hence are predicted to inactivate the enzyme. If assigned to the hypothetical minus strand open reading frame, the C265 → T mutation of l(2)35AaHG8 is silent, and the T593 → A transversion in the l(2)35AaSF12 allele may cause an amino acid change from Gln143 to Leu.
The third allele (l(2)35AaSF32) contains a single transition (C679 → T), which again is silent in the potential reading frame on the minus strand but leads to an amino acid change in a highly conserved sequence of the GT1 motif of the catalytic domain of dGalNAc-T1 (IRSR227WVIG) on the plus strand (Fig. 6). The positively charged residue Arg at position 227 is conserved in all putative fly GalNAc-transferases (Fig. 2). In human GalNAc-transferase genes, Arg and His are found at this position, but the subfamily pair GalNAc-T3 and -T6 has an uncharged residue (Val) at this position. Although it is likely that the nonconservative mutation to Trp in l(2)35AaSF32 is detrimental for the activity, this needs to be formally confirmed by experimental analysis.


Fig. 6.   Molecular analysis of the l(2)35Aa locus. The genomic region of chromosome 2 around l(2)35Aa is shown. Coding and noncoding exon sequences are indicated as filled boxes and open boxes, respectively. Nucleotide sequences of the mutant alleles are shown with the amino acid sequence in single-letter codes above (plus strand) or below (minus strand) the sequence. The hypothetical gene on the minus strand is depicted in dotted form. Nucleotide positions are indicated relative to A1TG of the respective open reading frame.

A great number of sequence polymorphisms were detected during sequencing of l(2)35Aa alleles (Table IV). Of significant interest is that the majority (14 of 15) of the single nucleotide polymorphisms identified are synonymous in the dGalNAc-T1 reading frame, whereas the same nucleotide polymorphisms cause amino acid changes in the hypothetical reading frame on the minus strand. Seven of the nucleotide polymorphisms identified are synonymous codon changes in dGalNAc-T1, where the encoded amino acid has only two alternative codons. Of specific interest is T1011 → C, which causes a nucleotide polymorphism in the translational start codon of the hypothetical minus strand open reading frame (i.e. A1TG → G1TG). In contrast, T1011 → C is a conservative codon change for His337 in the dGalNAc-T1 (plus strand) reading frame (Table IV). These findings further substantiate that the hypothetical complementary reading frame on the minus strand does not represent a functional gene.
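The relation between nucleotide positions and codon numbers used for the plus strand reading frame can be sketched in Python as follows. The sketch assumes positions are counted from the A of the start ATG as nucleotide 1, as stated in the legend to Fig. 6; the function name is hypothetical, and only positions whose affected residue is named explicitly in the text are used as examples.

```python
def codon_of(nt_pos):
    """Map a 1-based nucleotide position (A of ATG = 1) to
    (codon number, position within the codon), both 1-based."""
    if nt_pos < 1:
        raise ValueError("nucleotide position must be >= 1")
    codon_index, offset = divmod(nt_pos - 1, 3)
    return codon_index + 1, offset + 1


# Positions cited in the text for the dGalNAc-T1 (plus strand) reading frame.
examples = {
    "C265 -> T (l(2)35AaHG8)": 265,
    "C679 -> T (l(2)35AaSF32)": 679,
    "T1011 -> C (polymorphism)": 1011,
}

for label, pos in examples.items():
    codon, within = codon_of(pos)
    print(f"{label}: codon {codon}, codon position {within}")
```

Position 265 falls on the first base of codon 89, position 679 on the first base of codon 227 (the Arg227 site), and position 1011 on the third base of codon 337 (His337). A change at a third codon position is the typical location of a synonymous substitution, which fits the description of T1011 → C as a conservative codon change.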

                              
Table IV
Nucleotide polymorphisms in D. melanogaster l(2)35Aa

Analysis of Expression Pattern of dGalNAc-T1 (l(2)35Aa)-- ESTs derived from l(2)35Aa were identified in adult testis, embryo, and Schneider L2 cell cDNA libraries. Northern blot analyses of D. melanogaster mRNA from adult flies revealed a single transcript of l(2)35Aa of ~2.2 kilobases. Higher expression levels of l(2)35Aa were detected in female adult flies as compared with males (data not shown). Detailed Northern analyses revealed that female flies expressed high l(2)35Aa mRNA levels in the ovary. Lower transcript levels were detected in testes and in the male and female carcass (Fig. 7A). Northern analysis of mRNA obtained from unfertilized eggs and embryos revealed a strong maternal contribution to zygotic transcription of l(2)35Aa. A comparison of mRNA levels detected in unfertilized eggs and expression levels in early embryos revealed only a marginal increase in l(2)35Aa expression after fertilization (Fig. 7B).


Fig. 7.   Expression analysis of l(2)35Aa. A, Northern blot analysis of poly(A)+ RNA from different female and male adult D. melanogaster tissues. B, maternal contribution to l(2)35Aa expression in early embryos. Blots were probed with 32P-labeled antisense in vitro transcripts of l(2)35Aa (base pairs 91-1896) corresponding to the soluble dGalNAc-T1 expression construct. Expression of the ribosomal rpL9 gene is indicated as a control (48).

Expression Pattern of l(2)35Aa during Oogenesis-- Analysis of l(2)35Aa mRNA expression during oogenesis was performed by in situ hybridization to whole mount wild-type adult ovarian tissue. Expression of l(2)35Aa was detected in germ cells and follicle epithelia of all developmental stages (Fig. 8, A and B). l(2)35Aa expression initiates during early stages of oogenesis in region I and reaches high levels in regions IIa and IIb of the germarium (Fig. 8B). High expression was detected in stage 2 egg chambers, and transcript levels remained high during later stages of oogenesis (s2 to s8 in Fig. 8A).


Fig. 8.   Whole mount in situ hybridization of dGalNAc-T1 cDNA l(2)35Aa to wild-type ovarian tissues. A, ubiquitous expression of l(2)35Aa along the anterior-posterior axis of different stage egg chambers. High expression levels of l(2)35Aa are detected by a digoxigenin-labeled antisense probe in various cell types of the germarium (G), and early (s2) to late stage follicles (s6, s8) throughout the vitellarium (V). B, detail showing l(2)35Aa expression in cytologically distinct regions I, IIa, IIb, and III of the germarium. Onset of expression is detected in early germarial region I.

Tissue Distribution of Human and Murine GalNAc-T11-- The remarkable phenotype associated with inactivation of dGalNAc-T1 in the fruit fly may be related to a unique function of the protein or the possibility that the protein is required in a specific cell type where other GalNAc-transferases otherwise capable of the same functions are not expressed. Studies of in vivo functions of GalNAc-transferase isoforms are missing due to the inherent difficulty of assessing the finite number of isoforms and specificity in a given cell, and it is presently not possible to address the first option. Thus, the expression pattern of human and murine GalNAc-T11 was evaluated. The origin of ESTs in data bases can give an indication of expression pattern of genes and indicates that GalNAc-T11 is widely expressed in diverse tissues of both humans and mice. ESTs of the human Hs.97056 cluster were derived from a large variety of tissue sources including brain, germ cell, kidney, ovary, pancreas, placenta, prostate, stomach, uterus, whole embryo, breast, lung, nervous tissue, ovary, prostate, skin, stomach, and uterus. ESTs of the mouse Mm.19390 cluster were derived from a more restricted set of organs including lymph node, kidney, liver, muscle, lung, mammary glands, skin, whole fetus (19.5 days postcoitum), and whole embryo (13.5/14.5 days postcoitum). Northern blots with mRNA from 12 human adult organs showed that GalNAc-T11 hybridized to a single mRNA of ~3 kb predominantly in kidney (Fig. 9A). Lower levels of mRNA were observed in brain, heart, and skeletal muscle, whereas other organs including placenta yielded very weak signals. A survey of mRNA from 76 different human tissues indicated that GalNAc-T11 was very weakly expressed in most organs including testis and ovary, but there was high expression only in the kidney (data not shown). Northern analysis of adult murine organs revealed strong expression in kidney, in agreement with the analysis of human GalNAc-T11 (Fig. 9B). 
Very weak expression was found in most other organs. Thus, some differences between humans and mice were observed in brain and heart, where higher levels were found in humans. Further analysis of mRNA expression among a panel of human cell lines showed high expression in a number of cell lines of breast, pancreatic, and colonic origin (data not shown).


Fig. 9.   Northern blot analysis of human and murine tissues. A, multiple human Northern blot (human 12-lane blot; CLONTECH) probed with human GalNAc-T11; B, Northern blot analysis of total RNA isolated from murine (C57 black) organs probed with murine GalNAc-T11; C, ethidium bromide gel stain of murine RNA gel.

Because of the unique expression pattern of GalNAc-T11, we chose to generate a monoclonal antibody to the human enzyme. One mAb, UH8 (1B2), of the IgG1 isotype, specifically reacted with GalNAc-T11 and showed no cross-reaction with any other GalNAc-transferase isoform tested (human GalNAc-T1, -T2, -T3, -T4, and -T6). Such isoform specificity is a general finding we have observed with mAbs to polypeptide GalNAc-transferase isoforms (7, 10, 37). To date, we have made isoform-specific monoclonal antibodies to six of the human GalNAc-transferases. Interestingly, recent studies with domain swapping among different isoforms revealed that most of the antibodies react with the lectin domains.⁷

A preliminary immunohistological analysis of a limited number of human tissues corroborated the Northern analysis and revealed high expression only in kidney. The staining pattern of six human GalNAc-transferase isoforms in the kidney cortex is shown in Fig. 10. GalNAc-T1 was expressed in glomeruli and weakly in tubules (Fig. 10A). GalNAc-T2 in contrast was strongly expressed both in tubules and glomeruli (Fig. 10B). GalNAc-T3 labeled weakly a few tubules (Fig. 10C), whereas GalNAc-T4 and -T6 show no staining (Fig. 10, D and E). GalNAc-T11 was strongly expressed in tubules similarly to GalNAc-T2; however, no labeling was observed in glomeruli (Fig. 10F). The morphology of the frozen kidney sections does not allow detailed evaluation of subcellular localization, but the intracellular granular staining pattern observed with all antibodies is consistent with Golgi localization. Very weak expression of GalNAc-T11 was found in all cell layers of skin and in the lower cell layers in buccal mucosa (not shown). Previously, we found marked changes in the GalNAc-transferase repertoire (GalNAc-T1, -T2, and -T3) in different cell layers of the stratified squamous epithelium of buccal mucosa (37). No staining with anti-GalNAc-T11 was observed in colon, small intestine (not shown), and different salivary glands (Fig. 11, B and C).

