INTRODUCTION
The family of
UDP-N-acetyl-α-D-galactosamine:polypeptide
N-acetylgalactosaminyltransferases
(GalNAc-transferases; EC
2.4.1.41), which transfer GalNAc to serine and threonine acceptor sites
and initiate mucin-type O-glycosylation, is evolutionarily ancient.
Hagen and Nehrke (1) originally identified and
presented a preliminary characterization of the GalNAc-transferase gene
family in Caenorhabditis elegans, consisting of seven
distinct genes and additional isoforms derived from alternative
splicing. After completion of the C. elegans genome sequence, two
additional homologous genes were identified, bringing the gene
family to nine distinct putative polypeptide GalNAc-transferases (2).
So far, three distinct members of the C. elegans gene
family have been shown to encode functional enzymes using one peptide
substrate (1). Preliminary phylogenetic analysis of the available
mammalian GalNAc-transferases (8 genes) and C. elegans
homologs (9 genes) has suggested evolutionary conservation of
individual subfamilies of putative orthologous genes (2). However, a
comparative analysis of the kinetic properties of C. elegans
and mammalian enzymes has not been presented.
The homologous mammalian GalNAc-transferase family is predicted to
comprise 16-18 distinct genes (2). To date, the functions of
eight distinct members of the human or rodent GalNAc-transferase family
have been characterized
(3-12). Most of the mammalian GalNAc-transferase isoforms are
predicted to have distinct functions. Analysis of mammalian polypeptide GalNAc-transferases has demonstrated that different isoforms have different kinetic properties, and unique substrate specificities have
been identified for many isoenzymes (2, 13). Some isoforms can be
classified into subfamilies of closely homologous GalNAc-transferases, and as shown for the human subfamily composed of GalNAc-T3 and -T6,
such isoforms share unique kinetic properties (10). Our understanding
of the enzymatic functions of individual GalNAc-transferase isoforms is
mainly derived from in vitro studies. The few in vivo
studies reported to date have confirmed the in vitro
results (14-16). In addition to different enzymatic properties, each
isoform has different cell and tissue expression patterns. Despite
these findings, it is still not possible to define the in
vivo catalytic functions of individual GalNAc-transferase isoforms
with respect to characteristic acceptor sequence motifs, range of
acceptor proteins, and in vivo biological function. Thus,
the specific in vivo functions of individual
GalNAc-transferase isoforms remain to be established. It is presently
unclear why such a large enzyme family evolved early and to what degree
the diversity and number of genes reflect redundancy rather than
functional requirements.
The existence of large and similarly numerous GalNAc-transferase gene
families in nematodes, insects, and mammals suggests that there is a
profound functional requirement for multiple enzymatic isoforms and
implies that important biological functions are associated with
individual isoforms. This hypothesis is supported by studies of
mannosyl-O-glycosylation in yeast, where multiple
polypeptide O-mannosyltransferases were found to be
essential for growth and survival of yeast cells (17). The first gene
ablation of a putative GalNAc-transferase in mice did not lead to
significant phenotypic changes, although it has not been shown that the
targeted gene encodes a functional GalNAc-transferase (18). A recent
preliminary report on targeted inactivation of the murine GalNAc-T1
gene found evidence of minor changes in O-glycosylation
(19). One explanation for the minor
phenotypes associated with inactivation of single polypeptide
GalNAc-transferases may be deduced from phylogenetic analyses of the
GalNAc-transferase gene family, which indicate the existence of
subfamilies of closely related genes that are hypothesized to provide
functional redundancy (2, 10). It remains to be determined whether
individual GalNAc-transferase isoforms have unique functions in
vivo that are conserved through evolution, as suggested by
sequence analysis indicating conservation of subfamilies.
Recent genetic studies of the fruit fly Drosophila
melanogaster have extended our understanding of glycan synthesis
and functions in developmental biology. The signaling molecule
Fringe is a glycosyltransferase that determines
dorso-ventral cell interactions during wing formation through
modulation of Notch receptor activity (20-22). A
developmental disorder causing a rotated abdomen in adult flies has been
attributed to a defect in the putative polypeptide
O-mannosyltransferase encoded by the rt gene (23,
24). Flores and Engels (25) have reported that a gene homologous to
mammalian polypeptide GalNAc-transferases, the l(2)35Aa
gene, is likely to be essential for development of D. melanogaster. Transgenic flies carrying an ectopic 5-kb fragment
containing the l(2)35Aa gene were found to be viable and
fertile (25). The 5-kb rescue fragment contains essentially only one
predicted intact open reading frame, and this encodes a protein with
homology to GalNAc-transferases. In this report, sequence analysis of
three lethal l(2)35Aa alleles showed alterations in the
putative GalNAc-transferase coding region of l(2)35Aa.
Combined with data showing that the l(2)35Aa gene in fact
encodes a functional GalNAc-transferase, this represents the first
example of a polypeptide GalNAc-transferase gene conclusively shown to
play an essential role in normal development of an organism.
Why is a single GalNAc-transferase isoform essential to development in
an organism having 13-15 putative GalNAc-transferase genes? To address
this question, we have analyzed the phylogenetic relationships of
GalNAc-transferase families in the nematode, fly, and mammals. A survey
of the completed genome of Drosophila led to identification
of 13-15 putative GalNAc-transferase genes. Phylogenetic analysis of
the putative catalytic domains of the homologous mammalian, C. elegans, and Drosophila genes indicated a common
horizontal pattern of evolution of GalNAc-transferases with a variable
degree of species-specific gene duplications and divergence. The
prediction of evolutionary conservation of subfamilies of
GalNAc-transferase genes was confirmed by functional analysis of
members of two distinct subfamilies found in Drosophila and
humans. In both cases, the subfamilies were shown to encode enzymes
with unique and identical functions. These data provide strong evidence
for an evolutionary conservation of distinct kinetic properties and
substrate specificities within GalNAc-transferase subfamilies. The
dGalNAc-T1 fly gene l(2)35Aa is the first
GalNAc-transferase gene shown to have a required function in
vivo. The orthologous human GalNAc-T11 is predicted to have
important in vivo functions. Human GalNAc-T11 displayed a
unique expression pattern that was largely limited to kidney.
EXPERIMENTAL PROCEDURES
Genome Survey of the Drosophila Polypeptide GalNAc-transferase
Family--
The entire D. melanogaster genome released by the
Berkeley Drosophila Genome Project (26) was searched by tBLASTn
using human GalNAc-transferase sequences as queries. The genome
annotation database GadFly,
based on the published D. melanogaster genome sequence, was
used to identify genes annotated to the sequences, and cDNA
sequences of identified genes were obtained from GenBank™
and the Drosophila Gene Collection (DGC releases 1 and 2 (27)). cDNA sequence information for putative GalNAc-transferase
genes not represented in DGC or GenBank™ was obtained by
identification of EST clots and single ESTs in the Berkeley
Drosophila Genome Project and NCBI EST data bases. cDNA
clones of ESTs with the longest inserts were obtained from Research
Genetics Inc. and sequenced.
Sequence Alignment and Phylogenetic Analysis--
Amino acid
sequences representing the central evolutionarily conserved domain of
polypeptide N-acetylgalactosaminyltransferases from C. elegans (1) and Homo sapiens (28) were aligned with the
homologous D. melanogaster proteins using ClustalX 1.8 with the Gonnet 250 protein weight matrix and default gap penalties (29). Multiple sequence alignments were revised for maximal residue conservation and minimal gaps in conserved amino acid sequence motifs.
Distance analyses of the amino acid alignments were performed using the
PHYLIP software package with the PAM250 substitution matrix of PROTDIST,
and phylograms were generated using neighbor-joining and least-squares
algorithms with 1000 bootstrap replicates (51).
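As a minimal illustration of the tree-building step, the sketch below implements the textbook neighbor-joining algorithm of Saitou and Nei on a precomputed distance matrix. It is not the PHYLIP PROTDIST/NEIGHBOR pipeline used here: it omits the PAM250 distance model, least-squares fitting, and bootstrap resampling. The function name `neighbor_joining` and the five-taxon toy matrix (a standard textbook example, not data from this study) are illustrative assumptions.

```python
from itertools import combinations


def neighbor_joining(names, pair_dist):
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    names: at least three taxon labels.
    pair_dist: dict mapping (taxon_a, taxon_b) tuples to distances.
    """
    d = {frozenset(pair): value for pair, value in pair_dist.items()}

    def dist(a, b):
        return d[frozenset((a, b))]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {a: sum(dist(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing the Q-criterion (n - 2) d(a,b) - r(a) - r(b);
        # ties go to the first pair encountered.
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]])
        len_a = 0.5 * dist(a, b) + (r[a] - r[b]) / (2 * (n - 2))
        len_b = dist(a, b) - len_a
        u = f"({a}:{len_a:.3f},{b}:{len_b:.3f})"
        # Distances from the new internal node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((u, c))] = 0.5 * (dist(a, c) + dist(b, c) - dist(a, b))
        nodes = [c for c in nodes if c not in (a, b)] + [u]

    # Resolve the last three nodes around one central vertex.
    a, b, c = nodes
    len_a = (dist(a, b) + dist(a, c) - dist(b, c)) / 2
    len_b = (dist(a, b) + dist(b, c) - dist(a, c)) / 2
    len_c = (dist(a, c) + dist(b, c) - dist(a, b)) / 2
    return f"({a}:{len_a:.3f},{b}:{len_b:.3f},{c}:{len_c:.3f});"


toy = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
       ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
       ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
print(neighbor_joining(["a", "b", "c", "d", "e"], toy))
```

For the toy matrix this yields branch lengths a:2, b:3, c:4, d:2, e:1, matching the standard worked example. Bootstrap support, as reported here, would come from repeating distance estimation and tree building on alignments resampled column-by-column with replacement and counting how often each clade recurs.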
Cloning and Characterization of l(2)35Aa (dGalNAc-T1) and CG6394
(dGalNAc-T2) cDNA--
The l(2)35Aa gene was originally
identified and sequenced in a detailed analysis of the genomic
Adh region of D. melanogaster chromosome 2 (30). EST clones LD24449 and SD08641 representing l(2)35Aa
cDNA (5'-EST GenBank™ accession numbers AA941044 and AI542350)
were obtained from Research Genetics. An open reading frame of 1896 base pairs as well as the sequences of the 5'- and 3'-untranslated
regions of l(2)35Aa mRNA were represented in both clones
and fully sequenced. The coding region of l(2)35Aa was previously found to be organized in one exon (30). Confirmatory sequencing was performed on a genomic clone obtained by PCR (35 cycles
at 95 °C for 10 s, 55 °C for 15 s, and 68 °C for 2 min 30 s) on D. melanogaster (Canton S) genomic
DNA (CLONTECH) with the sense primer TSHC1
(5'-AGCGGATCCATGATGCAAATCAAGC-3') and the antisense primer TSHC3
(5'-AGCGGATCCTCGCTCACAGCGCCTTGTCCG-3'). The gene CG6394 was
identified and sequenced in the D. melanogaster genome
sequence project. Embryonal D. melanogaster cDNA was
generated using the SMART RACE cDNA Amplification Kit with
embryonal poly(A)+ RNA (CLONTECH)
according to the manufacturer's instructions. cDNA for
dGalNAc-T2 was obtained by PCR (25 cycles at 95 °C for 10 s, 54 °C for 15 s, and 72 °C for 2 min 30 s)
with primer pair TSHC254 (5'-CTGGCCACCTAGACACGTC-3') and TSHC263
(5'-GATCCATTCCTTGATCCTC-3'). An open reading frame of 1773 base
pairs as well as partial sequences of the 5'- and 3'-untranslated
regions of CG6394 mRNA were cloned into pCR4-TOPO
(Invitrogen) and fully sequenced.
Cloning and Characterization of Human and Murine
GALNT11--
The predicted mammalian orthologs of
dGalNAc-T2, human and rodent GalNAc-T7, were reported
previously (9, 11). A novel human gene designated GALNT11,
predicted to represent an ortholog of dGalNAc-T1, was
originally identified in the course of a positional cloning approach to
isolate the DNA repair gene XRCC2 (31). A cDNA
clone (cDNA25L) containing a partial open reading frame with
sequence similarity to human polypeptide GalNAc-transferases was
found. Based on the sequence of cDNA25L, primers EBHC600
(5'-AGCGGATCCAACACACAACTCTTTGCGGCTG-3') and EBHC602
(5'-AGCGGATCCGACAAAAATGGGTTGTTGGGG-3') were designed and used in
a PCR strategy to screen a gastric
λgt11 cDNA library (MKN45)
using the rapid cDNA library screening procedure previously described (5). In brief, using a combination of
λ-vector primers and
EBHC600 or EBHC602, 20 aliquots derived from an MKN45 cDNA library
were screened by PCR. Three sublibraries (numbers 19, 7, and 13) were
found to contain clones with 2-, 1.5-, and 0.8-kbp inserts,
respectively. PCR products were blunt end-cloned and sequenced. The 3'
sequence contained an in-frame stop codon, and the 5' sequences
contained a potential open reading frame, including both a hydrophobic
transmembrane domain and an in-frame initiating methionine codon that
was downstream of a Kozak sequence. Using the human sequence as a query
for BLASTn analysis of the dbEST database, a single murine EST clone
(GenBank™ accession AA104499), predicted to represent an
orthologous isoform, was identified, and the full coding region was
sequenced. The murine open reading frame predicted a protein sequence
with 87% identity to the human protein, consistent with the identity observed between other rodent and human orthologs. The genomic organization of human GALNT11 was determined from a P1 clone DMPC-HFF#1-1321(E8)
(P114808) isolated from a human foreskin genomic library (DuPont Merck
Pharmaceutical Co. Human Foreskin Fibroblast P1 Library) using
the primer pair EBHC616 (5'-GCGAATTCAGTGAAGTGACTCAGCCAC-3')/EBHC619
(5'-CACGACTGTCTATCACATCGTC-3') for screening. The entire open reading
frame, as compiled from the cDNA sequence, was verified by
sequencing. During the course of this study, a human genomic PAC clone,
RP5-98107 (GenBank™ accession number AC006017), was
sequenced, and an open reading frame for a homologous putative
GalNAc-transferase was identified (GenBank™ accession
number AAD45821). This sequence is identical to the sequence reported
here as well as to a recently deposited cDNA sequence
(GenBank™ accession number AK025287) derived from colon as
part of a cDNA sequencing project.
Expression of dGalNAc-T1, dGalNAc-T2, and GalNAc-T11 in Insect
Cells--
An expression construct encoding amino acid residues
40-632 of dGalNAc-T1 was prepared by PCR using
D. melanogaster genomic DNA and the primer pair TSHC2
(5'-AGCGGATCCTCGCTCACAGCGCCTTGTCCG-3') and TSHC3 with BamHI
restriction sites. A dGalNAc-T2 expression construct
encoding amino acid residues 39-591 was prepared by PCR using
pCR4-CG6394 and the primer pair TSHC264
(5'-CAGCGGATCCAGGAGGCGTATCACACG-3') and TSHC265
(5'-GAGCGAATTCTTACCAGCGTGGGCGTATC-3') with BamHI and EcoRI restriction sites, respectively. Expression constructs
of the secreted forms (amino acids 32-608) of the putative human and
murine GALNT11 genes were prepared using the following
primer set for the human construct: EBHC629
(5'-GCGAATTCGTGAAGTGACTCAGCCACTTAAG-3') and EBHC631
(5'-GCGGAATTCCACCTTAACCTTCCAAATGC-3'); and the following for the murine
construct: EBHC711 (5'-GCGGAATTCGTGAAGTGACTCAGCCCCTTAGG-3') and EBHC710
(5'-GCGGAATTCAATTATGTTGTGTCCAGCCAGGG-3'). MKN45 mRNA was
used as template for human RT-PCR, whereas cDNA from the identified EST clone (GenBank™ accession number AA104499) was used as
template for the murine PCR. The products were cloned into the vector
pAcGP67 (PharMingen) and fully sequenced.
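The cloning primers listed above carry BamHI (GGATCC) or EcoRI (GAATTC) sites in their 5' extensions. A small sketch like the following, assuming only these two enzymes are of interest, can confirm that each primer contains the intended site and report its GC content. The helper names `find_sites` and `gc_content` are hypothetical and not part of the original protocol. Both recognition sequences are palindromic, so scanning the given strand is sufficient.

```python
# Recognition sequences of the two enzymes named in the text (both palindromic).
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}


def find_sites(primer):
    """Return {enzyme: [0-based start positions]} for sites in a 5'->3' primer."""
    seq = primer.upper()
    hits = {}
    for enzyme, site in SITES.items():
        starts = [i for i in range(len(seq) - len(site) + 1)
                  if seq[i:i + len(site)] == site]
        if starts:
            hits[enzyme] = starts
    return hits


def gc_content(primer):
    """Percentage of G + C bases in the primer."""
    seq = primer.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


PRIMERS = {
    "TSHC264": "CAGCGGATCCAGGAGGCGTATCACACG",
    "TSHC265": "GAGCGAATTCTTACCAGCGTGGGCGTATC",
    "EBHC629": "GCGAATTCGTGAAGTGACTCAGCCACTTAAG",
}
for name, seq in PRIMERS.items():
    print(name, find_sites(seq), f"{gc_content(seq):.1f}% GC")
```

The same check applied to the RACE primers TSHC254 and TSHC263 returns no sites, as expected for primers without restriction tails.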
Plasmids pAcGP67-dGalNAc-T1-sol,
pAcGP67-dGalNAc-T2-sol, and pAcGP67-GalNAc-T11-sol were
cotransfected with BaculoGold™ DNA (PharMingen), and recombinant
baculovirus was obtained after two successive amplifications in
Sf9 cells, as described previously (5).
Polypeptide GalNAc-transferase Assays--
Assays were performed
with media harvested 3 days after infection of Sf9 cells,
or, when indicated, with enzymes purified from infected High Five™ cells
grown in serum-free media (Invitrogen). Purification was
performed essentially as described previously (32), by consecutive
ion-exchange chromatography steps as described below. Enzyme protein
concentrations were estimated on Coomassie-stained SDS-PAGE gels by
comparison with bovine serum albumin and transferrin standards.
Enzyme activities were analyzed by two methods: (i) initial velocity
assays where consumption of substrates was limited to less than 10%
and (ii) a product development assay where the reactions were
exhaustive/terminal using two additions of enzyme and sugar nucleotides
in order to evaluate the final number of acceptor sites utilized by the
enzymes. The standard reaction mixture (50-µl final volume) contained 25 mM cacodylate (pH 7.4), 10 mM
MnCl2, 0.25% Triton X-100, 100-200 µM
UDP-[14C]GalNAc (2000 cpm/nmol) (Amersham Biosciences),
and 0.006-800 µM acceptor peptides. Products were
quantified by scintillation counting after chromatography on Dowex 1X8.
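Converting scintillation counts to product is simple arithmetic given the stated specific activity (2000 cpm/nmol) and 50-µl reaction volume. The sketch below also checks the initial-velocity criterion of less than 10% consumption of each substrate. Subtracting a control reaction without acceptor peptide is an assumption here, since background correction is not described in the text; the function names and the example counts are hypothetical.

```python
SPECIFIC_ACTIVITY_CPM_PER_NMOL = 2000.0  # UDP-[14C]GalNAc, as stated in the text
REACTION_VOLUME_UL = 50.0                # standard reaction volume, as stated


def nmol_transferred(sample_cpm, blank_cpm, cpm_per_nmol=SPECIFIC_ACTIVITY_CPM_PER_NMOL):
    """nmol GalNAc incorporated; blank_cpm is an assumed no-acceptor control."""
    return max(sample_cpm - blank_cpm, 0.0) / cpm_per_nmol


def fraction_consumed(product_nmol, substrate_uM, volume_ul=REACTION_VOLUME_UL):
    """Fraction of a substrate pool used (1 uM x 1 ul = 1 pmol = 0.001 nmol)."""
    pool_nmol = substrate_uM * volume_ul / 1000.0
    return product_nmol / pool_nmol


def is_initial_velocity(product_nmol, substrate_concs_uM, limit=0.10):
    """True if every listed substrate (donor and acceptor) is under 10% consumed."""
    return all(fraction_consumed(product_nmol, c) < limit for c in substrate_concs_uM)


# Hypothetical counts: 1000 cpm in the reaction, 200 cpm in the blank.
product = nmol_transferred(1000, 200)
print(product, fraction_consumed(product, 100), is_initial_velocity(product, [100, 200]))
```

At 100 µM UDP-GalNAc the donor pool in 50 µl is 5 nmol, so the 10% limit corresponds to 0.5 nmol (1000 cpm above background); with nanomolar acceptor concentrations the acceptor, not the donor, sets the limit.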
In assays designed to determine acceptor sites in peptides used by the
various isoforms, the reaction mixtures were modified to include 1.7 mM unlabeled UDP-GalNAc (Sigma). Reactions were incubated for
12 h, followed by the addition of 50% more enzyme and UDP-GalNAc
and a further 4-h incubation. Glycosylation, expressed as moles of
GalNAc incorporated per mole of peptide, was monitored at 2-4-h intervals by
MALDI-TOF mass spectrometry as previously described (33). Peptides were
synthesized in-house and by Neosystems (Strasbourg), and structures
were verified by amino acid analysis and mass spectrometry. GalNAc
glycopeptides were prepared as previously described using human
GalNAc-T1 and -T2 (7, 11). The kinetic properties were determined with partially purified enzymes (secreted forms) expressed in High Five™
cells. Partial purification was performed by consecutive chromatography
on Amberlite IRA-95, DEAE-Sephacryl, and SP-Sepharose essentially as
described (32). Because the purification scheme does not include affinity
chromatography, low-level endogenous GalNAc-transferase activity secreted by the
insect cells may be detected with some substrates.
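The text does not state how apparent Km and Vmax were derived from the initial-velocity data. One classical option, shown here purely as an assumption, is a linear least-squares fit of the Hanes-Woolf transform, S/v = S/Vmax + Km/Vmax; direct nonlinear fitting of the Michaelis-Menten equation is generally preferred for noisy data. The substrate and velocity values below are synthetic, generated from assumed parameters, and are not measurements from this study.

```python
def michaelis_menten(s, km, vmax):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, velocity):
    """Estimate (Km, Vmax) by least squares on S/v = S/Vmax + Km/Vmax."""
    xs = [float(s) for s in substrate]
    ys = [s / v for s, v in zip(xs, velocity)]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two substrate concentrations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("substrate concentrations must differ")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_x  # equals Km / Vmax
    vmax = 1.0 / slope
    return intercept * vmax, vmax


# Synthetic, noise-free data from assumed parameters (Km = 50 uM, Vmax = 2 units).
substrate_uM = [5, 10, 25, 50, 100, 200]
velocity = [michaelis_menten(s, 50.0, 2.0) for s in substrate_uM]
km, vmax = fit_hanes_woolf(substrate_uM, velocity)
print(f"Km = {km:.1f} uM, Vmax = {vmax:.2f}")
```

With noise-free input the fit recovers the assumed parameters exactly; with real data the transform distorts error weighting, which is why the choice of fitting method matters when comparing isoforms.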
Determination of Acceptor Sites Utilized by GalNAc-transferase
Isoforms--
MALDI mass spectrometry of glycosylated peptides was
performed on a Voyager-Elite MALDI time-of-flight mass spectrometer
(PerSeptive Biosystems Inc., Framingham, MA), equipped with delayed
extraction. The MALDI matrix was 2,5-dihydroxybenzoic acid 10 g/liter
(Aldrich) dissolved in a 2:1 mixture of 0.1% trifluoroacetic acid in
30% aqueous acetonitrile (Rathburn). Samples dissolved in 0.1%
trifluoroacetic acid to a concentration of ~2 pmol/ml were prepared
for analysis by placing 1 µl of sample solution on a probe tip
followed by 1 µl of matrix. All mass spectra were obtained in the
linear mode. Data processing was carried out using GRAM/386 software.
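Each GalNAc added to Ser or Thr increases the peptide mass by one anhydro-GalNAc residue (C8H13NO5; about 203.19 Da average, 203.079 Da monoisotopic). In linear-mode MALDI-TOF spectra, where average masses are read, the number of incorporated residues can be estimated from the mass shift. The sketch assumes singly protonated ions and an illustrative ±2-Da tolerance; the function name and the peptide mass in the example are hypothetical.

```python
GALNAC_RESIDUE_AVG_DA = 203.19  # average mass added per GalNAc (C8H13NO5)


def galnac_count(observed_mz, unmodified_mz, tolerance_da=2.0):
    """Estimate GalNAc residues from an [M+H]+ mass shift; None if it does not fit."""
    shift = observed_mz - unmodified_mz
    n = round(shift / GALNAC_RESIDUE_AVG_DA)
    if n < 0 or abs(shift - n * GALNAC_RESIDUE_AVG_DA) > tolerance_da:
        return None
    return n


# Hypothetical acceptor peptide with [M+H]+ = 2000.0 and four observed peaks.
for peak in (2000.0, 2203.2, 2609.6, 2100.0):
    print(peak, galnac_count(peak, 2000.0))
```

Tracking this count at the 2-4-h sampling points shows when an exhaustive reaction has stopped adding GalNAc; it does not reveal which Ser or Thr residues carry the sugar, which is why the ECD analysis described below was needed.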
Acceptor sites utilized were determined in MUC1-, MUC4-, and
EA2-derived peptides as follows: Glycopeptide products of exhaustive
reactions were purified by high pressure liquid chromatography and
dissolved in a water/methanol/acetic acid mixture (49:49:2, v/v/v) to a concentration of ~10⁻⁵ M. An aliquot of 4 µl was loaded into a precoated nanoelectrospray needle (MDS Protana,
Odense, Denmark). A 4.7-Tesla Ultima (Ionspec, Irvine, CA) Fourier
transform ion cyclotron resonance mass spectrometer was used to perform
electron capture dissociation. External accumulation in the hexapole of
the ESI source (Analytica, Branford, MA) was followed by gated
trapping. Selection of desired charge states (MS/MS) was made by
applying a preprogrammed waveform (34). For the glycosylated 60-mer
peptide, three charge states (6, 7, and 8+) were selected. For the
glycosylated TAP25 peptide, the most intense charge state (between 3+
and 5+) was selected. Selected cations were irradiated with <0.2-eV
electrons from a heated tungsten electron ionization filament for
9 s. Between 50 and 200 scans were accumulated.
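ECD cleaves backbone N-Cα bonds while leaving labile O-linked GalNAc attached, so a glycosylated Ser or Thr appears as an extra 203.079 Da between consecutive c ions. ECD yields no c ion for cleavage N-terminal to proline, which leaves gaps in the ladder. The sketch below builds a theoretical c-ion ladder for assumed sites and then localizes the sites back from it. It assumes deconvoluted, singly charged monoisotopic c-ion masses and a 0.02-Da tolerance, and it uses a MUC1 tandem-repeat-like sequence only as toy input; it illustrates the reasoning and is not the software used with the FT-ICR instrument. The function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the amino acids in the toy sequence.
RESIDUE = {"A": 71.03711, "D": 115.02694, "G": 57.02146, "H": 137.05891,
           "P": 97.05276, "R": 156.10111, "S": 87.03203, "T": 101.04768,
           "V": 99.06841}
GALNAC = 203.07937                  # monoisotopic mass added per O-GalNAc
C_ION_OFFSET = 17.02655 + 1.00728   # c ion = N-terminal residues + NH3 + H+


def c_ladder(seq, sites):
    """Theoretical singly charged c ions {i: m/z}; sites are 1-based glycosylated positions."""
    ladder, total = {}, 0.0
    for i in range(1, len(seq)):
        total += RESIDUE[seq[i - 1]] + (GALNAC if i in sites else 0.0)
        if seq[i] != "P":  # no c ion from cleavage N-terminal to proline
            ladder[i] = total + C_ION_OFFSET
    return ladder


def localize(seq, observed, tol_da=0.02):
    """Return (sites, ambiguous): sites pinned between adjacent c ions, plus
    (first residue, last residue, n GalNAc) ranges that span missing ions."""
    counts, prefix = {}, 0.0
    for i in range(1, len(seq)):
        prefix += RESIDUE[seq[i - 1]]
        if i in observed:
            extra = (observed[i] - C_ION_OFFSET - prefix) / GALNAC
            n = round(extra)
            if abs(extra - n) * GALNAC > tol_da:
                raise ValueError(f"c{i} does not fit an integer number of GalNAc")
            counts[i] = n
    sites, ambiguous, prev_i, prev_n = [], [], 0, 0
    for i in sorted(counts):
        step = counts[i] - prev_n
        if step == 1 and i - prev_i == 1:
            sites.append(i)
        elif step:
            ambiguous.append((prev_i + 1, i, step))
        prev_i, prev_n = i, counts[i]
    return sites, ambiguous


seq = "PDTRPAPGSTAPPAHGVTSA"  # toy input
print(localize(seq, c_ladder(seq, {3, 10, 19})))
```

When a c ion adjacent to a site is missing, the sketch reports a residue range rather than guessing, mirroring why several charge states and complementary z ions are useful in practice.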
Sequence Analysis of the l(2)35Aa Alleles--
The DNA sequence
of l(2)35Aa was determined for three mutant alleles and one
wild-type allele. Drosophila stocks with the EMS-induced
mutant alleles l(2)35Aa1 (AKA
l(2)35AaHG8), l(2)35Aa3 (AKA
l(2)35AaSF12), and l(2)35Aa4 (AKA
l(2)35AaSF32) (35) were crossed to a stock carrying
Df(2L)b84h1, a deletion that removes all of dGalNAc-T1 and
the surrounding genes. Because this gene is essential for adult survival,
we used a translocation of the Tubby gene to follow the
genotypes at the larval stage. Larvae carrying the mutant allele
over the deletion were identified at the third
larval instar and selected for sequencing. In the same way, a wild-type
allele of dGalNAc-T1 from another stock,
b1 Adhn4
l(2)35BfSF18, was isolated for sequencing. This
stock was chosen because it might still carry
the progenitor allele from which mutants l(2)35Aa3
were derived. The dGalNAc-T1 gene was amplified by PCR from
DNA extracts from each of these larvae and sequenced using BigDye™ fluorescent dideoxyterminator reactions according to the
manufacturer's protocol (ABI, Applied Biosystems). The primers
used for PCR and sequencing are GalN1 (5'-ATTGGTTTTTGCTTGGGCATC-3'),
GalN2 (5'-GTGATGCTAACGTTGGGTTGG-3'), GalN3
(5'-ATACCATGGCATTCGCACAAA-3'), GalN4 (5'-CGAATGCCATGGTATGCAAAT-3'), GalN5 (5'-GCAGGCTGTGCGAGTAGAAAA-3'), GalN6
(5'-CAGGATGGAGGATCGACGAAG-3'), GalN7a (5'-CACCACACCCAACAGTTCCAG-3'),
GalN8 (5'-CAGGCCTCCATAGTCATGTGC-3'), GalN9
(5'-GCGTTCCAGCACGGTCTTAAT-3'), GalN10 (5'-GTCAATCAGCAGTGGCTGGAG-3'), GalN11 (5'-AGTGGACTGGGCGTGTACTCA-3'), GalN12
(5'-TCCCGTGTCGGTCACATATTC-3'), GalN13 (5'-TCGTTGCGGAATATGTGACC-3'),
GalN14 (5'-CATCTTTCAGCCTTGGCACTC-3'), GalN15
(5'-TCCAGAAACCCTTCACCTTGG-3'), GalN16 (5'-AACGCCAACAGTCCCGTCTAC-3'), GalN17 (5'-CGTTGTTGGACTTGGAGCACAGAT-3'), GalN19
(5'-ACCCACCATTGGCCATTAATCA-3'), and GalN21
(5'-CCACCAGATAGTCGGTATTGAAA-3').
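Once each mutant allele has been sequenced alongside a wild-type allele, candidate lesions can be listed by a base-by-base comparison of the aligned coding sequences. EMS predominantly induces G:C to A:T transitions, so the sketch flags those separately and translates the affected codon with the standard genetic code. The short coding sequence in the example is hypothetical and is not the l(2)35Aa sequence; the helper names are illustrative.

```python
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, built in TCAG order ("*" = stop).
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
TRANSITIONS = {("G", "A"), ("A", "G"), ("C", "T"), ("T", "C")}
EMS_TYPE = {("G", "A"), ("C", "T")}  # G:C -> A:T, read on either strand


def point_changes(wild_type, mutant):
    """Compare aligned, equal-length coding sequences (A of ATG = position 1).

    Returns (nt position, wt base, mutant base, class, codon number, wt aa, mutant aa).
    Each change is translated in the wild-type codon context, so two changes in
    one codon are evaluated independently.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be aligned and of equal length")
    wt, mt = wild_type.upper(), mutant.upper()
    changes = []
    for pos, (w, m) in enumerate(zip(wt, mt), start=1):
        if w == m:
            continue
        if (w, m) in EMS_TYPE:
            kind = "G:C->A:T transition"
        elif (w, m) in TRANSITIONS:
            kind = "A:T->G:C transition"
        else:
            kind = "transversion"
        start = 3 * ((pos - 1) // 3)
        mt_codon = wt[start:pos - 1] + m + wt[pos:start + 3]
        changes.append((pos, w, m, kind, start // 3 + 1,
                        CODON_TABLE.get(wt[start:start + 3], "?"),
                        CODON_TABLE.get(mt_codon, "?")))
    return changes


print(point_changes("ATGTGGTAA", "ATGTGATAA"))
```

Here the single G to A change converts a Trp codon (TGG) into a stop codon (TGA), the kind of lesion expected from EMS; comparison against the putative progenitor allele helps separate induced mutations from background polymorphisms.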
Expression Pattern of l(2)35Aa--
Northern analysis was
performed with poly(A)+ mRNA isolated directly from
different hand-dissected imago tissues using oligo(dT)-linked Dynabeads
(Dynal). Five µg of mRNA were separated by electrophoresis in
formaldehyde gels, transferred to a nylon membrane (Hybond-N; Amersham
Biosciences), and cross-linked by ultraviolet light. The filter was
hybridized with
α-32P-labeled in vitro
transcripts of l(2)35Aa cDNA and exposed for autoradiography. Whole-mount in situ hybridization to
mRNA in ovarian tissue was performed following the protocol of D. Tautz and C. Pfeifle (36) with modifications by E. Knust and
K. H. Glätzer.
For in situ hybridizations to egg chambers, the protocols
were modified as follows. Ovaries were partially dissected into
ovarioles and fixed with heptane-saturated 4% formaldehyde in PBT
(0.2% Tween 20 in phosphate-buffered saline) and 10%
Me2SO for 20 min, rinsed with 90% methanol and 50 mM EGTA, and digested with proteinase K after several
washes in PBT. A digoxigenin-labeled (Roche Biochemicals) cDNA
fragment of l(2)35Aa encoding soluble dGalNAc-T1
(base pairs 118-1896) was used as a probe for hybridization.
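Residue ranges and nucleotide coordinates in these procedures are related by codon arithmetic: with the A of the initiating ATG as nucleotide 1, residues m to n are encoded by nucleotides 3(m - 1) + 1 to 3n. The 118-1896 probe thus corresponds to codons 40-632, the same range as the secreted dGalNAc-T1 expression construct, and the 1896-bp open reading frame corresponds to 632 codons, which suggests the reported length excludes the stop codon. The function names below are illustrative.

```python
def residues_to_nt(first_residue, last_residue):
    """1-based CDS coordinates (A of ATG = 1) encoding residues first..last, inclusive."""
    if not 1 <= first_residue <= last_residue:
        raise ValueError("invalid residue range")
    return 3 * (first_residue - 1) + 1, 3 * last_residue


def nt_to_residues(first_nt, last_nt):
    """Inverse mapping; both ends must fall on codon boundaries."""
    if (first_nt - 1) % 3 or last_nt % 3:
        raise ValueError("span is not codon-aligned")
    return (first_nt - 1) // 3 + 1, last_nt // 3


print(residues_to_nt(40, 632))    # secreted dGalNAc-T1 construct
print(residues_to_nt(39, 591))    # secreted dGalNAc-T2 construct
print(nt_to_residues(118, 1896))  # in situ hybridization probe
```

The dGalNAc-T2 construct (residues 39-591) ends at nucleotide 1773, the full length of the cloned CG6394 open reading frame.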
Cell and Tissue Expression of Human GalNAc-T11 by Northern
Analysis and Immunohistochemistry--
Multiple tissue Northern blots
were obtained from CLONTECH. A mouse multiple
tissue blot was prepared with 20 µg of total RNA isolated from C57B
mouse organs using the guanidinium thiocyanate procedure. Total RNA was
separated by 1% formaldehyde gel electrophoresis and transferred to
Hybond N+ membrane. The soluble cDNA expression
cassettes (nucleotides 91-1827, human and murine) were used as probes
on corresponding human and murine blots. Probes were random
prime-labeled using [α-32P]dCTP (Amersham Biosciences)
and an oligolabeling kit (Amersham Pharmacia Biotech). Blots were
probed as described previously (5) and washed 5 times at 42 °C with
2× SSC, 0.1% SDS; once with 0.5× SSC, 0.1% SDS; and once at
55 °C with 0.1× SSC, 0.1% SDS in a minihybridization oven (Hybaid).
A murine anti-human GalNAc-T11 monoclonal antibody (IgG1 isotype),
designated UH8 (1B2), was produced and characterized essentially as
described previously (37). Hybridomas were selected by immunocytology on air-dried, acetone-fixed Sf9 cells infected with baculovirus containing various secreted GalNAc-transferase constructs.
Immunocytology was performed with a series of human cancer cell lines
(A704, HT29, MKN45, HeLa, WI38, HL60, Colo205, AGS, A431, MiaPaca,
SUIT2, ASPC1, MTSV1, T47D, MCF7, and IMR32). Cells were grown to
subconfluence in the appropriate media as recommended by the
American Type Culture Collection. Cells were fixed in ice-cold acetone
for 10 min and then stored at
−70 °C before staining. Fresh-frozen
human tissue samples from skin (n = 2), oral mucosa
(n = 5), salivary glands (minor n = 4, submandibular n = 3, parotid n = 1),
small intestine (n = 1), colon (n = 2),
and kidney (n = 2) were obtained as previously described (37). The experiments were approved by the local Human Investigations Committee (in Denmark; J#KF 03-004/95), and the use of
mice was authorized by the Danish Animal Inspectorate. Processing of
frozen tissue sections and fluorescence immunohistology were performed
as previously described (37) and examined in a Zeiss fluorescence
microscope using epi-illumination.
RESULTS
The Drosophila Polypeptide GalNAc-transferase Family--
The
genome survey of D. melanogaster revealed 13 homologous
putative GalNAc-transferase genes and two related but apparently truncated genes. Individual members of the GalNAc-transferase gene
family are found on chromosomes X, 2, and 3 of the four
chromosomes contained in the Drosophila genome (Table
I). The algorithms used to
generate GadFly predict that coding regions for the identified genes
are organized in 2-8 exons. One gene, l(2)35Aa, is
contained in a single exon (30). Evaluation of ESTs derived from
libraries listed in Table I indicates expression of 12 of these genes
in cultured cell lines, at different stages of fly development, and in
adult flies. Expression of putative GalNAc-transferase genes not
represented in the EST data base was confirmed by RT-PCR analysis of
embryonal, larval, or adult D. melanogaster mRNA (data
not shown).
The putative reading frames of the Drosophila
GalNAc-transferase genes are predicted to encode proteins similar in size to
most mammalian and C. elegans GalNAc-transferases, ranging
from 557 to 667 amino acids (Fig. 1). All
members of the Drosophila GalNAc-transferase gene family are
predicted by the TMpred
algorithm to encode type II
transmembrane proteins. The predicted amino-terminal cytoplasmic
domains range from 6 amino acids (CG3254, CG8845a) to 20 amino acids (CG9152), and
the predicted transmembrane domains include a hydrophobic region of
17-22 amino acids near the N terminus (Fig. 1). All members contain a
ricin-like lectin domain of ~130 residues in the C-terminal region.
Conserved sequence motifs among Drosophila proteins resemble
those previously identified for mammalian (2, 28, 38) and C. elegans proteins (1). The central catalytic domain of
GalNAc-transferases was previously defined as containing the GT1 and
Gal/GalNAcT motifs (38), and a multiple sequence alignment of the
putative catalytic domains (ClustalX 1.8) of 13 Drosophila
GalNAc-transferases is shown in Fig. 2. An expanded alignment including the putative catalytic domains of
C. elegans and human GalNAc-transferases was analyzed by
protein distance methods, with statistical confidence assessed by
bootstrap analysis. A consensus tree generated by neighbor-joining and
least-squares algorithms is shown in Fig.
3. The tree is unrooted because no ancestral GalNAc-transferase
suitable as a phylogenetic outgroup has been identified.
The phylogram indicates 10 statistically significant clades of homologous GalNAc-transferases. The
majority of the identified clades contain at least one enzyme each from
fly, human, and worm, suggesting that there are essential functions
associated with the genes grouped in these clades. Groups of
GalNAc-transferases represented only in Drosophila (CG7304, CG7579, CG7297, CG10000) or human (GalNAc-T8 and -T9; GalNAc-T5) may
represent isoenzymes with unique species-specific functions (Fig. 3).
Additional uncharacterized mammalian GalNAc-transferases have been
identified that would increase the human family to 16-18 members, but
these are not included in the present analysis. The analysis clearly
indicates a horizontal evolutionary relationship among members of the
GalNAc-transferase genes of nematodes, fly, and mammals in distinct
subfamilies. The most likely explanation for this finding is that
individual subfamilies serve distinct functions. The existence of
multiple members in some subfamilies in a single species may represent
simple redundancy as well as further diversification of functions by
refinements in enzyme properties or differential regulation of
expression. Function and expression of members of a single
subfamily have so far been reported only for human GalNAc-T3 and -T6
(10), but similar findings have been made for at least one additional human
subfamily.

Fig. 1.
Domain structure of the D. melanogaster homologous putative polypeptide
GalNAc-transferases. A, schematic representation of
polypeptide GalNAc-transferase protein domains. CT,
cytoplasmic tail; TM, transmembrane region. B,
structure of the putative D. melanogaster polypeptide
GalNAc-transferase family members. Protein domains were predicted based
on the available cDNA or genomic DNA sequence. The proteins were
aligned at four conserved cysteine residues in the catalytic domains,
as shown in Fig. 2. Conserved cysteine residues (C) are
indicated by dotted lines. The position of the
DXH motif is indicated. Potential N-glycosylation
sites are indicated by trees. CLD and QXW repeats
of the ricin-like lectin domain are indicated by filled
circles and open squares. The N-terminal portion of
CG10000 with uncertain topology is depicted in dotted
form.

Fig. 2.
Multiple sequence alignment of the catalytic
domains of predicted D. melanogaster polypeptide
GalNAc-transferases. ClustalX 1.8 was used as a progressive
sequence alignment tool. Alignments were revised for maximal residue
conservation and minimal gaps in the conserved sequence motifs.
Introduced gaps are shown as hyphens, and aligned identical
residues are boxed (black for all 13 sequences,
dark gray for 10-12 sequences, and light gray
for eight or nine sequences). The positions of four conserved cysteine
residues are indicated by asterisks. The arrow
indicates the amino acid position affected by the
l(2)35AaSF32 mutation. Conserved GT1 and
Gal/GalNAcT domains are indicated above the
sequences.

Fig. 3.
Phylogram of polypeptide GalNAc-transferases
in D. melanogaster, H. sapiens,
and C. elegans. The consensus tree from protein
distance analyses of predicted catalytic GalNAc-transferase domains is
based on progressive sequence alignments similar to Fig. 2. Bootstrap
percentage values from 1000 replicates are indicated above
the nodes. Putative D. melanogaster polypeptide
GalNAc-transferases are indicated by their GadFly annotation.
GalNAc-T and GLY indicate polypeptide
GalNAc-transferases identified in humans and C. elegans,
respectively. Phylogenetic subfamilies are indicated by
background shading. Human GalNAc-T8 and -T9 have been
reported without confirmation of enzymatic functions and may hence not
represent genuine GalNAc-transferase genes (46, 47). The recently
reported rat GalNAc-T9, which was functionally characterized and shown
to display glycopeptide substrate specificity, has been redesignated
rGalNAc-T10 (12). GalNAc-T12* refers to a novel genuine human
polypeptide GalNAc-transferase (E. P. Bennett and H. Clausen,
unpublished results). The predicted CG2103 protein exhibits high amino
acid sequence similarity (60% identity in the catalytic domain; Fig.
2) to C. elegans GLY-10 and forms a strongly supported
phylogenetic group with CG8845a, CG8845b, and rat GalNAc-T10.
CG6394, encoding the dGalNAc-T2 protein, is closely
related to C. elegans GLY-7 and human GalNAc-T7, with
62% and 54% sequence identity, respectively, in the catalytic domain (1,
11). These orthologous GalNAc-transferases form a strongly supported
phylogenetic group and share similar enzymatic properties (Table III).
The CG6394 gene is organized in three exons and is the only
putative D. melanogaster GalNAc-transferase found on
chromosome X (Table I). CG10000, together with CG7304, CG7579, and
CG7297, forms a subgroup of putative GalNAc-transferases found only in
Drosophila. The CG8182 protein is most closely related to
the recently described C. elegans putative
GalNAc-transferase GLY-9 (1). The worm and fly proteins exhibit 52%
amino acid sequence identity in the catalytic domain. Among the
putative D. melanogaster GalNAc-transferases, CG9152 is most
closely related to the mammalian GalNAc-T1 (3). CG9152 shares 72%
amino acid sequence identity with the catalytic domain of human
GalNAc-T1 and forms a strongly supported clade with CG4445, GLY-3, and
GLY-6. Five of seven introns of the CG9152 locus are located
at positions equivalent to those in the human GALNT1 gene and four at
positions equivalent to those in gly-6, suggesting conservation from a
common ancestral gene (39). The putative GalNAc-transferase CG3254
exhibits close sequence similarity to human GalNAc-T2 (77% amino acid
identity in the catalytic domain). Lower but significant sequence
similarity is found to the C. elegans GalNAc-transferase
GLY-4 (1). CG3254, GalNAc-T2, and GLY-4 form a separate phylogenetic
cluster with maximal (100%) bootstrap support.
Two phylogenetic clades were selected for further studies in order to
test the hypothesis that orthologous relationships of subfamilies were
consistent with conserved unique functions. The lethal
l(2)35Aa gene was previously suggested to encode a
GalNAc-transferase and was chosen because of its importance in fly
development. The l(2)35Aa gene was predicted to be
orthologous to a single, novel, putative human GalNAc-transferase
gene, here designated GalNAc-T11. The CG6394 gene was
selected for analysis because the predicted orthologous human gene,
GalNAc-T7, has been extensively characterized to encode a
GalNAc-transferase with unique GalNAc glycopeptide substrate
specificity (9, 11).
GalNAc-transferase Subfamily dGalNAc-T1 (l(2)35Aa)/Human GalNAc-T11:
Isoforms with Unique Polypeptide Substrate Specificity--
The l(2)35Aa gene consists of a single exon (30), and both the cloned
cDNA and genomic DNA were found to contain an open reading frame of 1896 base pairs encoding a protein of 632 amino acids with
three potential N-linked glycosylation sites (Fig. 1).
TMpred predicted a type II domain structure with an
N-terminal cytoplasmic domain of 10 amino acids, a
transmembrane segment of 19 amino acids, and a stem region and
catalytic domain of 603 residues.
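The reported lengths are internally consistent: an ORF of 1896 bp corresponds to 1896/3 = 632 codons, and the predicted TMpred segments (10 + 19 + 603) also sum to 632 residues. The Python sketch below reproduces this check. It assumes, as the numbers here imply, that the reported ORF length maps one codon per residue (the stop codon is not counted separately); the helper names are hypothetical.

```python
def orf_to_protein_length(orf_bp: int) -> int:
    """Number of codons in an ORF, assuming one residue per codon (assumed convention)."""
    if orf_bp % 3 != 0:
        raise ValueError(f"ORF length {orf_bp} bp is not a multiple of 3")
    return orf_bp // 3


def type_ii_total(cytoplasmic: int, transmembrane: int, stem_and_catalytic: int) -> int:
    """Total length of a type II membrane protein from its predicted segments."""
    return cytoplasmic + transmembrane + stem_and_catalytic


# dGalNAc-T1 (l(2)35Aa) values reported in the text
orf_residues = orf_to_protein_length(1896)
segment_sum = type_ii_total(10, 19, 603)
print(orf_residues, segment_sum, orf_residues == segment_sum)
```

The same check can be applied to any ORF and TMpred prediction reported in this format.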
Human GALNT11 was identified during positional cloning of
the human DNA repair gene XRCC2 on chromosome 7q36.1
and was subsequently identified by the human genome project as a
predicted gene with sequence similarity to GalNAc-transferases. In the
study presented here, a partial human cDNA sequence was obtained
from the gastric carcinoma cell line MKN45. The sequence was confirmed
by sequencing of P1 clones covering the entire coding sequence.
Sequencing of a P1 clone revealed that the coding region of the
GALNT11 gene was contained in 10 exons, and the intron/exon
boundaries identified were identical to the boundaries predicted from
sequencing of a human genomic PAC clone RP5-98107
(GenBankTM accession number AC006017). GALNT11
is localized at 7q36.1 between D7S2450 and D7S550 (148.6-180.8
centimorgans) and is represented in EST cluster Hs.97056. Comparison of the
positions of the nine intron/exon boundaries in the coding region of
GALNT11 with those of other human GalNAc-transferase genes showed that
the positions of four boundaries are conserved among
GALNT1, -T2, -T3, and -T6,
and that two additional boundaries are conserved only in GALNT3
and -T6 (10, 39). The coding region of human GalNAc-T11
predicts a type II transmembrane protein of 608 amino acids with two
potential N-linked glycosylation sites
(GenBankTM accession number Y12434). The putative
hydrophobic transmembrane signal sequence spans ~20 residues and is flanked by
only one charged residue at its N-terminal border. Its
C-terminal border is flanked by a putative N-glycosylation site
(residue 29), and a second site is found in the putative catalytic
domain (residue 428).
Murine Galnt11 (GenBankTM accession number Y12435) encodes
a protein of the same size as the human protein. Both
N-glycosylation sites are conserved, and an additional
putative N-glycosylation site is found close to the
conserved site in the catalytic domain (residue 423). The
N-glycosylation site at residue 428 is also found in
GalNAc-T5. The amino acid sequence identity between murine and human
GalNAc-T11 is 87%, which is consistent with that of other rodent-human
orthologs. Murine Galnt11 maps to mouse chromosome 5, in a region
syntenic with human 7q36.
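Pairwise identities such as the 87% quoted here are computed from an alignment as the fraction of aligned positions carrying the same residue. Conventions differ (for example, whether gap columns count in the denominator), and the text does not state which was used. The sketch below assumes identical positions divided by columns in which neither sequence has a gap; the sequences in the example are short hypothetical fragments, not GalNAc-T11 sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over ungapped columns of a pairwise alignment.

    Assumed convention: identical columns / columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gap columns are excluded under this convention
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments (illustration only): 6 of 8 ungapped columns match
print(percent_identity("MKLV-SWTR", "MKIVQSWTK"))
```

Counting gap columns in the denominator instead would give a lower value (6/9, about 67%) for the same alignment, which is why the convention matters when comparing reported identities.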
Comparison of sequences of human and murine GalNAc-T11 with
dGalNAc-T1 reveals a few common characteristics.
dGalNAc-T1 encodes a protein of 632 amino acids, which is 24 amino acids longer than the mammalian sequences, and most of the
difference is due to an extended C-terminal end.
dGalNAc-T1 has three potential N-linked glycosylation sites, but these are not conserved in the mammalian proteins. The
sequences of both GalNAc-T11 and dGalNAc-T1 differ
significantly from those of other reported mammalian GalNAc-transferases in that
they contain an insert of ~18-20 residues immediately N-terminal to the ricin-like
lectin domain found at the C-terminal ends of most
GalNAc-transferases. The lectin domain of ~130 amino acids has been shown
to be functional in at least one isoform, GalNAc-T4, where it directs
GalNAc glycopeptide substrate specificity (33). The
GalNAc-transferase lectin domain is characterized by six cysteine
residues with three -CL(D/E)- motifs and three -QXW- motifs.
The first -CLD- motif of GalNAc-T4 was previously shown to be important
for lectin-mediated glycopeptide specificity: substitution of
-CLD459- with -CLH459- abolished the lectin-mediated function without
affecting the general catalytic function of GalNAc-T4 (33). The three
-CL(D/E)- motif positions in the GalNAc-T11 lectin domain are occupied by
-CLV495-, -CLD538-, and -CLR580-,
where the first motif, -CLV495-, corresponds to
-CLD459- of GalNAc-T4. dGalNAc-T1 has similar motifs at the
corresponding positions of its lectin domain (-CAA495-,
-CLE541-, and -CLR582-). The first two QXW motifs are conserved in
both proteins, whereas only mammalian GalNAc-T11 has the third
QXW motif. These patterns of CL(D/E) and QXW motifs
differ from those of other GalNAc-transferase isoforms, which may suggest that the lectin domains of GalNAc-T11 and dGalNAc-T1 have
functions different from that previously established for GalNAc-T4
(33).
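The motif comparison above amounts to scanning each lectin domain for -CL(D/E)- and -QXW- patterns. A minimal Python sketch using regular expressions is shown below. It finds only exact matches to the canonical patterns, so degenerate variants at the corresponding positions (such as -CLV- or -CAA-) would be missed and would have to be located by alignment instead. The input is a hypothetical fragment, not a real GalNAc-transferase sequence.

```python
import re

# Canonical lectin-domain motifs described in the text
MOTIFS = {
    "CL(D/E)": r"CL[DE]",
    "QXW": r"Q.W",
}


def find_motifs(seq: str) -> dict[str, list[tuple[int, str]]]:
    """Return 1-based start positions and matched text for each motif.

    A zero-width lookahead is used so that overlapping matches are also reported.
    """
    seq = seq.upper()
    hits: dict[str, list[tuple[int, str]]] = {}
    for name, pattern in MOTIFS.items():
        hits[name] = [
            (m.start() + 1, m.group(1))
            for m in re.finditer(f"(?=({pattern}))", seq)
        ]
    return hits


# Hypothetical fragment (illustration only)
fragment = "GSCLDNKQAWGYCLEPRQQWW"
print(find_motifs(fragment))
```

The lookahead matters for QXW: in "QQWW" both "QQW" and "QWW" qualify, and a plain `finditer` would report only the first.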
Expression of secreted constructs of dGalNAc-T1 and
human/murine GalNAc-T11 in Sf9 cells resulted in
GalNAc-transferase activity in the culture medium of infected cells
that was significantly above background values for uninfected cells or
cells infected with irrelevant constructs (not shown).
dGalNAc-T1 and human GalNAc-T11 were purified to near
homogeneity and analyzed in further detail. The purified enzymes
exhibited strict donor substrate specificity for UDP-GalNAc and did not
utilize UDP-galactose, UDP-N-acetylglucosamine, UDP-glucose,
or UDP-xylose. Purified dGalNAc-T1 exhibited an apparent Km of 8.5 ± 1.2 µM for
UDP-GalNAc (EA2 acceptor peptide at 500 µM) and an
apparent Km of 0.35 ± 0.13 mM for EA2 (UDP-GalNAc at 100 µM). These
values are comparable to, but lower than, those determined for human
GalNAc-T11 (apparent Km of 25.3 ± 1.6 µM
for UDP-GalNAc and 0.67 ± 0.11 mM for the EA2 peptide substrate). Neither dGalNAc-T1 nor human GalNAc-T11 exhibited
activity with the GalNAc glycopeptides GalNAc4TAP25 and
GalNAc1-2EA2 (not shown).
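To put these apparent Km values in context, the simple Michaelis-Menten relation v/Vmax = [S]/(Km + [S]) gives the fractional saturation expected at a given substrate concentration. The sketch below applies it to the reported Km values, using 100 µM UDP-GalNAc (the donor concentration used for the EA2 Km determination) and 200 µM EA2 (the peptide concentration in the Table II assays). It assumes single-substrate Michaelis-Menten behavior with the other substrate held fixed, which is a simplification for a two-substrate enzyme; the results are illustrative, not measured.

```python
def fractional_saturation(substrate_uM: float, km_uM: float) -> float:
    """v / Vmax for simple Michaelis-Menten kinetics (an assumed model)."""
    if substrate_uM < 0 or km_uM <= 0:
        raise ValueError("concentrations must be non-negative and Km positive")
    return substrate_uM / (km_uM + substrate_uM)


# Apparent Km values from the text, converted to µM
KM_UM = {
    "dGalNAc-T1": {"UDP-GalNAc": 8.5, "EA2": 350.0},
    "GalNAc-T11": {"UDP-GalNAc": 25.3, "EA2": 670.0},
}

for enzyme, km in KM_UM.items():
    donor = fractional_saturation(100.0, km["UDP-GalNAc"])  # UDP-GalNAc at 100 µM
    acceptor = fractional_saturation(200.0, km["EA2"])      # EA2 at 200 µM
    print(f"{enzyme}: UDP-GalNAc {donor:.2f}, EA2 {acceptor:.2f}")
```

With these inputs the sketch prints 0.92 and 0.36 for dGalNAc-T1 and 0.80 and 0.23 for GalNAc-T11, i.e. under this model neither enzyme is saturated with EA2 at 200 µM.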
Comparative analyses of the activities of dGalNAc-T1
and human GalNAc-T11 are presented in Table
II. The data in Table II were obtained
with partially purified enzyme preparations for which the specific
enzyme protein concentrations could not be reliably quantified, so
assays were normalized to activity with the EA2 substrate at 200 µM. In subsequent preparations, enzyme protein was quantified by
Coomassie-stained SDS-PAGE against protein standards, and specific
activities with two substrates, EA2 and Muc4.1, were determined under the
same assay conditions as in Table II (200 µM peptide).
dGalNAc-T1 and human GalNAc-T11 showed nearly identical relative activity profiles with the panel of peptide substrates tested.
dGalNAc-T1 exhibited specific activities of 1.9 units/mg with EA2 and 0.27 units/mg with Muc4.1, whereas GalNAc-T11 showed 0.88 units/mg with EA2 and 0.48 units/mg with Muc4.1. The differences in specific
activity with EA2 are minor and correlate with the differences in
apparent Km for this peptide. All of the acceptor
substrates identified for the two enzymes contain multiple potential
acceptor sites and are derived from mucin tandem repeats. Because peptides
derived from mucin tandem repeat sequences often serve as
substrates for multiple GalNAc-transferase isoforms, it is frequently
necessary to identify the actual acceptor sites used by
individual isoforms in order to detect differences in substrate
specificity. The MUC1 tandem repeat sequence has been particularly useful in
characterizing distinct functions of GalNAc-transferase isoforms. Several GalNAc-transferases, albeit with different kinetics, utilize three of the five potential sites
(HGVT*SAPDTRPAPGS*T*APPA;
asterisks indicate GalNAc attachment sites) (32), whereas GalNAc-T4 is the only isoform identified that can utilize the two remaining acceptor
sites (7). Importantly, extensive analyses with peptide substrates of different designs and
lengths, up to 200-mer peptides, have clearly
established that site specificity is determined mainly by the primary
peptide sequence.
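The asterisk notation used for acceptor sites can be parsed mechanically to list the potential (Ser/Thr) and occupied sites of a peptide. The Python sketch below does this for the MUC1 tandem repeat annotation given above; the function name and the 1-based numbering within the shown sequence are choices made for illustration, not part of the original analysis.

```python
def parse_sites(annotated: str) -> tuple[str, list[int], list[int]]:
    """Parse a peptide in which '*' follows each GalNAc-modified residue.

    Returns the bare sequence, 1-based positions of all Ser/Thr residues
    (potential acceptor sites), and 1-based positions marked with '*'.
    """
    residues: list[str] = []
    marked: list[int] = []
    for ch in annotated:
        if ch == "*":
            if not residues:
                raise ValueError("'*' must follow a residue")
            marked.append(len(residues))  # position of the preceding residue
        else:
            residues.append(ch.upper())
    seq = "".join(residues)
    potential = [i + 1 for i, aa in enumerate(seq) if aa in "ST"]
    return seq, potential, marked


seq, potential, marked = parse_sites("HGVT*SAPDTRPAPGS*T*APPA")
print(seq, potential, marked)
```

For the MUC1 repeat this prints `HGVTSAPDTRPAPGSTAPPA [4, 5, 9, 15, 16] [4, 15, 16]`, i.e. five potential sites, three of them used. Applied to the MUC4 21-mer CPLPVTDTSSASTGHAT*PLPV, the same function reports seven potential sites and a single marked site at position 17.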
Detailed comparative analysis of the activities of human GalNAc-T11 and
dGalNAc-T1 with MUC1 tandem repeat peptides revealed a
preference for the short peptide design Muc1a over Muc1b, which is
similar to the relative activities observed with GalNAc-T1 and -T3/6
but different from those of GalNAc-T2 (32). Structural analysis of terminal
products formed with 25- and 60-mer MUC1 peptides revealed that both
GalNAc-T11 and dGalNAc-T1 incorporated only 2 mol of GalNAc
residues/mol of tandem repeat sequence (Fig.
4, Table I), and the sites used
were identified as the Thr residues in -VTSA- and -GSTA-
(HGVT*SAPDTRPAPGST*APPA) (Table II). GalNAc-T11 and dGalNAc-T1 are the only
GalNAc-transferase isoforms characterized to date that exhibit this
pattern of acceptor site specificity with MUC1 peptides, which may
therefore be regarded as a distinguishing feature common to both.

Fig. 4.
MALDI-TOF analysis of terminal glycosylation
reactions of MUC1 tandem repeat derived 25-mer peptide substrate
(TAP25) with human GalNAc-T11 and dGalNAc-T1.
A, unglycosylated TAP25 peptide; B, TAP25
glycosylated by GalNAc-T11; C, TAP25 glycosylated by
GalNAc-T11 and then GalNAc-T2; D, TAP25 glycosylated by
dGalNAc-T1; E, TAP25 glycosylated by
dGalNAc-T1 and then GalNAc-T2; F, TAP25
glycosylated by dGalNAc-T1 followed by GalNAc-T11.
Asterisks indicate sites of quantitative transfer of GalNAc
residues, determined as described under "Experimental Procedures."
GalNAc-T11 and dGalNAc-T1 attach only 2 mol of GalNAc even
in combined reactions. The products formed with both GalNAc-T11 and
dGalNAc-T1 can be further glycosylated by GalNAc-T2,
previously shown to utilize four sites in this peptide design
(32).
Another striking example of unique substrate specificity was the
finding that human (and murine) GalNAc-T11 and dGalNAc-T1 incorporated only one GalNAc residue into the 21-mer peptide derived from
the MUC4 tandem repeat sequence, which contains a total of seven
potential O-glycosylation sites
(CPLPVTDTSSASTGHAT*PLPV) (Fig. 5). Human GalNAc-T1 incorporated
three residues into this substrate, and the other GalNAc-transferase
isoforms also attached multiple residues; their sites of incorporation
were not determined. Using a series of overlapping
shorter peptides, the acceptor site for both GalNAc-T11 and
dGalNAc-T1 was localized to the C-terminal region represented by the
peptide design Muc4.3. Structural analysis revealed that, with both
peptide designs, the acceptor site used by both enzymes was the
Thr in -ATP- (TGHAT*PLPV).
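In the MALDI-TOF analyses, the number of incorporated GalNAc residues is read from mass shifts: each GalNAc adds one N-acetylhexosamine residue (monoisotopic mass about 203.0794 Da) to the peptide. The sketch below computes the neutral monoisotopic mass of the Muc4.1 sequence and of its glycoforms from standard residue masses. It assumes an unmodified peptide (free Cys, free termini, no adducts) and reports neutral rather than protonated masses, so absolute values may not match the observed spectra; the mass differences are the informative part.

```python
# Standard monoisotopic amino acid residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once for the free N and C termini
HEXNAC = 203.07937  # one GalNAc (N-acetylhexosamine) residue per O-glycan


def peptide_mass(seq: str, n_galnac: int = 0) -> float:
    """Neutral monoisotopic mass of an unmodified peptide carrying n GalNAc residues."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER + n_galnac * HEXNAC


muc41 = "CPLPVTDTSSASTGHATPLPV"  # Muc4.1 sequence from the text
for n in range(4):
    print(n, round(peptide_mass(muc41, n), 2))
```

Under these assumptions, a product carrying a single GalNAc, as observed with GalNAc-T11 and dGalNAc-T1, sits about 203 Da above the unglycosylated peptide, whereas the three residues added by GalNAc-T1 correspond to a shift of about 609 Da.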

Fig. 5.
MALDI-TOF analysis of terminal glycosylation
reactions of MUC4 tandem repeat derived 21-mer peptide substrate
(Muc4.1) with human GalNAc-T11 and dGalNAc-T1.
A, unglycosylated Muc4.1 peptide; B, Muc4.1
glycosylated by GalNAc-T11; C, Muc4.1 glycosylated by
dGalNAc-T1; D, Muc4.1 glycosylated by GalNAc-T1.
Asterisks in parentheses indicate sites of partial transfer
of GalNAc residues, determined as described under "Experimental
Procedures."
Because human GALNT11 was found to be selectively expressed
in kidney, we analyzed a peptide substrate derived from erythropoietin, which is produced in kidney and contains a single
O-glycosylation site (Ser126). Peptide
substrates containing this site have been found to be rather
inefficient in vitro substrates for GalNAc-transferases (32,
40), so it remains possible that a distinct GalNAc-transferase with particularly favorable kinetic properties for the erythropoietin substrate exists. As shown in Table II, GalNAc-T11 and
dGalNAc-T1 had no detectable activity with the Ser-containing
erythropoietin peptide substrate and, like other GalNAc-transferases
characterized to date, exhibited specificity for the Thr-substituted
sequence. Thus, despite its remarkably selective
expression in kidney, GalNAc-T11 does not appear to be
involved in glycosylation of erythropoietin. Similar to other
GalNAc-transferases (32), GalNAc-T11 and dGalNAc-T1 can
transfer GalNAc to adjacent Thr residues: characterization of the two
sites used in the EA2 peptide derived from rat submaxillary gland
mucin (Table II) showed that GalNAc was incorporated into adjacent Thr
residues
(PTTDST*T*PAPTTK).
The findings that GalNAc-T11 and dGalNAc-T1 exhibited
identical activities with the panel of peptides evaluated here and that they produced the same unique glycoform products with three substrates (MUC1, Muc4.1, and EA2) clearly indicate that they possess essentially identical catalytic properties. Importantly, the panel of peptides used in this study is more extensive than those previously
used to distinguish the catalytic properties of other human
GalNAc-transferase isoforms (10). The analysis does not provide
specific information that could be used to predict, e.g., a
shared acceptor sequence motif; we conclude only that a unique and
distinguishable property of the catalytic function has been conserved.
GalNAc-transferase Subfamily dGalNAc-T2 (CG6394)/Human GalNAc-T7:
Isoforms with Unique GalNAc Glycopeptide Substrate
Specificity--
The CG6394 gene was cloned from
embryonic Drosophila cDNA and found to contain an open
reading frame of 1773 base pairs encoding a protein of 591 amino acids
with two potential N-linked glycosylation sites (Fig. 1).
TMpred predicted a type II domain structure with an
N-terminal cytoplasmic domain of 10 amino acids, a
transmembrane segment of 19 amino acids, and a stem region and catalytic domain of 562 residues. The lectin domains of human GalNAc-T7
and dGalNAc-T2 share the same CLD (first and third) and
QXW (third) motifs, which are nearly identical to the
patterns found in GalNAc-T4 and rat GalNAc-T10
(GalNAc-T10 also has the second QXW motif). Both GalNAc-T4
and GalNAc-T10 exhibit GalNAc glycopeptide substrate specificity;
however, several other isoforms with polypeptide substrate specificity
(GalNAc-T1 and -T2) also share this pattern of motifs in the lectin
domain.
Recombinant dGalNAc-T2 exhibited no activity with the panel
of peptides listed in Table II, including EA2; however, significant enzymatic activity was found with the GalNAc-EA2
glycopeptide (Table III). So far,
three mammalian GalNAc-transferase isoforms, GalNAc-T4 (7), -T7 (9,
11), and -T10 (12), have been shown to function selectively or
exclusively with GalNAc glycopeptides, and the available data suggest that
the substrate specificities of the three isoforms differ, as is the case
for different isoforms acting on polypeptide substrates. Thus, the activities of human GalNAc-T4 and -T7
were shown to be distinguishable using a panel of GalNAc glycopeptides
derived from the tandem repeats of human MUC1, MUC2, MUC5AC, and MUC7 and of rat submandibular mucin (EA2 peptide) (Table III) (11). GalNAc-T4 characteristically utilizes GalNAc-MUC1 and GalNAc-MUC5AC but not GalNAc-EA2. In contrast, GalNAc-T7 selectively utilizes GalNAc-EA2 and shows no activity with the two substrates used
by GalNAc-T4. Both isoforms are active with GalNAc-MUC2. The glycopeptide
activities of rat GalNAc-T7 and -T10 are less well
characterized, but they were also shown to be distinguishable using a
single glycopeptide derived from the tandem repeat of human MUC5AC
(12). As shown in Table III, dGalNAc-T2 exhibited exactly the same pattern of activity with the panel of GalNAc
glycopeptides previously used to define the specificity of human GalNAc-T7,
and dGalNAc-T2 and human GalNAc-T7 are thus clearly distinguishable from GalNAc-T4. Rat GalNAc-T7 and -T10 are active with a different peptide design of the MUC5AC tandem repeat (GTTPSPVPTTSTTSAP); however, neither dGalNAc-T2 nor human
GalNAc-T7 shows activity with the GalNAc-MUC5AC peptide design used in
this study (Table III). The results thus clearly indicate that
dGalNAc-T2 and human GalNAc-T7 possess essentially identical
catalytic properties that, with present knowledge, are distinguishable
from those of other isoforms.
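The argument that dGalNAc-T2 and human GalNAc-T7 share catalytic properties rests on comparing activity/no-activity patterns across the same glycopeptide panel. The sketch below encodes only the qualitative patterns stated in this paragraph (GalNAc-MUC7 is omitted because its outcome is not described here) and counts the substrates on which two enzymes disagree. The dGalNAc-T2 entries for GalNAc-MUC1 and GalNAc-MUC2 are inferred from the statement that its pattern matches that of GalNAc-T7; the actual values are in Table III.

```python
# Qualitative activity (True = active) with GalNAc glycopeptides, as described in the text
PROFILES = {
    "GalNAc-T4": {"GalNAc-MUC1": True, "GalNAc-MUC5AC": True,
                  "GalNAc-EA2": False, "GalNAc-MUC2": True},
    "GalNAc-T7": {"GalNAc-MUC1": False, "GalNAc-MUC5AC": False,
                  "GalNAc-EA2": True, "GalNAc-MUC2": True},
    # Inferred from "exactly the same pattern" as human GalNAc-T7
    "dGalNAc-T2": {"GalNAc-MUC1": False, "GalNAc-MUC5AC": False,
                   "GalNAc-EA2": True, "GalNAc-MUC2": True},
}


def profile_distance(a: dict[str, bool], b: dict[str, bool]) -> int:
    """Number of shared substrates on which two activity profiles disagree."""
    shared = a.keys() & b.keys()
    return sum(a[s] != b[s] for s in shared)


for other in ("GalNAc-T7", "GalNAc-T4"):
    d = profile_distance(PROFILES["dGalNAc-T2"], PROFILES[other])
    print(f"dGalNAc-T2 vs {other}: {d} mismatches")
```

This prints 0 mismatches against GalNAc-T7 and 3 against GalNAc-T4 (GalNAc-MUC1, GalNAc-MUC5AC, and GalNAc-EA2). A binary profile like this ignores relative rates, so it can only separate isoforms whose on/off patterns differ.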
The finding that dGalNAc-T2, as predicted, exhibited GalNAc
glycopeptide substrate specificity indicates that the lectin
domain-mediated glycopeptide specificity of some GalNAc-transferase
isoforms arose early in evolution. Recently, a homologous GalNAc-transferase
gene in the parasite Toxoplasma gondii was also
shown to encode a GalNAc-transferase with GalNAc glycopeptide
substrate specificity (41).
Lethal l(2)35Aa Alleles Have Inactivating Mutations in the
dGalNAc-T1 Coding Region--
Previously, Flores and Engels (25)
searched for the gene responsible for the lethal phenotype associated with
l(2)35Aa and demonstrated that a 5-kb rescue fragment
contained an open reading frame encoding a protein with similarity to
polypeptide GalNAc-transferases. Analysis of the 5-kb rescue fragment
reveals only one conventional open reading frame that predicts a
protein with similarity to known proteins and to which ESTs have been
mapped (Fig. 6). A group of 5' ESTs from
a single library (pOTB7 AT library from Drosophila adult
male testes and seminal vesicles) was recently described that could
originate from a potential open reading frame on the minus strand, which
partly overlaps and is complementary to dGalNAc-T1 on the
plus strand. However, this unannotated open reading frame has the
unusual feature of ~1000 bp in frame before the first ATG, and the
group of 5' ESTs that could originate from the
dGalNAc-T1 complementary strand may be inverted and instead
represent 3' ESTs from the 3'-untranslated region of
dGalNAc-T1. To further substantiate that the
dGalNAc-T1 open reading frame underlies the
l(2)35Aa phenotype, three lethal alleles were sequenced
(Fig. 6). Two of these contained single-nucleotide nonsense mutations
(l(2)35AaHG8 and l(2)35AaSF12),
introducing stop codons at codons 89 and 195 of the dGalNAc-T1
open reading frame, respectively; these truncations eliminate the majority of the catalytic domain and are hence predicted to inactivate
the enzyme. If assigned to the hypothetical minus strand open reading
frame, the C265→T mutation of
l(2)35AaHG8 is silent, and the T593→A transversion in the l(2)35AaSF12 allele may cause
an amino acid change from Gln143 to Leu. The third
allele (l(2)35AaSF32) contains a single transition
(C679→T), which again is silent in the potential
reading frame on the minus strand but leads to an amino acid change in
a highly conserved sequence of the GT1 motif of the catalytic domain of dGalNAc-T1 (IRSR227WVIG) on the plus strand (Fig.
6). The positively charged residue Arg227 is conserved in all putative fly GalNAc-transferases (Fig. 2). In human
GalNAc-transferase genes, Arg or His is found at this position, but
the subfamily pair GalNAc-T3 and -T6 has an uncharged residue (Val) at
this position. Although the nonconservative substitution of Arg227 by Trp in l(2)35AaSF32 is likely to be detrimental to
activity, this remains to be confirmed experimentally.
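Because nucleotide positions are numbered from the A of the ATG start codon (A1TG), the affected codon and the position within it follow directly from integer division. The Python sketch below applies this to the plus-strand positions discussed above; it is a simple arithmetic aid with a hypothetical helper name, and its output should be checked against Fig. 6 rather than treated as an independent result.

```python
def codon_of(nt_position: int) -> tuple[int, int]:
    """Map a 1-based nucleotide position (A of ATG = 1) to (codon number, position in codon 1-3)."""
    if nt_position < 1:
        raise ValueError("positions are 1-based")
    index = nt_position - 1
    return index // 3 + 1, index % 3 + 1


# Plus-strand (dGalNAc-T1) positions discussed in the text
for label, pos in [("HG8 C265T", 265), ("SF32 C679T", 679)]:
    codon, offset = codon_of(pos)
    print(f"{label}: codon {codon}, position {offset}")
```

Both changes fall at the first position of their codons (codons 89 and 227). Under the standard genetic code, a C→T change at a first codon position can create a stop codon only from CAA, CAG, or CGA (giving TAA, TAG, or TGA), consistent with the nonsense mutation at codon 89, and an Arg-to-Trp change by this route implies CGG→TGG at codon 227.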

Fig. 6.
Molecular analysis of the l(2)35Aa
locus. The genomic region of chromosome 2 around
l(2)35Aa is shown. Coding and noncoding exon sequences are
indicated as filled boxes and open boxes,
respectively. Nucleotide sequences of the mutant alleles are shown with
the amino acid sequence in single-letter codes
above (plus strand) or below (minus strand) the
sequence. The hypothetical gene on the minus strand is depicted in
dotted form. Nucleotide positions are indicated
relative to A1TG of the respective open reading frame.
A large number of sequence polymorphisms were detected during
sequencing of l(2)35Aa alleles (Table
IV). Notably, the
majority (14 of 15) of the single-nucleotide polymorphisms identified
are synonymous in the dGalNAc-T1 reading frame, whereas the
same polymorphisms cause amino acid changes in the
hypothetical reading frame on the minus strand. Seven of the
polymorphisms are synonymous changes in dGalNAc-T1 codons for amino
acids that have only two alternative codons. Of particular interest is T1011→C,
which alters the putative translational start codon of the hypothetical minus strand open reading frame (i.e. A1TG→G1TG). In contrast, T1011→C is a
synonymous change in the His337 codon of the
dGalNAc-T1 (plus strand) reading frame (Table IV). These findings further support the conclusion that the hypothetical complementary reading frame on the minus strand does not represent a functional gene.
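Whether a single-nucleotide change is synonymous can be checked by translating the reference and variant codons with the standard genetic code. The sketch below builds the code from the conventional 64-letter string in TCAG order and tests the T1011→C example: position 1011 falls at the third position of codon 337, and since His is encoded only by CAT and CAC, a T at that position implies a CAT codon. The SF32 case (CGG→TGG) is included for contrast; reference codons for the other polymorphisms are not given in this section and would have to be taken from Table IV.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order; '*' = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(ref_codon: str, position: int, alt_base: str) -> bool:
    """True if replacing the base at `position` (1-3) leaves the encoded amino acid unchanged."""
    ref_codon = ref_codon.upper()
    if not 1 <= position <= 3:
        raise ValueError("position must be 1, 2, or 3")
    alt_codon = ref_codon[: position - 1] + alt_base.upper() + ref_codon[position:]
    return CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]


# T1011C: third position of the His337 codon (CAT -> CAC)
print(is_synonymous("CAT", 3, "C"))
# For contrast, the SF32 change at codon 227 (CGG -> TGG, Arg -> Trp)
print(is_synonymous("CGG", 1, "T"))
```

The first call prints `True` and the second `False`. The same function applied to the minus-strand codons would show which polymorphisms change the hypothetical protein.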
Analysis of Expression Pattern of dGalNAc-T1 (l(2)35Aa)--
ESTs
derived from l(2)35Aa were identified in adult testis, embryo,
and Schneider L2 cell cDNA libraries. Northern blot analyses of
D. melanogaster mRNA from adult flies revealed a single
transcript of l(2)35Aa of ~2.2 kilobases. Higher
expression levels of l(2)35Aa were detected in adult female
flies than in males (data not shown). Detailed Northern
analyses revealed that female flies expressed high l(2)35Aa
mRNA levels in the ovary. Lower transcript levels were detected in
testes and in the male and female carcass (Fig.
7A). Northern analysis of
mRNA obtained from unfertilized eggs and embryos revealed a strong
maternal contribution to the l(2)35Aa transcript pool.
A comparison of mRNA levels in unfertilized eggs with
expression levels in early embryos revealed only a marginal increase in
l(2)35Aa expression after fertilization (Fig.
7B).

Fig. 7.
Expression analysis of
l(2)35Aa. A, Northern blot analysis of
poly(A)+ RNA from different female and male adult D. melanogaster tissues. B, maternal contribution to
l(2)35Aa expression in early embryos. Blots were probed with
32P-labeled antisense in vitro transcripts of
l(2)35Aa (base pairs 91-1896) corresponding to the soluble
dGalNAc-T1 expression construct. Expression of the ribosomal
rpL9 gene is indicated as a control (48).
Expression Pattern of l(2)35Aa during Oogenesis--
Analysis of
l(2)35Aa mRNA expression during oogenesis was performed
by whole-mount in situ hybridization of wild-type adult ovarian tissue. Expression of l(2)35Aa was detected in germ
cells and follicle epithelia at all developmental stages (Fig.
8, A and B).
l(2)35Aa expression begins early in oogenesis, in region I of the
germarium, and reaches high levels in regions IIa and IIb (Fig. 8B). High expression was detected in
stage 2 egg chambers, and transcript levels remained high during later stages of oogenesis (s2 to s8 in Fig.
8A).

Fig. 8.
Whole mount in situ
hybridization of dGalNAc-T1 cDNA
l(2)35Aa to wild-type ovarian tissues.
A, ubiquitous expression of l(2)35Aa along the
anterior-posterior axis of different stage egg chambers. High
expression levels of l(2)35Aa are detected by a
digoxigenin-labeled antisense probe in various cell types of the
germarium (G), and early (s2) to late stage
follicles (s6, s8) throughout the vitellarium
(V). B, detail showing l(2)35Aa
expression in cytologically distinct regions I, IIa, IIb, and III of
the germarium. Onset of expression is detected in early germarial
region I.
Tissue Distribution of Human and Murine GalNAc-T11--
The
remarkable phenotype associated with inactivation of
dGalNAc-T1 in the fruit fly may reflect either a unique
function of the protein or a requirement for the protein in a
specific cell type in which other GalNAc-transferases capable of the
same functions are not expressed. Studies of
in vivo functions of GalNAc-transferase isoforms are lacking
because of the inherent difficulty of determining the complete repertoire of
isoforms and their specificities in a given cell, and it is presently not
possible to address the first possibility. We therefore evaluated the
expression pattern of human and murine GalNAc-T11. The tissue origins of
ESTs in databases can give an indication of the expression pattern of a
gene, and they suggest that GalNAc-T11 is widely expressed in diverse tissues of
both humans and mice. ESTs of the human Hs.97056 cluster were derived
from a large variety of tissue sources, including brain, germ cell,
kidney, ovary, pancreas, placenta, prostate, stomach, uterus, whole
embryo, breast, lung, nervous tissue, and skin. ESTs of the mouse
Mm.19390 cluster were derived from a more
restricted set of organs, including lymph node, kidney, liver, muscle,
lung, mammary glands, skin, whole fetus (19.5 days postcoitum),
and whole embryo (13.5/14.5 days postcoitum). Northern blots
with mRNA from 12 human adult organs showed that the GalNAc-T11 probe
hybridized to a single mRNA of ~3 kb, predominantly in kidney
(Fig. 9A). Lower levels of
mRNA were observed in brain, heart, and skeletal muscle, whereas
other organs, including placenta, yielded very weak signals. A survey of
mRNA from 76 different human tissues indicated that GalNAc-T11 was
very weakly expressed in most organs, including testis and ovary, and
highly expressed only in kidney (data not shown). Northern
analysis of adult murine organs revealed strong expression in kidney,
in agreement with the analysis of human GalNAc-T11 (Fig.
9B), and very weak expression in most other organs.
Some differences between humans and mice were thus observed in brain
and heart, where higher levels were found in humans. Further analysis
of mRNA expression in a panel of human cell lines showed high
expression in a number of cell lines of breast, pancreatic, and colonic
origin (data not shown).

Fig. 9.
Northern blot analysis of human and murine
tissues. A, multiple human Northern blot (human 12-lane
blot; CLONTECH) probed with human GalNAc-T11;
B, Northern blot analysis of total RNA isolated from murine
(C57 black) organs probed with murine GalNAc-T11; C,
ethidium bromide gel stain of murine RNA gel.
Because of the unique expression pattern of GalNAc-T11, we chose
to generate a monoclonal antibody to the human enzyme. One mAb, UH8
(1B2; IgG1), reacted specifically with GalNAc-T11 and showed
no cross-reactivity with any other GalNAc-transferase isoform tested
(human GalNAc-T1, -T2, -T3, -T4, and -T6), a general finding
we have observed with mAbs to polypeptide GalNAc-transferase isoforms
(7, 10, 37). To date, we have made isoform-specific monoclonal
antibodies to six of the human GalNAc-transferases. Interestingly,
recent studies with domain swapping among different isoforms revealed
that most of the antibodies react with the lectin
domains.7
A preliminary immunohistological analysis of a limited number of human
tissues corroborated the Northern analysis and revealed high expression
only in kidney. The staining pattern of six human GalNAc-transferase
isoforms in the kidney cortex is shown in Fig. 10. GalNAc-T1 was expressed in
glomeruli and weakly in tubules (Fig. 10A). GalNAc-T2 in
contrast was strongly expressed both in tubules and glomeruli (Fig.
10B). GalNAc-T3 weakly labeled a few tubules (Fig.
10C), whereas GalNAc-T4 and -T6 showed no staining (Fig. 10,
D and E). GalNAc-T11 was strongly expressed in
tubules similarly to GalNAc-T2; however, no labeling was observed in
glomeruli (Fig. 10F). The morphology of the frozen kidney
sections does not allow detailed evaluation of subcellular
localization, but the intracellular granular staining pattern observed
with all antibodies is consistent with Golgi localization. Very
weak expression of GalNAc-T11 was found in all cell layers of skin and
in the lower cell layers in buccal mucosa (not shown). Previously, we
found marked changes in the GalNAc-transferase repertoire (GalNAc-T1, -T2, and -T3) in different cell layers of the stratified squamous epithelium of buccal mucosa (37). No staining with anti-GalNAc-T11 was
observed in colon, small intestine (not shown), or several salivary
glands (Fig. 11, B and
C).