Expression and regulation of the human and mouse aspartylglucosaminidase gene.

Aspartylglucosaminidase (AGA) is a lysosomal enzyme that catalyzes one of the final steps in the degradation of N-linked glycoproteins. Here we have analyzed the tissue-specific expression and regulation of the human and mouse AGA genes. We isolated and characterized human and mouse AGA 5'-flanking sequences including the promoter regions. Primer extension assay revealed multiple transcription start sites in both genes, characteristic of a housekeeping gene. The cross-species comparison studies pinpointed an approximately 450-base pair (bp) homologous region in the distal promoter. In the functional analysis of human AGA 5' sequence, the critical promoter region was defined, and an additional upstream region of 181 bp exhibiting an inhibitory effect on transcription was identified. Footprinting and gel shift assays indicated protein binding to the core promoter region consisting of two Sp1 binding sites, which were sufficient to produce basal promoter activity in the functional studies. The results also suggested the binding of a previously uncharacterized transcription factor to a 23-bp stretch in the inhibitory region.

Aspartylglucosaminidase (AGA, EC 3.5.1.26) is a lysosomal hydrolase that catalyzes the cleavage of the N-glycosidic bond between asparagine and N-acetylglucosamine in the degradation of glycoproteins (1). Deficiency of the enzyme leads to an autosomal recessively inherited lysosomal storage disorder, aspartylglucosaminuria (AGU) (2). The human AGA gene has been assigned to chromosome 4q34-35, corresponding to mouse syntenic region 8B, where the mouse gene is located (3). The cDNAs of both species, each encoding a 346-amino acid AGA polypeptide, have been cloned previously, and the genomic structures of the genes have been resolved (3-6). The 1041-bp coding regions are 84% homologous. Northern hybridization analysis of human control fibroblasts has demonstrated the presence of two mRNA species of 2.2 and 1.4 kb, which arise from the use of alternative polyadenylation signals. In mouse liver, only one, even shorter, transcript has been found (3).
AGA is a ubiquitous enzyme widely distributed in mammalian tissues (7). The three-dimensional structure of human AGA has been resolved by crystallography (8). The mature enzyme was shown to be a heterotetramer representing the only known eukaryotic member of the recently described enzyme family of N-terminal hydrolases (9). Furthermore, its intracellular synthesis, assembly, and catalytic function have been well established (10-12). However, only preliminary data exist on the expression of AGA enzyme in normal tissues and in the cells of AGU patients (7). Despite the housekeeping nature of the enzyme, some variation in the expression of AGA protein and in specific AGA activity has been observed between tissues: leukocyte homogenate and liver exhibit the highest levels of AGA activity, whereas brain tissue and fibroblasts display only 10% or less of the AGA activity detected in leukocytes. The distribution of AGA polypeptides has been shown to be similar in tissues from control individuals and AGU patients, with the exception of brain samples. No trace of AGA protein has been detected in the cerebral cortex of AGU patients; this finding is in agreement with the clinical phenotype of AGU, in which the most severe symptoms are due to dysfunction of the central nervous system.
The present study was undertaken to investigate the function and regulation of expression of the AGA gene. We present for the first time data on the expression of AGA mRNA in various human and mouse tissues and show that both of the differentially polyadenylated human mRNAs are translated into a polypeptide. We have also characterized the promoter region of the human AGA gene and performed comparison studies with the mouse AGA 5' sequence. Following characterization of the 5' sequence, we located the areas responsible for transcriptional activity by analyzing serial deletions of the human 5'-flanking sequence in a reporter construct. The binding sites for the trans-acting regulatory proteins were evaluated employing the DNase I footprinting assay and gel-shift method.

Isolation and Analysis of Human and Mouse Genomic Clones-A PCR-amplified DNA fragment containing the first exon and the 5'-untranslated region of the AGA gene together with the AGA cDNA were used as 32P-labeled probes to screen a human placenta genomic lambda phage library (Stratagene). As a result, a DNA clone containing 400 bp of the first intron of AGA and extending 12 kb upstream was isolated. A 4.8-kb PstI fragment from the 3' end of the genomic clone was subcloned into pGEM3Zf(+) vector (Promega) and sequenced from both strands. The 5' sequence of the mouse AGA gene was previously cloned by us (3). Sequence analysis and comparison studies were carried out with a GCG computer program using Compare, Dotplot, or Bestfit. Putative binding sites for transcription factors were identified using Findpatterns and a Tfsites GCG file created by Dr. David Ghosh from a publicly accessible transcription factor database.
* This work was supported by the Academy of Finland, the Hjelt Foundation, the Rinnekoti Research Foundation and the Sigrid Juselius Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequences reported in this paper have been submitted to the GenBank/EBI Data Bank with accession numbers U82618 for the human and U82617 for the mouse.
RNA Analysis-Northern blot analysis was carried out by using commercially available human and mouse poly(A)+ RNA membranes (Clontech). The blots were hybridized with a 32P-labeled human or mouse AGA cDNA and β-actin cDNA (Clontech). To determine the 5' ends of human and mouse AGA transcripts, total RNA was isolated by the guanidine thiocyanate/CsCl method from cultured fibroblasts of normal human individuals and from normal mouse liver tissue as described previously (13). Primer extension of 15 µg of mouse liver and human fibroblast total RNA was performed with 32P-end-labeled oligonucleotides complementary to the human AGA gene region nt −138 to −169 (relative to ATG) and to the mouse AGA gene region nt +33 to +1, as described (14).
Cell Culture and Transfections-HeLa, N18 glioblastoma, and COS-1 cells were grown for 24 h in Dulbecco's modified Eagle's medium supplemented with 10% fetal calf serum and antibiotics. The 85% confluent cells were transfected with 5 µg of plasmid DNA using Lipofectin reagent (Life Technologies, Inc.) as described previously (15).
Assay of the AGA Promoter Activity-HeLa and glioblastoma cells were transfected with the reporter plasmid constructs. A pXGH5 (Nichols Institute) plasmid containing the mouse metallothionein-I promoter fused to hGH structural sequences was used as a control. After a 48-h incubation, aliquots of media were collected and assayed for hGH protein in duplicate by using a commercially available radioimmunoassay (Nichols Institute). Dot blot hybridization analysis with a 32P-labeled reporter plasmid was used to monitor the differences in transfection efficiencies (17).
Preparation of Nuclear Extracts-Nuclear extracts from HeLa cells were prepared as described previously (18). Protein concentration was determined according to Bradford (19). A commercially available HeLa cell nuclear extract (Promega) was used in some assays.
DNase I Footprinting-The probes for DNase I footprinting analysis were prepared essentially as described previously (20). Appropriate genomic regions of the AGA promoter DNA were PCR amplified and subcloned into pGEM3Zf(+) vector (Promega). The construct AGA(−2 to +279)hGH was used for probe FP1. The regions of the human AGA promoter analyzed in DNase footprinting were FP1 (−2 to +279), FP2 (−247 to +58), and FP3 (−471 to −199), numbered relative to the major transcription start site. Both template and sense strand probes were analyzed (only reactions with the sense strand are shown). DNase I footprint assays with 5-20 µg of HeLa cell nuclear extract and 1 footprinting unit of Sp1 protein were carried out essentially as described (20, 21). When Sp1 protein was used, nonspecific competitor poly(dI-dC) DNA was not added. The digestion was performed with 0.1-0.6 units of DNase (Promega).

Analysis of Human and Mouse AGA mRNA in Various Tissues-To study AGA mRNA expression in diverse tissues, Northern blot analysis of ten different human and mouse tissues was performed using commercially available multitissue membranes (Fig. 1). AGA mRNA was detected in all human and mouse tissues studied except mouse brain and spleen, where mRNA levels were virtually undetectable. In human brain, only the longer transcript was expressed. Since low enzyme levels have been detected in brain (7), we further explored whether the longer 2.2-kb mRNA is translated into a polypeptide. The polyadenylation signals for the shorter mRNA were destroyed by site-directed mutagenesis, and the mutant construct coding for only the longer 2.2-kb mRNA was expressed in vitro in COS-1 cells. A shorter construct containing AGA cDNA was used as a control. Immunoprecipitation analysis demonstrated
Determination of Transcription Start Sites of the Human and Mouse AGA Genes-Accurate mapping of the 5' ends of the human and mouse AGA genes was accomplished by using primer extension (Fig. 2). In the human AGA gene, one major transcription start site at −298 (relative to the ATG translation start codon) and two minor sites at −286 and −395 were detected. The initiation of transcription in the mouse AGA gene was scattered over a larger region. The results displayed multiple transcription start sites between nucleotides −70 and −142 (relative to ATG). No major transcription start site was present.
Isolation and Characterization of the 5' Clones of the Human and Mouse AGA Genes-To isolate the 5' region of the human AGA gene, a human genomic phage library was screened using PCR-amplified genomic and cDNA fragments of AGA as probes. Finally, a 4.8-kb fragment upstream from the first intron of the AGA gene was subcloned into a plasmid vector and sequenced to produce 3.9 kb of novel 5' AGA sequence. The mouse AGA gene together with its 5'-flanking region has been recently cloned (3). Here we have sequenced a total of 1000 bp of the 5' upstream region of the mouse AGA gene. A computerized analysis (GCG program) of the human AGA sequence revealed two complete Alu repeats, one direct and one inverted (data not shown). The sequence homologies of the repeats to the Alu consensus sequence were 82% and 88%, respectively. The GC content of the human AGA 5'-untranslated region was determined to be 58%, while in the coding region of AGA it was 46%. The GC contents of the mouse AGA 5'-untranslated region and coding region were 61% and 47%, respectively.
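The GC contents quoted above are simple base-composition percentages. A minimal Python sketch of that calculation follows; the function name `gc_content` and the toy input are illustrative assumptions and are not taken from the GCG package or from the AGA sequences.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the result.
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the call.
    print(f"{gc_content('ATGCGC'):.1f}% GC")
```

Applied to the deposited sequences (accession numbers U82618 and U82617), a function of this kind would be expected to give values close to those reported, although the exact region boundaries used by the authors are not stated here.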
The alignment of human and mouse 5'-flanking sequences by the GCG computer program demonstrated 58.2% homology (Fig. 3A). Subsequent comparison by two different programs, linear sequence and dot matrix analyses, displayed a region of highest homology covering 442 bp from nt −475 to −916 (relative to the ATG translation start codon) in the human and 453 bp from nt −550 to −1002 in the mouse AGA gene (Fig. 3, A and B). The sequence identity in this particular region was 76.5%. Unexpectedly, approximately 500 bp of the human and mouse AGA genes immediately upstream of the translation initiation site were significantly less homologous than the sequence further upstream. To ascertain that this was not due to a cloning artifact, the human and mouse AGA 5' regions were PCR amplified and sequenced from genomic DNA. No changes as compared with the genomic clones could be detected (data not shown). More detailed analysis of the promoter sequences revealed several putative binding sites for transcription factors, which are indicated in Fig. 3A.
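The region lengths quoted above follow from counting both end coordinates, and the identity figure is a ratio of matching alignment columns. The sketch below illustrates both calculations in Python. The helper names are hypothetical, the aligned strings are toy data, and the choice to skip gapped columns is an assumption, since the exact scoring used by the GCG programs is not described here.

```python
def span_length(start: int, end: int) -> int:
    """Length in bp of an inclusive coordinate range, in either orientation."""
    return abs(end - start) + 1


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical columns between two aligned sequences of equal length.

    Columns with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    print(span_length(-475, -916))    # human region quoted in the text
    print(span_length(-550, -1002))   # mouse region quoted in the text
    # Hypothetical toy alignment, not AGA sequence.
    print(percent_identity("ACGT-A", "ACCTTA"))
```

With the coordinates given in the text, the inclusive spans come to 442 bp and 453 bp, matching the reported lengths.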
FIG. 3 (legend, partial). ... sequence of the highest homology and is marked by bold lettering. Putative sites for transcription factor binding found in both the human and mouse sequences are bordered by a box. Binding site motifs found only in either of the sequences are indicated by brackets above (human) or below (mouse) the sequences. Putative binding motifs indicated are TATA (45) ...
Functional Analysis of Human AGA Promoter Region-To define the regions accounting for transcriptional activity, seven deletion constructs consisting of variable lengths of the 5' region of the human AGA gene were produced (Fig. 4). The fragments including putative regulatory elements were inserted into a promoterless hGH reporter plasmid. HeLa and glioblastoma cells were transiently transfected with the fusion genes, and the transcriptional efficiency of each construct was determined by measuring the amount of hGH secreted into the culture medium. The highest transcriptional efficiencies were obtained with constructs AGA(−143)hGH in HeLa cells and AGA(+156)hGH in glioblastoma cells (Fig. 4). In HeLa cells, a deletion extending to nt +232 completely abolished the transcriptional activity. The construct AGA(+156)hGH, containing three putative Sp1 binding sites, restored 36% activity, while the construct containing 143 bp upstream of the transcription initiation site was sufficient to produce the highest promoter activity. The activity observed with AGA(−322)hGH was only 22%, suggesting that the region spanning nt −322 to −143 may bind a negatively acting transcription factor (Figs. 3A and 4). This region overlaps with the area of highest homology between the human and mouse sequences (Fig. 3C). In glioblastoma cells, the inhibitory effect was milder and detected over a relatively larger area extending from nt −474 to −143.
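As a rough illustration of how the deletion-construct results can be compared, the Python sketch below takes the HeLa percentages quoted in the text and computes the fold change between two constructs. It assumes, as the text implies but does not state outright, that each percentage is expressed relative to the most active construct, AGA(−143)hGH. The dictionary `hela_activity` and the helper `fold_reduction` are illustrative names only.

```python
# Relative promoter activities in HeLa cells as quoted in the text (percent).
# Assumption: values are normalized to the most active construct, AGA(-143)hGH.
hela_activity = {
    "AGA(+232)hGH": 0.0,    # activity completely abolished
    "AGA(+156)hGH": 36.0,   # basal activity
    "AGA(-143)hGH": 100.0,  # highest activity
    "AGA(-322)hGH": 22.0,   # includes the inhibitory region
}


def fold_reduction(reference: float, test: float) -> float:
    """Return how many times lower `test` is than `reference`."""
    if test <= 0:
        raise ValueError("test activity must be positive to compute a fold change")
    return reference / test


if __name__ == "__main__":
    fold = fold_reduction(hela_activity["AGA(-143)hGH"],
                          hela_activity["AGA(-322)hGH"])
    print(f"Adding nt -322 to -143 lowers activity about {fold:.1f}-fold")
```

Under that normalization assumption, including the region from nt −322 to −143 corresponds to a reduction of roughly 4.5-fold in HeLa cells.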
Protein Binding Elements of the AGA Promoter-To determine whether the differences observed in the deletion analysis were related to the actual binding of nuclear proteins, three fragments, FP1-FP3, from the 5'-flanking region of the human AGA gene were analyzed by DNase I footprinting assays using purified Sp1 protein or a nuclear protein extract prepared from HeLa cells (Fig. 5, A-D). The locations of the protected fragments were determined from adjacent dideoxy sequencing reactions. With probe FP1, a protected region from nt +214 to +240 was detected using the Sp1 protein (Fig. 5B). This region contains two overlapping Sp1 consensus binding sites (Fig. 3A). With HeLa cell nuclear extract, the footprint was seen in a more restricted region. With probe FP2, no protected regions were detected. Probe FP3, overlapping the inhibitory region identified in the functional analysis, revealed a protected area from nt −321 to −292 (Fig. 5D).
Binding of nuclear proteins to the protected regions was further assessed by gel retardation assays. A 20-bp double-stranded oligonucleotide, nt +207 to +226 (5'-GGGCGCCAGGCGGGCGGGGC), containing two Sp1 binding sites and corresponding to a region protected in footprinting analysis with FP1, was analyzed with purified human Sp1 protein. The results show formation of a specific complex, which completely disappears in the presence of a 100-fold molar excess of an unlabeled Sp1 consensus oligonucleotide (Fig. 6A). Analysis of the protected region detected with probe FP3 in the inhibitory region, using a 23-bp double-stranded oligonucleotide, Inh, nt −322 to −300 (5'-TAGGCCGTTTCTGTTTTTCTTCC), and HeLa cell nuclear extract also revealed one DNA-protein complex (Fig. 6B). In competition assays with an unlabeled Inh oligonucleotide, a gradual decrease in the intensity of the complex is seen as the concentration of the oligonucleotide is increased. In contrast, the intensity of the complex remains unaltered with increasing amounts of an unrelated competitor. This distinct difference between the assays with a self-competitor and an unrelated competitor suggests binding of proteins to this particular area, but no precise consensus motifs for known factors could be identified by computer analysis.
DISCUSSION
The human and mouse AGA genes were found to be expressed in diverse tissues, consistent with the housekeeping role of the enzyme. In mouse brain and spleen, the AGA mRNA was virtually undetectable as judged by the steady-state mRNA levels. However, we have previously shown that AGA-specific mRNA is also present in mouse brain (3). Northern hybridization of the human brain RNA visualized only the longer AGA transcript, which we observed to produce polypeptide as well. To further evaluate this finding, the precise half-lives of the two forms of mRNA should be analyzed.
The transcription start sites of the human and mouse genes were also quite characteristic of a housekeeping gene; multiple start sites were detected. In mouse, however, start site utilization is less well defined. This could imply that the regulation of AGA has gained more importance during evolution and needs to be more strictly controlled in human.
To analyze the regulation of AGA expression, we isolated the 5'-flanking region of the human AGA gene and compared it with the recently cloned mouse AGA 5' sequence (3). The human sequence contained an unusually high number of Alu repeats (23), which might be involved in sequence rearrangements. The results of the comparison studies of human and mouse 5' sequences were quite surprising; no significant sequence homology was detected up to 500 bp upstream of the translation initiation codon. Similarly, no conserved proximal promoter elements could be identified. Nevertheless, both human and mouse AGA 5' regions are relatively GC-rich, containing several putative Sp1 binding sites. In the human AGA promoter, no TATA box relative to the major transcription start site is present, suggesting that the gene is regulated by a housekeeping-type promoter. There is, however, a TATA-like sequence at −28 from one of the minor start sites, but it is probably nonfunctional, since the region was not protected in the footprinting analysis. Conventionally, housekeeping genes involved in the metabolic functions of the cell are considered to be GC-rich and to lack a TATA box (24, 25). Many genes encoding lysosomal enzymes fulfill these criteria (26-32), but the human glucocerebrosidase, mouse β-hexosaminidase Hexb, and murine β-glucuronidase genes do have TATA elements (31, 33, 34). The lysosomal cathepsin D gene contains a mixed promoter, which has features of a housekeeping gene as well as a TATA box that is functional when the gene is under estrogen regulation (35).
A number of TATA-less genes have been reported to contain initiator elements (Inr) for determination of the transcription initiation site. A loose sequence consensus, 5'-YYCAYYYYY-3' (Y is pyrimidine), for these elements was noticed several years ago (36). Smale and Baltimore (37) further restricted the consensus to 5'-CTCANTCT-3' (transcription initiation at A) in the murine terminal deoxynucleotide transferase promoter. Two other types of initiators, YY1, binding a consensus sequence 5'-AANATGGN(G/C)-3' (38, 39), and E2F, which binds the sequence 5'-TTTCGCGC-3' in the dihydrofolate reductase promoter, have also been identified (40-42). In two genes coding for lysosomal enzymes, human β-glucuronidase and mouse HEXA, Inr sequence homologies have been detected (19, 26). Some homology is seen in the sequence at the human AGA major transcription initiation site (5'-TTCCCAATAT-3', initiation at the second T) as well, but the transcription initiation takes place at T instead of A. The presence of the major transcription initiation site in the human AGA gene would justify the existence of an Inr element, but it would have a somewhat modified consensus sequence.
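One way to put a number on the partial homology described above is to slide a consensus along the sequence and count the positions that satisfy it. The Python sketch below does this for the two Inr consensus sequences and the human AGA start-site sequence quoted in the text. The scoring scheme (one point per matching position, no gaps, earliest window wins ties) and the helper names are assumptions made for illustration and are not the authors' method.

```python
# Subset of IUPAC nucleotide codes needed for the consensus sequences in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}


def count_matches(consensus: str, window: str) -> int:
    """Number of positions where the base in `window` satisfies the consensus code."""
    return sum(1 for code, base in zip(consensus, window) if base in IUPAC[code])


def best_window(consensus: str, seq: str) -> tuple[int, int]:
    """Return (start index, match count) of the best ungapped window in `seq`.

    Ties are resolved in favour of the earliest window.
    """
    seq = seq.upper()
    n = len(consensus)
    if len(seq) < n:
        raise ValueError("sequence is shorter than the consensus")
    scores = [(count_matches(consensus, seq[i:i + n]), i)
              for i in range(len(seq) - n + 1)]
    top = max(score for score, _ in scores)
    start = min(i for score, i in scores if score == top)
    return start, top


if __name__ == "__main__":
    aga_site = "TTCCCAATAT"  # human AGA major start-site sequence from the text
    for consensus in ("YYCAYYYYY", "CTCANTCT"):
        start, matches = best_window(consensus, aga_site)
        print(consensus, "best window at index", start,
              "with", matches, "of", len(consensus), "positions matching")
```

With this simple count, the stricter 8-nt consensus aligns best where its CA dinucleotide coincides with the CA in the AGA sequence, matching 6 of 8 positions, which is in line with the "some homology" noted in the text.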
The functional analyses of the human AGA gene demonstrated that the first 145 bp upstream of the translation initiation site were sufficient to produce the highest promoter activity in glioblastoma cells. In HeLa cells, this region, which contains three putative Sp1 binding sites, exhibited 36% activity; it can be considered a basal promoter region, since its activity clearly exceeded (1.5-fold, data not shown) that of the pXGH5 control plasmid (see "Materials and Methods"). However, in HeLa cells, an additional factor binding upstream seems to be required for the highest promoter activity, which is achieved with construct AGA(−143)hGH. The activity observed with construct AGA(−2)hGH is not significantly lower either, implying that the putative binding site for AP-2 could be responsible for this enhanced activity (Fig. 4). Additional consensus sites for CAAT box, AP-1, and Sp1 found in the upstream sequence may be contributing factors. Moreover, the analyses pointed out a 181-bp region displaying a strong inhibitory effect on reporter expression in HeLa cells. This region maps to the 3' end of the human-mouse homologous sequence, suggesting that this stretch of DNA may play an important role in the regulation of AGA. In glioblastoma cells, a weaker inhibitory effect over a larger region was detected. It can be speculated that the expression of AGA is kept low under normal conditions and that only in certain situations, when the enzyme is needed in higher amounts, does the inhibitory control decline, leading to enhanced AGA gene transcription.
Footprinting and gel-shift assays demonstrated binding of Sp1 protein to the same region that was sufficient in the functional analyses to provide the highest promoter activity in glioblastoma cells and basal activity in HeLa cells. Pugh and Tjian have concluded that the same set of basic initiation factors is required in the presence and absence of a TATA sequence, and that Sp1 acts to recruit TFIID to TATA-less promoters (43, 44). In the inhibitory region of AGA, a protected area was identified in the footprinting analysis, and the binding of protein(s) was further supported by gel-shift assays. In the competition assays, the protein was not competed off as completely as in the Sp1 assays, most probably due to the more complex composition of the HeLa nuclear extract. Future identification and purification of the bound protein(s) is a prerequisite for detailed characterization of the inhibitory interaction. Moreover, the detection of only a few protected regions in the footprinting assays may be due to weak DNA-protein interactions rather than to their complete absence.
In conclusion, the human aspartylglucosaminidase gene appears to be regulated by a core promoter consisting of two functionally important Sp1 binding sites and, possibly, an additional contributing AP-2 site. Moreover, a more distantly located region exerting inhibitory control on gene expression was detected. Subsequent studies in neuron cultures and in the AGU knock-out mouse model will be relevant to further characterize the regulation of the AGA gene, especially in neuronal tissues. The results presented here facilitate the elucidation of the molecular pathogenesis of AGU and are essential for designing potential gene therapy strategies for the disease.