Molecular cloning and characterization of lysosomal sialic acid O-acetylesterase.

O-Acetylation and de-O-acetylation of sialic acids have been implicated in the regulation of a variety of biological phenomena, including endogenous lectin recognition, tumor antigenicity, virus binding, and complement activation. Applying a strategy designed to identify genes preferentially expressed in active sites of embryonic hematopoiesis, we isolated a novel cDNA from the pluripotent hematopoietic cell line FDCPmixA4 whose open reading frame contained sequences homologous to peptide fragments of a lysosomal sialic acid O-acetylesterase (Lse) previously purified from rat liver, but with no evident similarity to endoplasmic reticulum-derived acetylesterases. The expressed Lse protein exhibits sialic-acid O-acetylesterase activity that is not attributable to a typical serine esterase active site. lse expression is spatially and temporally restricted during embryogenesis, and its mRNA levels correlate with differences in O-acetylesterase activity described in adult tissues and blood cell types. Using interspecific backcross analysis, we further mapped the lse gene to the central region of mouse chromosome 9. This constitutes the first report on the molecular cloning of a sialic acid-specific O-acetylesterase in vertebrates and suggests novel roles for the 9-O-acetyl modification of sialic acids during the development and differentiation of mammalian organisms.

Sialic acids are a family of 9-carbon acidic sugars that constitute terminal units on diverse oligosaccharide chains of glycoconjugates in higher invertebrates and vertebrates. Modifications of the parent sialic acid N-acetylneuraminic acid have been described that show remarkable molecule and tissue specificity as well as developmental regulation (1-5). However, the relationship between these modifications on sialic acids and specific events in mammalian development is unclear. The availability of cDNAs encoding sialic acid-modifying enzymes is therefore expected to be relevant for the understanding of the role played by sialic acids in the biology of mammalian organisms (3,4). One modification of particular interest is the O-acetylation and de-O-acetylation of sialic acids, which determines the presence or absence of the outermost possible structure on typical N-linked oligosaccharides, i.e. O-acetyl esters (6). These esters have been implicated in lectin recognition, cell adhesion, tissue morphogenesis, and a variety of other biological phenomena, including tumor antigenicity, virus binding, and complement activation (1-3). Enzymes specifically capable of removing O-acetyl esters from the 9-position of sialic acids have been described in viruses (7-10) and in vertebrates (11-13), but no cDNA for the latter group has yet been isolated. A disulfide-linked heterodimeric protein with sialic acid-specific 9-O-acetylesterase activity was purified from rat liver membrane compartments and initially called luminal sialic acid O-acetylesterase (Lse) (11,14) to differentiate it from a cytosolic enzyme with similar activity (12,13). The Lse protein was subsequently shown to have a primarily lysosomal location (15) and has hence been renamed lysosomal sialic acid O-acetylesterase (retaining the acronym Lse).
Early attempts to clone this esterase by conventional methods were thwarted by poor immunoreactivity and extreme resistance to proteolytic digestion. 1 Only two amino acid sequences were obtained (14), but these were not strongly immunogenic or useful for cloning using degenerate oligonucleotides. 2 We have recently described a novel strategy to identify genes differentially expressed throughout development and potentially involved in hematopoiesis (16,17). Using this strategy, we identified a number of short cDNA fragments representing genes preferentially expressed in the yolk sac and up-regulated during the in vitro development of embryoid bodies (EBs) 3 (17). Here we describe the cloning and characterization of one of these genes, a cDNA encoding the mouse homolog of the purified rat Lse protein. We have expressed the Lse protein, studied the role of the proteolytic cleavage of the immature Lse protein into its disulfide-linked heterodimeric form in generating O-acetylesterase activity (which we attribute to a novel enzymatic mechanism), examined the tissue distribution of the mRNA, and mapped the lse gene on mouse chromosome 9. The molecular cloning of lse should allow the study of the role of sialic acid esterases in development and tumorigenesis, a task that until now was impaired by the lack of any vertebrate cDNA encoding these enzymes.

EXPERIMENTAL PROCEDURES
Isolation of a cDNA encoding Lse-Total RNA derived from the hematopoietic cell line FDCPmixA4 (18), cultured as described (17), was obtained using RNAzol solution (Tel-test Inc., Friendswood, TX) following the manufacturer's instructions. Poly(A)+ RNA was selected from FDCPmixA4 total RNA, and a phage cDNA library was made as described (16). The PCR product Clone 165, generated by differential display by PCR (16,17), was labeled by random priming using the Prime-it II kit (Stratagene, La Jolla, CA) and used to screen the FDCPmixA4-derived library (19). Positive clones were purified and subcloned, also as described (16). The complete nucleotide sequence of Clone 165-109 was obtained as described (16) and deposited in GenBank (20) with accession number U40408.
Constructs for Protein Expression-Two constructs were generated for protein expression of Clone 165-109 in COS-7 cells (American Type Culture Collection) in which a tag (FLAG) sequence (21) was introduced in the Lse protein. The open reading frame of Clone 165-109 was amplified using a strategy of PCR mutagenesis (22) designed to allow the introduction of a FLAG peptide sequence either at the C terminus of the protein (LF construct) or between the two Lse subunits (LFL construct). The primers used for the LF construct were as follows: sense primer, 5′-CACTTTGCGGCCGCGCACCATGGTTTCCCCGGGGCCTGTGTTTG-3′; and antisense primer, 5′-CGTACTAGTTTTACTTGTCATCGTCGTCCTTGTAGTCGATACCCCTGTGTGAAATTTG-3′. Two sets of primers for two independent reactions were used in the first round of PCR for the generation of the LFL construct: sense primer (set 1), 5′-CACTTTGCGGCCGCGCACCATGGTTTCCCCGGGGCCTGTGTTTG-3′; antisense primer (set 1), 5′-GGCGTTCCAAAGGACGGAGTGTCTTGTCGGTCCCTTGTCATCGTCGTCCTTGTAGTCAGTCACACGAACAGATGGGACAACCCTAAAAGG-3′; sense primer (set 2), 5′-CCTTTTAGGGTTGTCCCATCTGTTCGTGTGACTGACTACAAGGACGACGATGACAAGGGACCGACAAGACACTCCGTCCTTTGGAACGCC-3′; and antisense primer (set 2), 5′-CCCTTCATTGCTCAAATTTCACACAGGGGTATCTAAAACTAGTACG-3′. The products that resulted from these two reactions were gel-extracted and amplified in a single second PCR to generate the complete LFL construct by using the sense primer from set 1 and the antisense primer from set 2. Pfu DNA polymerase (Stratagene) was used with 18 cycles of PCR using the following parameters: 94°C for 30 s, 55°C for 1 min, and 72°C for 4 min. These constructs were cloned into the PME18X vector (DNAX) using NotI and SpeI sites incorporated into the 5′- and 3′-primers, respectively.
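The overlap-extension design of the LFL primers can be checked computationally. The following is a minimal sketch and not part of the original procedure: it takes the set 1 antisense and set 2 sense primers as printed above (with line-break hyphens removed), tests whether they are exact reverse complements of each other, and translates the embedded 24-nucleotide FLAG coding sequence. The helper names and the four-codon lookup table are assumptions introduced only for this illustration.

```python
# Illustrative check of the LFL overlap primers (helper names are hypothetical).

# Set 1 antisense primer, as printed in the text (5' to 3').
SET1_ANTISENSE = (
    "GGCGTTCCAAAGGACGGAGTGTCTTGTCGGTCCCTTGTCATCGTCGTCCTTGTAGTC"
    "AGTCACACGAACAGATGGGACAACCCTAAAAGG"
)

# Set 2 sense primer, written as upstream Lse sequence + FLAG + downstream Lse sequence.
FLAG_DNA = "GACTACAAGGACGACGATGACAAG"
SET2_SENSE = (
    "CCTTTTAGGGTTGTCCCATCTGTTCGTGTGACT"
    + FLAG_DNA
    + "GGACCGACAAGACACTCCGTCCTTTGGAACGCC"
)

# Only the codons that occur in the FLAG coding sequence are needed here.
CODONS = {"GAC": "D", "GAT": "D", "TAC": "Y", "AAG": "K"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence using the small CODONS table."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    print("exact reverse complements:", reverse_complement(SET2_SENSE) == SET1_ANTISENSE)
    print("FLAG peptide:", translate(FLAG_DNA))
    print("FLAG offset in set 2 sense primer:", SET2_SENSE.find(FLAG_DNA))
```

Full complementarity of the two inner primers is what lets the two first-round products anneal to each other and be extended into the complete LFL insert in the second PCR.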
Sequence and Structural Analysis of the Lse Protein-The original BLAST program (23) at the National Center for Biotechnology Information (accessed over the Internet at URL http://www.ncbi.nlm.nih.gov/Recipon/bs_seq.html) and an enhanced version called BEAUTY (24) developed at the Human Genome Center at the Baylor College of Medicine (http://dot.imgen.bcm.tmc.edu:9331/seq-search/protein-search.html) were used to comb nonredundant protein and nucleotide data bases for Lse homologs. The search for more distant relatives of Lse employed the sensitive sequence comparison strategies of Altschul et al. (25) and Bork et al. (26); banks of diagnostic sequence patterns, motifs, and profiles were also screened for faint matches to the component Lse domains (for a recent review, see Ref. 27) (http://www.embl-heidelberg.de/~bork/pattern.html). Predictive analysis of the Lse secondary structure principally utilized the PHD neural network program (28) accessed at the EMBL PHD Internet server (http://www.embl-heidelberg.de/predictprotein/predictprotein.html). Potential O-glycosylation sites were suggested by the NetOglyc program (29) (http://www.cbs.dtu.dk/netOglyc/cbsnetOglyc.html).
Protein Expression of the LF and LFL Constructs-COS-7 cells were maintained as described (17). Plasmid DNA was transfected as described (17). Cell lysates and media were collected 3 days after transfection. Lysis buffer (17) was added to the plates, which were kept on ice for 45 min. Lysates were centrifuged to eliminate cell debris (17). Supernatants of centrifuged cell lysates and sterile-filtered media from cultured cells were incubated with anti-FLAG M2 affinity gel (Kodak Scientific Imaging Systems, New Haven, CT) at 4°C overnight and washed four times with phosphate-buffered saline. Immunoprecipitates were eluted, neutralized, and concentrated by precipitation with 24% trichloroacetic acid (Sigma) and 2% deoxycholic sodium salt (Sigma) as described (17). Pellets were dissolved in 2× sample buffer (Novex, San Diego, CA), electrophoresed on 4-20% Tris/glycine gels (Novex), and transferred to polyvinylidene difluoride membranes (Immobilon-P, Millipore Corp., Bedford, MA). Membranes were blocked with 3% nonfat milk for 1 h at 37°C. Anti-FLAG M2 antibody (Kodak) was used at a 1:2000 dilution. Horseradish peroxidase-conjugated anti-mouse Ig (Amersham Corp.) was also used at a 1:2000 dilution, and peroxidase detection was performed with ECL detection reagents (Amersham Corp.) as recommended. M2-purified LFL construct-derived protein was run on an 8% Tris/glycine gel and processed as described (14) for amino-terminal gas-phase sequencing.
Large-scale Purification of the COS Cell-expressed LFL Construct from Cell Culture Medium-20 dishes of COS cells were seeded at 1 × 10⁶ cells/10-cm culture dish in minimum Eagle's medium containing 5% fetal calf serum and incubated overnight at 37°C in 5% CO2. Cells were transfected with 10 μg of plasmid DNA/plate, with either the LF or LFL construct, using LipofectAMINE (Life Technologies, Inc.) exactly as per the manufacturer's recommendations and were left growing in 10 ml of serum-free medium. 5 h later, an equal volume of medium with 10% serum was added to the cells. ~24 h after transfection, the medium was replaced with 10 ml of fresh medium containing 5% serum and collected after 3 days. After centrifugation to remove the cells and filtration to remove any particulate material, the clarified medium was passed through a 1-ml column of Bio-Rad Bio-Gel A-agarose resin (equilibrated in Tris-buffered saline) to preclear any nonspecific binding material. A small volume of Tris-buffered saline was used to wash the column. The medium was then incubated with a 1-ml packed volume of anti-FLAG M2 affinity gel (prepared as per the manufacturer's recommendations) at 4°C overnight with gentle mixing. The medium and resin were poured into a small column, and the medium was then re-passed through the column to maximize the binding of the protein.  (11,14), the aliquot was denatured in 0.1% SDS and 0.1 M β-mercaptoethanol as described above, and aliquots with or without endo-β-N-acetylglucosaminidase H (3 milliunits) in 45 μl of 200 mM sodium citrate (pH 5.0) were incubated at 37°C for 18 h.
Assay and Reprecipitation of Sialic Acid O-Acetylesterase Activity-Sialic acid 9-O-acetylesterase activity was assayed and expressed in units exactly as described previously (11,14) using [acetyl-9-³H]9-O-acetyl-N-acetylneuraminic acid as a substrate under conditions where product formation (release of [³H]acetate) was linear with time and added enzyme. Aliquots of the LFL epitope-tagged antibody-eluted preparations were incubated with either anti-FLAG M2 affinity gel or control Bio-Gel A overnight at 4°C in 0.1 M Tris-HCl (pH 8.0) with mixing. The suspension was centrifuged to separate the supernatant from the beads. Additional buffer was added to the beads. The substrate [acetyl-9-³H]9-O-acetyl-N-acetylneuraminic acid was added to both the beads and the supernatant of each sample, and the activity was determined after 3 h at 37°C.
Northern Blot and PCR Analyses-Fetal and adult mouse tissues were isolated, cell lines were cultured, and total RNA was isolated and used for Northern blot analysis as described (17). Large-scale preparations of plasmid DNA containing the differential display by PCR product Clone 165 (17) were made as described (17). Plasmid DNA was cut with EcoRI, gel-extracted with the QIAEX gel extraction kit (QIAGEN Inc.), and random-primed with [α-³²P]dCTP (Amersham Corp.) as described (17). 20-μg samples of total RNA were run on formaldehyde gels (19), transferred to Nytran membranes (Schleicher & Schuell) by standard methods (19), hybridized, and washed at 65°C as described (17).
Southern Blot Analysis and Interspecific Mouse Backcross Mapping-Genomic DNA from HeJ mice was purchased from the Jackson Laboratory (Bar Harbor, ME). DNA digests were obtained using EcoRI, HindIII, and PstI and processed as described (19) for Southern blot analysis, using entire Clone 165-109 DNA, labeled by random priming, as the probe.
Interspecific progeny were generated by mating (C57BL/6J × Mus spretus)F1 females and C57BL/6J males as described (41). A total of 205 N2 mice were used to map the lse locus (see "Results" for details). DNA isolation, restriction enzyme digestion, agarose gel electrophoresis, Southern blot transfer, and hybridization were performed essentially as described (19). All blots were prepared with Hybond-N+ (Amersham Corp.). The probe, an ~2.7-kb fragment of mouse cDNA, was labeled with [α-³²P]dCTP using a random prime labeling kit (Stratagene); washing was done to a final stringency of 1.0× SSCP and 0.1% SDS at 65°C. Major fragments of 7.4 and 5.7 kb were detected in TaqI-digested M. spretus DNA. The presence or absence of the 5.7-kb TaqI M. spretus-specific fragment was followed in backcross mice.
A description of the probes and restriction fragment length polymorphisms for the loci linked to lse, including E26 avian leukemia oncogene 1 (ets1), thymus cell antigen 1 (thy1), and dopamine receptor 2 (drd2), has been reported previously (42). Recombination distances were calculated as described (43) using the computer program SPRETUS MADNESS. Gene order was determined by minimizing the number of recombination events required to explain the allele distribution patterns.

Molecular Cloning of a Novel cDNA Encoding Mouse Lse-In a previous report, we described a strategy directed toward the understanding of early hematopoietic development by isolating cDNAs preferentially expressed in the yolk sac, up-regulated during the in vitro development of EBs, and expressed in the pluripotent hematopoietic cell line FDCPmixA4 (17). Among the PCR products identified with the above-described pattern of expression was a novel 165-base pair DNA sequence designated Clone 165 (17). To isolate its corresponding complete cDNA, we screened a FDCPmixA4 cell line-derived cDNA library using Clone 165-derived DNA as a probe. The representation of Clone 165 cDNA in this phage library was ~1/17,000 (after screening ~5 × 10⁵ independent clones). Positive clones were purified and subcloned into the pZL1 vector, and their size and restriction digest pattern were analyzed. A 2.7-kb insert, designated Clone 165-109, represented the longest cDNA among the clones isolated. Its complete sequence was determined as well as that of another clone with a slightly different restriction endonuclease digestion pattern. The latter clone, designated Clone 165-8, was found to represent a partial 1.5-kb cDNA containing a different polyadenylation site at the 3′-untranslated region (Fig. 1A). The predicted peptide sequence of Clone 165-109 indicated a 531-amino acid protein that appeared to be the mouse homolog of the heterodimeric disulfide-bonded Lse protein purified from rat liver (11,14). Pulse-chase studies of the Lse protein in rat hepatoma cells had shown that the mature dimeric form was derived from the gradual processing and internal cleavage of a single-chain protein precursor (14) (Fig. 1B) into an ~58-kDa heterodimeric protein with two subunits of ~38 and ~28 kDa (11,14). As indicated in Fig. 1C, the protein sequences of the mouse and rat Lse proteins are highly similar.
Lse is a lysosomal glycoprotein that traverses the endoplasmic reticulum-Golgi pathway during biosynthesis (14,15). The existence of a secretory signal peptide in mouse Lse is suggested by the presence of an N-terminal hydrophobic sequence (Fig. 1A); amino-terminal peptide sequencing reveals that the amino-terminal processing site of the mature Lse protein is located between Gly↓Ile at positions 23 and 24 from the first Met (data not shown). Thus, the amino terminus of the mouse Lse protein corresponds precisely to that of the rat Lse small subunit peptide fragment (Fig. 1C). Eight potential N-glycosylation sites (motif of Asn-X-Ser/Thr) are found scattered in the mature Lse sequence (Fig. 1A); three potential O-glycosylation sites (threonines at positions 37, 45, and 51) are also suggested by the NetOglyc neural network program (29).
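A scan for the Asn-X-Ser/Thr motif mentioned above can be sketched in a few lines. This is only an illustration of the pattern search, not the procedure used in this work: the function name is hypothetical, the test sequence is a made-up toy peptide rather than the Lse sequence, and the option to exclude Pro at the X position is a commonly applied refinement that the text itself does not state.

```python
import re


def find_sequons(protein: str, exclude_proline: bool = True) -> list[int]:
    """Return 1-based positions of Asn residues that start an Asn-X-Ser/Thr motif.

    A zero-width lookahead is used so that overlapping motifs are all reported.
    """
    middle = "[^P]" if exclude_proline else "."
    pattern = re.compile(rf"(?=N{middle}[ST])")
    return [match.start() + 1 for match in pattern.finditer(protein)]


if __name__ == "__main__":
    toy_peptide = "MNGTANPSNKS"  # hypothetical sequence, for illustration only
    print(find_sequons(toy_peptide))                         # [2, 9]
    print(find_sequons(toy_peptide, exclude_proline=False))  # [2, 6, 9]
```

Applied to the mature Lse sequence from GenBank accession U40408, a scan of this kind would list candidate sites only; whether a given site is actually glycosylated has to be established experimentally.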
Relationship of Lse to Existing Esterase Families-Aside from the close relationship of embedded sequences in mouse Lse to the rat Lse peptide fragments (Fig. 1C), a determined search of nonredundant sequence data bases, including the burgeoning bank of human-expressed sequence tags (44), failed to uncover clear homologs of the Lse enzyme. Taking a cue from the proteolytic processing and functional domain division of the Lse chain (11,14), the amino- and carboxyl-terminal domains of Lse (equivalent to the small and large subunits of Fig. 1B) were then used to separately comb sequence and structure pattern data banks; in addition, diverse prediction algorithms (28,45) were applied to the Lse domain sequences in order to classify their prospective fold types and to delineate α-helical and β-strand segments, en route to a desired structural description of the Lse component folds and catalytic apparatus. However, only a faint but significant match to the 266-residue C-terminal domain of Lse was detected with the carboxyl-terminal portion of an open reading frame of unknown function at the 3′-end of a gene cluster involved in porphyrin biosynthesis from the anaerobic bacterium Clostridium josui (46) (Fig. 1D). Interestingly, the sequence of the bacterial open reading frame divulges an additional, likely incomplete, 159-amino acid N-terminal domain that appears to be unrelated to the mouse Lse amino-terminal (small) subunit. This different evolutionary pairing of domains suggests, first, that the heteromeric Lse enzyme is composed of two distinct globular protein modules and, second, that the function of the large Lse subunit can be married to other enzymatic schemes.
The catalytic activity of Lse has been shown to be abrogated by the serine active-site inhibitors diisopropyl fluorophosphate and diethyl-p-nitrophenyl phosphate; [³H]diisopropyl fluorophosphate specifically labels the small enzyme subunit (11). However, inspection of the N-terminal domain sequence of Lse does not reveal the typical active-site sequence (Gly-X-Ser-X-Gly) of serine active-center esterases (47) or the derivative Gly-Asp-Ser-Arg-Thr/Ser signature of influenza and coronavirus sialic acid 9-O-acetylesterases (48). A diverse number of esterase enzymes that contain the former catalytic serine motif compose a structural superfamily called the α/β-hydrolases (49). These enzymes conserve a catalytic triad of Ser, Asp, and His residues positioned at the carboxyl-terminal end of a parallel β-sheet that forms the core of the hydrolase scaffold; similar active-site clefts are found in other doubly wound α/β folds (50). Sequence relationships between different α/β-hydrolase enzymes are typically detected only after structural superposition of equivalent topological features (49). Comparison of the presumed Lse catalytic (small) subunit with this fold family must then rely on structural considerations. Predictive algorithms (28,45) accordingly suggest that both Lse subunits are composed of alternating α- and β-secondary structure (data not shown).
Lse Is a Glycoprotein Detectable in Both the Intra- and Extracellular Compartments-Lysosomal extracts from mouse liver have an activity corresponding to the rat Lse protein, i.e. concanavalin A-binding 9-O-acetylesterase activity (data not shown). However, the previously described polyclonal and monoclonal antibodies against the rat liver Lse protein (14) do not react with these extracts (data not shown). To monitor the production of the Lse protein, we therefore epitope-tagged Clone 165-109 with a FLAG peptide sequence (21) at the C terminus of the predicted open reading frame of the cDNA (LF construct). We also generated a construct for protein expression of mouse Lse in which the FLAG epitope was introduced immediately before the N terminus of the larger Lse subunit (LFL construct) (see Figs. 1 (A-C) and 2A). This allowed us to use the M2 monoclonal antibody, which recognizes the FLAG sequence, both for purification and detection of the protein by Western blot analysis. The LFL construct should also allow proteolytic cleavage using enterokinase, an enzyme for which a recognition site is available at the C terminus of the FLAG sequence, yielding a heterodimeric disulfide-bonded protein similar to the mature rat protein (Fig. 2A).
Following transient transfection of COS cells with these constructs, the cell lysates and media were immunoprecipitated with agarose-conjugated anti-FLAG antibody and subjected to Western blotting with anti-FLAG antibody after separation by nonreducing SDS-polyacrylamide gel electrophoresis. As shown in Fig. 2B (left panel), both constructs yielded an ~75-kDa protein. However, no major change in molecular mass was seen after reduction, indicating that the bulk of the expressed protein was not cleaved into the two subunits in COS cells (Fig. 2B, right panel). We observed an ~250-kDa protein band under nonreducing conditions that disappeared under reducing conditions and likely represents aggregates of LF and LFL proteins. We also observed that the LFL protein was less immunoreactive under nonreducing conditions (Figs. 2B and 4), possibly due to the internal location of the FLAG peptide sequence in this protein. The difference in molecular mass from that predicted by the cDNA open reading frame (calculated polypeptide molecular mass is 58.441 kDa using the PEPTIDESORT software program from the Genetics Computer Group package) could be due to post-translational modifications (e.g. glycosylation) of the polypeptide by the COS cells. Indeed, as shown in Fig. 3, treatment of the purified secreted LFL protein with endo-β-N-acetylglucosaminidase H (which cleaves high mannose and hybrid-type N-linked oligosaccharides) resulted in a small but reproducible reduction in apparent molecular mass, and treatment with peptide N-glycosidase F (which cleaves most known N-linked oligosaccharides) gave a further shift to an apparent molecular mass of ~62 kDa. Other post-translational modifications might account for the additional differences in molecular mass.
The remaining discrepancy in molecular mass could be explained by other post-translational modifications or by failure of the peptide N-glycosidase F digestion to go to completion because of the extreme difficulty in denaturing the Lse protein. 1 Both possibilities are supported by the fact that the protein band remains somewhat diffuse, even after peptide N-glycosidase F treatment. The appearance of a small amount of low molecular mass material after peptide N-glycosidase F treatment likely represents exposure of the polypeptide backbone to a trace protease that was previously hindered by the oligosaccharide chains.
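The mass comparison discussed above comes down to two subtractions, sketched here for convenience. The apparent masses (~75 and ~62 kDa) are approximate gel estimates quoted in the text and the calculated polypeptide mass is the reported 58.441 kDa; the function and variable names are assumptions made for this illustration, and the outputs are no more precise than the gel estimates.

```python
# Back-of-the-envelope mass budget using the values quoted in the text.
PREDICTED_POLYPEPTIDE_KDA = 58.441  # calculated from the open reading frame
APPARENT_UNTREATED_KDA = 75.0       # approximate SDS-PAGE estimate
APPARENT_AFTER_PNGASE_F_KDA = 62.0  # approximate SDS-PAGE estimate


def mass_budget(predicted: float, untreated: float, deglycosylated: float) -> dict[str, float]:
    """Split the excess apparent mass into a PNGase F-sensitive part and a remainder."""
    return {
        "total_excess_kda": round(untreated - predicted, 3),
        "pngase_f_sensitive_kda": round(untreated - deglycosylated, 3),
        "unexplained_kda": round(deglycosylated - predicted, 3),
    }


if __name__ == "__main__":
    budget = mass_budget(
        PREDICTED_POLYPEPTIDE_KDA,
        APPARENT_UNTREATED_KDA,
        APPARENT_AFTER_PNGASE_F_KDA,
    )
    for name, value in budget.items():
        print(f"{name}: {value}")
```

On these numbers, roughly 13 kDa of the ~16.6-kDa excess is removed by peptide N-glycosidase F, leaving about 3.6 kDa to be accounted for by the explanations offered in the preceding paragraph.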
The protein purified after expression of the LFL construct was subjected to N-terminal gas-phase sequencing. The sequence obtained (Ile-Gly-Phe) corresponded to amino acids 24-26 of the open reading frame, confirming the identity of the expressed product and revealing the most likely exact location of the signal peptide cleavage site.
Lse Has Sialic Acid O-Acetylesterase Activity-Rat Lse is gradually processed many hours after synthesis (presumably in the lysosome) into an ~58-kDa heterodimeric protein with two subunits of ~38 and ~28 kDa (11,14). However, Western analysis under reducing conditions did not show cleavage of mouse Lse into two subunits in the COS cells following transfection with either the LF or LFL construct, even when cell harvesting was performed more than 80 h after transfection (Fig. 2B, right panel). This could be due to an inability of COS cell proteases to process the polypeptide precursor in the lysosome or to an interference with the accessibility of the protease cleavage sites due to the introduction of the FLAG sequences. A similar finding was made when the secreted protein was analyzed (Fig. 2B, right panel), but this is in keeping with the fact that the secreted protein from rat hepatoma cells is also not cleaved (14).

FIG. 2. Expression of the LF and LFL constructs in COS-7 cells.
A, diagrammatic representation of the LF and LFL constructs showing the locations where the FLAG sequence was introduced into the Lse protein coding region. B, Western blot analysis of immunoprecipitates from lysates and media of COS-7 cells transiently transfected with the expression vector without insert (PME18X (PME)) and with the LF and LFL constructs. The immunoprecipitates, run under both reducing and nonreducing conditions, were obtained and probed with the anti-FLAG M2 antibody. The precursor peptide obtained with both constructs was not processed into its heterodimeric disulfide form even 80 h after transfection.

FIG. 3. Expressed protein carries N-linked oligosaccharides.
The secreted protein encoded by the LFL construct was purified from the spent medium of transfected cells as described and concentrated, and aliquots were subjected to digestion with endo-β-N-acetylglucosaminidase H (Endo-H) and peptide N-glycosidase F (PNGase-F) as described under "Experimental Procedures." Treated samples and untreated controls were run on a reducing SDS-polyacrylamide gel and stained with silver. The molecular mass markers indicate the positions of known standards.
To decrease the possibility of endogenous sialic acid esterase contamination and due to the lack of endopeptide cleavage of the intracellular protein expressed in COS-7 cells, we studied the anti-FLAG antibody-purified preparations derived from the medium of cells expressing the LFL construct to detect activity against [O-acetyl-9-3 H]sialic acids. The growth medium from untransfected COS-7 cells yielded no activity (data not shown). The secreted LFL protein was obtained in microgram quantities and in apparently pure form (see Fig. 3 for an example). This molecule did not undergo spontaneous cleavage en route to being secreted (Fig. 4), but instead underwent the expected cleavage with enterokinase treatment in vitro. After enterokinase treatment, a diffuse subunit band of 35 kDa was detectable. This species likely corresponds to the small subunit because the FLAG peptide that is immunoreactive with the anti-FLAG antibody is expected to remain associated with the carboxyl-terminal end of the amino-terminal subunit.
The protein preparation derived from the LFL construct showed intrinsic sialic acid O-acetylesterase activity (950 units/mg), and only a small increase was seen after enterokinase treatment (1300 units/mg). To confirm that the activity seen was not due to a cellular contaminant in the purified preparation, the protein was reprecipitated with anti-FLAG antibody immobilized on agarose. As shown in Fig. 5, the antibody-agarose beads specifically reprecipitated the activity. These results confirm that the murine cDNA that we isolated encodes a protein that is structurally and functionally similar to the previously described rat Lse protein.

Expression of lse during Mouse Development Is Spatially and Temporally Restricted: lse Is Not Expressed in All Blood Cell Types-We had previously observed two differently sized mRNAs (2.7 and 4.3 kb) reactive with Clone 165 in the yolk sac at day 8.5 of fetal development (17). Here we analyzed the expression of lse mRNA at days 11.5 and 15 of fetal development to evaluate the expression of lse in intraembryonic sites of hematopoiesis. We confirmed a higher abundance of the 4.3-kb message and the same proportional distribution of the two different messages mentioned above in all tissues analyzed (data not shown). At day 11.5, lse mRNA is still abundant in the yolk sac, but this changes by day 15 (Fig. 6). Interestingly, the day 11.5 aorta-gonad-mesonephros region and fetal liver, both active sites of embryonic hematopoiesis (17), express lse more abundantly than the head primordium of the embryo. We also found a decrease in lse mRNA in fetal liver from days 11.5 to 15, whereas in the adult, liver was found to be the organ where the message is most abundant. The tissue distribution of the lse mRNA in the adult generally parallels that reported for the rat Lse activity (11), with liver, testis, and kidney being locations of high expression, whereas skeletal muscle, adipose tissue, and heart have lower levels of lse transcripts.
Given the high expression levels of the lse mRNA in the hematopoietic cell line FDCPmixA4 relative to the fibroblastic cell line STO and the neuronal cell line N2a, we sought to analyze the expression of lse mRNA by reverse transcription-PCR in a variety of blood cell-derived lines representing both lymphoid and myeloid lineages (Fig. 6B). The pluripotent myeloid precursor cell line NFS60 (51), the virus-transformed erythroleukemic cell line MEL (52), the macrophage cell lines B5A and P388D1, and the mast cell line PT18 (Fig. 6B) were all found to express the lse message. Expression was also found in all the T-lymphocyte populations that were analyzed, including the T-helper-1 clone D1.1, the T-helper-2 clones D10.G4 and CDC25, and cortisone-treated thymocytes, as well as in the pre-B-cell line Clone K and a variety of B-lymphoma-derived cell lines. However, the bipotential B-cell and macrophage precursor cell line BL/3 did not express the lse mRNA even after repeated 30-cycle PCR experiments. The placental stromal cell line PL1.1 and the thymic stroma of cortisone-treated mice express lse mRNA.
In a previous report, we detected, using Northern blot analysis, an up-regulation of the lse mRNA levels by day 6 of the development of EBs (17). We decided to analyze the expression of lse by reverse transcription-PCR to try to detect expression at earlier stages of the in vitro development of embryonic stem cells into EBs. As shown in Fig. 6B (lanes 22-25), we could detect lse expression in embryonic stem cells and an apparent decrease in mRNA levels upon the onset of differentiation of these cells followed by an up-regulation at later stages of the development of EBs (day 6 EBs). To exclude the possibility that the signals could be due to genomic DNA contamination of the RNA, we determined that genomic DNA from HeJ mice produces a distinctive ~600-base pair PCR product, but not the 494-base pair product observed with these various cDNAs (data not shown).

FIG. 4. Enterokinase treatment of epitope-tagged LFL proteins expressed by transfected COS cells.
The secreted protein encoded by the LFL construct was purified from the spent medium of transfected cells as described and concentrated, and aliquots of 0.375 μg were subjected to digestion with 2 units of enterokinase following the manufacturer's instructions (Biozyme, San Diego, CA). Reactions were boiled in sample buffer with or without reducing agents, separated by SDS-polyacrylamide gel electrophoresis, and blotted on Bio-Rad Trans-Blot filters, which were probed with anti-FLAG antibody. Note that the unreduced and uncleaved LFL protein is less immunoreactive, but can be detected upon longer exposure (data not shown). EK, samples treated with enterokinase; 2-ME, samples treated with β-mercaptoethanol.

FIG. 5. Reprecipitation of the sialic acid O-acetylesterase activity purified from COS cells with the LFL construct.
Portions of the secreted protein encoded by the LFL construct (purified from the spent medium of transfected cells) were incubated with 20 μl of anti-FLAG-Sepharose beads or control Bio-Gel A-agarose beads overnight with mixing. After spinning out the beads, the supernatant (SUPNT.) and the beads were assayed separately for sialic acid O-acetylesterase activity as described under "Experimental Procedures."
Chromosomal Mapping of Mouse lse-Southern blot analysis of mouse lse with three different enzymes allowed us to identify a pattern consistent with the existence of a single lse gene that extends over an ~15-kb genomic DNA sequence (data not shown). The mouse chromosomal location of lse was determined by interspecific backcross analysis using progeny derived from matings of [(C57BL/6J × M. spretus)F1 × C57BL/6J] mice. This interspecific backcross mapping panel has been typed for over 1900 loci that are well distributed among all the autosomes as well as the X chromosome (41). C57BL/6J and M. spretus DNAs were digested with several enzymes and analyzed by Southern blot hybridization for informative restriction fragment length polymorphisms using a mouse cDNA probe. The 5.7-kb TaqI M. spretus restriction fragment length polymorphism (see "Experimental Procedures") was used to follow the segregation of the lse locus in backcross mice. The mapping results indicate that lse is located in the central region of mouse chromosome 9, linked to ets1, thy1, and drd2. Although 154 mice were analyzed for every marker and are shown in the segregation analysis (Fig. 7), up to 162 were typed for some pairs of markers. Each locus was analyzed in pairwise combinations for recombination frequencies using the additional data. The ratios of the total number of mice exhibiting recombinant chromosomes to the total number of mice analyzed for each pair of loci and the most likely gene order are as follows: centromere, ets1, 6:158, lse, 11:162, thy1, 2:162, drd2. The recombination frequencies (expressed as genetic distances in centimorgans ± S.E.) are as follows: ets1, 3.8 ± 1.5, lse, 6.8 ± 2.0, thy1, 1.2 ± 0.9, drd2. The central region of mouse chromosome 9 is syntenic with human chromosome 11q (summarized in Fig. 7), suggesting that human lse will reside on chromosome 11q as well.
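The genetic distances quoted above can be recovered from the recombinant ratios with a short calculation. The sketch below is illustrative only: this section does not state how the standard errors were obtained, so the binomial standard error, sqrt(r(1 - r)/n), is an assumption (it does reproduce the reported values to one decimal place). The function name `recombination_distance` and the `INTERVALS` list are hypothetical names introduced here, and the recombination fraction is converted to centimorgans directly, without a mapping function, which is reasonable only for short intervals such as these.

```python
import math


def recombination_distance(recombinants, total):
    """Return (distance_cM, standard_error_cM) for one pair of loci.

    Assumes a simple binomial model: r = recombinants / total and
    SE = sqrt(r * (1 - r) / total), both expressed as percentages.
    """
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    fraction = recombinants / total
    std_err = math.sqrt(fraction * (1 - fraction) / total)
    return 100 * fraction, 100 * std_err


# Recombinant ratios for adjacent loci, as listed in the text.
INTERVALS = [
    ("ets1", "lse", 6, 158),
    ("lse", "thy1", 11, 162),
    ("thy1", "drd2", 2, 162),
]

if __name__ == "__main__":
    for left, right, recombinants, total in INTERVALS:
        distance, std_err = recombination_distance(recombinants, total)
        print(f"{left}-{right}: {distance:.1f} ± {std_err:.1f} cM")
```

Under these assumptions the three intervals come out at 3.8 ± 1.5, 6.8 ± 2.0, and 1.2 ± 0.9 cM, matching the figures given in the text.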
We have compared our interspecific map of chromosome 9 with a composite mouse linkage map that reports the location of many uncloned mutations (provided by Mouse Genome Data Base, a computerized data base maintained at the Jackson Laboratory). There are four known mouse mutations that map in a similar region of chromosome 9: luxoid (lu), variable spotting (vs), rough fur (rf), and rough coat (rc). Of these, DNA was available for the first three, from the Jackson Laboratory. Southern blots of restriction digests of these DNAs showed no obvious genomic differences from those of mice with identical background but no abnormal phenotype (data not shown).
DISCUSSION
We have isolated a cDNA that encodes the sialic acid-specific modifying enzyme Lse and showed that mouse Lse has the sialic acid esterase activity originally described for the purified rat protein. Interestingly, and as previously shown for the rat Lse protein (14), mouse Lse is also partially secreted. Future studies will explore whether the Lse protein secreted into the extracellular compartment can be taken up by cells and rendered active by proteolytic digestion and whether the uncleaved native secreted Lse protein is enzymatically competent. In this regard, it would be interesting to know whether this lysosomal protein carries the mannose 6-phosphate recognition marker found on most other soluble lysosomal enzymes with a similar itinerary (53).
All sialic acid esterase activities studied to date can be irreversibly inactivated by diisopropyl fluorophosphate (48), indicating the presence of a Ser active-site mechanism analogous to the extended family of serine esterases and proteases (47). Prior studies with rat Lse have intimated that the small subunit (here shown to be the amino-terminal domain) was covalently modified by [3H]diisopropyl fluorophosphate (11). The notable absence of a hydrolase-like catalytic Ser motif (Gly-X-Ser-X-Gly) in Lse is tempered by the recent finding of a bona fide α/β-hydrolase fold in a myristoyl-acyl carrier protein-specific thioesterase from Vibrio harveyi where the active-site Ser residue is embedded in an Ala-Ala-Ser-Leu-Ser motif (54). In this vein, an imperfect hydrolase Ser motif (Gly-Gln-Ser, residues 102-104) is evident in Lse, forming a loop between a strongly predicted β-strand and an α-helix; a potential active-site His residue is present at position 163 in a Gly-His-Gly sequence similar to other α/β-hydrolase His motifs (49). Alternatively, the Lse catalytic subunit could fold with an α/β topology distinct from α/β-hydrolases. For example, the bacterial response regulator protein CheB, a Ser active-center methylesterase, has a doubly wound α/β-fold distinct from the hydrolase superfamily with a convergent catalytic triad of Ser, His, and Asp (55). As other mammalian homologs of Lse are characterized, it may be possible both to sharpen the predicted secondary structure of Lse and to winnow the number of catalytically relevant residues by conservation analysis (28,56), allowing more accurate investigation of specific candidate motifs for enzymatic activity.
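The kind of motif inspection described above can be illustrated with a simple pattern scan over a protein sequence in one-letter code. This is a minimal sketch under stated assumptions, not the analysis used in this study: the function `find_motifs`, the `MOTIFS` table, and in particular `TOY_SEQUENCE` are hypothetical and invented for illustration. The toy sequence is not the Lse sequence, and the positions it yields have no biological meaning.

```python
import re

# Motifs named in the text, written in one-letter amino acid code.
# "." stands for the variable residue X.
MOTIFS = {
    "Gly-X-Ser-X-Gly": "G.S.G",  # canonical hydrolase Ser motif
    "Gly-Gln-Ser": "GQS",        # imperfect Ser motif
    "Gly-His-Gly": "GHG",        # candidate His motif
}


def find_motifs(sequence):
    """Return {motif name: [(1-based start position, matched residues), ...]}."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead with a capture group also reports overlapping matches.
        lookahead = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in lookahead.finditer(seq)]
    return hits


# Hypothetical toy sequence, invented for this example only.
TOY_SEQUENCE = "MKTAGQSLLVGHGAAGDSAGK"

if __name__ == "__main__":
    for name, matches in find_motifs(TOY_SEQUENCE).items():
        print(name, matches)
```

A scan like this only flags candidate positions; whether a given Ser or His is catalytically relevant would still have to be judged from predicted structure and from conservation across homologs, as the text notes.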
Our lack of knowledge of the precise biological functions of this enzyme makes it difficult to predict the phenotype that might be expected for an alteration in its activity in the intact animal. lse mapped in a region of the composite map that includes the mouse mutations luxoid (lu), variable spotting (vs), and rough fur (rf). A preliminary screen of these mutants (lu, vs, and rf) failed to show gross alterations of the lse genomic region. However, this result does not rule out minor mutations that must be investigated at the nucleotide sequence level. In this regard, it is important to note that earlier studies (57) reported that the extent of 9-O-acetylation of murine erythrocyte sialic acids was affected by an unknown locus on chromosome 9; one possibility is that this locus corresponds to lse and that allelic variations in the enzyme are involved in this polymorphism.
As a lysosomal enzyme, the Lse protein might be involved only in the terminal degradation of O-acetylated sialic acids. On the other hand, the differential expression of lse mRNA throughout both in vivo and in vitro development suggests a role for the de-O-acetylation of sialic acids in developmental processes. If so, this would explain why the cellular enzyme binds well to concanavalin A, while the secreted form shows only a small shift with endo-β-N-acetylglucosaminidase H digestion. In this regard, transgenic mice constitutively overexpressing a viral protein with de-O-acetylation activity at the cell surface have been found to arrest development at the two-cell embryo stage or to show developmental abnormalities at later stages (58). More important, we have observed low levels of lse mRNA upon induction of embryonic stem cell differentiation, followed by an increase in lse mRNA levels at later stages of the in vitro development of EBs (Fig. 6B) (16). An interesting aspect to note is the preferential expression of the message in active sites of hematopoiesis during fetal development: the yolk sac, aorta-gonad-mesonephros region (17), and fetal liver. Moreover, the pluripotent hematopoietic cell line FDCPmixA4 shows higher levels of the message than the fibroblastic cell line STO or the neuronal cell line N2a. It will be challenging to find out what role this sialic acid modification plays in the yolk sac or during the early stages of the in vitro development of EBs. It is possible that by influencing the hydrophobicity, conformation, or structure of glycoconjugates, de-O-acetylation may influence the cellular interactions that occur during embryogenesis and possibly have a role in the development of blood and/or primordial germ cells as these cells migrate from the yolk sac to home to their intraembryonic compartments.
Analysis of lse expression in blood cell lines showed that the message is widely distributed among various blood cell populations. However, the finding of a bipotential B-cell and myeloid progenitor that does not express the message points to the possibility that lse may be differentially expressed throughout successive steps of the ontogeny of different blood cell types, namely B- and T-lymphocytes. Indeed, selective expression of O-acetylated sialic acids has been reported in subpopulations of thymocytes and peripheral leucocytes (59-63). Further studies involving in situ staining techniques to analyze the expression of lse in lymphoid organs and purified populations of lymphocytes will be required to characterize these differences. Moreover, immunohistological studies of lymphoid tissues show that potential ligands capable of mediating CD22β-dependent B-cell adhesion events are masked by 9-O-acetylation of sialic acids on specific cell types and are regionally distributed (62). Specifically, it was found that masking of CD22β ligands by 9-O-acetylation is differentially regulated on the same cell type between two different lymphoid organs. The availability of lse cDNA presented here, in conjunction with studies on the differential activity of the enzymes involved in the synthesis of 9-O-acetylated sialic acids, will help to explore the involvement of 9-O-acetyl esters in the molecular mechanisms by which lymphocytes within lymphoid tissues are segregated into distinct microdomains (62). Furthermore, T-cells of patients with various malignancies have been reported to acquire O-acetylation, and the addition of a single O-acetyl group to a cell surface-associated ganglioside can create an unusual cell type-specific antigen (63). The limited exploration of these matters is in great part due to the lack of mammalian cDNAs encoding the enzymes that regulate O-acetylation.
This report will therefore constitute an important advance in the study of the role of de-O-acetylation of sialic acids in various aspects of mammalian biology, including development, hematopoiesis, and tumorigenesis.