Expression of a Functional Drosophila melanogaster CMP-sialic Acid Synthetase

CMP-N-acetylneuraminic acid is a critical metabolite in the generation of glycoconjugates that play a role in development and other physiological processes. Whereas pathways for its generation are firmly established in vertebrates, the presence and function of the relevant synthetic enzyme in insects and other protostomes are unknown. In this study, we characterize the first functional CMP-sialic acid synthetase (DmCSAS) from any protostome lineage, expressed from a D. melanogaster cDNA clone. Homologous genes were subsequently identified in other insect species. The gene is developmentally regulated, with expression first appearing at 12–24 h of embryogenesis, low expression through larval and pupal stages, and greatly enriched expression in the adult head, suggesting a possible role in the central nervous system. Activity of the enzyme was verified by an increase in in vitro and in vivo CMP-N-acetylneuraminic acid levels when the enzyme was expressed in a heterologous host. Unlike all known vertebrate CMP-sialic acid synthetase (CSAS) proteins, which localize to the nucleus, the D. melanogaster CSAS protein was targeted to the Golgi compartment when expressed in both heterologous mammalian and insect cell lines. Replacement of the N-terminal leader sequence of DmCSAS with the human CSAS N-terminal sequence resulted in the redirection of the chimeric CSAS protein to the nucleus but with a concomitant loss of enzymatic activity. The localization of CSAS orthologs to different intracellular organelles represents, to our knowledge, the first example of differential protein targeting of orthologs in eukaryotes and reveals how the sialylation pathway diverged during the evolution of protostomes and deuterostomes.

Sialic acids are a diverse family of negatively charged nine-carbon 2-keto-3-deoxy sugars located at the termini of glycoproteins and glycolipids (1). More than 50 different types of naturally occurring sialic acids have been identified, with the most abundant sialic acid being N-acetylneuraminic acid (Neu5Ac). Sialic acid can be present as a monomer or in a polymeric form at the termini of glycoconjugates and can be attached to acceptors in an α(2,6), α(2,3), or α(2,8) linkage, which is determined by the specificity of different sialyltransferases (1). In mammals, the presence of sialic acid on proteins regulates their circulatory half-life, and sialic acids serve as ligands for endogenous lectins of the inflammatory and immune responses (2). The sialylation pattern of lipids and proteins displayed on the cell surface regulates cell-cell interactions in normal development, and altered patterns are associated with tumorigenesis and oncogenic transformations in vertebrates (1,3,4). The regulated presence or absence of polysialic acid polymers on the neural cell adhesion molecule is required for proper establishment of the vertebrate embryonic nervous system (5,6) and is associated with changes in neuronal plasticity that occur during learning and memory (7).
Expression of sialic acids is widespread in the deuterostome lineage (vertebrates, ascidians, and echinoderms). Furthermore, certain pathogenic and commensal bacteria, viruses, and fungi have been found to contain sialic acid as well (8). On the other hand, sialic acid has been more difficult to detect in the protostome lineage (annelids, arthropods, and mollusks) (8,9). However, since sialic acid was first reported to be present in Drosophila melanogaster embryos using lectin histochemistry and immunostaining (10,11), a number of other studies using biochemical and genomic approaches have supported the presence of sialic acids in insects. Sialic acid in both an α(2,6) and an α(2,8) linkage was detected in the vacuoles of the Malpighian tubules of Philaenus spumarius (cicada) larvae using both histochemistry and gas-liquid chromatography-mass spectroscopy (12). Whereas insect glycoproteins typically terminate with Man or GlcNAc in paucimannose, oligomannose, or hybrid structures (13,14), two insect cell lines, one from Pseudaletia unipuncta (A7S) and one from Danaus plexippus (DpN1), have been shown to produce more complex glycan structures on expressed recombinant proteins, and DpN1 was shown to produce some sialylated glycans as well (15). Additionally, Watanabe et al. (17) showed that when recombinant proteins were expressed in Trichoplusia ni (Tn-5B1-4) cells in the presence of a hexosaminidase inhibitor, some sialylated glycans were detected. With the complete Drosophila genome available (19), genes encoding enzymes in the sialic acid pathway have been identified and cloned. Expression of a functional D. melanogaster sialic acid 9-phosphate synthase (16) and α(2,6)-sialyltransferase (18) provided genomic evidence for the presence of the sialic acid pathway in insects.
In all known biological systems, sialylation of both lipids and proteins requires the metabolic generation of the sugar nucleotide CMP-Neu5Ac by a CMP-sialic acid synthetase (CSAS). The Neu5Ac moiety of CMP-Neu5Ac is then transferred to an acceptor oligosaccharide in the Golgi apparatus by sialyltransferases (20). In the present study, we have identified and characterized a single D. melanogaster ortholog (DmCSAS) of mammalian and bacterial CMP-sialic acid synthetases as further evidence for the presence of the sialylation pathway in insects. In contrast to the vertebrate CMP-sialic acid synthetases, which are composed of an N-terminal catalytic domain and an additional carboxyl-terminal domain, DmCSAS includes only the catalytic domain, similar to some bacterial enzymes, such as that of Neisseria meningitidis. In this study, DmCSAS is shown to be functional both in vivo and in vitro by complementation of a mammalian cell line, LEC29.Lec32 (21), which lacks CMP-sialic acid synthetase activity. Whereas mammalian CMP-sialic acid synthetase localizes to the nucleus (20,22), the D. melanogaster enzyme is found primarily in the Golgi apparatus. In accordance with their differing intracellular locations, the mammalian protein begins with an N-terminal sequence containing a potential nuclear localization signal (NLS), whereas the D. melanogaster protein begins with an N-terminal sequence rich in hydrophobic amino acids characteristic of a signal/anchoring sequence. The different compartmental locations observed for the mammalian and insect CMP-sialic acid synthetases represent, to our knowledge, the first example of functionally equivalent enzymes localizing to different compartments in different eukaryotes. The DmCSAS exhibits its highest levels of mRNA expression during 14-17 h of embryonic development, similar to that observed for the D. melanogaster sialyltransferase (18). This timing is consistent with the finding of Roth et al. (10) concerning the presence of sialic acid and polysialic acid in specific stages of Drosophila development. In contrast to the ubiquitous tissue distribution of human CMP-sialic acid synthetase (HsCSAS) (23), expression of the D. melanogaster mRNA is observed predominantly in the head in adults.

MATERIALS AND METHODS
Gene Identification, Isolation of a cDNA Clone, and DNA Sequencing-A BLAST search was performed using the tBLASTn algorithm at NCBI with the amino acid sequence of either the bacterial CMP-sialic acid synthetase (NeuA) (GenBank™ accession number J05023) or the mouse CMP-sialic acid synthetase (MmCSAS) (GenBank™ accession number MMU6215) as the query sequence. Two genomic clones (AE003515 and AC015229) representing the same genomic sequence on chromosome 3L in band 76D8 had significant homology to the query sequence. In addition, a single corresponding cDNA, GenBank™ accession number RH14248, was obtained from the normalized head library. An alignment of the predicted amino acid sequence of the DmCSAS with other CMP-sialic acid synthetases suggested that this cDNA was truncated at the 5′ end, since translation from the first ATG would yield a protein lacking a highly conserved and functionally indispensable part of the protein. A full-length coding region of DmCSAS was obtained by RT-PCR using a forward primer derived from the genomic sequence at the nearest ATG upstream of the first ATG in the original cDNA and a reverse strand primer derived from the end of the coding region of the original cDNA. The predicted amino acid sequence of the cDNA obtained by RT-PCR now included a region shared by all members of the CMP-sialic acid synthetase family as well as an N-terminal extension rich in hydrophobic amino acids, typical of secretory signal sequences. The forward primer (CSA4; all primers are given in Table 1) contained a BamHI site, a Kozak sequence (GCCATC), and sequence corresponding to the first eight codons of DmCSAS. The reverse strand primer (CSA6N) contained a HindIII site, two in-frame stop codons, and sequence representing the last seven codons of DmCSAS. Total RNA was prepared by the TRIzol method (Invitrogen) from dechorionated 14-17-h D. melanogaster (Oregon R, P2) embryos, treated with amplification grade DNase I (Invitrogen), and used as the template. First strand cDNA synthesis was performed using 0.6 μg of template RNA with the 3′-SMART rapid amplification of cDNA ends kit (BD Biosciences Clontech, Palo Alto, CA). A 2.5-μl aliquot (of the 110-μl reverse transcription reaction) was introduced into a 50-μl PCR and amplified with Taq Gold in an Applied Biosystems GeneAmp 2400 thermal cycler with the following cycle settings: 95°C for 5 min; 35 cycles at 95°C for 1 min, 55°C for 1.5 min, and 72°C for 2 min; 72°C for 10 min; hold at 4°C. PCR reagents were purchased from Applied Biosystems (Foster City, CA). The 776-bp product was subcloned into the baculovirus vector pBlueBac4.5 (Invitrogen). The DNA sequence of this construct, pBlueBac-DmCSAS4, was determined on both strands using BigDye terminators (PerkinElmer Life Sciences) by the Nucleic Acid/Protein Core Research Facility of the Children's Hospital of Philadelphia. The DNA sequence of the cDNA matched the genomic sequence, except it lacked three introns of 52, 56, and 55 bp.
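As a rough check on the cycling program described above, the programmed hold times can be summed. The following Python sketch is illustrative only: it ignores ramp times between temperatures (which depend on the instrument) and the final 4°C hold, and the function and variable names are hypothetical.

```python
# Hold times (minutes) taken from the cycling program described in the text.
INITIAL_DENATURATION = 5.0        # 95 C
CYCLE_STEPS = [1.0, 1.5, 2.0]     # 95 C, 55 C, 72 C within each cycle
N_CYCLES = 35
FINAL_EXTENSION = 10.0            # 72 C


def programmed_minutes(initial, steps, n_cycles, final):
    """Sum the programmed hold times; ramping between steps is ignored."""
    return initial + n_cycles * sum(steps) + final


total = programmed_minutes(
    INITIAL_DENATURATION, CYCLE_STEPS, N_CYCLES, FINAL_EXTENSION
)
print(f"{total:.1f} min of programmed holds")
```

The actual run time on the instrument would be longer than this sum, since heating and cooling between steps are not included.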
Amplification of Staged Drosophila cDNA Libraries, RNA, and cDNA-Total RNA was isolated from staged Drosophila or S2 cells as described. PCR was performed with either 1 μl of the head or larval-early pupal cDNA libraries (obtained from the Berkeley Drosophila Genome Project consortium), or RT-PCR was performed with staged total RNA as described (42) using 0.6 μg of RNA and Moloney murine leukemia virus reverse transcriptase (New England Biolabs, Beverly, MA) in 20 μl, which was subsequently introduced into a 100-μl PCR. PCR was performed with the forward primer CS1 and reverse strand primer CS5 as described above, except that 31 cycles were performed. PCR of the Drosophila Rapid-Scan Expression panels (Origene, Gaithersburg, MD) was performed in 25-μl reactions using the same PCR conditions already described, except that 35 cycles were used. Each row of the panel contains different amounts of cDNA, from 1× (1 pg of first strand cDNA) to 1000× (1 ng of cDNA), and cDNA for each developmental stage and tissue has been normalized to yield equivalent amplification of a ribosomal protein, RP49. PCR in 48-well microtiter plates (panels) was performed using an Applied Biosystems GeneAmp 9600 thermal cycler.
Construction of Enhanced Green Fluorescent Protein (eGFP)-tagged Human and D. melanogaster CSAS and Human and D. melanogaster CSAS with Swapped Leader Sequences-To determine the intracellular localization of CSAS proteins, both the D. melanogaster CSAS and the human CSAS (23) cDNAs were subcloned in plasmid pEGFP-N2 (BD Biosciences Clontech) to create a CSAS fusion protein with the eGFP. The resultant plasmids are called DmCSAS-GFP and HsCSAS-GFP.
The coding region of DmCSAS was amplified by PCR of pBlueBac-DmCSAS4, using the forward primer, CSA10, containing a BglII site, an artificial Kozak sequence, and eight codons of DmCSAS and the reverse strand primer, CSA9, containing the last eight codons of DmCSAS, but not the stop codon, and an artificial SalI site. An extra C was added before the SalI site to keep the correct reading frame with eGFP. Similarly, the HsCSAS coding region was amplified by PCR using the forward primer, HsCS1, containing an artificial BglII site, a Kozak sequence, and the first seven codons of HsCSAS and the reverse strand primer, HsCS2, containing the last nine codons of HsCSAS, excluding the stop codon, an extra C, and an artificial EcoRI site. To determine the intracellular location of DmCSAS in Spodoptera frugiperda, the CSAS-GFP fusion protein was shuttled from the pEGFP vector into the baculovirus vector, pBlueBac4.5, by PCR amplification using the forward primer CSA10 and the reverse primer, CSA11, containing the last seven codons of GFP, two stop codons, and an artificial HindIII site. The resultant plasmid is called pBlueBac-DmCSAS-GFP. In order to ascertain whether the N-terminal sequences of both DmCSAS and HsCSAS were responsible for their distinct intracellular targeting, constructs were made that swapped the leader sequences of each protein with the other. An inspection of an alignment of many CSAS proteins (Fig. 1B) showed that eukaryotic CSAS proteins are longer than bacterial CSAS proteins at their N terminus and that these sequences are either rich in basic amino acids (mammalian), characteristic of NLS, or rich in hydrophobic amino acids (insects), characteristic of secretory signal sequences. We assigned sequences upstream of the start of the Escherichia coli CSAS in the alignment as "leader" sequences; thus, HsCSAS has a 40-amino acid leader, and DmCSAS has a 27-amino acid leader. A construct expressing the human CSAS protein with the D. melanogaster leader, DNHsC-GFP, was obtained by separately amplifying by PCR (i) the leader of DmCSAS, using DmCSAS-GFP as the template with the forward primer CSA10 and the reverse strand primer CSA13 (containing eight codons of DmCSAS preceding the fusion point and five codons of HsCSAS past the fusion point), and (ii) the nonleader fragment of HsCSAS, using HsCSAS-GFP as the template with the forward primer HsCS4 (containing five codons of DmCSAS preceding the fusion point and eight codons of HsCSAS past the fusion point) and the reverse strand primer HsCS2. The resultant 113-bp leader fragment of DmCSAS was mixed with the 1207-bp fragment of HsCSAS downstream of the leader, with which it shared 30 nucleotides of complementarity, and amplified for 12 cycles with the outside primers, CSA10 and HsCS2, and the fused fragment was cloned into the BglII/EcoRI sites of pEGFP-N2. Similarly, a construct expressing the D. melanogaster CSAS protein with the human leader, HsNDC-GFP, was obtained by separately amplifying by PCR (i) the leader of HsCSAS, using HsCSAS-GFP as the template with the forward primer HsCS1 and the reverse strand primer HsCS3 (containing seven codons of HsCSAS upstream of the fusion point and five codons of DmCSAS past the fusion point), and (ii) the fragment of DmCSAS downstream of the leader, using DmCSAS-GFP as the template with the forward primer CSA12 (containing five codons of HsCSAS preceding the fusion point and nine codons of DmCSAS past the fusion point) and the reverse strand primer CSA9. The resultant 151-bp human leader was mixed with the 688-bp DmCSAS fragment downstream of the leader, with which it shared 30 nucleotides of complementarity, and amplified for 12 cycles using the outside primers HsCS1 and CSA9, and the fused fragment was cloned into the BglII/SalI sites of pEGFP-N2. All final constructs were sequenced.
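The fragment sizes given above allow a simple estimate of the expected length of each fused product. The sketch below is a minimal illustration, not part of the reported method: it assumes that the 30 nucleotides of complementarity are contained in both fragments and are therefore counted once in the fused product, and that they correspond to the five codons on each side of the fusion point carried by both overlapping primers. The function name is hypothetical, and the resulting lengths are calculated here rather than reported in the text.

```python
def fused_length(upstream_bp, downstream_bp, overlap_bp):
    """Length of an overlap-extension product; the shared overlap is counted once."""
    if overlap_bp > min(upstream_bp, downstream_bp):
        raise ValueError("overlap cannot exceed either fragment")
    return upstream_bp + downstream_bp - overlap_bp


# Overlap implied by the primer design: 5 codons on each side of the fusion point.
overlap = (5 + 5) * 3

dn_hsc = fused_length(113, 1207, overlap)  # DmCSAS leader + HsCSAS nonleader fragment
hsn_dc = fused_length(151, 688, overlap)   # HsCSAS leader + DmCSAS nonleader fragment
print(dn_hsc, hsn_dc)
```

Under these assumptions, the two fused fragments would be expected to run at roughly 1.3 kb and 0.8 kb, respectively, which is the kind of check one might make on a gel before cloning.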
Cell Culture-LEC29.Lec32 cells (21), which are deficient in CMP-Neu5Ac synthetase activity, were grown at 37°C in a humidified atmosphere with 5% CO₂ in α-MEM (Invitrogen) medium supplemented with DNA and RNA precursors and 10% fetal bovine serum (FBS). 1 × 10⁶ cells were plated in each well of a 6-well plate. After 24 h, the cells were transfected with 4 μg of DNA using Lipofectamine 2000 (Invitrogen) reagent. The cells were harvested 36 h post-transfection. The cells were washed once with Ca²⁺-, Mg²⁺-free PBS (Invitrogen) and harvested in 300 μl of mammalian protein extraction reagent (Pierce) containing HALT protease inhibitor mixture (Pierce). COS-7 cells were cultured in Dulbecco's modified Eagle's medium (Invitrogen) supplemented with 10% FBS in conditions similar to those used for LEC29.Lec32 cells. Spodoptera Sf-9 cells were grown in serum-free Sf900 medium (Invitrogen) at 27°C.
Western Blotting and Detection of DmCSAS-The total protein of the cell lysate extracted with mammalian protein extraction reagent was determined using the BCA assay kit (Pierce) with a 96-well plate reader (Molecular Devices, Sunnyvale, CA). 50 μg of prepared protein was separated on a 12% SDS-polyacrylamide gel. Following electrophoresis, the proteins were transferred to a nitrocellulose membrane. The membrane was blocked with 10% blotting grade nonfat dry milk (Bio-Rad) in Tris-buffered saline, Tween 20 (TBST). DmCSAS-GFP was detected using a polyclonal mouse anti-GFP (Santa Cruz Biotechnology, Inc., Santa Cruz, CA) antibody (1:1000 dilution) and visualized using a polyclonal goat anti-mouse IgG conjugated with horseradish peroxidase (1:1000 dilution) (Santa Cruz Biotechnology) and chemiluminescence substrate (Pierce).
Immunostaining for Protein Localization-Immunostaining of COS-7 cells expressing either DmCSAS-GFP or HsCSAS-GFP was performed using organelle-specific markers and viewed by confocal microscopy. The CMP-sialic acid synthetase localization was determined by the colocalization of the eGFP-tagged protein and the organelle markers. COS-7 cells were plated in 4.2-cm² chamber slides (LabTek, Campbell, CA) at a density of 200,000 cells/well. The cells were transfected after 12 h with 1 μg of DNA using Lipofectamine 2000. After 24 h, the cells were washed with PBS. The cells were then fixed with 4% formalin (Richard-Allan Scientific, Kalamazoo, MI) in PBS. After 15 min, the cells were permeabilized with 0.05% Triton X-100 (Sigma) in PBS for 2 min. The cells were then washed once with PBS. For nuclear staining, the cells were then incubated with 1 μM To-PRO-3 (Molecular Probes, Inc., Eugene, OR) for 20 min and then washed twice with PBS. For staining the endoplasmic reticulum (ER), the permeabilized cells were washed with PBS and blocked with 8% FBS in PBS for 1 h. The cells were then washed three times with PBS and incubated with polyclonal rabbit anti-calnexin antibody (Stressgen, CA) (1:150 dilution in PBS containing 3% FBS). After three additional washes with PBS, the cells were incubated with a polyclonal goat anti-rabbit IgG antibody conjugated with Alexa 546 (Molecular Probes) (1:1000 dilution in PBS containing 3% FBS). For Golgi staining, the procedure was the same as for ER staining, except that a polyclonal rabbit anti-giantin antibody (Covance Research Products, Denver, PA) (1:1000 in PBS containing 3% FBS) was used as the primary antibody. Images were obtained with a Zeiss LSM 510 Meta Confocal microscope.
Immunostaining in Sf-9 cells was performed by the method described below. Sf-9 cells were plated in 4.2-cm² chamber slides at a density of 100,000 cells/well. After 24 h, the cells were infected with a baculovirus containing DmCSAS-GFP (AcDmCSAS-GFP). At 2-3 days postinfection, the cells were washed with PBS. These cells were then fixed as described above, and the Golgi was stained by incubating the cells with the Golgi-specific marker Bodipy-TR ceramide (Molecular Probes) at 200 nM concentration for 1 h. The cells were subsequently washed with PBS and viewed under the microscope.
In Vitro CMP-sialic Acid Synthetase Assay-The in vitro CMP-sialic acid synthetase assay was performed as previously described (23). Briefly, 40 μl of cell lysate was added to 100 μl of substrate mixture (0.2 M Tris, 0.2 mM dithiothreitol, 20 mM MgCl₂, 5.5 mM CTP, and 2.8 mM Neu5Ac) and incubated for 40 min at 37°C. The reaction was stopped by adding 320 μl of ice-cold ethanol. The mixture was subsequently centrifuged for 10 min at maximum speed (14,000 rpm) in a centrifuge (Beckman, Fullerton, CA) at 4°C, and the supernatant was lyophilized. The lyophilized sample was resuspended in 120 μl of 40 mM phosphate buffer at pH 9.2 and filtered through a 10,000 molecular weight cut-off microcentrifuge membrane (Millipore, Billerica, MA). The CMP-sialic acids were analyzed by high performance anion exchange chromatography (HPAEC) as described previously (24). The CMP-sialic acid contents were normalized with respect to the total protein determined using a BCA assay kit (Pierce) with a 96-well plate reader (Molecular Devices). The activity is reported as pmol of CMP-Neu5Ac/μg of total protein/min.
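The normalization described above (product formed, divided by total protein and by incubation time) can be written out explicitly. The following is a minimal sketch under the assumption that the amount of CMP-Neu5Ac in the reaction and the amount of protein added are both known; the function name and the example numbers are hypothetical and are not data from this study.

```python
def specific_activity(cmp_neu5ac_pmol, protein_ug, minutes=40.0):
    """Return activity in pmol of CMP-Neu5Ac per microgram of total protein per minute."""
    if protein_ug <= 0 or minutes <= 0:
        raise ValueError("protein amount and incubation time must be positive")
    return cmp_neu5ac_pmol / (protein_ug * minutes)


# Hypothetical illustration only: 1000 pmol of product from 5 ug of lysate
# protein over the 40-min incubation used in the assay.
example = specific_activity(cmp_neu5ac_pmol=1000.0, protein_ug=5.0, minutes=40.0)
print(f"{example:.1f} pmol/ug/min")
```

Keeping the incubation time as a parameter makes it straightforward to apply the same normalization if a different incubation period were used.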
In Vivo Synthesis of CMP-Neu5Ac-The in vivo activity of both DmCSAS-GFP and HsCSAS-GFP was measured as the amount of CMP-Neu5Ac produced in LEC29.Lec32 cells transfected with the CSAS constructs. LEC29.Lec32 cells were plated in a 6-well plate at a density of 1 × 10⁶ cells/well. After 24 h, 4 μg of plasmid DNA was used to transfect the cells as described above. At 48 h post-transfection, the cells were washed once with Ca²⁺- and Mg²⁺-free PBS and lysed in ice-cold 75% ethanol (300 μl) using a Tekmar Sonic Disruptor (Cincinnati, OH). The soluble fraction obtained after centrifugation at 14,000 rpm for 10 min was lyophilized. The lyophilized samples were then resuspended in 120 μl of 40 mM phosphate buffer, pH 9.2, filtered, and analyzed by HPAEC as described previously (24). The CMP-Neu5Ac contents were normalized with respect to total protein present in the soluble lysate obtained by sonication of PBS-washed cells, which had been resuspended in water, using the BCA protein assay kit (Pierce).

RESULTS
Identification of a D. melanogaster CMP-sialic Acid Synthetase Gene-The amino acid sequences of both the bacterial CMP-sialic acid synthetase (NeuA) (GenBank™ accession number J05023) and the mouse CMP-sialic acid synthetase (GenBank™ accession number MMU6215) were used to query the Drosophila genome database for genomic DNA encoding homologous sequences (19,25). Two genomic clones (AE003515 and AC015229) representing the same genomic sequence on chromosome 3L in band 76D8 had significant homology to the query sequence. In addition, a single corresponding cDNA, GenBank™ accession number RH14248, was obtained from the normalized head library. An alignment of the predicted amino acid sequence of the DmCSAS with other CSASs suggested that this cDNA was truncated at the 5′ end, since translation from the first ATG would yield a protein lacking a highly conserved region of the protein involved in binding to the CMP moiety (26). Indeed, examination of the genomic sequence revealed another putative ATG start sequence with an adjacent consensus Drosophila Kozak sequence 35 bp upstream of the start of the RH14248 cDNA. Furthermore, the genomic region 3′ of this ATG included a sequence that encoded, in frame, a putative CMP binding domain homologous to known domains present in other CSAS proteins. Further analysis of the 300 bp of genomic sequence upstream from the start of RH14248 cDNA indicated only one other ATG that would yield an open reading frame in frame with the DmCSAS coding region, but it lacked a consensus Drosophila Kozak sequence. Furthermore, comparison of the putative DmCSAS coding region with the corresponding CSAS coding regions of three related Drosophila species (Drosophila yakuba, Drosophila erecta, and Drosophila ananassae) (Table 2) revealed that only protein sequences translated from the nearest upstream genomic ATG were conserved.
Translation from more upstream ATG codons generated protein sequences that either were not conserved in these other insects or did not generate an open reading frame in frame with the DmCSAS coding region. A full-length coding region of a putative DmCSAS was then obtained by RT-PCR of total RNA from 14-17-h embryos using a forward primer derived from the genomic sequence of this immediate upstream ATG and a reverse strand primer derived from the end of the coding region of the original cDNA. Since the forward primer ended 12 bp upstream of the start of the RH14248 cDNA, successful RT-PCR using this primer demonstrated that the longer sequence is indeed present in native DmCSAS mRNA. This longer cDNA is predicted to encode a protein with 61 additional N-terminal amino acids, including the invariant CMP substrate binding loop. The DNA sequence of the cDNA matched the genomic sequence, except it lacked three introns of 52, 56, and 55 bp, as did cDNA RH14248. The cDNA for this putative DmCSAS is predicted to encode a protein of 248 amino acids, with a molecular mass of 27.5 kDa (Fig. 1A). BLAST searches also identified a putative CMP-sialic acid synthetase in the several related Drosophila species noted above as well as in the mosquito, Anopheles gambiae (GenBank™ accession number AAAB01008898). However, we were unable to identify the start codon in A. gambiae, since all insect CSAS genes examined have an intron of variable size located between the start codon and the CMP binding domain. The original D. melanogaster cDNA clone, RH14248, spanned the region that contained this intron in its genomic DNA, making identification of the start codon more straightforward. A multiple alignment among the D. melanogaster, human, mouse, E. coli, and N. meningitidis CSAS enzymes (Fig. 1B) shows that the proteins are all homologous over the length of the catalytic domain of the D. melanogaster enzyme (26,27). Analysis of the phylogenetic relationship among various CSAS proteins from vertebrates, insects, and bacteria (Fig. 1C) revealed that whereas all of the proteins share some degree of sequence similarity, the DmCSAS is more closely related to vertebrate homologs than to the bacterial homologs. Interestingly, whereas DmCSAS shares high sequence identity with the vertebrate CSAS proteins in its catalytic domain, it lacks the extra C-terminal domain found in these homologs, although several bacterial homologs, while more divergent in their catalytic domain, also possess an extra C-terminal domain (E. coli and Streptococcus agalactiae CSAS). Other bacterial CSAS proteins also lack the C-terminal domain (N. meningitidis and C. jejuni CSAS). Thus, the relatedness of the catalytic domains among homologs does not appear to correlate with whether CSAS proteins possess this extra domain.
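The start-codon reasoning above depends on finding ATG codons upstream of the truncated cDNA that remain in frame with the known coding region. A minimal sketch of such a scan is given below. It assumes that the input is the genomic sequence ending exactly at a known in-frame codon boundary and that no intron lies within it; the sequence shown is a made-up toy example, not Drosophila sequence, and the function name is hypothetical. Kozak context and cross-species conservation, which were also used in the analysis, are not evaluated here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_inframe_atgs(upstream):
    """Walk 5' from a known in-frame codon boundary, one codon at a time.

    Returns the distance (in bp) from the boundary to the A of each in-frame
    ATG, stopping at the first in-frame stop codon encountered.
    """
    seq = upstream.upper()
    hits = []
    for end in range(len(seq), 2, -3):
        codon = seq[end - 3:end]
        if codon in STOP_CODONS:
            break  # an open reading frame cannot extend past a stop codon
        if codon == "ATG":
            hits.append(len(seq) - (end - 3))
    return hits


# Toy sequence (hypothetical): ATG TAA ATG GCC ATG AAA, read 5' to 3'.
toy_upstream = "ATGTAAATGGCCATGAAA"
print(upstream_inframe_atgs(toy_upstream))
```

In the toy example, the ATG that lies 5' of the in-frame stop codon is excluded, because translation from it could not read through into the downstream coding region.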
Inspection of the alignments of many CSAS proteins, including those shown in Fig. 1B, reveals that eukaryotic proteins have nonconserved N-terminal extensions when compared with bacterial CSAS proteins. Shown in Fig. 1D is the hydrophobicity plot of the amino acid sequence of different CSAS proteins. The D. melanogaster, D. yakuba, D. erecta, and D. ananassae CSAS proteins (Table 2) begin with a leader sequence characterized by a short positively charged amino-terminal region (n-region) followed by a central hydrophobic region (h-region) (a cluster of 9-10 hydrophobic residues in the Drosophila species), characteristic of a secretory signal or membrane anchoring sequence (43). Three different programs used to predict protein targeting (44,46) predicted that the N-terminal 21 amino acids of DmCSAS comprised a signal peptide sequence with a high likelihood of protein localization in the secretory apparatus (47) (Fig. 1, A and B, and Table 2). The α(2,6)-sialyltransferases from D. melanogaster and human and β(1,4)-galactosyltransferase I from D. melanogaster (18,46,48,49) have very similar clusters of hydrophobic residues in their N terminus that both target and anchor the proteins to the Golgi (Table 2). Whereas these programs accurately predicted localization of the protein to a secretory compartment, they were not reliable in distinguishing whether proteins would be secreted (and hence have cleavage sites within the signal sequence) or are Golgi-anchored (and do not have cleavage sites) when proteins of known location were tested. The prediction of the N terminus of DmCSAS being a signal sequence also favors the selected ATG as being the true start codon, since signal sequences generally are located within 10 amino acids of the N terminus and are between 20 and 30 residues in length (50). In contrast, various vertebrate CSAS proteins, such as mouse, human, zebrafish, and rainbow trout (Fig. 1D and listed in Table 2), have N-terminal sequences characterized by clusters of basic residues typically preceded by a proline residue, characteristic of nuclear localization signals (NLS) (28). Indeed, this signal is consistent with the presence of mammalian CSAS in the nucleus (20,29). Munster et al. (29) showed that murine CSAS has three clusters of basic amino acids, designated BC1 (corresponding to Pro15-Arg32 in humans), BC2 (corresponding to amino acids Pro198-Arg202 in humans), and BC3 (corresponding to Lys267-Lys274 in humans). Either BC1 or BC2, when expressed as a fusion protein with eGFP, mediated protein transport into the nucleus. D. melanogaster CSAS protein lacks BC1 completely and is missing one of the basic amino acids in BC2 (Lys200 in humans, Lys196 in mice, and Ala179 in D. melanogaster) shown to be essential for nuclear import in vertebrates (29).
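A hydrophobicity profile of the kind referred to above can be computed as a sliding-window average of per-residue hydropathy values. The sketch below uses the Kyte-Doolittle scale as an assumption, since the text does not state which scale or window size was used for Fig. 1D. The peptide is a hypothetical toy sequence with a short charged start followed by a hydrophobic cluster; it is not the DmCSAS leader.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of each window; element i covers residues i..i+window-1."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical peptide: charged n-region, 9-residue hydrophobic h-region, polar tail.
toy_leader = "MKRS" + "LLVALLAIL" + "GSDE"
profile = hydropathy_profile(toy_leader, window=9)
peak_start = max(range(len(profile)), key=profile.__getitem__)
print(peak_start, round(profile[peak_start], 2))
```

A leader rich in basic residues, such as a nuclear localization signal, would instead give strongly negative window averages on this scale, which is the contrast the plot is meant to show.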
Developmental Expression of DmCSAS-PCR amplifications of both a head and larval-pupal cDNA library as well as RT-PCR of total RNA isolated from 14-17-h embryos and S2 cells were performed using DmCSAS-specific primers. A band of the expected size of 484 bp, characteristic of spliced RNA, was detected in the head (Fig. 2A, lane 1) and embryos (lane 3) but not larvae-early pupae (lane 2) or S2 cells (lane 4).
To refine the time of expression, PCR was performed with normalized cDNA from all D. melanogaster developmental stages (Fig. 2B). DmCSAS was not detected over the first 12 h (lanes 1-3), appeared prominently at 12-24 h in development (lane 4), and then was detectable at low levels in larvae and pupae (lanes 5-8). In adults, DmCSAS exhibited much higher levels of expression in the head than observed in the body minus the head (lane 9 versus lane 11 and lane 10 versus lane 12). Interestingly, the level of DmCSAS expression was reproducibly higher in male heads and bodies (lanes 9 and 11) compared with female heads and bodies (lanes 10 and 12). This regulated pattern of expression is in contrast to that found for the D. melanogaster sialic acid 9-phosphate synthase gene (16), which was ubiquitously expressed throughout the fruit fly life cycle. It is also different from the ubiquitous expression observed for the human CSAS (23).

Expression of DmCSAS in Mammalian Cells Lacking Endogenous Activity-In order to determine whether DmCSAS encoded a functional enzyme, the gene was introduced into a mutant mammalian cell line that lacks endogenous CMP-sialic acid synthetase activity. Mammalian cells were chosen over insect cell lines, because many of the available insect cell lines either lack or have extremely low levels of enzymes in the preceding steps of the sialic acid pathway, such as the UDP-N-acetylglucosamine epimerase/N-acetylmannosamine kinase and sialic acid 9-phosphate synthase (16,30,31). Consequently, these insect cell lines were not as useful for examining the functionality of the putative DmCSAS gene. Therefore, a plasmid construct expressing the putative DmCSAS protein as a C-terminal fusion protein with eGFP was introduced into the Chinese hamster ovary cell double mutant LEC29.Lec32. In these cells, the dominant lec29 mutation activates α(1,3)-fucosyltransferase activity, whereas the lec32 mutation nearly eliminates CMP-sialic acid synthetase activity (reduced by >95%) (21). The presence of a C-terminal eGFP tag allows visualization of the expressed protein in the cell and facilitates identification of its subcellular localization. The transfected cells fluoresced green when visualized with appropriate wavelength light at 36 h post-transfection. In Western blot analysis, a single protein band with the expected molecular size of D. melanogaster CSAS plus GFP (~53.5 kDa) was detected in the cell lysate using an anti-GFP antibody (Fig. 3). An examination of the levels of protein in the cell lysate and medium revealed that all of the detectable DmCSAS-GFP was present in the cell lysate and none was secreted into the cell culture medium.
In Vitro and in Vivo Activity of DmCSAS Expressed in LEC29.Lec32 Cells-The functional activity of expressed DmCSAS-GFP was assayed initially in vitro by incubating lysates of transfected LEC29.Lec32 cells with exogenously added CTP and Neu5Ac, and the CMP-sialic acids produced over a period of 40 min were measured using high performance anion exchange chromatography. Whereas cells transfected with the pEGFP parent vector (negative control) (Fig. 4A) contained very low levels of CMP-Neu5Ac synthesis activity following incubation of lysates with the substrates, the activity levels from the lysate of cells expressing DmCSAS-GFP were more than 1 order of magnitude higher at 65 pmol/g of total protein/min.
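The normalization behind an activity figure of this kind (product formed per unit of total protein per minute, then compared with the vector control) can be sketched as follows. This is a minimal illustration, not the analysis used in the study: the function names are hypothetical, and the input quantities are invented values chosen only so that the arithmetic reproduces an activity of 65 and a 13-fold difference over the control. They are not measured data, and the protein unit is whatever the assay reports.

```python
def specific_activity(product_pmol, total_protein, minutes):
    """Product formed per unit of total protein per minute.

    The protein unit is left to the caller; the result is expressed
    in pmol per that unit per minute.
    """
    if total_protein <= 0 or minutes <= 0:
        raise ValueError("total_protein and minutes must be positive")
    return product_pmol / (total_protein * minutes)


def fold_over_control(sample_activity, control_activity):
    """Ratio of a sample's specific activity to the control's."""
    if control_activity <= 0:
        raise ValueError("control_activity must be positive")
    return sample_activity / control_activity


if __name__ == "__main__":
    # Hypothetical inputs (NOT measured data): pmol of CMP-Neu5Ac formed
    # in a 40-min incubation, per one unit of total protein.
    dmcsas = specific_activity(product_pmol=2600.0, total_protein=1.0, minutes=40.0)
    vector = specific_activity(product_pmol=200.0, total_protein=1.0, minutes=40.0)
    print(dmcsas, vector, fold_over_control(dmcsas, vector))
```

Normalizing to both protein and time before taking the ratio keeps lysates of different concentration comparable, which is why the fold difference is computed from specific activities rather than from raw peak areas.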
The capacity of LEC29.Lec32 cells to synthesize CMP-Neu5Ac in vivo was also studied. LEC29.Lec32 cells were transiently transfected with either pEGFP or DmCSAS-GFP, and intracellular CMP-sialic acid levels were determined 48 h post-transfection. The amount of CMP-Neu5Ac in the cells expressing DmCSAS-GFP was more than 2-fold higher than that from cells expressing the parent vector pEGFP when normalized to total protein content (Fig. 4B). The fact that the increase in enzyme activity, compared with controls, was not as great in vivo as observed in vitro may be due to lower availability of the proper precursor substrates in the appropriate cellular compartment. Nonetheless, the increase in CMP-Neu5Ac levels in LEC29.Lec32 cells transfected with DmCSAS above the control in both in vitro and in vivo studies indicates that the D. melanogaster gene encodes a protein with the ability to produce CMP-sialic acid in LEC29.Lec32 cells.
A subsequent experiment was performed in which the in vitro and in vivo synthesis of CMP-Neu5Ac in mutant mammalian cells expressing DmCSAS was compared with the levels achieved when the cells were transfected with the HsCSAS fused to eGFP (HsCSAS-GFP). The in vivo levels of CMP-Neu5Ac in LEC29.Lec32 cells transfected with HsCSAS-GFP were 5-fold higher than those found in the cells transfected with the DmCSAS-GFP gene. In vitro, the lysates expressing the HsCSAS-GFP had an activity that was more than 30 times higher than that of lysates expressing DmCSAS-GFP, which itself was more than 13 times higher than that of the control eGFP-transfected mammalian cells. The difference in activities between the D. melanogaster and human CSAS enzymes may be due to the endogenous activity levels of the enzymes, their expression levels in the mammalian host, or the stability of the expressed proteins.
Localization of DmCSAS to the Secretory Compartment of the Cell-Whereas the mammalian sialic acid 9-phosphate synthase responsible for generating sialic acid is a soluble cytoplasmic enzyme (31), both human and murine CSAS, which activate the sialic acid with a nucleotide, have been reported to localize to the nucleus (22,23,33). As mentioned previously, murine CSAS possesses an N-terminal potential NLS, BC1, and an internal NLS, BC2 (22,29). Since the sequence analysis of the DmCSAS indicated that it lacked BC1 as well as a functional BC2, we were interested in the intracellular location of the DmCSAS protein. Co-localization studies were carried out with GFP-tagged CSAS proteins in conjunction with immunostaining of different cellular compartments. COS-7 cells were used initially for this assay, because their flattened morphology facilitated the identification of the intracellular location of the CSAS protein. COS-7 cells transfected with HsCSAS-GFP were stained with the nuclear marker dye ToPro3. As expected, HsCSAS-GFP expressed in COS-7 cells was localized to the nucleus, in agreement with previous studies (23) (Fig. 5A). In order to determine the intracellular location of the DmCSAS-GFP protein, COS-7 cells transfected with DmCSAS-GFP were separately stained with markers for different cellular compartments: ToPro3, a nuclear marker; an antibody against calnexin, an ER-resident protein; and an antibody against giantin, a Golgi-resident protein. There was no co-localization of DmCSAS-GFP with the nuclear marker, and the co-localization of DmCSAS-GFP with calnexin was minimal (not shown), indicating that the majority of DmCSAS protein was not present in the nuclear or ER compartments. However, significant co-localization was found for DmCSAS-GFP and the Golgi marker, giantin, indicating that the majority of expressed DmCSAS localizes to the Golgi compartment (Fig. 5B). Although some DmCSAS-GFP is also detected in other cellular compartments, negligible DmCSAS-GFP is detected in the nucleus, in contrast to the results for the human and murine enzymes.
In order to examine the localization of the DmCSAS in an insect cell host, the gene for DmCSAS-GFP was incorporated into a baculovirus vector, and the virus was used to infect Spodoptera Sf-9 cells, an insect cell line widely used for recombinant protein expression. For this cell line, Bodipy-TR stain was used as a Golgi marker. As shown in Fig. 5C, DmCSAS-GFP expressed in Sf-9 cells was found to have a localization pattern similar to Bodipy-TR. This localization of the DmCSAS-GFP to the Golgi is consistent with the predictions made by the PSORT II analysis based on its possessing an N-terminal sequence rich in hydrophobic amino acids.
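A hydrophobic N-terminal stretch of the kind that underlies this prediction can be visualized with a sliding-window hydropathy profile. The sketch below is illustrative only: it assumes the Kyte-Doolittle scale and a 9-residue window, neither of which is specified here, it does not reproduce the PSORT II algorithm, and the example sequence is a made-up toy peptide, not the DmCSAS leader.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
# (assumed scale; the text does not state which scale was used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of every full window along a protein sequence."""
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy peptide: hydrophobic leader followed by charged residues.
    toy = "MLLLVLLAVLLALASAKRKDEEDGKRS"
    profile = hydropathy_profile(toy, window=9)
    peak = max(range(len(profile)), key=profile.__getitem__)
    print(f"most hydrophobic window starts at residue {peak + 1}: {profile[peak]:.2f}")
```

In such a profile, a run of strongly positive window means near the N terminus is the signature of a candidate signal or membrane anchor sequence, whereas clusters of basic residues such as those in an NLS give strongly negative values.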
Chimeric Constructs Containing N Termini from HsCSAS and DmCSAS-In order to determine if the differential localization of human and fly CSAS proteins was dictated by their N termini, a chimeric construct was created that joined the catalytic domain of DmCSAS with the N-terminal nuclear localization signal from HsCSAS. Munster et al. (29) previously showed that the BC1 cluster of basic residues in the murine CSAS (Pro15-Arg31) present in the N-terminal leader was capable of functioning as a nuclear localization signal. Therefore, the D. melanogaster CSAS tagged with eGFP was modified by replacing its N-terminal 27 amino acids with the N-terminal 40 amino acids of HsCSAS. This chimeric protein retained the entire region of DmCSAS that shares homology with the catalytic domains of bacterial and vertebrate CSAS proteins. The chimeric protein was expressed in COS-7 cells in order to determine its intracellular location as previously described for the native proteins. Unlike the native DmCSAS, the chimeric D. melanogaster CSAS containing a human leader (HsNDmC-GFP) was observed to co-localize with the nuclear dye ToPro3 (Fig. 5D). However, when the functionality of this construct was evaluated using the same in vitro assays described before, the HsNDmC-GFP was not found to have any activity above background levels.
The reciprocal chimeric construct was created in which the N-terminal 40 amino acids of HsCSAS-GFP were replaced by the N-terminal 27 amino acids of DmCSAS. This chimeric construct was found to localize outside the nucleus (Fig. 5E) and contain in vitro activity comparable with DmCSAS-GFP activity (data not shown). However, the in vitro activity of this chimera containing the D. melanogaster leader and human catalytic domain (DmNHsC-GFP) was significantly lower than that of the native HsCSAS-GFP.

DISCUSSION
The existence and role of the sialic acid pathway in insects have been the subject of considerable debate. Insect glycoproteins predominantly terminate in paucimannosidic or hybrid structures, in contrast to mammalian glycoproteins, which terminate in sialic acid (8,9). In addition, many of the available insect cell lines either do not express the sialylation enzymes or do so at extremely low levels (16,34). However, other studies support the presence of a sialylation pathway in insects (10-12, 15-18).
The availability of complete genomic information for D. melanogaster and other Drosophila species has made the identification of orthologs of bacterial and mammalian enzymes in the sialic acid pathway possible (16,18,35). One of the essential components of such a pathway is the CMP-Neu5Ac synthetase responsible for activation of sialic acid. We previously noted a possible ortholog of the CMP-sialic acid synthetase (16) that the annotators at Flybase failed to identify as such (Gadfly accession number CG3220) because the sequence information was incomplete.
In this study, we have shown that the predicted D. melanogaster gene encodes a functional CMP-sialic acid synthetase as illustrated by its ability to generate CMP-Neu5Ac in both in vitro and in vivo assays following expression of the corresponding cDNA in a mutant mammalian cell line lacking endogenous activity. The presence of a functional DmCSAS gene provides further evidence that the sialylation pathway is indeed present and functional in insects. We also identified homologous genes in other Drosophila species and in the mosquito, A. gambiae, that are predicted to encode putative CMP-sialic acid synthetases. Inspection of the evolutionary tree (Fig. 1C) generated by comparing just the catalytic domains of various CSAS proteins among prokaryotes, archaea, and eukaryotes shows the expected evolutionary relationship among these major kingdoms. The two archaea (Chlorobium and Methanosarcina) are closer to eukaryotes than are prokaryotes.
The D. melanogaster CMP-sialic acid synthetase protein has two features that distinguish it from vertebrate CSAS proteins: it lacks the C-terminal domain found in murine and human CSAS proteins, and it differs in the composition of the N-terminal amino acids in the leader region upstream of the catalytic domain. We note no corresponding evolutionary relationship among organisms that have or lack an extra C-terminal domain. Insects that lack this domain are closer to vertebrates that have one, and the two bacterial species that have a C-terminal domain (Escherichia and Streptococcus) are not more closely related than those that do not. Therefore, the CSAS C-terminal domains could have arisen in separate evolutionary branches by fortuitous gene fusions (similar to the fusion of the Campylobacter CSAS with a 1,4-N-acetylgalactosaminyltransferase) or by acquisition of vertebrate domains by commensal or pathogenic bacteria due to horizontal gene transfer. The exact roles of the mammalian C-terminal domains have yet to be defined, but the sequences share identity (24-32%) with the bacterial sugar phosphatase, YrbI, that dephosphorylates 2-keto-3-deoxy-D-manno-octulosonic acid (ketodeoxyoctonate)-8-phosphate (27,36). As a result, Krapp
The other distinguishing feature of D. melanogaster CMP-sialic acid synthetase is the composition of its N-terminal leader region. An examination of the alignment of bacterial and eukaryotic CSAS proteins shows that eukaryotic CSAS proteins have N-terminal extensions when compared with bacterial CSAS proteins. In murine CSAS, two regions, an N-terminal sequence BC1 and an internal sequence BC2, include clusters of basic amino acids that were found to be active NLSs when fused to eGFP (29). Whereas Munster et al. (33) originally suggested BC2 as the probable localization signal for mammalian CSAS, another sequence is responsible for nuclear localization in other vertebrates, such as rainbow trout, that lack a functional BC2 (22). Indeed, we show in this study that an N-terminal sequence from human CSAS that included BC1 functioned as an NLS when fused with D. melanogaster CSAS lacking its own N-terminal leader. However, this chimeric construct did not have any enzymatic activity, demonstrating the importance of proper subcellular localization for activity and perhaps proper protein folding as well.
In contrast to the vertebrate CSAS proteins, DmCSAS and other Drosophila homologs contain N-terminal sequences that lack the clusters of basic amino acids characteristic of NLS and have an Ala in place of the critical Lys residue in the internal nuclear localization signal sequence, BC2, rendering it nonfunctional as well. Instead, the CSAS proteins from multiple Drosophila species have an N-terminal sequence rich in hydrophobic amino acids that the PSORT II program and hydrophobicity plot predict to function as an intracellular targeting signal to transport insect CSAS to a secretory compartment (i.e. either to be retained in the ER/Golgi or secreted from the cell if the hydrophobic leader sequence is cleaved by a signal peptidase). Since the majority of an expressed DmCSAS-GFP fusion protein was shown to be present in the cell lysate and not in the medium, we believe that the protein is likely to be retained in a secretory compartment. Indeed, the DmCSAS-GFP fusion protein was targeted to the Golgi compartment when expressed in both mammalian COS-7 and insect cells, based on its intracellular location and co-localization with Golgi markers. The N-terminal sequence of the DmCSAS is very similar to the N-terminal sequences found in chicken, human, and D. melanogaster α(2,6)-sialyltransferases (GenBank accession numbers Q92182, XM_038616, and AF397532, respectively), which are anchored to the Golgi membrane (18,37,38). There was no co-localization with ToPro3, a nuclear marker. To our knowledge, this is the first report of two functionally conserved homologous proteins of different eukaryotic organisms observed to localize to different compartments of the cell. The chimeric construct that included the first 27 N-terminal amino acids from D. melanogaster CSAS with the carboxyl-terminal domain of the human protein was also functional and outside the nucleus. However, its localization pattern differed from the Golgi localization pattern of DmCSAS, possibly as a result of protein folding difficulties or different targeting signals.
The localization of the insect CMP-sialic acid synthetase to the Golgi compartment is reasonable, given that the sialyltransferase, which uses the activated sialic acid for transfer to glycoconjugates, is also located in the Golgi. With both enzymes in the same compartment, Drosophila would not require a CMP-sialic acid transporter, and a functional ortholog of the CMP-sialic acid transporter has not yet been identified in Drosophila genomes. We originally suggested (16) that the D. melanogaster genomic sequence, GenBank accession number AL023874, might be an ortholog of the murine CMP-sialic acid transporter, based on its sequence identity. However, this gene has been experimentally determined to encode a UDP-galactose transporter (39,40).
In bacteria, the activation of the sialic acid occurs along with sialyltransferase activity in the same cytoplasmic compartment following biosynthesis of the sugar, and there is also no requirement for transport of the CMP-sialic acid. In this way, insects are similar to prokaryotes in that the activating enzyme and transferase are present in the same cellular compartment. However, the Golgi location of the CMP-sialic acid synthesis enzyme in Drosophila is distinct from the cellular location for the preceding metabolic steps, including the generation of sialic acid 9-phosphate that we have shown to occur in the cytoplasm of Drosophila melanogaster (16). Thus, a mechanism is required to transport sialic acid, in either the free or possibly phosphorylated form, from the cytoplasm into the Golgi. In bacteria, sialic acid transporters of the ATP-binding cassette, proton-coupled, and tripartite ATP-independent periplasmic class are present (36), and several potential homologs of the sialic acid transporter have been detected in the genome of D. melanogaster. Unlike in Drosophila, the vertebrate CMP-sialic acid synthetase is located in the nuclear compartment (20,22,29), although nuclear localization is not required for activity (20). After activation, CMP-sialic acid is believed to diffuse from the nucleus into the cytoplasm and is subsequently transported into the Golgi compartment. The availability of a CMP-sialic acid transporter in vertebrates makes such a translocation step possible. It is interesting to note that the mammalian CMP-sialic acid transporter is 43% identical to the UDP-Gal transporter (41), and chimeras of the UDP-Gal transporter have been constructed that enable CMP-sialic acid transport (32). The availability of such a CMP-sialic acid transporter in vertebrates would have allowed the CMP-sialic acid synthesis step to be localized in a compartment separate from the transferase step. Thus, the different locations of the CMP-sialic acid synthetase in protostomes and deuterostomes are probably the result of different evolutionary paths taken in the adaptation of the sialic acid pathway across species.
In D. melanogaster, CMP-sialic acid synthesis appears to be strictly regulated at the transcriptional level. Roth et al. (10) used histochemical methods to show the presence of sialic acid in Drosophila embryos with the most intense staining in the embryonic brain and central nervous system at 14-17 h in development. Koles et al. (18) showed by in situ hybridization that the sialyltransferase gene was expressed in the embryonic brain and subsets of cells in the central nervous system (CNS) at this time. We showed by RT-PCR that the D. melanogaster CMP-sialic acid synthetase RNA is developmentally regulated, first appearing at 12-24 h of embryogenesis, with low expression through larval and pupal stages and greatly enriched expression in the adult head compared with the adult body.
The differential expression between head and body argues for tissue-specific expression and is consistent with high expression in the CNS. Therefore, our results support the proposal put forth previously by Roth et al. (10) that a primary biological role for sialic acid in Drosophila will be in the CNS. Koles et al. (18) also report that the human homolog of the D. melanogaster sialyltransferase displays elevated expression in fetal brain. Thus, understanding the role of the sialic acid pathway in Drosophila development may have relevance to understanding its function in the CNS of other organisms, including mammals.