Identification and Biochemical Characterization of Two Functional CMP-Sialic Acid Synthetases in Danio rerio*

Background: Addition of sialic acid to the nonreducing end of glycoconjugates requires activation by CMP-sialic acid synthetase (CMAS).
Results: In zebrafish, we identified two CMAS enzymes that differ in expression pattern, activities, and intracellular localization.
Conclusion: Maintenance of two CMAS paralogues is attributed to subfunctionalization.
Significance: Unraveling the individual functions of CMAS paralogues helps to elucidate the impact of sialylation in vertebrate development.

Sialic acids (Sia) form the nonreducing end of the bulk of cell surface-expressed glycoconjugates. They are, therefore, major elements in intercellular communication processes. The addition of Sia to glycoconjugates requires metabolic activation to CMP-Sia, catalyzed by CMP-Sia synthetase (CMAS). This highly conserved enzyme is located in the cell nucleus in all vertebrates investigated to date, but its nuclear function remains elusive. Here, we describe the identification and characterization of two Cmas enzymes in Danio rerio (dreCmas), one of which is exclusively localized in the cytosol. We show that the two cmas genes most likely originated from the third whole genome duplication, which occurred at the base of teleost radiation. cmas paralogues were maintained in fishes of the Otocephala clade, whereas one copy was subsequently lost in Euteleostei (e.g. rainbow trout). In zebrafish, the two genes exhibited a distinct spatial expression pattern. The products of these genes (dreCmas1 and dreCmas2) diverged not only with respect to subcellular localization but also in substrate specificity. Nuclear dreCmas1 favored N-acetylneuraminic acid, whereas the cytosolic dreCmas2 showed highest affinity for 5-deamino-neuraminic acid. The subcellular localization was confirmed for the endogenous enzymes in fractionated zebrafish lysates. Nuclear entry of dreCmas1 was mediated by a bipartite nuclear localization signal, which seemed irrelevant for other enzymatic functions. With the current demonstration that in zebrafish two subfunctionalized cmas paralogues co-exist, we introduce a novel and unique model to detail the roles that CMAS has in the nucleus and in the sialylation pathways of animal cells.

Sialic acids (Sia), a family of nine-carbon α-keto acids, are mostly found as terminal sugars on glycoproteins and glycolipids. Due to their exposed position and negative charge, Sia influence numerous cell interaction and cell recognition processes by charge repulsion or by acting as part of recognition structures (1). In mice, interference with Sia biosynthesis has been shown to be lethal before embryonic day 10 (2).
In addition to the mouse model, the zebrafish (Danio rerio) has been introduced as a valuable system to study the biological significance of sialylation in vertebrates, particularly of linkage-specific sialyltransferases. Mass spectrometry studies revealed that N-acetylneuraminic acid (Neu5Ac) and N-glycolylneuraminic acid (Neu5Gc) are the major Sia derivatives in zebrafish (3,4). Sia have been found in mono- and oligosialylated structures with up to seven residues bound to glycoproteins and glycolipids (3,4). In addition, long homopolymeric chains of α-2,8-linked Neu5Ac (polySia) were detected in zebrafish on neural cell adhesion molecules Ncam1a and Ncam1b (5,6). Removal of polySia results in deficits in posterior commissure formation (5). The formation of polySia during embryonic development is catalyzed by polysialyltransferase ST8Sia2 (7). ST8Sia4, the second polysialyltransferase identified in zebrafish, is generally capable of adding polySia to Ncam1a and Ncam1b but is not involved in polysialylation in the embryo (6,7). ST8Sia3, the third zebrafish sialyltransferase analyzed to date, catalyzes the formation of oligosialic acid chains. Downregulation of ST8Sia3 entails anomalous somite morphology, which shows that Sia also play a role in non-neuronal development (8).
A prerequisite for the formation of sialoglycoconjugates is the activation of Sia to its cytidine monophosphate diester (CMP-Sia), which is catalyzed by one of the key enzymes of the sialylation pathway, CMP-Sia synthetase (CMAS or CSS, EC 2.7.7.43). Only activated sugars are transported into the Golgi apparatus, where they serve as a substrate for sialyltransferases. Interference with the activation reaction abolishes sialylation on the cell surface (9). The CMAS enzyme is conserved from bacteria to human with five conserved primary sequence motifs forming the active site pocket (10). In contrast to all other sugar-activating enzymes analyzed to date, the vertebrate CMAS is localized in the cell nucleus (11). This unusual intracellular localization was first recognized in the lens epithelial layer by E. L. Kean in 1969 (12) and later confirmed in a variety of biochemical studies using different tissues and species (11,13). Although the nuclear localization signals have been identified in recombinant mouse and rainbow trout CMAS (14,15), the biological relevance of this unusual intracellular localization still remains an enigma.

* This work was supported by the German Research Foundation.
In this study, we identified and cloned two distinct cmas genes in zebrafish. We purified the recombinant proteins and demonstrated that both dreCmas enzymes assemble into tetramers and are enzymatically active in vitro as well as in cellular systems. Remarkably, however, the dreCmas enzymes showed significant differences not only in terms of substrate specificity but also with respect to their subcellular localization and spatial expression. Although dreCmas1 was transported to the nuclear compartment by a bipartite nuclear localization signal, dreCmas2, in contrast to all other vertebrate CMAS analyzed thus far, remained in the cytoplasm.

EXPERIMENTAL PROCEDURES
DNA and Protein Sequence Analysis-Two cmas homologues were identified in the zebrafish genome using known vertebrate Cmas sequences in BLAST searches. Sequences of cmas homologues of other species (supplemental Table 1) were obtained from ENSEMBL and GenBank™ databases or by using BLASTP or TBLASTN algorithms applied to protein, mRNA, and EST databases. In the latter case, overlapping ESTs were downloaded, aligned, and sorted for each species according to sequence identities. Regions of unsure sequencing were deleted, and consensus sequences were inferred manually.
For phylogenetic analyses, nucleotide sequences were aligned using CLUSTALW implemented in MEGA (16). The maximum likelihood tree was constructed using Mega5 and tested by bootstrap analysis with 1000 replications. The ambiguously aligned N-terminal and C-terminal codons were excluded from analysis, resulting in a 1221-bp-long (407 amino acids) alignment. All sites in triplets were used, and missing data and alignment gaps were deleted in pairwise comparisons. We used the general time-reversible model of substitutions and uniform rates of substitutions among sites. The tree was inferred using the nearest neighbor interchange maximum likelihood heuristic method.
The programs EBI-ClustalW (17) and BioEdit 7.0.5 (Tom Hall, Ibis Therapeutics, Carlsbad, CA) were used for multiple sequence alignments for visualization of conserved domains and nuclear localization signals. Prediction of putative nuclear localization signal (NLS) sequences was performed by eye and by use of PSORT II and PredictNLS. Structure prediction was done using Phyre (18).
Blocks of synteny were determined with the help of the Synteny database (19) or on sight using the latest versions of genome projects provided by ENSEMBL database. In the latter case, chromosomal location and orientation of orthologues of up to 10 genes upstream and downstream of the cmas genes of zebrafish were searched for in other species. Only the relative chromosomal position was taken into consideration.
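The neighbour-window comparison described above can be sketched in a few lines. This is only an illustration under simple assumptions: gene order is given as a list of orthologue labels, and the function name `shared_neighbours`, the labels, and the toy chromosomes are hypothetical and not taken from the study.

```python
def shared_neighbours(chrom_a, chrom_b, focal_a, focal_b, window=10):
    """Return orthologue labels found within `window` genes of both focal genes.

    Only relative position (membership in the flanking window) is used;
    absolute coordinates and strand orientation are ignored.
    """
    def neighbours(chrom, focal):
        i = chrom.index(focal)
        lo = max(0, i - window)
        # Up to `window` genes upstream plus up to `window` genes downstream.
        return set(chrom[lo:i] + chrom[i + 1:i + 1 + window])

    return neighbours(chrom_a, focal_a) & neighbours(chrom_b, focal_b)


# Toy gene orders with hypothetical labels (not real zebrafish or chicken data).
zebrafish_chr = ["g1", "g2", "g3", "cmas", "g4", "g5", "g6"]
chicken_chr = ["g5", "h1", "cmas", "g2", "h2", "g3"]

shared = shared_neighbours(zebrafish_chr, chicken_chr, "cmas", "cmas")
print(sorted(shared))
```

A larger set of shared neighbours around the focal gene would be read as stronger support for conserved synteny; the threshold for calling a block syntenic is a separate choice that this sketch does not make.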
We included species abbreviations in the gene names (e.g. dreCmas). For hemichordates and fish, we used the nomenclature recommended for zebrafish (gene, cmas; protein, Cmas); for vertebrates in general, we used the mouse nomenclature (gene, Cmas; protein, CMAS).
Isolation of cDNA-Total RNA was isolated from pooled 24 and 35 h post-fertilization (hpf) zebrafish embryos using TRIzol (Invitrogen). cDNA was generated using the SuperScript™ II cDNA synthesis kit (Invitrogen). Full open reading frames of drecmas1 and drecmas2 were amplified using specific primers (supplemental Table 2) and Phusion DNA Polymerase (Finnzymes). PCR products were subcloned into the pCR-Blunt II-TOPO vector (Invitrogen). Their identities were confirmed by sequencing.
Cmas Purification and Size Exclusion Chromatography-We subcloned drecmas1 and drecmas2 cDNAs into a modified pET22b-Strep vector allowing the expression of N-terminally Strep II-tagged (IBA) proteins. Primer sequences are provided in supplemental Table 2. Recombinant dreCmas1 and dreCmas2 were expressed in Escherichia coli BL21(DE3) (Novagen) at 15°C in Power Broth (AthenaES) and purified by Strep-Tactin affinity chromatography (IBA). Peak fractions were desalted (HiPrep 26/10, GE Healthcare) and concentrated to 1-2 mg ml⁻¹ in buffer containing 50 mM Tris-HCl (pH 8), 20 mM MgCl₂, 150 mM NaCl, and 1 mM DTT. Purified protein samples were flash-frozen in liquid nitrogen and stored at −80°C until required. Protein concentrations were determined using the absorption at 280 nm and the specific extinction coefficient calculated using the ProtParam tool. Size exclusion chromatography was performed as described previously using the above-mentioned buffer (20). Following purification and size exclusion chromatography, Western blots were stained with Strep-Tactin-alkaline phosphatase (AP) conjugate (IBA).
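The concentration estimate from absorption at 280 nm follows the Beer-Lambert law. The sketch below shows the arithmetic only; the function name and the extinction coefficient are hypothetical placeholders (the actual ProtParam value is not given in the text), and the 50.2 kDa mass is the theoretical mass quoted for Strep II-tagged dreCmas1.

```python
def concentration_mg_per_ml(a280, ext_coeff_m_cm, mass_da, path_cm=1.0):
    """Protein concentration from A280 via the Beer-Lambert law.

    a280            absorbance at 280 nm (dimensionless)
    ext_coeff_m_cm  molar extinction coefficient in M^-1 cm^-1
    mass_da         molecular mass in Da (numerically g/mol)
    path_cm         optical path length in cm
    """
    if ext_coeff_m_cm <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    molar = a280 / (ext_coeff_m_cm * path_cm)  # mol/l
    return molar * mass_da                     # g/l, which equals mg/ml


# Hypothetical reading and hypothetical extinction coefficient, for illustration only.
conc = concentration_mg_per_ml(a280=0.50, ext_coeff_m_cm=50_000, mass_da=50_200)
print(f"{conc:.3f} mg/ml")
```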
In Vitro Activity Assay-Sia activation was analyzed using the EnzChek pyrophosphate assay kit (Invitrogen) and half-area 96-well microplates (Greiner). The reaction was performed essentially as described (20) in 50 mM Tris-HCl, pH 7.5, 25 mM MgCl₂ and started by the addition of 0.5 ng/µl dreCmas1 or dreCmas2. CTP, UTP, ATP, or GTP was used at a final concentration of 1000 µM, and Neu5Ac, Neu5Gc, or 5-deaminoneuraminic acid (KDN) was used at 4000 µM.
Construction of Plasmids for Cell Culture Experiments-We subcloned drecmas1 and drecmas2 cDNAs into two different modified pcDNA3 vectors allowing the expression of N-terminally FLAG-tagged or C-terminally Myc-V5-tagged proteins (supplemental Table 2). Deletion mutants were generated using overlap extension PCR (21) and Phusion Polymerase (Finnzymes) and subsequently cloned in the pcDNA3 vector with C-terminal Myc and V5 tag. Deletion primers were designed to delete nucleotide triplets encoding selected amino acids (supplemental Table 3).
In Vivo Activity Assay-The functionality of wild-type and mutant dreCmas was analyzed in complementation studies using CHO LEC29.Lec32 cells as described previously (14). Transfected cells were harvested and subdivided into two aliquots, which were incubated at 37°C for 30 min in either the absence or the presence of 100 ng of endosialidase E to remove polySia. Equal protein amounts were separated by 7 and 10% SDS-PAGE, respectively. PolySia and dreCmas expression was analyzed by Western blotting using anti-polySia mAb 735 (5 µg/ml) (22), anti-V5 mAb (Sigma), and goat anti-mouse 680 IRDye secondary antibody (LI-COR). Scanning was performed with the LI-COR Odyssey infrared imaging system.
Animal Care-Wild-type and golden zebrafish strains were maintained and crossed according to standard procedures. Developmental stages are indicated in hpf according to Kimmel et al. (24).
Nuclear and Cytoplasmic Extracts-Zebrafish embryos were dechorionated and deyolked according to Link et al. (25) and flash-frozen in liquid N₂. 100 embryos were used to prepare nuclear and cytosolic extracts according to Dignam et al. (26). The extracts were analyzed by Western blotting using antibodies that were originally raised against murine CMAS and that specifically recognize the distinct dreCmas enzymes.
Whole Mount in Situ Hybridization-Whole mount in situ hybridization of digoxigenin-labeled RNA probes was carried out according to standard protocols (27,28). To generate specific probes, the 3′ regions of drecmas1 and drecmas2 were amplified by RT-PCR with specific primers (supplemental Table 2) and cloned into pCR-Blunt II-TOPO (Invitrogen). Antisense and sense (control) riboprobes were synthesized from linearized plasmids using the digoxigenin RNA labeling kit (Roche Applied Science). Pictures were taken using a Zeiss SteREO Lumar.V12 equipped with an AxioCam HRc camera. Images were edited with Photoshop 6.0 (Adobe).

RESULTS
Identification of Two cmas Genes in the D. rerio Genome-Analyses of the completely sequenced genome of D. rerio revealed two homologues of the cmas gene. The first is located on chromosome 4 (drecmas1). According to the current version of the D. rerio genome assembly, the second gene (drecmas2) is split into two fragments situated on different strands but in direct proximity to each other on chromosome 25. These fragments covered a large portion of the cmas gene including the N-terminal and C-terminal portions, but with a gap of about 135 bp. As both fragments were found on different contigs, we hypothesized that they may represent an intact but incorrectly assembled gene. Indeed, we were able to amplify cDNAs of both drecmas genes by RT-PCR using gene-specific primers. The amplified drecmas1 cDNA encoded a protein of 430 amino acids with a calculated molecular mass of 48.2 kDa and was identical to that previously reported (accession number CAK18993). The amplified drecmas2 contained an open reading frame (ORF) that encoded a protein of 423 amino acids (calculated molecular mass of 47.7 kDa). At the primary sequence level, the two proteins shared 56% identity. An alignment of D. rerio Cmas amino acid sequences with those of Mus musculus (mmuCMAS) and Oncorhynchus mykiss (omyCmas, rainbow trout) is presented in Fig. 1A and highlights identical residues in black and conserved residues in gray. Like all other known CMAS enzymes, both dreCmas sequences harbored five conserved primary sequence motifs (motifs I to V) in the N-terminal domain (Fig. 1A). In line with other vertebrate CMAS enzymes, dreCmas1 and dreCmas2 possessed an additional C-terminal domain (Fig. 1A) composed of 173 and 175 amino acids, respectively. When compared with the N-terminal domains, the C-terminal domains showed less homology to the primary sequences of mmuCMAS and omyCmas. Protein structure prediction revealed homology to phosphatases of the haloacid dehalogenase family. However, phosphatase activity was not observed for either of the dreCmas enzymes in vitro (data not shown).
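A pairwise identity figure such as the 56% quoted above depends on how gapped columns are counted. The sketch below uses one common convention (identical residues divided by the columns in which neither sequence has a gap); this convention, the function name, and the toy sequences are assumptions for illustration, since the text does not state how the value was computed.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a, aln_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned peptides (hypothetical, not dreCmas sequences).
identity = percent_identity("MKT-AIL", "MRTGA-L")
print(identity)
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which yields lower values for the same alignment.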
Next, we addressed the question of whether other fish species also possess two paralogues of the cmas gene. Analyses of the latest fish genome assemblies as well as of the National Center for Biotechnology Information (NCBI) fish EST database did not give evidence for two different intact cmas genes in any of the species belonging to Euteleostei (as defined by Li et al. (29)). In contrast, consistent with zebrafish, two sequence variants were found among ESTs of species belonging to the Otocephala clade, namely catfish (Ictalurus furcatus, Ictalurus punctatus) and the fathead minnow (Pimephales promelas). The maximum likelihood phylogenetic tree demonstrates that fish cmas genes segregated into two distinct clusters comprising cmas1 and cmas2, respectively (Fig. 1B). Both clades have a monophyletic origin and putatively originate from the ancient whole genome duplication (WGD) that occurred in Teleostei.
To further specify the phylogenetic relationship between Cmas genes, we analyzed the syntenic organization of surrounding genomic regions. Within vertebrates, we found traces of conserved synteny surrounding the Cmas1 genes between zebrafish chromosome 4 and chicken chromosome 1 despite numerous translocations and inversions that presumably occurred in both lineages (Fig. 2A, left). A significant intragenomic synteny between zebrafish chromosomes 4 and 25 corroborated that both cmas genes were paralogues resulting from the WGD (Fig. 2A, right). In Euteleostei, cmas1 was excised from a homologue of the D. rerio chromosome 4 and transferred to a foreign location, whereas cmas2 was excised from the homologue of D. rerio chromosome 25 and lost (probably during translocation) (Fig. 2B and supplemental Fig. 1). In summary, the two cmas genes most likely originated from the third WGD at the base of teleost radiation. During evolution, one cmas copy was lost in Euteleostei (e.g. rainbow trout), whereas both paralogues were maintained in fishes of the Otocephala clade (e.g. zebrafish).

FIGURE 1. Relations between selected vertebrate CMAS sequences. A, multisequence alignment. Amino acid sequences of the two D. rerio Cmas proteins (dreCmas1 and dreCmas2) have been aligned with O. mykiss (omyCmas) and M. musculus CMAS (mmuCMAS) using ClustalW to display maximum homology. Strictly and highly conserved residues are shaded in black and gray, respectively. The five conserved CMAS motifs essential for enzymatic activity are marked with dashed lines. The nuclear localization signals of mmuCMAS and omyCmas are underlined (14,15). Basic clusters identified in dreCmas1 (BC1 to BC4) and dreCmas2 (dre2-BC) are boxed in gray and white, respectively. The boundary between the N- and C-terminal domains (NT and CT) is marked by arrows. B, maximum likelihood phylogenetic tree showing evolutionary relationships among Cmas genes. cmas homologues in other fish species were identified either in the latest fish genome assemblies with drecmas2 as a query or in the NCBI fish EST database. In the latter, each species with multiple cmas-like ESTs (10 in Salmo salar, 16 in Gasterosteus aculeatus, 21 in Oryzias latipes) was checked for the presence of more than one sequence variant that could not be explained by allelic polymorphism and alternative splicing.
drecmas1 and drecmas2 mRNAs Show Different Spatial Expression in Zebrafish Embryos-The expression pattern of drecmas1 and drecmas2 was analyzed in the developing zebrafish by in situ hybridizations using specific probes derived from the 3′-UTR of both mRNAs. In general, drecmas1 showed a stronger and more distinct expression than drecmas2. At 90% epiboly (9 hpf), besides a basal, more or less ubiquitous expression, drecmas1 was detected in the axial mesoderm, especially in the notochord primordium (Fig. 3A). At 18 hpf, drecmas1 showed a robust expression in the entire central nervous system, the somites, the notochord, and the developing pronephric duct (Fig. 3C). With progressing development, expression of drecmas1 was down-regulated in the trunk. It persisted in the central nervous system and was up-regulated in the kidney and the liver primordium (Fig. 3, H and I). drecmas2 was expressed at lower levels and in less sharply defined regions. It was detected more or less ubiquitously from the end of gastrula through segmentation (Fig. 3, B and D). During the pharyngula stage, drecmas2 expression was restricted to the brain (Fig. 3G), and it was down-regulated around hatching (Fig. 3K). Although drecmas2 showed expression in the heart at 48 hpf (Fig. 3I), we could not detect it in skeletal muscle, liver, or kidney.
dreCmas Proteins Exhibit Different Substrate Specificities and Assemble into Tetramers-To analyze whether both drecmas genes encode active enzymes, the proteins were expressed with an N-terminal Strep II tag in E. coli BL21(DE3) and purified to homogeneity via Strep-Tactin affinity and subsequent anion-exchange chromatography. Lysate, flow-through, and the purified proteins were analyzed by Coomassie Brilliant Blue Staining and Western blot analysis (Fig. 4A). Enzymatic activity was investigated in vitro by using the EnzChek pyrophosphate assay kit. Equal protein concentrations were used. In addition to CTP and Neu5Ac, the nucleotide donors UTP, ATP, and GTP, as well as the Sia derivatives Neu5Gc and KDN, were tested as alternative substrates. All substrates were used in nonlimiting concentrations. The specific enzymatic activities clearly demonstrated that both enzymes were strictly dependent on CTP (data not shown) but differed in terms of the preferred Sia derivative (Table 1). dreCmas1 had highest activity toward Neu5Ac, lower activity toward Neu5Gc, and just basal activity toward KDN. In contrast, dreCmas2 preferentially activated KDN and showed only basal activity toward Neu5Ac and Neu5Gc.
To determine the oligomeric state of dreCmas1 and dreCmas2, size exclusion chromatography was performed with the purified recombinant enzymes (Fig. 4, B and D). dreCmas1 eluted as a single peak at 11.87 ml corresponding to an apparent molecular mass of 236.1 kDa as calculated from the log molecular mass versus retention volume plot (Fig. 4B, inset). The apparent molecular mass to theoretical mass (50.2 kDa) ratio of 4.7 indicated that the recombinant Strep II-tagged dreCmas1 formed a tetramer or a pentamer. The same result was obtained in repeated experiments with protein lacking the epitope tag. Because it is known from crystal structure analyses of Neisseria meningitidis CMAS (30) and mmuCMAS-NT (31) that the active unit (N-terminal domain) is formed by a dimer, it is reasonable to conclude that dreCmas1 assembled into a tetramer. The recombinant dreCmas2 eluted as a single peak at 12.29 ml corresponding to an apparent molecular mass of 193.9 kDa (Fig. 4D). A ratio of 3.9 from the apparent molecular mass to the theoretical mass (49.6 kDa) clearly indicated that the recombinant Strep II-tagged dreCmas2 also formed a tetramer. The integrity of the proteins was confirmed by SDS-PAGE followed by Coomassie Blue staining as well as Western blot analysis of the fractions (Fig. 4, C and E).
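The ratios quoted above follow directly from dividing the apparent molecular mass by the theoretical monomer mass. The short sketch below reproduces that arithmetic with the values given in the text; the function and variable names are illustrative only.

```python
def oligomer_ratio(apparent_kda, monomer_kda):
    """Ratio of apparent (size exclusion) mass to theoretical monomer mass."""
    return apparent_kda / monomer_kda


# (apparent mass, theoretical Strep II-tagged monomer mass) in kDa, from the text.
samples = {
    "dreCmas1": (236.1, 50.2),
    "dreCmas2": (193.9, 49.6),
}

ratios = {name: round(oligomer_ratio(*masses), 1) for name, masses in samples.items()}
for name, ratio in ratios.items():
    print(name, ratio)
```

A ratio lying between two integers, as for dreCmas1, cannot by itself distinguish a tetramer from a pentamer, which is why the dimeric active unit known from crystal structures is invoked to settle on a tetramer.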
The Two Zebrafish Cmas Enzymes Localize to Different Cellular Compartments-To investigate the intracellular localization of the two zebrafish Cmas enzymes, the cDNAs were expressed in EPC cells, derived from the cyprinid P. promelas. To minimize the influence of the epitope tag, N-terminally FLAG-tagged as well as C-terminally Myc-V5-tagged proteins were analyzed by indirect immunofluorescence analysis. Regardless of the position of the epitope tag, dreCmas1 was localized in the nuclear compartment (Fig. 5A). Intriguingly, and in contrast to all other vertebrate CMAS proteins, dreCmas2 was found in the cytoplasm of transfected EPC cells. These results were confirmed in a mouse fibroblast cell line (NIH-3T3 cells; data not shown). To investigate the intracellular destination of the endogenous enzymes, nuclear and cytoplasmic extracts were prepared from 48 hpf zebrafish embryos and analyzed by Western blotting (Fig. 5B). The specificity of the antibodies for either of the dreCmas enzymes was confirmed with purified recombinant proteins (Fig. 5C). In agreement with the results obtained in EPC cells, endogenous dreCmas proteins were found either in the nuclear (dreCmas1) or in the cytosolic fraction (dreCmas2) (Fig. 5B).
A Bipartite Nuclear Localization Signal Targets dreCmas1 to the Nuclear Compartment-To identify the NLS essential for nuclear targeting of dreCmas1, primary sequence analysis was performed by eye and by the use of the programs PSORT and PredictNLS. Four basic clusters (BC1 to BC4) were identified in the dreCmas1 sequence (Fig. 1A). BC1 (K⁹RAMK¹³) and BC2 (K²⁴RRK²⁷) were located at the N terminus, BC3 (P¹⁸⁴ACRPRR¹⁹⁰) was located at the center, and BC4 (K⁴¹²KKAK⁴¹⁶) was located at the C terminus. BC4 and the surrounding 12 amino acid residues were strictly conserved in rainbow trout Cmas (Fig. 1A). Because these residues do not serve as an NLS in rainbow trout Cmas (15), we concentrated on the analysis of BC1 to BC3. All BCs as well as BC1 and BC2 in combination were deleted by site-directed mutagenesis in C-terminally V5-Myc-tagged dreCmas1. Deletion mutants were expressed in EPC cells (Fig. 6) and NIH-3T3 cells (data not shown). The subcellular localization was analyzed by indirect immunofluorescence microscopy. Although the deletion of BC3 did not impair nuclear import of dreCmas1, deletion of BC1 or BC2, individually or in combination, entailed retention in the cytoplasm. Thus, both BC1 and BC2 were essential for nuclear import of dreCmas1 and formed a bipartite NLS (K⁹RAMK¹³(X)₁₁K²⁴RRK²⁷) according to the consensus sequence (K/R)₂X₁₀₋₁₂(K/R)₃ (32). dreCmas2, in contrast, contained a single short BC (R¹⁷⁹PRR¹⁸²) (dre2-BC) at the center of the protein (Fig. 1A), which did not fit the consensus sequence of a monopartite NLS, (K/R)₄₋₆ (32). The absence of an NLS was in perfect agreement with the cytosolic localization of dreCmas2 (Fig. 5).
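The consensus strings cited above can be read literally as regular expressions, which gives a rough first-pass screen for candidate signals. The sketch below is a simplification and not the method used by PSORT or PredictNLS; the pattern translations, function names, and both toy peptides are assumptions, with only the RPRR cluster taken from the text.

```python
import re

# Literal reading of the consensus strings: two basic residues, a spacer of
# 10-12 arbitrary residues, then three basic residues (bipartite), or a run
# of four to six basic residues (monopartite).
BIPARTITE = re.compile(r"[KR]{2}.{10,12}[KR]{3}")
MONOPARTITE = re.compile(r"[KR]{4,6}")


def has_bipartite_nls(seq):
    """True if the sequence contains a match to the simplified bipartite pattern."""
    return BIPARTITE.search(seq) is not None


def has_monopartite_nls(seq):
    """True if the sequence contains a run of 4-6 basic residues."""
    return MONOPARTITE.search(seq) is not None


# Hypothetical peptide: KR, a 10-residue spacer, then KRRK.
toy_bipartite = "GGKR" + "ASTGLDQEPV" + "KRRKGG"
# Hypothetical flanks around the central RPRR cluster described for dreCmas2.
toy_central = "GGA" + "RPRR" + "LLG"

print(has_bipartite_nls(toy_bipartite), has_monopartite_nls(toy_central))
```

In this reading the proline in RPRR interrupts the basic run, so the cluster fails the monopartite pattern, in line with the observation that dre2-BC does not fit the consensus. Functional NLS assignment still rests on the deletion experiments, not on pattern matching.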
Nuclear Sequestration Is Not Required for Enzymatic Activity of dreCmas1-To analyze the enzymatic activity of dreCmas in a cellular system and to determine the importance of the BCs for activity, full-length dreCmas as well as deletion mutants were analyzed in a complementation approach using the CMAS-negative Chinese hamster ovary (CHO) cell line LEC29.Lec32 (9). The lec32 mutation in LEC29.Lec32 causes the expression of asialoglycoconjugates at the cell surface, a defect that can be complemented by recombinant expression of an active Cmas. As shown in Fig. 7 (upper panel), reconstitution of the defect by both dreCmas proteins led to reappearance of polySia on the cell surface, which is visible as a smear due to microheterogeneity of the polySia chain length. Moreover, specificity of polySia expression was controlled by use of endosialidase E leading to the disappearance of the polySia signal (Fig. 7). In all experiments, dreCmas expression was controlled by Western blot staining (Fig. 7, Cmas lane). dreCmas1 was able to complement the defect in LEC29.Lec32 cells, and neither the deletion of BC1 or BC2 nor deletion of the bipartite NLS (ΔBC1+2) in dreCmas1 altered enzymatic activity in the cellular system. Also, expression of dreCmas2, which preferentially activates KDN and showed only residual activity with Neu5Ac in vitro, produced sufficient amounts of CMP-Neu5Ac to form polySia, a homopolymeric Neu5Ac chain, in the cellular system. In contrast, deletion of the central basic cluster in dreCmas1 (ΔBC3) as well as deletion of the corresponding residues in dreCmas2 (ΔBC) completely abolished enzymatic activity. These data show that dreCmas1 and dreCmas2 were not only enzymatically active in vitro but also in a cellular system. Enzymatic activity was associated with conserved basic residues in the center of both dreCmas proteins, but not with the bipartite NLS in dreCmas1. Thus, nuclear import is not a prerequisite for enzymatic activity of dreCmas1.

FIGURE 3 (legend fragment). ... F, H, and J), and in the liver (H and J). drecmas2 is weakly but ubiquitously expressed in early developmental stages (up to 18 hpf), and it is restricted to the brain regions from 24 hpf onwards (G, I, J, and K). amd, axial mesoderm; eml, endomesodermal layer; fb, forebrain; hb, hindbrain; ht, heart; kd, kidney; li, liver; mb, midbrain; ms, myoseptum; pd, pronephric duct; ph, pharynx; pnc, posterior notochord; sc, spinal cord; so, somites. The arrows indicate a more or less uniform expression in the brain. Scale bars represent 500 µm.

FIGURE 5 (legend fragment). ... Although dreCmas1 was visualized in the nuclear fraction of zebrafish extracts, dreCmas2 was detected in the cytosolic fraction. C, specificity of the antibodies was controlled with purified epitope-tagged recombinant proteins. The antibodies specifically detect either dreCmas1 (dre1) or dreCmas2 (dre2). Both antibodies recognize the murine CMAS (mmu).

DISCUSSION
In the present study, we report the identification and characterization of two cmas genes in zebrafish. We showed that both paralogues encoded active enzymes that differ with regard to their spatial expression pattern, intracellular localization, and substrate specificity. In contrast to mammalian genomes, which contain a single Cmas gene, we identified two cmas genes in D. rerio and other fish belonging to the Otocephala clade, such as P. promelas, I. furcatus, and I. punctatus. Representatives of Euteleostei lacked the second gene. We found a syntenic correspondence between chromosomes carrying zebrafish drecmas1 and the chicken Cmas gene, suggesting that the order of genes surrounding drecmas1 corresponds to the gene arrangement in the common ancestor of the teleost fish. The gene duplication observed in Otocephala most likely originated from the third WGD, which occurred some 305-450 million years ago in the ray-finned fish lineage at the base of teleost radiation (33). This WGD has often been regarded as a driving force for the diversification of teleost fish, which constitute the most speciose vertebrate lineage (34). Because Euteleostei lacked not only the second cmas copy (cmas2), but also the small circumjacent genomic region, it can be concluded that cmas2 was lost in this lineage. Instability of this genomic region is also emphasized by translocation of cmas1 in Euteleostei from an ancestral chromosome to a new site. The reason for this instability remains to be elucidated.
At the primary sequence level, the two zebrafish Cmas enzymes shared 56% identity. Divergent residues were found throughout the molecules with increased frequency in the C-terminal domain. Accordingly, sequence similarity to vertebrate CMAS enzymes was concentrated in the enzymatically active N-terminal domain, but was still significant in the C-terminal domain. In the murine enzyme, the C-terminal domain mediates tetramerization and thereby modulates the kinetic properties (20). Based on the observed sequence similarity, it is reasonable to conclude that the C-terminal domain is also responsible for the quaternary organization of the dreCmas paralogues. Tetramerization is consistent with results obtained for purified endogenous CMAS enzymes from different vertebrate tissues and species revealing trimeric to pentameric forms (for review, see Refs. 11 and 13). The similarity of vertebrate CMAS enzymes is also reflected in the organization of the active site pocket, which is built by the five conserved primary sequence motifs. Deletion of the central basic cluster (Figs. 1  and 7) abolished enzymatic activity as reported for mouse (14), rainbow trout (15), and zebrafish CMAS enzymes (this study). X-ray analysis of the enzymatically active domain of murine CMAS confirmed that this amino acid stretch is part of the active site (31).
A comparative analysis of fish genomes by Kassahn et al. (35) revealed that a minimum of 4% of protein-encoding genes have been retained in duplicate in the teleost lineage from the last WGD. To avoid genetic redundancy, permanent preservation of two paralogues is assumed to be due either to subdivision of the ancestral function (subfunctionalization) or to the acquisition of a new function (neofunctionalization). Regarding the two dreCmas paralogues, subfunctionalization was manifested, for example, in substrate specificity. dreCmas1 showed the highest activity toward Neu5Ac and was only poorly active using the deaminated sugar KDN in vitro. KDN, however, was the preferred substrate for dreCmas2, which showed poor in vitro activity with Neu5Ac. This observation was unexpected because dreCmas2 resembles omyCmas in substrate specificity (KDN >> Neu5Ac > Neu5Gc), whereas on the amino acid level, omyCmas and dreCmas1 are more closely related (36). With further identification and characterization of cmas genes from other fish and higher vertebrate species, the apparent discrepancies between sequence homologies and similarities in enzymatic properties may be resolved, and the importance of single amino acids in functional domains may be elucidated. The functional relevance of dreCmas2 expression with high in vitro preference for KDN remains elusive because thus far, no KDN has been detected in zebrafish embryos. Only Neu5Ac and Neu5Gc are found during 0.5-48 hpf (3,4). However, KDN was shown to be the major Sia derivative in the skin mucus of the closely related carp (Cyprinus carpio) (37). These findings may either reflect evolutionary differences within the order Cypriniformes or point to the possibility that KDN is expressed in zebrafish tissues in developmental stages not analyzed to date. Furthermore, the results obtained in the cellular system indicate that dreCmas2 can participate in Neu5Ac activation in vivo.
In addition to the substrate specificity, differences in the spatial expression pattern underline the functional divergence of the two dreCmas paralogues. drecmas1 was prominently expressed in regions of active neurogenesis, presumably postmitotic neurons in forebrain, midbrain, and hindbrain, as well as in the somites. In contrast, drecmas2 was weakly but ubiquitously detected from 9 hpf throughout somitogenesis and quickly down-regulated thereafter. Both drecmas paralogues were expressed before and maintained during the expression of sialyltransferases acting downstream of CMAS in Sia biosynthesis. The expression of the sialyltransferases st8sia1 and st8sia5 (38) as well as of polysialyltransferase st8sia2 starts around 10 hpf in the developing nervous system (7,39). The oligosialyltransferase st8sia3 shows a highly dynamic expression in somites and somite-derived structures (8). It is also detected in the developing nervous system (38) corresponding to drecmas1 expression in early embryonic stages. Additionally, ubiquitous expression of the monosialyltransferase st8sia6 and the GM3 synthase st3gal4 is observed during D. rerio development (38,40). Thus, drecmas spatial expression patterns coincide with those of sialyltransferases. Differences in the spatial and/or temporal expression profile during embryogenesis have been reported for nearly all D. rerio duplicate gene pairs, suggesting that their sub- and neofunctionalization enables a more specialized control of development (35).
Corroborating our results, functional divergence of zebrafish paralogues has been reported in terms of subcellular localization (35). So far, vertebrate CMAS enzymes, independent of tissue or species, have been predominantly found in nuclear fractions with just minor amounts in other compartments (11,13). Nuclear sequestration has been confirmed with recombinant proteins in different cellular systems (15,41,42). In this study, we identified and characterized the first cytosolic vertebrate CMAS. Only dreCmas1 was targeted to the nuclear compartment, whereas dreCmas2 was exclusively retained in the cytoplasm. Entry of dreCmas1 to the cell nucleus was mediated by a bipartite NLS (K 9 R(X) 13 KRRK), which is related to the NLS in omyCmas (K 5 KR(X) 10 RKAK) in terms of intramolecular localization (15). Whether the observed differences in subcellular localization of the dreCmas proteins affect their enzymatic properties or indicate different functions remains to be elucidated.
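A bipartite motif of the kind described above can be searched for in a protein sequence with a simple pattern match. The sketch below rests on one reading of the dreCmas1 motif, stated here as an assumption: a Lys-Arg pair, a spacer of exactly 13 arbitrary residues, and the basic cluster KRRK, with the number attached to the first lysine taken as its sequence position rather than as part of the pattern. The function name and the toy sequence are hypothetical and do not reproduce the dreCmas1 sequence.

```python
import re

# Assumed reading of the motif: K-R, 13 arbitrary residues, then K-R-R-K.
BIPARTITE_NLS = re.compile(r"KR[A-Z]{13}KRRK")


def find_bipartite_nls(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for each motif hit."""
    return [
        (match.start() + 1, match.group())
        for match in BIPARTITE_NLS.finditer(protein.upper())
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: 8 filler residues, then the motif, then a tail.
    toy_protein = "MAAAAAAA" + "KR" + "ADSGTEVLPQASG" + "KRRK" + "LLL"
    for position, segment in find_bipartite_nls(toy_protein):
        print(position, segment)
```

A pattern match of this kind only flags candidate signals; whether a candidate actually mediates nuclear import has to be tested experimentally, as was done here for dreCmas1.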
In summary, we identified the first vertebrate CMAS exclusively found in the cytoplasm in vivo and demonstrated the existence of a second Cmas in zebrafish that was directed to the nuclear compartment by a bipartite NLS. Both enzymes assembled into tetramers and showed enzymatic activity in vitro and in a cellular system. Like other duplicated genes that have arisen in WGD, zebrafish Cmas paralogues diverged in function, as is evident from differences in their expression patterns, subcellular distributions, and substrate specificities. In parallel to unraveling the biological consequences of expressing two cmas paralogues and their individual functions in vertebrate development, these differences provide a basis for resolving the molecular and cellular requirements of CMAS enzymatic activity.