N-Glycans of Phaeodactylum tricornutum Diatom and Functional Characterization of Its N-Acetylglucosaminyltransferase I Enzyme*

N-Glycosylation, a major co- and post-translational event in the synthesis of proteins in eukaryotes, is unknown in aquatic photosynthetic microalgae. In this paper, we describe the N-glycosylation pathway in the diatom Phaeodactylum tricornutum. Bio-informatic analysis of its genome revealed the presence of a complete set of sequences potentially encoding proteins involved in the synthesis of the lipid-linked Glc3Man9GlcNAc2-PP-dolichol N-glycan, some subunits of the oligosaccharyltransferase complex, as well as endoplasmic reticulum glucosidases and chaperones required for protein quality control and, finally, the α-mannosidase I involved in the trimming of the N-glycan precursor into Man-5 N-glycan. Moreover, one N-acetylglucosaminyltransferase I, a Golgi glycosyltransferase that initiates the synthesis of complex type N-glycans, was predicted in the P. tricornutum genome. We demonstrated that this gene encodes an active N-acetylglucosaminyltransferase I, which is able to restore complex type N-glycan maturation in the Chinese hamster ovary Lec1 mutant, defective in its endogenous N-acetylglucosaminyltransferase I. Consistent with these data, the structural analyses of N-linked glycans demonstrated that P. tricornutum proteins carry mainly high mannose type N-glycans ranging from Man-5 to Man-9. Although representing a minor glycan population, paucimannose N-glycans were also detected, suggesting the occurrence of an N-acetylglucosaminyltransferase I-dependent maturation of N-glycans in this diatom.

Microalgae are a group of aquatic photosynthetic microorganisms that are so diverse that they have been gathered in a "paraphylum." Among these microalgae, diatoms belong to the heterokont group and are responsible for approximately 40% of marine primary productivity (1,2). Despite their physiological relevance in the marine ecosystem, molecular and cellular processes in diatoms remain largely unknown. For example, little is known so far about secretion, post-translational modifications, and intracellular trafficking of proteins in diatoms. Diatom species are usually classified into two major groups, the bi/multipolar centrics and the pennates. Recently, the genome of a pennate diatom, Phaeodactylum tricornutum, became available (3), revealing a wealth of information about diatom biology. Access to these data (Joint Genome Institute, Walnut Creek, CA), together with the fact that P. tricornutum is easy to culture in vitro and can be genetically transformed (4,5), provides the opportunity to perform comparative genomic studies and to dissect biosynthetic pathways.
N-Glycosylation is a major co- and post-translational modification in the synthesis of proteins in eukaryotes. N-Glycan processing occurs in the secretory pathway and is essential for glycoproteins destined to be secreted or integrated into membranes. In this process, a Man5GlcNAc2-PP-dolichol oligosaccharide intermediate is assembled by the stepwise addition of monosaccharides to dolichol pyrophosphate on the cytosolic face of the endoplasmic reticulum (ER). This intermediate is then extended in the lumen of the ER until a Glc3Man9GlcNAc2-PP-dolichol N-glycan precursor is completed (6). This precursor is transferred by the oligosaccharyltransferase (OST) multisubunit complex onto the asparagine residue of the consensus Asn-Xaa-Ser/Thr sequences of a target nascent protein (6). The precursor is then deglucosylated/reglucosylated to ensure the quality control of the neosynthesized protein through interaction with ER-resident chaperones such as calnexin and calreticulin. These ER events are crucial for the proper folding of secreted proteins and are highly conserved in the eukaryotes investigated so far (7). In contrast, evolutionary adaptation of N-glycan processing in the Golgi apparatus has given rise to a large variety of organism-specific complex structures that allow the protein to carry out diverse glycan-mediated biological functions. α-Mannosidase I (α-Man I), located in the early compartment of the Golgi apparatus (cis cisternae), first degrades the oligosaccharide precursor into high mannose type N-glycans ranging from Man9GlcNAc2 (Man-9) to Man5GlcNAc2. N-Acetylglucosaminyltransferase I (GnT I) then transfers the first N-acetylglucosamine (GlcNAc) residue onto the α(1,3)-mannose arm of Man5GlcNAc2, enabling the initiation of the synthesis of multiple structurally different complex type N-glycans. Following GnT I action, α-mannosidase II (α-Man II) and N-acetylglucosaminyltransferase II (GnT II) give rise to the synthesis of the core GlcNAc2Man3GlcNAc2, which is finally matured into organism-specific complex N-glycans by transfer of various monomers by characteristic glycosyltransferases. GnT I and thus the GnT I-dependent maturation of N-glycans appeared during evolution at the same period as metazoans (8). Complex type N-glycans were demonstrated to be engaged in crucial steps of the development of multicellular organisms (8-11). For instance, GnT I-null embryos of mice die at ~10 days after fertilization, indicating that complex N-glycans are required for morphogenesis in mammals (9,10). Similarly, inactivation of GnT I in worm and fly reduces their viability (8,11). In plants, the Arabidopsis cgl mutant, defective in GnT I activity, was demonstrated to grow normally in standard culture conditions (12). However, these plants exhibited a strong phenotype under salt-induced stress conditions, for example, suggesting a role for mature plant N-glycans in specific physiological processes (13-15). In contrast, animal cultured cells having GnT I null mutations usually grow normally (16).
* This work was supported by funds from the University of Rouen (to P. L.
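The Asn-Xaa-Ser/Thr consensus mentioned above lends itself to a simple sequence scan. The following is a minimal sketch, not part of the original study: the function name `find_sequons` and the example peptide are hypothetical, and the exclusion of proline at the Xaa position is a commonly used refinement that is assumed here rather than stated in the text.

```python
import re

# Asn-Xaa-Ser/Thr sequon. Excluding Pro at the Xaa position is an
# assumption (a common refinement), not something the text specifies.
# The lookahead lets overlapping sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start a sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Hypothetical peptide, for illustration only.
example = "MNGSANPTNKT"
positions = find_sequons(example)
print(positions)
```

A match only marks a potential site: whether a given asparagine is actually glycosylated depends on the OST complex and on protein folding, which a sequence scan cannot capture.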
Although major data regarding protein N-glycosylation have been established in yeast and higher eukaryotes, nothing is known about N-glycan biosynthesis and structures in microalgae. In this paper, we describe the N-glycosylation pathway in the diatom P. tricornutum. We also demonstrate that the predicted GnT I from P. tricornutum is able to complement the biosynthesis of complex type N-glycans in the CHO Lec1 mutant, which is defective in its endogenous GnT I. To the best of our knowledge, this is the first functional characterization of an N-glycan glycosyltransferase from microalgae.
In Silico Genome Analysis-In the P. tricornutum genome, annotation of genes involved in the N-glycan pathway was carried out by BLASTP or TBLASTN analyses with genomic sequences from Homo sapiens, Mus musculus, Arabidopsis thaliana, Drosophila melanogaster, Saccharomyces cerevisiae, Physcomitrella patens, Medicago truncatula, Zea mays, Nicotiana plumbaginifolia, and Oryza sativa. Searches for signal peptides and cell localization/targeting of mature proteins were done using SignalP, Signal-BLAST, and TargetP. Transmembrane domains were predicted using TMHMM, TOPPRED, and HMMTOP. Pfam domains were identified using Pfam (Wellcome Trust Sanger Institute, Cambridge, UK). The phylogenetic tree was drawn using the Phylogeny.fr platform (17) in three steps: (i) complete sequences were aligned with ClustalW (v2.0.3) (18); (ii) after alignment, ambiguous regions (i.e. containing gaps and/or poorly aligned) were removed with Gblocks (v0.91b) (19); and (iii) the phylogenetic tree was built using the maximum likelihood method implemented in the PhyML program (v3.0 aLRT) (20,21). Graphical representation and editing of the phylogenetic tree were performed with TreeDyn (v198.3) (22). Thirty-one sequences were selected from the CAZy GT13 glycosyltransferase family (23)
Microalgal Strain and Culture Conditions-The strain of P. tricornutum P.t1.8.6 (CCAP1055/1) was grown in batch culture using 2-liter flat-bottomed flasks. The nutritive medium used for this experiment consisted of natural seawater, sterilized by filtration through a 0.22-μm filter, enriched with Conway medium (24) and containing 40 mg·liter⁻¹ sodium metasilicate. Diatom cells were maintained at 20°C under continuous illumination (280-350 μmol photons·m⁻²·s⁻¹). The cells (20 × 10⁶ cells·ml⁻¹) were then centrifuged at 5,000 × g for 20 min at 4°C, and the resulting pellet was freeze-dried before biochemical analyses. For real time quantitative PCR experiments, P. tricornutum was grown in continuous culture conditions as described previously (25). At the steady state (15-20 × 10⁶ cells·ml⁻¹), five samples (30 ml each, in triplicate) were harvested at 4,000 × g for 20 min at 4°C. The supernatant was removed rapidly, and the cell pellets were resuspended in 1 ml of TRIzol, immediately frozen, and stored at −80°C until RNA extraction.
Real Time Quantitative PCR Experiments-Total RNA was extracted from cells using the TRIzol method, treated with RQ1 DNase to avoid DNA contamination, and finally purified using the RNeasy mini kit. cDNA templates for PCR amplification were synthesized from 350 ng of total RNA using the High Capacity cDNA reverse transcription kit. Quantitative PCR was performed using Power SYBR Green I PCR master mix in a final reaction volume of 25 μl. All of the reactions were performed following the instructions of the manufacturer with 5 μl of diluted cDNA (1/10) and 0.1 μM of specific primers. Quantitative measurements were performed in duplicate with a Stratagene Mx3000P™ Q-PCR system. The cycling parameters were one cycle of 10 min at 95°C, followed by 40 cycles of 30 s at 95°C and 60 s at 60°C. The results were represented as the relative gene expression normalized to reference genes encoding ribosomal protein small subunit 30S and histone H4 (26). Specific primers for the catalytic domain of the P. tricornutum GnT I gene (GNT1-Q-Fwd, 5′-CGTACGAATCGCCCTTACTC-3′; and GNT1-Q-Rev, 5′-TTGCCGTCTTGTGAAATTACC-3′) were designed using the Primer3Plus program. The relative GnT I gene expression analysis was performed using the method already described (27,28), in which the comparative CT method (29) and the standard curve method were combined to calculate the RNA molar ratio between the target and housekeeping genes (28).
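The normalization step can be illustrated with the plain comparative CT calculation. This is a minimal sketch under stated assumptions, not the combined CT and standard curve procedure of Refs. 27 and 28: the function name `relative_expression`, the default amplification efficiency of 2 (perfect doubling per cycle), the averaging of the two reference genes, and all CT values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_references, efficiency=2.0):
    """Comparative CT estimate of target expression relative to references.

    ct_target:      replicate CT values for the target gene
    ct_references:  dict mapping reference gene name -> replicate CT values
    efficiency:     amplification factor per cycle (2.0 = 100% efficiency,
                    an assumption; real assays estimate it from a standard curve)
    """
    # Averaging CT values across reference genes is equivalent to taking
    # the geometric mean of their expression levels.
    reference_ct = mean(mean(values) for values in ct_references.values())
    delta_ct = mean(ct_target) - reference_ct
    return efficiency ** (-delta_ct)


# Hypothetical duplicate measurements, for illustration only.
ratio = relative_expression(
    ct_target=[25.0, 25.0],
    ct_references={"RPS_30S": [20.0, 20.0], "H4": [22.0, 22.0]},
)
print(ratio)
```

In this example the target crosses threshold four cycles after the averaged references, so it is estimated at 2⁻⁴ of their level.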
Expression of P. tricornutum GnT I in CHO Cells-Genomic DNA was extracted from P. tricornutum cell pellets as described in Ref. 30. The GnT I gene was amplified from P. tricornutum genomic DNA using primers 5′-ATGCGGTTGTGGAAACGTAC-3′ and 5′-TCTTTTCGGTGACGGAATG-3′ and Extensor Hi-Fidelity PCR enzyme mix. This GnT I gene was then cloned according to the supplier's instructions in the pcDNA3.1/V5-His-TOPO vector, leading to the expression of the P. tricornutum GnT I fused with the V5 epitope under the control of the T7 promoter. The resulting construct, pcDNA3.1/V5-His-TOPO-GnT I, was sequenced and presented the following mutations: Gln-130 → Arg; Val-148 → Ile; Gly-159 → Ser; Lys-196 → Gln; Val-337 → Ala; and Asn-422 → Tyr (GenBank™ BankIt 1370344). CHO Lec1 mutant cells were transfected by electroporation with this construct. The cells were trypsinized (0.05%), triturated in α-minimum essential medium containing 10% FBS, pelleted by centrifugation, and resuspended in 100 μl of solution V for nucleofection by an Amaxa Nucleofector device set to program U-016 with the linearized pcDNA3.1/V5-His-TOPO-GnT I and 500 μg of sterile sonicated salmon sperm DNA. The transfection was followed by repetitive rounds of limiting dilution of cells in 400 μg·ml⁻¹ of geneticin for selection. CHO wild type and CHO Lec1 mutant were grown in α-minimum essential medium supplemented with 10% FBS at 37°C in a humidified incubator with an atmosphere of 5% CO2. The CHO Lec1 mutant complemented with P. tricornutum GnT I was grown in the same conditions with 600 μg·ml⁻¹ of geneticin.
Extraction of Proteins from P. tricornutum-Two g of lyophilized P. tricornutum cells were lysed in 750 mM Tris-HCl pH 8 buffer containing 15% (w/v) of sucrose, 2% (v/v) of β-mercaptoethanol, and 1 mM phenylmethylsulfonyl fluoride (extraction buffer). The cell lysis was done in a 50-ml BigD Lysing tube and assisted by the Fastprep 24 (15 times 30 s at 6 M·s⁻¹). The mixture was then centrifuged at 4°C for 5 min at 4,000 × g. The pellet was washed once with 10 ml of extraction buffer and spun again at 4°C for 5 min at 4,000 × g. The resulting supernatants were pooled prior to centrifugation at 15,000 × g for 30 min at 4°C. The supernatant was then dialyzed against water for 48 h at 4°C prior to lyophilization. Protein quantification was then performed using the Pierce BCA protein assay kit and bovine serum albumin as protein standard. Proteins from green onion were prepared in parallel and used as a positive control for affino- and immunodetection analyses.
Immunoblotting and Affinoblotting Analysis-Fifty μg of total protein extract from P. tricornutum were separated by SDS-PAGE using a 12% polyacrylamide gel. The proteins were transferred onto nitrocellulose membrane and stained with Ponceau Red to control the transfer efficiency. Affinodetections were carried out as described previously (31) using concanavalin A or biotinylated lectins such as phytohemagglutinin E and L, E. crista galli agglutinin, RCA120, and peanut agglutinin. Biotinylated lectins were detected using streptavidin coupled with horseradish peroxidase, and concanavalin A was directly detected by horseradish peroxidase. Final developments of the blots were obtained using 4-chloro-1-naphthol or ECL as substrate. Immunodetections were performed using specific core β(1,2)-xylose and core α(1,3)-fucose antibodies as reported previously (32). Oxidation of the glycan moiety of glycoproteins was carried out on the blots using sodium periodate according to Ref. 33. Immunodetection with anti-V5 antibodies was performed following the instructions of the supplier for dilution of the antibody and detection (ECL kit).
Deglycosylation by PNGase F or Endo H-For deglycosylation with PNGase F, 0.5 mg of proteins was dissolved in 2 ml of a 0.1 M Tris-HCl buffer, pH 7.5, containing 0.1% SDS. The sample was then heated for 5 min at 100°C for protein denaturation. After cooling down, 2 ml of 0.1 M Tris-HCl buffer, pH 7.5, containing 0.5% Nonidet P-40 were added to the sample. Digestion was performed with 10 units of PNGase F for 24 h at 37°C. For deglycosylation by Endo H, 0.5 mg of protein extract was dissolved in 1% SDS and denatured by heating for 5 min at 100°C. The sample was then diluted five times in 500 μl of 150 mM sodium acetate buffer, pH 5.7, and incubated overnight at 37°C with 10 milliunits of Endo H. Finally, proteins digested with either PNGase F or Endo H were precipitated by the addition of 4 volumes of ethanol overnight at −20°C, separated by SDS-PAGE, and affinodetected with concanavalin A as reported previously (31).
In Vitro Galactosylation-The in vitro galactosylation was performed by treating 50 μg of protein at 37°C for 24 h with 50 milliunits of β(1,4)-galactosyltransferase from bovine milk in 1 ml of 100 mM sodium cacodylate buffer, pH 6.4, supplemented with 5 μmol of UDP-galactose and 5 μmol of MnCl2 (34). The sample was then freeze-dried. Proteins and glycoproteins were separated by SDS-PAGE and electroblotted onto nitrocellulose membrane. Glycoproteins were then affinodetected using biotinylated RCA120 lectin (34).
Sialic Acid Analysis-Two mg of proteins were treated as described (35). Then the sample was submitted to DMB derivatization according to Ref. 36. DMB derivatives were separated by high performance liquid chromatography using a C18 column and detected by fluorescence using excitation and emission wavelengths of 373 and 448 nm, respectively (37). Neu5Ac was also coupled to DMB and used as a standard.
Isolation of N-linked Glycans from P. tricornutum-For N-glycan profiling, both PNGase A and PNGase F were used. In contrast to PNGase F, PNGase A is able to release N-linked oligosaccharides carrying a fucose α(1,3)-linked to the proximal glucosamine residue (38). Proteins (above 13 mg) were deglycosylated by PNGase F as described for CHO N-glycan profiling. For PNGase A, 5 mg of freeze-dried proteins were resuspended in 3 ml of 4 M Tris-guanidine HCl, pH 8.5, prior to denaturation with a 2 mg·ml⁻¹ DTT solution. After a short 30-s sonication, the sample was incubated at 50°C for 2 h. 1.5 ml of iodoacetamide, prepared in 0.6 M Tris buffer, pH 8.5, at 12 mg·ml⁻¹, was then added to the sample, which was incubated in the dark for 2 h at room temperature. The sample was dialyzed for 72 h against water. The proteins were digested at 37°C for 48 h with 10 mg of pepsin dissolved in 2 ml of 10 mM HCl, pH 2.2. After neutralization with 1 M ammonium hydroxide, the solution was heated for 5 min at 100°C and lyophilized. Glycopeptides were then deglycosylated overnight at 37°C with 1.5 milliunits of PNGase A in a 50 mM sodium acetate buffer, pH 5.5. N-Glycans released by either PNGase A or F were purified by successive elutions through C18 and Carbograph cartridges according to Ref. 39.
Isolation of N-linked Glycans from CHO Cells-Cells from wild type CHO, CHO Lec1 mutant, and CHO Lec1 GnT I complemented lines were lysed by sonication (four times for 20 s) in 1 ml of 100 mM Tris-HCl buffer, pH 7.5, containing 0.1% SDS. After centrifugation at 100 × g, proteins from the supernatant were deglycosylated by PNGase F as described above, and the N-glycans were purified as already mentioned. The N-glycan samples were then concentrated and finally analyzed by MALDI-TOF mass spectrometry.
Preparation and Exoglycosidase Digestion of 2-AB Oligosaccharides-N-Glycans were labeled with 2-AB using the protocol described in Ref. 40. After incubation at 60°C for 2 h, 2-AB-labeled N-glycans were purified by paper chromatography according to Ref. 41. The 2-AB-labeled N-glycans were finally analyzed by MALDI-TOF mass spectrometry before and after jack bean α-mannosidase or α-L-fucosidase treatments following the principle described in Ref. 39. For the α-mannosidase digestion, 2.5 μl of 2-AB-labeled N-glycans were incubated with 1 μl of water and 214 milliunits of enzyme for 48 h at 37°C. The α-L-fucosidase from bovine kidney was desalted prior to use and resuspended in 40 μM of sodium acetate, pH 5.5. Then 80 milliunits of enzyme was incubated with 2.5 μl of 2-AB-labeled glycans at 37°C for 48 h. Both digested samples were freeze-dried and resuspended in 10 μl of water, 0.1% TFA prior to mass spectrometry analysis.
MALDI-TOF Mass Spectrometry Analysis-Mass spectra were acquired on a Voyager DE-Pro MALDI-TOF instrument equipped with a 337-nm nitrogen laser. Mass spectra were acquired in the reflector delayed extraction mode using 2,5-dihydroxybenzoic acid. This matrix, freshly dissolved at 5 mg·ml⁻¹ in 70:30 acetonitrile, 0.1% TFA, was mixed with the water-solubilized oligosaccharides in a ratio of 1:1 (v/v). These spectra were recorded in positive mode, using an acceleration voltage of 20,000 V with a delay time of 100 ns. They were smoothed once and externally calibrated using commercially available mixtures of peptides and proteins. In this study, the spectra were externally calibrated using des-Arg1-bradykinin (904.4681 Da), angiotensin I (1,296.6853 Da), Glu1-fibrinopeptide B (1,570.6774 Da), and ACTH 18-39 (2,465.1989 Da). At least 5,000 laser shots were accumulated for each spectrum. The mass accuracy obtained is 0.011% on average, which is in agreement with the specifications of the instrument used in this study.
the Glc3Man9GlcNAc2-PP-dolichol precursor and its transfer by the OST onto asparagine residues of nascent polypeptides entering the lumen of the rough ER; (ii) deglucosylation/reglucosylation of the precursor N-glycan in the ER, allowing the interaction with chaperones responsible for proper folding and oligomerization; and finally (iii) maturation in the Golgi apparatus of the high mannose type N-linked oligosaccharides into complex type N-glycans. Based on sequence homologies, we identified in the genome of P. tricornutum a set of putative sequences that are likely involved in the different steps of N-glycan biosynthesis and maturation (Fig. 1 and Table 1). Most of these identified genes have expressed sequence tag support (Fig. 1 and Table 1).

In
All of the genes encoding enzymes involved in the biosynthesis of dolichol pyrophosphate-linked oligosaccharide on the cytosolic face and in the lumen of the ER were identified in the genome of P. tricornutum (Fig. 1 and Table 1). The sequences and topologies of the predicted proteins are highly similar to the corresponding asparagine-linked glycosylation (ALG) orthologs described in other eukaryotes (42), except for ALG10, for which a P. tricornutum candidate sequence was not clearly identified. Putative transferases that catalyze the formation of dolichol-activated mannose and glucose were also found. Those two activated sugars are required for the elongation steps occurring in the ER lumen. In addition to sequences involved in the biosynthesis of the dolichol pyrophosphate-linked oligosaccharide, two putative genes encoding orthologs of the STT3 catalytic subunit of the OST multisubunit complex were identified (Table 1). These multi-spanning sequences, sharing 34 and 37% identity with A. thaliana and H. sapiens STT3 subunits, respectively, contain the conserved WWDYG domain required for the STT3 transferase activity (43).
Genes encoding polypeptides involved in the quality control of proteins in the ER were also found in the P. tricornutum genome. Indeed, α and β subunits of α-glucosidase II were identified. The α subunit contains the characteristic DMNE sequence (44) and a C-type lectin domain involved in mannose binding (45). A putative UDP-glucose:glycoprotein glucosyltransferase and a calreticulin, two molecules ensuring the quality control of glycoproteins in the ER, are also predicted. Calreticulin is a soluble Ca2+-binding protein of the ER lumen involved in the retention of incorrectly or incompletely folded proteins. Putative P. tricornutum calreticulin exhibits more than 50% identity with orthologs from N. plumbaginifolia (56%) and A. thaliana (53%). Structurally, the P. tricornutum calreticulin contains the three specific domains required for its biological function: an N-terminal domain of ~180 amino acids, a central domain of ~70 residues containing three repeats of an acidic 17-amino acid motif, and a C-terminal domain rich in acidic and lysine residues, both responsible for Ca2+ binding (46). P. tricornutum calreticulin also exhibited a predicted signal peptide and a C-terminal YDEF tetrapeptide that may ensure its retention in the ER, as do the HDEL, KDEL, or YDEL signals that are known to serve this function in higher eukaryotes (47-49).
With regard to Golgi enzymes involved in N-glycan biosynthesis, the P. tricornutum genome contains two sequences encoding proteins that belong to the glycosylhydrolase family GH47, which catalyze the hydrolysis of the terminal α(1,2)-mannose residues of high mannose type N-glycans. The first predicted sequence (sequence 52346) encodes a protein sharing 32 and 30% identity with MNS4 and MNS5 from A. thaliana, respectively, which were characterized as being ER degradation-enhancing mannosidases (50). The second sequence (Fig. 1 and Table 1) encodes a protein sharing 35 and 34% identity with MNS1 and MNS2, respectively, two A. thaliana α-Man I located in the Golgi apparatus and able to perform the trimming of Man-9 into Man-5 (50,51). Furthermore, this putative P. tricornutum mannosidase exhibits the three conserved catalytic motifs of α-Man I, the threonine residue of motif III, and the two cysteine residues (Cys-301 and Cys-333) essential for the mannosidase activity (51,52). A signal anchor is also predicted in the N-terminal part of the protein, as required for a type II transmembrane protein (Table 1). Moreover, expressed sequence tag data support the expression of this enzyme in P. tricornutum (Table 1).
In addition, one putative GnT I and one putative α-Man II were also identified in the P. tricornutum genome (Table 1). These enzymes are involved in the N-glycan maturation into complex oligosaccharides by transferring a terminal GlcNAc onto the α(1,3)-mannose arm of Man-5 and then removing the two mannose residues located on the α(1,6)-mannose arm. The putative GnT I sequence is predicted to be a typical type II membrane protein. Its cytoplasmic tail contains three basic amino acids that could promote ER exit as demonstrated for N. tabacum GnT I (53). This sequence also possesses a luminal part sharing 37% identity with the rabbit GnT I (Fig. 2). From the crystal structure of this mammalian transferase, 22 amino acid residues in the catalytic domain were shown to form direct or water-mediated interactions with the UDP-GlcNAc nucleotide sugar and the Mn2+ ion (54,55). Fourteen of these residues are strictly identical in the P. tricornutum GnT I, whereas the other residues are closely conserved (Fig. 2). Moreover, the SQD motif, which has been demonstrated to be important because it interacts with the uracil ring of the donor substrate, is also present in the P. tricornutum GnT I sequence (54). This conserved motif is present in all GnT I characterized so far (58). The EDD motif, a variation of the canonical acidic metal-binding DXD motif, is conserved in the P. tricornutum sequence. This motif has been demonstrated to have critical interactions with UDP-GlcNAc and the metal ion in rabbit GnT I (55). The predicted α-Man II consisted of a large protein containing the three Pfam domains of CAZy GH38 glycosylhydrolases and the conserved residues involved in Zn2+ binding in the catalytic site of D. melanogaster α-Man II (59,60).
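The identity percentages quoted in this section are derived from pairwise alignments. As a minimal sketch of how such a figure can be computed, the function below counts identical residues over aligned, non-gap columns. The name `percent_identity`, the choice of denominator (columns without gaps in either sequence), and the toy sequences are assumptions for illustration; alignment programs differ in how they define the denominator, and this is not necessarily the convention used by the authors.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns that contain no gap.

    Both inputs must be rows of the same alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


# Toy alignment, for illustration only (not real GnT I sequences).
print(percent_identity("ACD-EF", "ACDGQF"))
```

By the same arithmetic, the 14 strictly identical residues among the 22 donor-binding positions reported above correspond to roughly 64% identity at those positions.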
In eukaryotes, α(1,3)- and α(1,6)-fucosyltransferases transfer fucose residues onto the proximal GlcNAc unit of the N-linked glycan core. In silico analysis of the P. tricornutum genome revealed the presence of three genes encoding putative fucosyltransferases (FucT). These candidates exhibit the appropriate type II membrane protein topology (Table 1) and 23% (sequence 46109), 28% (sequence 46110), and 25% (sequence 54559) identity, respectively, with A. thaliana FucTA. These FucT candidates exhibited motifs I and II of α(1,3)-FucT (61,62), the SNC(G/A)A(R/H)N sequence, specific for plant and D. melanogaster α(1,3)-FucT (63-65), as well as the CXXC motif located at the C-terminal sequence involved in the formation of disulfide bonds (66). A putative type II xylosyltransferase is also predicted in the genome. This sequence shares 24% identity with the luminal part of A. thaliana β(1,2)-xylosyltransferase involved in the transfer of a β(1,2)-xylose residue onto the β-Man of the N-glycan core (67,68). Nevertheless, in the absence of reported motifs specific for β(1,2)-xylosyltransferase activity, the involvement of this putative transferase in the P. tricornutum N-glycan pathway remains highly hypothetical. Searches for sequences encoding other N-glycan-maturating transferases, such as N-acetylglucosaminyltransferases ranging from GnT II to GnT VI that allow the formation of polyantennary N-glycans or sialyltransferases, did not reveal any ortholog in the P. tricornutum genome.
P. tricornutum Proteins Mainly Carry High Mannose Type N-Glycans-Analysis of glycans N-linked to P. tricornutum proteins was first investigated by Western blot on a total pro-

Protein number Gene location Predicted protein function
To investigate the presence in P. tricornutum proteins of complex glycans carrying terminal GlcNAc, we treated the protein extract with a β(1,4)-galactosyltransferase, an enzyme able to transfer a galactose residue onto terminal GlcNAc residues, and then analyzed the resulting protein preparation by affinoblotting with RCA120, a lectin that binds specifically to Galβ1-4GlcNAc sequences (34). In contrast to a plant-derived IgG used as a positive control of galactose transfer, no signal was detected in the P. tricornutum sample after this treatment (Fig. 3C), thus indicating that this diatom does not exhibit terminal GlcNAc on its proteins at a detectable level (0.5 μg). Moreover, the presence of N-acetylneuraminic acid (Neu5Ac), the main sialic acid found in mammals, was investigated by coupling to DMB (37) and analysis of the resulting DMB derivatives by liquid chromatography. Although low intensity peaks were detected by fluorescence, none of them co-migrated with a standard of DMB-Neu5Ac (data not shown).
FIGURE 2. Catalytic amino acids are highly conserved in the putative GnT I protein from P. tricornutum. Protein sequence alignment between rabbit (1FOA) and P. tricornutum, as proposed by the Swiss-Pdb viewer program (56). Secondary structural elements are represented above the alignment for the P. tricornutum GnT I and below the alignment for the rabbit GnT I with a bold right arrow as the β strand and a looped line as the α helix. Essential residues for the binding of the donor substrate (UDP-GlcNAc) are indicated by arrowheads above the alignment: in black when identical and in white when not. Rabbit GnT I disulfide bridges are also numbered. The figure was created with the Espript program (57).
To investigate their detailed N-glycan structures, N-glycans were released from proteins by PNGase A treatment (31). The resulting N-glycans were then coupled to 2-AB (40,41) to facilitate their detection and analysis by MALDI-TOF mass spectrometry. As illustrated in Fig. 4A, major ions correspond to (M + Na)+ adducts of 2-AB derivatives of Hexose5-9GlcNAc2. Other minor ions were also detected in the mass spectrum profile with m/z values corresponding to Hexose3-4GlcNAc2 oligosaccharides. Moreover, the ion at m/z 1199 was assigned to the Hexose3DeoxyhexoseGlcNAc2 N-linked glycan. The pool of glycans was then submitted to exoglycosidase digestions. The oligosaccharide mixture was converted to HexoseGlcNAc2 and HexoseDeoxyhexoseGlcNAc2 upon treatment with jack bean α-mannosidase (Fig. 4B), demonstrating the presence of α-linked mannose residues in PNGase A-released N-glycans. Furthermore, treatment of the sample with α-L-fucosidase resulted in the suppression of the ion at m/z 1199 (not shown). As a consequence, the main ions detected in the MALDI-TOF mass spectrum (Fig. 4A) were assigned to high mannose N-glycans ranging from Man-5 to Man-9, and minor ions were assigned to Man-3, Man-4, and Man3FucGlcNAc2. To investigate the location of the fucose residue on the core N-glycan in this latter N-glycan, proteins were submitted to a deglycosylation experiment by PNGase F, a deglycosylating enzyme that is not able to cleave N-linked oligosaccharides harboring a fucose α(1,3)-linked to the proximal glucosamine residue (38). The ion assigned to Man3FucGlcNAc2 was not observed in the mass spectrum, indicating that this glycan carries a core α(1,3)-fucose residue (Fig. 4C).
FIGURE 3. P. tricornutum glycoproteins harbor N-linked oligosaccharides. A, affinodetection using concanavalin A (Con A) and immunodetection using antibodies raised against the core β(1,2)-xylose (anti-Xyl) and core α(1,3)-fucose (anti-Fuc) epitopes of proteins isolated from green onion used as a positive control (lanes 1) and from P. tricornutum (lanes 2). B, affinodetection by concanavalin A of proteins extracted from P. tricornutum treated (+) or not (−) with Endo H and PNGase F. C, affinodetection with RCA120 of P. tricornutum proteins treated (+) or not (−) with bovine β(1,4)-galactosyltransferase. Plant-derived IgG was used as a positive control of the galactose transfer efficiency (34). The arrows indicate the migration of heavy (H) and light (L) chains.
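The ion assignments above can be cross-checked by computing the expected m/z of sodium adducts of 2-AB-labeled glycans from their monosaccharide composition. The sketch below is an illustration rather than the authors' procedure: the function name `ab_sodium_mz` is hypothetical, and the constants are standard monoisotopic residue masses rounded to four decimals.

```python
# Monoisotopic masses in Da, rounded to four decimals.
HEXOSE = 162.0528        # hexose residue (e.g. mannose), C6H10O5
HEXNAC = 203.0794        # N-acetylhexosamine residue (GlcNAc), C8H13NO5
DEOXYHEXOSE = 146.0579   # deoxyhexose residue (e.g. fucose), C6H10O4
WATER = 18.0106          # completes the free reducing glycan
AB_LABEL = 120.0688      # net gain from reductive amination with 2-AB
SODIUM_ION = 22.9892     # Na+ adduct


def ab_sodium_mz(hexose: int, hexnac: int = 2, deoxyhexose: int = 0) -> float:
    """Expected m/z of the singly charged (M + Na)+ ion of a 2-AB glycan."""
    neutral = (hexose * HEXOSE + hexnac * HEXNAC
               + deoxyhexose * DEOXYHEXOSE + WATER + AB_LABEL)
    return neutral + SODIUM_ION


# High mannose series Man-5 to Man-9, then Man3FucGlcNAc2.
for n in range(5, 10):
    print(f"Man-{n}: {ab_sodium_mz(n):.2f}")
print(f"Man3FucGlcNAc2: {ab_sodium_mz(3, deoxyhexose=1):.2f}")
```

With these constants, the Hexose3DeoxyhexoseGlcNAc2 composition gives a value that rounds to m/z 1199, in line with the assignment reported in the text. Mass alone does not distinguish isomers, which is why the exoglycosidase and PNGase F experiments are needed to establish the residue types and the fucose linkage.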

Putative P. tricornutum GnT I Gene Is Able to Complement the Deficiency in N-Glycan Maturation in CHO Lec1 Mutant-Bio-informatic analysis of the P. tricornutum genome revealed a gene potentially encoding a GnT I glycosyltransferase. The expression of this gene was monitored in a continuous culture over a 1-month period by real time quantitative PCR. The relative gene expression was normalized to two reference genes encoding ribosomal protein small subunit 30S and histone H4 that have been recently described as appropriate housekeeping genes for real time quantitative PCR in P. tricornutum (26). The results show that P. tricornutum GnT I is steadily expressed in standard culture conditions over 32 days (Fig. 5A).
To demonstrate that this putative transferase is able to transfer a GlcNAc residue in vivo onto proteins carrying Man-5 N-linked glycans, the P. tricornutum GnT I (GenBank™ accession number 1370344) was expressed as a fusion with a V5 tag in the CHO Lec1 mutant, which is deficient in its endogenous GnT I (16). On the basis of the immunodetection of the V5 epitope, CHO transformants were found to efficiently express the recombinant GnT I. Two transformants (2 and 4) were selected for N-linked glycan analysis (Fig. 5B). Proteins from these two clones, from the wild type CHO, and from the CHO Lec1 mutant were isolated, and their N-linked glycans were released by treatment with PNGase F. As illustrated in Fig. 6A, the MALDI-TOF mass spectrum of N-glycans released from the CHO Lec1 mutant showed that it accumulates high mannose type N-glycans, in contrast with the wild type CHO cell, which exhibits both high mannose and complex type N-glycans (Fig. 6B).
In Fig. 6C, glycoproteins from the complemented line (clone 4) displayed both high mannose oligosaccharides and a complete set of complex type N-glycans identical to the one observed in wild type CHO cells (Fig. 6B). The same N-glycan profile was obtained with clone 2 (data not shown). This demonstrates that the expression of the P. tricornutum GnT I was able to restore the biosynthesis of complex N-glycans in the mammalian cell mutant.

DISCUSSION
Bio-informatic analysis of the P. tricornutum genome revealed the presence of a complete set of sequences potentially encoding proteins involved in the synthesis of the lipid-linked Glc3Man9GlcNAc2-PP-dolichol N-glycan, the subunits of the OST complex that catalyze its transfer onto the asparagine residues of target proteins, as well as ER glucosidases and chaperones (6,7). This suggests that this diatom possesses the ER machinery required for glycoprotein quality control previously characterized for other eukaryotes (71). The genome analysis also revealed the presence of one Golgi mannosidase I involved in the trimming of the N-glycan precursor into the Man-5 high mannose type N-glycan (50,51). Consistent with these sequence predictions, two lines of biochemical evidence strongly suggest that proteins from the P. tricornutum Pt 1.8.6 strain are mainly N-glycosylated by high mannose type oligosaccharides. First, proteins from this strain are affinodetected by concanavalin A, a lectin specific for high mannose sequences. This detection is largely suppressed upon treatment with Endo H or PNGase F, two deglycosylating enzymes specific for N-linked glycans. Furthermore, affino- and immunodetection with other glycan-specific probes, as well as a search for sialic acids, were unsuccessful. Second, MALDI-TOF mass spectrometry of the N-glycan population released by PNGase A allowed the detection of Hexose5-9GlcNAc2 oligosaccharides, sensitive to an α-mannosidase treatment, as major oligosaccharide species. We concluded that proteins from P. tricornutum mainly carry Man-5 to Man-9 high mannose type N-glycans. Other minor glycan species, i.e. Man-3, Man-4, and Man3FucGlcNAc2 carrying a fucose α(1,3)-linked to the proximal glucosamine residue, were also detected. The presence of this latter sequence corroborates the weak detection of core α(1,3)-fucose epitopes on Western blot, a potential product of the putative FucT sequences predicted in the P. tricornutum genome.
In eukaryotes, the first steps of the N-glycan processing into complex N-glycans are controlled by GnT I, α-Man II, and GnT II. The resulting core N-glycan is modified by the action of a wide variety of glycosyltransferases, giving rise to mature N-linked glycans involved in various biological processes (8-11). Putative GnT I and α-Man II are predicted in the P. tricornutum genome. We mainly focused on GnT I characterization because this transferase is the first enzyme initiating the complex type maturation of oligosaccharides N-linked to secreted proteins. We demonstrated that the gene encoding this putative transferase is expressed in standard culture conditions. However, no glycan carrying terminal GlcNAc residues has been detected on P. tricornutum proteins, either by a galactosyltransferase assay or by MALDI-TOF mass spectrometry analysis of the PNGase A-released oligosaccharides. Therefore, the in vivo activity of this putative GnT I was investigated by expressing the full-length protein in the CHO Lec1 mutant lacking its endogenous GnT I activity (16). Wild type-like N-glycosylation profiles were detected in transformed cell lines, thus demonstrating that the putative GnT I from this diatom was able to restore the biosynthesis of complex type N-glycans in GnT I-null CHO cells. This shows that the P. tricornutum gene encodes a protein able to perform in vivo the processing of oligomannosides into complex type N-glycans, thus corresponding to a GnT I activity. To our knowledge, this work is the first functional characterization of a microalgal N-glycan glycosyltransferase. These data also suggest that both the targeting and the retention mechanisms of Golgi enzymes are conserved between mammals and diatoms.
A search for putative GnT I was carried out in genomes of species belonging to the three main microalgae lineages of the phylogenetic tree, i.e. Viridiplantae, Heteroconta, and Haptophyta. Genes encoding putative GnT I were identified in two other heterokonts, i.e. F. cylindrus and T. pseudonana, as well as in the haptophyte E. huxleyi. A search for GnT I in Viridiplantae revealed a sequence only in M. pusilla and in no other species (Fig. 7). The complete microalgal GnT I sequences share ~25% identity with plant and mammal GnT I. Heteroconta and Haptophyta GnT I are gathered in a distinct lineage that can be clearly separated from other GnT I, as seen in the phylogenetic tree (Fig. 7). So far, the processing of N-linked glycans into complex oligosaccharides has been mainly described in multicellular higher eukaryotes such as animals and land plants and has been demonstrated to be required for normal morphogenesis in animals (9,10). Our results show that this key Golgi transferase is also involved in the processing of N-linked glycans in unicellular microalgae species.
Mainly high mannose type N-glycans were detected on P. tricornutum proteins, which suggests that GnT I has a limited in vivo impact on glycans N-linked to secreted proteins of this diatom. No glycan carrying a terminal GlcNAc residue has been detected on P. tricornutum proteins. However, although representing a minor glycan population, the small N-glycans Man-3 and Man-4, as well as Man3FucGlcNAc2, were detected in the P. tricornutum N-glycan mass profile. These glycans, named paucimannose structures, have been previously found in invertebrates and plants and result from the degradation of GlcNAc-terminated complex glycans (GlcNAcMan3-4GlcNAc2) by N-acetylglucosaminidases after their biosynthesis in the Golgi apparatus. Indeed, in the GnT I-dependent pathway, GlcNAcMan5GlcNAc2, the product of GnT I (Fig. 1), is successively converted in the Golgi apparatus into GlcNAcMan4GlcNAc2 and then into GlcNAcMan3GlcNAc2 by the action of α-Man II and then into GlcNAcMan3FucGlcNAc2 by α-FucT. Elimination of the terminal GlcNAc by β-N-acetylglucosaminidases in the secretory system or in compartments where proteins accumulate can then degrade these oligosaccharides into Man-3, Man-4, and Man3FucGlcNAc2. Such processing was demonstrated in insect cells (72), C. elegans (73), and plants (74). Two putative processing β-N-acetylglucosaminidases belonging to the CAZy glycosylhydrolase GH20 family are predicted in the P. tricornutum genome (sequences 49563 and 45073). These glycosidases share 43 and 36% identity, respectively, with DmFDL, a β-N-acetylglucosaminidase from Drosophila that is able to specifically hydrolyze a GlcNAc residue located on the α(1,3)-antenna of N-glycans, giving rise to paucimannose oligosaccharides (75,76). Taken together, these data suggest that such processing may also occur in diatoms.
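The GnT I-dependent route to paucimannose glycans outlined above can be followed step by step with a small composition-level model. This is an illustrative sketch only: it tracks residue counts and ignores linkages and antenna positions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Glycan:
    man: int         # number of mannose residues
    glcnac: int = 2  # GlcNAc residues, including the two core residues
    fuc: int = 0     # core fucose residues

    def name(self):
        antenna = "GlcNAc" if self.glcnac > 2 else ""
        fucose = "Fuc" if self.fuc else ""
        return f"{antenna}Man{self.man}{fucose}GlcNAc2"


def gnt1(g):
    """GnT I: transfer one GlcNAc onto Man5GlcNAc2."""
    if g.man != 5 or g.glcnac != 2:
        raise ValueError("GnT I substrate is Man5GlcNAc2 in this model")
    return replace(g, glcnac=3)


def alpha_man2(g):
    """alpha-Man II: remove one mannose from a GlcNAc-terminated glycan."""
    if g.glcnac != 3 or g.man <= 3:
        raise ValueError("alpha-Man II needs GlcNAcMan4-5GlcNAc2")
    return replace(g, man=g.man - 1)


def alpha_fuct(g):
    """alpha-FucT: add a core fucose to GlcNAcMan3GlcNAc2."""
    if g.glcnac != 3 or g.man != 3 or g.fuc:
        raise ValueError("alpha-FucT needs unfucosylated GlcNAcMan3GlcNAc2")
    return replace(g, fuc=1)


def hexosaminidase(g):
    """beta-N-acetylglucosaminidase: remove the terminal GlcNAc."""
    if g.glcnac != 3:
        raise ValueError("no terminal GlcNAc to remove")
    return replace(g, glcnac=2)


def paucimannose_products():
    """Follow the pathway from Man5GlcNAc2 and trim each intermediate."""
    g5 = gnt1(Glycan(man=5))   # GlcNAcMan5GlcNAc2
    g4 = alpha_man2(g5)        # GlcNAcMan4GlcNAc2
    g3 = alpha_man2(g4)        # GlcNAcMan3GlcNAc2
    g3f = alpha_fuct(g3)       # GlcNAcMan3FucGlcNAc2
    return [hexosaminidase(g).name() for g in (g4, g3, g3f)]


if __name__ == "__main__":
    print(paucimannose_products())
```

In this model the three trimmed products are Man4GlcNAc2, Man3GlcNAc2, and Man3FucGlcNAc2, the same minor species observed in the mass profile, whereas none of them can be reached without the initial GnT I step.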
The biochemical characterization of the core-modifying FucTs and of the processing β-N-acetylglucosaminidases is currently under study. The N-glycosylation patterns of proteins from P. tricornutum grown in different conditions are also under investigation, to study this major post-translational modification in relation to pleomorphism and/or environmental stress conditions and to determine the function of complex N-glycans in diatom physiology.