Analysis of substrate specificity of Trypanosoma brucei oligosaccharyltransferases (OSTs) by functional expression of domain-swapped chimeras in yeast

N-Linked protein glycosylation is an essential and highly conserved post-translational modification in eukaryotes. The transfer of a glycan from a lipid-linked oligosaccharide (LLO) donor to the asparagine residue of a nascent polypeptide chain is catalyzed by an oligosaccharyltransferase (OST) in the lumen of the endoplasmic reticulum (ER). Trypanosoma brucei encodes three paralogue single-protein OSTs called TbSTT3A, TbSTT3B, and TbSTT3C that can functionally complement the Saccharomyces cerevisiae OST, making it an ideal experimental system to study the fundamental properties of OST activity. We characterized the LLO and polypeptide specificity of all three TbOST isoforms and their chimeric forms in the heterologous expression host S. cerevisiae where we were able to apply yeast genetic tools and newly developed glycoproteomics methods. We demonstrated that TbSTT3A accepted LLO substrates ranging from Man5GlcNAc2 to Man7GlcNAc2. In contrast, TbSTT3B required more complex precursors ranging from Man6GlcNAc2 to Glc3Man9GlcNAc2 structures, and TbSTT3C did not display any LLO preference. Sequence differences between the isoforms cluster in three distinct regions. We have swapped the individual regions between different OST proteins and identified region 2 to influence the specificity toward the LLO and region 1 to influence polypeptide substrate specificity. These results provide a basis to further investigate the molecular mechanisms and contribution of single amino acids in OST interaction with its substrates.

N-Glycosylation is an essential process and plays important roles in protein folding quality control, cell-cell interactions, and developmental processes (1,2). The glycans transferred to nascent polypeptide chains in the ER are built on the lipid carrier dolichylphosphate (Dol-P) to yield the lipid-linked oligosaccharide (LLO) substrate for oligosaccharyltransferase (OST). The biosynthesis of the LLO is an ordered, stepwise process conducted by the concerted action of specific glycosyltransferases that are encoded by the asparagine-linked glycosylation (ALG) genes. LLO biosynthesis is initiated by the addition of N-acetylglucosaminyl-phosphate (GlcNAc-P) and N-acetylglucosamine (GlcNAc) to the lipid carrier on the cytosolic face of the ER membrane using nucleotide-activated UDP-GlcNAc as donor to form Dol-PP-GlcNAc2. Subsequently, the LLO is elongated by five mannose (Man) residues. The Man5GlcNAc2 LLO is then translocated into the ER lumen. There, Dol-P-bound Man serves as donor for the further elongation of the LLO with Man until a Man9GlcNAc2 structure is built up. In most fungal and animal species, the addition of three glucose (Glc) residues from Dol-P-Glc terminates LLO biosynthesis (3).
The mature Glc3Man9GlcNAc2 LLO is used as donor substrate by the OST, which transfers the oligosaccharide from the lipid carrier en bloc to an asparagine residue of a nascent polypeptide chain. In eukaryotes, the acceptor asparagine residue is located within a conserved sequon consisting of three amino acids: asparagine, a second amino acid (any except proline), and threonine or serine (NX(T/S)) (4). In multicellular eukaryotes the OST is a complex assembled from eight different proteins with STT3 encoding the catalytic subunit (5-7). Other subunits of the hetero-oligomeric complex were suggested to influence OST substrate interactions and complex assembly (8-12).
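The NX(T/S) sequon definition above can be illustrated with a minimal sketch. The function name `find_sequons` and the toy sequence are hypothetical and serve only as an illustration; the lookahead lets overlapping sequons (e.g. N-N-T-S) be reported separately.

```python
import re

# N-X-(T/S), where X is any residue except proline.
# The lookahead leaves X and T/S unconsumed so overlapping sequons are found.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of asparagines that sit in an N-X-(T/S) sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]

# Hypothetical toy sequence, not taken from any real protein.
example = "MKNATLLNPSAANNTSQ"
print(find_sequons(example))  # [3, 13, 14]; the N in "NPS" is skipped
```

A sequence match of this kind only identifies potential acceptor sites; whether a given sequon is actually occupied depends on the OST and the sequence context, as analyzed later in the text.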
The genomes of the kinetoplastids Trypanosoma and Leishmania only encode homologues of the yeast STT3 gene (13). All other subunits found in hetero-oligomeric OST complexes of Saccharomyces cerevisiae or mammals are missing in Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major, suggesting that these proteins function as single subunit OSTs similar to enzymes known from bacterial and archaeal N-glycosylation systems (14,15). S. cerevisiae has proven to be a suitable heterologous in vivo system to functionally express and characterize kinetoplastid OSTs. The yeast cells used in these studies lacked the essential yeast STT3 subunit or other essential OST subunits. This deleterious loss of a functional yeast OST is complemented by expression of different STT3 proteins from T. cruzi, T. brucei, and L. major, demonstrating that these STT3s function as single protein OSTs (16-19).
T. brucei is a protozoan parasite causing African sleeping sickness in humans and nagana in cattle. T. brucei cells encounter two hosts during their life cycle. The parasite exists as the procyclic form in an insect vector (the tsetse fly), and it is referred to as bloodstream form when afflicting the mammalian host. The surface of T. brucei cells is covered by glycoproteins termed variant surface glycoprotein (VSG) in the bloodstream form and procyclins in its procyclic life stage. N-Glycosylation epitopes on VSG play an important role in T. brucei virulence (20). Furthermore, a possible immune evasion strategy has been proposed where T. brucei genetically recombined its N-glycosylation machinery, resulting in the change of the glycosylation status of VSG (21). The T. brucei genome encodes three paralogues of STT3 termed TbSTT3A, TbSTT3B, and TbSTT3C (19,22). In contrast to multicellular eukaryotes, trypanosomatids are incapable of synthesizing Dol-P-Glc and therefore lack the three capping Glc residues found in yeast and higher eukaryotes (22-24).
The three paralogue STT3s encoded by the T. brucei genome display distinct preferences for the LLO donor as well as for the acceptor polypeptide substrate. Whereas TbSTT3A was shown to preferentially transfer Man5GlcNAc2 glycans to acceptor polypeptide chains, both TbSTT3B and TbSTT3C glycosylated acceptor sites with Man9GlcNAc2 glycans (19). A recent study performed in T. brucei cells extended these findings, reporting that both TbSTT3A and TbSTT3B transfer Man7GlcNAc2 and Man5GlcNAc2 glycans to a VSG protein (25). The authors suggest that the substrate specificity of TbSTT3A and TbSTT3B is promoted by the presence or absence of the LLO substrate c-branch. The LLO specificity of TbSTT3C was not addressed because this paralogue is not expressed in T. brucei cells used for the experiments (19). Analysis of the OST protein products revealed that TbSTT3A and TbSTT3C preferentially glycosylate sequons with acidic amino acids in the sequon's vicinity. By contrast, TbSTT3B did not display a particular preference for glycosylation sequons and the amino acids surrounding them (19).
In this study, we used the functional expression of T. brucei STT3 proteins in yeast to analyze their function in detail. We took advantage of the "genetic tailoring" (26) of the OST substrate and the quantitative analysis of glycosylation site occupancy (27) to characterize T. brucei OST function. Domain-swap experiments made it possible to assign functional properties to specific regions of the STT3 proteins.

TbSTT3A, TbSTT3B, and TbSTT3C display differential preferences for LLO substrates from Man5GlcNAc2 to Glc3Man9GlcNAc2
In vivo analysis of the TbSTT3 LLO specificity revealed that TbSTT3B transfers primarily Man9GlcNAc2 oligosaccharides, whereas TbSTT3A transfers Man5GlcNAc2 glycans to its preferred glycosylation site of VSG221. Data acquired in the Δstt3 S. cerevisiae strain suggested that TbSTT3B and TbSTT3C both accept Glc3Man9GlcNAc2 as LLO substrate, whereas TbSTT3A cannot utilize the Glc3Man9GlcNAc2 LLO substrate (19). We therefore further investigated the LLO specificity of all three TbSTT3 paralogues using the heterologous yeast expression system. We combined the STT3 deletion with different deletions in the LLO biosynthesis pathway (ALG genes) in yeast. The Δstt3Δalg strains harboring the yeast STT3 (ScSTT3) URA3 plasmid and a second LEU2-marked plasmid, encoding either ScSTT3 or the different TbSTT3 paralogues, were subjected to plasmid shuffling using 5-FOA. In this approach, survival of the cells on 5-FOA depended on the ability of the different TbSTT3 genes, encoded on the LEU2 plasmids, to complement the yeast STT3 deletion.
In the Δstt3Δalg3 double mutant strain (accumulation of the Man5GlcNAc2 oligosaccharide), only TbSTT3A and TbSTT3C complemented the STT3 deletion (Fig. 1A). This result is in accordance with the finding that TbSTT3A can utilize the Man5GlcNAc2 LLO substrates in T. brucei (19). Interestingly, in both the Δstt3Δalg9 and the Δstt3Δalg12 strains, all three TbSTT3 paralogues complemented the deletion of endogenous OST activity (Fig. 1, B and C). In the Δstt3 strain, only TbSTT3B and TbSTT3C complemented the STT3 deletion, whereas TbSTT3A-expressing cells did not survive plasmid shuffling (Fig. 1E) (19). The inability of TbSTT3A to complement Δstt3 in the presence of Glc3Man9GlcNAc2 LLOs was independent of the three terminal glucose residues on the LLO substrate because growth of Δstt3Δalg6 cells expressing TbSTT3A was also not rescued (Fig. 1D). Our data confirm that the complementing activity of the different T. brucei STT3 proteins depended on the oligosaccharide structures of the substrate LLO.
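The complementation results can be summarized as a small lookup table, sketched below. Two points are assumptions rather than statements of this passage: the LLO structures listed for the alg9, alg12, and alg6 backgrounds are the intermediates these yeast mutants are generally known to accumulate (only the alg3 structure is spelled out above), and growth of TbSTT3B and TbSTT3C in Δstt3Δalg6 is taken from the Discussion. The names `STRAINS`, `GROWTH`, and `accepted_llos` are hypothetical.

```python
# (strain, LLO assumed to accumulate in that background), ordered by LLO size.
STRAINS = [
    ("Δstt3Δalg3", "Man5GlcNAc2"),
    ("Δstt3Δalg9", "Man6GlcNAc2"),
    ("Δstt3Δalg12", "Man7GlcNAc2"),
    ("Δstt3Δalg6", "Man9GlcNAc2"),
    ("Δstt3", "Glc3Man9GlcNAc2"),
]

# True = growth on 5-FOA, i.e. the TbSTT3 paralogue complemented the STT3 deletion.
GROWTH = {
    "TbSTT3A": [True, True, True, False, False],
    "TbSTT3B": [False, True, True, True, True],
    "TbSTT3C": [True, True, True, True, True],
}

def accepted_llos(ost: str) -> list[str]:
    """LLO structures that supported growth for the given OST."""
    return [llo for (_, llo), grew in zip(STRAINS, GROWTH[ost]) if grew]

for ost in GROWTH:
    llos = accepted_llos(ost)
    print(f"{ost}: {llos[0]} to {llos[-1]}")
```

Read this way, the growth assay is a viability readout rather than a direct measurement of transfer: a missing entry means that glycosylation was insufficient for survival, not necessarily that the LLO cannot be used at all.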

Two distinct protein regions influence LLO specificity of TbSTT3B and TbSTT3C
TbSTT3B and TbSTT3C protein sequences are ~95% identical, and the sequence differences between them cluster, in contrast to those with TbSTT3A, in three distinct regions located in the lumenal part of the proteins (Fig. 2). These two OSTs differ significantly with respect to their LLO substrate specificity: TbSTT3B does not accept Man5GlcNAc2 LLOs, in contrast to TbSTT3C. We therefore reasoned that one or a combination of these regions would account for the different LLO specificities of TbSTT3B and TbSTT3C and that chimeric TbSTT3B/C proteins would fold properly.
The differences in LLO specificity led to the inability of TbSTT3B to rescue the growth of the Δstt3Δalg3 strain, whereas TbSTT3C was able to complement the STT3 deletion of this strain. TbSTT3B-C chimeras were constructed by exchanging single regions of TbSTT3B by corresponding regions of TbSTT3C (TbSTT3B-1C, TbSTT3B-2C, and TbSTT3B-3C) or by combinations of two regions (TbSTT3B-1/2C, TbSTT3B-2/3C, and TbSTT3B-1/3C). To test the effect of the region exchange on LLO specificity, Δstt3Δalg3 cells harboring the chimeric TbSTT3B-C constructs were subjected to 5-FOA-induced plasmid shuffling, and growth of the cells was assessed (Fig. 3A). The exchange of region 1 in TbSTT3B did not rescue cell growth in the Δstt3Δalg3 strain (Man5GlcNAc2), and growth in the Glc3Man9GlcNAc2-accumulating strain was reduced as well. In contrast, the exchange of region 2 (TbSTT3B-2C) allowed cell growth in both backgrounds comparable with TbSTT3C. Also, the exchange of region 3 (TbSTT3B-3C) supported growth of Δstt3Δalg3 cells albeit at a somewhat reduced level. These observations were further substantiated by the TbSTT3B-C chimeras with two regions exchanged. Both chimeras containing region 2 (TbSTT3B-1/2C and TbSTT3B-2/3C) promoted growth similar to TbSTT3C, whereas the chimera with regions 1 and 3 (TbSTT3B-1/3C) resulted in a severe growth phenotype in Δstt3Δalg3 cells, but not in cells with normal LLO biosynthesis (Fig. 3, A and B). We concluded that both regions 2 and 3 were required for optimal utilization of Man5GlcNAc2 LLO. However, region 2 seems to be more important in determining LLO substrate specificity of TbSTT3B and TbSTT3C and may define an oligosaccharide recognition domain of these single subunit OSTs.

TbSTT3A, TbSTT3B, and TbSTT3C display differential preferences for polypeptide substrates
Having established that region 2 is responsible for providing specificity toward the LLO donor substrate, we also sought to identify the region(s) of the TbSTT3 paralogues that provide(s) specificity toward polypeptide substrates. In the first step, differences in the glycosylation efficiencies of the three TbSTT3 paralogues for different glycosylation sites of yeast proteins were addressed. It was previously reported that TbSTT3A and TbSTT3C more efficiently modified glycosylation sites that are located in a sequence context with acidic amino acids, whereas TbSTT3B did not display a particular preference for specific amino acids in the local environment of the glycosylation site (19). A parallel reaction monitoring (PRM) mass spectrometry (MS)-based method (Fig. 4A) was used to determine the occupancy of glycosylation sites from different yeast membrane proteins, which represents the percentage of peptides modified with a glycan at a given glycosylation site compared with the same peptide in wild-type cells. To monitor all three T. brucei OST proteins, we used the Δstt3Δalg9 yeast strain to compare the glycosylation efficiencies of TbSTT3A, TbSTT3B, and TbSTT3C. This strain generates an LLO substrate compatible with all T. brucei OSTs, and the N-linked glycan is susceptible to EndoH digestion.
The stable isotope labeling with amino acids in cell culture (SILAC) coupled to the PRM MS-based technique (27) was used to analyze glycosylation occupancy at glycosylation sites on proteins in yeast microsomal membrane preparations. In short, a reference wild-type strain was grown in medium containing heavy isotope-labeled arginine and lysine, whereas the strains expressing TbSTT3s were grown in the corresponding medium using regular amino acids (light). Cells were mixed 1:1 and disrupted, and samples were enriched for the glycoprotein-rich membrane fraction. N-Linked glycans were cleaved by EndoH to maintain the first GlcNAc residue of the glycan on the protein. Proteins were digested enzymatically, and the resulting peptides were analyzed by liquid chromatography-electrospray-MS/MS. Corresponding light and heavy peptides were paired, and site occupancies relative to the reference strain were calculated for TbSTT3-expressing cells based on light/heavy (L/H) ratios of the peak area values (supplemental Table 1). Site occupancy reflected the preference of the OST for a given glycosylation site and its local environment. Glycosylation sites that are favored by a given OST will be glycosylated more efficiently (i.e. higher site occupancy) as compared with sites not located in a favored peptide sequence context. Cluster analysis of the site occupancy data acquired for the TbSTT3 paralogues indicated that the glycosylation efficiencies of TbSTT3A and TbSTT3C for the 55 analyzed sites were similar and that the efficiency of glycosylation by TbSTT3B was distinct from those of TbSTT3A and TbSTT3C (Fig. 4B). This result confirmed previously reported differences in the glycosylation efficiency observed for TbSTT3B and TbSTT3C (19).
Figure 4 (legend, beginning not recovered). ... Table 1. Data were used for cluster analysis. In cluster analysis, samples with high similarity are close, and samples with low similarity are distant from each other. C, sequence composition analysis was performed where amino acids were grouped based on their polarity. Percentage change in respective ratio of each amino acid group between efficiently and poorly glycosylated sequences upstream of glycosylation sequons was calculated for each TbSTT3-expressing strain. Efficiently glycosylated sites are considered where glycosylation occupancy ratio compared with wild type is more than 0.75, and poorly glycosylated sites are considered where glycosylation occupancy ratio compared with wild type is less than 0.25. D, two-sample logo analysis (30) was used to calculate and visualize residues surrounding glycosylation sites that are significantly enriched in either efficiently or poorly glycosylated sites in each TbSTT3-expressing strain. Sequence size analyzed contained 10 amino acids upstream and downstream from the glycosylation site. Efficiently glycosylated sites are considered where glycosylation occupancy ratio compared with wild type is more than 0.75, and poorly glycosylated sites are considered where glycosylation occupancy ratio compared with wild type is less than 0.25.

We then examined the polypeptide substrate specificity of TbSTT3 paralogues in the context of sequence polarity. Sequence composition analysis was performed on glycosylation sequons themselves plus 10 residues downstream and upstream of the glycosylation sites. Amino acid residues were grouped based on their polarity into acidic (Asp and Glu), basic (Arg, Lys, and His), polar (Ser, Thr, Asn, Gln, Cys, and Tyr), and hydrophobic (Ala, Val, Ile, Leu, Met, Phe, Trp, Pro, and Gly) groups. Sequences were divided into "efficiently" and "poorly" glycosylated when the glycosylation occupancy calculated was more than 85% or less than 25% compared with reference strains, respectively. The ratio of each amino acid group was calculated relative to the total number of residues. Percentage change in respective ratios of each amino acid group between efficiently and poorly glycosylated sequences downstream and upstream of the glycosylation sequon was calculated for each TbSTT3-expressing strain. The analysis of sequences downstream of the glycosylation sites showed no apparent difference in sequence specificity between different TbSTT3 paralogues (supplemental Table 2). All three enzymes showed preference for hydrophobic residues, whereas polar residues were not favored. The analysis of upstream sequences revealed that both TbSTT3A and TbSTT3C preferred acidic residues (Fig. 4C). Additionally, both paralogues disfavored basic residues. However, TbSTT3B showed no specific preference for any amino acid type. These results support a previously reported difference in the polypeptide acceptor substrate specificity, where TbSTT3A and TbSTT3C showed selectivity toward glycosylation sequons flanked by acidic residues, whereas TbSTT3B lacked any obvious preference (19). Furthermore, it has been confirmed that upstream amino acids play a dominant role in TbSTT3A acceptor peptide specificity (35).
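The sequence composition analysis described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: "percentage change" is interpreted here as the relative change of each group's fraction between efficiently and poorly glycosylated sequences, and the function names and toy windows are hypothetical.

```python
# Residue groups as defined in the text (one-letter codes).
GROUPS = {
    "acidic": set("DE"),
    "basic": set("RKH"),
    "polar": set("STNQCY"),
    "hydrophobic": set("AVILMFWPG"),
}

def group_fractions(windows: list[str]) -> dict[str, float]:
    """Fraction of residues belonging to each group, pooled over all windows."""
    residues = "".join(windows).upper()
    if not residues:
        raise ValueError("no residues to analyze")
    return {
        name: sum(res in members for res in residues) / len(residues)
        for name, members in GROUPS.items()
    }

def percent_change(efficient: list[str], poor: list[str]) -> dict[str, float]:
    """Relative change (%) of each group's fraction, efficient versus poor sites.

    Assumption: change is expressed relative to the poorly glycosylated set;
    groups absent from that set are skipped to avoid division by zero.
    """
    eff, low = group_fractions(efficient), group_fractions(poor)
    return {g: 100.0 * (eff[g] - low[g]) / low[g] for g in GROUPS if low[g] > 0}

# Hypothetical upstream windows, shortened to five residues for readability.
efficient_sites = ["DDEAL", "EDKAV"]
poor_sites = ["KRSAL", "DKTAV"]
for group, change in percent_change(efficient_sites, poor_sites).items():
    print(f"{group}: {change:+.1f}%")
```

In this toy input, acidic residues are enriched and basic residues depleted among the efficiently glycosylated windows, which is the pattern reported for TbSTT3A and TbSTT3C.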
Site occupancy data acquired for the TbSTT3 paralogues were further exploited to examine whether there is a particular preference for specific amino acids in the local environment of the glycosylation sites. To establish whether there are consensus sequences or patterns that control site-specific N-glycosylation, a two-sample logo analysis (30) was used to calculate and visualize statistically significant residues surrounding glycosylation sites (Fig. 4D). In agreement with sequence composition analysis, both TbSTT3A and TbSTT3C showed efficient glycosylation of sites containing aspartic acid. Sequences enriched in aspartic acid at the −2 position are more likely to be efficiently glycosylated by TbSTT3A, whereas TbSTT3C preferred sequences enriched in aspartic acid at the −9 position. Although the effect of distal residues on the glycosylation occupancy (e.g. aspartic acid at the −9 position) needs to be confirmed by mutagenesis studies, the preference for an acidic residue at the −2 position has been demonstrated in vitro (31). Similar to previous data, TbSTT3B showed no specific preference. The preference of TbSTT3A for amino acids immediately adjacent to the glycosylation site is supported by an extensive analysis of native and artificial polypeptide substrates in T. brucei cells (35).

Distinct protein region influences polypeptide specificity of TbSTT3B and TbSTT3C
To define the polypeptide substrate specificity more closely, we used TbSTT3B-C chimeras to identify regions of the OST proteins that influenced the preference for certain glycosylation sites. Because the chimera TbSTT3B-1C yielded poor growth of the strains, we took advantage of the stabilizing property of the concomitant exchange of region 3 and compared the TbSTT3B-1/3C (= TbSTT3C-2B) with the TbSTT3B-2/3C (= TbSTT3C-1B) chimera that yielded similar growth in Δstt3Δalg9 cells (supplemental Fig. 1). Glycosylation efficiencies of the different chimeras were determined using the MS-based method described above, and cluster analysis of the site occupancy data was performed (supplemental Table 3). TbSTT3B-1/3C displayed the highest similarity with TbSTT3C, whereas the corresponding chimera TbSTT3B-2/3C showed only little similarity with TbSTT3C (Fig. 5A). Exchange of region 1 thus seems to influence the glycosylation efficiency of the two TbSTT3 paralogues examined.
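The equivalence of the chimera names used above (e.g. TbSTT3B-1/3C = TbSTT3C-2B) follows from bookkeeping over the three variable regions, as the short sketch below illustrates. It assumes that the sequence outside regions 1-3 can be treated as interchangeable between TbSTT3B and TbSTT3C; the function `chimera` is hypothetical.

```python
def chimera(backbone: str, swapped: set[int], donor: str) -> tuple[str, ...]:
    """Origin ("B" or "C") of variable regions 1, 2 and 3 in a chimeric STT3.

    backbone: paralogue providing the unswapped regions
    swapped:  region numbers replaced by the donor paralogue
    donor:    paralogue providing the swapped regions
    """
    return tuple(donor if region in swapped else backbone for region in (1, 2, 3))

# TbSTT3B-1/3C: regions 1 and 3 of TbSTT3B replaced by those of TbSTT3C.
print(chimera("B", {1, 3}, "C"))  # ('C', 'B', 'C')
# TbSTT3C-2B: region 2 of TbSTT3C replaced by that of TbSTT3B.
print(chimera("C", {2}, "B"))     # ('C', 'B', 'C')
```

The two chimeras compared in the site occupancy analysis therefore differ only in whether region 1 or region 2 is derived from TbSTT3B, with region 3 from TbSTT3C in both.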
We focused the analysis only on the sites that were differentially glycosylated by TbSTT3B and TbSTT3C. Sequence composition analysis was performed where the percentage change in respective ratios of acidic and basic amino acid groups between efficiently and poorly glycosylated sequences upstream of the glycosylation sequon was calculated for each TbSTT3-expressing strain. As expected, TbSTT3C showed preference for the acidic residues, whereas in this analysis, these were disfavored by the TbSTT3B enzyme (Fig. 5B). Similar to TbSTT3C, the TbSTT3B-1/3C chimera displayed preference for acidic sequences upstream of the glycosylation sites. Interestingly, a two-sample logo analysis revealed that both the TbSTT3C paralogue and the TbSTT3B-1/3C chimera showed preference for an aspartic acid at the same position upstream of the glycosylation sequon, although, as mentioned before, the effect of distal residues on the glycosylation occupancy needs to be further investigated (Fig. 5C). Similar to previous data, TbSTT3B showed no specific preference, and the same was true for the TbSTT3B-2/3C chimera. Involvement of region 1 in sequon specificity of TbSTT3B and TbSTT3C has been previously demonstrated in a study where, upon genetic rearrangement, a chimeric gene was generated containing the first variable region of TbSTT3C flanked by TbSTT3B sequences. The chimeric TbSTT3B/C/B protein described in that study showed much less efficient recognition of the native substrate of TbSTT3B, whereas it appeared to have attained a peptide acceptor specificity more similar to TbSTT3A than TbSTT3B (21).

Discussion
Functional expression of single subunit OSTs from kinetoplastids in S. cerevisiae has proven to be a useful model system to study the properties of STT3s from L. major, T. cruzi, and T. brucei (16-19). Yeast genetics methods allow manipulations of the LLO biosynthesis pathway to generate specific intermediate oligosaccharide structures (26) that can be used to study the influence of altered LLO substrates on OSTs. Consequences of such altered substrates for N-glycosylation can be monitored by analyzing N-glycoproteins, the products of the OST-catalyzed reaction. Both single N-glycoproteins like carboxypeptidase Y (32,33) and MS-based methods, which allow a broader view on many N-glycoproteins at the same time, were used to study consequences of alterations in the N-glycosylation process (27). The combination of genetics and analytical tools available in S. cerevisiae thus represents an excellent system to perform reverse genetics approaches to study particular OST features.
In vivo analysis of the T. brucei VSG221 protein showed that TbSTT3A transfers Man5GlcNAc2 glycans to protein, whereas TbSTT3B prefers Man9GlcNAc2 as a substrate for N-glycosylation (19). Analysis of VSG221 proteins in T. brucei TbALG3−/− and TbALG12−/− mutant strains revealed also that LLO intermediates can serve as substrates for both TbSTT3A and TbSTT3B, although with reduced efficiency. It was hypothesized that efficient glycosylation of VSG221 by TbSTT3A correlates with the absence of the LLO c-branch, whereas for TbSTT3B the presence of the c-branch is an important determinant to improve glycosylation (19,36).
Analysis of the LLO specificities of TbSTT3A and TbSTT3B in T. brucei was possible, because the two VSG221 glycosylation sites get selectively modified either by TbSTT3A with Man5GlcNAc2 glycans or by TbSTT3B with Man9GlcNAc2 oligosaccharides. This selectivity for a specific glycosylation sequon is provided by the distinct polypeptide specificities of TbSTT3A and TbSTT3B (19). Because TbSTT3C was not expressed in T. brucei, its substrate specificity was investigated in the heterologous yeast expression system. TbSTT3C shared the preference for acidic sequons with TbSTT3A but used Man9GlcNAc2 as LLO substrate, as observed with TbSTT3B (19).
Our detailed investigations on LLO specificity in the yeast in vivo system confirmed the results obtained in T. brucei for TbSTT3A and TbSTT3B. Furthermore, we demonstrated that the inability of TbSTT3A to complement the STT3 deletion of yeast cells was independent of LLO glucosylation. Because the terminal Glc residues of the LLO a-branch did not influence TbSTT3A, although T. brucei synthesizes only non-glucosylated LLOs (22), an interaction of TbSTT3A with the terminal Man residue of the LLO a-branch seems unlikely. The inability of TbSTT3A to support growth of the Δstt3 and Δstt3Δalg6 strains was rather due to the presence of LLO c-branch mannoses, which seemed to prevent efficient glycosylation by TbSTT3A. TbSTT3B was able to support growth of all strains tested with the exception of Δstt3Δalg3. In T. brucei, TbSTT3B modified VSG221 in a TbALG3−/− strain, although with reduced efficiency (36). The inability of TbSTT3B to rescue the growth of Δstt3Δalg3 yeast cells was therefore likely to result from reduced overall glycosylation levels, which were too low to allow survival of the yeast cells, rather than from an inability to use Man5GlcNAc2 LLOs as substrate in the heterologous host. In contrast to TbSTT3A and TbSTT3B, both of which displayed distinct LLO donor substrate preferences, TbSTT3C used all genetically tailored LLOs as substrate. Although TbSTT3 paralogues could utilize a range of LLO substrates to glycosylate proteins, no intermediate glycan structures are transferred to protein by TbSTT3A and TbSTT3B unless ALG mutations were introduced (19,25,36). This indicates that additional factors like availability or accessibility of LLO biosynthesis intermediates and Km values of the ALG enzymes fine-tune the glycosylation machinery, leading to the specific transfer of Man5GlcNAc2 LLOs by TbSTT3A and Man9GlcNAc2 oligosaccharides by TbSTT3B (19). The heterologous expression system allowed us to perform targeted structure-function analyses.
Region 2, which was found to be important for LLO specificity, is located in the C-terminal part of the TbSTT3 protein after the last predicted transmembrane helix. This C-terminal domain was predicted to be localized in the ER lumen due to the presence of the highly conserved WWDXG motif that is important for the glycosylation reaction of OSTs (5,37,38). The results presented here identify for the first time regions of the TbSTT3 proteins that are important for LLO specificity and provide a basis to further investigate the molecular mechanisms and contribution of single amino acids in OST LLO interaction.
Glycosylation efficiency of a given OST can be determined by analyzing site occupancy for different substrate polypeptides (17,19,39-41). Here, we compared the site occupancies for the three TbSTT3 paralogues in the Δstt3Δalg9 strain relative to a wild-type reference strain. Our results confirmed previous observations, made in two different expression hosts, that TbSTT3A and TbSTT3C have similar polypeptide substrate preferences (19). We further found that region 1 influences the glycosylation efficiency of certain polypeptide substrate glycosylation sites. PglB, the bacterial homologue of STT3, interacts with the threonine/serine residue of the glycosylation sequon of its peptide substrate via residues of the conserved WWDXG motif, which determines the specificity for the sequon sequence. A periplasmic loop connecting transmembrane helices 9 and 10 (i.e. external loop 5; EL5) also showed significant interaction with the peptide substrate. The C-terminal part of EL5 pins the peptide against the periplasmic domain, but it also contains the conserved residue Glu-319 that is part of the catalytic site of PglB (38). In the TbSTT3 paralogues, the region surrounding the WWDXG motif is completely conserved. Therefore, it seems unlikely that this region is responsible for the differences observed in site occupancy between TbSTT3B and TbSTT3C. Sequence alignments between PglB from Campylobacter lari and TbSTT3B and TbSTT3C showed that the conserved residue Glu-319 of PglB has an equivalent glutamic acid residue in the TbSTT3s that is located in region 1 of TbSTT3B and TbSTT3C (i.e. Glu-396). Hence, it is tempting to speculate that region 1 could be the functional equivalent to the EL5 described in PglB. Region 1 might interact with polypeptide substrates and modulate the glycosylation efficiency of TbSTT3s.
Interestingly, toward the C-terminal end of region 1, both TbSTT3A and TbSTT3C have conserved sequences, although the amino acid residues of TbSTT3B at these positions are different. This seemed to coincide with the observed differences in site occupancy and sequence composition for TbSTT3A and TbSTT3C compared with TbSTT3B. We showed that amino acids upstream of the glycosylation site play a dominant role in polypeptide specificity of TbSTT3. Furthermore, we were able to depict specific amino acid residues in polypeptide substrate sequences that increase the chances of such substrates being efficiently glycosylated by different TbSTT3 paralogues. Sequences enriched in aspartic acid at the −2 position are more likely to be efficiently glycosylated by TbSTT3A, whereas TbSTT3C preferred sequences enriched for aspartic acid at the −9 position. These conclusions are supported by the analysis of TbSTT3A and TbSTT3B specificity in T. brucei cells directly (35). Based on a larger data set, a preference for acidic acceptor sequences of TbSTT3A was observed. Molecular modeling let these authors conclude that sequences in region 1 are responsible for this substrate specificity. Our experimental data support this conclusion. In addition, in vitro analysis of TbSTT3A substrate specificity revealed an activating effect of an acidic residue at position −2 (31). More experimental work and crystal structures of eukaryotic single subunit OSTs will provide insights into the molecular basis of this polypeptide substrate specificity.

Figure 5. Δstt3Δalg9 cells complemented with TbSTT3B, TbSTT3C, TbSTT3B-1/3C, and TbSTT3B-2/3C were grown in light medium and mixed 1:1 with the wild-type reference strain grown in heavy medium, and membrane-derived peptides were prepared. Intensity ratios of glycosylated light to heavy peptides were normalized for expression differences in TbSTT3-expressing cells and wild-type cells. The resulting ratios represent the site occupancy for the TbSTT3-expressing cells relative to the wild-type reference strain (reported in %). A, site occupancy values for TbSTT3B, TbSTT3C, TbSTT3B-1/3C, and TbSTT3B-2/3C were used for cluster analysis. In cluster analysis, samples with high similarity are close whereas samples with low similarity are distant from each other. B, sequence composition analysis was performed where amino acids were grouped based on their polarity. Percentage change in respective ratio of each amino acid group between efficiently and poorly glycosylated sequences upstream of the glycosylation sequon was calculated for each TbSTT3-expressing strain. C, two-sample logo analysis (30) was used to visualize differences between efficiently and poorly glycosylated sequences surrounding the glycosylation sites for each TbSTT3-expressing strain.

Specificity for lipid-linked oligosaccharides
Lipid-linked oligosaccharide specificity was tested by plasmid shuffling with the respective Δstt3Δalg double mutant strains harboring both the URA3-marked pScSTT3 plasmid and the LEU2-marked TbSTT3- or ScSTT3-encoding plasmids. These cells were applied as 1:10 serial dilutions of 10^6 cells/ml to minimal medium containing 1 mg/ml 5-FOA (44) and 1 M sorbitol. The presence of 5-FOA allowed the selection of cells that lost the Ura+ pScSTT3 plasmid. These Ura− cells only survived when the STT3 genes encoded on the LEU2 plasmid could complement the yeast STT3 deletion. Strains were incubated at 23°C for 7-15 days depending on cell growth.

Mass spectrometry analysis
MS analysis was performed by LC-ESI-MS/MS in PRM mode using a Q Exactive HF instrument (Thermo Fisher Scientific) coupled to an ACQUITY UPLC system (Waters). Peptides were separated on an HSS T3 column (78 μm × 150 mm, 1.8 μm) packed with C18 material (Waters). Peptides were eluted using the gradient of 2-35% solvent B (99% (v/v) ACN, 0.1% (v/v) formic acid) over 90 min at a flow rate of 0.3 μl/min. All samples were analyzed using two PRM methods at an Orbitrap resolution of 30,000 or 60,000, based on scheduled inclusion lists containing the 175 and 128 target precursor ions, respectively, including retention time iRT standard peptides (Biognosys) (supplemental Table 4). The full scan event was collected using m/z 50-1400 mass selection, an Orbitrap resolution of 60,000 (at m/z 400), target automatic gain control (AGC) value of 3 × 10^6, and a maximum injection time of 30 ms. The PRM scan events used an Orbitrap resolution of 30,000 or 60,000, maximum fill time of 30 or 110 ms, respectively, an AGC value of 1 × 10^6, and an isolation width of 2 m/z. Fragmentation was performed with a normalized collision energy of 28, and MS/MS scans were acquired with a starting mass of m/z 150. Scan windows were set to 10 min for each peptide in the final PRM method to ensure the measurement of 6-10 points per LC peak per transition.

Data processing and analysis
Skyline software (version 2.6.0) with standard settings was used for data processing (47). Briefly, raw MS data files were imported, and the peaks were manually inspected and adjusted to ensure proper peak picking and peak integration. The resulting light to heavy intensity ratio (L/H) for glycopeptides modified with HexNAc was used to calculate the relative site occupancy for the given peptide/glycosylation site. The relative site occupancy was normalized for expression differences between the heavy (H)-labeled reference wild-type strain and the TbSTT3-expressing light (L) strains by dividing the L/H intensity ratio for the occupied glycopeptide by the median of L/H intensity ratios reported for all non-glycopeptides (i.e. not containing an NX(T/S) sequon) from the same protein as the glycopeptide. Cluster analysis was performed with Cluster 3.0 software (library version 1.50) (29,34) using the Spearman rank correlation to calculate similarity between site occupancy data for different TbSTT3s. Hierarchical clustering with the single linkage method was used to generate a dendrogram visualized with Java TreeView 1.1.6 software. Statistical tests used to analyze significant differences indicated in the respective figure legends were performed using t test in Microsoft Excel.
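The normalization and the similarity measure described above can be sketched in a few lines. This is a minimal illustration, not the Skyline or Cluster 3.0 implementation: the function names and all numerical values are hypothetical, and only the Spearman rank correlation is shown, not the subsequent single-linkage clustering.

```python
from statistics import median

def site_occupancy(glyco_lh: float, nonglyco_lh: list[float]) -> float:
    """Relative site occupancy (%): L/H of the glycopeptide divided by the
    median L/H of non-glycopeptides from the same protein."""
    return 100.0 * glyco_lh / median(nonglyco_lh)

def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical site occupancies (%) for four sites; not measured data.
occupancy = {
    "TbSTT3A": [90.0, 20.0, 75.0, 40.0],
    "TbSTT3B": [30.0, 80.0, 35.0, 85.0],
    "TbSTT3C": [85.0, 25.0, 70.0, 50.0],
}
print(site_occupancy(0.45, [0.9, 1.0, 1.1]))  # 45.0
print(round(spearman(occupancy["TbSTT3A"], occupancy["TbSTT3C"]), 3))  # 1.0
print(round(spearman(occupancy["TbSTT3A"], occupancy["TbSTT3B"]), 3))  # -0.8
```

Because the comparison is rank-based, two OSTs are scored as similar when they order the sites in the same way, even if their absolute occupancies differ.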
Author contributions-K. P., J. B., and M. A. conceived and coordinated the study and wrote the paper. R. G. assisted in conceiving the study. G. R. and M. P. contributed to growth assay analysis. All authors reviewed the results and approved the final version of the manuscript.