The Amino Acid at the X Position of an Asn-X-Ser Sequon Is an Important Determinant of N-Linked Core-glycosylation Efficiency*

N-Linked glycosylation is a common form of protein processing that can profoundly affect protein expression, structure, and function. N-Linked glycosylation generally occurs at the sequon Asn-X-Ser/Thr, where X is any amino acid except Pro. To assess the impact of the X amino acid on core glycosylation, rabies virus glycoprotein variants were generated by site-directed mutagenesis with each of the 20 common amino acids substituted at the X position of an Asn-X-Ser sequon. The efficiency of core glycosylation at the sequon in each variant was quantified in a rabbit reticulocyte lysate cell-free translation system supplemented with canine pancreas microsomes. The presence of Pro at the X position completely blocked core glycosylation, whereas Trp, Asp, Glu, and Leu were associated with inefficient core glycosylation. The other variants were more efficiently glycosylated, and several were fully glycosylated. These findings demonstrate that the X amino acid is an important determinant of N-linked core-glycosylation efficiency.

One of the most common types of protein modification is N-linked glycosylation, in which oligosaccharides are added to specific Asn residues (1, 2). N-Linked glycosylation plays a critical role in the expression of most cell-surface and secreted proteins and is often required for protein stability, antigenicity, and biological function (1, 3–6).

We have used rabies virus glycoprotein (RGP) as a model system to study the regulation of N-linked core glycosylation (8,29,30). Using a rabbit reticulocyte lysate cell-free translation system supplemented with canine pancreas microsomes, we can examine the effects of specific amino acid substitutions on the core-glycosylation efficiency (CGE) of individual sequons in RGP (29). Our results in the cell-free system are similar to those obtained when RGP variants are expressed in transfected Chinese hamster ovary cells (8,29). In this report we examine the impact of the X amino acid on CGE. To do this we generated a set of RGP variants by site-directed mutagenesis in which each of the 20 common amino acids was substituted at the X position of the sequon Asn 37 -Leu 38 -Ser 39 . We then quantified the CGE at the sequon in each variant using the cell-free system described. Our results demonstrate that the amino acid at the X position is an important determinant of CGE.

Construction of a Cloning Vector for Cassette Mutagenesis-A cloning vector for cassette mutagenesis was generated from the plasmid that encodes RGP(1--) by the introduction of unique EcoRV and SacI restriction sites on either side of the sequon (Fig. 2). This was accomplished using the polymerase chain reaction-based method, splicing by overlap extension (31), essentially as described (8). Briefly, polymerase chain reaction amplification of RGP cDNA was performed with two separate primer pairs to generate overlapping cDNA fragments containing EcoRV and SacI restriction sites. The mutagenic primers used for these amplification reactions were: 5′-ggatatcactgcagagagctcAAAGTTGGATACATCTTAGC-3′ (sense primer) and 3′-GGTTTGTTAAACCATCACCTCCTatagtgacgtctctcgag-5′ (antisense primer). The regions corresponding to the RGP sequence are shown in capital letters, and the EcoRV and SacI sites in the primer tails are underlined. Those cDNA fragments were combined in a third polymerase chain reaction. The resulting cDNA fragment was digested with HindIII and XhoI and ligated into the corresponding restriction sites in pRGP(1--) to generate the plasmid pRGP(1--)ES. In that plasmid, the cDNA encoding amino acids 32-46 of RGP(1--) is replaced with a 20-base pair sequence containing the EcoRV and SacI restriction sites (Fig. 2, A and B). The polymerase chain reaction-derived HindIII-XhoI region of pRGP(1--)ES was sequenced to confirm successful mutagenesis. This plasmid was digested with EcoRV and SacI restriction enzymes and gel-purified to generate a vector for oligonucleotide cassette mutagenesis at sequon 1 (Fig. 2C).
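As a consistency check on the primer design described above, the following Python sketch locates the EcoRV (GATATC) and SacI (GAGCTC) recognition sequences in the sense primer and tests whether the lowercase tail of the antisense primer, printed 3′ to 5′, is the base-by-base complement of the sense tail. The primer strings are copied from the text with line-break hyphens removed, which is an assumption about the extraction; the helper names are illustrative and not part of the original methods.

```python
# Recognition sequences of the two enzymes named in the text.
ECORV = "GATATC"
SACI = "GAGCTC"

# Primers as printed in the text (assumption: hyphens were line-break debris).
# Lowercase = primer tail, uppercase = RGP-derived sequence.
SENSE_5_TO_3 = "ggatatcactgcagagagctcAAAGTTGGATACATCTTAGC"
ANTISENSE_3_TO_5 = "GGTTTGTTAAACCATCACCTCCTatagtgacgtctctcgag"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Base-by-base complement, preserving letter case and orientation."""
    return seq.translate(_COMPLEMENT)


def find_site(seq: str, site: str) -> int:
    """0-based index of the first occurrence of `site`, or -1 if absent."""
    return seq.upper().find(site)


def lowercase_tail(primer: str) -> str:
    """Return the lowercase (non-RGP) portion of a primer."""
    return "".join(ch for ch in primer if ch.islower())


# Reading the complement of the 3'->5' antisense primer left to right
# gives the sense strand in the 5'->3' direction.
antisense_as_sense = complement(ANTISENSE_3_TO_5)

if __name__ == "__main__":
    print("EcoRV index in sense primer:", find_site(SENSE_5_TO_3, ECORV))
    print("SacI index in sense primer:", find_site(SENSE_5_TO_3, SACI))
    print("Tails complementary:",
          lowercase_tail(SENSE_5_TO_3).endswith(
              complement(lowercase_tail(ANTISENSE_3_TO_5))))
```

Because the two tails are complementary, the fragments amplified with these primers share an overlapping end, which is what allows them to be joined in the third polymerase chain reaction.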
* This work was supported in part by funds from the National Blood Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18
Construction of Plasmids Encoding Variants of RGP(1--) with Amino Acid Substitutions at the X Position of Sequon 1-A cassette mutagenesis approach was used to generate plasmids encoding variants of RGP(1--) with amino acid substitutions at the X position of sequon 1. For this construction, sense and antisense oligonucleotides were synthesized, which were complementary to one another except at the nucleotide positions corresponding to the codon for amino acid 38 (Fig. 2D). To introduce the full spectrum of amino acid substitutions at that position, each oligonucleotide was completely degenerate at all three of the nucleotide positions. Those oligonucleotides were combined in 1× ligation buffer (30 mM Tris-HCl, pH 7.8, 10 mM MgCl2, 10 mM dithiothreitol, 5 mM ATP) at a final concentration of 100 μM each and hybridized to one another by heating to 68°C for 5 min and cooling slowly to room temperature. The resulting oligonucleotide duplex was ligated into the RGP(1--)ES cloning vector at the EcoRV and SacI sites. Each plasmid resulting from this ligation encodes a protein identical to RGP(1--) except for the amino acid at the X position of sequon 1 (Fig. 2E). Ligated plasmids were transformed into competent Escherichia coli and isolated by alkaline lysis. DNA sequencing of each plasmid between the EcoRV and SacI sites was performed to identify the specific amino acid substitution in each recombinant plasmid. Plasmids containing 18 of the 20 common amino acids at position 38 were isolated using this approach.
The remaining two plasmids, encoding Asp and Glu at position 38, were isolated by a similar approach with an oligonucleotide duplex carrying the sequence GA(A/T) at the codon corresponding to amino acid 38 (sense sequence).
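The codon logic behind the two oligonucleotide designs can be illustrated with a short Python sketch. It expands a degenerate codon into its concrete codons and translates each with the standard genetic code: a fully degenerate codon (written here as NNN) covers all 64 codons, while GA(A/T) (written GAW in IUPAC notation) yields only GAA (Glu) and GAT (Asp). The function and variable names are illustrative assumptions, not taken from the original methods.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Only the IUPAC symbols needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}


def expand(degenerate_codon: str) -> dict:
    """Map every concrete codon matched by a degenerate codon to its
    one-letter amino acid ('*' marks a stop codon)."""
    choices = [IUPAC[base] for base in degenerate_codon.upper()]
    return {"".join(c): CODON_TABLE["".join(c)] for c in product(*choices)}


if __name__ == "__main__":
    full = expand("NNN")
    print(len(full), "codons,", len(set(full.values()) - {"*"}), "amino acids")
    print(expand("GAW"))
```

A fully degenerate codon also includes the three stop codons, and amino acids differ widely in how many codons encode them, so a random cassette library does not sample all substitutions equally.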
In Vitro Transcription and Expression in the Cell-free System-RNA encoding RGP variants was generated by in vitro transcription with T7 RNA polymerase as described previously (8,29). In vitro translation was performed using a rabbit reticulocyte lysate system supplemented with [35S]methionine and canine pancreas microsomes (Boehringer Mannheim) as described (29), except that the amount of microsomes was reduced to 1 μl per 15-μl translation reaction. Following translation the reactions were incubated on ice for 20 min in the presence of 200 μg/ml proteinase K and then for 5 min in the presence of 0.5 mg/ml phenylmethylsulfonyl fluoride. Samples were analyzed by SDS-polyacrylamide gel electrophoresis and autoradiography as described (8). Densitometric analysis of gel autoradiographs exposed in the linear range was performed using a Sci-Scan 5000 system (U. S. Biochemical Corp.).

Construction of Plasmids Encoding RGP Variants with Amino Acid Substitutions at Position 38-The variants used in this study were derived from RGP. Wild type RGP is a 505-amino acid type 1 transmembrane protein with a 439-amino acid extracellular domain and a 44-amino acid cytoplasmic tail (Fig. 1, RGP(WT)) (32). The extracellular domain has three sequons for N-linked glycosylation at Asn 37 (sequon 1), Asn 247 (sequon 2), and Asn 319 (sequon 3) (32). To study the core glycosylation of individual sequons in RGP, sequons 2 and 3 in RGP(WT) were deleted by site-directed mutagenesis by substituting Thr at each Asn-X-Thr sequon with Ala (8). The resulting variant, RGP(1--), contains a single glycosylation sequon with the sequence Asn 37 -Leu 38 -Ser 39 (Fig. 1). Sequon 1 was selected for these studies since core glycosylation is normally inefficient at this site (8,29,30). This is advantageous because the effects of amino acid substitutions near Asn 37 can be detected whether they increase or decrease CGE.
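The sequon definition used throughout (Asn-X-Ser/Thr, where X is any amino acid except Pro) translates directly into a pattern search over a one-letter protein sequence. The Python sketch below is a minimal illustration; the peptide in the usage example is invented for demonstration and is not the RGP sequence, and the function name is an assumption.

```python
import re

# N, then any residue except P, then S or T. The lookahead makes the search
# report overlapping sequons (for example N-N-T followed by N-T-S).
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> list:
    """Return (1-based Asn position, tripeptide) for each candidate sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    demo_peptide = "MANLSGNPTANATNNTS"  # hypothetical sequence, not RGP
    for position, tripeptide in find_sequons(demo_peptide):
        print(position, tripeptide)
```

A match only marks a candidate site. As the results in this report show, whether a sequon is actually core-glycosylated, and how efficiently, depends on the X residue and on other features of the protein.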
To simplify construction of RGP variants with amino acid substitutions at the X position of sequon 1, pRGP(1--) was further modified to generate a cloning vector for cassette mutagenesis. This involved the introduction of unique EcoRV and SacI restriction sites on either side of sequon 1 (Fig. 2 and "Materials and Methods"). An oligonucleotide cassette mutagenesis approach was used to generate a set of plasmids encoding variants of RGP(1--) with each of the 20 common amino acids at the X position of sequon 1 (collectively referred to as RGP(1--)X38 variants) (Fig. 1 and "Materials and Methods"). A variant with Leu at position 38 (corresponding to the sequence normally present in RGP) was among the variants isolated using that approach. DNA sequencing was performed to confirm that each RGP(1--)X38 plasmid encoded a protein identical to RGP(1--) except for the amino acid at position 38.
Expression of RGP Variants in a Cell-free System-Each RGP variant was expressed in a rabbit reticulocyte cell-free translation system supplemented with [35S]methionine and canine pancreas microsomes (33–35). The microsomes cleave the signal sequence of these proteins and add core oligosaccharides (8,30). This cell-free system provides a highly reproducible method for quantifying the co-translational CGE of specific sequons in recombinant proteins (30). Analysis of core glycosylation in this system is simpler than in intact cells because oligosaccharide processing is limited and protein variants can be analyzed without immunoprecipitation. Also, unlike analysis in intact cells, alterations in protein stability or expression resulting from amino acid substitutions are uncommon. Our previous studies have demonstrated that the core-glycosylation efficiencies of RGP variants in this system are similar to those observed when the same proteins are expressed in transfected Chinese hamster ovary cells (8,29).
RGP variants with each of the 20 common amino acids at the X position of sequon 1 were translated in parallel in the cell-free system described. Translation of the variants in the absence of microsomes confirmed that the amino acid substitutions at the X position did not alter the electrophoretic migration of the proteins (data not shown). The variants were then translated in the presence of microsomes to examine the effect of the X amino acid on core glycosylation. The amount of microsomes added to each reaction was optimized to maximize incorporation of RGP into microsomes while maintaining adequate translational activity. Under these conditions each translation reaction contains a small amount of protein that is not targeted to microsomes (8,30). These untargeted proteins are not glycosylated, retain the 19-amino acid N-terminal signal sequence, and migrate between the nonglycosylated and glycosylated forms of RGP synthesized on microsomes in our gel system (8,30). Because these untargeted proteins can interfere with the quantification of CGE, they were removed from translation reactions by proteinase K digestion prior to gel analysis (36). The extracellular domain of RGP variants is translocated into the microsomal lumen during protein synthesis where it is protected from proteinase K digestion. In contrast, the 44-amino acid cytoplasmic tail (Fig. 1) remains outside of the microsome and is removed by this treatment. Removal of the cytoplasmic tail produces a small shift in the electrophoretic mobility of RGP proteins (data not shown) but does not interfere with the quantification of CGE. Following proteinase K treatment, radiolabeled translation products were analyzed directly (without immunoprecipitation) by gel electrophoresis and autoradiography. A gel autoradiograph showing the translation products of all 20 RGP variants is shown in Fig. 3.
The positions of the nonglycosylated protein (N) and the protein glycosylated with a single core oligosaccharide (G) are shown. The total amount of protein produced in each translation can vary from tube to tube reflecting differences in the amount of RNA in each sample. For this reason glycosylation efficiency is determined by comparing the amounts of glycosylated and nonglycosylated protein produced in a single reaction for each variant.
To quantify the CGE at the sequon in each RGP variant, the variants were expressed in the cell-free system in three independent experiments and autoradiographs from each experiment were analyzed by densitometric scanning. The densities of bands representing glycosylated (G) and nonglycosylated (N) proteins were quantified for each variant, and the CGE was calculated as follows: CGE = G/(N + G) × 100% (30). The mean CGE ± 1 S.D. was then determined for each variant (Fig. 4). This analysis revealed that the CGE observed for each RGP variant was highly reproducible in this system.
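The calculation described above can be sketched in a few lines of Python. The band densities in the usage example are hypothetical placeholders, not data from this study, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form of S.D. was computed.

```python
from statistics import mean, stdev


def cge(glycosylated: float, nonglycosylated: float) -> float:
    """Core-glycosylation efficiency in percent: G / (N + G) x 100."""
    total = glycosylated + nonglycosylated
    if total <= 0:
        raise ValueError("band densities must sum to a positive value")
    return 100.0 * glycosylated / total


def summarize(replicates):
    """Mean and sample S.D. of CGE over independent experiments.

    `replicates` is a sequence of (G, N) density pairs, one per experiment.
    """
    values = [cge(g, n) for g, n in replicates]
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Hypothetical (G, N) densities from three experiments for one variant.
    densities = [(30.0, 70.0), (35.0, 65.0), (34.0, 66.0)]
    average, sd = summarize(densities)
    print(f"mean CGE = {average:.1f}% +/- {sd:.1f}%")
```

Because CGE is a ratio of two bands in the same lane, it is unaffected by tube-to-tube differences in total protein yield, which is why the glycosylated and nonglycosylated forms are compared within a single reaction.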
The experiments presented demonstrate that the amino acid at the X position of an Asn-X-Ser sequon can have a profound effect on CGE. These studies confirm that the presence of Pro at the X position completely blocks glycosylation (15,20,27,37). Also, consistent with our findings from earlier studies (8,29,30), these data demonstrate that the sequon Asn 37 -Leu 38 -Ser 39 is glycosylated at an intermediate level (mean CGE = 43%). Remarkably, we find that substitution of Leu 38 with Trp, Asp, or Glu dramatically reduces the efficiency of core glycosylation (mean CGE = 5, 19, and 24%, respectively), whereas substitution of Leu 38 with other amino acids increases CGE to varying degrees (mean CGE ranges from Phe = 70% to Ser = 97%). These results provide the first direct demonstration that amino acids at the X position of an Asn-X-Ser/Thr sequon can influence the efficiency of co-translational core glycosylation.

DISCUSSION

This report extends previous studies by providing the first comprehensive direct analysis of the impact of the X amino acids on CGE. We demonstrate that the CGE at an Asn-X-Ser sequon in RGP ranges from no glycosylation to full glycosylation, depending on which amino acid is present at the X position. This demonstrates that the X amino acid is an important determinant of CGE.
Because the structure and enzymatic mechanism of oligosaccharyltransferase are not well characterized, currently it is not possible to determine the mechanism by which individual amino acids influence core glycosylation. Several studies suggest that the spatial relationship of the Asn and Ser/Thr residues in a sequon may be critical for oligosaccharide transfer (37–43). Large hydrophobic amino acids (e.g. Trp, Leu, Phe, and Tyr) may inhibit core glycosylation by producing an unfavorable local protein conformation. In contrast, Gly, which is small and does not constrain protein conformation, is associated with efficient core glycosylation. Other factors also appear to be important. The negatively charged amino acids (Asp and Glu) inhibit glycosylation, whereas the positively charged amino acids (Lys, Arg, and His) are favorable. The charge of the X amino acid may influence the ability of oligosaccharyltransferase to bind simultaneously to the sequon and the negatively charged dolichol-PP-oligosaccharide precursor (41,44). Interestingly, the X amino acids with hydroxy groups (Ser and Thr) and Cys are associated with highly efficient core glycosylation, whereas those with amide groups (Asn and Gln) are associated with suboptimal core glycosylation. Further characterization of oligosaccharyltransferase may help clarify the role that individual amino acids play in oligosaccharide addition.

FIG. 4. CGE of RGP variants with amino acid substitutions at the X position of sequon 1. The 20 RGP variants generated by cassette mutagenesis were analyzed in the cell-free system as described for Fig. 3 in three independent experiments. Gel autoradiographs from each experiment were exposed in the linear range and analyzed by densitometric scanning. The CGE of each variant was calculated as described in the text for each experiment, and the mean CGE ± 1 S.D. from the three experiments was determined (shown).
The general nature of these findings is supported by studies comparing the X amino acid in glycosylated and nonglycosylated sequons in native glycoproteins. Those studies reveal that Cys, Trp (15), Asp (19), and Glu (27) are uncommon at the X position in core-glycosylated sequons. Studies of synthetic peptides in membrane preparations also find an inhibitory effect of Asp at the X position (41,44). The current report provides direct confirmation that Trp, Asp, and Glu at the X position inhibit core glycosylation. Interestingly, the sequon Asn 37 -Cys 38 -Ser 39 in RGP is fully core-glycosylated. The lack of core glycosylation at Asn-Cys-Ser/Thr sequons in other proteins may reflect the potential of certain Cys residues to participate in disulfide bonding (40,45).
It is important to note that factors other than the X amino acid also influence core glycosylation. For example, the presence of Pro immediately following a sequon can inhibit core glycosylation (20,37), and the presence of Thr rather than Ser at the hydroxy position favors efficient glycosylation (15,29,42). Our previous studies demonstrate that the inhibitory effect of Leu in the sequon Asn 37 -Leu 38 -Ser 39 in RGP can be overcome by replacing Ser 39 with Thr (29). Studies are currently under way to compare the impact of other X amino acids in Asn-X-Thr versus Asn-X-Ser sequons. Core glycosylation can also be influenced by factors that influence the accessibility of a sequon to oligosaccharyltransferase, such as the position of the sequon in a protein (14,20,30,46,47) and the folding of the nascent protein chain (45,48). Core glycosylation is clearly a complex process influenced by a variety of factors. Further characterization of the protein signals that regulate core glycosylation will enhance our understanding of glycoprotein expression and facilitate the design of novel recombinant glycoproteins for research and clinical applications.