A Necessary and Sufficient Determinant for Protein-selective Glycosylation in Vivo

A limited number of glycoproteins including luteinizing hormone and carbonic anhydrase-VI (CA6) bear N-linked oligosaccharides that are modified with β1,4-linked N-acetylgalactosamine (GalNAc). The selective addition of GalNAc to these glycoproteins requires that the β1,4-N-acetylgalactosaminyltransferase (βGT) recognize both the oligosaccharide acceptor and a peptide recognition determinant on the substrate glycoprotein. We report here that two recently cloned βGTs, βGT3 and βGT4, that are able to transfer GalNAc to GlcNAc in β1,4-linkage display the necessary glycoprotein specificity in vivo. Both βGTs transfer GalNAc to N-linked oligosaccharides on the luteinizing hormone α subunit and CA6 but not to those on transferrin (Trf). A single peptide recognition determinant encoded in the carboxyl-terminal 19-amino acid sequence of bovine CA6 mediates transfer of GalNAc to each of its two N-linked oligosaccharides. The addition of this 19-amino acid sequence to the carboxyl terminus of Trf confers full acceptor activity onto Trf for both βGT3 and βGT4 in vivo. The complete 19-amino acid sequence is required for optimal GalNAc addition in vivo, indicating that the peptide sequence is both necessary and sufficient for recognition by βGT3 and βGT4.

Unique carbohydrate structures found on a number of glycoproteins are believed to contribute to biological functions ranging from transport of lysosomal enzymes to lysosomes (1,2), to regulation of circulatory half-life of hormones (3,4), to cellular recognition (5,6). The addition of unique carbohydrate structures to select glycoproteins requires a "recognition determinant," a feature encoded in the peptide portion of these glycoproteins, to be recognized by one or more of the transferases responsible for their synthesis. In some cases key features of the peptide for such recognition determinants have been identified (7-11). However, understanding the molecular basis for recognition ultimately requires that a recognition determinant that does not itself comprise the site of modification can be added to a structurally unrelated glycoprotein and confer selective modification of this unrelated glycoprotein.
In in vitro studies we demonstrated that one or more βGTs present in the pituitary (23) and other tissues (24) are protein-specific, adding GalNAc to oligosaccharide acceptors on proteins that contain a peptide recognition determinant. The catalytic efficiency for this addition is 500-fold higher than for GalNAc transfer to the same oligosaccharide acceptors on proteins lacking the determinant in their peptide sequence (23,25). We reported that the basic amino acids within the sequence PLRSKK that is present in the pituitary glycoprotein hormone α subunit are critical elements of the peptide recognition determinant conferring transfer of GalNAc to N-linked oligosaccharides in vitro (26,27). Two recently cloned and closely related βGTs, βGT3 (28,29) and βGT4 (30), transfer GalNAc in β1,4-linkage to terminal GlcNAc. They are expressed in tissues that have previously been determined to express protein-specific GalNAc transferase activity, but they have not yet themselves been shown to be protein-specific either in vitro or in vivo.
We now report the development of unique chimeric glycoprotein constructs that allow us to examine the efficiency of GalNAc addition to N-linked carbohydrates on these glycoproteins in the complex milieu of the Golgi. The chimeric proteins consist of either Renilla (31-33) or Gaussia (34,35) luciferase and glycoproteins that do or do not bear N-linked oligosaccharides modified with β1,4-linked GalNAc. By expressing these chimeric glycoproteins in cells that express either βGT3 or βGT4, we were able to quantitatively compare their extent of modification with β1,4-linked GalNAc in vivo. Using this approach we show that βGT3 and βGT4 are indeed protein-specific, recognizing a peptide determinant in the protein substrate to allow efficient and protein-selective transfer of GalNAc to their oligosaccharides. We have identified a 19-amino acid sequence that is both necessary and sufficient to mediate protein-specific GalNAc addition by βGT3 and βGT4. The addition of this sequence to a glycoprotein that is not recognized by either βGT3 or βGT4 converts the glycoprotein into one that is selectively modified with β1,4-linked GalNAc in vivo.

MATERIALS AND METHODS
Luciferase Constructs-Construction of the Renilla luciferase (RLuc) (32,33) chimeras α-RLuc, transferrin-RLuc (Trf-RLuc), RLuc-α, and RLuc-CA6 and the Gaussia luciferase (GLuc) (34,35) chimeras GLuc-α, GLuc-Trf, and GLuc-CA6 will be described in detail elsewhere. Each expression plasmid was made from pcDNA3.1 with a cytomegalovirus promoter to drive expression (Invitrogen). The chimeric glycoproteins α-RLuc and Trf-RLuc consist of the glycoprotein hormone α subunit or Trf followed by RLuc epitope-tagged at its carboxyl terminus with V5His. RLuc-CA6 and RLuc-α were constructed using the pSec-Tag expression plasmid (Invitrogen) and consist of the Igκ leader sequence followed by RLuc and CA6 or α epitope-tagged at their carboxyl termini with MycHis. pCMV-GLuc was obtained from New England Biolabs. The GLuc constructs in each case consist of GLuc followed by α, Trf, or CA6 and the epitope MycHis. Because GLuc is a secreted protein, no additional leader is required, and the leader sequences of α, Trf, and CA6 were omitted from the constructs. All of the cDNAs were amplified using Klentaq Long and Accurate DNA polymerase (KTLA polymerase) (36,37). The sequence encoding additional amino acids from CA6 was added to Trf using KTLA polymerase and ribocloning as described (38). A schematic of the key luciferase chimeras generated, including the locations of the glycosylation sites and the amino acid sequences that constitute the recognition sequences, is shown in Fig. 1.
Biotinylated Wisteria floribunda Agglutinin (WFA)-WFA (1 mg) (Sigma) was dissolved in 1 ml of phosphate-buffered saline (PBS) and dialyzed overnight against 0.1 M Na2CO3 at 4°C. Six hundred μg of aminohexanoyl-biotin-N-hydroxysuccinimide dissolved in 120 μl of dimethyl sulfoxide was added to the dialyzed WFA. The reaction was dialyzed against 0.1 M Na2CO3 at 4°C overnight and then against PBS. The biotinylated WFA was stored in aliquots at −20°C until use.
Assay for GalNAc Addition to Luciferase Chimeras-Microlite-2 96-well plates were coated with streptavidin (Roche Applied Science) by adding 100 μl of 25 mM sodium carbonate buffer, pH 8.5, containing 1 μg of streptavidin and incubating at 37°C for 3 h. The plates were then washed six times with PBS containing 0.1% bovine serum albumin (BSA) using a Bio-Rad Immunowasher. Biotinylated WFA (0.25 μg in 100 μl of PBS/well) was incubated with the immobilized streptavidin overnight at 4°C. Each well was washed six times with 300 μl of cold PBS, 0.1% BSA and then blocked by incubating for 30 min at 25°C with 300 μl of PBS, 5% BSA. After washing six times with PBS, 0.1% BSA, aliquots of luciferase chimeras containing 50,000 light units (LU) of luciferase activity in 100 μl of PBS, 0.1% BSA were incubated with the biotinylated WFA-coated wells for 4 h at 4°C in the presence or absence of 50 mM GalNAc. The wells were washed six times with PBS, 0.1% BSA, and 20 μl of PBS was added to each well. The amount of bound luciferase activity was measured using a Wallac Victor2 luminometer by injecting 50 μl of luciferase assay buffer containing freshly diluted coelenterazine (New England Biolabs) into each individual well and determining the LU produced over a period of 10 s. The GalNAc-specific LU bound were calculated by subtracting the LU bound in the presence of 50 mM GalNAc from the LU bound in the absence of GalNAc. The background in each case was less than 250 LU and was subtracted from the analyses shown.

Chimeric glycoproteins consisting of α followed by RLuc (α-RLuc), or GLuc followed by the glycoprotein hormone α subunit (GLuc-α), carbonic anhydrase-6 (GLuc-CA6), or transferrin (GLuc-Trf), were prepared in the expression plasmid pcDNA3.1 (Invitrogen) using ribocloning. The chimeric glycoproteins were epitope-tagged with V5His at their carboxyl termini.
The chimeric glycoprotein constructs, including the location of the N-glycosylation sites and the proposed recognition determinants, are illustrated schematically. The sequence of the proposed recognition determinant for each construct and the variants that were generated are shown below the schematic. In the case of the α subunit, the PLRSKK sequence was mutated to PLESEE. The amino acid sequence of the carboxyl-terminal 19 amino acids of CA6(Wt) is shown. The 6 amino acids that were deleted in CA6(Mu1) are indicated by the strike through. The 19 amino acids from the carboxyl terminus of CA6(Wt) were added to the carboxyl terminus of Trf using ribocloning to generate the Trf-CA6 chimeric glycoproteins shown. Residues that were deleted are indicated by the strike through for each construct.
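The arithmetic behind the GalNAc-specific signal in the WFA capture assay can be sketched in a few lines of Python. This is only an illustration of the subtraction described in the assay section: the class and function names are hypothetical, the readings are invented to make the arithmetic easy to follow, and the additional background correction (less than 250 LU) is not modeled.

```python
from dataclasses import dataclass

INPUT_LU = 50_000  # light units of chimera added per well, as stated in the assay


@dataclass
class WellPair:
    """Bound LU for one chimera, measured without and with 50 mM GalNAc."""
    lu_without_galnac: float
    lu_with_galnac: float


def galnac_specific_lu(pair: WellPair) -> float:
    """LU bound in the absence of GalNAc minus LU bound in its presence."""
    return pair.lu_without_galnac - pair.lu_with_galnac


def fraction_of_input(pair: WellPair, input_lu: float = INPUT_LU) -> float:
    """GalNAc-specific LU expressed as a fraction of the LU added to the well."""
    return galnac_specific_lu(pair) / input_lu


def fold_difference(a: WellPair, b: WellPair) -> float:
    """Ratio of GalNAc-specific LU for two chimeras assayed at equal input."""
    return galnac_specific_lu(a) / galnac_specific_lu(b)


# Hypothetical readings, chosen only to illustrate the calculation.
chimera_a = WellPair(lu_without_galnac=6_200, lu_with_galnac=200)
chimera_b = WellPair(lu_without_galnac=1_200, lu_with_galnac=200)

print(galnac_specific_lu(chimera_a))          # GalNAc-specific LU for chimera A
print(fraction_of_input(chimera_a))           # fraction of the 50,000 LU input
print(fold_difference(chimera_a, chimera_b))  # A relative to B
```

Because every chimera is added at the same input, the GalNAc-specific LU values can be compared directly between constructs.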

RESULTS
Protein-selective Addition of GalNAc to CA6-Bovine CA6, a secreted form of carbonic anhydrase, bears N-linked oligosaccharides that are modified with β1,4-linked GalNAc when it is expressed by salivary or lachrymal glands (16). We have previously shown that HEK 293T cells endogenously express protein-specific βGTs that are able to modify the N-linked oligosaccharides on LH and other glycoproteins with β1,4-linked GalNAc (20,24,39-41). When native CA6(Wt) epitope-tagged with V5His at its carboxyl terminus is expressed in HEK 293T cells, at least 51% of the secreted CA6(Wt) is bound by immobilized WFA, a lectin that binds oligosaccharides bearing terminal β1,4-linked GalNAc (42,43) (Fig. 2). Even though the β1,4-linked GalNAc added to N-linked oligosaccharides on glycoproteins expressed by HEK 293T cells can be further modified with either SO4 or α2,6-linked sialic acid, little CA6(Wt) remains in the unbound fraction. The major fraction of bound CA6(Wt) is selectively eluted with GalNAc (Fig. 2, lanes E1-E4). The CA6(Wt) eluted by warming in SDS-PAGE loading buffer reflects the inefficiency of elution with GalNAc because CA6(Wt) expressed in CHO cells does not contain any GalNAc and is not retained by immobilized WFA (not shown). Thus, the N-linked oligosaccharides on CA6(Wt) expressed in HEK 293T cells are modified with β1,4-linked GalNAc, suggesting that, like the glycoprotein hormone α subunit, CA6(Wt) has a recognition determinant that results in its selective modification with GalNAc when expressed by HEK 293T cells.
The 19-amino acid sequence located at the carboxyl terminus of bovine CA6 contains 8 basic amino acids (see CA6(Wt) in Fig. 1). Modeling programs such as that of Chou and Fasman (44) predict that this sequence is likely to form an α helix. This region of CA6 may therefore resemble the PLRSKK sequence of the glycoprotein hormone α subunit that forms an α helix in the dimeric form of the hormone (45,46) and is essential for recognition by pituitary βGT in in vitro assays (26). The sequence KRKKEK was therefore deleted from the carboxyl-terminal 19-amino acid sequence of CA6 (see CA6(Mu1) in Fig. 1). In contrast to CA6(Wt), less than 13% of the CA6(Mu1) produced by HEK 293T cells is bound by immobilized WFA and eluted from the WFA by GalNAc. Furthermore, little CA6(Mu1) is subsequently eluted by SDS-PAGE loading buffer (see Fig. 2). CA6 therefore contains a recognition determinant that is utilized by the βGT(s) expressed by HEK 293T cells, and the KRKKEK sequence is a critical element of this recognition sequence.
Luciferase Chimeras Replicate the Properties of Native Glycoproteins in Vivo-Western blot analysis for quantitation of the amount of a glycoprotein that is modified with terminal β1,4-linked GalNAc when expressed in different cells is cumbersome. We therefore prepared chimeric glycoproteins that consist of either RLuc or GLuc and the glycoprotein of interest (Fig. 1). The RLuc chimeras consist of the glycoprotein followed by RLuc at the carboxyl terminus followed by the V5His epitope. α-RLuc, Trf-RLuc, and CA6-RLuc are efficiently secreted into the culture medium following transfection of cultured HEK 293T and CHO/Flp-In cells (not shown). The extent of GalNAc addition to each of these luciferase chimeras can be compared by immobilizing biotinylated WFA onto streptavidin-coated 96-well plates, capturing GalNAc-containing chimeras onto the immobilized WFA, removing unbound chimera, and quantitating the amount of luciferase activity that has been bound using coelenterazine.
When α-RLuc and Trf-RLuc were transfected into CHO/Flp-In cells expressing a single copy of either βGT3 (Fig. 3A) or βGT4 (Fig. 3B), the amount of luciferase activity bound, expressed as LU bound per 50,000 LU of input, was 6-fold greater for α-RLuc than Trf-RLuc for both βGT3/CHO and βGT4/CHO cells.

The KRKKEK sequence near the carboxyl terminus of CA6 is required for optimal GalNAc addition in vivo. CA6(Wt) and CA6(Mu1) epitope-tagged at their carboxyl termini with V5His were expressed in HEK 293T cells. Culture medium containing equal amounts of CA6(Wt) and CA6(Mu1) (M) was incubated with WFA immobilized on agarose. After removing medium containing unbound CA6 (UB), the WFA-agarose was washed with equal volumes of buffer (W1). Bound CA6 was eluted with four successive incubations with buffer containing 50 mM GalNAc (E1-E4). Any CA6 that remained bound to the WFA-agarose was then recovered by heating the WFA-agarose in SDS-PAGE loading buffer (E5). The amount of CA6(Wt) and CA6(Mu1) in equal aliquots of M and UB fractions and larger aliquots of W1 and E1-E5 was determined following SDS-PAGE using Pre-Cast NuPAGE® gels (Invitrogen) and electrophoretic transfer to nitrocellulose. The amount of epitope-tagged protein present was determined by Western blot analysis using mouse anti-V5 antibody (Invitrogen) and Alexa Fluor 680-labeled goat anti-mouse IgG (Molecular Probes). IR-induced fluorescence was used for detection. CA6(Wt) synthesized by CHO cells that do not express βGT activity is not bound by WFA-agarose (not shown), indicating that binding is GalNAc-dependent and that the inability to elute all of the bound CA6(Wt) with GalNAc reflects the inefficiency of elution rather than nonspecific binding.

Mutation of the glycosylation site (NVT to QVT) immediately adjacent to the PLRSKK sequence that we previously determined is essential for recognition using in vitro assays (26) completely abolishes GalNAc addition to α(QVT)-RLuc. In contrast, mutation of the glycosylation site more distant from the PLRSKK sequence (NHT to QHT) reduces but does not abolish GalNAc addition to α(QHT)-RLuc. Thus, α-RLuc is efficiently modified by both βGT3 and βGT4 in vivo, whereas Trf-RLuc is not efficiently modified by either βGT. The properties of α-RLuc and Trf-RLuc agree with our in vitro studies showing the free α subunit contains a peptide sequence that is recognized by βGT and as a result is modified with a catalytic efficiency that is 500-fold greater than for glycoproteins that do not contain a recognition determinant for the βGT (23,25-27). In addition, the oligosaccharide added to NVT appears to be the predominant target for GalNAc addition in the α-RLuc chimera.
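The effect of the Asn-to-Gln substitutions on the glycosylation sites can be illustrated with a short Python sketch. It relies on the general N-X-S/T sequon rule for N-linked glycosylation (X being any residue except proline), which is background knowledge rather than a finding of this study. The function names and the toy sequence are hypothetical.

```python
import re

# N-X-S/T sequon, where X is any residue except proline. This is the general
# rule for N-linked glycosylation sites, assumed here as background.
SEQUON = re.compile(r"N[^P][ST]")


def is_sequon(tripeptide: str) -> bool:
    """True if the three-residue string is a potential N-glycosylation sequon."""
    return SEQUON.fullmatch(tripeptide) is not None


def sequon_positions(sequence: str) -> list[int]:
    """0-based positions of the Asn in every sequon, including overlapping ones."""
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", sequence)]


# The two alpha subunit sites named in the text and their Gln mutants.
for site in ("NVT", "QVT", "NHT", "QHT"):
    print(site, is_sequon(site))

# Toy sequence (hypothetical, not taken from any protein in the study).
print(sequon_positions("AANVTAANPTQHT"))
```

Replacing Asn with Gln removes the sequon while keeping a chemically similar side chain at that position, so each mutant lacks the oligosaccharide at one site only.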
The CA6-RLuc chimera expressed by either HEK 293T cells, βGT3/CHO cells, or βGT4/CHO cells is not bound by immobilized WFA, indicating GalNAc is not added to this construct (not shown). Because the KRKKEK sequence that is essential for efficient GalNAc addition to CA6(Wt) is located at the carboxyl terminus, we prepared chimeras in which the RLuc preceded the α and CA6 sequences. RLuc-α, like α-RLuc, is secreted into the medium and is modified with GalNAc when expressed in either βGT3/CHO or βGT4/CHO cells, indicating the location of the RLuc is not critical for recognition of the α subunit by either βGT3 or βGT4 (compare Fig. 3, A and B, with Fig. 4, A and B). In contrast, RLuc-CA6(Wt), unlike CA6(Wt)-RLuc, is efficiently modified with GalNAc when it is expressed in either βGT3/CHO or βGT4/CHO cells (Fig. 4, A and B). Therefore, the presence of the MycHis epitope at the carboxyl terminus of CA6(Wt) does not interfere with recognition by either βGT3 or βGT4, whereas the presence of the much larger luciferase sequence at the carboxyl terminus of CA6 is sufficient to prevent recognition. As was seen with CA6(Wt), deletion of the KRKKEK sequence from RLuc-CA6(Wt) to generate RLuc-CA6(Mu1) markedly reduces GalNAc addition by βGT3 (Fig. 4A) and by βGT4 (Fig. 4B). Mutation of each individual Asn glycosylation site alone reduces but does not abolish GalNAc addition to either RLuc-CA6(QLT) or RLuc-CA6(QET). Mutation of both sites completely abolishes GalNAc addition to RLuc-CA6(QLT/QET). The results obtained with RLuc-CA6 indicate that βGT3 and βGT4 utilize the same recognition determinant in vivo and are able to transfer GalNAc to Asn-linked oligosaccharides at two different glycosylation sites on CA6.

DISCUSSION
Our studies using RLuc and GLuc glycoprotein chimeras establish that βGT3 and βGT4 are protein-selective glycosyltransferases in vivo. Both βGTs selectively transfer GalNAc to N-linked oligosaccharides on the pituitary glycoprotein hormone α subunit and on CA6 but not to the identical oligosaccharide acceptors on glycoproteins such as Trf. Adding the 19-amino acid sequence from the carboxyl terminus of CA6 to the carboxyl terminus of Trf converts GLuc-Trf into a glycoprotein that is selectively modified by βGT3 and by βGT4. This outcome is particularly remarkable because Trf has no structural relationship to either the α subunit or to CA6. The crystal structure of human serum transferrin has recently been solved (47). The two N-linked glycosylation sites at Asn 413 and Asn 611 are located on adjacent loops of peptide and are both in close proximity to the carboxyl terminus of Trf. The addition of the recognition sequence to Trf may serve to mediate transfer of GalNAc to both N-linked oligosaccharides as it does in CA6. Because this 19-amino acid sequence is sufficient to confer recognition onto RLuc-Trf in vivo by both βGT3 and βGT4, it is likely that the same key residues serve to mediate recognition by both βGTs. Furthermore, because peptide recognition and GalNAc transfer to terminal GlcNAc represent distinct interactions, the addition of this sequence to virtually any glycoprotein will likely confer recognition by βGT3 and βGT4 and selective modification of accessible N-linked oligosaccharides with β1,4-linked GalNAc in vivo.
The N-linked oligosaccharide acceptor that is modified with β1,4-linked GalNAc on glycoproteins such as the glycoprotein hormone LH and CA6 is identical in structure to the N-linked oligosaccharide acceptors on glycoproteins that are not modified with GalNAc but are instead modified with β1,4-linked Gal. This led us to hypothesize the existence of a peptide recognition determinant located on glycoproteins that could be selectively modified with β1,4-linked GalNAc-containing structures. We confirmed this by comparing glycoproteins that do and do not contain a recognition determinant as acceptors for GalNAc addition in in vitro assays using solubilized enzymes. Now we have definitively demonstrated that in vivo the same recognition determinant is utilized by cells expressing βGTs endogenously and that the presence of the recognition determinant is necessary for efficient modification with β1,4-linked GalNAc.
Using in vitro analyses, we identified the basic amino acids in the sequence PLRSKK that is found in the α subunit as being essential for recognition by the βGT activity present in the pituitary. The lack of GalNAc transfer to GLuc-α(PLESEE) by either βGT3 or βGT4 now provides the in vivo demonstration that the basic amino acids in the PLRSKK sequence are indeed critical elements of the recognition determinant. Similarly, the marked reduction in GalNAc transfer by βGT3 and βGT4 to GLuc-CA6(Mu1) that has had the KRKKEK sequence deleted indicates that basic amino acids are critical elements of the recognition determinant in CA6. The results obtained with GLuc-α(PLESEE) and GLuc-CA6(Mu1) show that the basic amino acids within these sequences are necessary for recognition in vivo.
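The change in basic character produced by these mutations can be made concrete with a small Python sketch that counts basic and acidic residues in the three short sequences named above. The residue classes are assumptions made for illustration (Lys and Arg counted as basic, Asp and Glu as acidic, His and the chain termini ignored), so the net value is only a crude charge estimate, and the function names are hypothetical.

```python
BASIC = set("KR")   # lysine, arginine; histidine is ignored by assumption
ACIDIC = set("DE")  # aspartate, glutamate


def count_basic(sequence: str) -> int:
    """Number of Lys/Arg residues in a one-letter amino acid sequence."""
    return sum(residue in BASIC for residue in sequence.upper())


def count_acidic(sequence: str) -> int:
    """Number of Asp/Glu residues in a one-letter amino acid sequence."""
    return sum(residue in ACIDIC for residue in sequence.upper())


def crude_net_charge(sequence: str) -> int:
    """Basic minus acidic residue count; ignores His, termini, and pH effects."""
    return count_basic(sequence) - count_acidic(sequence)


sequences = {
    "alpha subunit, wild type": "PLRSKK",
    "alpha subunit, mutant": "PLESEE",
    "CA6 segment deleted in Mu1": "KRKKEK",
}

for label, seq in sequences.items():
    print(label, seq, count_basic(seq), crude_net_charge(seq))
```

The PLRSKK to PLESEE substitution removes all three basic residues from that segment, and the Mu1 deletion removes five basic residues from the CA6 carboxyl terminus.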
The ability to confer recognition by both βGT3 and βGT4 onto GLuc-Trf by adding the carboxyl-terminal 19 amino acids from CA6 indicates that the key elements of this sequence that are recognized by these closely related transferases are similar if not identical. The fact that a single recognition determinant can serve to mediate GalNAc addition to two distinct N-linked oligosaccharides on CA6 and can confer recognition onto an unrelated glycoprotein, i.e. Trf, indicates that the peptide recognition determinant does not interact directly with the oligosaccharide acceptor. Rather, the peptide recognition determinant and the oligosaccharide acceptor interact with βGT3 and βGT4 independently.
βGT3 and βGT4 are large glycosyltransferases consisting of 987 and 1035 amino acids, respectively. The carboxyl-terminal regions of βGT3 and βGT4, consisting of 226 and 231 amino acids, respectively, are 68% identical and contain sequences that are characteristic of β1,4-glycosyltransferases (28,30). This region is presumed to encode the actual catalytic activity; however, neither the region that mediates peptide recognition nor the function of the other regions of these transferases is known. Because the 19-amino acid sequence from the carboxyl terminus of CA6 is sufficient to mediate recognition in vivo and in vitro, it is now possible to devise strategies that will allow us to identify the key features of this determinant that are required for optimal recognition. It should also be possible to locate the regions of βGT3 and βGT4 that bind the recognition determinant.
The synthesis of glycoproteins bearing N-linked structures containing β1,4-linked GalNAc is a highly regulated process in vivo, requiring the expression of both the appropriate transferases and glycoproteins that have a recognition determinant. In the case of the glycoprotein hormone LH, the structure of the N-linked sugars plays a critical role in determining in vivo clearance rates and potency (14,15). Structures with β-linked GalNAc are also present on other glycoproteins such as the low density lipoprotein receptor homolog SorLA/LR11 (20) that has been implicated as a risk factor for development of Alzheimer disease (51) and tenascin-R, an extracellular matrix component in the central nervous system (22). The highly regulated and selective addition of GalNAc generates unique structures that may play a number of different roles in vivo. Defining the key features of this 19-amino acid recognition determinant will allow us to identify additional glycoproteins that may also bear this unique modification. The addition of β1,4-linked GalNAc is among the best characterized forms of protein-selective carbohydrate modification. Understanding how the selective addition of GalNAc is determined will provide a useful paradigm for understanding how regulation of the synthesis of specific carbohydrate structures contributes to biologic function in vivo.