Site-specific Core 1 O-Glycosylation Pattern of the Porcine Submaxillary Gland Mucin Tandem Repeat

The sequence-specific O-linked core 1 ([R1,R2]-β-Gal(1–3)-α-GalNAc-O-Ser/Thr) glycosylation pattern has been quantitatively determined for 30 of the 31 Ser/Thr residues in the 81-residue porcine submaxillary gland mucin tandem repeat. This was achieved by Edman amino acid sequencing of the isolated tandem repeat after selective removal of non-C3-substituted, peptide-linked GalNAc residues by periodate oxidation and subsequent trimming of the remaining oligosaccharides to peptide-linked GalNAc residues by mild trifluoromethanesulfonic acid/anisole treatment. The sequencing reveals 61% (range, 12–95%) of the peptide α-N-acetylgalactosamine (GalNAc) residues to be substituted by core 1 chains, a value in agreement with the carbon-13 NMR analysis of the native mucin. Residues with the lowest C3 substitution were typically clustered in regions of sequence with the highest densities of (glycosylated) serine or threonine. This suggests that the porcine β3-Gal, core 1, transferase is sensitive to peptide sequence and/or neighboring core GalNAc glycosylation in vivo, in keeping with earlier in vitro enzymatic glycosylation studies (Granovsky, M., Blielfeldt, T., Peters, S., Paulsen, H., Meldal, M., Brockhausen, J., and Brockhausen, I. (1994) Eur. J. Biochem. 221, 1039–1046). These results demonstrate that the O-glycan structures in mucin domains are not necessarily uniformly distributed along the polypeptide core and that their lengths can be modulated by peptide sequence. The data further suggest that hydroxyamino acid spacing may contribute to the regulation of glycan length, thereby providing a mechanism for maintaining an optimally expanded, protease-resistant mucin conformation.

Protein O-glycosylation is a common post-translational modification of a wide range of secreted and membrane-associated proteins. Many of these glycoproteins contain highly O-glycosylated, so-called mucin-like, domains that are important to their function (1)(2)(3)(4). These domains contain high numbers (>30%) of Ser and Thr, which are extensively O-glycosylated by oligosaccharide side chains attached via α-N-acetylgalactosamine (GalNAc).¹ These domains are typically 50% or more carbohydrate by weight and commonly contain tandemly repeated amino acid sequences. Little is known of the structural and dynamic effects of multiple O-glycosylation, except that these regions are resistant to proteases and possess extended conformations (4–6), making them ideal structural motifs for modifying the physical-chemical and biological properties of proteins.
Mucin glycoproteins are a class of high molecular weight, highly O-glycosylated glycoproteins secreted by the epithelium, designed to protect and lubricate the cell surface from biological, chemical, and mechanical insult. Studies on the ovine and porcine submaxillary gland mucins have shown that their O-glycan side chains (which make up ~60–70% of their masses) are responsible for their expanded solution structure and resistance to proteases, and hence are a key component governing the protection-rendering properties of mucins (4,5). Cell surface mucins and other cell surface glycoproteins containing mucin-like domains play additional important biological roles through their involvement in, for example, the immune response, inflammation, cell adhesion, and tumorigenesis (1)(2)(3)(4)(7)(8)(9)(10). Mucin-like domains also serve structural roles in a number of secreted globular proteins (11,12). In many cases the mucin-like domains in these glycoproteins are important to their biological function (13)(14)(15)(16)(17)(18).
Very little is known of the site-specific glycosylation patterns of the glycosylated domains in mucins or other O-linked glycoproteins, nor how alterations in glycosylation pattern may affect the properties of a glycoprotein. The absence of specific O-glycan structural information is principally due to the lack of suitable analytical methods applicable to the characterization of such heavily O-glycosylated domains. Major practical difficulties arise from the resistance of these domains to proteolytic cleavage and from their inherent oligosaccharide structural heterogeneity, which together make it difficult to obtain suitable glycopeptides for structural analysis. Although Edman sequencing of glycoproteins containing intact O-linked glycans has been reported, the inherent oligosaccharide heterogeneity presents significant problems when attempting to quantitate the glycosylation pattern of a glycoprotein (19,20). Mass spectrometric approaches have also recently been applied to the characterization of short mucin glycopeptides (21)(22)(23). The analysis by mass spectrometry is extremely complex and typically requires oligosaccharide trimming to the peptide GalNAc residue, via chemical or enzymatic approaches (21,38). In addition, mass spectrometry approaches are not easily made quantitative and are unreliable for the quantitative determination of glycosylation patterns.
Recently we reported the nearly quantitative determination of the specific α-GalNAc-Ser/Thr core O-glycosylation pattern of the 31 Ser/Thr residues in the 81-residue glycosylated tandem repeat of porcine submaxillary gland mucin (PSM) (24). Our approach involved the quantitative trimming of the mucin oligosaccharide side chains to the core GalNAc residue by treatment with trifluoromethanesulfonic acid (TFMSA)/anisole at 0°C followed by trypsinolysis and the isolation of an ensemble of tandem repeat glycopeptides. Edman amino acid sequencing was then used to obtain the glycosylation pattern by the quantitation of specific glycosylated and nonglycosylated amino acid derivatives. Practical advantages of this approach are: 1) the reduction of oligosaccharide side chain heterogeneity by the mild TFMSA/anisole treatment; 2) the increased susceptibility of the partially deglycosylated (trimmed) mucin to proteases, thereby allowing the isolation of tandem repeat monomers; and 3) the ability to readily identify and quantify monosaccharide Ser/Thr-GalNAc derivatives by standard Edman amino acid sequencing protocols. The applicability of this approach has been further demonstrated by the recent determination of the core O-glycosylation pattern of the 20-residue tandem repeat of the Muc1 isolated from human milk (21).
A chief drawback of the above approach is the concomitant loss of oligosaccharide side chain structural information, which indeed is the chief reason that the approach works so effectively. We now report on a scheme, using this basic approach, that permits the quantitation of the core 1 ([R]-β-Gal(1-3)-α-GalNAc-O-Ser/Thr) oligosaccharide structures in a peptide sequence-dependent manner. The ability to monitor the GalNAc C3 OH substitution status relies on the ability to selectively remove non-C3-substituted peptide-linked GalNAc residues after periodate oxidation, while leaving C3-substituted GalNAc residues intact (25). In this manner only those peptide-linked GalNAc residues substituted at C3 are retained for amino acid sequencing. We have confirmed the selectivity of this procedure by studies on the antifreeze glycoprotein (AFGP) from Antarctic fish, whose side chains are exclusively β-Gal(1-3)-α-GalNAc-O-Thr (26). With this modification it is now possible to further characterize the O-glycan structures of mucins and other heavily O-glycosylated proteins in a sequence-dependent manner. Having such information will permit the characterization of the peptide specificities of the core 1, β3-galactosyltransferase (UDP-galactose:glycoprotein-N-acetylgalactosamine β3-galactosyltransferase) that catalyzes a primary step in O-glycan biosynthesis. Using this approach we have obtained the sequence-specific core 1 glycosylation pattern of 30 of the 31 Ser/Thr residues in the PSM tandem repeat. This represents the first sequence-specific determination of an oligosaccharide side chain structure of any highly glycosylated mucin-type glycoprotein. Our results indicate that in vivo the β-Gal(1-3) core 1 transferase is sensitive to peptide sequence and/or neighboring core α-GalNAc glycosylation, findings that corroborate previous in vitro enzymatic studies on glycopeptide substrates (27,28).
The complete polypeptide structure of PSM has recently been determined (29). The basic molecule is composed of a very large, central, highly O-glycosylated domain flanked by small Cys-rich globular domains at both the N and C termini. The glycosylated domain is composed of approximately 100 81-residue tandem repeats with the sequence as given in Fig. 1A (30). Each tandem repeat contains 31 Ser/Thr O-glycosylation sites, all of which have been observed to be O-glycosylated, although to different degrees of completion (24). We have previously suggested that the observed differential glycosylation may be partially due to the steric effects of penultimate bulky side chains (24). The oligosaccharide side chain structures described for PSM range from the monosaccharide α-GalNAc-O-Ser/Thr (Tn determinant) to the tetrasaccharide α-GalNAc(1-3)[α-Fuc(1-2)]-β-Gal(1-3)-α-GalNAc-O-Ser/Thr (A blood group determinant) (31). Each of these mono- through tetrasaccharides is potentially glycosylated by an α-NeuNGl residue attached to the peptide-linked GalNAc at C6; thus up to eight possible oligosaccharides can be found in PSM. The complete series of possible structures is given in Fig. 1B. The oligosaccharide side chain composition of native PSM can be readily quantitated by carbon-13 NMR spectroscopy (32). Studies on several preparations of PSM indicate that 30–40% of the PSM side chains consist of the monosaccharide α-GalNAc, while the di-, tri-, and tetrasaccharides each contribute 0–40% of the remainder of the oligosaccharides, depending on the blood group status of each individual pig (32). Forty to fifty percent of these oligosaccharides are substituted by NeuNGl, although the distribution of NeuNGl among the different oligosaccharides is unknown (32).

Materials
AFGP purified from the blood of Dissostichus mawsoni (fractions AFGP 1-5) (33) was a gift of A. DeVries, University of Illinois at Urbana-Champaign, Urbana, IL. Except where noted, chemicals and enzyme reagents were obtained from Boehringer Mannheim, Sigma, or Fisher.
Shogren et al. (34). Mucin was reduced and carboxymethylated by the methods of Gupta and Jentoft (35) and trypsinized as described by Gerken et al. 1997 (24), yielding the PSM glycosylated domain of tandem repeats called TR-PSM. Studies were performed on two pools of A blood group-positive TR-PSM with nearly identical oligosaccharide side chain distributions; see Table IA under "Results." These pools were also utilized for our earlier studies (24).
Oxidation and Elimination of TR-PSM-The selective removal of C3-unsubstituted GalNAc (oligo)saccharide side chains was performed utilizing methods modified from Gerken et al. (25). TR-PSM (10 mg/ml) was made 0.1 M HIO4, pH 4.5, and incubated at 4°C in the dark for 5 h or overnight. Periodate was destroyed by the addition of 0.4 M Na2S2O3, 0.1 M NaI, 0.1 M NaHCO3, and the elimination was begun by immediately adjusting the pH to 10.5 with 1 M NaOH. After 1 h on ice the solution was dialyzed overnight against ~2.5 mM sodium bicarbonate buffer, pH 10.5. After dialysis the remaining oxidized oligosaccharide side chains were reduced at 0°C by the addition of 1 M NaBH4 in 1 M Na2HPO4 (1 ml/10 mg of mucin). After 1 h, NaBH4 was destroyed with dilute acetic acid to pH ~6.5. Best results were obtained when care was taken to keep the oxidized mucin on ice until after the NaBH4 reduction (see "Results"). After exhaustive dialysis the reaction mixture was lyophilized and fractionated on Sephacryl S200. This oxidized and eliminated mucin will be called EOTR-PSM.
To reduce the peptide core degradation (see "Results") that apparently occurs during the overnight elevated pH elimination procedure, an alternate procedure was developed. In this procedure, after the destruction of the periodate, the mucin was immediately reduced by NaBH4 in phosphate buffer as described above. This modified mucin will be called ROTR-PSM.
Biotinylation of the Tryptic Tandem Repeat Glycopeptide-Purified tandem repeat glycopeptide (TTEOTR-PSM or TTROTR-PSM) (4 mg in 1.5 ml of 66 mM NaHCO3 containing 66% Me2SO, pH 8.2) was N-terminal biotinylated by the addition of a ~40-fold molar excess of N-hydroxysuccinimidobiotin (Pierce) (6.5 mg in 0.2 ml of Me2SO). After 6 h, a second aliquot of N-hydroxysuccinimidobiotin in Me2SO was added, and the mixture was incubated at ambient temperature overnight. The modified tandem repeat glycopeptide was separated from low molecular weight reagents by Sephacryl S200 chromatography, yielding 3 mg of N-terminal biotinylated tandem repeat after lyophilization. The biotinylated tandem repeat will be called BTTEOTR-PSM or BTTROTR-PSM.
Isolation of the Large C-terminal Glu-C Tandem Repeat Glycopeptide-Sephacryl S200-purified biotinylated tryptic tandem repeat glycopeptide (~3 mg/ml) in 25 mM ammonium bicarbonate, pH 7.8, was digested with ~0.4 mg of protease Glu-C overnight at 25°C. A second addition of Glu-C and an additional 6–8-h incubation were performed to ensure full digestion. The digestion mixture was applied to a prewashed 1-ml bed volume immobilized avidin column (Pierce) and washed with 5 ml of 50 mM ammonium bicarbonate, pH 7.8. The pass-through and wash, which contain the nonbiotinylated C-terminal Glu-C glycopeptides, were collected, pooled, and lyophilized. After Sephacryl S200 chromatography the large C-terminal glycopeptide fraction (called ABTTEOTR-PSM or ABTTROTR-PSM) gave 1.5 mg of glycoprotein after lyophilization. Both glycopeptides were characterized by reverse phase HPLC.
Antifreeze Glycoprotein Modifications-AFGP (12-mg aliquots) were periodate-oxidized and eliminated or oxidized and immediately reduced as described for PSM above. Partial deglycosylation of the native and oxidized AFGP by mild TFMSA/anisole was also performed as described previously for PSM (24,25). After mild TFMSA/anisole treatment, each yielded ~8 mg of deglycosylated AFGP, which were characterized by carbon-13 NMR spectroscopy.
NMR Analysis-Carbon-13 NMR spectra of native and modified mucins were obtained at 67.9 MHz with a Bruker AC-270 spectrometer using acquisition conditions previously described (32). Oligosaccharide side chain distribution was determined from integrations of the anomeric carbon region of the carbon-13 NMR spectrum of native mucin (32). GalNAc removal after oxidation-elimination was obtained by integration of the GalNAc C1 and C5 carbons and the α-carbon resonance of Gly. Percent glycosylation was obtained by normalizing the integrals to the Gly, Ser, and Thr content of PSM reported by the amino acid analysis of Gupta and Jentoft (35). The Ser/Thr α-carbon resonances cannot be readily used for obtaining quantitative deglycosylation data due to overlap with other peptide α-carbon resonances, although changes in their intensities can be used to confirm Ser/Thr deglycosylation.
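The normalization described above can be sketched as follows. This is a minimal illustration rather than the full procedure: it assumes that each integral corresponds to one carbon per residue (the GalNAc C1 and the Gly α-carbon), and it uses the PSM composition of 20% Gly, 21.6% Ser, and 14.4% Thr quoted in the Table I footnotes (35). The function name and the example integrals are hypothetical.

```python
# Amino acid composition of PSM (mol %) as quoted in the Table I footnotes (35).
GLY_PCT = 20.0
SER_PCT = 21.6
THR_PCT = 14.4


def percent_glycosylation(galnac_integral: float, gly_integral: float) -> float:
    """Percent of Ser/Thr residues carrying a peptide-linked GalNAc.

    Assumption: both integrals arise from a single carbon per residue,
    so their ratio is the molar ratio of GalNAc to Gly.
    """
    if gly_integral <= 0:
        raise ValueError("Gly integral must be positive")
    galnac_per_gly = galnac_integral / gly_integral
    # Convert to GalNAc residues per 100 amino acids, then to percent of Ser+Thr.
    galnac_per_100_residues = galnac_per_gly * GLY_PCT
    return 100.0 * galnac_per_100_residues / (SER_PCT + THR_PCT)


if __name__ == "__main__":
    # Hypothetical integrals chosen only to illustrate the arithmetic.
    print(round(percent_glycosylation(1.35, 1.0), 1))
```

With these illustrative integrals the GalNAc-to-Gly ratio of 1.35 corresponds to 27 GalNAc residues per 100 amino acids, spread over the 36 Ser/Thr residues per 100 amino acids.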
Amino Acid Sequencing-Pulsed liquid phase Edman degradation amino acid sequencing was performed on an Applied Biosystems Procise 494 protein sequencer (Perkin-Elmer, Applied Biosystems Div., Foster City, CA) using the manufacturer's recommended pulse liquid cycles as described previously (24). Amino acid phenylthiohydantoin (PTH) derivatives were chromatographed on an ABI 5-μm C18 PTH column using the fast normal I gradient program and monitored by the absorbance at 269 nm. The PTH-Ser/Thr-O-GalNAc derivatives elute as two diastereotopic peaks in the chromatogram at relatively unique positions and constant area ratios. Peak areas were measured automatically, and picomoles of PTH derivative were determined after eliminating long term cycle preview and lag for each PTH derivative by a simple base-line subtraction approach and correcting when necessary for overlapping peaks as described previously (24). This base-line subtraction approach permits the quantitation of the extent of glycosylation by eliminating long range base-line contributions to the peak area from adjacent cycles. Corrections for the overlap of the second eluting PTH-Thr-O-GalNAc diastereomer with PTH-Thr were made using the previously obtained area ratios for the two PTH-Thr-O-GalNAc diastereomers obtained from fully glycosylated glycopeptides (1,24). The extent of glycosylation was determined by comparing the relative picomoles of nonglycosylated and glycosylated PTH derivatives after taking into account the relative recovery of the individual PTH species determined from the sequencing of fully glycosylated glycopeptides of known composition (1,24). Typically 1–3 nmol (10–50 μg) of glycopeptide were sequenced to ensure sequencing to at least 40–50 residues. As shown by Table I (see "Results") the reproducibility between sequencing runs is relatively good, typically within 1–5 percentage values.
Sequence "Density" Determinations-Sequence weighted average Ser/Thr and GalNAc-O-Ser/Thr density values were obtained by performing a 7-residue weighted running average along the tandem repeat using arbitrary weights of 0.25, 0.5, 0.75, 1, 0.75, 0.5, and 0.25 at each position. For the Ser/Thr density determination, Ser and Thr residues were valued at 1, and all other residues were valued at 0. For the GalNAc-O-Ser/Thr density determination, values for the Ser and Thr residue-specific glycosylation were used (24) with all other residues valued at 0.
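The running average described above can be sketched in a few lines. The weights are those given in the text; two details are assumptions because the text does not specify them: the weighted sum is divided by the sum of the weights actually used, and the window is simply truncated at the ends of the sequence. The function names and the short test peptide are hypothetical and are not the PSM sequence.

```python
from typing import List, Sequence

# Window weights quoted in the text (7 residues, largest weight at the center).
WEIGHTS = (0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25)


def ser_thr_indicator(sequence: str) -> List[float]:
    """Value Ser (S) and Thr (T) at 1 and every other residue at 0."""
    return [1.0 if aa in ("S", "T") else 0.0 for aa in sequence.upper()]


def weighted_density(values: Sequence[float],
                     weights: Sequence[float] = WEIGHTS) -> List[float]:
    """Weighted running average centered on each position.

    Assumptions: the window is truncated at the sequence ends and the
    result is normalized by the sum of the weights that were applied.
    """
    half = len(weights) // 2
    density = []
    for i in range(len(values)):
        total = 0.0
        weight_sum = 0.0
        for k, w in enumerate(weights):
            j = i + k - half  # index of the residue under weight k
            if 0 <= j < len(values):
                total += w * values[j]
                weight_sum += w
        density.append(total / weight_sum)
    return density


if __name__ == "__main__":
    # Hypothetical peptide used only to exercise the function.
    print(weighted_density(ser_thr_indicator("AAASAAA")))
```

For the GalNAc-O-Ser/Thr density, the same function would be called with the residue-specific extents of glycosylation (as fractions) in place of the 0/1 indicator values.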

Elimination of Non-C3-substituted Peptide GalNAc Residues and Isolation of Partially Deglycosylated PSM Tandem Repeat Glycopeptide-We have previously shown (25) that an effective method for the complete deglycosylation of mucin glycoproteins involves a two-step approach: 1) the trimming of O-linked oligosaccharide side chains to peptide-linked α-GalNAc residues by mild TFMSA/anisole followed by 2) a periodate oxidation and alkaline elimination of the oxidized GalNAc residues. This results in a relatively nondegraded peptide core containing intact Ser and Thr residues. C3-substituted, peptide-linked GalNAc residues are not oxidized by periodate; therefore, an oxidation-elimination prior to the mild TFMSA/anisole treatment will fail to cleave C3-substituted, peptide-linked GalNAc residues. A subsequent mild TFMSA/anisole treatment will trim the remaining C3 substituents from the substituted GalNAc residues, leaving intact monosaccharide GalNAc side chains as labels for the presence of C3-substituted residues.
Oxidized and partially deglycosylated 81-residue PSM tryptic tandem repeats were obtained using the same strategies that were used for obtaining the glycosylation pattern of the "native" repeat (24).² Gel filtration chromatograms showing the sequential isolation of the oxidized and eliminated/reduced PSM tandem repeat glycopeptides are given in Fig. 2. In each chromatogram the indicated glycopeptide fractions were pooled and used for subsequent steps.
As shown by the gel filtration chromatogram in Fig. 2A, the oxidation, elimination, and reduction procedures, giving EOTR-PSM, yield a high molecular weight glycosylated domain with chromatographic behavior nearly identical to the previously reported nonoxidized PSM glycosylated domains (24). Preparations that were oxidized and immediately reduced, giving ROTR-PSM, also gave identical gel filtration profiles (data not shown). These findings suggest the peptide cores of the variously treated glycosylated domains are not cleaved by these procedures. However, after mild TFMSA/anisole deglycosylation of EOTR-PSM, the breadth of the major glycopeptide peak was found to vary considerably; compare the two different chromatograms in Fig. 2B. By very carefully maintaining temperatures between 0 and 4°C during all steps (including the alkaline dialysis) of the oxidation and elimination procedure, the broadening was significantly reduced (Fig. 2B, closed circles). When the alkaline elimination step was omitted completely and the mild TFMSA/anisole treatment was relied on to remove the oxidized (and reduced) peptide-linked GalNAc residues (36), giving TROTR-PSM, the broadening was further reduced (Fig. 2C). Since prior to the mild TFMSA/anisole treatment the EOTR-PSM and ROTR-PSM derivatives are nearly identical on Sephacryl S200, the alkaline elimination step apparently produces a temperature-dependent modification of the peptide core that is subsequently susceptible to TFMSA treatment. Although we have not further pursued the origins of this degradation, a plausible source may be the alkaline β-elimination of a small number of the remaining nonoxidized C3-substituted GalNAc oligosaccharides, producing acid-labile unsaturated Ser/Thr residues (37).
Monomeric tandem repeats were obtained after trypsinolysis of the indicated glycosylated domains in Fig. 2, B or C. The gel filtration chromatogram of the digested TROTR-PSM fraction, Fig. 2D, is nearly identical to that observed earlier for the nonoxidized glycosylated domains (24), with the major low molecular weight peak representing the monomeric tandem repeat. Consistent with the presence of less peptide core degradation, the TTROTR-PSM preparations typically give sharper tandem repeat peaks compared with the alkaline-eliminated TTEOTR-PSM preparations (data not shown). Pooled tandem repeat fractions were further characterized by reverse phase HPLC (data not shown) (24). The tandem repeats from the alkaline-treated preparation (TTEOTR-PSM) gave a single major peak with a broad shoulder, while those from the rapidly reduced preparation (TTROTR-PSM) gave a symmetrical peak lacking the shoulder, again suggesting differences in peptide core degradation between preparations. Portions of the major TTEOTR-PSM peak on HPLC were sequenced and used for isolating the C-terminal portion of the tandem repeat as discussed below. The immediately reduced TTROTR-PSM preparations were sequenced and further modified without prior HPLC fractionation. As shown below, both TTEOTR- and TTROTR-PSM preparations gave indistinguishable results upon sequencing.
Isolation of Large C-terminal Glu-C Tandem Repeat Glycopeptide-Glu-C digestion of the isolated tryptic tandem repeat yields glycopeptides of 38, 40, and 3 residues in length (cleavage after Glu (E) in the tandem repeat sequence in Fig. 1A). The large 40-residue C-terminal glycopeptide, residues 39–78, was isolated for amino acid sequencing and glycosylation pattern determination as described below.
In our previous studies the larger nonoxidized Glu-C-digested PSM tandem repeat glycopeptides could be separated on reverse phase HPLC (24). However, in the current studies the oxidized Glu-C-digested glycopeptides are unresolved on reverse phase HPLC (data not shown), presumably due to their different carbohydrate content. Instead, an alternate route was used to separate these glycopeptides based on the biotinylation of the N-terminal amino group of the intact 81-residue tandem repeat prior to Glu-C digestion. As there are no other reactive amino groups in the tandem repeat, this procedure allows the selective removal of the N-terminal Glu-C glycopeptide on an immobilized avidin column after Glu-C digestion. Fig. 2E shows the separation of the biotinylated tandem repeat from the biotinylation reagents on Sephacryl S200, while Fig. 2F shows the avidin column flow-through of the Glu-C digest (closed circles). The first peak of the avidin column flow-through (Fig. 2F) was shown to be protease Glu-C, based on its identical migration behavior to authentic Glu-C (Fig. 2F, open circles). The second peak was identified as the large C-terminal Glu-C glycopeptide based on its lower molecular weight compared with the intact tandem repeat in Fig. 2E. Reverse phase HPLC characterizations of the large C-terminal Glu-C glycopeptides obtained from both the TTEOTR-PSM and TTROTR-PSM tandem repeats were identical (data not shown) and lacked shoulders. These preparations were sequenced without prior HPLC fractionation. Confirmation of the isolation of the C-terminal PSM tandem repeat glycopeptide, residues 39–78, was obtained from amino acid sequencing (see Table II and Fig. 4 below).

FIG. 2. Preparation and isolation of selectively deglycosylated PSM tandem repeat glycopeptides by Sephacryl S200 gel filtration. The indicated fractions were pooled for subsequent processing as described in the results. A, profile of periodate-oxidized, eliminated, and reduced PSM (EOTR-PSM). The broad low molecular weight peak represents the poorly glycosylated PSM N- and C-terminal domains. B, profiles of two different preparations of oxidized, eliminated, and reduced PSM (EOTR-PSM) after deglycosylation by mild TFMSA/anisole, oxidized and eliminated under stringent (closed circles) and nonstringent (open circles) low temperature control. C, profile of mild TFMSA/anisole-treated, oxidized, and reduced PSM (lacking the alkaline elimination step) giving TROTR-PSM. D, isolation of the PSM 81-residue tandem repeat after trypsinolysis of TROTR-PSM. E, gel filtration of the N-terminal biotinylated tandem repeat. F, isolation of the C-terminal 40-residue glycopeptide after digestion of the biotinylated tandem repeat with protease Glu-C and avidin column treatment.

TFMSA/anisole. Only after trimming the glycans to GalNAc residues does the tandem repeat domain become susceptible to trypsin, thus giving an ensemble of 81-residue tandem repeat glycopeptides (Fig. 1A). (Note that the Arg-Pro sequence is resistant to trypsin; therefore a single peptide repeat is obtained.) Trypsinolysis prior to partial deglycosylation fails to release tandem repeats (24).
Proof of Removal of Non-C3-substituted GalNAc Residues after Periodate Oxidation-Carbon-13 NMR spectroscopy was used to confirm and quantitate the specific loss of non-C3-substituted, peptide-linked GalNAc residues after oxidation and subsequent mild TFMSA/anisole treatment. Representative NMR spectra are given in Fig. 3. As previously shown, mild TFMSA/anisole treatment (giving TTR-PSM) quantitatively trims the PSM oligosaccharides to peptide-linked GalNAc residues; compare Fig. 3, A and B, and Table IB. That mild TFMSA/anisole does not remove the peptide-linked GalNAc residue is further shown by the relative absence of change in the glycosylated and nonglycosylated Ser and Thr α-carbon resonances (55–61 ppm, Fig. 3, A and B). About 75% of the Ser/Thr residues are glycosylated in native and mild TFMSA/anisole-treated mucin (Table IB). Only after periodate oxidation and subsequent mild TFMSA/anisole treatment (giving TEOTR- and TROTR-PSM) does the peptide-linked GalNAc content decrease (Fig. 3, C and D, and Table IB). This is readily detected by the lower intensities of the peptide-linked GalNAc resonances (labeled in Fig. 3) and by the changes in the intensities of the α-carbon resonances for glycosylated and nonglycosylated Ser and Thr. After the periodate oxidation, which removes non-C3-substituted GalNAc residues, the Ser/Thr glycosylation of mucin decreases to about 50% (Table IB). Since the average percent of C3 substitution obtained after removal of non-C3-substituted GalNAc residues, ~62%, is essentially identical to the value obtained from the oligosaccharide composition of intact PSM, 63% (Table I, part A), we conclude that the periodate oxidation is indeed selectively removing non-C3-substituted GalNAc residues. As will be discussed below, these values are in excellent agreement with the results obtained from amino acid sequencing (Table II).
The NMR analysis also shows that the two different treatments performed after the oxidation step, TEOTR-PSM and TROTR-PSM, give essentially identical extents of C3 substitution; see Table IB.
Demonstration of the Protection of C3-substituted GalNAc Residues against Oxidation-To unambiguously show that periodate oxidation fails to react with C3-substituted peptide-linked GalNAc residues and to confirm that the β-Gal residue remains attached to the GalNAc residue after its oxidation (thus preventing subsequent oxidation of the GalNAc), we performed a series of oxidation-mild TFMSA/anisole reactions on the AFGP. This mucin-like glycoprotein consists primarily of the Ala-Ala-Thr sequence with each Thr residue O-glycosylated with the core 1 disaccharide, β-Gal(1-3)-α-GalNAc-O (26,33). Carbon-13 NMR spectra for oxidized AFGP with and without the alkaline elimination step followed by mild TFMSA/anisole and of native AFGP treated with mild TFMSA/anisole alone were identical (data not shown), showing the complete removal of the β-Gal residue and the full retention of the peptide-linked α-GalNAc residue for all conditions. These results demonstrate that C3-substituted GalNAc residues are resistant to periodate oxidation and elimination, and confirm the selectivity of the procedures for non-C3-substituted peptide-linked GalNAc residues.
Amino Acid Sequencing and C3 Glycosylation Determination-Amino acid sequencing was performed on the purified TTEOTR- and TTROTR-PSM tandem repeat glycopeptides (three determinations) and on the 40-residue C-terminal Glu-C derivative (two determinations) as described previously (24). An example of the quality of the base-line-corrected yields for Thr, Ser, Ala, Gly, and Val from the sequencing of the 40-residue C-terminal Glu-C peptide is given in Fig. 4. The residue-specific extents of glycosylation obtained are given in Table II. After periodate oxidation and mild TFMSA treatment, the remaining glycosylated Ser/Thr residues directly measure the C3 substitution status of each GalNAc-O-Ser/Thr residue. Therefore, the observed extent of glycosylation at each residue must be equal to or less than that of the native mucin obtained previously. This is borne out in Table II, further demonstrating the reliability of the quantitation. As shown in Table II, the average extent of glycosylation of the oxidized mucin is 48% compared with 78% for the native mucin, giving an average percent of C3 substitution of 61%. These values are identical to the carbon-13 NMR derived values given in Table I and further validate the quantitation of the amino acid sequencing data.
a Mono- through tetrasaccharide structures as defined in Fig. 1B. Oligosaccharide compositions obtained from integrations of the anomeric carbon region of each pool's carbon-13 NMR spectrum in Fig. 4A (32). Sialic acid content not included.
b See text and footnote 1 for abbreviations.
c Value obtained from the areas of the unique di- and tetrasaccharide anomeric carbon resonances (105 and 92 ppm), in Fig. 4A, normalized to their compositions (A above) and to the Gly α-carbon resonance (44 ppm) using the amino acid compositions of 20% Gly, 21.6% Ser, and 14.4% Thr (35).
d Values obtained from the areas of the GalNAc anomeric carbons, 100 and 98 ppm, Fig. 4, B-D, normalized to the Gly α-carbon resonance (44 ppm) using the amino acid compositions of 20% Gly, 21.6% Ser, and 14.4% Thr (35).
e Calculated from the NMR derived average oligosaccharide composition in A above.
f NA, not applicable, all C3 substitution information lost after mild TFMSA/anisole treatment in the absence of periodate oxidation.
g Relative to the %R-GalNAc-O-Ser/Thr value of TTR-PSM obtained as described in note d.
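The relationship between the oxidized and native sequencing results can be illustrated with a short calculation. The sketch below is only an illustration: the function name `c3_substitution_percent` is hypothetical, and it uses the rounded averages quoted in the text (48% and 78%). The ratio of those rounded averages comes out near 61.5%, which is close to the reported 61%; the published figure presumably derives from unrounded, per-residue data (an assumption on our part).

```python
def c3_substitution_percent(oxidized_pct, native_pct):
    """Percent of peptide-linked GalNAc residues carrying a C3 substituent.

    oxidized_pct: % glycosylation remaining after periodate oxidation and
                  mild TFMSA/anisole treatment (C3-substituted GalNAc only).
    native_pct:   % glycosylation of the same residue(s) in the native mucin.
    """
    if native_pct <= 0:
        raise ValueError("native glycosylation must be positive")
    # The oxidized value can never exceed the native value, because oxidation
    # only removes GalNAc residues; a larger value signals a data problem.
    if oxidized_pct > native_pct:
        raise ValueError("oxidized value cannot exceed native value")
    return 100.0 * oxidized_pct / native_pct


# Rounded averages quoted in the text: 48% (oxidized) and 78% (native).
average_c3 = c3_substitution_percent(48.0, 78.0)
print(f"average C3 substitution: {average_c3:.1f}%")
```

The explicit check that the oxidized value does not exceed the native value mirrors the internal consistency test applied to Table II in the text.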
Analysis of C3 Substitution Data-A graphical representation of the glycosylation pattern of the PSM tandem repeat is given in Fig. 5A. In Fig. 5 the height of the bars represents the percent that each Ser/Thr residue in the tandem repeat is glycosylated by GalNAc; the gray areas represent the proportion of residues substituted at C3, while the black areas represent those residues containing no C3 substituents. For simplicity we will call the non-C3-substituted GalNAc residues "mono" saccharides, keeping in mind that they may be sialylated at C6 (see Fig. 1B). Fig. 5A clearly shows a wide range of C3 substitutions along the tandem repeat. To account for differences in the peptide core GalNAc content of each residue, the C3 substitution data were replotted in Fig. 5B after normalization to each residue's GalNAc content. In the figure, solid bars represent the percentage of non-C3-substituted, monosaccharide GalNAc residues. As shown in Fig. 5 and Table II, the percent of GalNAc residues substituted at C3 ranges between 12 and 95%. Inspection of the data in Fig. 5B suggests that there is a decrease in C3 substitution in the regions of residues 31, 51, and 61, which also appear to coincide with clusters of Ser and Thr residues. To test this hypothesis, the relative Ser/Thr density was quantitated for each residue along the tandem repeat (see "Methods") for comparison to the extent of C3 substitution. These Ser/Thr density values are plotted as the upper curve in Fig. 5B. The results suggest that Ser/Thr residues with the highest percentage of monosaccharide side chains are located in regions of the tandem repeat with the highest densities of Ser and Thr, while residues with the highest C3 substitution usually appear in regions of relatively low Ser/Thr density. These data suggest that steric crowding by neighboring, potentially glycosylated Ser/Thr residues may be a factor modulating the extent of C3 substitution. The similarly calculated GalNAc density plot, the lower curve in Fig. 5B, is consistent with this conclusion.
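The two quantities plotted in Fig. 5B, the normalized monosaccharide percentage and a local sequence density, can be sketched as follows. This is an illustration only. The actual density weighting is defined under "Methods" and is not reproduced here, so the triangular window, the `half_window` parameter, the function names, and the toy sequence are all assumptions, not the procedure used for the PSM repeat.

```python
def percent_mono(native_pct, c3_pct):
    """Percent of a residue's GalNAc that lacks a C3 substituent.

    native_pct: % of the residue glycosylated by GalNAc in the native mucin.
    c3_pct:     % of the residue still glycosylated after oxidation
                (that is, bearing C3-substituted GalNAc).
    """
    return 100.0 * (native_pct - c3_pct) / native_pct


def sequence_density(sequence, targets="ST", half_window=3):
    """Weighted local density of target residues at each sequence position.

    ASSUMPTION: a triangular weighting that falls off linearly over
    +/- half_window residues. It is illustrative only and is not the
    weighting described in the paper's Methods. The window is truncated
    at the sequence ends rather than wrapped into the next repeat.
    """
    n = len(sequence)
    densities = []
    for i in range(n):
        weighted_hits = 0.0
        total_weight = 0.0
        for offset in range(-half_window, half_window + 1):
            j = i + offset
            if 0 <= j < n:
                weight = half_window + 1 - abs(offset)
                total_weight += weight
                if sequence[j] in targets:
                    weighted_hits += weight
        densities.append(weighted_hits / total_weight)
    return densities


# Hypothetical toy sequence, NOT the PSM tandem repeat.
toy_sequence = "GASTTSGAAGVGSA"
for residue, density in zip(toy_sequence, sequence_density(toy_sequence)):
    print(residue, f"{density:.2f}")

# A residue that is 80% glycosylated, with 60% remaining after oxidation.
print(percent_mono(native_pct=80.0, c3_pct=60.0))
```

A GalNAc density curve could be obtained in the same spirit by weighting each Ser/Thr by its measured extent of glycosylation instead of counting it as 1; that variant is likewise an assumption.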
As a further analysis, the C3 glycosylations on Ser and Thr were compared to determine whether both residues were substituted to the same extent. Average C3 substitutions of 66 and 51% were obtained for Ser and Thr, respectively. This difference, however, is not statistically significant by t test (p = 0.072).
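A comparison of this kind can be sketched as a two-sample t statistic. The text does not state which form of the t test was used, so Welch's unequal-variance form is assumed here, and the per-residue values below are hypothetical placeholders rather than the Table II data. Only the statistic and its degrees of freedom are computed; converting them to a p value requires Student's t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom.

    ASSUMPTION: the unequal-variance (Welch) form is used for illustration;
    the paper only says "t test".
    """
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances
    se2 = va / na + vb / nb                          # squared standard error
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical per-residue % C3 substitution values (NOT the Table II data).
ser_values = [85.0, 70.0, 62.0, 90.0, 45.0, 58.0]
thr_values = [60.0, 35.0, 72.0, 40.0, 55.0]

t_stat, dof = welch_t(ser_values, thr_values)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```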
To further test the relationship between hydroxyamino acid (and GalNAc) sequence density values and the extent of C3 substitution, the data have been plotted in Fig. 6, A and B. In Fig. 6, Ser residues are denoted as squares and Thr residues as circles. Except for several outliers, the majority of the points show the trend of decreasing C3 β-Gal substitution with increasing hydroxyamino acid or GalNAc density value. From the linear regression analysis of the complete data set, correlation coefficients (r²) of 0.25 and 0.18 are obtained for the plots (solid lines) in Fig. 6, A and B, respectively (see Fig. 6 legend). By eliminating two outliers (Ser-2, which is very poorly glycosylated by GalNAc, presumably due to steric interactions of neighboring residues (24), and Thr-79, which represents a single relatively low signal-to-noise determination), correlation coefficients of 0.34 and 0.33 are obtained, respectively, with relatively small changes in the regression lines (dashed lines). The scatter in the plots suggests that additional factors, perhaps local peptide sequence and peptide conformation, contribute to the modulation of transferase activity.
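The regression analysis, with and without designated outliers, can be sketched as an ordinary least-squares fit. The data points and residue labels below are hypothetical placeholders, not the values plotted in Fig. 6, and the function name `linear_fit` is introduced only for this illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, with r^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical (label, density value, % C3 substitution) points.
points = [
    ("Ser-A", 0.20, 90.0),
    ("Thr-B", 0.35, 75.0),
    ("Ser-C", 0.50, 60.0),
    ("Thr-D", 0.65, 40.0),
    ("Ser-E", 0.30, 20.0),  # treated as an outlier below
    ("Thr-F", 0.80, 35.0),
]
outliers = {"Ser-E"}

# Fit the complete data set, then refit with the outliers removed.
_, _, r2_all = linear_fit([p[1] for p in points], [p[2] for p in points])
kept = [p for p in points if p[0] not in outliers]
_, _, r2_trimmed = linear_fit([p[1] for p in kept], [p[2] for p in kept])

print(f"r^2 (all points): {r2_all:.2f}")
print(f"r^2 (outliers removed): {r2_trimmed:.2f}")
```

Reporting both fits, as the text does, makes it clear how much of the apparent trend depends on the excluded points.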

Modulation of Glycosylation by Protein Structure-A number of examples have been reported in which specific protein sequences, peptide secondary/tertiary structures, or other unknown protein factors appear to regulate or direct the specific elongation or modification of N-linked glycans (39). These factors range from highly specific peptide sequence "motifs" recognized by specific transferases (40-43) to relatively less specific steric or structural effects that alter an acceptor's accessibility as a result of, e.g., the folding or unfolding of a protein (44,45). By these various mechanisms, a range of protein-specific N-linked glycan structures can be synthesized within a given cell. In the absence of such mechanisms, cells would be expected to synthesize relatively uniform glycan structures, governed simply by the competition of transferases and sugar nucleotides for substrate (39).
In contrast to the N-linked glycans, relatively little is known regarding the effects of protein substrate structure on the elongation of the mucin-type O-glycans, although recent studies suggest that similar behavior should be expected.³ Purified O-glycan core 1 β3-Gal transferases from mammalian liver display differential in vitro activity against a series of small α-GalNAc-O-Ser/Thr containing peptides (discussed below) (27,28), while the selectin ligand, PSGL1, is glycosylated differently from leukosialin, CD43, when expressed in the same cells (51,52). In the latter case, the differences in glycosylation were suggested to be due to the differential recognition of glycoprotein substrate by several transferases, including α-2,3-sialyltransferase, α-1,3-fucosyltransferase, and β-1,3-N-acetylglucosaminyltransferase (51). Similarly, the synthesis of different O-glycans on recombinant MUC1, expressed in cell lines that otherwise do not express such oligosaccharides, has recently been suggested to be due to specific peptide epitopes directing the glycosylation of MUC1 (53). Presently it is unknown whether such hypothetical epitopes interact directly with the transferases or whether they serve as local Golgi retention-like signals that could differentially affect the pattern of glycosylation (46).
The determination of the PSM tandem repeat glycosylation pattern described above clearly suggests that local peptide/glycopeptide structural features govern the in vivo elongation of the O-glycan core GalNAc by the porcine core 1 β3-Gal transferase. This is demonstrated by the nonuniform nature of the substitution of the GalNAc residues given in Fig. 5. If there were no structural sequence effects we would expect the extent of C3 substitution to be identical at each Ser/Thr in the sequence.⁴ Instead, the C3 substitution ranges from 12 to 95% (Table II). As shown by Figs. 5 and 6, decreased substitution appears to be weakly associated with the local density of hydroxyamino acids and/or GalNAc, with the most highly clustered Ser/Thr regions being the least likely to be fully core 1-substituted. The variability in the data in Fig. 6 further suggests that additional sequence-based factors, beyond glycosylation and/or hydroxyamino acid densities, must be influencing the addition of the β-Gal residue. These additional factors are currently unknown, although, as discussed below, previous in vitro studies have indicated specific neighboring residue effects on the rat liver core 1 enzyme (28).
The reduction of the core 1 side chains observed for PSM in regions of high GalNAc or hydroxyamino acid density is consistent with the findings of Brockhausen and co-workers with the rat liver core 1 β3-Gal transferase (27,28). These workers have shown that the in vitro activity of the rat liver transferase is reduced against small glycopeptide substrates that have adjacent glycosylated residues. Thus, elevated GalNAc densities appear to affect both the porcine core 1 transferase on very large macromolecules in vivo and the rat liver enzyme on small glycopeptides in vitro. The data for PSM further show that some Ser diads are capable of high C3 substitution as long as there are no nearby glycosylated sites, as shown by the Ser-6 and -7 and Ser-13 and -14 pairs (Table II, Fig. 5A). This suggests that the carbohydrate side chains of glycosylated hydroxyamino acid diads may be oriented toward opposite sides of the peptide core, whereas carbohydrate side chains at penultimate and more distant residues may be positioned to sterically interfere with the approach of the transferase. These findings are consistent with earlier molecular modeling studies of O-glycosylated diad sequences, which suggest that the oligosaccharide side chains of a Thr diad should point away from each other on opposite sides of the peptide core (54). Similar structural considerations have been discussed regarding the activity of the porcine polypeptide:UDP-GalNAc N-acetylgalactosaminyltransferase against Ser/Thr diad sequences (24).
... vitro over others (47,48). The presence of multiple peptide α-GalNAc transferases (49,50), furthermore, helps explain the apparently broad substrate specificity of this initial step of O-glycan synthesis and may provide additional biosynthetic control.
4 Previous studies in our laboratory (32) have shown the extent of Ser/Thr substitution and the percent of unsubstituted monosaccharide core GalNAc range from 65-83% and 28-43%, respectively, depending on the individual animal. Care was taken to utilize the same or similar mucin pools for these studies as were used for determining the original glycosylation pattern in Gerken et al. (24). Therefore, the different extents of core 1 glycosylation reflect structural sequence effects and not artifacts of the use of different mucin pools.
FIG. 5. Peptide GalNAc and core 1 glycosylation patterns of the PSM tandem repeat. A, percentage of each Ser/Thr residue glycosylated versus tandem repeat sequence. Gray bars represent the percentage of Ser/Thr residues with core 1, β-Gal(1-3)GalNAc side chains; black bars represent the percentage of Ser/Thr residues containing unsubstituted monosaccharide GalNAc side chains (see "Results"). B, percentage of GalNAc residues that are unsubstituted monosaccharides (bars) and relative sequence densities (lines), versus tandem repeat sequences. Upper line, sequence-weighted average Ser/Thr density values; lower line, sequence-weighted average GalNAc-O-Ser/Thr density values, obtained as described under "Methods." Values in A are from Table II.
Inhibitory effects of adjacent Pro, Lys, and Arg residues and stimulatory effects of adjacent Glu, Asp, and Gly have also been reported for the in vitro activity of the rat liver core 1 β3-Gal transferase (28). Because the PSM tandem repeat sequence is relatively low in charged residues and Pro, we are unable to confirm these correlations on PSM. However, an analysis of the 18 Gly residues in the tandem repeat reveals that the two groups of GalNAc-O-Ser/Thr residues, those with and those lacking adjacent Gly residues, are C3 substituted to essentially the same extent (59.5% (n = 11) and 62.2% (n = 19)). Apparently, adjacent Gly residues do not enhance the porcine core 1 β3-Gal transferase activity against the PSM tandem repeat, as is reported for the rat enzyme against different substrates. Finally, the rat core 1 transferase shows a preference for Thr-O-GalNAc substrates over Ser-O-GalNAc substrates (28), which is not observed in the PSM tandem repeat.
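As a simple consistency check on the Gly comparison, the two group means can be pooled by their sample sizes. The sketch below uses only the numbers quoted in the text (59.5%, n = 11 and 62.2%, n = 19); the variable names are illustrative. The pooled value should fall near the overall average C3 substitution of 61% reported for the 30 residues analyzed.

```python
# (mean % C3 substitution, number of residues) for the two Gly-neighbor
# groups quoted in the text.
groups = [(59.5, 11), (62.2, 19)]

total_n = sum(n for _, n in groups)
pooled_mean = sum(mean_pct * n for mean_pct, n in groups) / total_n

print(f"residues: {total_n}, pooled C3 substitution: {pooled_mean:.1f}%")
```

The group sizes sum to 30, matching the number of Ser/Thr residues for which the pattern was determined, and the pooled mean is about 61%.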
Maintaining Optimal Mucin Properties by Modulation of O-Glycan Structure-The present results, obtained from the analysis of the glycosylation pattern of the PSM tandem repeat, suggest that the number and lengths of O-glycans can be modulated in a sequence-dependent manner. The data indicate that C3 substitution, and by inference side chain length, may vary inversely with local hydroxyamino acid and/or peptide-linked GalNAc density. This suggests that shorter side chains (or fewer longer side chains) will be found at the regions in the peptide core with high Ser/Thr densities and that higher extents of glycosylation and longer side chains will be found in regions of low Ser/Thr density. This provides a mechanism for maintaining a uniformly dense oligosaccharide coating of the mucin peptide core. Therefore, regions of low Ser/Thr density, which would require longer side chains for complete protease protection and maximal peptide core extension, will contain the longest glycans. Presently the origins of the modulation of O-glycan structure by peptide sequence are not fully understood, although steric crowding of adjacent glycosylated residues is likely to be a major determinant.