JBC


Volume 272, Number 15, Issue of April 11, 1997 pp. 9709-9719
©1997 by The American Society for Biochemistry and Molecular Biology, Inc.

Determination of the Site-specific O-Glycosylation Pattern of the Porcine Submaxillary Mucin Tandem Repeat Glycopeptide
MODEL PROPOSED FOR THE POLYPEPTIDE:GalNAc TRANSFERASE PEPTIDE BINDING SITE*

(Received for publication, December 19, 1996, and in revised form, January 30, 1997)

Thomas A. Gerken‡§, Cheryl L. Owens§‡‖, and Murali Pasumarthy‡

From the W. A. Bernbaum Center for Cystic Fibrosis Research, Departments of ‡Pediatrics, §Biochemistry, and ‖Molecular Biology and Microbiology, Case Western Reserve University, Cleveland, Ohio 44106-4948



ABSTRACT

The heterogeneously glycosylated 81-residue tryptic tandem repeat glycopeptide from porcine submaxillary mucin (PSM) has been isolated and its glycosylation pattern determined by amino acid sequencing. Key to these studies is the ability to trim the structurally heterogeneous PSM oligosaccharide side chains to homogeneous GalNAc monosaccharide side chains by mild trifluoromethanesulfonic acid treatment. Trypsin treatment of trifluoromethanesulfonic acid-treated PSM releases the 81-residue tandem repeat as an ensemble of 81-residue glycopeptides with different glycosylation patterns. Automated amino acid sequencing of the repeat by Edman degradation was used to determine the extent of glycosylation of nearly every Ser and Thr residue. The Thr residues are all highly glycosylated, within the range of 73-90%, giving an average Thr glycosylation of 83%. In contrast, the Ser residues display a wide range of glycosylation, between 33 and 95%, giving an average Ser glycosylation of 74%. These data are consistent with the known elevated glycosylation of Thr peptides over Ser peptides for the porcine UDP-N-acetylgalactosamine:polypeptide N-acetylgalactosaminyltransferase. It is also observed that the extent of glycosylation of the repeat correlates poorly with published predictive methods. An examination of the sequences surrounding the glycosylation sites reveals that nearly all of the highly glycosylated sites have a penultimate Gly residue, whereas those that are less highly glycosylated have medium to large side chain penultimate residues. As observed by others, glycosylation also appears to be modulated by the presence of Pro residues. On the basis of these findings we suggest that the acceptor peptide binds the transferase in a β-like conformation and that penultimate residue side chain steric interactions may play a role in determining the extent to which a given Ser or Thr is glycosylated. A model for the GalNAc transferase peptide binding site is proposed.


INTRODUCTION

Mucin glycoproteins are heavily O-glycosylated glycoproteins secreted by higher organisms that serve vital roles, protecting and lubricating epithelial cell surfaces from biological, chemical, and mechanical insult. Mucins and mucin-like molecules attached to membrane and cell surfaces play additional important roles by modulating, for example, immune response, inflammation, and tumorigenesis (1-5). The O-glycosylated domains of mucins and mucin-like glycoproteins typically contain 50-80% carbohydrate and possess expanded conformations. These regions typically have high Ser and Thr contents and are commonly composed of polypeptide tandem repeats containing clusters of Ser and Thr residues. It has been demonstrated, in the case of mucins, that the O-linked oligosaccharide side chains, attached via α-N-acetylgalactosamine (GalNAc)¹ to Ser and Thr, are solely responsible for their 3-fold expanded peptide chain dimensions (6). Chemical and NMR studies indicate that on average 75% or more of the Ser and Thr residues in mucins are glycosylated (7, 8). Little, however, is known of the actual distribution of the carbohydrate in these clusters along the mucin polypeptide core. In addition, it is unknown whether the distribution of oligosaccharides along the peptide core is random or whether specific Ser/Thr residues are preferentially glycosylated over others. Obtaining this information by peptide mapping approaches would be a significant analytical undertaking because of the vast array of glycopeptides with different glycosylation patterns and oligosaccharide structures that would need to be quantitatively isolated and characterized.

Several predictive methods, based on the analysis of reported O-glycosylation sites compiled from protein data bases, are available for estimating the relative propensity for a given Ser or Thr to be glycosylated (9-11). Application of these predictive methods to the highly glycosylated mucins may not be fully valid because mucins and mucin-like molecules were not a significant component of the "training" data sets, which were dominated by globular glycoproteins, and, where present, the data on the glycosylation of mucins and mucin-like glycoproteins are commonly incomplete or in error. Furthermore, the propensity for O-glycosylation of Ser and Thr residues at the surface of globular glycoproteins may differ considerably from that found in mucin-like glycoproteins, which presumably serve different structural functions.

In vitro O-glycosylation studies using synthetic peptide acceptors have also been utilized to determine the propensity for a given Ser or Thr to be O-glycosylated (12-19). Unfortunately, the isolated peptide α-N-acetylgalactosaminyltransferases give different extents of Ser/Thr glycosylation on peptide substrates compared with that observed in vivo (14, 20). This may be the result of altered enzyme specificity arising from inappropriate solution conditions, the absence of cofactors, or the presence of more than one transferase (21-23), or of artifacts arising from the use of relatively small peptides containing charged N and C termini. Only by characterizing and quantifying the specific in vivo glycosylation of native and/or expressed recombinant proteins and peptides (20) is it likely that valid and useful data will be obtained for determining the true in vivo transferase specificities.

To begin to address the structural effects of O-glycosylation and to determine the extent to which mucin O-glycosylation is modulated in vivo in a site-specific manner, we have undertaken the isolation and characterization of the porcine submaxillary gland mucin (PSM)-glycosylated tandem repeat. We have determined the glycosylation pattern of the isolated 81-residue tandem repeat and have isolated and characterized several smaller PSM tandem repeat-derived glycopeptides (24). This work provides the first detailed analysis of the glycosylation pattern of a mucin and suggests that mucin glycosylation is modulated by peptide sequence but not entirely as expected by the existing O-glycosylation prediction algorithms. Based on the observed glycosylation pattern, a model for the GalNAc transferase peptide binding site is proposed.


EXPERIMENTAL PROCEDURES

Materials

α-GalNAc-Thr and α-GalNAc-Ser were a kind gift of R. Koganti, Biomera Inc., Edmonton, Alberta, Canada. Except where noted, all chemicals and enzyme reagents were obtained from Sigma or Fisher.

Methods

Isolation of PSM

Porcine submaxillary gland mucin was obtained from frozen porcine submaxillary glands, in gram quantities, as described by Shogren et al. (25).

Reduction, Carboxymethylation, and Trypsinolysis of PSM

PSM was reduced and carboxymethylated by the methods of Gupta and Jentoft (32) giving R-PSM. R-PSM (1 g/50 ml) was digested with L-1-tosylamido-2-phenylethyl chloromethyl ketone-trypsin (15 mg/1 g R-PSM) (Worthington) overnight at 37 °C in 50 mM ammonium bicarbonate, pH 8.3. A second aliquot of trypsin was added to ensure complete digestion and incubated 5-8 h. Toluene was added to both incubation solutions to prevent microbial growth. After exhaustive dialysis and the removal of insoluble debris by centrifugation, trypsinized R-PSM (TR-PSM) was lyophilized.

Partial Deglycosylation by Trifluoromethanesulfonic Acid (TFMSA) and Isolation of PSM Tandem Repeats

Lyophilized and desiccator-dried TR-PSM (~0.75 g in 75-ml Teflon screw-cap tubes) was reacted for 6-16 h with a mixture of TFMSA (50 g) and anisole (15 ml) (Aldrich) at 0 °C following the approach of Gerken et al. (26; see also Ref. 27). To reduce heating effects, both the reagents and lyophilized TR-PSM were chilled in dry ice/ethanol prior to their mixing. After incubation with occasional vigorous shaking, the reaction was again chilled, and 1 volume of cold anhydrous diethyl ether was added. This mixture was slowly added to 125 ml of a frozen slush of 60% pyridine, after which the solution was warmed to room temperature and extracted with ether. The aqueous phase containing the partially deglycosylated TR-PSM (TTR-PSM) was dialyzed exhaustively and lyophilized.

Low molecular weight non-glycosylated peptides were separated from the glycosylated tandem repeat subunits by gel filtration chromatography on Sephacryl S200 (Pharmacia Biotech, Uppsala, Sweden) (column dimensions 5 × 55 cm, 7-ml fraction volumes) eluted with 50 mM ammonium bicarbonate buffer. Glycoprotein content was monitored by periodic acid-Schiff reagent (28), absorbance at 555 nm, and protein monitored by the absorbance at 220 nm.

The high molecular weight carbohydrate-containing fraction eluting near the void volume of the S200 column was lyophilized and treated a second time with L-1-tosylamido-2-phenylethyl chloromethyl ketone-trypsin (10 mg/g TTR-PSM) using the conditions described for the initial trypsin treatment. The digested TTR-PSM (TTTR-PSM) was fractionated on S200, and the PSM tandem repeats were isolated as the major included glycopeptide fraction (TTTR-PSM-T3) and lyophilized.

Glu-C Digestion of PSM Tandem Repeat

The TTTR-PSM-T3 tandem repeat (30 mg) in 5 ml of 25 mM ammonium bicarbonate, pH 7.8, was digested with 1 mg of protease Glu-C (Boehringer Mannheim) for 20 h at 25 °C. After a second addition of Glu-C and further digestion, the mixture was fractionated by S200 chromatography. The major glycopeptide peak (GTTTR-PSM) was separated into the major N- and C-terminal tandem repeat peptides on reverse phase HPLC.

HPLC Purification of PSM Tandem Repeat and Glu-C-digested Tandem Repeat

The TTTR-PSM-T3 tandem repeat and Glu-C-digested repeat (GTTTR-PSM) were further purified by reverse phase HPLC on a 0.46 × 15 cm C18 ODSII column (Alltech Associates Inc., Deerfield, IL) using 0.05% trifluoroacetic acid, water/acetonitrile gradients as described in the figure legend. All isolations were performed on a Varian 5000 HPLC system (Varian Associates, Walnut Creek, CA) equipped with a Shimadzu UV/VIS detector.

Amino Acid Sequencing

Pulsed liquid phase Edman degradation amino acid sequencing of the isolated PSM tandem repeats and Glu-C-derived glycopeptide was performed on either an Applied Biosystems 477A or Applied Biosystems Procise 494 protein sequencer (Perkin-Elmer), typically using standard manufacturer recommended pulsed-liquid cycles (24). Samples of 2000-5000 pmol were dried on trifluoroacetic acid washed glass fiber filters (ABI number 401111) spotted with 1.5 mg of BioBrene Plus (ABI 400385). Amino acid phenylthiohydantoin (PTH) derivatives were chromatographed on standard ABI 5-µm C18 PTH columns using the Fast Normal 1 gradient program and were monitored by the absorbance at 269 nm. The PTH-Thr/Ser-O-GalNAc derivatives were found to elute as two diastereomeric peaks at unique positions in the chromatogram (24). Because of increased peak broadening and changes in elution position as the PTH columns age, some variability and overlap of the glycosylated PTH derivatives with the Ser, Thr, Gly, and Asp PTH derivatives were commonly observed. Since the ratio of the areas of the diastereomeric PTH-Ser/Thr-O-GalNAc derivatives was found to be relatively constant (typically 45/55 for Thr), the extent of glycosylation usually could be determined by the use of simple algebra. Modifications in the HPLC gradient were found to produce only marginal improvement in the separation of the PTH-Ser/Thr-GalNAc derivatives from the elution positions of neighboring PTH-derivatives. Prior to data analysis, long term cycle preview and lag for each PTH-derivative were eliminated by a base-line subtraction approach. This was performed by subtracting the 5-cycle minimum value running average across the entire sequencing run for each peak. Because of the length of the peptides and the high content of similar amino acids (i.e. Gly, Ser, Thr, and Ala), no attempt was made to correct for adjacent residue cycle lag or preview.
Response factors for the PTH-Ser/Thr-O-GalNAc derivatives were obtained from shorter glycopeptide sequencing experiments (24) and were found to be similar to those of Ser and Thr. Response values for Ser, Thr, and their glycosylated PTH-derivatives were further adjusted for each sequencing run to obtain consistent picomole yields relative to the non-Ser/Thr residues.
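The base-line subtraction and the "simple algebra" for the two-peak glycosylated PTH derivatives can be illustrated with a short sketch. It is not the software used in this work. The function names are hypothetical, the base line is assumed to be a 5-cycle running average of the 5-cycle running minimum (one possible reading of the description above), and the 0.45 fraction in the usage note is taken from the typical 45/55 Thr peak ratio quoted above.

```python
from statistics import mean


def rolling(values, window, func):
    """Apply func over a centered window that shrinks at the ends of the run."""
    half = window // 2
    n = len(values)
    return [func(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def baseline_correct(areas, window=5):
    """Subtract a slowly varying base line from one PTH peak's per-cycle areas.

    Assumption: base line = running average of the running minimum,
    both taken over `window` cycles.
    """
    areas = [float(a) for a in areas]
    running_min = rolling(areas, window, min)
    baseline = rolling(running_min, window, mean)
    # Negative values after subtraction are clipped to zero.
    return [max(a - b, 0.0) for a, b in zip(areas, baseline)]


def total_from_resolved_peak(resolved_area, resolved_fraction):
    """Estimate the total glycosylated PTH area from the one resolved peak,
    given the (assumed constant) fraction of the total that peak carries."""
    return resolved_area / resolved_fraction


def percent_glycosylated(glyco_area, free_area):
    """Extent of glycosylation from glycosylated and free PTH-Ser/Thr areas."""
    return 100.0 * glyco_area / (glyco_area + free_area)
```

For example, if only the smaller Thr peak is free of overlap, `total_from_resolved_peak(area, 0.45)` would recover the combined glycosylated area, which can then be passed to `percent_glycosylated` together with the non-glycosylated PTH-Thr area. Response factors are ignored in this sketch.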

Prediction of O-Glycosylation Sites and Secondary Structure

The PSM tandem repeat sequence was analyzed for potential O-glycosylation sites using software kindly provided by Dr. A. Elhammer (9) and by using the E-mail NetOglyc server of Hansen et al. (10). The sequence coupled vector projection predictions were kindly performed by Dr. K. Chou (11). Peptide secondary structure predictions were performed by the SOPM internet server (29). Predictions were performed for at least 10 residues beyond the tandem repeat N and C-terminal boundaries to eliminate end effects.

Peptide Modeling

Peptides were modeled using the Biopolymer module of InsightII (MSI Inc., formerly Biosym Technologies, San Diego, CA).


RESULTS

Isolation of PSM Tandem Repeat Glycopeptides

The structure of porcine submaxillary mucin (PSM) polypeptide based on the nucleotide sequencing of Timpte et al. (30) and Eckhardt et al. (31) is shown in Fig. 1A. The mucin's structure is dominated by the presence of highly O-glycosylated, multiple repeating 81-residue tandem repeats. These repeats make up the vast majority of the mucin's amino acid sequence and represent the major glycosylation sites in PSM.


Fig. 1. Polypeptide structure and tryptic tandem repeat sequence of the porcine submaxillary gland mucin (PSM). A, model of the PSM polypeptide based on the nucleotide sequencing of Timpte et al. (30) and Eckhardt et al. (31) and the biophysical studies of Gupta and Jentoft (32), Shogren et al. (6), and Perez-Vilar et al. (33). The PSM polypeptide consists of a very long highly extended O-glycosylated domain containing multiple copies, n, of an 81-residue tandem repeat which is followed by a relatively small, poorly glycosylated C-terminal domain. The glycosylated domain consists of several thousand residues comprising 25 or more tandem repeats (31). The C-terminal (and perhaps the N-terminal) domains have been shown to dimerize, m, thus accounting for the very large size of native mucin (33). B, the sequence of the tryptic PSM tandem repeat.


Tryptic Tandem Repeat Glycopeptide

There are two potential trypsin cleavage sites, Arg-Ile and Arg-Pro, in each PSM tandem repeat. The Arg-Pro site, however, is expected to be inactive; therefore, trypsin treatment is expected to yield single copies of the 81-residue tandem repeat with the sequence given in Fig. 1B. Indeed, Timpte and co-workers (30) have shown that after full deglycosylation and digestion of the fully deglycosylated (apo) mucin with trypsin, this tryptic peptide is obtained. In contrast, Gupta and Jentoft (32) have shown that trypsin treatment of fully glycosylated, reduced and carboxymethylated mucin fails to yield single copies of the repeat and instead yields relatively high molecular weight glycopeptides composed of undigested tandem repeats. Apparently, the longer oligosaccharide side chains inhibit the digestion of the PSM polypeptide by trypsin, preventing the isolation of individual tandem repeats. The Sephacryl S200 gel filtration chromatogram of this species, TR-PSM, is given in Fig. 2A.


Fig. 2. Sephacryl S200 gel filtration chromatography showing the isolation of the PSM tryptic tandem repeat and its digestion by Glu-C. For A-C protein content is monitored by the absorbance at 229 nm (□) and carbohydrate content by periodic acid-Schiff, absorbance at 555 nm (◆). A, S200 chromatogram of trypsinized, reduced, and carboxymethylated PSM (TR-PSM). B, chromatogram of TR-PSM after partial deglycosylation by TFMSA at 0 °C giving TTR-PSM (see "Experimental Procedures"). C, S200 chromatography of the indicated TTR-PSM fraction in B after digestion with trypsin, yielding TTTR-PSM. The third peak, T3, represents the tryptic glycosylated PSM tandem repeat. D, S200 chromatogram of the glycosylated tryptic tandem repeat, T3, from C (□, absorbance at 229 nm) and the chromatogram of Glu-C digested T3 (isolated on reverse phase HPLC, Fig. 3A) giving GTTTR-PSM (◆, absorbance at 229 nm).


Only after quantitatively trimming the oligosaccharide side chains to the peptide-linked α-GalNAc residue by mild TFMSA treatment (26, 27), Fig. 2B, does the PSM glycopeptide core become susceptible to cleavage by trypsin, Fig. 2C, presumably releasing monomeric glycosylated tryptic tandem repeats (peak T3).² Mild TFMSA treatment does not appreciably degrade the mucin, since on gel filtration untreated and treated mucin (TR-PSM and TTR-PSM, respectively) contain similar high molecular weight glycosylated peaks, Fig. 2, A and B. Integrations of the α-carbon resonances of the 13C NMR spectra of native and TTR-PSM confirm that 96 to 97% of the glycosylated Ser and Thr residues of the native mucin retain intact unsubstituted α-GalNAc residues after mild TFMSA treatment (data not shown, see Ref. 26).

The suspected monomeric glycosylated tryptic tandem repeat, pooled as indicated in Fig. 2C, gave a single sharp peak after rechromatography on S200 as shown in Fig. 2D (□). On reverse phase HPLC, Fig. 3A, this peak gives a single somewhat broadened peak, representing the ensemble of heterogeneously glycosylated tandem repeats. This peak was pooled for amino acid sequencing and for subsequent digestion by Glu-C (discussed below). Amino acid sequencing of this peak confirmed the isolation of the PSM tryptic tandem repeat.


Fig. 3. Reverse phase HPLC of the tryptic PSM tandem repeat before and after Glu-C digestion. A, chromatogram of the isolated tryptic tandem repeat (TTTR-PSM-T3) pooled as indicated in Fig. 2C. B, chromatogram of the Glu-C-digested tryptic tandem repeat (GTTTR-PSM) pooled as indicated in Fig. 2D giving peaks GTTTR-PSM-GII and GTTTR-PSM-GIII. Solvent gradients for A and B were as follows: 0 min, 20% solvent B and 20 min, 50% solvent B. Solvent A, 100% water, 0.05% trifluoroacetic acid; solvent B, 50% acetonitrile, 50% water, 0.05% trifluoroacetic acid. Vertical scale represents the absorbance at 220 nm.


Since each step in the above procedure typically yields a single major glycopeptide species that is pooled for the subsequent step, the obtained tandem repeat glycosylation pattern (see below) is expected to represent the majority of the PSM tandem repeats of the native mucin. After correcting for the different carbohydrate contents of native and TFMSA-modified PSM (26), it is calculated that between 40 and 50% (uncorrected for the nonspecific losses of material at each step) of the initial TR-PSM peptide is isolated in the pooled TTTR-PSM-T3 fraction. The possibility exists, however, that a subpopulation of differently glycosylated species may have been excluded, since the lower molecular weight regions of the glycopeptide peaks in Fig. 2, B and C, were not pooled. These lower molecular weight species are thought to represent nonspecific (protease and TFMSA) degradation products of the tandem repeat and other non-tandem repeat glycopeptides arising from the N- and C-terminal domains of PSM (see Fig. 1A). Evidence for nonspecific protease degradation arises from the observation that the use of non-L-1-tosylamido-2-phenylethyl chloromethyl ketone-treated trypsin gives a T3 peak that is further broadened to lower molecular weight. By eliminating these lower molecular weight species we are able to reduce the background sequencing "noise," thereby permitting the sequencing of longer segments of the tandem repeat glycopeptide (discussed below). We believe, however, that these degradation products would not have significantly different glycosylation patterns compared with the intact tandem repeat.

Glu-C Glycopeptides

The PSM tandem repeat contains three potential cleavage sites for endoproteinase Glu-C from Staphylococcus aureus V8 (cleaving at the C terminus of Glu). Digestion of the tryptic PSM tandem repeat with Glu-C will therefore produce three glycopeptides of 38, 40, and 3 residues each, proceeding from the N to C terminus, respectively. We chose to isolate the 40-residue glycopeptide (residues 39-78) since its analysis would further help confirm the glycosylation pattern of the C-terminal half of the tandem repeat.
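The fragment sizes follow directly from the fragment boundaries. The sketch below is illustrative only: the function name is hypothetical, and the cleavage positions (after residues 38 and 78) are inferred from the glycopeptide boundaries 1-38 and 39-78 quoted in the text, not read from the sequence in Fig. 1B.

```python
def cleavage_fragments(length, cleave_after):
    """Return (start, end, size) for each fragment, numbering residues from 1.

    `cleave_after` lists the residues whose C-terminal peptide bond is cut.
    """
    sites = sorted(cleave_after)
    starts = [1] + [site + 1 for site in sites]
    ends = sites + [length]
    return [(s, e, e - s + 1) for s, e in zip(starts, ends)]


# Assumed cleavage after residues 38 and 78 of the 81-residue repeat.
for start, end, size in cleavage_fragments(81, [38, 78]):
    print(f"residues {start}-{end}: {size} residues")
```

Under these assumed positions the three fragments span residues 1-38, 39-78, and 79-81.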

The HPLC-purified tryptic tandem repeat (Fig. 3A) was digested with Glu-C and fractionated on S200 chromatography, Fig. 2D. As shown in the figure, the Glu-C digest (◆) migrates at a lower molecular weight than undigested tryptic repeat (□). The 40- and 38-residue glycopeptides are not expected to be resolved on S200; therefore, the production of a single sharp peak after Glu-C digestion indicates the repeat has been fully cleaved by the enzyme. The remaining 3-residue glycopeptide was not specifically isolated or identified but presumably appears near the included volume of the column, at approximately fractions 120-140.

On reverse phase HPLC the pooled product of the Glu-C digest (pooled as indicated in Fig. 2D, ◆) reveals a complex pattern composed of two major broad peaks, labeled (GTTTR-PSM)-GII and (GTTTR-PSM)-GIII, as shown in Fig. 3B. Peak GII elutes at a lower acetonitrile content than the intact tryptic tandem repeat, whereas peak GIII elutes at an acetonitrile content similar to that of the intact tryptic tandem repeat. On the basis of the expected differences in hydrophobicities, the less hydrophobic fraction, GII, was tentatively assigned to the less hydrophobic glycopeptide 39-78 (containing 11 hydrophobic residues: Ala, Pro, Val, and Ile), and the more hydrophobic fraction, GIII, was tentatively assigned to glycopeptide 1-38 (containing 15 hydrophobic residues). Proton NMR spectroscopy, at 600 MHz, of each pooled fraction confirmed their identities based on their different amino acid compositions (data not shown). Fraction GTTTR-PSM-GII was unambiguously identified as residues 39-78 on the basis of its amino acid sequence, presented below.

Amino Acid Sequencing of PSM Tandem Repeat Glycopeptides

Amino acid sequencing was performed on the reverse phase HPLC-purified tandem repeat glycopeptides, TTTR-PSM-T3 and GTTTR-PSM-GII. As described earlier (24, 34, 35), unique elution patterns are observed for the PTH-derivatives of α-GalNAc-Ser and α-GalNAc-Thr as shown in Fig. 4A for authentic α-GalNAc-Ser and α-GalNAc-Thr. Each glycosylated PTH-derivative appears as a pair of peaks in the chromatogram (Fig. 4A) because the conversion reaction forming the amino acid-PTH-derivative produces diastereomers with different retention times. The α-GalNAc-Ser-PTH-derivatives elute as an unresolved doublet, labeled S*+S**, early in the gradient near the position of PTH-Asp, and the α-GalNAc-Thr-PTH diastereomers elute later in the gradient as resolved peaks T* and T**, near the positions of PTH-Ser and PTH-Thr, respectively. These peaks are readily identified in the sequencing chromatograms of glycopeptide TTTR-PSM-T3 for residues Ser2, Ser6, and Thr22 as shown in Fig. 4, B-D. On the basis of the relative sizes of the glycosylated and nonglycosylated PTH-Ser/Thr derivatives in the TTTR-PSM-T3, it appears that Ser6 and Thr22 are more highly glycosylated than Ser2, which seems to be poorly glycosylated.


Fig. 4. Representative amino acid sequencing chromatograms of the α-GalNAc-Ser and α-GalNAc-Thr standards and TFMSA-treated PSM tryptic tandem repeats. A, chromatogram of PTH-derivatized α-GalNAc-Ser and α-GalNAc-Thr on the Applied Biosystems Procise 494 protein sequencer as described under "Experimental Procedures." B-D, chromatograms for cycles 2, 6, and 22 representing residues Ser2, Ser6, and Thr22 from the amino acid sequence determination 2 (Table I) of the HPLC-purified PSM tryptic tandem repeat (TTTR-PSM-T3). Vertical scale represents absorbance at 269 nm.


Sequencing chromatograms were quantified to obtain the residue-specific extent of glycosylation as illustrated in Figs. 5 and 6 for sequencing run 2 of TTTR-PSM-T3 tandem repeat glycopeptide. Fig. 5A displays the uncorrected area data for the Gly-PTH peak plotted as a function of sequence cycle. The figure shows a pronounced base-line curvature that is also observed for the other amino acid residues and glycosylated Ser/Thr (data not shown). Since the extent of base-line curvature correlated with the residue's percent mole fraction, we conclude that the curvature is due to the cumulative effects of cycle preview and lag and perhaps due to heterogeneous cleavage of the tandem repeat peptide. Since non-zero base lines will interfere with the accurate determination of the extent of glycosylation and with the sequence determination at high cycle numbers, we eliminated the curvature by the base-line subtraction approach described under "Experimental Procedures." The effectiveness of the base-line correction approach is shown in the corrected data for Gly, Ile, Val, and Ala of Fig. 5, C-F, and for glycosylated and nonglycosylated Ser and Thr of Fig. 6, A-D. Note that after base-line correction the sequence can be read well beyond residue 60 as demonstrated by the expanded plots of Fig. 5, D and F, and Fig. 6, B and D. A plot of the single residue picomoles recovered versus cycle number, Fig. 5B, indicates that we have achieved reasonable sequential residue quantification and recovery. An average apparent repetitive yield of 99% is obtained from the data in Fig. 5B.
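An average repetitive yield of the kind quoted above is conventionally estimated by assuming that the recovered amount falls by a constant factor r per cycle, pmol(n) = pmol(0)·r^n, and fitting a straight line to ln(pmol) versus cycle number. The text does not state how the 99% value was derived, so the sketch below is one standard approach under that assumption. The function name and the synthetic data in the usage note are hypothetical.

```python
import math


def repetitive_yield(cycles, picomoles):
    """Estimate the per-cycle repetitive yield r from pmol(n) = pmol(0) * r**n.

    Fits ln(pmol) against cycle number by ordinary least squares and
    returns exp(slope). All picomole values must be positive.
    """
    logs = [math.log(p) for p in picomoles]
    n = len(cycles)
    mean_x = sum(cycles) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, logs))
    return math.exp(sxy / sxx)
```

With synthetic data generated as `2000 * 0.97 ** c` for cycles `c` in `range(1, 41)`, the function returns 0.97 to within rounding error. Real data, such as the points of Fig. 5B, would scatter about the fitted line.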


Fig. 5. Representative amino acid sequencing profiles for the 81-residue tryptic PSM tandem repeat glycopeptide TTTR-PSM-T3. A, plot of the uncorrected peak area data for Gly-PTH as a function of cycle number. B, plot of the base-line corrected specific residue picomole content versus cycle number. C and D, base-line corrected picomole plots for Gly-PTH at 1 and 10 × vertical scales, respectively. E and F, base-line corrected picomole plots for Ala-PTH (□), Val-PTH (+), and Ile-PTH (◇) at 1 and 10 × vertical scales, respectively. Data are taken from sequence determination 2 (Table I) obtained on an Applied Biosystems Procise 494 Peptide Sequencer.



Fig. 6. Ser and Thr residue sequencing profiles for the 81-residue tryptic PSM tandem repeat glycopeptide TTTR-PSM-T3. A and B, base-line corrected picomole plots for PTH-Ser-OH (+) and PTH-Ser-O-GalNAc (□) at 1 and 10 × vertical scales, respectively. C and D, base-line corrected picomole plots for PTH-Thr-OH (+) and PTH-Thr-O-GalNAc (□) at 1 and 10 × vertical scales, respectively. Data are taken from sequence determination 2 (Table I) obtained on an Applied Biosystems Procise 494 Peptide Sequencer.


Table I lists the calculated extent of Ser/Thr glycosylation obtained from the multiple sequencing of the PSM tryptic tandem repeat (TTTR-PSM-T3) and its C-terminal Glu-C glycopeptide (GTTTR-PSM-GII). Sequence determinations 1 and 2 represent data from the same sample sequenced on different instruments. Note the excellent agreement between the two sequencing experiments. Sequence determination 3 represents the tandem repeat obtained from a different PSM preparation. Again, the results of determination 3 are nearly indistinguishable from the results of sequence determinations 1 and 2. The glycosylation patterns of the C-terminal Glu-C tryptic tandem repeat glycopeptide, determinations 4 and 5, also are in good agreement with the data from the full tryptic tandem repeat.
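The per-residue average and S.D. entries of Table I can be illustrated with the values for Ser14 and Thr22. The text does not state which form of standard deviation was tabulated; the population form is assumed here, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def summarize(determinations):
    """Return (average, S.D.) rounded to whole percent.

    Assumption: S.D. is the population standard deviation (pstdev).
    """
    return round(mean(determinations)), round(pstdev(determinations))


print(summarize([62, 65, 71]))   # Ser14 determinations from Table I
print(summarize([101, 83, 85]))  # Thr22 determinations from Table I
```

Residues covered by only one determination have no meaningful S.D., which is why a single value is tabulated for them.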

Table I. Sequence-specific glycosylation patterns of the PSM tandem repeat

The abbreviations used are: TTTR-PSM-T3, oligosaccharide trimmed PSM tryptic tandem repeat, residues 1-81, HPLC purified in Fig. 3A; GTTTR-PSM-GII, protease Glu-C-cleaved TTTR-PSM-T3, residues 39-78, purified as shown in Fig. 3B. For further explanation see text.

Observed Ser/Thr glycosylationᵃ: sequence determinations 1-3 are TTTR-PSM-T3 OG and determinations 4 and 5 are GTTTR-PSM-GII OG. Predicted Ser/Thr glycosylation: NetOglyc valueᵇ, h valueᶜ, and Δ valueᵈ; + and - to the right of a value mark residues predicted to be glycosylated or not glycosylated.

Residue  Determinations (%)  Average OG   NetOglyc   h value    Δ value
                             (S.D.) (%)   value
S2       31 33 35            33 (2)       0.29 -     0.11 -     -0.45 -
S6       97 92 95            95 (2)       0.17 -     0.55 +     0.01 +
S7       94 94 95            94 (0)       0.58 +     0.50 +     0.26 +
S13      94 92 94            93 (1)       0.69 +     0.75 +     0.43 +
S14      62 65 71            66 (4)       0.12 -     0.52 +     0.16 +
S17      87 85 87            86 (1)       0.48 -     0.25 +     0.41 +
T22      101 83 85           90 (8)       0.75 +     0.17 -     -0.07 -
S23      96 88 91            92 (3)       0.84 +     0.32 +     -0.33 -
T29      75 82               79 (4)       0.32 -     0.78 +     0.00 +
T30      79 80 83            81 (2)       0.88 +     0.83 +     0.94 +
S32      96 62 56            71 (18)      0.60 +     0.47 +     0.44 +
S33      97 88 79            88 (4)       0.24 -     0.59 +     0.65 +
T37      88 79 83            83 (4)       0.88 +     0.15 -     0.12 -
T39      91 81 81 92 90      87 (5)       0.45 -     0.56 +     0.35 +
S43      92 92 92 95         93 (1)       0.52 +     0.17 -     0.89 +
S47      67 62 54 59         61 (5)       0.27 -     0.45 +     -0.02 -
T49      81 81 93 83         85 (5)       0.20 -     0.59 +     -0.18 -
T50      80 83 90 80         83 (4)       0.74 +     0.66 +     0.60 +
T52      73 80 92 65         78 (10)      0.34 -     0.38 +     0.08 +
S54      45 23 39 35         36 (8)       0.42 -     0.46 +     0.12 +
S57      92 66 85 92         84 (11)      0.52 +     0.61 +     -0.20 -
S59      88 92 75 83         85 (6)       0.34 -     0.59 +     0.51 +
T60      77 77 105 79        85 (12)      0.82 +     0.85 +     0.21 +
S62      62 48 15            42 (20)      0.32 -     0.80 +     0.49 +
S63      58 66 37            54 (12)      0.79 +     0.87 +     0.50 +
S64      71 75 58            68 (7)       0.55 +     0.91 +     0.17 +
S66      78 87 77            81 (4)       0.40 -     0.82 +     0.87 +
T70      79 107 71           86 (15)      0.56 +     0.48 +     0.28 +
S73      86 40 67            64 (19)      0.48 -     0.30 +     0.14 +
T79      73                  73           0.24 -     0.27 +     -0.02 -
S80      118                 118          0.26 -     0.11 -     -0.61 -
Average                      78

ᵃ OG represents a glycosylated Ser or Thr. Determination numbers refer to different sequencing experiments.
ᵇ Hansen et al. (10).
ᶜ Elhammer et al. (9).
ᵈ Chou (11).

An examination of sequence determinations 1-5, Table I, reveals that Ser2, Ser14, Ser32, Ser47, Ser54, Ser62, and Ser63 are consistently the least glycosylated. As a group, 74% of the Ser residues are glycosylated compared with 83% for the Thr residues, values consistent with the 13C NMR data analysis (data not shown, see Ref. 26). Combined, 78% of the Ser and Thr residues are glycosylated (Table I).
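The group averages can be approximated from the per-residue averages of Table I, as in the sketch below. This is an illustration rather than the authors' calculation: the variable and function names are hypothetical, and because it starts from rounded per-residue averages and includes every tabulated residue, the Ser figure need not match the quoted value exactly.

```python
from statistics import mean

# Average observed glycosylation (%) per residue, copied from Table I.
AVERAGE_OG = {
    "S2": 33, "S6": 95, "S7": 94, "S13": 93, "S14": 66, "S17": 86,
    "T22": 90, "S23": 92, "T29": 79, "T30": 81, "S32": 71, "S33": 88,
    "T37": 83, "T39": 87, "S43": 93, "S47": 61, "T49": 85, "T50": 83,
    "T52": 78, "S54": 36, "S57": 84, "S59": 85, "T60": 85, "S62": 42,
    "S63": 54, "S64": 68, "S66": 81, "T70": 86, "S73": 64, "T79": 73,
    "S80": 118,
}


def group_mean(prefix):
    """Mean glycosylation over residues whose label starts with prefix
    ('S' for Ser, 'T' for Thr, '' for all residues)."""
    return mean(v for k, v in AVERAGE_OG.items() if k.startswith(prefix))


print(f"Thr: {group_mean('T'):.1f}%")
print(f"Ser: {group_mean('S'):.1f}%")
print(f"All: {group_mean(''):.1f}%")
```

The same dictionary can be filtered by value to list the least glycosylated sites.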

The average sequence-specific glycosylation obtained from determinations 1-5 along with the O-glycosylation predictions of Hansen et al. (10) (NetOglyc activity values), Elhammer et al. (9) (h values) and Chou (11) (Δ values) are tabulated in Table I. For the predictions, plus symbols to the right of the value indicate that the residue is predicted to be glycosylated based on the original published cutoff criteria, and minus symbols indicate the residue would not be glycosylated. As evident from the table and visually when the observed versus predicted glycosylations are plotted together (Fig. 7), the predictions do not completely agree with each other nor do they correlate well with the observed glycosylation.


Fig. 7. Plots of predicted versus observed extents of glycosylation for the individual Ser and Thr residues in the PSM tryptic tandem repeat. A, values of h (9) versus observed glycosylation (Table I). Residues with values of h greater than 0.19 (vertical dashed line) are predicted to be O-glycosylated. B, NetOglyc activity values (10) versus observed glycosylation (Table I). Residues with NetOglyc values greater than 0.5 (vertical dashed line) are predicted to be O-glycosylated. C, Δ values (11) versus observed glycosylation (Table I). Residues with Δ values greater than 0 (vertical dashed line) are predicted to be glycosylated.
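The cutoff criteria given in the Fig. 7 legend can be applied mechanically to the scores in Table I. The sketch below is illustrative only: it uses four rows transcribed from Table I, and the names `CUTOFFS`, `rows`, `predicted`, and `calls` are hypothetical. It does not reimplement any of the three published prediction methods; it only thresholds their tabulated scores.

```python
# Cutoffs from the Fig. 7 legend: a residue is predicted to be
# O-glycosylated when its score is greater than the cutoff.
CUTOFFS = {"NetOglyc": 0.5, "h": 0.19, "Delta": 0.0}

# Four example rows transcribed from Table I
# (observed average OG in %, followed by the three prediction scores).
rows = {
    "S2":  {"observed": 33, "NetOglyc": 0.29, "h": 0.11, "Delta": -0.45},
    "S6":  {"observed": 95, "NetOglyc": 0.17, "h": 0.55, "Delta": 0.01},
    "T30": {"observed": 81, "NetOglyc": 0.88, "h": 0.83, "Delta": 0.94},
    "S54": {"observed": 36, "NetOglyc": 0.42, "h": 0.46, "Delta": 0.12},
}


def predicted(method, score):
    """True if the score exceeds the published cutoff for this method."""
    return score > CUTOFFS[method]


# For each residue, record the +/- call of each method.
calls = {
    residue: {method: predicted(method, values[method]) for method in CUTOFFS}
    for residue, values in rows.items()
}

for residue, call in calls.items():
    flags = " ".join(f"{m}:{'+' if c else '-'}" for m, c in call.items())
    print(residue, rows[residue]["observed"], flags)
```

For these four rows the calls match the plus and minus symbols in Table I. Ser54, for example, is only 36% glycosylated yet is called positive by two of the three methods.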



DISCUSSION

O-Linked Glycoprotein Sequencing

These studies demonstrate that the partial deglycosylation of mucin-type O-linked glycoproteins by mild TFMSA yields glycoprotein derivatives with monosaccharide α-GalNAc side chains that are both susceptible to protease digestion and suitable for standard amino acid sequencing. Using this approach the heterogeneously glycosylated 81-residue PSM tryptic tandem repeat has been isolated and quantitatively sequenced, revealing its glycosylation pattern.

There are numerous reports of the use of automated Edman sequencing for the semiqualitative sequencing of O-linked glycoproteins and for the determination of the in vitro glycosylation patterns of the UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase. Sites of glycosylation have typically been estimated by the presence of "blank" cycles (see for example Refs. 23, 35-36). Other studies of the GalNAc transferase have relied on the incorporation of radiolabeled UDP-GalNAc into substrate peptide and scintillation counting of the released products after Edman sequencing (see Refs. 9, 18-19, 34, 37). Few workers have attempted to chromatographically characterize or quantitatively analyze the resultant glycosylated Ser and Thr PTH derivatives obtained from standard Edman sequencing. Abernethy and co-workers (34) have described and partially characterized the α-GalNAc-O-Thr-PTH-derivative derived from the sequencing of a series of glycopeptide acceptors of the GalNAc transferase. Although they failed to demonstrate its use, these workers had suggested that a TFMSA-Edman sequencing approach would be useful for characterizing α-GalNAc-Thr containing glycopeptides (see Footnote 3). In the laboratory of Gooley and co-workers (38-41), an Edman sequencing protocol has been developed, using the more hydrophilic solvent trifluoroacetic acid, for the sequence analysis of O-linked glycoproteins containing intact oligosaccharides (see Footnote 4). Unfortunately, for most glycoproteins with intact oligosaccharide side chains, the presence of heterogeneous oligosaccharide structures complicates the sequence analysis. Thus, quantitation of the extent of glycosylation is still a difficult task. Another drawback of this approach is that the presence of full-length oligosaccharide side chains will interfere with protease digestions, making the isolation of reasonably sized glycopeptides with homogeneous peptide sequences difficult. Therefore, for several reasons, the use of mild TFMSA to trim heterogeneous O-linked oligosaccharide side chains to homogeneous peptide-linked GalNAc residues followed by standard Edman sequencing may be the most effective approach for determining the primary glycosylation pattern of heavily O-glycosylated glycoprotein domains.

PSM Tandem Repeat Glycosylation Pattern

The glycosylation pattern obtained for the PSM tandem repeat reveals that all potential glycosylation sites can be glycosylated, although to different extents. Interestingly, the Ser residues display a wider range of observed glycosylations, ranging from about 30% to nearly 100%, whereas the Thr residues show a much narrower range of glycosylation, ranging between 70 and 90%. The extent of Ser O-glycosylation of the PSM tandem repeat is higher than expected based on the in vitro glycosylation studies of the isolated porcine UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase (14-15). With this enzyme and other GalNAc transferases isolated to date (see Refs. 9, 17-20, 35, 42-43), Ser residues are typically very poor substrates in in vitro glycosylation studies. In contrast, in vivo glycosylation studies reveal higher extents of O-glycosylation at both Ser and Thr (20). As expected the observed PSM glycosylation pattern is consistent with these observations.

As shown in Fig. 7, none of the three most recent peptide O-glycosylation predictive approaches successfully predicted the PSM tandem repeat glycosylation pattern. A comparison of the panels in Fig. 7 suggests that the predictions of Elhammer et al. (9) and Chou (11) may be more useful for mucins, as they correctly predicted the largest number of glycosylated residues. Unfortunately, none of the approaches performed well in predicting the poorly glycosylated residues, although all three predicted the least glycosylated residue, Ser2, to be nonglycosylated. The failure of these methods to predict the PSM glycosylation pattern may arise from several factors, most notably the mixing of glycosylation patterns of globular proteins and mucin-like domains in constructing the algorithms, the lack of accurate and complete glycosylation data, and the possible existence of several tissue-/species-specific transferases with different substrate specificities (21-23).

Having available such a large data base in the PSM tandem repeat (i.e. a total of 31 different glycosylated Ser and Thr residues), an attempt was made to determine whether any specific patterns could be associated with a given sequence's degree of glycosylation. In Tables II and III we have listed, in order of increasing extent of glycosylation, the heptad peptide sequences for each Ser and Thr residue in the tandem repeat. In addition, due to the large number of Ser/Thr dyad sequences (9 pairs) and the presence of a single Ser triad, we have also listed their peptide sequences in Table IV for comparison (see Footnote 5).

Table II.

PSM tandem repeat glycosylation-sequence correlations for Ser

Superscripts indicate absolute relative positions (+/-) to the glycosylated Ser.


Columns: Serine residue; Observed(a) OG (S.D.), %; Ser-OG sequence (positions -3 -2 -1, 0, +1 +2 +3); Ser/Thr dyad(b); ±1 Gly; ±2 Gly; ±3 Gly; Pro; R/E; I/V; Ser/Thr; Second strand consensus(c).

S2 33 (2)    SRI S VAG + +2 +1+1 +3 cee e eec
S54 36 (8)    GTV S GAS + + +1 +2+3 eee e ecc
S62 42 (20)    STG S SSG ++ + + +1+2+2+3 ccc c ccc
S63 54 (12)    TGS S SGS ++ ++ +1+1+3+3 ccc c ccc
S47 61 (5)    VAG S GTT + +3 +3+3 eec c cec
S73 64 (19)    TGA S IGQ ++ +1 +3 ccc e ccc
S14 66 (4)    AVS S GAS + + +2 +1+3 ccc c ccc
S64 68 (7)    GSS S GSP ++ + + +3 +1+2+3 ccc c ccc
S32 71 (18)    TTA S SVG + + +2 +1+2+3 ccc c eec
S66 81 (4)    SSG S PGA + + +1 +2+3 ccc c ccc
S57 84 (11)    SGA S GST + + +2+3+3 eec c ccc
S59 85 (6)    ASG S TGS + + + +2+2+3 ccc c ccc
S17 86 (1)    SGA S QAA + +3 ccc c ccc
S33 88 (7)    TAS S VGV + + +1+3 +1+3 ccc e eee
S23 92 (3)    AGT S GAG + + + + +1 ctc c ccc
S13 93 (1)    PAV S SGA + + +3 +1 +1 ccc c ccc
S43 93 (1)    ARP S VAG + +1 +2 +1 ccc e eec
S7 94 (0)    AGS S GAP + + + +3 +1 ecc c ccc
S6 95 (2)    VAG S SGA + + + +3 +1 eec c ccc
S80d 100 (-)    PET S RIS + +3 +1+2 +2 +1+3 ccc c eee
Average total 74
Average %OG, +(e): 79 73 83 62 88 - 73 -
Average %OG, -(e): 67 76 63 81 68 - 76 -

a Data from Table I.
b ++ indicates a triad sequence.
c Predicted secondary structures; c, coil; e, extended β strand; t, turn (29).
d Single determination value, see Footnote 5.
e Average glycosylation value of sequences containing (+) or lacking (-) the indicated residue(s).

Table III.

PSM tandem repeat glycosylation-sequence correlations for Thr

Superscripts indicate absolute relative positions (+/-) to the glycosylated Thr.


Columns: Threonine residue; Observed(a) OG (S.D.), %; Thr-OG sequence (positions -3 -2 -1, 0, +1 +2 +3); Ser/Thr dyad; ±1 Gly; ±2 Gly; ±3 Gly; Pro; R/E; V/I; Ser/Thr; Second strand consensus(b).

T79c 73 (-)     QPE T SRI + +2 +2+1 +3 +2 ccc c cee
T52 78 (10)     TTG T VSG + + +1 +2+2+3 cee e eee
T29 79 (4)     GPG T TAS + + + +2 +1+3 ccc c ccc
T30 81 (2)     PGT T ASS + + +3 +1+2+3 ccc c ccc
T37 83 (4)     VGV T ETA + +1 +1+3 +2 eee e ecc
T50 83 (4)     SGT T GTV + + + +3 +1+2+3 cce c eee
T49 85 (5)     GSG T TGT + + + + +1+2+3 ccc e cee
T60 85 (12)     SGS T GSS + + + +1+2+3+3 ccc c ccc
T70 86 (15)     PGA T GAS + + +3 +3 ccc c ccc
T39 87 (5)     VTE T ARP +3 +2+1 +3 +2 eee e ccc
T22 90 (8)     AAG T SGA + + + +1 cct c ccc
Average 83
Average %OG, +(d): 82 83 85 80 78 - 81 -
Average %OG, -(d): 84 81 79 84 86 - 84 -

a Data from Table I.
b Predicted secondary structures; c, coil; e, extended β strand; t, turn (29).
c Single determination value, see Footnote 5.
d Average glycosylation value of sequences containing (+) or lacking (-) the indicated residue(s).

Table IV.

PSM tandem repeat dyad/triad glycosylation correlations

Superscripts indicate absolute relative positions (+/-) to the nearest glycosylated Ser or Thr.


Columns: Residue dyads; Individual(a) OG at the first (-0) and second (+0) residue, %; Minimum(b) OG, %; Random(b) OG, %; Maximum(b) OG, %; Sequence (positions -3 -2 -1, -0 +0, +1 +2 +3); ±1 Gly; ±2 Gly; ±3 Gly; Pro; R/E; I/V; Ser/Thr; Second strand consensus(c).

T79S80d 73 100 73 73 73    QPE TS RIS +2 +1+1 +2 +3 ccc cc eee
S32S33 71 88 59 61 71    TTA SS VGV + +1+3 +3 ccc ce eee
S13S14 93 66 59 61 66    PAV SS GAS + +3 +1 +3 ccc cc ccc
T29T30 79 81 60 64 79    GPG TT ASS + + +2 +2+3 ccc cc cce
T49T50 85 83 68 71 83    GSG TT GTV ++ + +3 +2+2 ccc ec eee
S59T60 85 85 70 72 85    ASG ST GSS ++ +2+2+3 ccc cc ccc
T22S23 90 92 82 83 90    AAG TS GAG ++ + cct cc ccc
S6S7 95 94 89 89 94    VAG SS GAP ++ +3 +3 eec cc ccc
Dyad average 84 86 70 72 80
Average ++(e) 79
Average +/-(f) 65
Triad (individual OG at -0, 0, +0; sequence positions -3 -2 -1, -0 0 +0, +1 +2 +3)
S62SS64 42 54 68 0 15 42    STG SSS GSP ++ +3 +2+2+3 ccc ccc ccc

a Data from Table I.
b Extent that each dyad (triad) may be fully glycosylated; minimum, theoretical minimum value (full negative cooperativity between sites); random, value obtained if diglycosylation is random (no cooperativity); and maximum, theoretical maximum value (full positive cooperativity between sites).
c Predicted secondary structures; c, coil; e, extended β strand; t, turn (29).
d Values for Thr79 and Ser80 are single determinations, see Footnote 5.
e Average random glycosylation of sequences containing two Gly at the +1 and -1 positions.
f Average random glycosylation of sequences containing none or one Gly at the +1 or -1 positions.
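
The minimum, random, and maximum values defined in footnote b can be illustrated with a short calculation. The sketch below is an assumption about how such bounds may be computed from the individual site occupancies, not the authors' stated procedure: the minimum is taken as the overlap forced when the sites avoid each other as much as possible, the random value as the product of the individual occupancies, and the maximum as the smallest individual occupancy. The function name `joint_glycosylation` is hypothetical.

```python
def joint_glycosylation(*site_percent):
    """Return (minimum, random, maximum) % of chains with every site glycosylated.

    site_percent: individual glycosylation of each site in the dyad/triad, in %.
    """
    n = len(site_percent)
    # Full negative cooperativity: only the unavoidable overlap remains.
    minimum = max(0.0, sum(site_percent) - 100.0 * (n - 1))
    # No cooperativity: sites are occupied independently.
    independent = 100.0
    for p in site_percent:
        independent *= p / 100.0
    # Full positive cooperativity: limited by the least occupied site.
    maximum = min(site_percent)
    return round(minimum), round(independent), round(maximum)


print(joint_glycosylation(93, 66))      # S13S14
print(joint_glycosylation(85, 83))      # T49T50
print(joint_glycosylation(42, 54, 68))  # S62SS64 triad
```

For these three entries the sketch returns the values listed in Table IV. Because the inputs are already rounded averages, agreement with other tabulated entries should be expected only to within about one percentage point.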

An examination of the Ser peptide data (Table II) suggests several possible trends. Sequences with Gly at positions +2 or -2 from the potential site of glycosylation appear to be associated with a higher degree of glycosylation; Gly at positions +3 or -3 may be associated with reduced glycosylation, and Gly at positions +1 or -1 appears to be glycosylation neutral, as shown by the average glycosylation values at the bottom of Tables II, III, and IV. Sequences containing Pro are usually more heavily glycosylated, with the Pro typically located at the +3 or -3 positions, and sequences with high Ser/Thr contents appear to be more poorly glycosylated. No obvious correlation with the extent of glycosylation beyond the uniform presence of coil and extended secondary structures is found. The presence of large hydrophobic residues, Ile and Val, does not apparently directly correlate with the extent of glycosylation (however, their positions may be important as discussed below). These trends also appear for Thr and the dyad and triad peptide sequences (Tables III and IV), although since their ranges in glycosylation are much smaller the trends are less apparent. It is noteworthy that 7 of the 9 dyad sequences, Table IV, have at least 1 Gly residue at dyad positions +1 or -1 while 4 of the sequences have Gly residues at both +1 and -1 dyad positions (see Footnote 6).
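
The Gly positional averages for Ser can be recomputed from the heptad sequences in Table II. The following Python sketch is illustrative only; the names `SER_HEPTADS`, `has_gly`, and `split_average` are hypothetical, and the heptads and observed values are transcribed from Table II (with Ser80 taken as 100%, as listed there).

```python
# (residue, observed OG in %, heptad from -3 to +3) transcribed from Table II.
SER_HEPTADS = [
    ("S2", 33, "SRISVAG"), ("S54", 36, "GTVSGAS"), ("S62", 42, "STGSSSG"),
    ("S63", 54, "TGSSSGS"), ("S47", 61, "VAGSGTT"), ("S73", 64, "TGASIGQ"),
    ("S14", 66, "AVSSGAS"), ("S64", 68, "GSSSGSP"), ("S32", 71, "TTASSVG"),
    ("S66", 81, "SSGSPGA"), ("S57", 84, "SGASGST"), ("S59", 85, "ASGSTGS"),
    ("S17", 86, "SGASQAA"), ("S33", 88, "TASSVGV"), ("S23", 92, "AGTSGAG"),
    ("S13", 93, "PAVSSGA"), ("S43", 93, "ARPSVAG"), ("S7", 94, "AGSSGAP"),
    ("S6", 95, "VAGSSGA"), ("S80", 100, "PETSRIS"),
]

CENTER = 3  # index of the glycosylated Ser within the heptad


def has_gly(heptad, offset):
    """True if Gly occupies position -offset or +offset relative to the site."""
    return heptad[CENTER - offset] == "G" or heptad[CENTER + offset] == "G"


def split_average(offset):
    """Average OG (%) of sequences with and without Gly at +/-offset."""
    with_gly = [og for _, og, h in SER_HEPTADS if has_gly(h, offset)]
    without = [og for _, og, h in SER_HEPTADS if not has_gly(h, offset)]
    return round(sum(with_gly) / len(with_gly)), round(sum(without) / len(without))


results = {offset: split_average(offset) for offset in (1, 2, 3)}
for offset, (plus, minus) in results.items():
    print(f"Gly at +/-{offset}: with {plus}%, without {minus}%")
```

With these inputs the sketch gives 73/76 for Gly at ±1, 83/63 for Gly at ±2, and 62/81 for Gly at ±3, matching the averages at the bottom of Table II.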

To examine the possibility that the observed extent of glycosylation could be rationalized in terms of specific peptide conformation and structure, we built models for the peptide sequences in Tables II, III, and IV. An extended β-conformation (see Footnote 7) was found to best account for the nearest neighbor and penultimate positional effects of Gly. In a β-conformation, nearest neighbor residues will have their side chains directed away from each other on opposite sides of the peptide backbone, whereas penultimate residues (at positions +2 or -2) will have their side chains adjacent to each other on the same side of the peptide backbone. Thus, the observation that penultimate Gly residues may favor glycosylation suggests that penultimate residues with larger side chains may interfere with glycosylation. Furthermore, since the side chains of residues at positions +1 and -1 would not be expected to sterically interfere with the Ser or Thr side chain, the presence or absence of residues containing bulky side chains at these positions would be expected to have little effect on glycosylation. Thus, Ser2, Ser54, and Ser47 which have N- and C-terminal penultimate residues with side chains are poorly glycosylated, whereas Ser66, Ser57, and Ser17 which have a single penultimate neighbor with side chains are more highly glycosylated. Ser73 having 2 penultimate Gly residues might also be expected to be highly glycosylated; however, it is only moderately glycosylated. The reduced glycosylation of Ser73 is consistent with the results of in vitro glycosylation studies that suggest the added flexibility of multiple Gly residues may reduce glycosylation (see Refs. 15 and 45). All of the single residue Thr sequences in Table III are highly O-glycosylated, and the ranking of Thr52, Thr37, and Thr70 would follow the order of decreasing penultimate residue side chain size. Ser43 and Thr39 appear to be exceptions to the "rule" in that they contain 2 penultimate residues with side chains and are nevertheless very highly glycosylated. We suggest that other factors, such as the presence of Pro residues in their sequences, may be responsible for enhancing their glycosylation, as discussed below.
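
The penultimate residue argument can be restated as a simple count of non-Gly residues at positions -2 and +2 of each heptad. The sketch below is a deliberately crude illustration of that bookkeeping, not a structural model: it treats every non-Gly residue as "having a side chain" regardless of size, and the names `HEPTADS` and `penultimate_side_chains` are hypothetical. Sequences and observed values are transcribed from Tables II and III.

```python
# residue: (observed OG in %, heptad from -3 to +3), from Tables II and III.
HEPTADS = {
    "S2": (33, "SRISVAG"), "S54": (36, "GTVSGAS"), "S47": (61, "VAGSGTT"),
    "S66": (81, "SSGSPGA"), "S57": (84, "SGASGST"), "S17": (86, "SGASQAA"),
    "S73": (64, "TGASIGQ"), "S43": (93, "ARPSVAG"), "T39": (87, "VTETARP"),
}


def penultimate_side_chains(heptad):
    """Count non-Gly residues at positions -2 and +2 (indices 1 and 5)."""
    return sum(1 for index in (1, 5) if heptad[index] != "G")


counts = {res: penultimate_side_chains(seq) for res, (_, seq) in HEPTADS.items()}
for res, (og, seq) in HEPTADS.items():
    print(f"{res}: {seq}  penultimate side chains = {counts[res]}, observed OG = {og}%")
```

The counts reproduce the grouping described in the text: two for Ser2, Ser54, and Ser47; one for Ser66, Ser57, and Ser17; zero for Ser73; and two for the exceptions Ser43 and Thr39, each of which has a Pro in its heptad.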

The unexpectedly high glycosylation of Ser43 may be rationalized in terms of the conformational effects of -1 Pro. Model building reveals that the Pro residue preceding Ser43 can alter the peptide backbone conformation so that the -2 side chain would no longer be adjacent to the Ser side chain. The proposed conformational effects of -1 Pro may explain the elevated incidence for observing Pro at this position at known O-glycosylation sites (9-10, 16, 46). Experimentally, the introduction of a -1 Pro has been shown to enhance the in vitro glycosylation of a human von Willebrand factor dodecapeptide (