(Received for publication, December 19, 1996, and in revised form, January 30, 1997)
From the W. A. Bernbaum Center for Cystic Fibrosis Research,
Departments of

The heterogeneously glycosylated 81-residue
tryptic tandem repeat glycopeptide from porcine submaxillary mucin
(PSM) has been isolated and its glycosylation pattern determined by
amino acid sequencing. Key to these studies is the ability to
trim the structurally heterogeneous PSM oligosaccharide side chains to
homogeneous GalNAc monosaccharide side chains by mild
trifluoromethanesulfonic acid treatment. Trypsin treatment of
trifluoromethanesulfonic acid-treated PSM releases the 81-residue
tandem repeat as an ensemble of 81-residue glycopeptides with different
glycosylation patterns. Automated amino acid sequencing using Edman
degradative chemistry of the repeat was used to determine the extent of
glycosylation of nearly every Ser and Thr residue. The Thr residues are
all highly glycosylated within the range of 73-90%, giving an average
Thr glycosylation of 83%. In contrast, the Ser residues display a wide
range of glycosylations, ranging between 33 and 95%, giving an average Ser glycosylation of 74%. These data are consistent with the known elevated glycosylation of Thr peptides over Ser peptides for the porcine UDP-N-acetylgalactosamine:polypeptide
N-acetylgalactosaminyltransferase. It is also observed that
the extent of glycosylation of the repeat correlates poorly with
published predictive methods. An examination of the sequences
surrounding the glycosylation sites reveals that nearly all of the
highly glycosylated sites have a penultimate Gly residue, whereas those
that are less highly glycosylated have medium to large side chain
penultimate residues. As observed by others, glycosylation also appears
to be modulated by the presence of Pro residues. On the basis of these
findings we suggest that the acceptor peptide binds the transferase in
a

Mucin glycoproteins are heavily O-glycosylated
glycoproteins secreted by higher organisms that serve vital roles,
protecting and lubricating epithelial cell surfaces from biological,
chemical, and mechanical insult. Mucins and mucin-like molecules
attached to membrane and cell surfaces play additional important roles by modulating for example immune response, inflammation, and
tumorigenesis (1-5). The O-glycosylated domains of mucins
and mucin-like glycoproteins typically contain 50-80% carbohydrate
and possess expanded conformations. These regions typically contain
high Ser and Thr contents, commonly composed of polypeptide tandem
repeats containing clusters of Ser and Thr residues. It has been
demonstrated, in the case of mucins, that the O-linked
oligosaccharide side chains, attached via
Several predictive methods, based on the analysis of reported
O-glycosylation sites compiled from protein data bases, are available for estimating the relative propensity for a given Ser or Thr
to be glycosylated (9-11). Application of these predictive methods to
the highly glycosylated mucins may not be fully valid because mucins
and mucin-like molecules were not a significant component of the
"training" data sets, which were dominated by globular
glycoproteins, and where presented, the data on mucin and mucin-like
glycoprotein glycosylation are commonly incomplete or in error.
Furthermore, the propensity for O-glycosylation of Ser and
Thr residues at the surface of globular glycoproteins may vary
considerably from those found in mucin-like glycoproteins, which
presumably serve different structural functions.
In vitro O-glycosylation studies using synthetic peptide
acceptors have also been utilized to determine the propensity for a
given Ser and Thr to be O-glycosylated (12-19).
Unfortunately, the isolated peptide
To begin to address the structural effects of
O-glycosylation and to determine the extent that mucin
O-glycosylation is modulated in vivo in a
site-specific manner, we have undertaken the isolation and
characterization of the porcine submaxillary gland mucin
(PSM)-glycosylated tandem repeat. We have determined the glycosylation
pattern of the isolated 81-residue tandem repeat and have isolated and
characterized several smaller PSM tandem repeat-derived glycopeptides
(24). This work provides the first detailed analysis of the
glycosylation pattern of a mucin and suggests that mucin glycosylation
is modulated by peptide sequence but not entirely as expected by the
existing O-glycosylation prediction algorithms. Based on the
observed glycosylation pattern a model for the GalNAc transferase
peptide binding site has been proposed.
Materials
Methods
Porcine submaxillary gland mucin was
obtained from frozen porcine submaxillary glands, in gram quantities,
as described by Shogren et al. (25).
PSM
was reduced and carboxymethylated by the methods of Gupta and Jentoft
(32) giving R-PSM. R-PSM (1 g/50 ml) was digested with
L-1-tosylamido-2-phenylethyl chloromethyl ketone-trypsin (15 mg/1 g R-PSM) (Worthington) overnight at 37 °C in 50 mM ammonium bicarbonate, pH 8.3. A second aliquot of
trypsin was added to ensure complete digestion and incubated 5-8 h.
Toluene was added to both incubation solutions to prevent microbial
growth. After exhaustive dialysis and the removal of insoluble debris
by centrifugation, trypsinized R-PSM (TR-PSM) was lyophilized.
Lyophilized and
desiccator-dried TR-PSM (~0.75 g in 75-ml Teflon screw-cap tubes) was
reacted for 6-16 h with a TFMSA (50 g)/anisole (15 ml) (Aldrich)
mixture at 0 °C following the approach of Gerken et al.
(26, see also Ref. 27). To reduce heating effects, both the reagents
and lyophilized TR-PSM were chilled in dry ice/ethanol prior to their
mixing. After incubation with occasional vigorous shaking, the reaction
was again chilled, and 1 volume of cold anhydrous diethyl ether was
added. This mixture was slowly added to 125 ml of a frozen slush of
60% pyridine, after which the solution was warmed to room temperature
and extracted with ether. The aqueous phase containing the partially
deglycosylated TR-PSM (TTR-PSM) was dialyzed exhaustively and
lyophilized.
Low molecular weight non-glycosylated peptides were separated from the
glycosylated tandem repeat subunits by gel filtration chromatography on
Sephacryl S200 (Pharmacia Biotech, Uppsala, Sweden) (column dimensions
5 × 55 cm, 7-ml fraction volumes) eluted with 50 mM
ammonium bicarbonate buffer. Glycoprotein content was monitored by
periodic acid-Schiff reagent (28), absorbance at 555 nm, and protein
monitored by the absorbance at 220 nm.
The high molecular weight carbohydrate-containing fraction eluting near
the void volume of the S200 column was lyophilized and treated a second
time with L-1-tosylamido-2-phenylethyl chloromethyl ketone-trypsin (10 mg/g TTR-PSM) using the conditions described for the
initial trypsin treatment. The digested TTR-PSM (TTTR-PSM) was
fractionated on S200, and the PSM tandem repeats were isolated as the
major included glycopeptide fraction (TTTR-PSM-T3) and lyophilized.
The TTTR-PSM-T3 tandem
repeat (30 mg) in 5 ml of 25 mM ammonium bicarbonate, pH
7.8, was digested with 1 mg of protease Glu-C (Boehringer Mannheim) for
20 h at 25 °C. After a second addition of Glu-C and further
digestion, the mixture was fractionated by S200 chromatography. The
major glycopeptide peak (GTTTR-PSM) was separated into the major N- and
C-terminal tandem repeat peptides on reverse phase HPLC.
The TTTR-PSM-T3 tandem repeat and Glu-C-digested repeat
(GTTTR-PSM) were further purified by reverse phase HPLC on a 0.46 × 15 cm C18 ODSII column (Alltech Associates Inc.,
Deerfield, IL) using 0.05% trifluoroacetic acid, water/acetonitrile
gradients as described in the figure legend. All isolations were
performed on a Varian 5000 HPLC system (Varian Associates, Walnut Creek, CA) equipped with a Shimadzu UV/VIS detector.
Pulsed liquid phase Edman degradation
amino acid sequencing of the isolated PSM tandem repeats and
Glu-C-derived glycopeptide was performed on either an Applied
Biosystems 477A or Applied Biosystems Procise 494 protein sequencer
(Perkin-Elmer) typically using standard manufacturer recommended
pulsed-liquid cycles (24). Samples of 2000-5000 pmol were dried on
trifluoroacetic acid washed glass fiber filters (ABI number 401111)
spotted with 1.5 mg of BioBrene Plus (ABI 400385). Amino acid
phenylthiohydantoin (PTH) derivatives were chromatographed on standard
ABI 5-µm C18 PTH columns using the Fast Normal 1 gradient program and
were monitored by the absorbance at 269 nm. The
PTH-Thr/Ser-O-GalNAc derivatives were found to elute as two
diastereomeric peaks at unique positions in the chromatogram (24). Due
to increased peak broadening and changes in elution position as the PTH
columns age, some variability and overlap of the glycosylated PTH
derivatives with Ser, Thr, Gly, and Asp PTH derivatives were commonly
observed. Since the ratio of the areas of the diastereomeric
PTH-Ser/Thr-O-GalNAc derivatives was found to be relatively
constant (typically 45/55 for Thr), the extent of glycosylation usually
could be determined by the use of simple algebra. Modifications in the
HPLC gradient were found to produce only marginal improvement in the
separation of the PTH-Ser/Thr-GalNAc derivatives from the elution
positions of neighboring PTH-derivatives. Prior to data analysis, long
term cycle preview and lag for each PTH-derivative was eliminated by a
base-line subtraction approach. This was performed by subtracting the
5-cycle minimum value running average across the entire sequencing run
for each peak. Due to the length of the peptides and high content of
similar amino acids (i.e. Gly, Ser, Thr, and Ala), no
attempts were made to include quantifications for adjacent residue
cycle lag or preview. Response factors for the
PTH-Ser/Thr-O-GalNAc derivatives were obtained from shorter
glycopeptide sequencing experiments (24) and were found to be similar
to those of Ser and Thr. Response values for Ser, Thr, and their
glycosylated PTH-derivatives were further adjusted for each sequencing
run to obtain consistent picomole yields relative to the non-Ser/Thr residues.
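The quantification steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' actual procedure: it assumes the "5-cycle minimum value running average" can be approximated by a rolling minimum over a centered 5-cycle window, that the "simple algebra" amounts to inferring the total glycosylated PTH area from the one cleanly resolved diastereomer peak and the constant area ratio (45/55 for Thr), and that the extent of glycosylation is the glycosylated yield divided by the total yield at that cycle. All function names and example numbers are hypothetical.

```python
def rolling_min_baseline(areas, window=5):
    """Baseline estimate: minimum within a centered window (an assumed reading of the text)."""
    half = window // 2
    n = len(areas)
    return [min(areas[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def subtract_baseline(areas, window=5):
    """Remove the slowly varying preview/lag background from a per-cycle peak profile."""
    baseline = rolling_min_baseline(areas, window)
    return [a - b for a, b in zip(areas, baseline)]


def total_glyco_area(resolved_peak_area, resolved_fraction=0.55):
    """Infer the summed area of both diastereomer peaks from the resolved one."""
    return resolved_peak_area / resolved_fraction


def percent_glycosylated(glyco_pmol, free_pmol):
    """Extent of glycosylation at one cycle, in percent."""
    total = glyco_pmol + free_pmol
    return 100.0 * glyco_pmol / total if total > 0 else float("nan")


# Hypothetical cycle profile: a single residue peak on a raised background.
corrected = subtract_baseline([5, 6, 20, 6, 5])
```

With a resolved peak that carries 55% of the glycosylated signal, `total_glyco_area` rescales it to the full glycosylated yield before the percentage is computed.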
The PSM tandem repeat sequence was analyzed for
potential O-glycosylation sites using software kindly
provided by Dr. A. Elhammer (9) and by using the E-mail
NetOglyc server of Hansen et al. (10). The
sequence coupled vector projection predictions were kindly performed by
Dr. K. Chou (11). Peptide secondary structure predictions were
performed by the SOPM internet server (29). Predictions were performed
for at least 10 residues beyond the tandem repeat N and C-terminal
boundaries to eliminate end effects.
Peptides were modeled using the Biopolymer
module of InsightII (MSI Inc., formerly Biosym Technologies, San Diego,
CA).
The structure of
porcine submaxillary mucin (PSM) polypeptide based on the nucleotide
sequencing of Timpte et al. (30) and Eckhardt et
al. (31) is shown in Fig. 1A. The
mucin's structure is dominated by the presence of highly
O-glycosylated, multiple repeating 81-residue tandem
repeats. These repeats make up the vast majority of the mucin's amino
acid sequence and represent the major glycosylation sites in PSM.
There are two potential
trypsin cleavage sites, Arg-Ile and Arg-Pro, in each PSM tandem repeat.
The Arg-Pro site, however, is expected to be inactive; therefore,
trypsin treatment is expected to yield single copies of the 81-residue
tandem repeat with the sequence given in Fig. 1B.
Indeed, Timpte and co-workers (30) have shown that after full
deglycosylation and digestion of the fully deglycosylated (apo) mucin
with trypsin, this tryptic peptide is obtained. In contrast, Gupta and
Jentoft (32) have shown that trypsin treatment of fully glycosylated,
reduced and carboxymethylated mucin fails to yield single copies of the
repeat and instead yields relatively high molecular weight
glycopeptides composed of undigested tandem repeats. Apparently, the
longer oligosaccharide side chains inhibit the digestion of the PSM
polypeptide by trypsin, preventing the isolation of individual tandem
repeats. The Sephacryl S200 gel filtration chromatogram of this
species, TR-PSM, is given in Fig. 2A.
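The cleavage logic in this paragraph (trypsin cuts after Arg, but an Arg-Pro bond is expected to be resistant) can be illustrated with a short Python sketch. It uses the commonly applied rule "cleave after Lys or Arg unless the next residue is Pro"; the toy sequence below is hypothetical and is not the PSM repeat of Fig. 1B.

```python
def trypsin_fragments(seq):
    """Split a one-letter peptide sequence after K or R, except when followed by P."""
    fragments = []
    start = 0
    for i, residue in enumerate(seq):
        is_last = i == len(seq) - 1
        if residue in "KR" and not is_last and seq[i + 1] != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments


# Hypothetical 8-residue "repeat" with one Arg-Pro and one Arg-Ile junction,
# concatenated three times to mimic a tandem array.
toy_repeat = "IAGTRPSR"
pieces = trypsin_fragments(toy_repeat * 3)
```

Because the Arg-Pro bond is skipped, the toy tandem array is released as single copies of the repeat, which mirrors the expectation stated for the 81-residue PSM repeat.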
Only after quantitatively trimming the oligosaccharide side chains to
the peptide-linked

The suspected monomeric glycosylated tryptic tandem repeat, pooled as
indicated in Fig. 2C, gave a single sharp peak after rechromatography on S200 as shown in Fig. 2D (
Since each step in the above procedure typically yields a single major
glycopeptide species that is pooled for the subsequent step, the
obtained tandem repeat glycosylation pattern (see below) is expected to
represent the majority of the PSM tandem repeats of the native mucin.
After correcting for the different carbohydrate contents of native and
TFMSA-modified PSM (26), it is calculated that between 40 and 50%
(uncorrected for the nonspecific losses of material at each step) of
the initial TR-PSM peptide is isolated in the pooled TTTR-PSM-T3
fraction. The possibility exists, however, that a subpopulation of
differently glycosylated species may have been excluded, since the
lower molecular weight regions of the glycopeptide peaks in Fig.
2, B and C, were not pooled. These lower
molecular weight species are thought to represent nonspecific (protease
and TFMSA) degradation products of the tandem repeat and other
non-tandem repeat glycopeptides arising from the N- and C-terminal
domains of PSM (see Fig. 1A). Evidence for nonspecific protease degradation arises from the observation that the use of
non-L-1-tosylamido-2-phenylethyl chloromethyl
ketone-treated trypsin gives a T3 peak that is further broadened to
lower molecular weight. By eliminating these lower molecular weight
species we are able to reduce the background sequencing "noise,"
thereby permitting the sequencing of longer segments of the tandem
repeat glycopeptide (discussed below). We believe, however, that these degradation products would not have significantly different
glycosylation patterns compared with the intact tandem repeat.
The PSM tandem repeat contains three
potential cleavage sites for endoproteinase Glu-C from
Staphylococcus aureus V8 (cleaving at C terminus of Glu).
Digestion of the tryptic PSM tandem repeat with Glu-C will therefore
produce three glycopeptides of 38, 40, and 3 residues each, proceeding
from the N to C terminus, respectively. We chose to isolate the
40-residue glycopeptide (residues 39-78) since its analysis would
further help confirm the glycosylation pattern of the C-terminal half
of the tandem repeat.
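The fragment sizes can be checked with a few lines of Python. The sketch assumes that Glu-C cleaves the 81-residue repeat after residues 38 and 78, which is what the residue ranges 1-38 and 39-78 quoted in this paper imply; the function name is hypothetical.

```python
def fragments_from_cuts(length, cut_after):
    """Return (first, last) residue numbers of the fragments produced by the given cuts."""
    bounds = [0] + sorted(cut_after) + [length]
    return [(a + 1, b) for a, b in zip(bounds, bounds[1:]) if b > a]


glu_c_fragments = fragments_from_cuts(81, [38, 78])
fragment_lengths = [last - first + 1 for first, last in glu_c_fragments]
```

Under this assumption the fragments span residues 1-38, 39-78, and 79-81, so the glycopeptide covering residues 39-78 is 40 residues long.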
The HPLC-purified tryptic tandem repeat (Fig. 3A) was
digested with Glu-C and fractionated on S200 chromatography, Fig.
2D. As shown in the figure the Glu-C digest (

On reverse phase HPLC the pooled product of the Glu-C digest (pooled as
indicated in Fig. 2D,

Amino acid sequencing was performed on the reverse
phase HPLC-purified tandem repeat glycopeptides, TTTR-PSM-T3 and
GTTTR-PSM-GII. As described earlier (24, 34-35), unique elution
patterns are observed for the PTH-derivatives of
Sequencing chromatograms were quantified to obtain the residue-specific
extent of glycosylation as illustrated in Figs. 5 and
6 for sequencing run 2 of TTTR-PSM-T3 tandem repeat
glycopeptide. Fig. 5A displays the uncorrected area data for
the Gly-PTH peak plotted as a function of sequence cycle. The figure
shows a pronounced base-line curvature that is also observed for the
other amino acid residues and glycosylated Ser/Thr (data not shown).
Since the extent of base-line curvature correlated with the residue's percent mole fraction, we conclude that the curvature is due to the
cumulative effects of cycle preview and lag and perhaps due to
heterogeneous cleavage of the tandem repeat peptide. Since non-zero
base lines will interfere with the accurate determination of the extent
of glycosylation and with the sequence determination at high cycle
numbers, we eliminated the curvature by the base-line subtraction
approach described under "Experimental Procedures." The
effectiveness of the base-line correction approach is shown in the
corrected data for Gly, Ile, Val, and Ala of Fig. 5, C-F, and for glycosylated and nonglycosylated Ser and Thr of Fig. 6, A-D. Note that after base-line correction the sequence can
be read well beyond residue 60 as demonstrated by the expanded plots of
Fig. 5, D and F, and Fig. 6, B and
D. A plot of the single residue picomoles recovered
versus cycle number, Fig. 5B, indicates that we
have achieved reasonable sequential residue quantification and
recovery. An average apparent repetitive yield of 99% is obtained from
the data in Fig. 5B.
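An apparent repetitive yield of this kind is conventionally estimated from the slope of a log-linear fit of recovered picomoles against cycle number. The paper does not state how its value was computed, so the Python sketch below is only an assumed, standard approach; the data are synthetic.

```python
import math


def repetitive_yield(cycles, pmol):
    """Per-cycle yield from a least-squares fit of ln(pmol) versus cycle number."""
    logs = [math.log(p) for p in pmol]
    n = len(cycles)
    mean_x = sum(cycles) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, logs))
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    return math.exp(sxy / sxx)


# Synthetic run: 1000 pmol initial yield decaying by 1% per cycle.
cycles = list(range(1, 61))
pmol = [1000 * 0.99 ** c for c in cycles]
estimated = repetitive_yield(cycles, pmol)
```

For real sequencing data the points scatter around the fitted line, so the estimate is an average over the residues included in the fit.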
Table I lists the calculated extent of Ser/Thr
glycosylation obtained from the multiple sequencing of the PSM tryptic
tandem repeat (TTTR-PSM-T3) and its C-terminal Glu-C glycopeptide
(GTTTR-PSM-GII). Sequence determinations 1 and 2 represent data from
the same sample sequenced on different instruments. Note the excellent
agreement between the two sequencing experiments. Sequence
determination 3 represents the tandem repeat obtained from a different
PSM preparation. Again, the results of determination 3 are nearly
indistinguishable from the results of sequence determinations 1 and 2. The glycosylation patterns of the C-terminal Glu-C tryptic tandem
repeat glycopeptide, determinations 4 and 5, also are in good agreement
with the data from the full tryptic tandem repeat.
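The per-residue averages and standard deviations in Table I can be reproduced from the individual determinations. The sketch below uses the Table I values for Ser2 and Ser14; the use of the population standard deviation is an inference that happens to match the tabulated S.D. for these two rows and is not stated in the text.

```python
from statistics import mean, pstdev

# Individual determinations (%) taken from Table I.
determinations = {
    "S2": [31, 33, 35],
    "S14": [62, 65, 71],
}

# Rounded mean and population standard deviation for each residue.
summary = {
    residue: (round(mean(values)), round(pstdev(values)))
    for residue, values in determinations.items()
}
```

For these rows the rounded results are 33 (2) for Ser2 and 66 (4) for Ser14, in line with the tabulated averages.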
Table I. Sequence-specific glycosylation patterns of the PSM tandem repeat
The abbreviations used are: TTTR-PSM-T3, oligosaccharide trimmed PSM
tryptic tandem repeat, residues 1-81, HPLC purified in Fig.
3A; GTTTR-PSM-GII, protease Glu-C-cleaved TTTR-PSM-T3,
residues 39-78, purified as shown in Fig. 3B. For further explanation
see text.
Volume 272, Number 15,
Issue of April 11, 1997
pp. 9709-9719
©1997 by The American Society for Biochemistry and Molecular Biology, Inc.
MODEL PROPOSED FOR THE POLYPEPTIDE:GalNAc TRANSFERASE PEPTIDE
BINDING SITE*
Pediatrics, § Biochemistry, and
Molecular and Microbiology, Case Western Reserve University,
Cleveland, Ohio 44106-4948
β-like conformation and that penultimate residue side chain steric
interactions may play a role in determining extent that a given Ser or
Thr is glycosylated. A model for the GalNAc transferase peptide binding
site is proposed.
α-N-acetylgalactosamine
(GalNAc)1 to Ser and Thr, are solely
responsible for their 3-fold expanded peptide chain dimensions (6).
Chemical and NMR studies indicate that on average 75% or more of the
Ser and Thr residues in mucins are glycosylated (7, 8). Little,
however, is known of the actual distribution of the carbohydrate in
these clusters along the mucin polypeptide core. In addition, it is
unknown whether the distribution of oligosaccharides along the peptide
core is random or whether specific Ser/Thr residues are preferentially glycosylated over others. Obtaining this information by peptide mapping
approaches would be a significant analytical undertaking due to the
vast array of glycopeptides with different glycosylation patterns and
oligosaccharide structures that would be needed to be quantitatively
isolated and characterized.
α-N-acetylgalactosaminyltransferases give different
extents of Ser/Thr glycosylation on peptide substrates compared with
that observed in vivo (14, 20). This may be the result of
altered enzyme specificity as a result of inappropriate solution
conditions, absence of cofactors, the presence of more than one
transferase (21-23), or the result of the artifacts resulting from the
use of relatively small peptides containing charged N and C termini.
Only by characterizing and quantifying the specific in vivo
glycosylation of native and/or expressed recombinant proteins and
peptides (20) is it likely that valid and useful data will be obtained
for determining the true in vivo transferase
specificities.
α-GalNAc-Thr and α-GalNAc-Ser were a kind gift of R. Koganti, Biomera Inc. Edmonton, Alberta, Canada. Except where noted, all chemicals and enzyme reagents were obtained from
Sigma or Fisher.
Isolation of PSM Tandem Repeat Glycopeptides
Fig. 1.
Polypeptide structure and tryptic tandem
repeat sequence of the porcine submaxillary gland mucin (PSM).
A, model of the PSM polypeptide based on the nucleotide
sequencing of Timpte et al. (30) and Eckhardt et
al. (31) and the biophysical studies of Gupta and Jentoft (32),
Shogren et al. (6), and Perez-Vilar et al. (33).
The PSM polypeptide consists of a very long highly extended
O-glycosylated domain containing multiple copies,
n, of an 81-residue tandem repeat which is followed by a
relatively small, poorly glycosylated C-terminal domain. The
glycosylated domain consists of several thousand residues comprising 25 or more tandem repeats (31). The C-terminal (and perhaps the
N-terminal) domains have been shown to dimerize, m, thus
accounting for the very large size of native mucin (33). B,
the sequence of the tryptic PSM tandem repeat.
Fig. 2.
Sephacryl S200 gel filtration chromatography
showing the isolation of the PSM tryptic tandem repeat and its
digestion by Glu-C. For A-C protein content
is monitored by the absorbance at 229 nm and carbohydrate content
by periodic acid-Schiff, absorbance at 555 nm. A, S200
chromatogram of trypsinized, reduced, and carboxymethylated PSM
(TR-PSM). B, chromatogram of TR-PSM after partial
deglycosylation by TFMSA at 0 °C giving TTR-PSM (see "Experimental
Procedures"). C, S200 chromatography of the indicated
TTR-PSM fraction in B after digestion with trypsin, yielding
TTTR-PSM. The third peak, T3, represents the tryptic glycosylated PSM tandem repeat. D, S200 chromatogram of the
glycosylated tryptic tandem repeat, T3, from C (absorbance at 229 nm)
and the chromatogram of Glu-C digested T3
(isolated on reverse phase HPLC, Fig. 3A) giving GTTTR-PSM
(absorbance at 229 nm).
α-GalNAc residue by mild TFMSA treatment (26-27),
Fig. 2B, does the PSM glycopeptide core become susceptible to cleavage by trypsin, Fig. 2C, presumably releasing
monomeric glycosylated tryptic tandem repeats (peak
T3).2 Mild TFMSA treatment does not
appreciably degrade the mucin, since on gel filtration untreated and
treated mucin (TR-PSM and TTR-PSM respectively) contain similar high
molecular weight glycosylated peaks, Fig. 2, A and
B. Integrations of the
-carbon resonances of the
13C NMR spectra of native and TTR-PSM confirm that 96 to
97% of the glycosylated Ser and Thr residues of the native mucin
retain intact unsubstituted α-GalNAc residues after mild TFMSA
treatment (data not shown, see Ref. 26).
). On
reverse phase HPLC, Fig. 3A, this peak gives
a single somewhat broadened peak, representing the ensemble of
heterogeneously glycosylated tandem repeats. This peak was pooled for
amino acid sequencing and for subsequent digestion by Glu C (discussed
below). Amino acid sequencing of this peak confirmed the isolation of
the PSM tryptic tandem repeat.
Fig. 3.
Reverse phase HPLC chromatography of the
tryptic PSM tandem repeat before and after Glu-C digestion.
A, chromatogram of the isolated tryptic tandem repeat
(TTTR-PSM-T3) pooled as indicated in Fig. 2C. B,
chromatogram of the Glu-C-digested tryptic tandem repeat (GTTTR-PSM)
pooled as indicated in Fig. 2D giving peaks GTTTR-PSM-GII
and GTTTR-PSM-GIII. Solvent gradients for A and B
were as follows: 0 min, 20% solvent B and 20 min, 50% solvent B. Solvent A, 100% water, 0.05% trifluoroacetic acid; solvent B, 50%
acetonitrile, 50% water, 0.05% trifluoroacetic acid. Vertical
scale represents the absorbance at 220 nm.
) migrates
at a lower molecular weight than undigested tryptic repeat. The
40- and 38-residue glycopeptides are not expected to be resolved on
S200; therefore, the production of a single sharp peak after Glu-C
digestion indicates the repeat has been fully cleaved by the enzyme.
The remaining 3-residue glycopeptide was not specifically isolated or
identified but presumably appears near the included volume of the
column approximately fractions 120-140.
) reveals a complex pattern comprised of two major broad peaks, labeled (GTTTR-PSM)-GII and (GTTTR-PSM)-GIII, as shown in Fig. 3B. Peak GII elutes at a
lower acetonitrile content than the intact tryptic tandem repeat,
whereas peak GIII elutes at a similar acetonitrile content as the
intact tryptic tandem repeat. On the basis of the expected differences in hydrophobicities, the least hydrophobic fraction, GII, was tentatively assigned to the least hydrophobic glycopeptide 39-78 (containing 11 hydrophobic residues: Ala, Pro, Val, and Ile), and the
more hydrophobic fraction, GIII, was tentatively assigned to
glycopeptide 1-38 (containing 15 hydrophobic residues). Proton NMR
spectroscopy, at 600 MHz, of each pooled fraction confirmed their
identities based on their different amino acid compositions (data not
shown). Fraction GTTTR-PSM-GII was unambiguously identified as residues
39-78 on the basis of its amino acid sequence, presented below.
α-GalNAc-Ser and α-GalNAc-Thr as shown in Fig. 4A for
authentic α-GalNAc-Ser and α-GalNAc-Thr. Each glycosylated
PTH-derivative appears as a pair of peaks in the chromatogram (Fig.
4A) because the conversion reaction forming the amino
acid-PTH-derivative produces diastereomers with different retention
times. The α-GalNAc-Ser-PTH-derivatives elute as an unresolved
doublet, labeled S*+S**, early in the gradient near the
position of PTH-Asp, and the α-GalNAc-Thr-PTH diastereomers elute
later in the gradient as resolved peaks T* and
T**, near the positions of PTH-Ser and PTH-Thr,
respectively. These peaks are readily identified in the sequencing
chromatograms of glycopeptide TTTR-PSM-T3 for residues
Ser2, Ser6, and Thr22 as shown in
Fig. 4, B-D. On the basis of the relative sizes
of the glycosylated and nonglycosylated PTH-Ser/Thr derivatives in the
TTTR-PSM-T3, it appears that Ser6 and Thr22 are
more highly glycosylated than Ser2, which seems to be
poorly glycosylated.
Fig. 4.
Representative amino acid sequencing
chromatograms of the α-GalNAc-Ser and α-GalNAc-Thr standards and
TFMSA-treated PSM tryptic tandem repeats. A, chromatogram of
PTH-derivatized α-GalNAc-Ser and α-GalNAc-Thr on the Applied
Biosystems Procise 494 protein sequencer as described under
"Experimental Procedures." B-D, chromatograms for
cycles 2, 6, and 22 representing residues Ser2,
Ser6, and Thr22 from the amino acid sequence
determination 2 (Table I) of the HPLC-purified PSM tryptic tandem
repeat (TTTR-PSM- T3). Vertical scale represents absorbance
at 269 nm.
Fig. 5.
Representative amino acid sequencing profiles
for the 81-residue tryptic PSM tandem repeat glycopeptide TTTR-PSM-T3.
A, plot of the uncorrected peak area data for Gly-PTH as a
function of cycle number. B, plot of the base-line corrected
specific residue picomole content versus cycle number.
C and D, base-line corrected picomole plots for
Gly-PTH at 1 and 10 × vertical scales, respectively. E
and F, base-line corrected picomole plots for Ala-PTH,
Val-PTH (+), and Ile-PTH at 1 and 10 × vertical
scales, respectively. Data are taken from sequence determination 2 (Table I) obtained on an Applied Biosystems Procise 494 Peptide
Sequencer.
Fig. 6.
Ser and Thr residue sequencing profiles for
the 81-residue tryptic PSM tandem repeat glycopeptide TTTR-PSM-T3.
A and B, base-line corrected picomole plots for
PTH-Ser-OH (+) and PTH-Ser-O-GalNAc
at 1 and 10 × vertical scales, respectively. C and
D, base-line corrected picomole plots for PTH-Thr-OH (+) and
PTH-Thr-O-GalNAc at 1 and 10 × vertical
scales, respectively. Data are taken from sequence determination 2 (Table I) obtained on an Applied Biosystems Procise 494 Peptide
Sequencer.
| Residue | Observed Ser/Thr glycosylation (a): individual OG determinations, % (TTTR-PSM-T3, determinations 1-3; GTTTR-PSM-GII, determinations 4-5) | Average OG (S.D.), % | Predicted: NetOglyc value (b) | Predicted: h value (c) | Predicted: Chou value (d) |
|---|---|---|---|---|---|
| S2 | 31, 33, 35 | 33 (2) | 0.29 | 0.11 | 0.45 |
| S6 | 97, 92, 95 | 95 (2) | 0.17 | 0.55+ | 0.01+ |
| S7 | 94, 94, 95 | 94 (0) | 0.58+ | 0.50+ | 0.26+ |
| S13 | 94, 92, 94 | 93 (1) | 0.69+ | 0.75+ | 0.43+ |
| S14 | 62, 65, 71 | 66 (4) | 0.12 | 0.52+ | 0.16+ |
| S17 | 87, 85, 87 | 86 (1) | 0.48 | 0.25+ | 0.41+ |
| T22 | 101, 83, 85 | 90 (8) | 0.75+ | 0.17 | 0.07 |
| S23 | 96, 88, 91 | 92 (3) | 0.84+ | 0.32+ | 0.33 |
| T29 | 75, 82 | 79 (4) | 0.32 | 0.78+ | 0.00+ |
| T30 | 79, 80, 83 | 81 (2) | 0.88+ | 0.83+ | 0.94+ |
| S32 | 96, 62, 56 | 71 (18) | 0.60+ | 0.47+ | 0.44+ |
| S33 | 97, 88, 79 | 88 (4) | 0.24 | 0.59+ | 0.65+ |
| T37 | 88, 79, 83 | 83 (4) | 0.88+ | 0.15 | 0.12 |
| T39 | 91, 81, 81, 92, 90 | 87 (5) | 0.45 | 0.56+ | 0.35+ |
| S43 | 92, 92, 92, 95 | 93 (1) | 0.52+ | 0.17 | 0.89+ |
| S47 | 67, 62, 54, 59 | 61 (5) | 0.27 | 0.45+ | 0.02 |
| T49 | 81, 81, 93, 83 | 85 (5) | 0.20 | 0.59+ | 0.18 |
| T50 | 80, 83, 90, 80 | 83 (4) | 0.74+ | 0.66+ | 0.60+ |
| T52 | 73, 80, 92, 65 | 78 (10) | 0.34 | 0.38+ | 0.08+ |
| S54 | 45, 23, 39, 35 | 36 (8) | 0.42 | 0.46+ | 0.12+ |
| S57 | 92, 66, 85, 92 | 84 (11) | 0.52+ | 0.61+ | 0.20 |
| S59 | 88, 92, 75, 83 | 85 (6) | 0.34 | 0.59+ | 0.51+ |
| T60 | 77, 77, 105, 79 | 85 (12) | 0.82+ | 0.85+ | 0.21+ |
| S62 | 62, 48, 15 | 42 (20) | 0.32 | 0.80+ | 0.49+ |
| S63 | 58, 66, 37 | 54 (12) | 0.79+ | 0.87+ | 0.50+ |
| S64 | 71, 75, 58 | 68 (7) | 0.55+ | 0.91+ | 0.17+ |
| S66 | 78, 87, 77 | 81 (4) | 0.40 | 0.82+ | 0.87+ |
| T70 | 79, 107, 71 | 86 (15) | 0.56+ | 0.48+ | 0.28+ |
| S73 | 86, 40, 67 | 64 (19) | 0.48 | 0.30+ | 0.14+ |
| T79 | 73 | 73 | 0.24 | 0.27+ | 0.02 |
| S80 | 118 | 118 | 0.26 | 0.11 | 0.61 |
| Average | | 78 | | | |

(a) OG represents a glycosylated Ser or Thr. Determinations 1-5 represent different sequencing experiments.
(b) Hansen et al. (10).
(c) Elhammer et al. (9).
(d) Chou (11).
An examination of sequence determinations 1-5, Table I, reveals that Ser2, Ser14, Ser32, Ser47, Ser54, Ser62, and Ser63 are consistently the least glycosylated. As a group, 74% of the Ser residues are glycosylated compared with 83% for the Thr residues, values consistent with the 13C NMR data analysis (data not shown, see Ref. 26). Combined, 78% of the Ser and Thr residues are glycosylated (Table I).
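The Thr figures quoted here can be checked against the averaged values in Table I. The short Python sketch below is illustrative only: it takes the eleven Thr averages as tabulated and computes their unweighted mean and range, which is an assumption about how the summary value was obtained.

```python
from statistics import mean

# Average OG (%) for each Thr residue, as listed in Table I.
thr_average_og = {
    "T22": 90, "T29": 79, "T30": 81, "T37": 83, "T39": 87, "T49": 85,
    "T50": 83, "T52": 78, "T60": 85, "T70": 86, "T79": 73,
}

thr_mean = round(mean(thr_average_og.values()))
thr_range = (min(thr_average_og.values()), max(thr_average_og.values()))
```

The unweighted mean rounds to 83% and the values span 73 to 90%, matching the Thr average and range reported in the text.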
The average sequence-specific glycosylation obtained from
determinations 1-5 along with the O-glycosylation
predictions of Hansen et al. (10) (NetOglyc
activity values), Elhammer et al. (9) (h values)
and Chou (11) (
values) are tabulated in Table I. For the
predictions, plus symbols to the right of the value indicate that the
residue is predicted to be glycosylated based on the original published
cutoff criteria, and minus symbols indicate the residue would not be
glycosylated. As evident from the table and visually when the observed
versus predicted glycosylations are plotted together (Fig.
7), the predictions do not completely agree with each
other nor do they correlate well with the observed glycosylation.
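One simple way to put a number on how well a prediction tracks the observed glycosylation is a Pearson correlation coefficient. The paper does not report such a statistic, so the Python sketch below is purely illustrative; it uses only the first six residues of Table I (average OG and NetOglyc values), and the result for this small subset should not be read as a conclusion about the full data set.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Table I subset: S2, S6, S7, S13, S14, S17.
observed_og = [33, 95, 94, 93, 66, 86]
netoglyc_value = [0.29, 0.17, 0.58, 0.69, 0.12, 0.48]
r_subset = pearson_r(netoglyc_value, observed_og)
```

The same function can be applied to the h values or to the third prediction column to compare the methods on an equal footing.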
values (11) versus observed glycosylation (Table I).
Residues with "
" values greater than 0 (vertical dashed
line) are predicted to be glycosylated.
These studies demonstrate
that the partial deglycosylation of mucin-type O-linked
glycoproteins by mild TFMSA yields glycoprotein derivatives with
monosaccharide α-GalNAc side chains that are both susceptible to
protease digestion and suitable for standard amino acid sequencing.
Using this approach the heterogeneously glycosylated 81-residue PSM
tryptic tandem repeat has been isolated and quantitatively sequenced,
revealing its glycosylation pattern.
There are numerous reports of the use of automated Edman sequencing for
the semiqualitative sequencing of O-linked glycoproteins and
for the determination of the in vitro glycosylation patterns of the UDP-GalNAc:polypeptide
N-acetylgalactosaminyltransferase. Sites of glycosylation
have typically been estimated by the presence of "blank" cycles
(see for example Refs. 23, 35-36). Other studies of the GalNAc
transferase have relied on the incorporation of radiolabeled UDP-GalNAc
into substrate peptide and scintillation counting of the released
products after Edman sequencing (see Refs. 9, 18-19, 34, 37). Few
workers have attempted to chromatographically characterize or
quantitatively analyze the resultant glycosylated Ser and Thr PTH
derivatives obtained from standard Edman sequencing. Abernethy and
co-workers (34) have described and partially characterized the
α-GalNAc-O-Thr-PTH derivative derived from the sequencing of a series of glycopeptide acceptors of the GalNAc transferase. Although they did not demonstrate its use, these workers suggested that a TFMSA-Edman sequencing approach would be useful for
characterizing α-GalNAc-Thr-containing
glycopeptides.3 In the laboratory of Gooley
and co-workers (38-41), an Edman sequencing protocol has been
developed, using the more hydrophilic solvent trifluoroacetic acid, for
the sequence analysis of O-linked glycoproteins containing
intact oligosaccharides.4 Unfortunately,
for most glycoproteins with intact oligosaccharide side chains, the
presence of heterogeneous oligosaccharide structures complicates the
sequence analysis. Thus, quantitation of the extent of glycosylation is
still a difficult task. Another drawback of this approach is that the
presence of full-length oligosaccharide side chains will interfere with
protease digestions making the isolation of reasonably sized
glycopeptides with homogeneous peptide sequences difficult. Therefore,
for several reasons, the use of mild TFMSA to trim heterogeneous
O-linked oligosaccharide side chains to homogeneous
peptide-linked GalNAc residues followed by standard Edman sequencing
may be the most effective approach for determining the primary
glycosylation pattern of heavily O-glycosylated glycoprotein
domains.
The glycosylation pattern obtained for the PSM tandem repeat reveals that all potential glycosylation sites can be glycosylated, although to different extents. Interestingly, the Ser residues display a wider range of observed glycosylations, ranging from about 30% to nearly 100%, whereas the Thr residues show a much narrower range of glycosylation, ranging between 70 and 90%. The extent of Ser O-glycosylation of the PSM tandem repeat is higher than expected based on the in vitro glycosylation studies of the isolated porcine UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase (14-15). With this enzyme and other GalNAc transferases isolated to date (see Refs. 9, 17-20, 35, 42-43), Ser residues are typically very poor substrates in in vitro glycosylation studies. In contrast, in vivo glycosylation studies reveal higher extents of O-glycosylation at both Ser and Thr (20). As expected the observed PSM glycosylation pattern is consistent with these observations.
As shown in Fig. 7, none of the three most recent peptide O-glycosylation predictive approaches was capable of successfully predicting the PSM tandem repeat glycosylation pattern. Inspection of the figure suggests that the predictions of Elhammer et al. (9) and Chou (11) may be more useful for mucins, as they correctly predicted the largest number of glycosylated residues. Unfortunately, none of the approaches performed well in predicting the poorly glycosylated residues, although all three predicted the least glycosylated residue, Ser2, to be nonglycosylated. The inability of the methods to reasonably predict the PSM glycosylation pattern may arise from several factors, most notably the mixing of glycosylation patterns of globular proteins and mucin-like domains in constructing the algorithms, the lack of accurate and complete glycosylation data, and the possible existence of several tissue-/species-specific transferases with different substrate specificities (21-23).
Having available such a large data base in the PSM tandem repeat (i.e. a total of 31 different glycosylated Ser and Thr residues), an attempt was made to determine whether any specific patterns could be associated with a given sequence's degree of glycosylation. In Tables II and III we have listed, in order of increasing extent of glycosylation, the heptad peptide sequences for each Ser and Thr residue in the tandem repeat. In addition, due to the large number of Ser/Thr dyad sequences (9 pairs) and the presence of a single Ser triad, we have also listed their peptide sequences in Table IV for comparison.5
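The construction of the heptad listings can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, the window is taken as three residues on either side of the site, sequence ends are padded with "-", and the short test sequence is invented for the example rather than taken from the PSM repeat.

```python
def heptad_windows(sequence, flank=3):
    """Return (position, residue, window) for every Ser/Thr in a sequence.

    Positions are 1-based. Each window spans `flank` residues on either
    side of the site; ends of the sequence are padded with "-".
    """
    windows = []
    for i, aa in enumerate(sequence):
        if aa in "ST":
            left = sequence[max(0, i - flank):i].rjust(flank, "-")
            right = sequence[i + 1:i + 1 + flank].ljust(flank, "-")
            windows.append((i + 1, aa, left + aa + right))
    return windows

def rank_by_glycosylation(windows, percent_by_position):
    """Order windows by increasing observed percent glycosylation."""
    return sorted(windows, key=lambda w: percent_by_position[w[0]])

# Hypothetical sequence and percentages, for illustration only.
sites = heptad_windows("GASGPTVA")
print(rank_by_glycosylation(sites, {3: 90.0, 6: 40.0}))
```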
An examination of the Ser peptide data (Table II) suggests several
possible trends. Sequences with Gly at positions +2 or −2 from the
potential site of glycosylation appear to be associated with a higher
degree of glycosylation; Gly at positions +3 or −3 may be associated
with reduced glycosylation, and Gly at positions +1 or −1 is apparently
glycosylation neutral, as shown by the average glycosylation
values at the bottom of Tables II, III, and IV. Sequences containing Pro are
usually more heavily glycosylated, with the Pro typically located at
the +3 or −3 positions, and sequences with high Ser/Thr contents
appear to be more poorly glycosylated. No obvious correlation with the
extent of glycosylation beyond the uniform presence of coil and
extended secondary structures is found. The presence of the large
hydrophobic residues Ile and Val does not appear to correlate directly
with the extent of glycosylation (however, their positions
may be important, as discussed below). These trends also appear for the Thr
and the dyad and triad peptide sequences (Tables III and IV), although
the trends are less apparent because their ranges in glycosylation are
much smaller. It is noteworthy that 7 of the 9 dyad sequences (Table
IV) have at least 1 Gly residue at dyad positions +1 or −1, while 4 of
the sequences have Gly residues at both the +1 and −1 dyad
positions.6
To examine the possibility that the observed extent of glycosylation
could be rationalized in terms of specific peptide conformation and
structure, we built models for the peptide sequences in Tables II, III, IV.
An extended β-conformation7 was found to
best account for the nearest neighbor and penultimate positional
effects of Gly. In a β-conformation, nearest neighbor residues will
have their side chains directed away from each other on opposite sides
of the peptide backbone, whereas penultimate residues (at positions +2
or −2) will have their side chains adjacent to each other on the same
side of the peptide backbone. Thus, the observation that penultimate
Gly residues may favor glycosylation suggests that penultimate residues
with larger side chains may interfere with glycosylation. Furthermore,
since the side chains of residues at positions +1 and −1 would not be
expected to sterically interfere with the Ser or Thr side chain, the
presence or absence of residues containing bulky side chains at these
positions would be expected to have little effect on glycosylation.
Thus, Ser2, Ser54, and Ser47 which
have N- and C-terminal penultimate residues with side chains are poorly
glycosylated, whereas Ser66, Ser57, and
Ser17 which have a single penultimate neighbor with side
chains are more highly glycosylated. Ser73 having 2 penultimate Gly residues might also be expected to be highly
glycosylated; however, it is only moderately glycosylated. The reduced
glycosylation of Ser73 is consistent with the results of
in vitro glycosylation studies that suggest the added
flexibility of multiple Gly residues may reduce glycosylation (see
Refs. 15 and 45). All of the single residue Thr sequences in Table III
are highly O-glycosylated, and the ranking of
Thr52, Thr37, and Thr70 would
follow the order of decreasing penultimate residue side chain size.
Ser43 and Thr39 appear to be exceptions to the
"rule" in that they contain 2 penultimate residues with side chains
and are nevertheless very highly glycosylated. We suggest that other
factors, such as the presence of Pro residues in their sequences, may
be responsible for enhancing their glycosylation, as discussed
below.
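The steric argument above can be expressed as a simple rule of thumb. The sketch below assumes an idealized extended β-strand in which side chains strictly alternate sides of the backbone, and it treats Gly (and the "-" padding used for sequence ends) as having no side chain. The function names are hypothetical, and the rule ignores the Pro effects and the Gly flexibility effect noted in the text.

```python
def same_side_as_site(offset):
    """True if a residue at this offset points to the same side as the site.

    In an idealized extended strand side chains alternate, so even offsets
    (such as -2 and +2) share the side of the Ser/Thr side chain.
    """
    return offset % 2 == 0

def penultimate_side_chains(window, center=3):
    """Count residues with side chains at the -2 and +2 positions.

    `window` is a heptad with the potential glycosylation site at index
    `center`. Gly and end padding ("-") are counted as no side chain.
    """
    count = 0
    for offset in (-2, 2):
        if window[center + offset] not in ("G", "-"):
            count += 1
    return count

# Hypothetical heptad, for illustration only: Gly at -2, Val at +2.
print(penultimate_side_chains("AGASGVA"))
```

Under this simplified rule, sites with a count of 2 would be expected to be the most hindered and sites with a count of 0 or 1 less so, which is the ordering the text proposes before the exceptions are considered.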
The unexpectedly high glycosylation of Ser43 may be
rationalized in terms of the conformational effects of a −1 Pro. Model
building reveals that the Pro residue preceding Ser43 can
alter the peptide backbone conformation so that the −2 side chain
would no longer be adjacent to the Ser side chain. The proposed conformational effects of
a −1 Pro may explain the elevated incidence of Pro at this position at known O-glycosylation
sites (9-10, 16, 46). Experimentally, the introduction of a
−1 Pro has been shown to enhance the in vitro glycosylation of a
human von Willebrand factor dodecapeptide (