Originally published In Press as doi:10.1074/jbc.M205851200 on October 22, 2002

J. Biol. Chem., Vol. 277, Issue 51, 49850-49862, December 20, 2002

Mucin Core O-Glycosylation Is Modulated by Neighboring Residue Glycosylation Status

KINETIC MODELING OF THE SITE-SPECIFIC GLYCOSYLATION OF THE APO-PORCINE SUBMAXILLARY MUCIN TANDEM REPEAT BY UDP-GalNAc:POLYPEPTIDE N-ACETYLGALACTOSAMINYLTRANSFERASES T1 AND T2*

Thomas A. Gerken‡§, Jiexin Zhang‡, Jessica Levine, and Åke Elhammer

From the Departments of Pediatrics and Biochemistry, W. A. Bernbaum Center for Cystic Fibrosis Research and University Hospitals Research Institute, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106 and Pharmacia Corporation, Kalamazoo, Michigan 49001

Received for publication, June 12, 2002, and in revised form, October 3, 2002

    ABSTRACT

The influence of peptide sequence and environment on the initiation and elongation of mucin O-glycosylation is not well understood. The in vivo glycosylation pattern of the porcine submaxillary gland mucin (PSM) tandem repeat containing 31 O-glycosylation sites (Gerken, T. A., Gilmore, M., and Zhang, J. (2002) J. Biol. Chem. 277, 7736-7751) reveals a weak inverse correlation of hydroxyamino acid density (and by inference the density of glycosylation) with the extent of GalNAc glycosylation and core-1 substitution. We now report the time course of the in vitro glycosylation of the apoPSM tandem repeat by recombinant UDP-GalNAc:polypeptide α-GalNAc transferases (ppGalNAc transferases) T1 and T2, which confirms these findings. A wide range of glycosylation rates is found, with several residues showing apparent plateaus in glycosylation. An adjustable kinetic model that reduces the first-order rate constants in proportion to neighboring glycosylation status, within plus or minus three residues of the site of glycosylation, was found to reasonably reproduce the experimental rate data for both transferases, including apparent plateaus in glycosylation. The unique, transferase-specific, positional weighting constants reveal information on the peptide/glycopeptide recognition site for each transferase. Both transferases displayed high sensitivities to neighboring Ser/Thr glycosylation, whereas ppGalNAc T2 displayed additional high sensitivities to the presence of nonglycosylated Ser/Thr residues. This is the first demonstration of the ability to model mucin O-glycosylation kinetics, confirming that under the appropriate conditions neighboring glycosylation status can be a significant factor modulating the first step of mucin O-glycan biosynthesis.

    INTRODUCTION

A wide range of secreted and membrane-associated proteins are O-glycosylated at serine and threonine by glycans linked through N-acetylgalactosamine (GalNAc).¹ A significant number of these glycoproteins contain heavily O-glycosylated domains rich in Ser and Thr, of which mucus glycoproteins, commonly called mucins, represent a major class (1). These heavily O-glycosylated domains typically contain 20-30% Ser and Thr, are up to 80% carbohydrate by weight, and are commonly made up of tandemly repeated peptide sequences. Many important biological processes including the protection of epithelial cell surfaces, the immune response, adhesion, inflammation, and tumorigenesis (2-9) appear to be afforded and/or modulated by mucins or glycoproteins containing such mucin-like domains. Protein O-glycosylation may also play important developmental roles (10, 11).

In the Golgi the transfer of GalNAc to Ser/Thr residues by UDP-GalNAc:polypeptide α-GalNAc transferase (ppGalNAc transferase) represents the first step in O-glycan synthesis. Twelve members of the mammalian ppGalNAc transferase family have been described to date, ppGalNAc T1-T12² (9-19). Homologous ppGalNAc transferases have been described in Drosophila and Caenorhabditis elegans (10, 20). Although not well characterized, it is accepted that peptide substrate specificities can vary among the different family members (10, 13, 21-25). In addition, the most characterized transferases (principally ppGalNAc T1 through T4) have activities and specificities that seem to be unpredictably altered by prior peptide substrate glycosylation (26-30), whereas ppGalNAc transferases T7 and T10² show an apparent absolute requirement for prior GalNAc addition for activity (9, 16, 31). The expression of the various ppGalNAc transferase family members having different peptide and/or glycopeptide specificities therefore represents the initial step in the regulation or modulation of O-glycan biosynthesis. Subsequent elongation of O-linked glycans proceeds by the stepwise addition of single sugar residues via a series of substrate-specific Golgi resident transferases (32). Depending on the initial and subsequent substitutions on the GalNAc residue (which may be further modulated by peptide sequence and glycosylation state), a bewildering array of O-linked structures is possible (32).

Central to our understanding of the role and regulation of mucin-type O-glycosylation are the quantitative determination of the site-specific glycosylation pattern of mucins and mucin-like domains and the attempt to relate these patterns to the nature of the surrounding polypeptide sequence. A number of statistically based approaches have been reported that have attempted to predict sites of O-glycosylation; however, these approaches are at best semiqualitative and are thus not capable of predicting the extent of site-specific glycosylation (33-35). Presently no predictive approaches exist for estimating site-specific oligosaccharide side chain structures.

Many studies have been performed both in vivo and in vitro examining the glycosylation of specific peptide sequences with the goals of characterizing the peptide specificities of the ppGalNAc transferases (26, 36-40). To date only the ppGalNAc T1 has been extensively and systematically characterized with respect to peptide sequence (36). The results of these and other in vitro studies and the above statistical analyses all point to the roles of neighboring proline, serine, and threonine residues for enhancing the probability of O-glycosylation. The effects of peptide O-glycosylation on ppGalNAc transferase activity have unfortunately failed to provide any generalized rules for O-glycosylation (26-30).

With the goal of understanding the influence of peptide sequence and structure on mucin O-glycosylation, our laboratory has been systematically characterizing the in vivo site-specific glycosylation pattern of each of the 31 glycosylated residues in the 81-residue tandem repeat of the porcine submaxillary mucin (PSM) (41, 42) (see Fig. 2, panel C, for the tryptic tandem repeat sequence of PSM (43, 44)). Recently we have reported (45) the mono-(α-GalNAc-O-Ser/Thr), di-(β-Gal-1,3-α-GalNAc-O-Ser/Thr), and tri-(α-Fuc-1,2-β-Gal-1,3-α-GalNAc-O-Ser/Thr) saccharide distributions at nearly each individual glycosylation site of the PSM tandem repeats isolated from a group of A blood group minus animals. Our analysis of the glycosylation pattern of these mucins was found to support our earlier suggestions that the O-linked glycan side chain structures and side chain lengths of mucin appear to be modulated in vivo by the density of neighboring, presumably partially glycosylated, hydroxyamino acid residues in the polypeptide sequence. This effect was detected initially for the core-1 structures, formed by the transfer of β-Gal to GalNAc by the core-1 β3-galactosyltransferase (41) and more recently for the peptide GalNAc residues (45).

In this report we have extended our studies to the characterization of the in vitro glycosylation of the apoPSM tandem repeat domain by purified recombinant ppGalNAc transferases T1 and T2. The results show a wide range of rates of GalNAc incorporation with Thr residues incorporating GalNAc at a significantly higher rate than Ser residues. Kinetic models taking into account the reduction of the Ser and Thr first-order rate constants proportionally to the glycosylation status of neighboring residues were found to reasonably reproduce the experimental site-specific rates of GalNAc incorporation for both ppGalNAc transferases. These findings unambiguously confirm the important role that neighboring group glycosylation plays in modulating the initial step of mucin O-glycosylation. Furthermore, the positional weighting constants derived from the models may be used to infer information on the peptide/glycopeptide recognition sites on the different ppGalNAc transferases. This is the first report demonstrating the ability to kinetically model the site-specific glycosylation pattern of a heavily O-glycosylated mucin-like domain.

    MATERIALS AND METHODS

Mucin Substrate Preparation-- Oligomeric PSM tandem repeat glycosylated domains were obtained after trypsinization and gel filtration chromatography of the reduced and carboxymethylated mucin as described previously (41, 42). Domains were fully deglycosylated by mild trifluoromethanesulfonic acid/anisole treatment followed by periodate oxidation and alkaline elimination as described (46). Carbon-13 NMR spectroscopy was used to confirm the complete removal of carbohydrate. Apo-tandem repeat domains were fractionated on Sephacryl S200 in 50 mM (NH4)2CO3 buffer, pH 8.5, and the relatively broad apomucin peak was split into 4 fractions. The highest molecular weight fraction, representing ~1/4 of the preparation, was utilized as an acceptor peptide. In some studies apomucin was further freed of potential contaminating proteases by passage through a 1-ml immobilized aprotinin protease inhibitor column (Sigma) and through a 1-ml column containing a mixture of six different reactive dye-ligand resins (Sigma reactive dye-ligand test kit RDL-6), both in ammonium bicarbonate buffer.

ppGalNAc Transferases-- Purified soluble recombinant bovine ppGalNAc T1 (47) and human ppGalNAc T2 (21, 48) were prepared from Sf9 cells using baculovirus expression vectors as described previously (25, 49). Recombinant viral ppGalNAc T2 vector was a kind gift of Henrik Clausen, University of Copenhagen School of Dentistry. Both transferases revealed single bands on SDS-PAGE (data not shown). Purified stock solutions of ppGalNAc T1 (220 µg/ml) and T2 (145 µg/ml) transferases were stored at -20 °C in 50% glycerol, 100 mM HEPES, pH 7.5. Protein concentrations were determined by the method of Lowry et al. (50) and by quantitative amino acid analysis.

Apomucin Glycosylation by ppGalNAc T1 and T2-- The large scale glycosylation of apoPSM was performed in 0.5- to 1.0-ml volumes containing 5 mg/ml apomucin, 10 mM MnCl2, 0.1 mM EDTA, 100 mM HEPES, pH 7.5, 22 µg/ml ppGalNAc transferase T1 or T2, and 5 mM UDP-[3H]GalNAc in the presence of protease inhibitors. Note that transferase concentration was arbitrarily chosen to conserve limited transferase stocks. The best results were found with the following protease inhibitors (Sigma): 1 mM phenylmethylsulfonyl fluoride, 300 µg/ml phosphoramidon, 10 µg/ml trans-epoxysuccinyl-L-leucylamido(4-guanidino)-butane (E-64), and 0.1 mM 4-(2-aminoethyl)benzenesulfonyl fluoride. In later studies the protease mixtures for mammalian cell and tissue extracts (Sigma P8304) and for poly(His)-tagged proteins (Sigma P8849), which contained the above inhibitors in addition to pepstatin A, bestatin, leupeptin, and aprotinin, were used. Reaction mixtures were allowed to incubate for 5 or 24 h (for ppGalNAc T1 or T2, respectively) at 37 °C and were subsequently transferred to 3.5-kDa molecular weight cut-off dialysis membranes (Spectrum Laboratories, Rancho Dominguez, CA, or Pierce) for dialysis at 4 °C against 2-3 changes of 500 ml of reaction buffer lacking UDP-GalNAc in order to remove free UDP inhibitor. GalNAc transfer was reinitiated at 37 °C by adding again both protease inhibitors and UDP-GalNAc (to 5 mM). This procedure was repeated until the desired net reaction time was reached (up to ~250 h) while adding fresh transferase (11 µg/ml) at approximately the half-lives of each transferase, determined under similar reaction conditions (see the Supplemental Material). After the final incubation the sample was dialyzed against water, lyophilized, and fractionated on Sephacryl S200 in 50 mM ammonium bicarbonate, pH 8.5, or in later experiments in 50 mM acetic acid (pH 4.5 with NH4OH) to reduce microbially derived proteolytic cleavage of the tandem repeat domain. 
Intact and partially cleaved PSM tandem repeat domain glycopeptides were pooled separately.

PSM Tandem Repeat Glycopeptide Isolation-- The partially glycosylated PSM 81-residue tandem repeat glycopeptide was obtained from S200 chromatography after trypsinolysis as described previously (41, 42). The C-terminal portion of the 81-residue tandem repeat glycopeptide (residues 39-78) was obtained after N-terminal biotinylation of the 81-residue tandem repeat, digestion with protease Glu-C, and passage through an immobilized avidin column (Pierce) as described previously (41, 45). Commonly after lengthy incubations with transferase, the oligomeric tandem repeat domain exhibited various degrees of cleavage by contaminating proteases. Such cleavage was significantly reduced with the use of the commercial protease mixtures. These glycopeptides, which were the size of ~80 and 40 residues on gel filtration, were characterized by N-terminal Edman amino acid sequencing. The most common cleavage was found to be at the N terminus of Arg, which gave predominantly the ~81-residue tandem repeat glycopeptide starting at Arg81 and two overlapping ~40-residue glycopeptides beginning at Arg41 and Arg81. Attempts to separate the latter two overlapping 40-residue glycopeptides by reverse phase high pressure liquid chromatography were unsuccessful. The C-terminal peptide, residues 39-78, of the full-length Arg81 tandem repeat glycopeptide was obtained by the above biotinylation procedure. For all time points, except for the 80-h time point with ppGalNAc T2, a unique set of N- and C-terminal glycopeptides (better than 80% a single sequence) could be obtained for sequence analysis. For the 80-h ppGalNAc T2 experiment, the full-length 81-residue tryptic tandem repeat was lost to extensive Arg-N proteolysis prior to obtaining its C-terminal sequence by the biotinylation procedure. 
The glycosylation of its C-terminal fragment was derived mathematically using the multiple sequence data from the combined 40-residue peptides (beginning with Ile1 and Arg41) and from the original full-length tryptic tandem repeat. In other preparations, cleavage sites at the N terminus of Val and the C terminus of Glu78 were observed, the latter because of protease Glu-C contamination. This latter cleavage allowed the determination of the glycosylation of Thr79 and Ser80, which is typically very difficult to obtain (see Fig. S3 in the Supplemental Material). In all experiments the observed site-specific glycosylation of the partially proteolyzed tandem repeat sequences was found to be essentially the same as the glycosylation for the full-length tandem repeat. Subsequent experiments revealed the Sephacryl S200 column in (NH4)2CO3 buffer as the source of the Arg-N proteolytic activity, presumably due to microbial contamination. Changing the column buffer to 50 mM acetic acid, pH 4.5, has eliminated the Arg-N proteolytic activity (data not shown).

Amino Acid Sequencing-- Pulsed liquid phase Edman degradation amino acid sequencing was performed on an Applied Biosystems Procise 494 protein sequencer (Applied Biosystems, Foster City, CA), and site-specific glycosylation was determined as described previously (41, 42, 45). The reproducibility between sequencing runs is relatively good, with standard deviations typically in the range of 5%. Typically two or more sequence determinations were performed for each isolated glycopeptide. Representative sequencing profiles are displayed in Fig. S3 in the Supplemental Material.

Sequence "Density" Determinations and Initial Data Analysis-- Sequence weighted average Ser/Thr and GalNAc density values for the PSM tandem repeat were taken from previous work (41, 45). Statistical analyses were performed using the Pearson product moment correlation procedure in the SigmaStat statistical software package (version 2.0) (SPSS Inc., Chicago, IL). Correlations were deemed significant for p values less than 0.05.

Numerical Simulation of Site-specific Glycosylation Kinetics-- Modeling of the site-specific glycosylation kinetics taking into account neighboring group glycosylation was performed using the Lotus 123 spread sheet software release 9.5 (Lotus Development Co., Cambridge, MA). In the model the instantaneous relative rate of glycosylation of residue i, d[OG]i/dt, is defined as shown in Equation 1,


$$\frac{d[\mathrm{OG}]_i}{dt} = k_{(\mathrm{Ser\ or\ Thr})}\, f(OG+OH)_i\, [\mathrm{OH}]_i \qquad \text{(Eq. 1)}$$
where [OH]i is the free, nonglycosylated fraction of hydroxyamino acid i, and k(Ser or Thr) is the first-order rate constant for the glycosylation of residue i, specific to whether it is a Ser or Thr residue. Values for kSer and kThr are global values and are fixed throughout the simulation. The function defined as f(OG+OH)i represents a unitless multiplier that decreases the rate constant commensurate to the current local glycosylation status, referred to as (OG+OH)i, of the Ser and Thr residues neighboring residue i. Values for (OG+OH)i are obtained from the weighted summation of the glycosylation status of neighboring residues as described below. Out of necessity the values for f(OG+OH)i range from 1.0 to 0, representing no alteration of the rate constant to the full suppression of the rate constant, respectively, based on the modified exponential function described below. In this manner, the rate constant may be continuously modulated as a function of local glycosylation status throughout the course of the simulation.

Values for the local glycosylation status, (OG+OH)i, are defined as the weighted summation of the neighboring hydroxyamino acid residue's fraction of glycosylation [OG]n and fraction free of glycosylation [OH]n, summed over n = +3 to -3 residues adjacent to residue i, as shown in Equations 2 and 3,
$$OG = W_{OG_{-3}}[\mathrm{OG}]_{-3} + W_{OG_{-2}}[\mathrm{OG}]_{-2} + W_{OG_{-1}}[\mathrm{OG}]_{-1} + W_{OG_{+1}}[\mathrm{OG}]_{+1} + W_{OG_{+2}}[\mathrm{OG}]_{+2} + W_{OG_{+3}}[\mathrm{OG}]_{+3} \qquad \text{(Eq. 2)}$$
and
$$OH = W_{OH_{-3}}[\mathrm{OH}]_{-3} + W_{OH_{-2}}[\mathrm{OH}]_{-2} + W_{OH_{-1}}[\mathrm{OH}]_{-1} + W_{OH_{+1}}[\mathrm{OH}]_{+1} + W_{OH_{+2}}[\mathrm{OH}]_{+2} + W_{OH_{+3}}[\mathrm{OH}]_{+3} \qquad \text{(Eq. 3)}$$
In the above equations, the WOGn and WOHn values represent global positional weighting coefficients that can be adjusted to reflect the unique sensitivities of each transferase to local sequence. Values of WOGn and WOHn range from 0 to 1. WOHn and WOGn coefficients for non-hydroxyamino acid residues are defined as zero. Fractional [OH]n and [OG]n values also range between 0 and 1 and are determined numerically from the previous time step of the simulation as described below. The maximum attainable value for each OG or OH glycosylation status summation is 6. (Note that the glycosylation status of residue i does not contribute to the determination of the (OG+OH) value.) This formalism has the flexibility to account for the inhibitory effects of both glycosylated and non-glycosylated neighboring hydroxyamino acid residues based on the individual values of the global positional weighting coefficients WOHn and WOGn.
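The weighted sums of Equations 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet implementation used in this work. The function name `local_status`, the use of `None` to mark non-hydroxyamino acid residues, the treatment of positions beyond the ends of the list as absent, and all numerical values are assumptions made for the example. [OH]n is taken as 1 - [OG]n, as the definitions of the two fractions imply.

```python
# Offsets n = -3..+3 around residue i; n = 0 (residue i itself) is excluded.
OFFSETS = (-3, -2, -1, 1, 2, 3)


def local_status(og, i, w_og, w_oh):
    """Return the (OG, OH) weighted sums of Eqs. 2 and 3 for residue i.

    og   : list of fractional glycosylation values (0..1) per residue,
           with None for residues that are not Ser or Thr.
    w_og : dict mapping offset n -> W_OGn (0..1).
    w_oh : dict mapping offset n -> W_OHn (0..1).
    """
    og_sum = 0.0
    oh_sum = 0.0
    for n in OFFSETS:
        j = i + n
        # Non-hydroxyamino acid neighbors carry zero weight; positions
        # outside the list are skipped (an assumption of this sketch).
        if j < 0 or j >= len(og) or og[j] is None:
            continue
        og_sum += w_og[n] * og[j]            # glycosylated fraction
        oh_sum += w_oh[n] * (1.0 - og[j])    # free fraction
    return og_sum, oh_sum


# Hypothetical seven-residue window centered on residue 3.
example_og = [0.5, None, 1.0, 0.0, None, 0.2, None]
uniform_og = {n: 1.0 for n in OFFSETS}   # illustrative weights only
uniform_oh = {n: 0.5 for n in OFFSETS}   # illustrative weights only
og_status, oh_status = local_status(example_og, 3, uniform_og, uniform_oh)
```

With all six neighbors fully glycosylated and all weights set to 1, the OG sum reaches its stated maximum of 6.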

The f(OG+OH)i function converts the glycosylation status value (OG+OH) into a rate constant multiplier with limits of 1 and 0. The function must be such that the full glycosylation of any single neighboring residue can produce near full inhibition of glycosylation, i.e. as (OG+OH) approaches 1, f(OG+OH) approaches 0. To meet these criteria a modified exponential function, Equation 4 below and plotted in Fig. 1, was devised. Because the maximum value of the local glycosylation status, (OG+OH), will vary depending on the number of Ser and Thr residues in the local sequence, a normalized inverse linear function will not suffice because it will not equally weight the glycosylation of individual Ser or Thr residues in sequences with different numbers of Ser and Thr residues. The f(OG+OH)i function is defined as follows:
$$f(OG+OH)_i = e^{-(OG+OH)_i/\mathrm{C1}_{(\mathrm{Ser\ or\ Thr})}}\,\bigl(1 + \mathrm{C2}\,(OG+OH)_i\bigr) \qquad \text{(Eq. 4)}$$
where C1(Ser or Thr) and C2 are global parameters whose values can be adjusted to optimize the fit of the simulation to the experimental data. The term containing C2 is included to further linearize the curve (note that when C2 = 0 a pure exponential function results). As shown by the black curve in Fig. 1, values for C1 of 0.35 and C2 of 2.2 produce an ideal shaped curve with an initial slope of -1.0 and an ~80% reduction in value at (OG+OH) values of ~1. The function as defined by these constants was found to work well for modeling the Thr residue glycosylation data for both transferases (shown below). It was found after some experimentation, however, that significantly improved results were obtained in the modeling of Ser residues when the C1Ser value was reduced to 0.25, giving the gray curve shown in Fig. 1. This curve is somewhat steeper than that for Thr having an initial slope of approximately -1.5 and giving an ~95% reduction in value at (OG+OH)i values of ~1. As for the Thr residues, this function as parameterized was found to satisfactorily model the Ser data for both ppGalNAc T1 and T2.
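As a quick numerical check of Equation 4, the multiplier can be evaluated directly with the C1 and C2 values quoted above. The function and constant names below are illustrative choices, not part of the original spreadsheet model.

```python
import math

# Parameter values quoted in the text for Equation 4.
C1_THR = 0.35
C1_SER = 0.25
C2 = 2.2


def rate_multiplier(status, c1, c2=C2):
    """Equation 4: f(OG+OH) = exp(-(OG+OH)/C1) * (1 + C2*(OG+OH))."""
    return math.exp(-status / c1) * (1.0 + c2 * status)


# Multiplier at (OG+OH) = 1 for each residue type.
f_thr_at_1 = rate_multiplier(1.0, C1_THR)   # roughly 0.18
f_ser_at_1 = rate_multiplier(1.0, C1_SER)   # roughly 0.06

# Slope of the straight line drawn from (OG+OH) = 0 to (OG+OH) = 0.5.
slope_thr = (rate_multiplier(0.5, C1_THR) - 1.0) / 0.5   # close to -1.0
slope_ser = (rate_multiplier(0.5, C1_SER) - 1.0) / 0.5   # close to -1.4
```

These values agree with the ~80% (Thr) and ~95% (Ser) reductions at (OG+OH) of ~1. The quoted slopes of -1.0 and approximately -1.5 appear to match the straight-line extrapolations to (OG+OH) ~0.5 described in the Fig. 1 legend rather than the analytic derivative at zero, which is C2 - 1/C1.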


Fig. 1.   Plot of the glycosylation rate constant multiplier, f(OG+OH), versus neighboring glycosylation status, (OG+OH), using Equation 4. The black line represents the plot utilized for Thr residues, C1Thr = 0.35 and C2 = 2.2, whereas the gray line represents the plot optimized for Ser residues, C1Ser = 0.25 and C2 = 2.2. The straight lines represent extrapolations of initial plot to values of (OG+OH) ~0.5.

The glycosylation kinetics of each transferase was simulated numerically by using Equation 5 utilizing 360 time steps, Δt. For ppGalNAc T1 each step represented 0.25 h, while for ppGalNAc T2 each step represented 0.9 h. This gave effective simulation times of 90 and 324 h, respectively. In the equation [OG]i,t and [OG]i,t+Δt represent the fractional glycosylation of residue i at time t and at time t + Δt, respectively.
$$[\mathrm{OG}]_{i,\,t+\Delta t} = \Delta t\; k_{(\mathrm{Ser\ or\ Thr})}\, f(OG+OH)_i\, [\mathrm{OH}]_{i,\,t} + [\mathrm{OG}]_{i,\,t} \qquad \text{(Eq. 5)}$$
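The stepping scheme of Equation 5 amounts to an explicit (forward Euler) update in which every residue is advanced using the glycosylation values from the previous time step. The sketch below shows one way to write it. It is not the Lotus 1-2-3 implementation used in this work, and the function names, toy sequences, rate constants, and weighting coefficients are hypothetical values chosen only to make the example run.

```python
import math

OFFSETS = (-3, -2, -1, 1, 2, 3)  # neighbors n = -3..+3, excluding residue i


def rate_multiplier(status, c1, c2):
    """Equation 4."""
    return math.exp(-status / c1) * (1.0 + c2 * status)


def simulate(sequence, k, c1, c2, w_og, w_oh, dt, steps=360):
    """Integrate Eq. 5 for every Ser (S) and Thr (T) in `sequence`.

    k, c1      : dicts keyed by "S" and "T".
    w_og, w_oh : dicts mapping offset n -> weighting coefficient.
    Returns fractional glycosylation per residue (None if not S/T).
    """
    og = [0.0 if aa in "ST" else None for aa in sequence]
    for _ in range(steps):
        prev = list(og)  # all updates use the previous time step
        for i, aa in enumerate(sequence):
            if prev[i] is None:
                continue
            status = 0.0  # (OG+OH)_i from Eqs. 2 and 3
            for n in OFFSETS:
                j = i + n
                if 0 <= j < len(prev) and prev[j] is not None:
                    status += w_og[n] * prev[j] + w_oh[n] * (1.0 - prev[j])
            f = rate_multiplier(status, c1[aa], c2)
            free = 1.0 - prev[i]  # [OH]_i at time t
            og[i] = prev[i] + dt * k[aa] * f * free
    return og


# Hypothetical parameters; only C1 and C2 are values quoted in the text.
K = {"S": 0.01, "T": 0.05}             # h^-1, illustrative
C1 = {"S": 0.25, "T": 0.35}
W_OG = {n: 1.0 for n in OFFSETS}       # illustrative
W_OH = {n: 0.0 for n in OFFSETS}       # illustrative

isolated = simulate("AAATAAA", K, C1, 2.2, W_OG, W_OH, dt=0.25)
clustered = simulate("AAATTTAAA", K, C1, 2.2, W_OG, W_OH, dt=0.25)
```

In this toy case the central Thr of the cluster ends up less glycosylated than the isolated Thr, because its rate constant is progressively reduced as its neighbors acquire GalNAc. With 360 steps of 0.25 h the simulated time is 90 h, as stated for ppGalNAc T1.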

In this model there are a total of 17 adjustable variables. These consist of 12 positional weighting coefficients WOGn and WOHn, two Ser- and Thr-specific rate constants kSer and kThr (with units of fractional residue glycosylation h⁻¹), and the f(OG+OH) constants C1Ser, C1Thr, and C2.³ The implementation and optimization of the model involved the manual adjustment of the above variables until a satisfactory fit (visually, by least square correlation coefficient, r², and by standard deviation determination) was obtained for the glycosylation of the majority of Ser and Thr residues using all experimental time points.
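The two numerical fit criteria can be illustrated with a short helper. The text does not specify exactly how the standard deviation was computed, so the root-mean-square deviation between observed and simulated values used here is an assumption, as are the function name and the made-up data.

```python
import math


def fit_statistics(observed, simulated):
    """Return (r2, sd) comparing simulated with observed glycosylation.

    r2 : squared Pearson correlation coefficient.
    sd : root-mean-square deviation (an assumed definition of the
         "standard deviation" criterion).
    """
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two equal-length series with >= 2 points")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    if var_o == 0.0 or var_s == 0.0:
        raise ValueError("correlation undefined for constant data")
    r2 = cov * cov / (var_o * var_s)
    sd = math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)
    return r2, sd


# Made-up fractional glycosylation values, for illustration only.
r2, sd = fit_statistics([0.1, 0.4, 0.8], [0.2, 0.5, 0.9])
```

Reporting both numbers is useful because r² alone is insensitive to a constant offset: in the made-up example the simulated values track the observed ones perfectly in shape yet sit 0.1 too high, which only the deviation term reveals.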

    RESULTS

Optimization of apoPSM Glycosylation by ppGalNAc T1 and T2-- Native mucin is highly O-glycosylated by GalNAc (~70%); therefore, it was deemed necessary in these studies to follow the in vitro glycosylation to similar high levels of glycosylation, rather than simply obtaining the initial rates of glycosylation as is typically performed. Only from such high glycosylation studies was it anticipated that the full effects of neighboring glycosylation would be revealed on the glycosylation kinetics.

Preliminary studies described in the Supplemental Material indicate that GalNAc incorporation into apoPSM by ppGalNAc T1 reaches a plateau after 5-8 h of incubation (see Fig. S1, panels A and B, in the Supplemental Material) and that this is due to both UDP-GalNAc depletion and UDP product inhibition resulting from the competing UDP-GalNAc hydrolase activity of the transferase. Therefore, to reach high levels of GalNAc incorporation, repetitive incubations were performed, followed by overnight dialysis as described under "Materials and Methods." Incorporation was reinitiated by the addition of fresh UDP-GalNAc and protease inhibitors, and the cycle was repeated. Because the half-life of ppGalNAc T1 is ~5 days at 37 °C (see Fig. S1, panel C, in the Supplemental Material), every 5th day one-half of the initial ppGalNAc T1 was re-added to the incubation mix. Net incubation times of ~5, 15, 35, and 70 h at 37 °C were obtained for ppGalNAc T1. Carbon-13 NMR spectra of the mucin products prior to trypsinolysis clearly indicate the incorporation of GalNAc and demonstrate that Thr residues are glycosylated at a greater rate than Ser residues (see Fig. S2 in the Supplemental Material).

ppGalNAc T2 was shown to maintain linear GalNAc incorporation over at least a 24-h incubation period at 37 °C showing no apparent UDP inhibition (Fig. S1, panels A and B); therefore, repetitive incubations of 24-40 h were performed as described for ppGalNAc T1. Fresh ppGalNAc T2 was added every second dialysis to compensate for the ~2-day half-life of ppGalNAc T2 (Fig. S1, panel C). Because of its roughly one-fifth lower activity relative to ppGalNAc T1, net incubation times of ~40, 80, and 250 h at 37 °C were obtained. Carbon-13 NMR spectra confirmed the incorporation of GalNAc (data not shown).

Site-specific Glycosylation-- The site-specific glycosylation patterns obtained at each time point for both ppGalNAc T1 and T2 are plotted with respect to sequence position in Fig. 2 and by hydroxyamino acid residue type in Fig. S4 of the Supplemental Material (also see Table I of the Supplemental Material for a tabulation of the data). Also plotted in each figure is the previously determined in vivo GalNAc glycosylation for each residue (the right-most light gray bar in each cluster) (45). A wide range of apparent glycosylation rates is displayed by both transferases, with ppGalNAc T1 glycosylating many more residues than ppGalNAc T2. The longest incubation with ppGalNAc T1 (70 h) resulted in an average residue glycosylation of 49%, whereas the longest incubation with ppGalNAc T2 (250 h) gave an average residue glycosylation of 20%. The figures reveal that Thr residues are typically more highly glycosylated than Ser residues, 78 versus 36% and 35 versus 12% for ppGalNAc T1 (70 h of incubation) and T2 (250 h of incubation), respectively, although a number of Ser residues can attain high degrees of glycosylation for both transferases.


Fig. 2.   Glycosylation of the PSM tandem repeat by ppGalNAc T1 and T2. Panels A and B display the glycosylation pattern in sequential order along the tandem repeat for ppGalNAc T1 and ppGalNAc T2, respectively. Increasingly dark gray bars from left to right at each residue represent 5, 15, 35, and 70 h of net incubations with ppGalNAc T1, or 40, 80, and 250 h of net incubations with ppGalNAc T2. The right-most light gray bar for each residue represents the native in vivo glycosylation for each residue as determined previously (45). Data were obtained from the glycosylation data in Table I of the Supplemental Material. Note that the omitted bars in the panels for Thr79 and Ser80 signify the absence of experimental data for these residues. Panel C displays the PSM tandem repeat amino acid sequence.

To examine the extent to which the glycosylation pattern of each transferase may correspond with the native in vivo glycosylation pattern, the plots in Fig. 3, panels A-D, were obtained. For ppGalNAc T1 there are good correlations between the averaged residue-specific glycosylation at the 70-h time point and the native in vivo glycosylation for both Ser and Thr residues, see Fig. 3, panels A and B (r² = 0.45, p = 0.002 for Ser and r² = 0.65, p = 0.002 for Thr). Very weak, if any, correlation is observed for the 250-h glycosylation by ppGalNAc T2 with the native glycosylation as shown by Fig. 3, panels C and D (see Fig. 3 legend for statistics). These observations and the plots in Fig. 2 and Fig. S4 in the Supplemental Material suggest that ppGalNAc T1 may be a major contributor to the in vivo glycosylation of the porcine submaxillary gland mucin, whereas ppGalNAc T2 may play a minor role. This is consistent with immunohistological studies of the porcine salivary gland, demonstrating that ppGalNAc T1 is expressed at a significantly higher level than ppGalNAc T2⁴ using monoclonal antibodies against the human transferases (51). Because the porcine, bovine, and human ppGalNAc T1 transferases have higher than 98% sequence homology, they are expected to have virtually identical enzymatic and immunological properties.⁵ It is likely that those sites not highly glycosylated by ppGalNAc T1 in vitro but highly glycosylated in vivo may be glycosylated in vivo by other ppGalNAc transferases with different peptide/glycopeptide specificities (13).


Fig. 3.   Comparisons of in vitro glycosylation with both in vivo glycosylation and Ser/Thr density. Panels A-D, PSM tandem repeat glycosylation by ppGalNAc T1 and T2 versus native in vivo glycosylation. Panels E-H, in vitro glycosylation versus sequence-specific Ser/Thr density. Panels A and B and E and F represent data obtained from the 70-h ppGalNAc T1 glycosylation data. Panels C and D and G and H represent data from the 250-h ppGalNAc T2 incubation data. Ser/Thr density function values were obtained as described previously (45), and the in vivo glycosylation values are from Ref. 45. Note the different vertical scales for the left panels representing Ser residues. Solid lines in each panel represent the least square fit to the data. Correlation coefficients and p values are as follows: A, r² = 0.45, p = 0.002; B, r² = 0.65, p = 0.002; C, r² = 0.18, p = 0.07; D, r² = 0.09, p = 0.4; E, r² = 0.24, p = 0.03; F, r² = 0.001, p = 0.9; G, r² = 0.23, p = 0.04; H, r² = 0.49, p = 0.02.

Previous studies from our laboratory (41, 45) have suggested relationships between the observed in vivo site-specific glycosylation and the density of the Ser and Thr residues along the PSM polypeptide sequence as quantified by an arbitrarily defined Ser/Thr density function. These relationships were proposed to reflect inhibitory steric effects of neighboring residue glycosylation. Plots of the 70-h ppGalNAc T1 and 250-h ppGalNAc T2 site-specific glycosylation versus the sequence-derived Ser/Thr density are shown in Fig. 3, panels E-H. Consistent with the above studies, an inverse relationship of the extent of glycosylation versus Ser/Thr density is apparent for Ser with ppGalNAc T1 (Fig. 3, panel E) and for both Ser and Thr with ppGalNAc T2 (Fig. 3, panels G and H) (see Fig. 3 legend for statistics). The plot of Thr with ppGalNAc T1 (Fig. 3, panel F) reveals no trends with Ser/Thr density presumably because of the clustering of glycosylation values to a narrow range of very high values.

Modeling Site-specific Glycosylation by Numerical Simulation-- To determine whether the observed individual rates of glycosylation indeed reflect the modulation of glycosylation by neighboring residue effects, we performed numerical simulations using a model (described in detail under "Materials and Methods") that incorporates the incremental effects of neighboring group glycosylation. The simulation was performed such that the Ser and Thr first-order rate constants, kSer and kThr, were decremented during the course of the simulation as a function of changing neighboring residue glycosylation status, defined as f(OG+OH). The model was formulated to be flexible, permitting independent sensitivities, WOGn and WOHn, to both the presence and absence of neighboring glycosylated Ser and Thr residues over a range of plus and minus 3 residues of the site of glycosylation.³ Simulations for each transferase were optimized to reproduce the experimental data by manual iterations of the intrinsic rate constants, kSer and kThr, the 12 sequence-specific positional weighting parameters, WOGn and WOHn, and the Ser and Thr-specific function, f(OG+OH), that relates overall neighboring glycosylation status to a fractional rate constant multiplier, as further described under "Materials and Methods." Goodness of fit was evaluated visually, by least square correlation coefficient, r2, and by standard deviation. For ppGalNAc T1 the four different incubation times give ~120 individual glycosylation values for fitting, and the three time points for ppGalNAc T2 provide a set of ~85 values. Our goal in developing this model was to test whether the experimentally observed glycosylation time course could be reproduced by the inhibitory effects of neighboring group glycosylation. 
We therefore have not submitted the model to an exhaustive mathematical minimization procedure, recognizing in particular that the experimental data are subject to significant errors with respect to the net incubation time, overall transferase activity, and measured extent of glycosylation.
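
The simulation scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: Equations 4 and 5 are not reproduced in this section, so the multiplier function `toy_f`, the linear (non-repeating) treatment of the peptide, the Euler time step, and the toy sequence `TOY_SEQ` are all assumptions made for illustration. Only the ppGalNAc T1 rate constants (0.022 and 0.09 mol fraction h-1) are taken from the text.

```python
import math

def simulate(seq, k_ser, k_thr, w_og, w_oh, f, hours, dt=0.01):
    """Euler-integrate the glycosylated mole fraction of every Ser/Thr in seq.

    seq  : one-letter peptide string (treated as linear, no wrap-around)
    w_og : {offset: weight}, sensitivity to glycosylated neighbors
    w_oh : {offset: weight}, sensitivity to free (nonglycosylated) neighbors
    f    : maps neighbor status to a rate-constant multiplier in [0, 1]
    """
    sites = [i for i, aa in enumerate(seq) if aa in "ST"]
    g = {i: 0.0 for i in sites}              # all sites start unglycosylated
    for _ in range(int(round(hours / dt))):
        nxt = {}
        for i in sites:
            status = 0.0
            for n in (-3, -2, -1, 1, 2, 3):  # neighbors within +/-3 residues
                j = i + n
                if j in g:                   # only Ser/Thr neighbors count
                    status += (w_og.get(n, 0.0) * g[j]
                               + w_oh.get(n, 0.0) * (1.0 - g[j]))
            k = k_ser if seq[i] == "S" else k_thr
            # first-order approach to full occupancy, slowed by f(status)
            nxt[i] = g[i] + dt * k * f(status) * (1.0 - g[i])
        g = nxt
    return g

def toy_f(status):
    # HYPOTHETICAL multiplier; the paper's f(OG+OH) (Equation 4) is not shown here.
    return max(0.0, 1.0 - 0.5 * status)

TOY_SEQ = "GSGGGGGTSG"   # hypothetical peptide: isolated Ser1, adjacent Thr7/Ser8
NO_WEIGHTS = {}
UNIT_W_OG = {n: 1.0 for n in (-3, -2, -1, 1, 2, 3)}

free = simulate(TOY_SEQ, 0.022, 0.09, NO_WEIGHTS, NO_WEIGHTS, toy_f, hours=70)
inhibited = simulate(TOY_SEQ, 0.022, 0.09, UNIT_W_OG, NO_WEIGHTS, toy_f, hours=70)
```

With all weights at zero, every Ser follows the same first-order curve. With the WOG weights set to 1, the Ser adjacent to the rapidly glycosylating Thr falls behind the isolated Ser, which is the qualitative behavior the model is designed to capture.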

Simulation of ppGalNAc T1-- By a series of manual iterations it was possible to obtain values of the Ser and Thr rate constants, kSer and kThr, and the positional weighting coefficients, WOGn and WOHn, capable of reproducing the experimental site-specific glycosylation data as shown by Figs. 4 and 5 and in Figs. S5 and S6 of the Supplemental Material. For ease of presentation, glycosylation values averaged over all four time points (5, 15, 35, and 70 h) are plotted in the right-hand panels of Fig. 4. The figure clearly demonstrates the stepwise improvement of the simulation's fit to the experimental data (right-hand panels) as a function of the inclusion of successive positional weighting values WOGn and WOHn (left-hand panels). Corresponding plots of the predicted glycosylation versus experimental glycosylation for the different weighting patterns in Fig. 4 are given in Fig. S5 of the Supplemental Material.


Fig. 4.   Optimization of the simulated glycosylation of the PSM tandem repeat by ppGalNAc T1 by adjustments of the positional weighting coefficients, WOGn and WOHn. The left panels display the values of WOGn and WOHn (black and gray bars, respectively) used for the simulation (see text for their definitions). The right panels show the comparison of the simulated values (black bars) with the experimental values (gray bars), which for ease of presentation represent the combined average glycosylation values for the 5-, 15-, 35-, and 70-h incubation periods. The simulation was performed using Equation 5, and the values for C1Ser, C1Thr, and C2 are given in Equation 4 and displayed in Fig. 1. First-order rate constants, kSer and kThr, of 0.022 and 0.09 mol fraction h-1, respectively, were used for all simulations. See Fig. S5 of the Supplemental Material for the statistical analysis of the data.


Fig. 5.   Simulated time course of the site-specific glycosylation of the PSM tandem repeat by ppGalNAc T1. The simulation was performed using the optimized kinetic and positional weighting parameters given in Fig. 4F. Panels A and B display the indicated Ser residues, and panels C and D display the indicated Thr residues. Solid lines represent the individual residue simulations that are identified to the right of the curve. Individual data points represent the experimentally obtained values that are identified to residue number at the far right of each panel. Direct comparisons of the experimental and simulated data for each of the 4 experimental time points are displayed in Fig. S6 of the Supplemental Material.

As expected, in the absence of neighboring group inhibition (i.e. all WOGn and WOHn values = 0), the simulation gives a uniform extent of Ser and Thr glycosylation as shown in Fig. 4, panel A (for the average), and Fig. S5, panel A, of the Supplemental Material (for the individual time points). Under these conditions relatively poor correlation coefficients, r2, and S.D. for the simulated versus experimental values are obtained (r2 values of 0.383 and 0.55 and S.D. values of 0.308 and 0.235 for Ser and Thr, respectively).⁶ The systematic inclusion of WOGn values of 1 as shown in the plots in Fig. 4 and Fig. S5, panels A-D, of the Supplemental Material clearly demonstrates an improved fit for both Ser and Thr residues as the effects of neighboring residue glycosylation are incrementally included in the simulation (see Fig. S5 legend for statistical values). Additional changes in the weighting scheme were found to further improve the simulation for both Ser and Thr as shown by panels E and F in Fig. 4 and Fig. S5 of the Supplemental Material. Reducing the WOG-3 and WOG+3 residue weights to 0.5 and the inclusion of a weak free hydroxyamino acid sensitivity, WOH, of 0.2 at the +1 position further improved the fit (Fig. 4, panel E), resulting in a noticeable narrowing of the data point scatter for both Ser and Thr residues (compare Fig. S5, panels D and E, of the Supplemental Material). Optimal r2 and S.D. values for Ser (r2 = 0.881, S.D. = 0.068) and Thr (r2 = 0.662, S.D. = 0.140) were obtained by including additional weak hydroxyamino acid sensitivities, WOH, at positions -3, +1, and +3 as shown in panel F of Fig. 4 and Fig. S5 of the Supplemental Material.
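
The two fit statistics quoted above can be sketched in Python as follows. The text does not spell out how S.D. is defined, so the sketch assumes it is the root-mean-square difference between simulated and experimental values; r2 is taken as the squared Pearson correlation of the paired values. The function names are illustrative.

```python
import math

def r_squared(sim, exp):
    """Squared Pearson correlation between simulated and experimental values."""
    n = len(sim)
    mean_s, mean_e = sum(sim) / n, sum(exp) / n
    sxy = sum((s - mean_s) * (e - mean_e) for s, e in zip(sim, exp))
    sxx = sum((s - mean_s) ** 2 for s in sim)
    syy = sum((e - mean_e) ** 2 for e in exp)
    return sxy * sxy / (sxx * syy)

def residual_sd(sim, exp):
    """ASSUMED definition of S.D.: root-mean-square of (simulated - experimental)."""
    return math.sqrt(sum((s - e) ** 2 for s, e in zip(sim, exp)) / len(sim))
```

Reporting both numbers is useful because a simulation can rank the sites correctly (high r2) while still being systematically offset from the data (large S.D.).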

To confirm that the apparent success of the simulation indeed reflects authentic neighboring group effects and not artifacts of the fitting procedure, we attempted to fit the experimental data to the model after shifting the experimental data for each Ser or Thr to the next Ser or Thr residue, respectively, in the tandem repeat. In this manner the experimental data were effectively removed from their original sequence context, and the simulation was expected to fail. This was observed as no set of positional weighting constants could be obtained that were capable of increasing either Ser or Thr r2 above their initial values, obtained in the absence of neighboring group effects, nor could significant improvements in S.D. values be obtained (data not shown).
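
The shifted-data control can be sketched in Python as below. The helper moves each experimental value to the next residue of the same type; wrapping the last Ser (or Thr) back to the first is an assumption, since the text does not say how the final residue of the repeat was handled, and the function name is illustrative.

```python
def shift_to_next_same_residue(seq, values):
    """Move each site's experimental value to the next Ser (or Thr) in seq.

    seq    : one-letter peptide string
    values : {index: glycosylation value} for Ser/Thr positions
    The last Ser/Thr wraps around to the first one (ASSUMPTION).
    """
    shifted = {}
    for residue in "ST":
        idx = [i for i in sorted(values) if seq[i] == residue]
        for pos, i in enumerate(idx):
            target = idx[(pos + 1) % len(idx)]   # next site of the same type
            shifted[target] = values[i]
    return shifted
```

Because the values keep their overall distribution but lose their sequence context, a fit that still improved on such shifted data would point to an artifact of the fitting procedure rather than a genuine neighboring group effect.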

The full time course of the optimized ppGalNAc T1 simulation is given in Fig. 5, whereas a comparison of the experimental and simulated data at each individual time point (5, 15, 35, and 70 h) is given in Fig. S6 of the Supplemental Material. Considering the inherent experimental errors in the incubation times, transferase activity, and extent of glycosylation, the model reasonably reproduces the site-specific glycosylation for most residues, particularly when visualized at each time point as shown in Fig. S6 of the Supplemental Material.⁷ An interpretation of the optimized weighting parameters suggests that ppGalNAc T1 is highly inhibited by the glycosylation of neighboring residues plus or minus 3 residues of the site of glycosylation and very weakly inhibited by the presence of unsubstituted hydroxyamino acid residues at the +1 position and perhaps at the -3 and +3 positions. The model-derived Ser and Thr first-order rate constants of 0.022 and 0.090 mol fraction/h, representing rate constants of 0.38 and 1.6 µmol of GalNAc (mg of ppGalNAc T1)-1 h-1, respectively, are consistent with previous reports demonstrating that Thr residues are typically significantly more rapidly glycosylated than Ser residues by ppGalNAc T1.

It is particularly satisfying that the model may provide insight into the origins of the experimentally observed glycosylation behavior. For example, Ser43, which is the most rapidly glycosylated Ser (Fig. 5 and Fig. S6 of the Supplemental Material), is the only Ser with no hydroxyamino acid neighbors within plus or minus 3 residues (see Fig. 2 for the PSM tandem repeat sequence), although as discussed below Ser43 is the only hydroxyamino acid residue preceded by a Pro. Ser17, which has a single Ser at the -3 position, is also found highly glycosylated. Ser62 and Ser63, in the Ser62-64 triad, are very poorly glycosylated due to their proximity to the more rapidly glycosylating Thr60, whereas Ser64 is more readily glycosylated due to its increased distance from Thr60 (see Fig. 4, panel F, and Fig. S6 of the Supplemental Material). Even the "dip" in the glycosylation of the Thr49-50 dyad is predicted by the model due to the inhibitory effects of the glycosylation of neighboring Thr52. The model also predicts that the glycosylation of several residues will likely plateau at values less than 100% (i.e. Ser residues 6, 14, 23, 32, 47, 54, 59, 62, 63, and 80) (see Figs. 2 and 5) and that this is again due to the glycosylation of neighboring Thr or Ser residues. Further evidence of the importance of the inhibitory effects of rapidly glycosylating neighboring Thr residues arises from the high dependence of the fit of the Ser residue glycosylation to the value for the Thr rate constant, i.e. when kThr = 0, the Ser residue fit considerably worsens (r2 = 0.498, S.D. = 0.161). In contrast, when there is no Ser glycosylation allowed, kSer = 0, there is essentially no change in the simulation for the Thr residues using the optimized positional weighting parameter values in Fig. 4, panel F (r2 = 0.657 and S.D. = 0.144). 
These results clearly show that for the majority of the residues in the PSM tandem repeat, the observed site-specific time course of ppGalNAc T1 glycosylation can be explained to a large extent on the basis of the inhibitory effects of the glycosylation status of the neighboring hydroxyamino acid residues.

Two residues, Ser2 and Thr79, are predicted to be highly glycosylated by the ppGalNAc T1 simulation but are very poorly glycosylated in vitro and in vivo (Figs. 4 and 5 and Fig. S6 of the Supplemental Material). Ser80, which flanks these residues, is also poorly glycosylated both by the simulation and experiment. These differences are not due to end effects nor to the inability to cleave these sites by trypsin.⁸ Interestingly, Ser2 is in a sequence nearly homologous to Ser43 (Glu-Thr-Ser-Arg-Ile-Ser2-Val-Ala-Gly-Ser versus Glu-Thr-Ala-Arg-Pro-Ser43-Val-Ala-Gly-Ser) which is found to be the most rapidly glycosylated Ser residue in vitro and by the simulation. The high glycosylation of Ser43 may be attributed to its lacking neighboring hydroxyamino acid residues and perhaps to the presence of a preceding Pro residue. Preliminary ppGalNAc T1 studies on heptapeptide analogues of both Ser2 and Ser43 confirm that these peptides display the same differences in propensity for glycosylation as observed in the intact tandem repeat.⁹ However, considerable differences in peptide solubility in aqueous buffers are observed; the Ser2 peptide readily precipitates whereas the Ser43 peptide remains fully soluble. Secondary structure predictions on the PSM tandem repeat (42) indicate that Ser2 is located in a region of predicted extended β-like structure; therefore, the Ser2 region of the tandem repeat (including Thr78 and Ser80), and the Ser2 heptapeptide, may form partially soluble β-sheet-like structures resistant to glycosylation. A similar discrepancy in Ser2 glycosylation between experiment and simulation with ppGalNAc T2 further supports this explanation (see below). We conclude that Ser2 as well as Thr79 and Ser80 may be intrinsically very poor substrates for both ppGalNAc transferases as the result of their altered secondary and tertiary structures. 
Work continues characterizing the secondary structures of the Ser2 and Ser43 peptides and on characterizing the specific role of the neighboring residues in each.

Simulation of ppGalNAc T2-- By manually adjusting the individual Ser and Thr rate constants, kSer and kThr, and the positional weighting coefficients, WOGn and WOHn, as was performed for ppGalNAc T1, it was possible to obtain a good simulation for ppGalNAc T2 as shown by Figs. 6 and 7 and by Figs. S7 and S8 of the Supplemental Material. In the absence of neighboring residue effects (i.e. WOGn and WOHn = 0), the r2 values are 0.020 and 0.445, whereas the S.D. values are 0.429 and 0.374 for Ser and Thr, respectively (data not shown). The systematic inclusion of WOGn values was found to somewhat improve the fit. For example, WOG values of 1 from the -2 through +2 positions give r2 values of 0.089 and 0.440 and S.D. values of 0.285 and 0.260 for Ser and Thr (data not shown). The inclusion of additional WOG values of 1 at the -3 and +3 positions further improved the fit for both Ser and Thr (Fig. 6, panel A, and Fig. S7, panel A, of the Supplemental Material, and see the legend of Fig. S7 for statistical values). Only with the inclusion of values for WOHn does the simulation significantly improve as shown by Fig. 6 and Fig. S7, panels B-E, of the Supplemental Material. With a WOH+1 value of 0.9, the simulation of Thr22 through Thr49 significantly improved (Fig. 6, panel B and Fig. S7, panel B, of the Supplemental Material). Setting all WOHn values to 0.3 (Fig. 6, panel C, and Fig. S7, panel C, of the Supplemental Material) reproduced the pattern of Thr30 through Thr70 and improved the simulation for Ser. Limiting the WOH weights to only the -2 through +2 positions was found to reduce the fit giving r2 values of 0.222 and 0.463, and S.D. values of 0.185 and 0.163 for Ser and Thr, respectively (data not shown). By increasing the WOH+1 weight to 0.9 while maintaining the remaining WOHn values at 0.3, a significant improvement in the simulation for both Ser and Thr was achieved (Fig. 6, panel D, and Fig. S7, panel D, of the Supplemental Material). 
Further adjustment of the WOH values to 0.5 at positions +3 and -3 resulted in the best overall simulation (Fig. 6, panel E, and Fig. S7, panel E, of the Supplemental Material), giving r2 values of 0.532 and 0.778 and S.D. values of 0.116 and 0.092 for Ser and Thr, respectively.


Fig. 6.   Optimization of the simulated glycosylation of the PSM tandem repeat by ppGalNAc T2 by adjustments of the positional weighting coefficients, WOGn and WOHn. The left panels display the values of WOGn and WOHn (black and gray bars, respectively) used for the simulation (see text for their definitions). The right-hand panels show the comparison of the simulated values (black bars) with the experimental values (gray bars), which for ease of presentation represent the combined average glycosylation values for the 40-, 80-, and 250-h incubation periods. The simulation was performed using Equation 5, and the values for C1Ser, C1Thr, and C2 are given in Equation 4 and displayed in Fig. 1. First-order rate constants, kSer and kThr, of 0.0055 and 0.008 mol fraction h-1, respectively, were used for all simulations. See Fig. S7 of the Supplemental Material for the statistical analysis of the data. Note that no experimental data are available for Thr79 and Ser80.


Fig. 7.   Simulated time course of the site-specific glycosylation of the PSM tandem repeat by ppGalNAc T2. The simulation was performed using the optimized kinetic and positional weighting parameters given in Fig. 6E. Panels A and B display the indicated Ser residues, and panels C and D display the indicated Thr residues. Solid lines represent the individual residue simulations that are identified to the right of the curve. Individual data points represent the experimentally obtained values that are identified to residue number at the far right of each panel. Direct comparisons of the experimental and simulated data for each of the three experimental time points are displayed in Fig. S8 of the Supplemental Material.

On the basis of the obtained weighting parameters, ppGalNAc T2 appears to be highly sensitive to neighboring glycosylation as well as to the presence of neighboring nonglycosylated hydroxyamino acid residues, especially at the +1 position. The optimized ppGalNAc T2 first-order Ser and Thr rate constants are ~4- and 10-fold lower relative to ppGalNAc T1 (0.0055 and 0.008 mol fraction h-1 or 0.094 and 0.14 µmol (mg of ppGalNAc T2)-1h-1). These values are consistent with the lower activities of ppGalNAc T2 compared with ppGalNAc T1 reported against the same substrates (21, 23, 25). The lower ratio of kThr/kSer for ppGalNAc T2 compared with ppGalNAc T1 (1.6 versus 4.9 respectively) also follows the same trends observed previously for these transferases (25).

The plot of the experimental ppGalNAc T2 data versus the optimal simulation for Ser (see Fig. S7, panel E, left panel, of the Supplemental Material) shows considerable scatter. An examination of the plots for the individual time points (Fig. S8 of the Supplemental Material) shows that this scatter is primarily due to differences between the time points. At each time point the simulation correctly ranks the observed glycosylation for both Ser and Thr. These time point-dependent variations are attributed to our inability to accurately control transferase activity over the lengthy incubation periods utilized in these experiments. As a final validation of the fitting procedure, attempts to fit the experimental data after shifting the data by one residue were performed as for ppGalNAc T1. Again, no consistent set of parameters was found that would improve the correlation coefficients and standard deviations for both residues.¹⁰

Only a small number of Ser residues are significantly glycosylated in vitro by ppGalNAc T2, i.e. Ser43, Ser7, and Ser17, and these accept GalNAc at rates similar to several Thr residues (Fig. 2, panel B, and Fig. S4, panel B, of the Supplemental Material). This behavior is approximated by the simulation; however, the simulation incorrectly predicts Ser2 and Ser23 to be highly glycosylated (Fig. 6, panel E, Fig. 7, and Fig. S8 of the Supplemental Material). We cannot presently explain the discrepancy for Ser23, but as discussed above for ppGalNAc T1, Ser2 may be poorly glycosylated due to the secondary and/or tertiary structural effects of the peptide.¹¹ Most of the remaining Ser residues appear experimentally to be refractory to glycosylation by ppGalNAc T2 (Figs. 2, panel B, and 7 and Fig. S4, panel B, of the Supplemental Material). In contrast, the Thr residues appear to be capable of further glycosylation. These observations are reproduced, for the most part, by the ppGalNAc T2 simulation (Fig. 7 and Fig. S8 of the Supplemental Material). We conclude that the very low rates of Ser glycosylation observed for ppGalNAc T2 are best explained in terms of the inhibitory effects of neighboring nonglycosylated hydroxyamino acid residues and, to a lesser extent, neighboring glycosylation. This contrasts with ppGalNAc T1, whose glycosylation appears to be dominated by the inhibitory effects of neighboring residue glycosylation. Thus, unlike the ppGalNAc T1 simulation, the inhibition of Thr glycosylation, kThr = 0, does not greatly affect the simulation for Ser (r2 = 0.441, S.D. = 0.130) using the optimized positional weighting parameters of Fig. 6, panel E. Similar to ppGalNAc T1, the elimination of Ser glycosylation, kSer = 0, does not affect the simulation of the Thr (r2 = 0.773, S.D. = 0.096).

Of particular interest is the very high sensitivity of ppGalNAc T2 to nonglycosylated Ser and Thr at the +1 position and the elevated sensitivities at -3 and +3 positions. The former would tend to direct glycosylation to the C-terminal residue in hydroxyamino acid residue dyad sequences, as shown for the Ser6 to Ser7 and Thr29 to Thr30 dyads (see Figs. 2, panel B, 6, panel E, and Fig. S8 of the Supplemental Material). Such preferences are not reported in previous studies (27, 28, 30) on ppGalNAc T2, perhaps due to the limited number of peptides studied and the potential for end effects. The overall high sensitivity of ppGalNAc T2 to the neighboring nonglycosylated hydroxyamino acid residue is not presently understood. Perhaps substrate peptide binding is sufficiently weak that the neighboring hydroxyamino acid residues can compete with the site of glycosylation, thereby reducing the overall efficiency of GalNAc transfer. Regardless, the values of the positional weighting parameters obtained are useful for comparing and contrasting the properties of the different ppGalNAc transferases.

It is of interest to examine the distribution of Ser and Thr residues neighboring each hydroxyamino acid residue in the PSM tandem repeat to determine whether any given positions are over- or under-represented which could result in the skewing of the positional weighting parameters. From the sequence analysis (see Table 2 of the Supplemental Material) the distribution of neighboring hydroxyamino acid residues is remarkably uniform for both Ser and Thr, with each position having between 9 and 11 hydroxyamino acid residues. We conclude that an uneven distribution of hydroxyamino acid residues is not responsible for the weighting parameters obtained for either transferase.
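
The positional tally described above can be sketched in Python as follows. The function counts, for every Ser and Thr in a sequence, how many Ser/Thr neighbors sit at each offset from -3 to +3. The function name is illustrative, the sequence is treated as linear, and any sequence passed to it here is a toy input rather than the actual PSM tandem repeat, which is not reproduced in this section.

```python
def neighbor_position_counts(seq, reach=3):
    """Tally Ser/Thr neighbors at each offset around every Ser and Thr.

    Returns {"S": {offset: count}, "T": {offset: count}} for offsets
    -reach..+reach (excluding 0). The sequence is treated as linear.
    """
    offsets = [n for n in range(-reach, reach + 1) if n != 0]
    counts = {res: {n: 0 for n in offsets} for res in "ST"}
    for i, aa in enumerate(seq):
        if aa not in "ST":
            continue
        for n in offsets:
            j = i + n
            if 0 <= j < len(seq) and seq[j] in "ST":
                counts[aa][n] += 1
    return counts
```

A roughly flat count across offsets, as reported for the PSM tandem repeat, indicates that no single position dominates the data used to fit the weighting parameters.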

    DISCUSSION

Studies were undertaken to examine the activities of purified ppGalNAc transferases T1 and T2 against the porcine submaxillary mucin tandem repeat substrate whose in vivo glycosylation pattern has been extensively characterized (41, 42, 45). It was of interest to determine the extent to which the single transferases could reproduce the observed in vivo glycosylation pattern and to determine whether the inverse correlations of glycosylation by GalNAc with hydroxyamino acid density and presumably glycosylation density would be reproduced in vitro. These studies have successfully addressed both issues, suggesting that ppGalNAc T1 is a major contributor to the mucin's glycosylation and confirming, by a kinetic modeling approach, that local glycosylation status can account for much of the glycosylation behavior of both transferases.

Experimental evidence for neighboring glycosylation decreasing the rate of glycosylation has been reported previously (26, 27, 30) for ppGalNAc T1 through T4 against a range of small glycopeptide substrates. However, these studies did not yield predictive rules, perhaps because of end effects arising from the use of relatively short (5-25 residues) glycopeptide substrates. The present work was performed on the relatively intact oligomeric tandem repeat domains derived from PSM; therefore, end effects should be absent in the final analysis of the glycosylation of the isolated tandem repeat. An additional advantage of our approach is that the rate data for all of the Ser and Thr residues in the PSM tandem repeat are obtained under exactly identical conditions, thereby eliminating additional experimental variables that may interfere with their comparison.

The demonstration that the time course of glycosylation of the PSM tandem repeat can be nearly completely accounted for on the basis of neighboring group glycosylation status, for both ppGalNAc T1 and T2, is particularly interesting as these results seem to contradict the many previous studies demonstrating that the ppGalNAc transferases possess clear substrate preferences related to peptide sequence and composition. Many in vivo and in vitro studies (26, 36-40) have demonstrated the importance of neighboring Pro residues and the modulating effects of charged residues. Statistical data base studies on O-glycosylation also demonstrate a high prevalence for Pro, Ser, and Thr residues neighboring the sites of O-glycosylation (33-35). In addition, several ppGalNAc transferase isoforms have sufficiently different substrate specificities, such that unique transferase-specific peptide acceptors have been identified (13, 21). On the basis of this prior knowledge the only feasible explanation for the success of our modeling is that the PSM tandem repeat, having evolved to be efficiently O-glycosylated, is composed of exceptionally good acceptor sites having nearly identical initial rates of O-glycosylation (indeed as implemented in the model). Only under such conditions would the inhibitory effects of neighboring glycosylation be readily revealed. As we have discussed above, the observation that Ser2 and perhaps Thr79 are far less glycosylated than expected by our kinetic model suggests that these residues may be intrinsically poor acceptors. It is therefore anticipated that additional refinements in the model, taking into account rate constant decreases or enhancements due to specific neighboring residues or peptide sequences, may be required before the glycosylation of other mucin peptide domains can be successfully modeled by this approach.

It should be noted that the inclusion into our model of the inhibitory effects of neighboring nonglycosylated hydroxyamino acid residues in fact introduces a sequence-specific component, effectively recognizing that ppGalNAc transferases may possess unfavorable hydroxyamino acid sequence motifs. On the basis of the modeling, ppGalNAc T2 clearly shows a highly specific sensitivity to the presence of neighboring hydroxyamino acid residues, particularly at the +1 position, which is only weakly observed for ppGalNAc T1. Whether other ppGalNAc transferases exhibit similar sensitivities remains to be determined. Although the above may appear to conflict with the statistical analysis of O-glycosylation sites which suggests the presence of Ser and Thr as predictors of O-glycosylation (33-35), it has been shown that the association of Ser and Thr with O-glycosylation is the result of the statistical clustering of Ser and Thr residues (34).

In summary, a kinetic model based on first principles has been described that is capable of reproducing the in vitro glycosylation of the PSM tandem repeat domain by both ppGalNAc T1 and T2. Key to the model is the reduction of the rate constant in proportion to the neighboring residue glycosylation status. An analysis of the positional sequence weighting coefficients reveals that both ppGalNAc T1 and T2 possess sensitivities to the neighboring glycosylation state up to plus and minus 3 residues of the site of glycosylation, generally in keeping with previous O-glycosylation site analysis (33-35). Each transferase has been found to possess common and unique sensitivities, with ppGalNAc T2 showing significantly higher sensitivity to neighboring nonglycosylated hydroxyamino acid residues than ppGalNAc T1. These findings support our previous in vivo studies that revealed an inverse relationship between the extent of glycosylation and Ser/Thr density (45). For those cases where the model predicts greater glycosylation than experiment, we propose as a plausible explanation the presence of intrinsically poor acceptor substrates. This work demonstrates that in addition to the intrinsic propensity of a substrate for O-glycosylation dictated by peptide sequence and conformation, the glycosylation states of neighboring residues play equally important roles in determining mucin O-glycosylation.

    ACKNOWLEDGEMENTS

We thank Marc Gilmore for technical assistance and Drs. Himan Sternlicht, Vernon Anderson, Eckard Jankowsky, and Frank Sonnichsen for the helpful discussions.

    FOOTNOTES

* This work was supported by NCI Grant RO1-CA-78834 from the National Institutes of Health (to T. A. G.). The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The on-line version of this article (available at http://www.jbc.org) contains Supplemental Methods and Results, Supplemental Tables 1 and 2, and Supplemental Fig. S1-S8.

‡ Both authors contributed equally to this work.

§ To whom correspondence should be addressed: Dept. of Pediatrics, Case Western Reserve University School of Medicine. BRB, 2109 Adelbert Rd., Cleveland, OH 44106-4948. Tel. 216-368-4556; Fax: 216-368-4223; E-mail: txg2@po.cwru.edu.

Published, JBC Papers in Press, October 22, 2002, DOI 10.1074/jbc.M205851200

2 Note that there are two different mammalian ppGalNAc T9s described in the literature: Toba and co-workers (18) and Ten Hagen and co-workers (9). The latter should perhaps be referred to as ppGalNAc T10.

3 Note that in the implementation of the model, the possibility for separate positional weighting values for Ser and Thr residues and the inclusion of residues plus and minus 4 from the site of glycosylation were allowed. It has been found, however, that these additional variables do not typically improve the simulation and are therefore not included in the present work.

4 U. Mandel and H. Clausen (University of Copenhagen), personal communication.

5 The amino acid sequence of the bovine ppGalNAc T1 (47) used in these studies is 99.3% homologous (555 of 559 residues) to the porcine transferase (52), whereas the porcine transferase is 98.7% homologous (552 of 559 residues) to the human transferase (48). To date the sequence of the porcine ppGalNAc T2 homologue has not been reported, although one with high homology would be expected (see Ref. 10). On the basis of the recent work of Schwientek et al. (10), one would expect identical substrate preferences for the homologous enzymes across the species studied in this work.

6 The statistical analysis was performed using data obtained from all time points and all residues except for Ser2 and Thr79 which appear to be outliers. As discussed in the text, the glycosylation of Ser2 and Thr79 clearly appears to be affected by additional factors.

7 The concordance of the predicted plateaus with the experimental data is not always good. This may be due to both errors in the primary experimental data or inaccuracies in the model such as the nature of the arbitrary f(OG+OH) function (Equation 4) that determines the rate constant multiplier based on local glycosylation status. Regardless, the overall correspondence of the experiment data and the predictions of the model consistently show that the rates of glycosylation will be significantly and systematically reduced, solely on the basis of the glycosylation status of neighboring residues.

8 Previous studies from our laboratory (41, 42