Discovery of the Shortest Sequence Motif for High Level Mucin-type O-Glycosylation*

The consensus primary amino acid sequence for mucin-type O-glycosylation sites has not been identified. To determine the shortest motif sequence required for high level mucin-type O-glycosylation, we prepared more than 100 synthetic peptides and assayed in vitro O-GalNAc transfer to serine or threonine in these peptides using a bovine colostrum UDP-N-acetylgalactosamine:polypeptide N-acetylgalactosaminyl transferase (O-GalNAcT). We chose the sequence PDAASAAP from human erythropoietin (hEPO) for further systematic substitutions because it accepted GalNAc and was a fairly simple sequence consisting only of four kinds of amino acids. Several substitutions showed that threonine is ∼40-fold better than serine as the glycosylated amino acid and a proline at position +3 on the C-terminal side is very important. To define the effect of proline residues around the glycosylation site, we analyzed a series of peptides containing one to three proline residues in a parent peptide AAATAAA. The results clearly indicated that prolines at positions +1 and +3 had a positive effect. The O-GalNAc transfer level of AAATPAP was increased approximately 90-fold from AAATAAA. The deletion of amino acids from the N-terminal side of the glycosylation site suggested that five amino acids from position −1 to +3 were especially important for glycosylation. Moreover, the influence of all 20 amino acids at positions −1, +2, and +4 was analyzed. Uncharged amino acids were preferred at position −1, and small or positively charged amino acids were preferred at position +2. No preference was observed at position +4. We propose a mucin-type O-glycosylation motif, XTPXP, which may be suitable as a signal for protein O-glycosylation. The features observed in this study also appear to be very useful for prediction of mucin-type O-glycosylation sites in glycoproteins.

Glycosylation is an important post-translational modification of proteins in eukaryotic cells and is divided broadly into three categories: N-linked, O-linked, and glycosylphosphatidylinositol-anchored types (1). Mucin-type sugar chains are the principal structure in various O-linked oligosaccharides. The initial transfer of an N-acetylgalactosamine (GalNAc) from UDP-GalNAc to the hydroxyl group of serine or threonine is catalyzed by UDP-GalNAc:polypeptide α1,O-N-acetylgalactosaminyl transferase (O-GalNAcT; EC 2.4.1.41) in the cis-Golgi compartment (2). To date, at least three isozymes for the enzyme have been found by purification (3-8, 11) and cDNA cloning (7-13). Although the consensus amino acid sequence Asn-Xaa-Ser/Thr (Xaa ≠ Pro) is extensively known in the case of N-glycosylation sites, such a consensus primary amino acid sequence has not been identified among the mucin-type O-glycosylation sites. However, because mucin-type O-glycosylation occurs only at specific serine or threonine residues, it seems that a certain primary sequence or three-dimensional structure must be required for the recognition by O-GalNAcT.
Proline residues are found with considerable frequency around glycosylation sites and may play an important role in recognition by O-GalNAcT. Statistical studies by Wilson et al. (14), O'Connell et al. (15), and Hansen et al. (16) suggested that prolines occur most commonly at position −1 and +3 from the glycosylated residue (number with − or + refers to a position of amino acid that is N-terminal or C-terminal to a glycosylated amino acid). Elhammer et al. (17) suggested that all proline residues from −4 to +4 may be acceptable for O-glycosylation. Pisano et al. (18) proposed four O-glycosylation motifs from the identification of sites in human glycophorin A. One of them, XPXX (at least one X is a glycosylated threonine residue), indicates that prolines at position −1, −2, and +1 may be important. However, these findings have not been fully supported by biochemical studies of the effect of proline residues at each position around the O-glycosylation sites.
Some attempts to analyze the influence of the primary sequence using synthetic peptides in vitro have been reported. This method is useful for identifying sequences capable of being glycosylated and for determining the efficiency of O-GalNAc transfer to a certain sequence. By using synthetic peptides derived from bovine myelin, Young et al. (19) showed that the sequences containing TPPP are good substrates and suggested that a triproline sequence C-terminal to threonine might be sufficient for mucin-type O-glycosylation. However, it has also been reported that peptides not containing the triproline sequence are substrates for O-GalNAcT (15,17,20). Tabak's group has analyzed the in vitro GalNAc transfer using more than 50 peptides derived from human von Willebrand factor (hVF) (15,20). Although they reported that a proline at position +3 is essential and a second proline placed at −1, +1, +2, or +4 is particularly effective, single substitutions from the native sequence are not enough to understand the effect of proline residues in general (20). Similar experiments using more than forty synthetic peptides derived from human Muc1 mucin were reported by Nishimori et al. (21). Their findings suggested that a proline at position +3 enhances the glycosylation but is not sufficient and that the minimal size of the peptide required for efficient glycosylation is six amino acids from −1 to +4, while Young et al. (19) described TPPP as the minimal substrate.
These results show that the surrounding sequences of the O-glycosylated amino acid affect the GalNAc transfer remarkably; however, further experiments are needed to clarify the general requirements for mucin-type O-glycosylation.
Previously, most studies were complicated by the fact that no distinction was made between single and multiple glycosylations or between serine and threonine as the glycosylated residue. In this report we describe the features of amino acid sequence around single mucin-type O-glycosylation sites by in vitro GalNAc transfer analysis using a series of synthetic peptides. Bovine colostrum O-GalNAcT was able to transfer GalNAc to a short peptide PPDAASAAP (underscore indicates the glycosylated serine) derived from an inherent mucin-type O-glycosylation site of human erythropoietin (hEPO). Subsequent substitutions of the sequence showed the minimal peptide length required for high level GalNAc transfer, the preference of O-glycosylated amino acid, and two influential proline residues at position +1 and +3. From results observed in this study, we propose that XTPXP is an effective mucin-type O-glycosylation motif.
Preparation of Acceptors-Apomucin was prepared from bovine submaxillary mucin as described by Hagopian and Eylar (22).
The peptides were synthesized by Fmoc (N-(9-fluorenyl)methoxycarbonyl) solid-phase method using an automated peptide synthesizer PS3 (Protein Technologies, Inc.) on a 0.1 mM scale or purchased from Kurabo Co. Ltd. The crude peptides were purified by reverse-phase high performance liquid chromatography (HPLC) on a C18 Vydac column (10 × 250 mm) by elution with a water/acetonitrile gradient containing 0.1% trifluoroacetic acid. Each purified product displayed a single peak by the reverse-phase HPLC analysis on a C18 Vydac column (4.6 × 250 mm).
To quantify peptide amounts and verify their amino acid content, the peptides were hydrolyzed under vapor phase HCl containing 1% phenol at 130°C for 3 h, and the hydrolyzed materials were then analyzed by a JLC-300 amino acid analyzer (JEOL Ltd.). The purity of each peptide was analyzed on an API III Sciex electron ion spray mass spectrometer (Perkin-Elmer).
Purification of UDP-N-acetylgalactosamine:Polypeptide N-Acetylgalactosaminyl Transferase from Bovine Colostrum-O-GalNAcT was purified from 4.8 liters of bovine colostrum by DEAE-Sephacel column chromatography and two steps of apomucin affinity column chromatography essentially as described by Elhammer et al. (5). The enzyme activity of the purified fraction was determined by monitoring the transfer from UDP-[3H]GalNAc to an apomucin as described by Sugiura et al. (3).
In Vitro O-GalNAc Transfer Assay for Peptide Acceptors-The standard reaction mixture contained 50 mM imidazole-HCl (pH 7.2), 10 mM MnCl2, 0.5% Triton X-100, 150 µM UDP-[3H]GalNAc (approximately 133,000 dpm), 2 mM synthetic peptide, and enzyme solution in a final volume of 50 µl. The initial velocity of GalNAc transfer to peptide was measured by incubating the mixture at 37°C from 30 to 480 min, depending on the reactivity of the peptide. The reaction was terminated by adding 50 µl of 250 mM EDTA. The glycosylated peptide was separated from unreacted UDP-[3H]GalNAc on a 1-ml AG1X-8 (Cl− form) column with water as eluent. The 2.6 ml of run-through fraction was collected directly in a glass scintillation vial and supplemented with Atomlight liquid scintillation mixture. The radioactivity was measured by a SE-100 scintillation counter (Packard) for 2 min. Each assay was calibrated with PPASTSAPG and/or AAATPAP as standard.
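The conversion from counted radioactivity to transferred GalNAc can be sketched as follows. This is only an illustration, assuming the donor concentration and volume read as 150 µM and 50 µl, that the whole run-through fraction is counted, and that counts are already expressed as dpm; the function names and the optional blank correction are hypothetical and not part of the published procedure.

```python
# Assay constants taken from the standard reaction mixture described above.
UDP_GALNAC_UM = 150.0        # donor concentration, micromolar
REACTION_VOLUME_UL = 50.0    # final reaction volume, microliters
TOTAL_DPM = 133_000.0        # approximate radioactivity per reaction


def donor_nmol():
    """Total UDP-GalNAc per reaction (uM * ul = pmol; divide by 1000 for nmol)."""
    return UDP_GALNAC_UM * REACTION_VOLUME_UL / 1000.0


def specific_activity_dpm_per_nmol():
    """Radioactivity per nmol of donor substrate."""
    return TOTAL_DPM / donor_nmol()


def transferred_nmol(sample_dpm, blank_dpm=0.0):
    """GalNAc transferred to peptide, from run-through counts.

    blank_dpm is a hypothetical no-acceptor control; it defaults to zero.
    """
    return (sample_dpm - blank_dpm) / specific_activity_dpm_per_nmol()


def initial_velocity_pmol_per_min(sample_dpm, minutes, blank_dpm=0.0):
    """Average transfer rate over the incubation, in pmol/min."""
    return transferred_nmol(sample_dpm, blank_dpm) * 1000.0 / minutes


if __name__ == "__main__":
    print(f"donor per reaction: {donor_nmol():.1f} nmol")
    print(f"specific activity: {specific_activity_dpm_per_nmol():.0f} dpm/nmol")
    # Hypothetical reading: 13,300 dpm recovered after a 60-min incubation.
    print(f"rate: {initial_velocity_pmol_per_min(13_300, 60):.1f} pmol/min")
```

Such a rate approximates an initial velocity only while a small fraction of the donor has been consumed, which is presumably why the incubation time was varied with the reactivity of each peptide.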

RESULTS
O-GalNAc Transfer to Peptides Derived from Inherent Mucin-type Glycoproteins-O-GalNAc transfer was assayed using synthetic peptides derived from inherent mucin-type glycoproteins, porcine submaxillary mucin (23), ovine submaxillary mucin (24), human granulocyte-colony stimulating factor (hG-CSF) (25), bovine κ-casein (26), and hEPO (27,28). The peptide RTPPP derived from bovine myelin, which is not a natural mucin-type glycoprotein (19), and the peptide PPASTSAPG, which was predicted to be a good substrate from statistical analysis of the mucin-type O-glycosylation sites by Elhammer et al. (17), were also assayed as substrates to allow comparison with previous results. The results are summarized in Fig. 1A. The transfer of GalNAc was observed in peptides from hG-CSF, bovine κ-casein, hEPO, bovine myelin, and the sequence proposed by Elhammer, while the transfer was not detected in two peptides from porcine and ovine submaxillary mucins. Because the peptide PPASTSAPG was the best substrate, the amount of GalNAc transferred to each peptide is expressed relative to this control peptide. The peptide PPDAASAAP from hEPO was more highly glycosylated than other peptides derived from natural mucin-type glycoproteins, and the amount of transferred GalNAc was 8% of the control peptide.
Differences of the Glycosylation Efficiency between Serine and Threonine as a Glycosylated Amino Acid-The difference between Ser and Thr as a glycosylated amino acid was measured using the sequences from bovine myelin and hEPO. The glycosylation level of the peptide containing Thr was ∼40-fold higher than those containing Ser in both cases (Fig. 1B). This value is in good agreement with the previously published results for the purified and the recombinant bovine colostrum O-GalNAcT1, respectively (8,17). Because the sequence PPDAATAAP showed higher glycosylation than PPASTSAPG, which showed the highest glycosylation in Fig. 1A, it is likely that the surrounding sequence of the O-glycosylation site from hEPO contains suitable characteristics for glycosylation by the bovine colostrum enzyme.
FIG. 1. Relative amounts of transferred GalNAc to synthetic peptides. A, peptide sequences derived from inherent mucin-type glycoproteins. The peptide SGGSGTPG is derived from porcine submaxillary mucin (23), PGGSATPQ from ovine submaxillary mucin (24), ALQPTQGA from hG-CSF (25), GEPTSTP from bovine κ-casein (26), PDAASAAP from hEPO (27,28), RTPPP from bovine myelin (19), and PPASTSAPG from statistical analysis of the mucin-type O-glycosylation sites by Elhammer et al. (17). B, comparison between threonine and serine as a glycosylated amino acid in the same surrounding sequences. C, influence of amino acid substitutions and a deletion in a peptide sequence derived from hEPO. The mean values obtained from at least three independent experiments are presented.
Effect of Simplification in the hEPO Sequence-To find a parent peptide for further investigation of the amino acid preference at each position around the glycosylation site, the effect of simplification in the hEPO sequence was analyzed (Fig. 1C). All alanine substitutions and a deletion of Pro at position −4 decreased O-GalNAc transfer, but Pro at position +3 appears to be critical. This result suggested that certain proline residues that are often observed around mucin-type O-glycosylation sites are very important. The most simplified peptide, AAATAAA, was glycosylated at a low level and was chosen as the parent peptide.
Survey of Effective Proline Residues-The effective positions of a proline residue were analyzed, and the result is shown in Fig. 2A. Two proline residues at position +3 and +1 increased the transfer level by 33- and 7-fold in comparison to the parent peptide, respectively. A 3-fold increase was caused by Pro at position −1; however, proline residues at other positions did not promote glycosylation.
Subsequently, effects of a second proline were measured by adding it to AAATAAP, which was the best sequence in Fig. 2A. As shown in Fig. 2B, the second proline at position +1 showed a synergistic effect with Pro at +3 and increased the transfer level to 90-fold of the parent peptide. While the prolines at position −3 and −2 showed a 2-fold increase from AAATAAP, other second proline residues did not show any effects.
In addition, a series of peptides containing a third proline, which was added to the two prolines at position +1 and +3, were analyzed (Fig. 2C). No increase from AAATPAP was observed in any case, and a marked decrease was conferred by Pro at position −1.
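The three rounds of proline substitution described above follow one pattern: each remaining alanine of a parent peptide is replaced by proline, one position at a time. A minimal sketch of that design is given below; the function name is hypothetical, and positions are keyed by their offset from the glycosylated residue as in the text.

```python
def proline_scan(parent, site):
    """Return {offset: variant} with one Ala replaced by Pro per variant.

    parent: one-letter peptide sequence.
    site:   0-based index of the glycosylated Ser/Thr (never substituted).
    Offsets are negative on the N-terminal side, positive on the C-terminal side.
    """
    variants = {}
    for i, residue in enumerate(parent):
        if i == site or residue != "A":
            continue  # leave the acceptor and any existing proline untouched
        variants[i - site] = parent[:i] + "P" + parent[i + 1:]
    return variants


if __name__ == "__main__":
    # First, second, and third proline rounds, starting from the parent peptide.
    for parent in ("AAATAAA", "AAATAAP", "AAATPAP"):
        print(parent)
        for offset, peptide in sorted(proline_scan(parent, site=3).items()):
            print(f"  {offset:+d}: {peptide}")
```

Applied to AAATPAP, the scan produces AAPTPAP (third proline at −1) and AAATPPP (third proline at +2), two of the peptides discussed in the text.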
The kinetic data of several representative peptides are shown in Table I. Substitution of either proline of the sequence AAATPAP by alanine decreased Vmax significantly, while Km values were comparable. Interestingly, the peptide AAATPPP, containing the third proline at position +2, has a Vmax higher than that of AAATPAP but also a higher Km. Hence, its catalytic efficiency is the same as that of AAATPAP. The GalNAc transfer to AAATAAA was measurable but too low to determine the kinetic parameters under the conditions used.
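The statement that a higher Vmax combined with a higher Km leaves catalytic efficiency unchanged can be illustrated with a short calculation. The sketch below assumes Michaelis-Menten kinetics and takes Vmax/Km as the efficiency measure at a fixed enzyme amount; the numerical values are hypothetical placeholders chosen for illustration and are not the values of Table I.

```python
def michaelis_menten_rate(substrate_mM, vmax, km_mM):
    """Initial velocity v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate_mM / (km_mM + substrate_mM)


def catalytic_efficiency(vmax, km_mM):
    """Vmax/Km, the slope of v versus [S] when [S] is far below Km."""
    return vmax / km_mM


# Hypothetical parameters (arbitrary Vmax units), NOT taken from Table I.
peptides = {
    "peptide_1": {"vmax": 1.0, "km_mM": 2.0},
    "peptide_2": {"vmax": 2.0, "km_mM": 4.0},  # both parameters doubled
}

if __name__ == "__main__":
    for name, p in peptides.items():
        eff = catalytic_efficiency(p["vmax"], p["km_mM"])
        rate = michaelis_menten_rate(2.0, p["vmax"], p["km_mM"])  # 2 mM peptide
        print(f"{name}: Vmax/Km = {eff:.2f}, v at 2 mM = {rate:.3f}")
```

With these placeholder numbers the two peptides share the same Vmax/Km even though the second is glycosylated faster at the 2 mM peptide concentration used in the standard assay.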
The positive effect caused by two proline residues at position +1 and +3 was analyzed in the case of serine as the glycosylated amino acid. As shown in Fig. 2D, these two proline residues were effective on the GalNAc transfer to serine, and the importance of proline at each position was similar to the result observed in the transfer to threonine as shown in Fig. 2 (A-C).
Deletion Analysis of N-terminal Amino Acids of the Glycosylation Site to Determine the Minimal Length-The influence of deletion of the N-terminal amino acids was analyzed to determine the minimal length required for high level GalNAc transfer. The deletion of alanines at position −3 and −2 reduced the glycosylation gradually but was not fatal (Fig. 3). However, the lack of Ala at −1 caused a significant loss of the transfer, suggesting that the amino acid at position −1 is required for efficient O-glycosylation. Because Young et al. (29) suggested that acetylation (Ac) of the N-terminal amino group of the glycosylated threonine might compensate for this deletion, we also analyzed the influence of acetylation of the N-terminal amino group as well as amidation (Am) of the C-terminal carboxyl group. The results summarized in Fig. 3 show that no effect was detected from the comparison of (Ac)-TPAP with TPAP. In contrast, the peptides containing Ala at position −1, such as ATPAP and ATPAP-(Am), showed approximately 10-fold higher transfer of GalNAc than TPAP. The amidation of the C terminus had no remarkable effect. These observations suggested that the minimal length required for high level O-glycosylation is five amino acids from −1 to +3.
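The N-terminal deletion series can be written down mechanically. The sketch below is only an illustration of the design (the function name is hypothetical): it removes one residue at a time from the N terminus until the glycosylated threonine itself becomes the first residue.

```python
def n_terminal_deletions(peptide, site):
    """Return the peptide and its successive N-terminal truncations.

    site is the 0-based index of the glycosylated residue; truncation stops
    once that residue is at the N terminus (no residue left at position -1).
    """
    return [peptide[k:] for k in range(site + 1)]


if __name__ == "__main__":
    for variant in n_terminal_deletions("AAATPAP", site=3):
        n_flank = variant.index("T")  # residues remaining N-terminal to Thr
        print(f"{variant:<8} residues before Thr: {n_flank}")
```

The series ends with ATPAP, the five-residue peptide spanning −1 to +3, and TPAP, which lacks the −1 residue.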
Influences of Amino Acid Diversity at Positions −1, +2, and +4-The influences of amino acid diversity at position −1 and +2, at which proline did not show significant effect (Fig. 2, A-C), were analyzed. The glycosylation was affected considerably by the change of amino acids at position −1 and +2 (Fig. 4, A and B).
FIG. 2. Survey of the site-specific effect of proline residues on GalNAc transfer. One of the alanine residues in AAATAAA (A), AAATAAP (B), and AAATPAP (C) was replaced with a proline to evaluate the effect of the first, second, and third proline, respectively. D, the site-specific effect of proline residues when the glycosylation site is serine. The mean values obtained from at least three independent experiments are presented.
The bovine colostrum enzyme highly transferred GalNAc to peptides containing Tyr, Ala, Trp, Phe, Thr, Ile, Ser, Gly, or Val, but not to peptides containing Lys, Asp, Glu, or Arg at position −1 (Fig. 4A). Therefore, at position −1, it is likely that uncharged amino acids were acceptable while the charged amino acids were unfavorable. It was noteworthy that Tyr, Trp, and Phe having a bulky aromatic ring were preferable as well as Ala, Ser, and Gly having a small side chain.
High level transfer was seen when the amino acid at +2 was Pro, Ala, Cys, Lys, Arg, Ser, or His, but low level transfer was detected when it was Asp, Asn, Phe, Trp, Tyr, or Gly (Fig. 4B). From these results, we presumed that the amino acids with a positive charge or small side chain are advantageous at position +2. Nevertheless, an exception is Gly at position +2, which showed approximately 20% of the glycosylation of AAATPAP.
The influence of various amino acids at position +4 was assayed. In contrast to the results for amino acids at position −1 and +2, no remarkable difference in the transfer was observed (Fig. 4C). However, histidine decreased the transfer level to 24% of AAATPAP.
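The residue preferences reported above for positions −1 and +2, together with the proline effects at +1 and +3, can be collected into a small lookup. The sketch below encodes only the residues named in the text; residues not named there are reported as "not reported" rather than guessed, and the function names and output labels are hypothetical.

```python
# One-letter codes for the residues named in the text.
FAVORED_MINUS1 = set("YAWFTISGV")    # Tyr, Ala, Trp, Phe, Thr, Ile, Ser, Gly, Val
DISFAVORED_MINUS1 = set("KDER")      # Lys, Asp, Glu, Arg
FAVORED_PLUS2 = set("PACKRSH")       # Pro, Ala, Cys, Lys, Arg, Ser, His
DISFAVORED_PLUS2 = set("DNFWYG")     # Asp, Asn, Phe, Trp, Tyr, Gly


def residue_at(peptide, site, offset):
    """Residue at the given offset from the acceptor, or None if off the end."""
    i = site + offset
    return peptide[i] if 0 <= i < len(peptide) else None


def classify(residue, favored, disfavored):
    if residue is None:
        return "absent"
    if residue in favored:
        return "favored"
    if residue in disfavored:
        return "disfavored"
    return "not reported"


def describe_site(peptide, site):
    """Qualitative summary of one candidate Ser/Thr acceptor site."""
    if peptide[site] not in "ST":
        raise ValueError("site must point at Ser or Thr")
    return {
        "acceptor": peptide[site],
        "-1": classify(residue_at(peptide, site, -1),
                       FAVORED_MINUS1, DISFAVORED_MINUS1),
        "+1_is_Pro": residue_at(peptide, site, 1) == "P",
        "+2": classify(residue_at(peptide, site, 2),
                       FAVORED_PLUS2, DISFAVORED_PLUS2),
        "+3_is_Pro": residue_at(peptide, site, 3) == "P",
    }


if __name__ == "__main__":
    for peptide, site in (("AAATPAP", 3), ("AAKTPGP", 3), ("TPAP", 0)):
        print(peptide, describe_site(peptide, site))
```

Position +4 is left out on purpose, since no clear preference was observed there. The output is qualitative; it does not attempt to reproduce the measured transfer levels of Fig. 4.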
We report here the features of primary peptide sequences required for high level O-glycosylation by a bovine colostrum O-GalNAcT1 in vitro. These results appear to be very useful for the improvement of mucin-type O-glycosylation efficiency and prediction of the glycosylation sites in glycoproteins. In summary, we propose several motifs for high level O-glycosylation: XTPXP, XTXXP, XTPXX, and XSPXP. They potentially correspond to single mucin-type O-glycosylation sites by O-GalNAcT1. XTPXP is the best motif among them.
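As an illustration of how the proposed motifs might be used for site prediction, the following sketch scans a one-letter protein sequence for the four motifs, treating X as any residue. It is an assumption-laden example rather than a validated predictor: the function names are hypothetical, and the scan ignores the residue preferences at −1 and +2 as well as any structural context.

```python
import re

# Motifs proposed in the text; X stands for any amino acid.
MOTIFS = ["XTPXP", "XTXXP", "XTPXX", "XSPXP"]


def motif_to_regex(motif):
    """Compile a motif into a lookahead regex so overlapping hits are found."""
    return re.compile("(?=(" + motif.replace("X", "[A-Z]") + "))")


def scan(sequence):
    """Return sorted (site_index, motif, matched_segment) tuples.

    site_index is the 0-based index of the candidate Ser/Thr, which sits
    one residue after the start of each five-residue match.
    """
    hits = []
    for motif in MOTIFS:
        for match in motif_to_regex(motif).finditer(sequence):
            hits.append((match.start() + 1, motif, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    for name, seq in (("wild type", "PPDAASAAPLR"),
                      ("Thr-126", "PPDAATAAPLR"),
                      ("Pro-127", "PPDAASPAPLR")):
        print(name, scan(seq))
```

A site that fits XTPXP is also reported under the less specific motifs XTXXP and XTPXX, so callers interested only in the best motif should filter the hits accordingly.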
The substitution analysis of the hEPO-derived sequence on GalNAc transfer suggested that mutations of the amino acid sequence around the inherent O-glycosylation site might improve the glycosylation efficiency of intact hEPO in mammalian cells. Indeed, Elliott et al. (30) reported that hEPO mutants of Thr-126 (PPDAATAAPLR) and Pro-127 (PPDAASPAPLR) were almost completely O-glycosylated in COS-1 cells, while wild type glycoprotein (PPDAASAAPLR) was not fully O-glycosylated. This indicates that the results from in vitro GalNAc transfer analysis using short peptides can be used to convert an incompletely O-glycosylated glycoprotein into a fully glycosylated one.
Our results showed that bovine colostrum O-GalNAcT could transfer GalNAc to serine at a low level (Figs. 1A and 2D), while the GalNAc transfer to threonine by this enzyme was 15- to 40-fold higher than that to serine in the same surrounding sequences (Figs. 1B and 2, A-D). This preference for threonine over serine was in good agreement with the previous results of O-GalNAcT1 using sequences from hEPO, bovine myelin, and porcine submaxillary mucin (8,17,31). It was interesting that the peptide derived from hEPO was more highly glycosylated than peptides from bovine κ-casein and hG-CSF, even though the sequence contained a serine as the glycosylated amino acid. This observation suggested that the surrounding sequence of the hEPO glycosylation site is more suitable than those of bovine κ-casein and hG-CSF.
Surveys of the effective proline residues surrounding the glycosylation site clearly showed that two specific proline residues at position +3 and +1 facilitate the glycosylation, but other prolines do not have such significant roles (Fig. 2, A-C). These effects were observed similarly when serine was selected as the glycosylated amino acid in place of threonine (Fig. 2D). The kinetic studies showed that the effect of these proline residues is on Vmax rather than Km (Table I). This indicates that proline residues do not enhance the binding energy between substrate and O-GalNAcT. The significance of a proline at +3 in the surrounding sequence of hEPO was consistent with those in hVF and hMuc1 mucin (15,20,21). However, the influence of proline at −1 was not so significant, although statistical analyses by some researchers have suggested that Pro at −1 and +3 might be important (14-16). Although O'Connell et al. (20) reported that a second proline placed at −1, +1, +2, or +4 is particularly effective, our data indicated that second prolines at −1 and +2 have little effect. The positive effects of Pro at −1 and +2 observed in their results seem to be caused by the replacement of an unfavorable amino acid with a favorable one, i.e. from Gly to Pro at position +2 (Fig. 4B). A peptide AAPTPAP showed an apparent decrease from AAATPAP in the GalNAc transfer. While the exact reason is not clear, it might come from an unfavorable conformation caused by the introduction of the third proline at position −1.
The minimal length required for the GalNAc transfer by bovine colostrum O-GalNAcT was five amino acids from position −1 to +3. While Young et al. (29) previously indicated that an amino acid at position −1 might not be required for the in vitro glycosylation from the comparison of (Ac)-TPPP and RTPPP, our results clearly showed that Ala at −1 has a significantly larger effect than the N-terminal acetylation or Arg at −1 (Figs. 3 and 4A). A deletion experiment of N-terminal amino acids indicated that not only the amino acid at position −1 but also those at −3 and −2 may affect the O-GalNAc transfer (Fig. 3), although no significant difference in the glycosylation level was observed in the substitution from Ala to Pro at position −3, −2, and −1 (Fig. 2, A-C).
The in vitro glycosylation of peptides was greatly affected by the substitution of amino acids at position −1 and +2, and a clear preference was observed at each position (Fig. 4, A and B). At position −1, uncharged amino acids were good for the glycosylation and the size of the side chain was not involved. The preferable amino acids at position +2 were those with a positive charge or small side chain. The positive charge seemed to be very advantageous in this position, because the side chains of lysine, arginine, and histidine are not so small. Glycine at +2 showed relatively low level glycosylation, even though its side chain is apparently small. This phenomenon may come from the specific angle made by glycine. In contrast to positions −1 and +2, no remarkable difference in the glycosylation was observed in the substitution of the amino acid at position +4 (Fig. 4C), suggesting that this position is not so important in the recognition by bovine colostrum O-GalNAcT.
We proposed several motifs for mucin-type O-glycosylation by in vitro analysis using a series of systematic synthetic peptides. Because mucin-type O-glycans on glycoproteins have various biological and physicochemical roles, the creation of a new O-glycosylation site in a protein seems to be very useful to improve its delivery and stability. We are currently investigating whether these motifs can work as a signal for mucin-type O-glycosylation when they are introduced into a protein.