J. Biol. Chem., Vol. 282, Issue 19, 14586-14597, May 11, 2007
Systematic Analysis of Proteoglycan Modification Sites in Caenorhabditis elegans by Scanning Mutagenesis*
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
Biosynthesis of proteoglycan sugar chains involves numerous synthetic and processing steps to attach the first sugar, build a linker tetrasaccharide, and synthesize and modify a GAG disaccharide repeat. The first step in the glycosylation of a proteoglycan modification site is the catalyzed transfer of xylose to a specific serine in a core protein by the polypeptide xylosyltransferase (ppXyl-T; EC 2.4.2.26 [EC] ), using the sugar donor UDP-xylose. It is the ppXyl-T enzyme's substrate specificity that defines which serines in a core protein become modified with proteoglycan-type sugar chains.
Secreted proteins frequently contain large numbers of serine and threonine residues in their sequences, yet only selected residues within a core protein appear to be recognized and modified by ppXyl-T. As with most modification sites, it is the primary and secondary structure of the proteoglycan addition site that define or regulate the critical interactions between the modified serine residue and the active site of the initiating enzyme, ppXyl-T. Such enzyme-substrate interactions must be very specific, since the ppXyl-T enzyme must, otherwise, compete with a large number of other polypeptide sugar transferases that O-glycosylate serine (and threonine), such as the family of polypeptide GalNAc transferases, which initiate mucin-type O-glycosylation. The attachment of the first sugar (xylose or GalNAc) to a serine is a key discriminating event, since the first attached carbohydrate moiety becomes the substrate for the addition of the next sugars in the proteoglycan tetrasaccharide or mucin oligosaccharide chain structures.
Initial comparison of mammalian GAG attachment sites suggested that the sequence motif Ser0-Gly+1-Xaa+2-Gly+3 may be a minimal protein sequence context for proteoglycan modification sites (14). However, there are cases where mapped proteoglycan sites lack the Gly+3. In addition, glycine → alanine substitution at the +1- and +3-positions in decorin did not ablate GAG chain addition (15). Amino acid alignment of 51 experimental and predicted mammalian chondroitin sulfate attachment sites suggested the following recognition sequence: a-a-a-a-Gly-Ser-Gly-a-b-a (where a represents acidic amino acids (Glu or Asp), and b represents Gly, Glu, or Asp) (16). Another comparison of over 30 heparan sulfate and chondroitin sulfate glycosaminoglycan attachment sites showed that at least two acidic amino acid residues were present on one or both sides of the proteoglycan substitution site (17). Indeed, in vitro kinetic studies also supported the observation that the most favorable peptide substrates for proteoglycan attachment include clusters of acidic amino acids flanking a Ser-Gly dipeptide or Ser-Gly-Xaa-Gly (14, 16).
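The consensus features described above (a Ser-Gly dipeptide, an optional Gly at +3, and nearby acidic residues) can be illustrated with a short scan. This is only a sketch: the function name and the flank width of six residues are assumptions made for illustration, and the example peptide is the SDN-1A reporter insert described later in this paper.

```python
import re

ACIDIC = set("DE")  # Asp and Glu

def find_ser_gly_candidates(seq, flank=6):
    """List (index, has_gly_at_plus3, n_acidic_in_flanks) for each Ser followed by Gly.

    `flank` is an illustrative window width, not a value taken from the cited studies.
    """
    hits = []
    for match in re.finditer(r"S(?=G)", seq):
        i = match.start()
        # Residues on both sides of the serine, excluding the serine itself.
        window = seq[max(0, i - flank):i] + seq[i + 1:i + 1 + flank]
        n_acidic = sum(1 for aa in window if aa in ACIDIC)
        has_gly3 = i + 3 < len(seq) and seq[i + 3] == "G"
        hits.append((i, has_gly3, n_acidic))
    return hits

# SDN-1A reporter insert (Ser71 is at index 6 of this string).
candidates = find_ser_gly_candidates("DIEVNGSGYPTDD")
print(candidates)
```

A scan of this kind only flags candidates; as the text notes, mapped sites can lack Gly+3, so the second field is informational rather than a filter.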
For a number of reasons, the current state of published data on the experimentally mapped proteoglycan sites is suboptimal for producing a predictive algorithm for glycosylation sites and for understanding the enzyme-substrate requirements of the proteoglycan glycosylation process. First, the experimental data set is small and not uniform. The past data sets group poorly glycosylated sequences with strong sites and combine glycosylation states and rates from in vivo or in vitro studies. Second, systematic mutagenesis studies have not been performed on the flanking sequence of the modification sites. Furthermore, some of the published data sets of proteoglycan sites include predicted sites without experimental mapping data. In our present study, we used 46 site-directed mutant reporter-protein constructs to investigate the influence of amino acids from position -6 to +6 surrounding serine on the xylosylation of reporter substrates. The reporter sequence is derived from a glycosylated region of the C. elegans cell surface proteoglycan, SDN-1. Reporter proteins were co-expressed in tissue culture with either a nematode or fly polypeptide xylosyltransferase, and the modification state was examined to determine what role the flanking amino acids have on proteoglycan modification in vivo. The results of this analysis define peptide sequence requirements for proteoglycan modification sites and provide insights into how the cellular machinery distinguishes between the modification of hydroxyamino acids in mucin and proteoglycan biosynthetic pathways.
| EXPERIMENTAL PROCEDURES |
|---|
|
|
|---|
To express the worm ppXyl-T enzyme, the full-length cDNA of the ppXyl-T enzyme was amplified with the primers PRIM346 (d(CACATGCATAGTTTTTACAGGGCGTGCTCCTCATGT)) and PRIM69 (d(CTCCGGAGCGGCCGCTAAATCAAGGTCTGCGTATC)). This PCR product was digested with NsiI and NotI and inserted into pLC-S2 to make the plasmid pLC-S2-FL-pXT.
Ligated plasmids were transformed into Escherichia coli and isolated by plasmid minipreparation (Qiagen Turbo 8). DNA sequencing was performed on all plasmids to identify the cDNA insert, its reading frame, and the specific amino acid substitution.
Expression of the Peptide Reporters in Insect Cells. Drosophila Schneider (S2) cells were cultured in SDM medium (Gibco) supplemented with 10% (v/v) fetal bovine serum and 1% penicillin/streptomycin at 27 °C. Cells were passaged every 3-4 days. For transfections, 10^6 cells were added to each well of a 6-well plate, incubated for 1 day, and then treated with a DNA-FuGENE-6 mixture. Briefly, 3 µl of FuGENE-6 (Roche Applied Science) transfection reagent was diluted into 97 µl of serum-free SDM medium and incubated at room temperature for 5 min. Then 1 µg of reporter plasmid DNA was added with mixing to the diluted FuGENE-6, and this mixture was incubated for 20 min at room temperature. Then the DNA-FuGENE-6 mixture was added in a dropwise manner to cells. Vigorous agitation/swirling was used to mix the cells and transfection reagent. For co-expression of worm polypeptide xylosyltransferase with the reporter proteins, 0.1 µg of pLC-S2-FL-pXT plasmid DNA was included in the transfection mix with the reporter plasmid. After a 1-day incubation at 27 °C, transfected cells were induced for protein expression by the addition of 600 µM CuSO4. After 3 days, cell culture medium, containing the secreted reporter, was harvested and clarified by centrifugation at 2000 rpm for 10 min.
RNA Interference of Fly ppXyl-T Activity in Insect Cells. For RNA interference, double-stranded RNA (dsRNA) was synthesized in vitro and added to insect cells immediately prior to transfection. To prepare dsRNA, fly ppXyl-T cDNA was first amplified from total S2 cell cDNA and cloned into the MluI and NotI sites of the pIMKF3 plasmid (19), using the primers Prim357 (d(CTGCACGCGTACTGGCAGTCCCTGTATCA)) and Prim358 (d(GTGTGCGGCCGCTTCATTTGAGCAGGGCATCCACATC)). Next, T7 promoter sites were introduced at each end of the fly ppXyl-T cDNA by PCR amplification of the vector's insert, using the following oligonucleotides: Prim652, d(GTCCATAATACGACTCACTATAGGGAAGACGATGACGATAAACACGC); Prim376, d(GTCCATAATACGACTCACTATAGGGCATGTCTGGATCCGTGGTC). The 5' ends of these latter two primers contain a T7 RNA polymerase promoter sequence; therefore, this places T7 RNA polymerase promoters at both ends of the fly ppXyl-T PCR product. dsRNA was produced using T7 RNA polymerase and 100 ng of the latter PCR product in an in vitro transcription reaction (Megascript Kit; Ambion). Following a 6-h transcription reaction at 37 °C, the RNA products were subjected to a denaturation step of 2 min at 90 °C and a slow annealing temperature gradient of 80-30 °C over a 30-min period. The dsRNA was phenol-extracted, ethanol-precipitated, and resuspended to a final concentration of 5 mg/ml.
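The primer design described above can be sanity-checked with a few lines of code. The sketch below looks for the MluI (ACGCGT) and NotI (GCGGCCGC) recognition sequences and for the commonly cited 18-nucleotide T7 promoter core in the four primers quoted in the text. The helper name `primer_features` is hypothetical, and the check is a plain substring search on the given strand only.

```python
# Recognition sequences for the two cloning enzymes named in the text.
RESTRICTION_SITES = {"MluI": "ACGCGT", "NotI": "GCGGCCGC"}
# Commonly cited T7 RNA polymerase promoter core (assumed here as the search string).
T7_PROMOTER_CORE = "TAATACGACTCACTATAG"

PRIMERS = {
    "Prim357": "CTGCACGCGTACTGGCAGTCCCTGTATCA",
    "Prim358": "GTGTGCGGCCGCTTCATTTGAGCAGGGCATCCACATC",
    "Prim652": "GTCCATAATACGACTCACTATAGGGAAGACGATGACGATAAACACGC",
    "Prim376": "GTCCATAATACGACTCACTATAGGGCATGTCTGGATCCGTGGTC",
}

def primer_features(seq):
    """Return the names of the restriction sites and/or T7 promoter found in seq."""
    found = [name for name, site in RESTRICTION_SITES.items() if site in seq]
    if T7_PROMOTER_CORE in seq:
        found.append("T7")
    return found

report = {name: primer_features(seq) for name, seq in PRIMERS.items()}
for name, feats in report.items():
    print(name, feats)
```

Under these assumptions the cloning primers each carry one restriction site and the two later primers each carry the T7 promoter, in agreement with the description in the text.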
For RNAi treatment, S2 cells were plated at 10^6 cells/ml (2 ml/well) in 6-well plates (Corning Glass). After a 24-h incubation at 27 °C, the cells were washed twice with serum-free SDM medium, and then 1 ml of serum-free SDM medium containing 15 µg of ppXyl-T dsRNA was added to each well and immediately mixed with vigorous agitation and swirling. After a 2-h incubation at 27 °C, 2 ml of SDM with 10% fetal bovine serum was added to each well. These dsRNA-treated cells were then transfected with reporter plasmids, as described above.
Immunoprecipitation and Glycanase Treatment. To purify the reporter proteins, 25 µl of anti-FLAG M2 agarose was added directly to 1 ml of clarified cell culture medium. The affinity resin was rocked at 4 °C overnight and was washed twice in Buffer A (20 mM Tris, 200 mM NaCl, 5% glycerol, pH 7.4). Reporter proteins bound to beads were treated with O-glycanase (0.125 milliunits of endo-α-N-acetylgalactosaminidase), sialidase (0.5 milliunits of α2-3,6,8,9-neuraminidase), and N-glycanase (500 milliunits of peptide N-glycanase F) in a final volume of 48 µl for 5 h at 37 °C, according to the manufacturer's specifications (Glycoprotein Deglycosylation Kit; Calbiochem). For SDS-PAGE analysis, 16 µl of 4x NuPage sample loading buffer was added to the glycosidase-treated samples. Control N-glycanase digestions used human salivary amylase, which migrates as a mixture containing a nonglycosylated and an N-glycosylated isoform (20). An O-glycanase control contained a predicted mucin-type O-glycosylated domain, rich in serine/threonine/proline, from C. elegans LET-653 (21), which was expressed in S2 cells under the same conditions as the proteoglycan reporter proteins.
Glycoprotein Detection and Occupancy Determination. Each reporter construct was expressed in S2 cells in triplicate, and the glycosylation state was analyzed by Western blot. Reporter proteins were separated on BisTris 12% SDS-PAGE with MOPS running buffer (NuPAGE; Invitrogen) and electrophoretically transferred onto nitrocellulose membranes. After a 2-h room temperature incubation in blocking solution (0.5% casein blocker (Roche Applied Science) in Tris-buffered saline with 0.05% Tween), the membrane was incubated overnight with a 1:5000 dilution of S-protein-horseradish peroxidase conjugate (Novagen) at 4 °C. After washing the membranes five times in Tris-buffered saline, 0.05% Tween 20, chemiluminescent detection was performed using SuperSignal West Femto Maximum Sensitivity substrate (Pierce). The percentage occupancy or glycosylation state was calculated in triplicate by densitometry scanning of the glycosylated and unmodified protein bands.
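The occupancy calculation can be sketched as the fraction of total reporter signal found in the glycosylated band, averaged over the three replicates. The exact densitometry procedure is not given in the text, so the formula below is an assumed, conventional reading of it, and the band intensities are hypothetical values chosen only to show the arithmetic.

```python
from statistics import mean, stdev

def occupancy(glycosylated, unmodified):
    """Percent of total reporter signal found in the glycosylated band."""
    total = glycosylated + unmodified
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * glycosylated / total

# Hypothetical densitometry readings (arbitrary units), one pair per replicate lane:
# (glycosylated band, unmodified band). These are not data from the paper.
replicates = [(700.0, 300.0), (690.0, 310.0), (710.0, 290.0)]

values = [occupancy(g, u) for g, u in replicates]
summary = (mean(values), stdev(values))  # mean and sample standard deviation
print(f"occupancy = {summary[0]:.1f} ± {summary[1]:.1f}%")
```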
RT-PCR of Endogenous Fly Glycosyltransferases. Total RNA was isolated from S2 cells in 300 µl of TriReagent and 2 µl of polyacrylamide carrier (Molecular Research Center). First strand cDNA synthesis was conducted using Super Script III (Invitrogen) in a final volume of 20 µl, containing 5 µg of total RNA and an oligo(dT) primer. For RT-PCR, first strand cDNA from each cell line was added to a GoTaq reaction mixture made for a total of 10 reactions, according to the manufacturer's specifications (Promega); however, the primers were not added. To ensure equal delivery of each cDNA template, this mixture was mixed and dispensed into 10 individual tubes, and PCR was initiated by adding 0.5 mM PCR primers for each glycosyltransferase target. Primer pairs for a total of four glycosyltransferase cDNAs were dispensed into separate reaction tubes. After a five-cycle touch-down annealing step from 72 to 62 °C, the cDNA targets for selected glycosyltransferases were amplified in a 24-cycle reaction, using the following conditions: 94 °C for 20 s, 62 °C for 30 s, and 72 °C for 30 s, with a 1-s/cycle extension. To avoid amplification of genomic DNA, at least one oligonucleotide from each primer pair was designed around a splice junction (listed in supplemental table 2). The following Drosophila melanogaster mRNAs were used in parallel with the fly ppXyl-T (GenBank accession number AJ430595) (22) RT-PCR for additional endogenous fly glycosyltransferase controls: OST1 (oligosaccharyltransferase I/ribophorin-1; GenBank accession number NM_205958), pgant5 (polypeptide GalNAc transferase-T5; GenBank accession number AY268066) (23), and Gal-T1 (β1,4-galactosyltransferase 7; GenBank accession number NM_143062). The loading of PCR products from each S2 cell preparation was normalized, using the level of the PCR product for the pgant5 cDNA target.
Neural Network Training. A neural network does not understand letters, so the amino acid sequence must be translated into numbers. We used a sparse encoding method (24, 25), which is the conventional way to convert the amino acid sequence into numerical form. The neural networks were of the two-layer feed-forward type, trained by standard back-propagation. We presented a varying number of amino acids on either side of the modification site to the network and also changed the number of neurons in the hidden layer to find the optimal complexity for this particular prediction problem. The predictive performance was monitored using the Matthews correlation coefficient during training and test of the network (26). The data set comprised all serine peptides presented in this paper as well as two positive and 25 negative serine sites experimentally determined in full-length C. elegans syndecan/SDN-1 (Footnote 3), giving a total of 21 positive sites and 42 negative sites. The 63 sites were randomly divided into seven sets of nine sites each, where each set contained both negative and positive sites. Using the method of cross-validation, every network was trained seven times using six sets as training sets and one set as the test set. The reported cross-validation performance is the joint performance of the seven resulting networks on their respective test sets.
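The encoding and data-splitting steps can be sketched as follows. This is not the authors' code: the function names, the padding symbol, the use of 20 inputs per residue (some sparse-encoding schemes add an extra input for blanks), and the stratified round-robin assignment to folds are all assumptions. The text states only that the split was random and that every set contained both positive and negative sites.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # assumed symbol for window positions that run past the end of the sequence

def window_around(seq, site, flank):
    """Return the 2*flank+1 residue window centred on seq[site], padded if needed."""
    left = seq[max(0, site - flank):site].rjust(flank, PAD)
    right = seq[site + 1:site + 1 + flank].ljust(flank, PAD)
    return left + seq[site] + right

def sparse_encode(window):
    """One-hot (sparse) encoding: 20 binary inputs per residue, all zero for padding."""
    vector = []
    for aa in window:
        unit = [0] * len(AMINO_ACIDS)
        if aa != PAD:
            unit[AMINO_ACIDS.index(aa)] = 1
        vector.extend(unit)
    return vector

def make_folds(positives, negatives, k=7, seed=0):
    """Shuffle each class, then deal items round-robin so every fold has both classes."""
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [[] for _ in range(k)]
    for i, item in enumerate(pos):
        folds[i % k].append((item, 1))
    for i, item in enumerate(neg):
        folds[i % k].append((item, 0))
    return folds

# Window of five residues on either side of Ser71 in the SDN-1A reporter insert.
window = window_around("DIEVNGSGYPTDD", site=6, flank=5)
encoded = sparse_encode(window)

# Placeholder site identifiers standing in for the 21 positive and 42 negative sites.
folds = make_folds([f"pos{i}" for i in range(21)], [f"neg{i}" for i in range(42)])
print(window, len(encoded), [len(f) for f in folds])
```

With 21 positive and 42 negative sites, dealing each class round-robin into seven folds gives nine sites per fold (three positive, six negative), which matches the set size given in the text.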
|
| RESULTS |
|---|
|
|
|---|
Drosophila S2 cells make up a favorable cell culture system, because they have high transfection rates and are sensitive to RNA interference. Co-transfection was used to express the worm ppXyl-T glycosyltransferase and SDN-1A reporter, whereas RNAi was used to knock down the expression of the endogenous fly ppXyl-T mRNA. A single dose of a 944-bp ppXyl-T dsRNA fragment was effective in reducing the level of ppXyl-T mRNA in untransfected cells, transfected cells, and cells expressing worm ppXyl-T, over a 4-day cell culture period (Fig. 2A, lanes 2, 4, and 5, respectively). This RNAi treatment was specific for the endogenous fly ppXyl-T mRNA and did not influence the transcript levels of endogenous OST1 (oligosaccharyltransferase I/ribophorin-1), pgant5 (polypeptide GalNAc transferase-T5), and Gal-T1, three key glycosyltransferases in the N-glycosylation, mucin-type, and proteoglycan O-glycosylation pathways, respectively.
Recombinant proteins were purified on an anti-FLAG antibody resin, so that the ratio of glycosylated and unglycosylated forms of the reporter could be determined. The preparations were not passed on an anion exchange or size exclusion resin to artificially enrich for weakly glycosylated isoforms of the reporter protein. Approximately 30% of the wild type reporter was expressed as an unmodified secreted polypeptide with an apparent molecular mass of 15 kDa, whereas 70% of the reporter migrated on SDS-PAGE as a second band with a higher molecular weight (Fig. 2B, lane 1). This high molecular weight species disappeared following ppXyl-T RNAi treatment (Fig. 2B, lane 2) that reduced the majority of the mRNA transcript levels for the endogenous Drosophila ppXyl-T (Fig. 2A, lane 4).
To evaluate the action of C. elegans ppXyl-T on the SDN-1A reporter peptide, worm ppXyl-T and SDN-1A reporters were co-transfected into these RNAi-treated cells. Co-expression of worm ppXyl-T with the SDN-1 reporter resulted in a recapitulation of the higher molecular weight, glycosylated species of the SDN-1 reporter (Fig. 2B, lanes 3 and 4). Therefore, the RNAi and ppXyl-T co-expression experiments both support the finding that the worm ppXyl-T acts on the 15-kDa SDN-1 reporter peptide to produce a higher molecular weight glycosylated form, bearing xylose-containing proteoglycan chains. Treatment with O-glycanase and N-glycanase showed no alteration in electrophoretic mobility of the glycosylated reporter (Fig. 2B, lane 4), indicating that the reporter was not modified with N- or O-linked glycans, whereas control proteins (human salivary amylase with an N-linked glycan and a C. elegans LET-653 glycoprotein with a mucin domain) were susceptible to this glycosidase treatment (Fig. 2D). The absence of a mobility shift following glycosidase treatment of the SDN-1A reporter indicates that neither the reporter peptide insert (DIEVNGS71GYPTDD) nor sequences from the fusion protein were modified with potential N-glycans or mucin-type O-glycans, despite the fact that consensus sequences for these sites appeared to be present (NGS71 is a potential N-glycosylation site, and S71GYPT75 could be a potential mucin site).
Co-expression of the nematode ppXyl-T and the S0A mutant reporter (with a serine → alanine mutation at Ser71) produced a protein that migrated as the unmodified reporter (Fig. 2, B and C, reporter 2). Similarly, a serine → threonine mutation resulted in a peptide (S0T reporter) that was not detectably glycosylated, indicating that the nematode ppXyl-T has a strong requirement for the serine residue in the engineered proteoglycan modification site (Fig. 2, B and C, reporter 3). Therefore, Ser71 → Ala/Thr mutagenesis results support the conclusion that Ser71 is the only proteoglycan modification site within the complete reporter protein (Fig. 1C) and that effective proteoglycan attachment sites strongly favor a serine, not a threonine. Unless specified otherwise, all further reporter experiments were expressed in cell culture conditions where RNAi was used to knock down the endogenous fly ppXyl-T, and co-transfection was used to co-express the C. elegans ppXyl-T with the reporter protein.
alanine replacements. The wild type sequence flanking Ser71 is GS71GYP. Conversion of both glycine residues to alanine at positions -1 and +1 abolished proteoglycan modification of Ser71, since no glycosylated form of the G(-1,+1)A mutant was detected (Fig. 3A, reporter 4). The requirement of the glycine residue appeared very strict, since glycosylation was abolished when either glycine at position -1 or +1 was mutated to alanine. Both G-1A and G+1A mutants lost the high molecular weight (glycosylated) species upon SDS-PAGE (Fig. 3A, reporters 5 and 6). Although two proximal glycine residues appeared to be essential, the significance of a distal glycine (at position +3) was also evaluated, because previous studies observed that a significant subset of proteoglycans possess a glycine residue at position +3 (16). Therefore, the SDN-1A reporter sequence was mutated from GSGYP to GSGYG (Fig. 3A, reporter 7). This proline → glycine mutation at position +3 resulted in a 23% increase in proteoglycan site occupancy from 70 to 86%, relative to the wild-type reporter (Fig. 3B, reporter 1). It is possible that replacement of the +3 proline relieved a slight inhibitory effect on ppXyl-T substrate activity, since the proline → alanine mutant at +3 (reporter 8) had a modest increase in glycosylation state occupancy (78%) relative to the 70% glycosylated species of the wild type, proline-containing site (Fig. 3, compare peptides 7 and 8).
Proteoglycan Sites Have a Position and Density Requirement for Acidic Amino Acid Residues. Data sets of mammalian glycosylation sites show that aspartic and glutamic acid residues cluster in the N- and C-terminal flanks of proteoglycan modification sites (14, 28-32). To carefully define the acidic amino acid requirements for proteoglycan modification sites, scanning mutagenesis was performed to create a panel of reporters, in which the position and density of the acidic amino acid residues were systematically varied along the length of the peptide sequence flanking the glycosylation site (Table 1). Note that in the SDS-PAGE analysis, the glycosylated form of the reporter migrated at a higher molecular weight than its unglycosylated state. Furthermore, the electrophoretic mobility of the unmodified reporter was slightly reduced when acidic amino acid residues were present. For example, the alanine-rich mutant (reporter 10) ran faster than the more acidic wild type reporter (reporter 1; Fig. 4A and Table 1). This electrophoretic mobility increase for Asp → Ala mutants most likely resulted from increased binding of SDS to peptide segments containing fewer negatively charged side chains.
Each of these initial scanning mutants (reporters 13-18) contained one N-terminal acidic amino acid, in addition to two distal C-terminal aspartate residues. To determine if the N-terminal acidic amino acids in reporters 15-17 were the only acidic amino acids required for a functional glycosylation site, the C-terminal aspartates (positions +5 and +6) were mutated to alanine in reporters 19-24 (Table 1). Reporters with a single aspartic acid residue were glycosylated only when that residue was at position -4, -3, or -2 (reporters 19-21; Fig. 4C). In contrast, comparable C-terminal scanning mutants with aspartic acid at positions +2, +3, and +4 were not glycosylated (peptides 22-24; Fig. 4C and Table 1). Although the reporters with a single N-terminal acidic residue at positions -4, -3, and -2 were glycosylated, their level of glycosylation was reduced to 48, 39, and 29% occupancy, relative to 70% occupancy for analogous reporters that contained an additional pair of acidic residues in the C-terminal flank (reporters 19-21; Table 1). This increase in the occupancy of the glycosylation site when C-terminal aspartate residues were present suggested that distal acidic residues on the C-terminal flank could enhance the rate of glycosylation, although C-terminal acidic residues alone were not sufficient for an efficient glycosylation site.
Our data from scanning mutagenesis suggested that increased negative charge density (on the C terminus) was important for maximizing glycosylation states of peptide reporters. To explore whether an increased charge density influenced glycosylation state occupancy in a position-dependent manner, we performed additional mutagenesis experiments to vary the position of four charged (Asp/Glu) residues (Table 1, peptides 25-32). Reporter analysis revealed that two acidic amino acids in a large variety of positions on the N terminus were equally glycosylated to 60-70% occupancy (Table 1; gel is not shown for peptides 25-32, since these reporters run identically to the wild type reporter 1). Surprisingly, the reporter (-6,-5)D with two acidic residues at positions -6 and -5 was fully glycosylated (reporter 25; Fig. 4B and Table 1). Here the acidic residues were outside the -4 to -2 range, previously shown to be essential for glycosylation in reporters containing single acidic residues. This means that the increased charge density of two negative charges at positions -6 and -5 led to a functional glycosylation site, whereas a single acidic residue at either -6 or -5 was not a sufficient signal for glycosylation (in reporters 13 and 14; Fig. 4B).
In summary, these studies on acidic amino acids in the flanking sequences indicated that the presence, position, and density of acidic amino acid residues influence the initiation of proteoglycan addition. The most discriminating position for acidic amino acids is in the -4 to -2 range (N-terminal) to the glycosylation site, since only one acidic amino acid in this position is sufficient to confer substrate recognition by the ppXyl-T proteoglycan modification machinery. Furthermore, the presence of an N-terminal cluster of acidic amino acid residues can expand the acidic window out to the -6 position. In contrast, single or clustered acidic amino acids on the C-terminal flank of the serine are neither necessary nor sufficient for xylose addition, but in combination with N-terminal acidic residues, they can increase glycosylation site occupancy rates.
Ala mutation at position +3 in the wild-type reporter was evaluated and compared with other proline mutants. Mutant P+3A reporter protein (peptide 8; Figs. 3A and 5A) did not show a large change in the glycosylation state, being glycosylated at 78 ± 2.3% relative to 70 ± 1.2% for the wild-type reporter (reporter 1), indicating that proline is only slightly inhibitory at most or that the glycosylated serine is outside of the range of proline's influence. To discover the effective range or tolerance of proline residues, proline-scanning mutagenesis was performed to produce a set of reporters with a single proline residue from position -3 through +3 (Fig. 5A, peptides 33-37 and 1, respectively). Scanning results showed that the placement of a proline residue at positions -3, -2, and +3 had a negligible effect on the glycosylation of the proteoglycan reporters (reporters 33, 34, and 1). However, when the proline residue was placed at position -1, +1, or +2, glycosylation of the peptides was essentially eliminated (Fig. 5A, reporters 35-37), indicating that a flanking proline residue blocked the activity of ppXyl-T on the substituted serine for proteoglycan synthesis. However, the loss of glycosylation of the -1P and +1P reporter mutants may have resulted from the elimination of essential glycine residues at these two specific positions, flanking the glycosylated serine. The results from scanning mutagenesis with a single proline revealed a very narrow effective range for proline inhibition of the proteoglycan substitution site (position +2 appears to be uniquely sensitive to proline). To evaluate whether a higher density of proline residues confers greater inhibition, the cumulative effects of two proline residues were studied by introducing a second proline residue at position +3, which is the position of a naturally occurring proline in the wild type SDN-1A sequence (peptide 1; Fig. 5C).
As with -3P and -2P single proline reporters, proline double mutants (-3,+3)P and (-2,+3)P were glycosylated but with a very slight loss in the glycosylation site occupancy, at 56 and 63%, respectively, relative to 70% glycosylation for the wild type reporter (Fig. 5B, reporters 38 and 39). No detectable glycosylation on mutant (-1,+3)P, (+1,+3)P, and (+2,+3)P was obtained (peptides 40-42; Fig. 5, B and C). Taken together, proline-scanning mutagenesis suggested that serine residues with a neighboring proline at positions -1, +1, and +2 did not form favorable proteoglycan substitution sites.
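Taken together, the glycine, acidic residue, and proline results reported so far can be summarized as a small rule set. The sketch below is only a qualitative restatement of those findings as a yes/no check on a 13-residue window (positions -6 to +6); it is not the neural network predictor described in this paper, the function name is hypothetical, and it ignores the graded occupancy levels that the reporters actually show.

```python
ACIDIC = set("DE")  # Asp and Glu

def passes_reported_rules(window13):
    """Apply the qualitative rules from the mutagenesis results to a -6..+6 window."""
    if len(window13) != 13:
        raise ValueError("expected 13 residues covering positions -6 to +6")

    def residue(offset):
        # Position 0 (the candidate serine) sits at index 6 of the window.
        return window13[6 + offset]

    if residue(0) != "S":                       # Ser required; Thr and Ala not modified
        return False
    if residue(-1) != "G" or residue(1) != "G":  # Gly required at -1 and +1
        return False
    if residue(2) == "P":                       # Pro at +2 blocked glycosylation
        return False
    proximal_acidic = any(residue(o) in ACIDIC for o in (-4, -3, -2))
    distal_pair = residue(-6) in ACIDIC and residue(-5) in ACIDIC
    return proximal_acidic or distal_pair

wild_type = "DIEVNGSGYPTDD"  # SDN-1A reporter insert, Ser71 at position 0
# Illustrative single substitutions made in the wild type string; these may not be
# identical to the full sequences of the corresponding numbered reporters.
ser_to_thr = "DIEVNGTGYPTDD"
gly_minus1_to_ala = "DIEVNASGYPTDD"
results = {
    "wild type": passes_reported_rules(wild_type),
    "S0T": passes_reported_rules(ser_to_thr),
    "G-1A": passes_reported_rules(gly_minus1_to_ala),
}
print(results)
```

A rule set like this cannot capture the enhancing effect of C-terminal acidic residues on occupancy, which is one reason a trained predictor is used later in the paper.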
Comparisons of Worm and Fly Proteoglycan Modification Reactions. To define whether there is conservation in the proteoglycan modification machinery encoded by C. elegans and Drosophila, nine discriminating reporter proteins were expressed in S2 cells with the endogenous fly ppXyl-T enzyme. Reporter peptides SDN(A), S0A, G(-1,+1)A, D/E(-6,-4A), -6D, -4E, P+3A, +2P, and (-2,+3)P (reporter peptides 1, 2, 4, 11, 13, 15, 8, 37, and 39, respectively) were selected to measure the glycosylation state of Ser71 in a peptide sequence background that probes the influence of glycine at position -1 and +1, of acidic amino acids at positions -6 and -4 and of proline at +3, +2, and -2. The level of glycosylation occupancy with these substrate peptides catalyzed by fly ppXyl-T was comparable with that of the nematode ppXyl-T (data summarized in Table 2). Therefore, the ppXyl-T substrate specificity (peptide sequence) requirements for glycine, acidic amino acids, and proline in proteoglycan modification sites appear evolutionarily conserved in the fly and worm. Such conservation in substrate recognition sequences of ppXyl-T suggests that a proteoglycan site predictor algorithm, based on C. elegans proteoglycans, would be a useful tool for future studies.
The best network performance was found when presenting five amino acid residues on either side of the modified serine (a sequence window of 11 residues) and using seven hidden neurons. With this sequence window, the input data to the network are identical for the negative peptide -5D (reporter 14 in Table 1) and the positive peptide (-6,-5)D (reporter 25 in Table 1). Because of this, the predictor cannot predict both of these sites correctly. Wrongly predicting peptide -5D as positive is, in fact, the only mistake the best predictor makes. Therefore, it correctly predicts 100% of the positive sites and 98% of the negative sites, while 95% of the predicted sites are in fact true positives. The Matthews correlation coefficient is 0.97, to be compared with 1, which is the correlation coefficient of a perfect predictor, and 0, which is the correlation coefficient of a random guess.
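The performance figures quoted above follow directly from the confusion matrix implied by the text: 21 positive and 42 negative sites, with the -5D peptide as the single false positive. The short calculation below reproduces them using the standard definition of the Matthews correlation coefficient; the function name is an assumption.

```python
from math import sqrt

def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when any marginal total is zero
    return (tp * tn - fp * fn) / denominator

# Counts implied by the text: all 21 positives found, 1 of 42 negatives mispredicted.
tp, fn = 21, 0
tn, fp = 41, 1

sensitivity = tp / (tp + fn)   # fraction of positive sites predicted correctly
specificity = tn / (tn + fp)   # fraction of negative sites predicted correctly
precision = tp / (tp + fp)     # fraction of predicted sites that are true positives
mcc = matthews_cc(tp, fp, tn, fn)

print(f"{sensitivity:.0%} {specificity:.0%} {precision:.0%} MCC={mcc:.2f}")
```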
We conclude that a predictor can correctly learn the recognition pattern described by the peptide data presented here. However, it is likely that the sequence space has not been sufficiently sampled, since all sites come from the same protein, and most are mutants of the same site and show limited sequence variation. This is the first proteoglycan site predictor developed. In the future, an improved predictor will be developed, as experimental data on additional proteoglycan sites become reported and incorporated into the data set, and this will be made publicly available.
Proteoglycan Sites in Mammals. Through a search of the original literature, we have been able to identify 37 experimentally verified proteoglycan sites in mammalian proteins (supplemental Table 3) plus a number of mutant and peptide sites, based on these sites, making a total of 51 experimentally verified sites. Sequence logos (36) of the mammalian proteoglycan sites and the C. elegans data show the frequencies of amino acid residues at each position, as the relative height of each letter (Fig. 6). The degree of sequence conservation is reflected by the total height of a stack of letters, measured in bits of information. Position zero denotes the location of the glycosylated serine residue. Sequence weighting was performed to reduce the impact of very similar sequences (37). The most striking difference between the logos for the mammalian and C. elegans data is that information content is much higher in the C. elegans case, even for peripheral positions not believed to be very important. This is due to lack of sufficient sequence variation in the C. elegans data and illustrates exactly why our prediction method is only preliminary. Apart from this, the recognition sequence in mammals is clearly related to the one in C. elegans. There is a strong preference for glycine in positions -1 and +1, although position -1 is weaker for mammals than for C. elegans. Also, there is a preference for acidic residues in the vicinity of the modification site, especially N-terminally in positions -2, -3, and -4.
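The stack heights in a sequence logo can be illustrated with the basic information content formula, log2(20) minus the Shannon entropy of the residue frequencies in a column. The sketch below is a simplified version: it omits the sequence weighting mentioned in the text and any small-sample correction, and the aligned peptides are hypothetical toy data rather than sites from Fig. 6.

```python
from collections import Counter
from math import log2

def information_content(column, alphabet_size=20):
    """Bits of information for one alignment column (no small-sample correction)."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return log2(alphabet_size) - entropy

def logo_heights(aligned):
    """Total stack height (bits) at each position of equal-length aligned peptides."""
    return [information_content(column) for column in zip(*aligned)]

# Hypothetical aligned windows (positions -1, 0, +1); not data from the paper.
toy_alignment = ["GSG", "GSG", "ASG", "GSG"]
heights = logo_heights(toy_alignment)
print([round(h, 2) for h in heights])
```

A column with a single residue type reaches the maximum of log2(20), about 4.32 bits, which is why a data set with little sequence variation, as described above for C. elegans, produces tall stacks even at positions of little functional importance.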
The preliminary C. elegans proteoglycan predictor was tested on the mammalian sites to quantify the similarities. The preliminary C. elegans predictor correctly predicts 41% of the positive sites and 98% of the negative sites. 49% of the predicted sites are in fact true positives. The Matthews correlation coefficient is 0.43. At this point, it is impossible to know for sure if this less than perfect performance is due to evolutionary differences or the preliminary nature of the algorithm. However, it is reasonable to conclude that the recognition sequences are evolutionarily related, since a correlation coefficient of 0.43 is well above the performance of a random guess.
|
| DISCUSSION |
|---|
The favorable sequences that flank serine in glycosylation sites provide some clues to the characteristics of the active site of ppXyl-T. Glycine, in immediate proximity to the modified serine, suggests that either the peptide backbone of the glycosylation site requires some structural flexibility or that spatial constraints (bulky amino acids in the active site) limit access of peptide substrates to the enzyme. Indeed, surveys of the sequences surrounding proteoglycan modification sites have revealed a high frequency of glycine residues adjacent to the modified serine (16). Strikingly, the relative occurrence of a glycine residue at position +1 is 98% (16). Our glycine mutagenesis studies provide direct evidence that glycine residues are required at both positions -1 and +1 for xylosylation of Ser71. This requirement for a small, flexible residue appears real, since replacement of the -1 glycine with alanine, proline, or aspartate inhibited glycosylation, even though aspartate at other N-terminal positions was shown to be required for proteoglycan addition.
Mutagenesis data with acidic residues further suggest that there is some directionality in how peptide sequences are fed into the active site of the ppXyl-T enzyme. The presence of N-terminal acidic amino acids is required for the recognition and xylosylation of the serine residue in SDN-1A (in reporters 12 and 19-21), and these acidic positions are sufficient, since glycosylation of these reporters does not require C-terminal acidic residues. Analogous reporters that contain one or more acidic residues only on the C-terminal side were not glycosylated (reporters 11 and 22-24). Therefore, the peptide backbone of the glycosylation site must interact with the ppXyl-T active site to orient peptides in a specific N-terminal to C-terminal direction. The requirement for acidic residues suggests that specific electrostatic interactions must exist between the negatively charged side chains of the glycosylated substrate and residues on one side of the active site of the enzyme. The spacing between the substrate's negative charge and the modified serine is restricted, since aspartic acid at position -1 inhibits xylosylation and since the most favorable positions for negatively charged amino acids are -4, -3, and -2 on the N-terminal side of the modification site. At the more distal positions -5 and -6, a single acidic residue does not promote proteoglycan glycosylation; however, if both distal positions (-6 and -5) are negatively charged, then serine glycosylation is efficient. Therefore, N-terminal acidic residues are required but with a somewhat loose position dependence, suggesting that acidic residues may be important for tethering the N-terminal flanking sequence of a glycosylation site near the active site of the enzyme and not for precise positioning of the substrate for catalysis.
The proline-scanning mutagenesis data support the hypothesis that the ppXyl-T enzyme may require substrates with a proximal glycine for peptide backbone flexibility at the glycosylation site. First, proline at either -1 or +1 inhibits xylosylation, indicating that the structural flexibility and small size provided by glycine are essential on both sides of the modified serine in proteoglycan sites. Second, a proline positioned between Asp-4 and Gly-1 is tolerated in functional glycosylation sites and does not inhibit glycosylation in reporters 33 and 34. This latter observation supports a tethering role of N-terminal acidic residues for loosely feeding serine residues to the active site and not precisely positioning the modified residue.
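The qualitative flanking-sequence observations discussed above can be summarized as a toy boolean check. This is only a sketch under stated assumptions: it encodes glycine at -1 and +1 plus either an acidic residue at -4, -3, or -2 or acidic residues at both -6 and -5, it is not the neural network predictor, and the function name and the mutant test sequences are hypothetical rather than the numbered reporters of this study.

```python
ACIDIC = set("DE")  # aspartate and glutamate

def passes_flanking_rules(seq, ser_index):
    """Toy summary of the mutagenesis observations for a candidate serine."""
    def at(offset):
        i = ser_index + offset
        return seq[i] if 0 <= i < len(seq) else None

    if at(0) != "S":
        return False
    # Glycine is required on both sides of the modified serine.
    if at(-1) != "G" or at(1) != "G":
        return False
    # One acidic residue at -4, -3, or -2 is sufficient ...
    proximal = any(at(o) in ACIDIC for o in (-4, -3, -2))
    # ... whereas the distal positions -6 and -5 must both be acidic.
    distal_pair = at(-6) in ACIDIC and at(-5) in ACIDIC
    return proximal or distal_pair

wild_type = "DIEVNGSGYPTDD"  # SDN-1A reporter sequence; the serine is at index 6
print(passes_flanking_rules(wild_type, 6))
print(passes_flanking_rules("AIAVNGSGYPTDD", 6))  # hypothetical: C-terminal acids only
```

A hard rule of this kind would be too strict for the mammalian sites, where the glycine preference at -1 is weaker, which is one reason a statistical predictor is preferable to fixed rules.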
Flanking Sequences Regulate Multiple Glycosylation Pathways. It is noteworthy that there are 47 serine and threonine residues in the 202-amino acid extracellular domain of the C. elegans SDN-1 protein. Much of this extracellular domain is predicted to consist of Ser/Thr-rich mucin-type glycosylation sites, based on NetOGlyc analysis (38). Up to three proteoglycan modification sites in SDN-1 were proposed previously on the basis of a search for "SG" sequences. There is also a potential for consensus sequence overlap of N-glycosylation sites and mucin-type O-glycosylation sites at proteoglycan modification sites. For example, the SDN-1A reporter sequence (DIEVNGS71GYPTDD) used in this study contains 1) a proteoglycan modification site at Ser71, 2) a consensus sequence for an N-glycosylation site (NGS) at Asn69, and 3) a possible mucin modification site at Ser71 or Thr75. A consensus overlap model for competing glycosylation pathways could influence glycosylation site occupancy at any one of the glycosylated positions (39); however, in our studies, different classes of glycosylation did not appear to co-exist on the same peptide reporter. Ser71 in the wild type reporter is only a proteoglycan modification site, since it was only modified in the presence of fly or worm ppXyl-T. N-Glycanase or O-glycanase treatment of the reporter proteins indicated that neither N-linked nor mucin O-linked sugars were present. Moreover, a conservative mutation of Asn69 to glutamine eliminated the potential N-glycosylation site (Table 1, reporter 9) but did not change the electrophoretic mobility (gel not shown), indicating that N-linked modification did not occur.
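The overlap of candidate motifs in the SDN-1A reporter can be located with a simple pattern scan. The sketch below is illustrative only: the helper name is hypothetical, the N-glycosylation pattern is the widely used N-X-S/T consensus (X not proline) rather than anything specific to this study, and numbering the first residue as 65 is an assumption inferred from the Ser71 label in the reporter sequence.

```python
import re

def motif_positions(seq, pattern, first_residue_number=1):
    """Residue numbers at which a regular-expression motif starts."""
    return [m.start() + first_residue_number for m in re.finditer(pattern, seq)]

# Lookaheads keep each match one residue wide, so overlapping motifs are reported.
N_GLYC_SEQUON = r"N(?=[^P][ST])"  # Asn-X-Ser/Thr with X not proline
SER_GLY = r"S(?=G)"               # the "SG" dipeptide used in earlier site searches

reporter = "DIEVNGSGYPTDD"
first = 65  # assumed numbering, so that the serine is residue 71

print("N-glycosylation sequon at Asn:", motif_positions(reporter, N_GLYC_SEQUON, first))
print("SG motif at Ser:", motif_positions(reporter, SER_GLY, first))
```

A motif match only marks a candidate; as the glycanase and Asn69 mutation results show, the presence of a consensus sequence does not mean that the site is occupied.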
The mutagenesis data in this paper suggest that multiple interactions between the glycosylation site and the glycosyltransferase enzyme may govern how polypeptide sugar transferases identify and modify their downstream target proteins, mucins and proteoglycans. It appears that residues that favor or are required for one glycosylation pathway are, in contrast, neutral or inhibitory for the other pathway, suggesting that sequence requirements are entirely different (Table 3).
In our study, the Drosophila ppXyl-T in cultured cells showed the same preference for recognition sites as the worm enzyme. This observation of cross-species conservation is in agreement with recent findings that ppXyl-T orthologues from flies, worms, and humans are able to recognize similar peptide substrates in vitro (12). This latter study also reinforces our sequence comparisons of mammalian and worm proteoglycan sites (Fig. 6), which show that the recognition sequence for ppXyl-T is evolutionarily conserved, at least to some extent, even in mammals. This means that any good predictor developed for C. elegans proteoglycan sites will have some relevance to other metazoan organisms as well. In the future, when more C. elegans proteoglycan sites have been identified, we will develop a better predictor that can more confidently predict proteoglycan sites in different sequence contexts. This predictor will be made publicly available.
In summary, the data presented here indicate that the flanking amino acid sequence significantly affects the recognition and modification of substrate serine residues by polypeptide xylosyltransferase, the key glycosyltransferase that defines proteoglycan modification sites. Recent studies suggest that the local conformation and accessibility for ppXyl-T might be another important determinant (15, 44). Therefore, the logical progression of this work is to examine the present sequence rules on native proteoglycan core proteins. Future efforts that identify the ppXyl-T enzyme's substrate-binding site and its crystal structure will help clarify the mechanism by which individual amino acids regulate proteoglycan addition. Nevertheless, our studies suggest that a defined collection of mapped glycosylated sequences can be used as a basis for neural network predictions of proteoglycan modification sites.
| FOOTNOTES |
|---|
The on-line version of this article (available at http://www.jbc.org) contains supplemental Tables 1-3.
1 To whom correspondence should be addressed: Box 611, University of Rochester Medical Center, 601 Elmwood Ave., Rochester NY 14642. Tel.: 585-275-0336; Fax: 585-276-0190; E-mail: fred_hagen{at}urmc.rochester.edu.
2 The abbreviations used are: GAG, glycosaminoglycan; ppXyl-T, polypeptide xylosyltransferase; SDN-1, syndecan-1; WT, wild type; RNAi, RNA interference; dsRNA, double-stranded RNA; BisTris, 2-[bis(2-hydroxyethyl)amino]-2-(hydroxymethyl)propane-1,3-diol; MOPS, 4-morpholinepropanesulfonic acid; RT, reverse transcription.
3 H. Wang and F. K. Hagen, unpublished results.