Originally published In Press as doi:10.1074/jbc.M609193200 on March 16, 2007

J. Biol. Chem., Vol. 282, Issue 19, 14586-14597, May 11, 2007

Systematic Analysis of Proteoglycan Modification Sites in Caenorhabditis elegans by Scanning Mutagenesis

Huan Wang‡, Karin Julenius§, Jennifer Hryhorenko‡, and Fred K. Hagen‡,1

From the ‡Department of Biochemistry and Biophysics, Center for Oral Biology, Aab Institute of Biomedical Sciences, University of Rochester Medical Center, Rochester, New York 14642, the §Division of Matrix Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-17177 Stockholm, Sweden, and the Stockholm Bioinformatics Center, SCFAB, Stockholm University, SE-10691 Stockholm, Sweden

Received for publication, September 27, 2006, and in revised form, February 21, 2007.


    ABSTRACT
 
Proteoglycan modification is essential for development and early cell division in Caenorhabditis elegans. The specification of proteoglycan attachment sites is defined by the Golgi enzyme polypeptide xylosyltransferase. Here we evaluate the substrate specificity of this xylosyltransferase for its downstream targets by using reporter proteins containing proteoglycan modification sites from C. elegans syndecan/SDN-1. The N terminus of the SDN-1 contains a Ser-Gly proteoglycan site at Ser71, flanked by potential mucin and N-glycosylation sites. However, Ser71 was exclusively used as a proteoglycan site in vivo, based on mapping studies with a Ser71 reporter protein, glycosyltransferase RNA interference, and co-expression of worm polypeptide xylosyltransferase. To elucidate the substrate requirements of this enzyme, a library of 42 point mutants of the Ser71 reporter was expressed in tissue culture. The nematode proteoglycan modification site in SDN-1 required serine (not threonine), two flanking glycine residues (positions -1 and +1), and either one proximal acidic N-terminal amino acid (positions -4, -3, and -2) or a pair of distal N-terminal acidic amino acids (positions -6 and -5). C-terminal acidic amino acids, although present in many proteoglycan modification sites, had minimal impact on xylosylation at Ser71. Proline inhibited glycosylation when present at -1, +1, or +2. The position of glycine, proline, and acidic amino acids allows the glycosylation machinery to discriminate between mucin and proteoglycan modification sites. The key residues that define proteoglycan modification sites also function with the Drosophila polypeptide xylosyltransferase, indicating that the specificity in the glycosylation process is evolutionarily conserved. Using a neural network method, a preliminary proteoglycan predictor has been developed.


    INTRODUCTION
 
Proteoglycans are proteins bearing a variable number of glycosaminoglycan (GAG) chains. Most of our knowledge of proteoglycans is based on studies involving cell surface, extracellular matrix, and connective tissue in the mammalian system, where proteoglycan sugar chains function in a broad variety of cellular and physiological activities, such as cell differentiation, signaling, adhesion, and division; blood coagulation; and wound repair (1, 2). Loss of function studies in Caenorhabditis elegans have demonstrated that glycosyltransferase-mediated modification of proteoglycans is essential for embryonic development. Both strong mutations and RNA interference (RNAi) of the glycosyltransferases in this pathway lead to maternal effect embryonic lethality, caused by a failure to complete cytokinesis in the early cell cleavages (3-5). Hypomorphic alleles of these same genes show that this protein modification pathway is important for egg laying, vulva development, and adult viability (6-8). Sulfotransferase modification of proteoglycan sugar chains is highly regulated and critical for cell migration in C. elegans (9). In addition, the nematode syndecan protein SDN-1, a major heparan sulfate proteoglycan, is required for neuronal and coelomocyte cell migration and axon guidance (10). SDN-1 and another heparan sulfate proteoglycan glypican/GPN-1 (glypican-1) interact with Kallmann syndrome protein Kal-1 via the sulfated GAG chains to promote ventral neuroblast migration prior to epidermal enclosure of the C. elegans embryo (11). Although the proteoglycan chains are critical for these processes, there are essentially no experimental data on the number and position of the sugar chains in these nematode proteoglycans. However, two very recent studies have provided insights into the glycosylation of worm proteoglycans. Recombinant worm, fly, and human glycosyltransferases were capable of modifying similar peptides derived from human and fly proteoglycans in an in vitro assay (12), and nine C. elegans chondroitin proteoglycans have been identified and partially mapped for glycosylation sites by mass spectrometry (13).

Biosynthesis of proteoglycan sugar chains involves numerous synthetic and processing steps to attach the first sugar, build a linker tetrasaccharide, and synthesize and modify a GAG disaccharide repeat. The first step in the glycosylation of a proteoglycan modification site is the catalyzed transfer of xylose to a specific serine in a core protein by the polypeptide xylosyltransferase (ppXyl-T; EC 2.4.2.26), using the sugar donor UDP-xylose. It is the ppXyl-T enzyme's substrate specificity that defines which serines in a core protein become modified with proteoglycan-type sugar chains.

Secreted proteins frequently contain large numbers of serine and threonine residues in their sequences, yet only selected residues within a core protein appear to be recognized and modified by ppXyl-T. As with most modification sites, it is the primary and secondary structure of the proteoglycan addition site that define or regulate the critical interactions between the modified serine residue and the active site of the initiating enzyme, ppXyl-T. Such enzyme-substrate interactions must be very specific, since the ppXyl-T enzyme must, otherwise, compete with a large number of other polypeptide sugar transferases that O-glycosylate serine (and threonine), such as the family of polypeptide GalNAc transferases, which initiate mucin-type O-glycosylation. The attachment of the first sugar (xylose or GalNAc) to a serine is a key discriminating event, since the first attached carbohydrate moiety becomes the substrate for the addition of the next sugars in the proteoglycan tetrasaccharide or mucin oligosaccharide chain structures.

Initial comparison of mammalian GAG attachment sites suggested that the sequence motif Ser0-Gly+1-Xaa+2-Gly+3 may be a minimal protein sequence context for proteoglycan modification sites (14). However, there are cases where mapped proteoglycan sites lack the Gly+3. In addition, glycine -> alanine substitution at the +1- and +3-positions in decorin did not ablate GAG chain addition (15). Amino acid alignment of 51 experimental and predicted mammalian chondroitin sulfate attachment sites suggested the following recognition sequence: a-a-a-a-Gly-Ser-Gly-a-b-a (where a represents acidic amino acids (Glu or Asp), and b represents Gly, Glu, or Asp) (16). Another comparison of over 30 heparan sulfate and chondroitin sulfate glycosaminoglycan attachment sites showed that at least two acidic amino acid residues were present on one or both sides of the proteoglycan substitution site (17). Indeed, in vitro kinetic studies also supported the observation that the most favorable peptide substrates for proteoglycan attachment include clusters of acidic amino acids flanking a Ser-Gly dipeptide or Ser-Gly-Xaa-Gly (14, 16).
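The two sequence motifs quoted above can be written as simple pattern searches. The following is a minimal sketch, assuming one-letter amino acid codes; the function name `find_motif_serines` and the example peptides in the usage note are hypothetical illustrations, not sequences or software from the cited studies.

```python
import re

# Minimal motif from early comparisons (ref. 14): Ser0-Gly+1-Xaa+2-Gly+3.
SGXG = re.compile(r"SG.G")

# Consensus from the 51-site alignment (ref. 16): a-a-a-a-Gly-Ser-Gly-a-b-a,
# where a = Glu or Asp and b = Gly, Glu, or Asp.
ACIDIC_CONSENSUS = re.compile(r"[DE]{4}GSG[DE][GDE][DE]")


def find_motif_serines(sequence, pattern, serine_offset):
    """Return 0-based indices of the serine in every match of `pattern`.

    Each start position is tested separately so overlapping matches are
    reported. `serine_offset` is the position of the serine inside the
    pattern (0 for SGXG, 5 for ACIDIC_CONSENSUS).
    """
    hits = []
    for start in range(len(sequence)):
        if pattern.match(sequence, start):
            hits.append(start + serine_offset)
    return hits
```

For example, `find_motif_serines("AASGAGAA", SGXG, 0)` reports the serine at index 2, whereas the SDN-1 site 1A sequence DIEVNGSGYPTDD used later in this paper contains no Ser-Gly-Xaa-Gly match, because position +3 is a proline.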

For a number of reasons, the current state of published data on the experimentally mapped proteoglycan sites is suboptimal for producing a predictive algorithm for glycosylation sites and for understanding the enzyme-substrate requirements of the proteoglycan glycosylation process. First, the experimental data set is small and not uniform. The past data sets group poorly glycosylated sequences with strong sites and combine glycosylation states and rates from in vivo or in vitro studies. Second, systematic mutagenesis studies have not been performed on the flanking sequence of the modification sites. Furthermore, some of the published data sets of proteoglycan sites include predicted sites without experimental mapping data. In our present study, we used 46 site-directed mutant reporter-protein constructs to investigate the influence of amino acids from position -6 to +6 surrounding serine on the xylosylation of reporter substrates. The reporter sequence is derived from a glycosylated region of the C. elegans cell surface proteoglycan, SDN-1. Reporter proteins were co-expressed in tissue culture with either a nematode or fly polypeptide xylosyltransferase, and the modification state was examined to determine what role the flanking amino acids have on proteoglycan modification in vivo. The results of this analysis define peptide sequence requirements for proteoglycan modification sites and provide insights into how the cellular machinery distinguishes between the modification of hydroxyamino acids in mucin and proteoglycan biosynthetic pathways.


    EXPERIMENTAL PROCEDURES
 
Construction of DNA Vehicles for Expression of Worm Proteins in Insect Cells—Reporter constructs with a single glycosylation site were built by phosphorylating and annealing complementary sense and antisense oligonucleotides (listed in supplemental Table 1) to produce a cDNA insert for ligation into the BsaI sites of the vector pLC-S2-SP-BsaI, a low copy origin of replication derivative of pMT plasmid (Invitrogen). The pLC-S2-SP-BsaI plasmid contains a modified cloning site that is downstream from a metallothionein promoter, after an N-terminal secretory signal peptide, and an S-tag peptide (KETAAAKFERQHMDS) fused to a FLAG peptide epitope (DYKDDDDK) and a high affinity metal chelate binding site (HHWHHH) (18). C-terminal to the insert is a second FLAG and metal chelate site; consequently, the peptide insert is flanked by the fusion protein on both sides (Fig. 1). Therefore, all reporter peptide sequences are expressed in the same structural context of the fusion protein and not positioned at either end of the protein. Purification of recombinant reporter proteins was achieved, using the FLAG epitope and anti-FLAG M2 antibody-agarose, whereas detection was accomplished by exploiting the affinity of an S-protein-horseradish peroxidase conjugate for the recombinant S-tag in the fusion protein.

To express the worm ppXyl-T enzyme, the full-length cDNA of the ppXyl-T enzyme was amplified with the primers PRIM346 (d(CACATGCATAGTTTTTACAGGGCGTGCTCCTCATGT)) and PRIM69 (d(CTCCGGAGCGGCCGCTAAATCAAGGTCTGCGTATC)). This PCR product was digested with NsiI and NotI and inserted into pLC-S2 to make the plasmid pLC-S2-FL-pXT.

Ligated plasmids were transformed into Escherichia coli and isolated by plasmid minipreparation (Qiagen Turbo 8). DNA sequencing was performed on all plasmids to identify the cDNA insert, its reading frame, and the specific amino acid substitution.

Expression of the Peptide Reporters in Insect Cells—Drosophila Schneider (S2) cells were cultured in SDM medium (Gibco) supplemented with 10% (v/v) fetal bovine serum and 1% penicillin/streptomycin at 27 °C. Cells were passaged every 3-4 days. For transfections, 10^6 cells were added to each well of a 6-well plate, incubated for 1 day, and then treated with a DNA-FuGENE-6 mixture. Briefly, 3 µl of FuGENE-6 (Roche Applied Science) transfection reagent was diluted into 97 µl of serum-free SDM medium and incubated at room temperature for 5 min. Then 1 µg of reporter plasmid DNA was added with mixing to the diluted FuGENE-6, and this mixture was incubated for 20 min at room temperature. Then the DNA-FuGENE-6 mixture was added in a dropwise manner to cells. Vigorous agitation/swirling was used to mix the cells and transfection reagent. For co-expression of worm polypeptide xylosyltransferase with the reporter proteins, 0.1 µg of pLC-S2-FL-pXT plasmid DNA was included in the transfection mix with the reporter plasmid. After a 1-day incubation at 27 °C, transfected cells were induced for protein expression by the addition of 600 µM CuSO4. After 3 days, cell culture medium, containing the secreted reporter, was harvested and clarified by centrifugation at 2000 rpm for 10 min.

RNA Interference of Fly ppXyl-T Activity in Insect Cells—For RNA interference, double-stranded RNA (dsRNA) was synthesized in vitro and added to insect cells immediately prior to transfection. To prepare dsRNA, fly ppXyl-T cDNA was first amplified from total S2 cell cDNA and cloned into the MluI and NotI sites of the pIMKF3 plasmid (19), using the primers Prim357 (d(CTGCACGCGTACTGGCAGTCCCTGTATCA)) and Prim358 (d(GTGTGCGGCCGCTTCATTTGAGCAGGGCATCCACATC)). Next, T7 promoter sites were introduced to each end of the fly ppXyl-T cDNA by PCR amplification of the vector's insert, using the following oligonucleotides: Prim652, d(GTCCATAATACGACTCACTATAGGGAAGACGATGACGATAAACACGC); Prim376, d(GTCCATAATACGACTCACTATAGGGCATGTCTGGATCCGTGGTC). The 5' ends of these latter two primers contain a T7 RNA polymerase promoter sequence; therefore, this places T7 RNA polymerase promoters at both ends of the fly ppXyl-T PCR product. dsRNA was produced using T7 RNA polymerase and 100 ng of the latter PCR product in an in vitro transcription reaction (Megascript Kit; Ambion). Following a 6-h transcription reaction at 37 °C, the RNA products were subjected to a denaturation step of 2 min at 90 °C and a slow annealing temperature gradient of 80-30 °C over a 30-min period. The dsRNA was phenol-extracted, ethanol-precipitated, and resuspended to a final concentration of 5 mg/ml.

For RNAi treatment, S2 cells were plated at 10^6 cells/ml (2 ml/well) in 6-well plates (Corning Glass). After a 24-h incubation at 27 °C, the cells were washed twice with serum-free SDM medium, and then 1 ml of serum-free SDM medium containing 15 µg of ppXyl-T dsRNA was added to each well and immediately mixed with vigorous agitation and swirling. After a 2-h incubation at 27 °C, 2 ml of SDM with 10% fetal bovine serum was added to each well. These dsRNA-treated cells were then transfected with reporter plasmids, as described above.

Immunoprecipitation and Glycanase Treatment—To purify the reporter proteins, 25 µl of anti-FLAG M2 agarose was added directly to 1 ml of clarified cell culture medium. The affinity resin was rocked at 4 °C overnight and was washed twice in Buffer A (20 mM Tris, 200 mM NaCl, 5% glycerol, pH 7.4). Reporter proteins bound to beads were treated with O-glycanase (0.125 milliunits of endo-α-N-acetylgalactosaminidase), sialidase (0.5 milliunits of α2-3,6,8,9-neuraminidase), and N-glycanase (500 milliunits of peptide N-glycanase F) in a final volume of 48 µl for 5 h at 37 °C, according to the manufacturer's specifications (Glycoprotein Deglycosylation Kit; Calbiochem). For SDS-PAGE analysis, 16 µl of 4× NuPAGE sample loading buffer was added to the glycosidase-treated samples. Control N-glycanase digestions used human salivary amylase, which migrates as a mixture, containing a nonglycosylated and an N-glycosylated isoform (20). An O-glycanase control contained a predicted mucin-type O-glycosylated domain, rich in serine/threonine/proline from C. elegans LET-653 (21), which was expressed in S2 cells, under the same conditions as the proteoglycan reporter proteins.

Glycoprotein Detection and Occupancy Determination—Each reporter construct was expressed in S2 cells in triplicate, and the glycosylation state was analyzed by Western blot. Reporter proteins were separated on BisTris 12% SDS-PAGE with MOPS running buffer (NuPAGE; Invitrogen) and electrophoretically transferred onto nitrocellulose membranes. After a 2-h room temperature incubation in blocking solution (0.5% casein blocker (Roche Applied Science) in Tris-buffered saline with 0.05% Tween), the membrane was incubated overnight with a 1:5000 dilution of S-protein-horseradish peroxidase conjugate (Novagen) at 4 °C. After washing the membranes five times in Tris-buffered saline, 0.05% Tween 20, chemiluminescent detection was performed using SuperSignal West Femto Maximum Sensitivity substrate (Pierce). The percentage occupancy or glycosylation state was calculated in triplicate by densitometry scanning of the glycosylated and unmodified protein bands.
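The occupancy calculation described above can be illustrated with a short sketch. It assumes that occupancy is the glycosylated band intensity divided by the summed intensity of the glycosylated and unmodified bands, averaged over the triplicate; the exact densitometry workflow is not specified in the text, and the function names and intensity values here are hypothetical.

```python
def percent_occupancy(glycosylated, unmodified):
    """Percentage of reporter in the glycosylated band.

    Inputs are background-corrected band intensities in arbitrary
    densitometry units (an assumption about the raw data format).
    """
    total = glycosylated + unmodified
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * glycosylated / total


def mean_occupancy(replicates):
    """Average occupancy over (glycosylated, unmodified) replicate pairs."""
    values = [percent_occupancy(g, u) for g, u in replicates]
    return sum(values) / len(values)
```

With illustrative triplicate intensities of (70, 30), (60, 40), and (80, 20), `mean_occupancy` gives 70%.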

RT-PCR of Endogenous Fly Glycosyltransferases—Total RNA was isolated from S2 cells in 300 µl of TriReagent and 2 µl of polyacrylamide carrier (Molecular Research Center). First strand cDNA synthesis was conducted using Super Script III (Invitrogen) in a final volume of 20 µl, containing 5 µg of total RNA and an oligo(dT) primer. For RT-PCR, first strand cDNA from each cell line was added to a GoTaq reaction mixture made for a total of 10 reactions, according to the manufacturer's specifications (Promega); however, the primers were not added. To ensure equal delivery of each cDNA template, this mixture was mixed and dispensed into 10 individual tubes, and PCR was initiated by adding 0.5 mM PCR primers for each glycosyltransferase target. Primer pairs for a total of four glycosyltransferase cDNAs were dispensed into separate reaction tubes. After a five-cycle touch-down annealing step from 72 to 62 °C, the cDNA targets for selected glycosyltransferases were amplified in a 24-cycle reaction, using the following conditions: 94 °C for 20 s, 62 °C for 30 s, and 72 °C for 30 s, with a 1-s/cycle extension. To avoid amplification of genomic DNA, at least one oligonucleotide from each primer pair was designed around a splice junction (listed in supplemental Table 2). The following Drosophila melanogaster mRNAs were used in parallel with the fly ppXyl-T (accession number AJ430595) (22) RT-PCR for additional endogenous fly glycosyltransferase controls: OST1 (oligosaccharyltransferase I/ribophorin-1; accession number NM_205958), pgant5 (polypeptide GalNAc transferase-T5; accession number AY268066) (23), and Gal-T1 (β1,4-galactosyltransferase 7; accession number NM_143062). The loading of PCR products from each S2 cell preparation was normalized, using the level of the PCR product for the pgant5 cDNA target.

Neural Network Training—A neural network does not understand letters, so the amino acid sequence must be translated into numbers. We used a sparse encoding method (24, 25), which is the conventional way to convert the amino acid sequence into numerical form. The neural networks were of the two-layer feed-forward type, trained by standard back-propagation. We presented a varying number of amino acids on either side of the modification site to the network and also changed the number of neurons in the hidden layer to find the optimal complexity for this particular prediction problem. The predictive performance was monitored using the Matthews correlation coefficient during training and testing of the network (26). The data set comprised all serine peptides presented in this paper as well as two positive and 25 negative serine sites experimentally determined in full-length C. elegans syndecan/SDN-1, giving a total of 21 positive sites and 42 negative sites. The 63 sites were randomly divided into seven sets of nine sites each, where each set contained both negative and positive sites. Using the method of cross-validation, every network was trained seven times using six sets as training sets and one set as the test set. The reported cross-validation performance is the joint performance of the seven resulting networks on their respective test sets.
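The data handling steps in this paragraph can be sketched as follows. This is an illustrative outline only, not the code used for the predictor: the residue ordering, the all-zero encoding of padding symbols, and the round-robin assignment that places positives and negatives in every set are assumptions, and all function names are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def sparse_encode(window):
    """Sparse (one-hot) encoding: 20 binary inputs per residue, flattened.

    Symbols outside the 20 standard residues (for example '-' padding)
    are encoded as all zeros, which is an assumption of this sketch.
    """
    vector = []
    for residue in window:
        bits = [0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:
            bits[idx] = 1
        vector.extend(bits)
    return vector


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def stratified_folds(positives, negatives, k=7, seed=0):
    """Shuffle each class, then deal sites round-robin into k sets so that
    every set receives both positive and negative sites."""
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [[] for _ in range(k)]
    for i, item in enumerate(pos + neg):
        folds[i % k].append(item)
    return folds
```

With 21 positive and 42 negative sites and `k=7`, this assignment yields seven sets of nine sites, each holding three positives and six negatives; each set would then serve once as the test set while the other six are used for training.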


FIGURE 1.
Structure of SDN-1 and chimeric reporter protein. A, SDN-1 is a type I transmembrane protein with an N-terminal signal peptide (SP; slanted hatched box), an extracellular domain (open rectangle), a transmembrane domain (solid box), and a short C-terminal cytoplasmic domain (checkered box). The extracellular domain of SDN-1 contains three Ser-Gly dipeptides, at serines 71, 86, and 214 (black bar), which were previously reported as potential proteoglycan modification sites (10, 11, 27). Actual modification sites, determined in Fig. 7, are indicated with an asterisk. B, the reporter peptide construct, pLC-S2-SP-SDN-1A, includes a BIP secretory signal peptide (SP; slanted hatched box), two FLAG epitope tags (black boxes), two metal chelate tags (open circle), one S-peptide tag (striped oval), and the glycosylation site/sequence 1A, derived from Ser71 of the SDN-1 extracellular domain. C, the sequence of the complete open reading frame of the reporter, described above, beginning with signal peptide (cleavage site at position 25-26 amino acids; see vertical arrow) followed by the FLAG recognition sequence. The insert from SDN-1 (glycosylation site 1A) is shown in brackets. All mutant reporter constructs only modify the sequence in the brackets.

 

    RESULTS
 
Cell Culture Assay with Drosophila and C. elegans ppXyl-T-based Modification Machinery—The amino acid sequence of C. elegans SDN-1 suggested three potential proteoglycan attachment sites in previous studies (10, 11, 27): Ser71, Ser86, and Ser214 (SDN-1A, -1B, and -1C, respectively), based on the presence of Ser-Gly dipeptides (Fig. 1A). To define how the flanking sequence affects proteoglycan assembly, we expressed a reporter fusion protein containing the SDN-1 peptide sequence of site 1A: DIEVNGS71GYPTDD, which encompasses the first potential proteoglycan attachment site at Ser71 (Fig. 1B). Because a transfectable C. elegans cell line is not available, the reporter protein was expressed, under the control of the metallothionein promoter, in Drosophila S2 Schneider cells, as a protein containing a BIP secretory signal, an S-tag detection peptide, and purification tags (metal binding site and FLAG epitope) both N-terminal and C-terminal to the proteoglycan modification site (Fig. 1, B and C) (see "Experimental Procedures").

Drosophila S2 cells make up a favorable cell culture system, because they have high transfection rates and are sensitive to RNA interference. Co-transfection was used to express the worm ppXyl-T glycosyltransferase and SDN-1A reporter, whereas RNAi was used to knock down the expression of the endogenous fly ppXyl-T mRNA. A single dose of a 944-bp ppXyl-T dsRNA fragment was effective in reducing the level of ppXyl-T mRNA in untransfected cells, transfected cells, and cells expressing worm ppXyl-T, over a 4-day cell culture period (Fig. 2A, lanes 2, 4, and 5, respectively). This RNAi treatment was specific for the endogenous fly ppXyl-T mRNA and did not influence the transcript levels of endogenous OST1 (oligosaccharyltransferase I/ribophorin-1), pgant5 (polypeptide GalNAc transferase-T5), and Gal-T1, three key glycosyltransferases in the N-glycosylation, mucin-type, and proteoglycan O-glycosylation pathways, respectively.

Recombinant proteins were purified on an anti-FLAG antibody resin, so that the ratio of glycosylated and unglycosylated forms of the reporter could be determined. The preparations were not passed on an anion exchange or size exclusion resin to artificially enrich for weakly glycosylated isoforms of the reporter protein. Approximately 30% of the wild type reporter was expressed as an unmodified secreted polypeptide with an apparent molecular mass of 15 kDa, whereas 70% of the reporter migrated on SDS-PAGE as a second band with a higher molecular weight (Fig. 2B, lane 1). This high molecular weight species disappeared following ppXyl-T RNAi treatment (Fig. 2B, lane 2) that reduced the majority of the mRNA transcript levels for the endogenous Drosophila ppXyl-T (Fig. 2A, lane 4).

To evaluate the action of C. elegans ppXyl-T on the SDN-1A reporter peptide, worm ppXyl-T and SDN-1A reporters were co-transfected into these RNAi-treated cells. Co-expression of worm ppXyl-T with the SDN-1 reporter resulted in a recapitulation of the higher molecular weight, glycosylated species of the SDN-1 reporter (Fig. 2B, lanes 3 and 4). Therefore, the RNAi and ppXyl-T co-expression experiments both support the finding that the worm ppXyl-T acts on the 15-kDa SDN-1 reporter peptide to produce a higher molecular weight glycosylated form, bearing xylose-containing proteoglycan chains. Treatment with O-glycanase and N-glycanase showed no alteration in electrophoretic mobility of the glycosylated reporter (Fig. 2B, lane 4), indicating that the reporter was not modified with N- or O-linked glycans, whereas control proteins (human salivary amylase with an N-linked glycan and a C. elegans LET-653 glycoprotein with a mucin domain) were susceptible to this glycosidase treatment (Fig. 2D). The absence of a mobility shift following glycosidase treatment of the SDN-1A reporter indicates that neither the reporter peptide insert (DIEVNGS71GYPTDD) nor sequences from the fusion protein were modified with potential N-glycans or mucin-type O-glycans, despite the fact that consensus sequences for these sites appeared to be present (NGS71 is a potential N-glycosylation site, and S71GYPT75 could be a potential mucin site).

Co-expression of the nematode ppXyl-T and the S0A mutant reporter (with a serine -> alanine mutation at Ser71) produced a protein that migrated as the unmodified reporter (Fig. 2, B and C, reporter 2). Similarly, a serine -> threonine mutation resulted in a peptide (S0T reporter) that was not detectably glycosylated, indicating that the nematode ppXyl-T has a strong requirement for the serine residue in the engineered proteoglycan modification site (Fig. 2, B and C, reporter 3). Therefore, Ser71 -> Ala/Thr mutagenesis results support the conclusion that Ser71 is the only proteoglycan modification site within the complete reporter protein (Fig. 1C) and that effective proteoglycan attachment sites strongly favor a serine, not a threonine. Unless specified otherwise, all further reporter experiments were expressed in cell culture conditions where RNAi was used to knock down the endogenous fly ppXyl-T, and co-transfection was used to co-express the C. elegans ppXyl-T with the reporter protein.


FIGURE 2.
Expression and glycosylation of chimeric reporter protein in Drosophila S2 cells. A, RNA interference is used to knock down the expression of fly ppXyl-T mRNA in insect cell cultures. RT-PCR on control (lanes 1 and 3) and RNAi-treated S2 cells (lanes 2, 4, and 5) shows relative transcript levels for endogenous fly glycosyltransferases in proteoglycan, mucin-type glycosylation, and N-glycosylation pathways (PCR gene targets are indicated to the right of the gel). The only band in the PCR sample corresponded to the exact size predicted by the oligonucleotide primers (supplemental Table 2). Lanes 1 and 2 amplify target RNA from nontransfected S2 cells, whereas lanes 3-5 are from cells transfected with the reporter construct. Lane 5, S2 cells are RNAi-treated and co-transfected with the worm ppXyl-T expression construct, as well as the worm SDN-1A reporter. B, SDS-PAGE analysis of SDN-1A reporters expressed in S2 insect cells. Reporter numbers are indicated below the lanes. Above the lanes, the cell culture and protein treatment conditions are summarized. RNAi is used to knock down the endogenous fly ppXyl-T mRNA in lanes with a plus symbol. Rescue of ppXyl-T activity is achieved by co-expression with a full-length C. elegans ppXyl-T in RNAi-treated S2 cells (four lanes on the right). Lanes 4-6, chimeric reporter proteins were treated with a combination of O-glycanase and N-glycanase before SDS-PAGE. Reporters 2 and 3 have a Ser71 -> Ala and a Ser71 -> Thr mutation, respectively. C, the sequence and glycosylation state are listed by reporter peptide number. D, positive controls for N-glycanase (peptide N-glycanase F) and O-glycanase are human salivary amylase, which migrates as a nonglycosylated and N-glycosylated doublet (lanes 1 and 2), and a recombinant C. elegans LET653 mucin-type O-linked glycoprotein domain expressed in Drosophila S2 insect cells (lanes 3 and 4).

 


FIGURE 3.
Glycine influences on proteoglycan modification site selection. A, chimeric reporter proteins (construct number is indicated below the gel) were immunoprecipitated, treated with a combination of O-glycanase and N-glycanase, and analyzed by Western blot with anti-S-protein antibody-horseradish peroxidase conjugate. Lane 1, the WT SDN-1A reporter. The amino acid sequences immediately flanking the glycosylated Ser71 position are listed at the top of each lane. The asterisk in the GSG* label for reporters 7 and 8 indicates that the proline in position +3 has been mutated to a glycine or alanine. B, the sequence and glycosylation state of the reporters from Fig. 3A.

 
Proteoglycan Modification Sites of SDN-1 Require Glycine Residues Flanking a Serine—Because many mammalian proteoglycan modification sites contain neighboring glycine residues, the requirement of glycine in proteoglycan modification sites was studied with mutants having glycine -> alanine replacements. The wild type sequence flanking Ser71 is GS71GYP. Conversion of both glycine residues to alanine at positions -1 and +1 abolished proteoglycan modification of Ser71, since no glycosylated form of the G(-1,+1)A mutant was detected (Fig. 3A, reporter 4). The requirement of the glycine residue appeared very strict, since glycosylation was abolished when either glycine at position -1 or +1 was mutated to alanine. Both G-1A and G+1A mutants lost the high molecular weight (glycosylated) species upon SDS-PAGE (Fig. 3A, reporters 5 and 6). Although two proximal glycine residues appeared to be essential, the significance of a distal glycine (at position +3) was also evaluated, because previous studies observed that a significant subset of proteoglycans possess a glycine residue at position +3 (16). Therefore, the SDN-1A reporter sequence was mutated from GSGYP to GSGYG (Fig. 3A, reporter 7). This proline -> glycine mutation at position +3 resulted in a 23% increase in proteoglycan site occupancy from 70 to 86%, relative to the wild-type reporter (Fig. 3B, reporter 1). It is possible that replacement of the +3 proline relieved a slight inhibitory effect on ppXyl-T substrate activity, since the proline -> alanine mutant at +3 (reporter 8) had a modest increase in glycosylation state occupancy (78%) relative to the 70% glycosylated species of the wild type, proline-containing site (Fig. 3, compare peptides 7 and 8).
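The 23% figure quoted for the proline -> glycine mutant is a relative change in occupancy (86% versus 70%), not a difference of 23 percentage points. A short check of that arithmetic, using a hypothetical helper name:

```python
def relative_increase_percent(before, after):
    """Relative change of `after` with respect to `before`, in percent."""
    return 100.0 * (after - before) / before
```

Applied to the reported occupancies, `relative_increase_percent(70, 86)` is about 23, and the same calculation for the proline -> alanine mutant (70 to 78%) gives about 11.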

Proteoglycan Sites Have a Position and Density Requirement for Acidic Amino Acid Residues—Data sets of mammalian glycosylation sites show that aspartic and glutamic acid residues cluster in the N- and C-terminal flanks of proteoglycan modification sites (14, 28-32). To carefully define the acidic amino acid requirements for proteoglycan modification sites, scanning mutagenesis was performed to create a panel of reporters, in which the position and density of the acidic amino acid residues were systematically varied along the length of the peptide sequence flanking the glycosylation site (Table 1). Note that in the SDS-PAGE analysis, the glycosylated form of the reporter migrated at a higher molecular weight than its unglycosylated state. Furthermore, the electrophoretic mobility of the unmodified reporter was slightly reduced when acidic amino acid residues were present. For example, the alanine-rich mutant (reporter 10) ran faster than the more acidic wild type reporter (reporter 1; Fig. 4A and Table 1). This electrophoretic mobility increase for Asp -> Ala mutants most likely resulted from increased binding of SDS to peptide segments containing fewer negatively charged side chains.


TABLE 1
Influence of acidic residues on proteoglycan glycosylation

a Naming designation for point mutants indicates amino acid substitutions. Naming designation for scanning mutants reflects the position of the acidic residue in the scanning position. An apostrophe indicates that no other acidic residues are present in the glycosylation site. White type on black shows the positions of the acidic residues. A potential N-glycosylation site is mutated in reporter 9 to show that the glycosylation states of these flanking sequence mutants are not related to N-glycosylation.

 
When all acidic amino acid residues were mutated, the higher molecular weight (glycosylated) form of the reporter was not detected (Fig. 4A, reporter 10). Similarly, selective mutagenesis of the N-terminal acidic residues (reporter 11) resulted in a complete loss of detectable proteoglycan modification (Fig. 4A). However, the C-terminal acidic amino acids (the two terminal aspartates) in DIEVNGS71GYPTDD did not appear to be essential for proteoglycan modification, since reporter 12 (DIEVNGS71GYPTAA), lacking C-terminal aspartic acid residues, was modified to a similar extent as the wild type reporter. To determine which of the N-terminal acidic amino acids are required for proteoglycan modification, single point mutations were also evaluated. The single mutation on glutamic acid at position -4 (peptide 13 in Table 1) effectively eliminated the ppXyl-T modification of the reporter substrate; however, the substitution of aspartic acid at position -6 (peptide 15 in Table 1) yielded a glycosylation site occupancy similar to wild type levels (Fig. 4A), indicating that the position of the N-terminal acidic amino acid residues appears to be significant for the initiation of proteoglycan modification.


FIGURE 4.
Influence of acidic amino acids on proteoglycan modification. Chimeric reporter proteins were analyzed and presented, as in Fig. 3. A, mutant reporter proteins having alanine point mutations at their wild type acidic amino acid positions. The number of acidic residues N-terminal or C-terminal to the glycosylation site of the reporter proteins is indicated with a plus symbol at the top of each lane. Lanes 1-6, reporters 1, 10-13, and 15 (Table 1). B, acidic scanning mutant reporters probing N-terminal positions with acidic amino acid substitutions (indicated at the top of each gel). Each of these reporter proteins has two acidic amino acids, C-terminal to the glycosylation site at positions +5 and +6, in addition to one N-terminal acidic amino acid (for reporters 13-18) or two N-terminal acidic positions for reporters 1 and 25. C, reporter proteins with single acidic amino acid mutations that scan both the N-terminal and C-terminal flank of the glycosylation site (sequences of reporter peptides 1, 10, 22-24, and 11 are shown in Table 1).

 
To understand how the position of acidic amino acids influences proteoglycan modification in greater detail, we performed scanning mutagenesis with a single acidic amino acid residue substituting fixed positions throughout the length of reporter peptide sequence (Fig. 4, B and C; Table 1). The results showed that peptide sequences with an acidic amino acid residue at either position -6 (mutant -6D) or position -5 (mutant -5D) alone were not recognized as substrates by the ppXyl-T enzyme (Fig. 4B, reporters 13 and 14), whereas single acidic amino acid residues promoted proteoglycan attachment to serine when the negatively charged residues were in closer proximity: at position -4, -3, or -2 (reporters 15, 16, and 17; Fig. 4B and Table 1). The reporter protein -1D, having an acidic amino acid immediately adjacent (position -1) to the modified serine, was not a substrate for ppXyl-T, and this was most likely due to the elimination of an essential glycine at position -1 (peptide 18; Fig. 4B and Table 1), since a conservative alanine substitution at position -1 produced a similar loss in glycosylation (see reporter 6 in Fig. 3B).

Each of these initial scanning mutants (reporters 13-18) contained one N-terminal acidic amino acid, in addition to two distal C-terminal aspartate residues. To determine if the N-terminal acidic amino acids in reporters 15-17 were the only acidic amino acids required for a functional glycosylation site, the C-terminal aspartates (positions +5 and +6) were mutated to alanine in reporters 19-24 (Table 1). Reporters with a single aspartic acid residue were glycosylated only when that residue was at position -4, -3, or -2 (reporters 19-21; Fig. 4C). In contrast, comparable C-terminal scanning mutants with aspartic acid at positions +2, +3, and +4 were not glycosylated (peptides 22-24; Fig. 4C and Table 1). Although the reporters with a single N-terminal acidic residue at positions -4, -3, and -2 were glycosylated, their level of glycosylation was reduced to 48, 39, and 29% occupancy, relative to 70% occupancy for analogous reporters that contained an additional pair of acidic residues in the C-terminal flank (reporters 19-21; Table 1). This increase in the occupancy of the glycosylation site when C-terminal aspartate residues were present suggested that distal acidic residues on the C-terminal flank could enhance the rate of glycosylation, although C-terminal acidic residues alone were not sufficient for an efficient glycosylation site.

Our data from scanning mutagenesis suggested that increased negative charge density (on the C terminus) was important for maximizing glycosylation states of peptide reporters. To explore whether an increased charge density influenced glycosylation state occupancy in a position-dependent manner, we performed additional mutagenesis experiments to vary the position of four charged (Asp/Glu) residues (Table 1, peptides 25-32). Reporter analysis revealed that two acidic amino acids in a large variety of positions on the N terminus were equally glycosylated to 60-70% occupancy (Table 1; gel is not shown for peptides 25-32, since these reporters run identically to the wild type reporter 1). Surprisingly, the reporter (-6,-5)D with two acidic residues at positions -6 and -5 was fully glycosylated (reporter 25; Fig. 4B and Table 1). Here the acidic residues were outside the -4 to -2 range, previously shown to be essential for glycosylation in reporters containing single acidic residues. This means that the increased charge density of two negative charges at positions -6 and -5 led to a functional glycosylation site, whereas a single acidic residue at either -6 or -5 was not a sufficient signal for glycosylation (in reporters 13 and 14; Fig. 4B).

In summary, these studies on acidic amino acids in the flanking sequences indicated that the presence, position, and density of acidic amino acid residues influence the initiation of proteoglycan addition. The most discriminating position for acidic amino acids is in the -4 to -2 range (N-terminal) to the glycosylation site, since only one acidic amino acid in this position is sufficient to confer substrate recognition by the ppXyl-T proteoglycan modification machinery. Furthermore, the presence of an N-terminal cluster of acidic amino acid residues can expand the acidic window out to the -6 position. In contrast, single or clustered acidic amino acids on the C-terminal flank of the serine are neither necessary nor sufficient for xylose addition, but in combination with N-terminal acidic residues, they can increase glycosylation site occupancy rates.
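
The acidic-residue findings summarized above can be restated as a small predicate, which makes the two sufficient conditions explicit. The Python sketch below is a deliberate simplification and not the authors' predictor: it encodes only the statements that one acidic residue at -4, -3, or -2 is sufficient and that a pair at -6 and -5 is also sufficient. It ignores the glycine and proline requirements and the C-terminal enhancement. The function names are hypothetical, and the example flanks are written out here from the descriptions in the text rather than copied from Table 1.

```python
ACIDIC = {"D", "E"}


def acidic_positions(n_flank):
    """Positions (relative to the serine) that hold Asp or Glu.

    n_flank is the sequence immediately N-terminal to the serine,
    so its last residue is position -1.
    """
    start = -len(n_flank)
    return [start + i for i, aa in enumerate(n_flank) if aa in ACIDIC]


def acidic_signal_present(n_flank):
    """True if the N-terminal flank carries a sufficient acidic signal."""
    positions = set(acidic_positions(n_flank))
    proximal = any(p in positions for p in (-4, -3, -2))  # one residue suffices
    distal_pair = {-6, -5} <= positions                   # cluster at -6 and -5
    return proximal or distal_pair


# Wild-type N-terminal flank of Ser71 (DIEVNG): Asp at -6, Glu at -4.
print(acidic_positions("DIEVNG"), acidic_signal_present("DIEVNG"))
# Flank with only the -6 aspartate retained (Glu-4 replaced by Ala).
print(acidic_signal_present("DIAVNG"))
# Illustrative flank with acidic residues at -6 and -5 only.
print(acidic_signal_present("DDAVNG"))
```

A position-only rule like this cannot express the graded occupancies reported in Table 1; it only separates flanks that carry a sufficient N-terminal signal from those that do not.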


FIGURE 5.
Scanning mutagenesis with proline residues. The position of proline residues in each reporter is listed at the top of each lane in the Western blots. A, proline scanning from position -3 to +3 in the reporters containing a single proline at various positions and an alanine at position +3 in most cases (for reporters 33-37, 1, and 8). B, proline scanning from position -3 to +2 in the reporters containing a second proline residue at position +3 (for reporters 38-42). C, proline-scanning reporter number designation, sequence, and percentage occupancy. The arrow indicates Ser71 in the reporter peptide. The glycosylation state is calculated from scanning densitometry of triplicate gels showing the percentage of the higher molecular weight (glycosylated) species relative to the total protein in the lane. White type on a black background shows proline-scanning positions in reporter mutants.

 
Proximal Proline Inhibits Proteoglycan Modification—Different classes of glycosylation sites have a different tolerance or requirement for proline residues. For example, the presence of a proximal proline residue flanking the substitution site facilitates mucin-type O-glycosylation, whereas it blocks N-linked core glycosylation (33-35). In the SDN-1A wild-type reporter peptide, a proline residue is present at position +3 relative to the modified serine. To reveal the role of proline in proteoglycan modification efficiency, a Pro -> Ala mutation at position +3 in the wild-type reporter was evaluated and compared with other proline mutants. Mutant P+3A reporter protein (peptide 8; Figs. 3A and 5A) did not show a large change in the glycosylation state, being glycosylated at 78 ± 2.3% relative to 70 ± 1.2% for the wild-type reporter (reporter 1), indicating that proline is only slightly inhibitory at most or that the glycosylated serine is outside of the range of proline's influence. To discover the effective range or tolerance of proline residues, proline-scanning mutagenesis was performed to produce a set of reporters with a single proline residue from position -3 through +3 (Fig. 5A, peptides 33-37 and 1, respectively). Scanning results showed that the placement of a proline residue at positions -3, -2, and +3 had a negligible effect on the glycosylation of the proteoglycan reporters (reporters 33, 34, and 1). However, when the proline residue was placed at position -1, +1, or +2, glycosylation of the peptides was essentially eliminated (Fig. 5A, reporters 35-37), indicating that a flanking proline residue blocked the activity of ppXyl-T on the substituted serine for proteoglycan synthesis. However, the loss of glycosylation of the -1P and +1P reporter mutants may have resulted from the elimination of essential glycine residues at these two specific positions, flanking the glycosylated serine.

The results from scanning mutagenesis with a single proline revealed a very narrow effective range for proline's inhibition of the proteoglycan substitution site (position +2 appears to be uniquely sensitive to proline). To evaluate whether a higher density of proline residues confers greater inhibition, the cumulative effect of two proline residues was studied by introducing a second proline residue at position +3, which is the position of a naturally occurring proline in the wild type SDN-1A sequence (peptide 1; Fig. 5C). As with -3P and -2P single proline reporters, proline double mutants (-3,+3)P and (-2,+3)P were glycosylated but with a very slight loss in the glycosylation site occupancy, at 56 and 63%, respectively, relative to 70% glycosylation for the wild type reporter (Fig. 5B, reporters 38 and 39). No detectable glycosylation on mutant (-1,+3)P, (+1,+3)P, and (+2,+3)P was obtained (peptides 40-42; Fig. 5, B and C). Taken together, proline-scanning mutagenesis suggested that serine residues with a neighboring proline at positions -1, +1, and +2 did not form favorable proteoglycan substitution sites.

Comparisons of Worm and Fly Proteoglycan Modification Reactions—To define whether there is conservation in the proteoglycan modification machinery encoded by C. elegans and Drosophila, nine discriminating reporter proteins were expressed in S2 cells with the endogenous fly ppXyl-T enzyme. Reporter peptides SDN(A), S0A, G(-1,+1)A, D/E(-6,-4A), -6D, -4E, P+3A, +2P, and (-2,+3)P (reporter peptides 1, 2, 4, 11, 13, 15, 8, 37, and 39, respectively) were selected to measure the glycosylation state of Ser71 in a peptide sequence background that probes the influence of glycine at position -1 and +1, of acidic amino acids at positions -6 and -4 and of proline at +3, +2, and -2. The level of glycosylation occupancy with these substrate peptides catalyzed by fly ppXyl-T was comparable with that of the nematode ppXyl-T (data summarized in Table 2). Therefore, the ppXyl-T substrate specificity (peptide sequence) requirements for glycine, acidic amino acids, and proline in proteoglycan modification sites appear evolutionarily conserved in the fly and worm. Such conservation in substrate recognition sequences of ppXyl-T suggests that a proteoglycan site predictor algorithm, based on C. elegans proteoglycans, would be a useful tool for future studies.


TABLE 2
Comparison of worm and fly ppXyl-T substrate specificity

 


FIGURE 6.
Sequence logos of proteoglycan sites. A logo shows the frequencies of amino acid residues at each position, as the relative heights of letters, along with the degree of sequence conservation as the total height of a stack of letters, measured in bits of information. Position zero denotes the location of the modified serine residue. A, sequence logo for mammalian proteoglycan sites. B, sequence logo for C. elegans proteoglycan sites.

 
Development of a Proteoglycan Site Predictor—Using the data presented here, as well as sites experimentally determined in full-length C. elegans syndecan/SDN-1 (see Footnote 3), we trained a preliminary worm proteoglycan site predictor using a neural network method. Goals for the predictor were 1) to see if the amount of glycosylation data on modified and unmodified serines in wild type SDN-1 and its mutants was sufficient to develop a predictor, in comparison with the complexity of the recognition pattern and 2) to use the predictor to identify new potential proteoglycan sites in C. elegans for future studies.

The best network performance was found when presenting five amino acid residues on either side of the modified serine (a sequence window of 11 residues) and using seven hidden neurons. With this sequence window, the input data to the network are identical for the negative peptide -5D (reporter 14 in Table 1) and the positive peptide (-6,-5)D (reporter 25 in Table 1). Because of this, the predictor cannot predict both of these sites correctly. Wrongly predicting peptide -5D as positive is, in fact, the only mistake the best predictor makes. Therefore, it correctly predicts 100% of the positive sites and 98% of the negative sites, while 95% of the predicted sites are in fact true positives. The Matthews correlation coefficient is 0.97, to be compared with 1, which is the correlation coefficient of a perfect predictor, and 0, which is the correlation coefficient of a random guess.
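
The collision between the -5D and (-6,-5)D peptides follows directly from the window size, and a short sketch makes this concrete. The Python example below only extracts the sequence window; it does not reproduce the network or its input encoding. The two peptide strings are illustrative stand-ins built by substituting into the wild-type reporter sequence, under the assumption that the peptides differ only at position -6. The actual reporter sequences are those listed in Table 1.

```python
def sequence_window(peptide, ser_index, half_width=5):
    """Return the residues from -half_width to +half_width around the serine."""
    start = ser_index - half_width
    end = ser_index + half_width + 1
    if start < 0 or end > len(peptide):
        raise ValueError("window extends beyond the peptide")
    return peptide[start:end]


# Hypothetical stand-ins; the serine sits at index 6 (position 0).
minus5_d = "ADAVNGSGYPTDD"        # acidic residue at -5 only
minus6_minus5_d = "DDAVNGSGYPTDD"  # acidic residues at -6 and -5

# An 11-residue window (five on each side) drops position -6,
# so both peptides present the same input.
print(sequence_window(minus5_d, 6) == sequence_window(minus6_minus5_d, 6))

# A 13-residue window (six on each side) keeps them distinct.
print(sequence_window(minus5_d, 6, 6) == sequence_window(minus6_minus5_d, 6, 6))
```

Widening the window would separate the two inputs, but the text reports that the five-residue flank gave the best overall network performance, so this is a trade-off rather than an oversight.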

We conclude that a predictor can correctly learn the recognition pattern described by the peptide data presented here. However, it is likely that the sequence space has not been sufficiently sampled, since all sites come from the same protein, and most are mutants of the same site and show limited sequence variation. This is the first proteoglycan site predictor developed. In the future, an improved predictor will be developed, as experimental data on additional proteoglycan sites are reported and incorporated into the data set, and this will be made publicly available.

Proteoglycan Sites in Mammals—Through a search of the original literature, we have been able to identify 37 experimentally verified proteoglycan sites in mammalian proteins (supplemental Table 3) plus a number of mutant and peptide sites, based on these sites, making a total of 51 experimentally verified sites. Sequence logos (36) of the mammalian proteoglycan sites and the C. elegans data show the frequencies of amino acid residues at each position, as the relative height of each letter (Fig. 6). The degree of sequence conservation is reflected by the total height of a stack of letters, measured in bits of information. Position zero denotes the location of the glycosylated serine residue. Sequence weighting was performed to reduce the impact of very similar sequences (37). The most striking difference between the logos for the mammalian and C. elegans data is that information content is much higher in the C. elegans case, even for peripheral positions not believed to be very important. This is due to lack of sufficient sequence variation in the C. elegans data and illustrates exactly why our prediction method is only preliminary. Apart from this, the recognition sequence in mammals is clearly related to the one in C. elegans. There is a strong preference for glycine in positions -1 and +1, although position -1 is weaker for mammals than for C. elegans. Also, there is a preference for acidic residues in the vicinity of the modification site, especially N-terminally in positions -2, -3, and -4.
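
The logo heights described above can be computed from aligned site sequences with a few lines of code. The Python sketch below uses the standard definition for a 20-letter alphabet (log2(20) minus the Shannon entropy of the column) and scales each letter by its frequency. It is an assumption-laden illustration: it omits the sequence weighting mentioned in the text and any small-sample correction, and the toy alignment is invented for the example rather than drawn from the mammalian or C. elegans data sets.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=20):
    """Information content of one alignment column, in bits."""
    total = len(column)
    counts = Counter(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def logo_heights(sequences):
    """Per-position letter heights: frequency times column information."""
    heights = []
    for column in zip(*sequences):
        bits = information_content(column)
        total = len(column)
        heights.append({aa: bits * n / total for aa, n in Counter(column).items()})
    return heights


# Invented five-residue alignment centered on the modified serine.
toy_alignment = ["EGSGA", "DGSGY", "EASGE", "DGSGD"]
for offset, column in zip(range(-2, 3), logo_heights(toy_alignment)):
    print(offset, {aa: round(h, 2) for aa, h in column.items()})
```

A column with no variation reaches the maximum of about 4.32 bits, which illustrates why a data set with limited sequence variation, such as the C. elegans reporters, shows high information content even at peripheral positions.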

The preliminary C. elegans proteoglycan predictor was tested on the mammalian sites to quantify the similarities. The preliminary C. elegans predictor correctly predicts 41% of the positive sites and 98% of the negative sites. 49% of the predicted sites are in fact true positives. The Matthews correlation coefficient is 0.43. At this point, it is impossible to know for sure if this less than perfect performance is due to evolutionary differences or the preliminary nature of the algorithm. However, it is reasonable to conclude that the recognition sequences are evolutionarily related, since a correlation coefficient of 0.43 is well above the performance of a random guess.
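
The four performance figures quoted for the predictor all derive from a single confusion matrix. The Python sketch below shows the usual definitions, including the Matthews correlation coefficient. The counts passed in the usage lines are placeholders chosen to be roughly consistent with the percentages reported above for the mammalian test (51 positive sites is stated in the text; the number of negative sites is not, so the false-positive and true-negative counts here are assumptions).

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, precision, and Matthews correlation."""
    sensitivity = tp / (tp + fn)   # fraction of positive sites predicted
    specificity = tn / (tn + fp)   # fraction of negative sites predicted
    precision = tp / (tp + fp)     # fraction of predictions that are true
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


# Placeholder counts: 21 of 51 positives found; negatives are assumed.
metrics = confusion_metrics(tp=21, fp=22, fn=30, tn=1078)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because negative sites vastly outnumber positive ones in this kind of data, the correlation coefficient is a more informative summary than the 98% specificity taken alone.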


FIGURE 7.
Characteristics of Ser-Gly sites in SDN-1 reporters. Evaluation of three potential proteoglycan sites. A, Western blot analysis of reporters in tissue culture. The reporter number is indicated for each lane. Alanine point mutants show the position of the unglycosylated form of the reporter. Glycosylated proteins have higher molecular weight bands. B, position and sequence of each glycosylation site insert. White type on a black background shows amino acids that either are required for or enhance glycosylation. Underlined prolines indicate prolines in inhibitory positions. C, key features of the flanking sequence surrounding serine-glycine sites.

 
Ser-Gly Sequences Are Required but Not Sufficient Predictors of Proteoglycan Sites—To evaluate whether the sequence features defining the proteoglycan modification site at Ser71 also apply to other serine positions, we measured the proteoglycan modification level of other Ser-Gly sites in the SDN-1 extracellular domain, using the same peptide reporter system as for Ser71, and we evaluated the sequence context of each mapped site (Fig. 7). The SDN-1 extracellular domain contains three serine sites with one C-terminal glycine: Ser71, Ser86, and Ser214. When these three sites were expressed in tissue culture, only two sites were modified by ppXyl-T: Ser71 and Ser86. Sequence comparisons showed that Ser71 and Ser86 contain all of the sequence requirements for proteoglycan glycosylation: a cluster of N-terminal acidic amino acids, glycine residues at both positions -1 and +1, and no proline at position +2. Although Ser214 was predicted in three previous studies (10, 11, 27), it is not glycosylated in our tissue culture system, which is not surprising, since its flanking sequence 1) lacks an N-terminal acidic amino acid, 2) lacks a glycine at position -1, and 3) contains a potentially inhibitory proline at position +2. Therefore, our mutagenesis study derived positive and negative signals for proteoglycan modification sites that correctly apply to all of the Ser-Gly positions in the SDN-1 extracellular domain. Correct localization of these glycosylation sites is important for testing the biological roles of proteoglycan chains on syndecan in C. elegans.
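
The three features used above to distinguish Ser214 from Ser71 and Ser86 can be tabulated for any Ser-Gly position in a sequence. The Python sketch below locates every Ser-Gly pair and reports those features without attempting a prediction. The function names are hypothetical. The first test sequence is the SDN-1A reporter peptide given in the text; the second is an invented sequence that merely has the unfavorable features described for Ser214 and is not the actual Ser214 flank.

```python
ACIDIC = {"D", "E"}


def ser_gly_sites(sequence):
    """0-based indices of every serine that is followed by a glycine."""
    return [i for i in range(len(sequence) - 1)
            if sequence[i] == "S" and sequence[i + 1] == "G"]


def site_features(sequence, ser_index):
    """Flanking-sequence features for one Ser-Gly site."""
    def residue(offset):
        j = ser_index + offset
        return sequence[j] if 0 <= j < len(sequence) else None

    return {
        # Count of Asp/Glu at positions -6 through -2.
        "n_terminal_acidic": sum(residue(k) in ACIDIC for k in range(-6, -1)),
        "gly_at_minus1": residue(-1) == "G",
        "pro_at_plus2": residue(2) == "P",
    }


reporter = "DIEVNGSGYPTDD"     # SDN-1A reporter peptide from the text
unfavorable = "AAKLTPSGPAAAA"  # invented example, not a real SDN-1 flank

for seq in (reporter, unfavorable):
    for index in ser_gly_sites(seq):
        print(seq, index, site_features(seq, index))
```

Reporting features rather than a yes/no call keeps the sketch within what the mutagenesis data support; turning the features into a prediction is the job of the trained network described earlier.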


    DISCUSSION
In this current study, 46 peptide reporters, based on proteoglycan modification sites of C. elegans SDN-1, were expressed in tissue culture, and the glycosylated isoform of the reporter was resolved from its unmodified state by SDS-PAGE. Because the proteoglycan sites for all of these constructs were inserted into the same fusion protein, the reporters used in this study should not exhibit a significant difference in their secondary or tertiary structure. Therefore, the structural context of the neighboring protein domains should contribute a neutral influence in the substrate-enzyme interactions between the ppXyl-T enzyme and the reporter proteins used here. Our study demonstrated that single amino acid changes in the sequences flanking the glycosylated residue could dramatically alter the glycosylation level on the reporter substrates in vivo. In agreement with the earlier results, indicating that the activity of polypeptide xylosyltransferase is serine-specific in vitro (14), our data showed that threonine substitution of the serine residue essentially eliminated proteoglycan addition to the reporter in vivo.

The favorable sequences that flank serine in glycosylation sites provide some clues to the characteristics of the active site of ppXyl-T. Glycine, in immediate proximity to the modified serine, suggests that either the peptide backbone of the glycosylation site requires some structural flexibility or that spatial constraints (bulky amino acids in the active site) limit access of peptide substrates to the enzyme. Indeed, surveys of the sequences surrounding proteoglycan modification sites have revealed a high frequency of glycine residues adjacent to the modified serine (16). Strikingly, the relative occurrence of a glycine residue at position +1 is 98% (16). Our mutagenesis studies on glycine provide direct evidence for the requirement of glycine residues at both positions -1 and +1 for xylosylation of Ser71. This requirement for a small, flexible residue appears real, since replacement of the -1 glycine with alanine, proline, or aspartate inhibited glycosylation, although aspartate in other N-terminal positions was shown to be required for proteoglycan addition.

Mutagenesis data with acidic residues further suggest that there is some directionality of how peptide sequences are fed into the active site of the ppXyl-T enzyme. The presence of N-terminal acidic amino acids is required for the recognition and xylosylation of the serine residue in SDN-1A (in reporters 12 and 19-21), and these acidic positions are sufficient, since glycosylation of these reporters does not require C-terminal acidic residues. Analogous reporters that only contain one or more acidic residues on the C-terminal side were not glycosylated (reporters 11 and 22-24). Therefore, the peptide backbone of the glycosylation site must interact with the ppXyl-T active site to orient peptides into a specific N-terminal to C-terminal direction. The requirement of acidic residues suggests that specific electrostatic interactions must exist between the negatively charged side chains of the glycosylated substrate and residues on one side of the active site of the enzyme. The spacing of the interactions between the substrate's negative charge and the modified serine is restricted, since aspartic acid at position -1 inhibits xylosylation and since the most favorable positions for negatively charged amino acids are -4, -3, and -2 on the N-terminal side of the modification site. At more distal positions -5 and -6, a single acidic residue does not promote proteoglycan glycosylation; however, if both distal positions (-6 and -5) are negatively charged, then serine glycosylation is efficient. Therefore, N-terminal acidic residues are required but with a somewhat loose position dependence, suggesting that acidic residues may be important for tethering the N-terminal flanking sequence of a glycosylation site near the active site of the enzyme and not for precise positioning of the substrate for catalysis.

The proline-scanning mutagenesis data support the hypothesis that the ppXyl-T enzyme may require substrates with a proximal glycine for peptide backbone flexibility at the glycosylation site. First, proline at either -1 or +1 inhibits xylosylation, indicating that the structural flexibility and small size provided by glycine are essential on both sides of the modified serine in proteoglycan sites. Second, a proline positioned between Asp-4 and Gly-1 is tolerated in functional glycosylation sites and does not inhibit glycosylation in reporters 33 and 34. This latter observation supports a tethering role of N-terminal acidic residues for loosely feeding serine residues to the active site and not precisely positioning the modified residue.

Flanking Sequences Regulate Multiple Glycosylation Pathways—It is noteworthy that there are 47 serine and threonine residues in the 202-amino acid extracellular domain of C. elegans SDN-1 protein. Much of this extracellular domain is predicted to consist of Ser/Thr-rich mucin-type glycosylation sites, based on NetOGlyc analysis (38). Up to three proteoglycan modification sites in SDN-1 were speculated previously, by searching for "SG" sequences. There is also a potential for consensus sequence overlap of N-glycosylation sites and mucin-type O-glycosylation sites at proteoglycan modification sites. For example, the SDN-1A reporter sequence (DIEVNGS71GYPTDD) used in this study contains 1) a proteoglycan modification site at Ser71, 2) a consensus sequence for an N-glycosylation site (NGS) at Asn69, and 3) a possible mucin modification site at Ser71 or Thr75. A consensus overlap model for competing glycosylation pathways could influence glycosylation site occupancy at any one of the glycosylated positions (39); however, in our studies, different classes of glycosylation did not appear to co-exist on the same peptide reporter. Ser71 in the wild type reporter is only a proteoglycan modification site, since it was only modified in the presence of fly or worm ppXyl-T. N-Glycanase or O-glycanase treatment of the reporter proteins indicated that neither N-linked nor mucin O-linked sugars were present. Moreover, a conservative mutation of Asn69 to glutamine eliminated the potential N-glycosylation site (Table 1, reporter 9) but did not show any variation in electrophoretic mobility (gel not shown), indicating that N-linked modification did not occur.

The mutagenesis data in this paper suggest that multiple interactions between the glycosylation site and the glycosyltransferase enzyme may govern how polypeptide sugar transferases identify and modify their downstream target proteins, mucins and proteoglycans. It appears that residues that favor or are required for one glycosylation pathway are, in contrast, neutral or inhibitory for the other pathway, suggesting that sequence requirements are entirely different (Table 3).


TABLE 3
Occurrence of amino acid residues in O-glycosylation sites

 
First, proteoglycan sites (from all organisms) are almost exclusively serine, whereas mucin sites prefer threonine over serine (the glycosylation rate for the mucin-initiating enzyme polypeptide GalNAc transferase-T1 is 58 times greater for threonine over serine (40)). Second, proline inhibits proteoglycan modification, but it enhances mucin glycosylation. Proline is the most frequent residue near the modified serine or threonine of mucins, and mutagenesis data indicate that proline promotes mucin-type O-glycosylation (33-35, 38, 41, 42). Proline, therefore, may be one of the important residues that allow the cell's glycosylation machinery to discriminate between mucin and proteoglycan O-glycosylation. Third, glycine is essential in at least one (if not both) of the -1- and +1-positions flanking the glycosylated serine, whereas it is less favored in mucin-type O-glycosylation sites (38). Fourth, acidic amino acids are essential for proteoglycan modification, whereas they inhibit mucin-type glycosylation in some positions and are otherwise neutral (38, 43).

In our study, the Drosophila ppXyl-T in cultured cells showed the same preference for recognition sites as that of the worm enzyme. This observation of cross-species conservation is in agreement with recent findings that ppXyl-T orthologues from flies, worms, and humans are able to recognize similar peptide substrates in vitro (12). This latter study also reinforces our sequence comparisons of mammalian and worm proteoglycan sites (Fig. 6), which show that the recognition sequence for the ppXyl-T is evolutionarily conserved, at least to some extent, even in mammals. This means that any good predictor method developed for C. elegans proteoglycan sites will have some relevance to other metazoan organisms as well. In the future, when more C. elegans proteoglycan sites have been identified, we will develop a better predictor, which can more confidently predict proteoglycan sites in different sequence contexts. This predictor will be made publicly available.

In summary, the data presented here indicate that the flanking amino acid sequence significantly affects the recognition and modification of substrate serine residues by polypeptide xylosyltransferase, the key glycosyltransferase that defines proteoglycan modification sites. Recent studies suggest that the local conformation and accessibility for ppXyl-T might be another important determinant (15, 44). Therefore, the logical progression of this work is to examine the present sequence rules on native proteoglycan core proteins. Future efforts that identify the ppXyl-T enzyme's substrate-binding site and its crystal structure will help clarify the mechanism by which individual amino acids regulate proteoglycan addition. Nevertheless, our studies suggest that a defined collection of mapped glycosylated sequences can be used as a basis for neural network predictions of proteoglycan modification sites.


    FOOTNOTES
 
* This work was supported by National Institutes of Health Grant DE14088-05 (to F. K. H.) and by the Knut and Alice Wallenberg Foundation (to K. J.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The on-line version of this article (available at http://www.jbc.org) contains supplemental Tables 1-3.

1 To whom correspondence should be addressed: Box 611, University of Rochester Medical Center, 601 Elmwood Ave., Rochester, NY 14642. Tel.: 585-275-0336; Fax: 585-276-0190; E-mail: fred_hagen@urmc.rochester.edu.

2 The abbreviations used are: GAG, glycosaminoglycan; ppXyl-T, polypeptide xylosyltransferase; SDN-1, syndecan-1; WT, wild type; RNAi, RNA interference; dsRNA, double-stranded RNA; BisTris, 2-[bis(2-hydroxyethyl)amino]-2-(hydroxymethyl)propane-1,3-diol; MOPS, 4-morpholinepropanesulfonic acid; RT, reverse transcription.

3 H. Wang and F. K. Hagen, unpublished results.



    REFERENCES

  1. Kjellen, L., and Lindahl, U. (1991) Annu. Rev. Biochem. 60, 443-475
  2. Jackson, R. L., Busch, S. J., and Cardin, A. D. (1991) Physiol. Rev. 71, 481-539
  3. Wang, H., Spang, A., Sullivan, M. A., Hryhorenko, J., and Hagen, F. K. (2005) Mol. Biol. Cell 16, 4202-4213
  4. Mizuguchi, S., Uyama, T., Kitagawa, H., Nomura, K. H., Dejima, K., Gengyo-Ando, K., Mitani, S., Sugahara, K., and Nomura, K. (2003) Nature 423, 443-448
  5. Hwang, H. Y., Olson, S. K., Esko, J. D., and Horvitz, H. R. (2003) Nature 423, 439-443
  6. Hwang, H. Y., Olson, S. K., Brown, J. R., Esko, J. D., and Horvitz, H. R. (2003) J. Biol. Chem. 278, 11735-11738
  7. Herman, T., Hartwieg, E., and Horvitz, H. R. (1999) Proc. Natl. Acad. Sci. U. S. A. 96, 968-973
  8. Bulik, D. A., and Robbins, P. W. (2002) Biochim. Biophys. Acta 1573, 247-257
  9. Kinnunen, T., Huang, Z., Townsend, J., Gatdula, M. M., Brown, J. R., Esko, J. D., and Turnbull, J. E. (2005) Proc. Natl. Acad. Sci. U. S. A. 102, 1507-1512
  10. Rhiner, C., Gysi, S., Frohli, E., Hengartner, M. O., and Hajnal, A. (2005) Development 132, 4621-4633
  11. Hudson, M. L., Kinnunen, T., Cinar, H. N., and Chisholm, A. D. (2006) Dev. Biol. 294, 352-365
  12. Brunner, A., Kolarich, D., Voglmeir, J., Paschinger, K., and Wilson, I. B. (2006) Glycoconj. J. 23, 543-554
  13. Olson, S. K., Bishop, J. R., Yates, J. R., Oegema, K., and Esko, J. D. (2006) J. Cell Biol. 173, 985-994
  14. Bourdon, M. A., Krusius, T., Campbell, S., Schwartz, N. B., and Ruoslahti, E. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 3194-3198
  15. Mann, D. M., Yamaguchi, Y., Bourdon, M. A., and Ruoslahti, E. (1990) J. Biol. Chem. 265, 5317-5323
  16. Brinkmann, T., Weilke, C., and Kleesiek, K. (1997) J. Biol. Chem. 272, 11171-11175
  17. Esko, J. D., and Zhang, L. (1996) Curr. Opin. Struct. Biol. 6, 663-670
  18. Wragg, S., Hagen, F. K., and Tabak, L. A. (1995) J. Biol. Chem. 270, 16947-16954
  19. Hagen, F. K., and Nehrke, K. (1998) J. Biol. Chem. 273, 8268-8277
  20. Ramachandran, P., Boontheung, P., Xie, Y., Sondej, M., Wong, D. T., and Loo, J. A. (2006) J. Proteome Res. 5, 1493-1503
  21. Jones, S. J., and Baillie, D. L. (1995) Mol. Gen. Genet. 248, 719-726
  22. Wilson, I. B. (2002) J. Biol. Chem. 277, 21207-21212
  23. Ten Hagen, K. G., Tran, D. T., Gerken, T. A., Stein, D. S., and Zhang, Z. (2003) J. Biol. Chem. 278, 35039-35048
  24. Qian, N., and Sejnowski, T. J. (1988) J. Mol. Biol. 202, 865-884
  25. Hertz, J., Krogh, A., and Palmer, R. (1991) Introduction to the Theory of Neural Computation, Addison-Wesley, Redwood City, CA
  26. Matthews, B. W. (1975) Biochim. Biophys. Acta 405, 442-451
  27. Minniti, A. N., Labarca, M., Hurtado, C., and Brandan, E. (2004) J. Cell Sci. 117, 5179-5190
  28. Dolan, M., Horchar, T., Rigatti, B., and Hassell, J. R. (1997) J. Biol. Chem. 272, 4316-4322
  29. Pfeil, U., and Wenzel, K. W. (2000) Glycobiology 10, 803-807
  30. Dong, S., Cole, G. J., and Halfter, W. (2003) J. Biol. Chem. 278, 1700-1707
  31. Winzen, U., Cole, G. J., and Halfter, W. (2003) J. Biol. Chem. 278, 30106-30114
  32. Zhang, L., and Esko, J. D. (1994) J. Biol. Chem. 269, 19295-19299
  33. Mellquist, J. L., Kasturi, L., Spitalnik, S. L., and Shakin-Eshleman, S. H. (1998) Biochemistry 37, 6833-6837
  34. O'Connell, B. C., Hagen, F. K., and Tabak, L. A. (1992) J. Biol. Chem. 267, 25010-25018
  35. Shakin-Eshleman, S. H., Spitalnik, S. L., and Kasturi, L. (1996) J. Biol. Chem. 271, 6363-6366
  36. Schneider, T. D., and Stephens, R. M. (1990) Nucleic Acids Res. 18, 6097-6100
  37. Henikoff, S., and Henikoff, J. G. (1994) J. Mol. Biol. 243, 574-578
  38. Julenius, K., Molgaard, A., Gupta, R., and Brunak, S. (2005) Glycobiology 15, 153-164
  39. Gerlitz, B., Hassell, T., Vlahos, C. J., Parkinson, J. F., Bang, N. U., and Grinnell, B. W. (1993) Biochem. J. 295, 131-140
  40. Hagen, F. K., Van Wuyckhuyse, B., and Tabak, L. A. (1993) J. Biol. Chem. 268, 18960-18965
  41. Gooley, A. A., Classon, B. J., Marschalek, R., and Williams, K. L. (1991) Biochem. Biophys. Res. Commun. 178, 1194-1201
  42. Wilson, I. B., Gavel, Y., and von Heijne, G. (1991) Biochem. J. 275, 529-534
  43. Nehrke, K., Hagen, F. K., and Tabak, L. A. (1996) J. Biol. Chem. 271, 7061-7065
  44. Kokenyesi, R., and Bernfield, M. (1994) J. Biol. Chem. 269, 12304-12309

Copyright © 2007 by the American Society for Biochemistry and Molecular Biology.