Characterization of the Heparan Sulfate and Chondroitin Sulfate Assembly Sites in CD44

Isoforms of CD44 are differentially modified with the glycosaminoglycans (GAGs) chondroitin sulfate (CS), heparan sulfate (HS), and keratan sulfate. GAG assembly occurs at serines followed by glycines (SG), but not all SG motifs are utilized. Seven SG motifs are distributed in five CD44 exons, and in this paper we identify which of them serve as HS and CS assembly sites. The SGSG motif in CD44 exon V3 is the only HS assembly site, and this site is also modified with CS. HS and CS attachment at this site was eliminated by mutating the serines in the V3 motif to alanines (AGAG). Exon E5 is the only other CD44 exon that supports GAG assembly, and it is modified with CS only. Using a number of recombinant CD44 protein fragments, we show that the eight amino acids located downstream of the SGSG site in V3 are responsible for the specific addition of HS to this site. If the eight amino acids located downstream of the first SG site in CD44 exon E5 are exchanged with those located downstream of the SGSG site in exon V3, the SG site in E5 becomes modified with HS and CS. Likewise, if the eight amino acids found downstream of the first SG in E5 are placed downstream of the SGSG in V3, this site is modified with CS but not HS. We also show that these sequences cannot direct the modification of CD44 with HS from a distance. Constructs containing CD44 exon V3 in which the SGSG motif was mutated to AGAG were not modified with

Investigative interest has focused on the involvement of CD44 in lymphocyte homing, hematopoiesis, leukocyte activation, and tumor metastasis. This apparent diversity of biological function likely results in part from the numerous alternatively spliced CD44 isoforms, each of which may confer a distinct molecular function. For example, CD44 binding to hyaluronic acid may provide part of the mechanism underlying these functions (1), but this interaction depends on which alternatively spliced isoform is expressed (2). Additionally, rat carcinoma cells that express CD44 isoforms containing exon V6 have greater metastatic potential (3). CD44 isoforms containing exon V3 are modified with heparan sulfate (HS) and have been shown to bind growth factors (4,5).
Additional functional differences among CD44 isoforms may be influenced in part by the variety of glycosaminoglycans (GAGs) assembled on CD44. These include HS, keratan sulfate, and chondroitin sulfate (CS) (6). Distinct functions have been attributed to CD44 modified with HS versus CS. Microvascular endothelial cells require cell surface CS-proteoglycans for migration on fibrin gels (7). The cell surface CS-proteoglycan predominantly expressed on microvascular endothelial cells is CD44, and an antibody to CD44 blocks migration on fibrinogen (8). Alternatively spliced isoforms that are modified with HS impart a different function. They can bind and present the heparin-binding chemokine MIP-1β to lymphocytes in vitro and induce VLA-4/VCAM-1 interactions, a required step during lymphocyte homing (9). It has not been demonstrated that CD44 is the proteoglycan that provides this function in vivo. However, a recent report showed that wound microvascular endothelial cells contain mRNA encoding isoforms that include exon V3 (7). To help clarify the functions attributed to alternatively spliced CD44 isoforms, we have identified which CD44 exons have functional GAG assembly sites.
Signals for GAG assembly are encoded by the proteoglycan backbone. GAG synthesis occurs at serines followed by glycines (SG), with one or more proximal acidic amino acids. This minimal motif is not always utilized, which suggests that secondary and/or tertiary structure, rather than primary sequence alone, is important (10). In addition, some acceptor sites are modified with CS only, whereas others carry both CS and HS. This suggests that CS synthesis occurs by default at any site capable of GAG attachment, while HS assembly requires additional signals. Such signals may also lie distal to the GAG assembly site. It has been suggested that they target proteins to subcellular compartments that contain the HS- or CS-synthesizing enzymes (11-13). Additional features found in proteoglycan backbones that carry HS include a proximal hydrophobic residue (14,15) and duplication of the SG motif (16). Both features are present in CD44 exon V3.
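The sequence features listed above can be tabulated for any candidate sequence with a short scan. These are the SG dipeptide, nearby acidic residues, a proximal hydrophobic residue, and SG duplication. The Python sketch below is illustrative only. The ±4-residue window, the hydrophobic residue set, and the toy input sequence are our assumptions. As the text stresses, a site that scores well need not be used.

```python
from __future__ import annotations
from dataclasses import dataclass

ACIDIC = set("DE")
# Assumption: a simple hydrophobic residue set; other classifications exist.
HYDROPHOBIC = set("AVILMFWY")

@dataclass
class SGSite:
    position: int            # 1-based position of the serine
    acidic_nearby: int       # D/E residues within the window on either side
    hydrophobic_nearby: int  # hydrophobic residues within the same window
    duplicated: bool         # serine belongs to an SGSG tandem

def scan_sg_sites(seq: str, window: int = 4) -> list[SGSite]:
    """List every SG dipeptide with simple counts of its flanking residues."""
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 1):
        if seq[i] == "S" and seq[i + 1] == "G":
            lo = max(0, i - window)
            hi = min(len(seq), i + 2 + window)
            flank = seq[lo:i] + seq[i + 2:hi]  # excludes the SG itself
            duplicated = seq[i:i + 4] == "SGSG" or (i >= 2 and seq[i - 2:i + 2] == "SGSG")
            sites.append(SGSite(
                position=i + 1,
                acidic_nearby=sum(c in ACIDIC for c in flank),
                hydrophobic_nearby=sum(c in HYDROPHOBIC for c in flank),
                duplicated=duplicated,
            ))
    return sites
```

Consider the hypothetical sequence "EDSGSGIDDDEDFI". The scan reports two SG sites, at serines 3 and 5, and flags both as part of an SGSG tandem.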
We have shown previously that CD44 isoforms containing exon V3 are modified with HS. It was not clear, however, whether HS was added to the SG site in exon V3 or to other assembly sites. In this paper we define which of the seven CD44 SG motifs can be modified with CS and HS. Exon V3 is the only exon that has an HS acceptor site. HS is added to the SGSG motif in CD44 exon V3 with high specificity, which led us to ask whether we could identify the sequences in V3 responsible for this event. We also investigated whether sequences in CD44 exon V3 could direct the modification of a distal SG site with HS when the normal acceptor site in exon V3 was not available. In addition, we analyzed the HS attached to the recombinant proteins and showed that it has growth factor binding activity.

MATERIALS AND METHODS
Cell Culture. COS cells were purchased from the American Type Culture Collection (Manassas, VA). They were maintained in Dulbecco's modified Eagle's medium with 10% fetal bovine serum, penicillin (100 units/ml), streptomycin (100 µg/ml), and 2 mM L-glutamine.
PCR reaction conditions were as follows: 94°C for 5 min, followed by 35 cycles of 94°C for 30 s, 57°C for 1 min, and 72°C for 1 min 45 s. PCR products were purified with the QIAquick spin PCR purification kit (Qiagen Corp., Chatsworth, CA) and digested with SpeI and BamHI (Boehringer Mannheim). The digested products were gel-purified and ligated into SpeI/BamHI-cut vector CDM8, as described previously (17). This vector contains the CD5 signal sequence and, 3′ of the CD44 insert, the human IgG1 immunoglobulin region (Rg). All constructs were verified by sequencing.
Metabolic Labeling and Enzymatic Digestion. CD44-Rg fusion proteins were produced in COS cells by DEAE-dextran transfection of approximately 10⁷ cells, as described by Aruffo et al. (1). After Me2SO shock and overnight recovery in Dulbecco's modified Eagle's medium with 10% fetal bovine serum, cells were cultured in sulfate-free medium without fetal bovine serum. They were labeled with 500 µCi of [35S]NaHSO4 (NEN Life Science Products) for 36 h. Cells were also labeled with 150 µCi/ml of 6-[3H]GlcN (NEN Life Science Products) for 24 h in Dulbecco's modified Eagle's medium. Labeled supernatants were batch-purified with protein A-Sepharose (Repligen, Cambridge, MA), washed with PBS containing 0.05% Tween 20, and divided into equal aliquots. One aliquot was left untreated. The others were digested for 1 h at 37°C with 50 milliunits of Proteus vulgaris chondroitin ABC lyase, 1 milliunit of Flavobacterium heparinum heparitinase (ICN Immunobiologicals, Lisle, IL), or both. Samples were washed in PBS containing 0.05% Tween 20 and heated for 10 min at 95°C in an equal volume of 2× sample buffer with β-mercaptoethanol. They were then analyzed on 8-16% Tris/glycine SDS-PAGE gradient gels (Novex, San Diego, CA). Gels were fixed and soaked in Amplify solution (Amersham Pharmacia Biotech). Dried gels were analyzed by PhosphorImager (Molecular Dynamics, Sunnyvale, CA) for the presence or absence of incorporated sulfate label on the fusion proteins.
Cleavage with Cyanogen Bromide. CD44 V3-Rg fusion protein was reconstituted in 100 µl of 70% formic acid. A solution of cyanogen bromide (30 mg/100 µl) in 70% formic acid was added to provide a 1000-fold molar excess over methionine. The reaction proceeded under a nitrogen atmosphere for 4 h at 30°C and for an additional 18 h at 22°C in the dark. The digested protein was vacuum-dried and reconstituted in 100 µl of 0.4 M Tris-HCl, pH 8.5, containing 6 M guanidine HCl and 0.1% Na2EDTA. It was then reduced with 0.02 M dithiothreitol at 50°C for 2 h. Samples were subsequently S-pyridylethylated with 0.10 M 4-vinylpyridine for 4 h at 22°C. The reaction mixture was acidified to pH 2.0 with 20% trifluoroacetic acid. The cyanogen bromide peptides were separated by high performance liquid chromatography (HPLC) on a Bio-Sil TSK-250 gel filtration column (7.5 × 600 mm, Bio-Rad). Chromatography was carried out in 0.1% trifluoroacetic acid containing 40% acetonitrile at a flow rate of 0.25 ml/min.
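The amount of cyanogen bromide implied by a 1000-fold molar excess over methionine follows from simple stoichiometry. In the sketch below, the protein amount, its molecular mass, and its methionine count are hypothetical inputs, since none are given here. Only the 1000-fold excess and the 30 mg/100 µl stock concentration come from the protocol.

```python
from __future__ import annotations

CNBR_MW = 105.92  # g/mol, cyanogen bromide (BrCN)

def cnbr_for_excess(protein_ug: float, protein_mw_da: float, n_met: int,
                    fold_excess: float = 1000.0,
                    stock_mg_per_ul: float = 30.0 / 100.0) -> tuple[float, float]:
    """Return (mg of CNBr, µl of stock) for a given molar excess over Met.

    protein_ug, protein_mw_da, and n_met are hypothetical inputs; the default
    stock concentration is the 30 mg/100 µl solution from the protocol.
    """
    if n_met < 1:
        raise ValueError("protein must contain at least one methionine")
    mol_met = protein_ug * 1e-6 / protein_mw_da * n_met  # mol of Met residues
    mg_cnbr = mol_met * fold_excess * CNBR_MW * 1e3      # g -> mg
    return mg_cnbr, mg_cnbr / stock_mg_per_ul
```

Take 50 µg of a hypothetical 60-kDa fusion protein with five methionines. This gives about 0.44 mg of CNBr, or about 1.5 µl of the stock solution.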
Cleavage with Asp-N Protease. The cyanogen bromide peptides of the V3wt-Rg and V3mut-Rg fusion proteins were cleaved with Pseudomonas fragi Asp-N protease at an enzyme-to-substrate ratio of 1:25. Digestion was done in 40 µl of 0.1 M Tris-acetic acid buffer containing 2 M urea, pH 8.0, at 37°C for 16 h. The digests were acidified to pH 2.0 with 10% trifluoroacetic acid and separated by reverse-phase HPLC (19).
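If Asp-N is modeled as cutting on the N-terminal side of aspartate, predicting which fragment carries the SGSG motif becomes a simple string operation. The sketch below assumes that idealized specificity and ignores missed cleavages. It can optionally be extended to glutamate, which Asp-N is reported to cleave at a lower rate under some conditions. The input sequence is a hypothetical placeholder, not the V3 peptide.

```python
from __future__ import annotations

def asp_n_digest(seq: str, include_glu: bool = False) -> list[str]:
    """Idealized in silico Asp-N digest: cut before every D (optionally E).

    Missed cleavages and context effects are ignored.
    """
    targets = {"D", "E"} if include_glu else {"D"}
    peptides, start = [], 0
    for i in range(1, len(seq)):
        if seq[i] in targets:
            peptides.append(seq[start:i])  # fragment ends just before the D/E
            start = i
    if seq:
        peptides.append(seq[start:])
    return peptides
```

For the hypothetical sequence "KSGSGIDDEF", the SGSG-containing fragment is "KSGSGI".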
Amino Acid Sequence Analysis. Automated sequence analysis was performed on a pulsed-liquid protein sequencer (model 476A, Applied Biosystems) using manufacturer-released cycle programs (19).
Preparation of Small Oligosaccharides from V3-Rg. V3wt-Rg was digested with heparitinase I (5 units) for 24 h in PBS at 37°C. The digest was fractionated on a Sephadex G-50 column (1 × 100 cm, Bio-Rad) equilibrated with 10 mM phosphate buffer containing 1.0 M NaCl, pH 7.0. The column was calibrated with [14C]glucose oligomers. Fractions corresponding to 6-10 glucose residues were pooled and desalted on a Bio-Gel P-2 column (1 × 40 cm, Dionex, Sunnyvale, CA). [3H]GlcN-labeled V3-Rg was used as a tracer for oligosaccharide purification.
Monosaccharide Analysis. Strong acid hydrolysis of the glycopeptides was performed in 2 M trifluoroacetic acid at 100°C for 4 h. Samples were analyzed by high performance anion-exchange chromatography on a BioLC system (Dionex, Sunnyvale, CA). Separation used a 4 × 250-mm CarboPac PA1 column (Dionex) under the conditions described previously (20). Seven monosaccharides were run as standards: fructose, mannose, galactose, glucose, xylose, glucosamine, and galactosamine.
Solid Phase Binding Assay of 125I-b-FGF to V3-Rg. Carrier-free human b-FGF was obtained from R & D Systems (Minneapolis, MN). b-FGF was iodinated with IODO-BEADS (Pierce) according to the manufacturer's procedure. Iodinated b-FGF was separated from free iodine on Sephadex G-25 (Amersham Pharmacia Biotech) equilibrated with 1% bovine serum albumin/PBS. The specific activity of iodinated b-FGF was 10 mCi of 125I/mg of b-FGF. Falcon MicroTest III 96-well assay plates (Becton Dickinson, Lincoln Park, NJ) were coated with CD44 fusion protein overnight at 4°C in Tris-buffered saline. Coated wells were blocked with 1% bovine serum albumin/PBS for 1 h at room temperature and washed three times with the same buffer. [125I]b-FGF (0.025 mCi) was added to each well and incubated at room temperature for 1 h. Wells were washed three times, and the radioactivity in each well was counted in a γ-counter.
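Taking the stated specific activity and per-well activity at face value, the mass of b-FGF added to each well is the activity divided by the specific activity. The sketch below performs this conversion. It also expresses the activity in disintegrations per minute (1 mCi = 2.22 × 10⁹ dpm). It ignores radioactive decay of 125I between iodination and use, which a real calculation would need to correct for.

```python
DPM_PER_MCI = 2.22e9  # 1 mCi = 3.7e7 Bq = 2.22e9 disintegrations per minute

def ligand_per_well(activity_mci: float, specific_activity_mci_per_mg: float):
    """Return (µg of labeled protein, dpm) added per well.

    Ignores decay of 125I between iodination and the assay.
    """
    mass_ug = activity_mci / specific_activity_mci_per_mg * 1e3  # mg -> µg
    return mass_ug, activity_mci * DPM_PER_MCI
```

With 0.025 mCi per well and 10 mCi/mg, this gives 2.5 µg of b-FGF and 5.55 × 10⁷ dpm per well.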

RESULTS
Identification of the CD44 Exons That Are Modified with CS and HS. We and others have reported previously that post-translational modification of CD44 with HS requires the variably spliced CD44 exon V3 (4-6). In addition, a fusion protein containing only 24 amino acids of exon V3, including the SGSG motif, was modified with CS and HS (16). However, it was not clear whether HS could be assembled at other sites, and the locations of the CS assembly sites were unknown. The minimal GAG assembly motif is the amino acid sequence SG. This motif is found in five CD44 exons: E5, E15, E16, V3, and V10. In this study we analyzed all the possible CD44 GAG assembly sites to determine whether they are modified with HS and CS.
HS- and CS-containing CD44 exons were identified by expressing individual CD44 exons separately as recombinant immunoglobulin (Rg) fusion proteins (Fig. 1). The fusion proteins were analyzed for loss of [35S]NaHSO4 label after enzymatic digestion with chondroitin ABC lyase and heparitinase. Some label is always retained after digestion, in part because of keratan sulfate modification (21). Fig. 2 demonstrates that V3wt-Rg contains both HS and CS, whereas E5-Rg is modified with CS only. Unexpectedly, V10-Rg was modified with neither HS nor CS. The potential GAG assembly site in exon V10 lies at the very end of this exon. The first five amino acids of the following exon, E15, are DQDTF. Because E15 is constitutively expressed, these amino acids were included in V10-Rg. This created a motif composed of acidic and hydrophobic amino acids, the hallmark of GAG assembly sites. In addition, HS and CS were not detected on E15-Rg and E16-Rg (data not shown). In summary, only the CD44 fusion proteins composed of exon E5 or V3 supported GAG assembly.
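The logic used to read the gels can be written out explicitly. Label lost after heparitinase indicates HS, and label lost after chondroitin ABC lyase indicates CS. Label surviving both enzymes reflects other sulfated structures such as keratan sulfate. The sketch below assumes band intensities are available as numbers. The 10% loss threshold is a hypothetical choice, not a criterion used in this study.

```python
def classify_gags(untreated: float, after_hep: float, after_abc: float,
                  after_both: float, min_loss: float = 0.10) -> dict:
    """Infer HS/CS presence from 35S signal remaining after each digestion.

    Inputs are band intensities (arbitrary units) for the four aliquots.
    min_loss is a hypothetical threshold for calling a digestion effective.
    """
    if untreated <= 0:
        raise ValueError("untreated signal must be positive")
    hs_loss = (untreated - after_hep) / untreated   # heparitinase-sensitive
    cs_loss = (untreated - after_abc) / untreated   # chondroitin ABC lyase-sensitive
    return {
        "HS": hs_loss >= min_loss,
        "CS": cs_loss >= min_loss,
        # Signal surviving both enzymes, e.g. keratan sulfate or other sulfation.
        "resistant_fraction": after_both / untreated,
    }
```

With hypothetical intensities (100, 98, 30, 25), the function reports CS but not HS, which is the pattern described for E5-Rg.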
HS and CS Are Added to the SGSG Motif in Exon V3. To establish that the SGSG motif is the site of HS and CS assembly, we made a fusion protein, V3mut-Rg, in which SGSG was mutated to AGAG (Fig. 1). This protein was analyzed for HS and CS assembly by monitoring incorporation of [35S]NaHSO4 label, followed by digestion with enzymes specific for HS and CS. These experiments demonstrated that GAG assembly did not take place on V3mut-Rg (data not shown). To further define usage of the SGSG site, V3mut-Rg and V3wt-Rg were cleaved with cyanogen bromide. The resulting peptides were purified by gel permeation chromatography and identified by amino-terminal sequence analysis (Fig. 3, A and B). Digestion of V3wt-Rg produced three peptide pools (pools A, B, and C) of distinct molecular weight ranges, whereas digestion of V3mut-Rg produced only two (pools B and C). Peptide pool B from both fusion proteins contained the Rg domain. Peptide pools A and C contained V3 peptides. Pool A, generated only from V3wt-Rg, was of higher molecular weight, whereas pool C contained V3 peptides of lower molecular weight. Enzymatic digestion showed that V3wt-Rg pool A was the only pool that contained HS and CS.
The ratio of HS to CS was determined by analyzing [3H]GlcN-labeled and unlabeled V3wt-Rg. HS and CS were released from pool A peptides by β-elimination. Samples of released HS and CS were run over a Sephadex G-50 column before and after digestion with either heparitinase I and heparinase or chondroitin ABC lyase. For the unlabeled material, the percent decrease in peak area was calculated. For the radiolabeled carbohydrates, the counts/min in each fraction were determined. Both methods gave a molar ratio of HS to CS of 3:2.
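Assume the decrease in peak area after each digestion is proportional to the molar amount of the corresponding GAG; the sketch below makes that assumption explicit. The HS:CS molar ratio is then simply the ratio of the two decreases, reduced to small integers. The percentages in the usage example are hypothetical, not the measured values.

```python
from __future__ import annotations
from fractions import Fraction

def hs_cs_ratio(hs_decrease_pct: float, cs_decrease_pct: float,
                max_den: int = 10) -> tuple[int, int]:
    """Return the HS:CS molar ratio as small integers.

    Assumes each peak-area decrease is proportional to the moles of the GAG
    removed by the corresponding enzyme; the two need not sum to 100%.
    """
    if cs_decrease_pct <= 0:
        raise ValueError("CS decrease must be positive")
    # Closest fraction with denominator <= max_den to the raw ratio.
    r = Fraction(hs_decrease_pct / cs_decrease_pct).limit_denominator(max_den)
    return r.numerator, r.denominator
```

Hypothetical decreases of 60% (heparitinase) and 40% (chondroitinase) give `(3, 2)`. Slightly noisy values such as 58% and 39% round to the same ratio.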
We also wanted to determine whether both serines of the SGSG motif in V3wt-Rg were used for GAG modification. This analysis required smaller V3 peptide fragments. Peptide pool A was resistant to V8 and Asp-N proteases. In contrast, protease-sensitive V3wt-Rg peptide pool C was found to contain xylose and galactose, which are constituents of the GAG linkage oligosaccharide. Pool C peptides were therefore cleaved with Asp-N protease, and the resulting peptides were separated by reverse-phase HPLC (Fig. 4). In the two chromatograms, the V3wt-Rg peptides in peaks 3 and 4 elute as much broader peaks than the corresponding V3mut-Rg peptides, which suggests glycosylation. Amino acid sequencing and xylose determination of all 12 peptides confirmed that the two V3wt-Rg peptides from peaks 3 and 4 contained the sequence SGSG and were modified with xylose (Table I). Recovery yields during amino acid sequencing further indicated that both serines in the SGSG motif were used. Each was occupied in 50% of the peptides, and the remaining 50% of the serines were not modified. Taken together, these results demonstrate that both HS and CS assembly occur at the SGSG motif in CD44 exon V3.
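Edman sequencing can estimate serine occupancy because a substituted serine is not recovered as the normal PTH-Ser at its cycle. The sketch below assumes a constant repetitive yield to predict the expected recovery of an unmodified residue. PTH-Ser recovery is intrinsically low, so in practice the expected value would have to be calibrated, for example against an unmodified reference peptide. All numbers are hypothetical.

```python
def expected_yield(initial_pmol: float, repetitive_yield: float, cycle: int) -> float:
    """Expected PTH-amino acid recovery at a given Edman cycle (1-based),
    assuming a constant repetitive yield from cycle to cycle."""
    return initial_pmol * repetitive_yield ** (cycle - 1)

def serine_occupancy(observed_pmol: float, expected_pmol: float) -> float:
    """Estimated fraction of a serine position that is substituted.

    Assumes an unmodified serine would be recovered at expected_pmol and a
    substituted (xylosylated) serine yields no PTH-Ser at that cycle.
    """
    if expected_pmol <= 0:
        raise ValueError("expected_pmol must be positive")
    free_fraction = min(observed_pmol / expected_pmol, 1.0)
    return 1.0 - free_fraction
```

Consider a hypothetical peptide loaded at 100 pmol with a 95% repetitive yield and a serine at cycle 3. Recovering 45 pmol of PTH-Ser at that cycle would correspond to roughly 50% occupancy.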
Identification of the Sequence Motif Responsible for HS Addition in CD44 Exon V3. We next examined the sequences surrounding the SG sites in more detail to identify signals that direct HS versus CS assembly. Negatively charged and hydrophobic residues located proximal to an SG site have been proposed to play a role in GAG assembly (14-16). In addition, repeated SG sites have been shown to enhance HS assembly (16). The first SG motif in exon E5 is preceded by a stretch of acidic amino acids. In exon V3 there are acidic residues both upstream and downstream of the SGSG motif, and the downstream acidic residues are flanked by hydrophobic residues. We next tested whether the hydrophobic and acidic residues following the SGSG site in exon V3 are responsible for HS modification. To do so, the eight amino acids following the SGSG motif in exon V3 were exchanged with the corresponding amino acids in exon E5 (E5V3/8aa-Rg, Fig. 5). This exchange effectively switches HS addition from exon V3 to exon E5. Replacing SSSERSST with IDDDEDFI after the first SG motif in E5 yields a protein that is modified with HS and CS (Fig. 6). Conversely, the V3 SGSG motif followed by SSSERSST rather than IDDDEDFI (V3E5/8aa-Rg) is modified with CS but not HS (Fig. 6). These results suggest that acidic residues flanked by hydrophobic residues downstream of the SG motif are necessary for the addition of HS. Duplication of the SG motif, however, is not required for HS modification.
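The chimeras described above amount to replacing the eight residues that follow a given SG motif. The proposed HS signal can be phrased as a pattern: a run of acidic residues with a hydrophobic residue on each side. The sketch below encodes both. The regular expression and the hydrophobic residue set are our assumptions, as are the flanking residues of the toy E5-like sequence (AE...Q). Only the two eight-residue segments come from the text.

```python
import re

# Assumption: hydrophobic residue set; the paper does not define one.
HYDROPHOBIC = "AVILMFWY"
# A run of >= 3 acidic residues (D/E) with a hydrophobic residue on each side.
FLANKED_ACIDIC = re.compile(f"[{HYDROPHOBIC}][DE]{{3,}}[{HYDROPHOBIC}]")

def has_flanked_acidic_run(segment: str) -> bool:
    """True if the segment contains a hydrophobic-acidic run-hydrophobic pattern."""
    return FLANKED_ACIDIC.search(segment.upper()) is not None

def swap_downstream(seq: str, motif: str, new_segment: str, length: int = 8) -> str:
    """Replace the `length` residues that follow the first `motif` in `seq`."""
    i = seq.find(motif)
    if i < 0:
        raise ValueError(f"motif {motif!r} not found")
    start = i + len(motif)
    if len(new_segment) != length or len(seq) - start < length:
        raise ValueError("segment length mismatch")
    return seq[:start] + new_segment + seq[start + length:]
```

Swapping IDDDEDFI into the hypothetical E5-like sequence "AESGSSSERSSTQ" gives "AESGIDDDEDFIQ", which the pattern now recognizes. The native SSSERSST segment does not match the pattern.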
The Presence of CD44 Exon V3 Does Not Result in the Modification of Distal SG Sites with HS. We next investigated whether the regulatory sequences in exon V3 that direct HS assembly at the SGSG site could drive the modification of a distal SG motif when the SGSG motif in V3 was mutated to AGAG. Four constructs were made that contained either wild type or mutant V3 in combination with other CD44 exons containing SG sites (Fig. 5). Wild type or mutant exon V3 was included in fusion proteins containing exon E5 (E5V3wt-Rg and E5V3mut-Rg, respectively) or exons V4-V10 (V3wt-V10-Rg and V3mut-V10-Rg, respectively). These exon combinations were chosen because at least one of the SG sites in E5 is modified with CS, while the single SG site in V10 is not utilized. Both fusion proteins that contained wild type exon V3, E5V3wt-Rg and V3wt-V10-Rg, were modified with both CS and HS (Fig. 7). In contrast, of the fusion proteins containing the AGAG mutation in exon V3, E5V3mut-Rg was modified only with CS, and V3mut-V10-Rg was not modified with GAGs (Fig. 7). These findings suggest that the sequences in exon V3 that direct GAG assembly at the proximal SGSG site do not influence GAG assembly at distal SG sites.

FIG. 4. Reverse-phase high performance liquid chromatography of V3wt-Rg and V3mut-Rg pool C CNBr peptides after further digestion with Asp-N protease. The peptides were separated on a 2.1 × 100-mm RP-300 column. The elution pattern is shown for 150 pmol of the peptides from V3mut-Rg (A) and 150 pmol of the peptides from V3wt-Rg (B). Peptides were eluted at 40°C and a flow rate of 100 µl/min. Elution used a 60-min gradient from 0.1% trifluoroacetic acid in water to 45% acetonitrile containing 0.1% trifluoroacetic acid.

FIG. 2. GAG modification of CD44 exons expressed as independent Rg fusion proteins. [35S]NaHSO4-labeled V3-Rg (A), E5-Rg (B), and V10-Rg (C) were recovered from the supernatant of COS cell transfectants, purified, and divided equally into four aliquots. One aliquot was left untreated, and the others were digested for 1 h with heparitinase, chondroitin ABC lyase, or both enzymes. The proteins were then resolved by SDS-PAGE and analyzed by radiography.
125I-b-FGF Binds V3wt-Rg but Not V3mut-Rg. Previously we demonstrated that b-FGF can bind HS-modified CD44 produced in COS cells (4). Here we show that 125I-b-FGF binds HS-modified V3wt-Rg, demonstrating that this exon is fully functional when expressed independently. 125I-b-FGF was added to increasing concentrations of V3wt-Rg and V3mut-Rg immobilized on a microtiter plate (Fig. 8). The interaction was concentration-dependent and saturable. 125I-b-FGF did not bind to V3mut-Rg, confirming that HS is required for the interaction. In addition, binding of 125I-b-FGF to V3wt-Rg was inhibited by 20 µg/ml heparin (porcine intestinal mucosa). It was also inhibited by 20 µg/ml purified V3wt-Rg HS oligosaccharides (6-10-mers) generated by heparitinase I digestion (data not shown).

FIG. 7. [35S]NaHSO4-labeled E5V3wt-Rg, E5V3mut-Rg, V3wt-V10-Rg, and V3mut-V10-Rg proteins were recovered from the supernatant of COS cell transfectants, purified, and divided equally into four aliquots. One aliquot was left untreated, and the others were digested for 1 h with heparitinase, chondroitin ABC lyase, or both enzymes. The proteins were then resolved by SDS-PAGE and analyzed by radiography.

DISCUSSION
GAG biosynthesis is known to occur on serines at SG sites, but not all SG sites found in proteoglycans become modified. Additionally, some SG sites are modified with CS only, while others are substrates for both CS and HS synthesis. Previously, we showed that HS was added to CD44 isoforms that include exon V3, but it was not clear at which assembly site the synthesis occurred (4). CD44 contains seven SG motifs encoded in five separate exons. Table II presents the sequences that surround the SG motifs and summarizes which assembly sites were used. Inspection of the sequences reveals that they all contain acidic and hydrophobic residues, which are hallmarks of GAG acceptor sites (22).
However, HS and/or CS synthesis occurred only on E5-Rg and V3wt-Rg. E5-Rg was modified with CS only, and V3wt-Rg was modified with both CS and HS. V3wt-Rg and E5-Rg each have a cluster of at least three acidic residues. By comparison, V10-Rg and E15-Rg contain a DQD motif proximal to the SG, and E16-Rg has only a couple of acidic residues located at a distance from the SG. Inspection of other GAG-modified proteins reveals that some proteoglycans contain unclustered acidic residues similar to those of V10-Rg and E15-Rg (16). Taken together, these data strongly support the notion that a simple linear sequence is not sufficient to initiate GAG synthesis. Instead, the secondary and/or tertiary structure around the SG motif appears to be critical. Mann et al. (10) used energy minimization calculations to predict the structure of the seven amino acids at the GAG attachment site in decorin and found that the SG was embedded in a β-turn. The protein structure of the GAG assembly sites in CD44 is currently unknown.
Only CD44 exon V3 supports HS assembly, and HS is added to the SGSG site in this exon. These findings led us to investigate whether we could identify CD44 sequences that direct the modification of a given SG site with a specific GAG. The results described herein show that the eight amino acids located downstream of the SGSG site help direct modification of the V3 SGSG motif with HS. Exon E5 is modified only with CS. Replacing the V3 residues with the eight amino acids located downstream of the first SG in E5 results in modification of the V3 SGSG site with CS only. Conversely, if the eight amino acids located downstream of the SGSG site in CD44 exon V3 are placed downstream of the first SG site in E5, E5 is modified with both HS and CS.
We also explored the possibility that sequences in CD44 exon V3 can drive the addition of GAGs to distal SG sites when the normal site of GAG modification is not available. To test this, we analyzed the GAG content of two polypeptides that contained additional SG sites together with a mutant V3 exon, in which the SGSG site was changed to AGAG. The additional SG sites in these polypeptides were either not modified with GAGs or modified with CS. In both cases, inclusion of the mutant V3 exon did not alter the GAG modification of distal SG sites. This suggests that the enzyme(s) that add xylose to the SG serine, the initial step in GAG modification, bind to sequences proximal to the SG site that will be modified with GAGs.
HS-modified isoforms of CD44 arise by inclusion of the alternatively spliced exon V3, which confers the ability to interact with HS-binding proteins. We previously showed that HS-modified CD44 can bind b-FGF (4). In this paper we show that b-FGF binds HS-modified CD44 exon V3 in a dose-dependent manner. This demonstrates that exon V3 is functional when expressed independently, so that it can be used to generate artificial proteoglycans (see accompanying paper (23)).
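The dose-dependent, saturable binding of 125I-b-FGF can be summarized with a one-site hyperbola, B = Bmax·x/(K + x). Here x is the amount of immobilized V3wt-Rg rather than free ligand, so K is only an apparent half-saturation value, not a dissociation constant. The sketch below uses synthetic noiseless data, since the Fig. 8 values are not reproduced here. For simplicity it uses a double-reciprocal linear fit. With real, noisy data, a direct nonlinear fit is generally preferable because the reciprocal transform distorts the errors.

```python
def one_site(x: float, bmax: float, k: float) -> float:
    """One-site saturation: bound signal as a function of x."""
    return bmax * x / (k + x)

def fit_double_reciprocal(xs, ys):
    """Estimate (bmax, k) by least squares on 1/y = (k/bmax)(1/x) + 1/bmax."""
    u = [1.0 / x for x in xs]
    v = [1.0 / y for y in ys]
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    sxx = sum((a - mu) ** 2 for a in u)
    sxy = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    slope = sxy / sxx              # equals k / bmax
    intercept = mv - slope * mu    # equals 1 / bmax
    bmax = 1.0 / intercept
    return bmax, slope * bmax

# Synthetic, noiseless example in hypothetical units: Bmax = 1000, K = 2.
xs = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
ys = [one_site(x, 1000.0, 2.0) for x in xs]
bmax_hat, k_hat = fit_double_reciprocal(xs, ys)
```

On these synthetic data the fit recovers Bmax ≈ 1000 and K ≈ 2. At x = K, the model gives half of Bmax.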
Different GAGs impart unique functions to glycoproteins, and it is of interest to determine the different functions of CD44 isoforms. Exon E5 is expressed in all CD44 isoforms, and here we have shown that this exon and the alternatively spliced exon V3 are modified with CS. CS-modified CD44 has recently been shown to be involved in microvascular endothelial migration on fibrinogen (7). It had previously been demonstrated that wounded, migrating aortic endothelial cells switch from expressing predominantly HS to expressing CS-A and -B (8). In contrast, when exon V3 is expressed, CD44 can concentrate and present HS-binding growth factors and chemokines. CD44 may thus contribute a critical step to a wide range of biological functions.
Understanding the sequence requirements for GAG assembly led to the idea that GAG assembly sites could be introduced into other proteins. Such proteins would acquire an additional function and could be used to deliver HS-binding proteins to sites of interest. The accompanying paper (23) addresses this concept of creating artificial proteoglycans.