The Glycosaminoglycan Attachment Regions of Human Aggrecan

Aggrecan possesses both chondroitin sulfate (CS) and keratan sulfate (KS) chains attached to its core protein, which reside mainly in the central region of the molecule termed the glycosaminoglycan-attachment region. This region is further subdivided into the KS-rich domain and two adjacent CS-rich domains (CS1 and CS2). The CS1 domain of the human is unique in exhibiting length polymorphism due to a variable number of tandem amino acid repeats. The focus of this work was to determine how length polymorphism affects the structure of the CS1 domain and whether CS and KS chains can coexist in the different glycosaminoglycan-attachment domains. The CS1 domain possesses several amino acid repeat sequences that divide it into three subdomains. Variation in repeat number may occur in any of these domains, with the consequence that CS1 domains of the same length may possess different amino acid sequences. There was no evidence to support the presence of KS in either the CS1 or the CS2 domains nor the presence of CS in the KS-rich domain. The structure of the CS chains was shown to vary between the CS1 and CS2 domains, particularly in the adult, with variation occurring in chain length and the sulfation of the non-reducing terminal N-acetyl galactosamine residue. CS chains in the adult CS2 domain were shorter than those in the CS1 domain and possessed disulfated terminal residues in addition to monosulfated residues. There was, however, no change in the sulfation pattern of the disaccharide repeats in the CS chains from the two domains.

Aggrecan is the major proteoglycan in the extracellular matrix of tissues such as cartilage (1,2) and intervertebral disc (3) and endows these tissues with their turgid nature or ability to resist compression. These properties are related to the high fixed charge density of the aggrecan and its ability to form aggregates in association with hyaluronan, which in turn are intimately related to its structure.
Aggrecan consists of a central core protein possessing multiple carbohydrate side chains. The glycosaminoglycans (GAGs), chondroitin sulfate (CS) and keratan sulfate (KS), form the bulk of the carbohydrate (1) and are responsible for the fixed charge density. O-linked and N-linked oligosaccharides are also present (4), but their function is less clear. The aggrecan core protein is divided into multiple structural regions (Fig. 1), which serve unique functions (5-8). Three globular regions (G1, G2, and G3) are present, with two being at the amino terminus of the core protein (G1 and G2) and one at the carboxyl terminus (G3). The amino-terminal G1 region is responsible for the interaction with hyaluronan (9) and is separated from the G2 region by a short interglobular domain. The G2 and G3 regions are separated by a long extended domain to which the majority of GAG chains are attached. This GAG-attachment region is subdivided into three adjacent domains, termed the KS-rich domain (10), the CS1 domain, and the CS2 domain. Although CS and KS predominate in their respective domains, it is not clear that they do so exclusively. This is particularly true in the adult, where KS content increases and CS content decreases (11), and it has been suggested that some of the additional KS may reside in the CS1 or CS2 domains (3,12,13).
Human aggrecan is unique among the species studied to date in possessing structural polymorphism within its CS1 domain (14). In all species, the CS1 domain is composed of adjacent repeats of a relatively conserved amino acid sequence, although the number of repeats does vary between species. In the human, the repeat sequence is 19 amino acids long and possesses two attachment sites for CS. However, unlike other species, the number of repeats in the human varies between individuals. To date, human CS1 domains varying from 13 to 33 repeats have been described, with sizes of 26-28 repeats being most common. The structure of the amino acid repeats does not show absolute conservation as the amino acids at positions 6, 11, and 12 of each repeat may vary (Fig. 1). The repeats may be divided into three groups: the amino-terminal repeats possessing threonine at residue 6 and aspartic acid at residue 12, the central repeats possessing alanine at residue 6 and aspartic acid at residue 12, and the carboxyl-terminal repeats possessing threonine at residue 6 and glutamic acid at residue 12. The carboxyl-terminal repeats may be further subdivided according to whether aspartic acid or glutamic acid resides at residue 11. At present, it is unclear whether size polymorphism results from repeat number variation in all three CS1 subdomains or just the longer central subdomain. The purpose of this work was to resolve some of these unanswered questions by determining the origin of size polymorphism in the CS1 domain, whether KS is present in the CS1 or CS2 domains or CS is present in the KS-rich domain, and whether CS in the CS1 and CS2 domains is of an identical structure.

EXPERIMENTAL PROCEDURES
Source of DNA-Genomic DNA was isolated from outdated peripheral blood obtained from the Red Cross blood bank. DNA was purified using DNA isolation kits (Qiagen).
Aggrecan Polymorphism Analysis-The aggrecan CS1 encoding domain was amplified by PCR using 50 ng of DNA and upstream sense and downstream antisense primers, 5′-TAGAGGGCTCTGCCTCTGGAGTTG-3′ and 5′-AGGTCCCCTACCGCAGAGGTAGAA-3′, respectively (14). To ensure an adequate yield of amplicons that are commonly in the 1-2-kbp range, the Expand long template PCR system (Roche Applied Science) was used, according to the manufacturer's instructions for buffer 2. Samples were denatured for 2 min at 95°C and then subjected to 30 cycles of denaturation for 40 s at 95°C, annealing for 40 s at 66.5°C, and extension for 2 min at 68°C. After the final cycle, the mixture was left at 68°C for 10 min.
Agarose Gel Electrophoresis-PCR products (10 µl from a 50-µl PCR reaction) were analyzed in 0.8% agarose (Invitrogen) gels in Tris borate EDTA buffer at 90 V for 2 h. A standard DNA ladder was run on the same gel to permit chain lengths, and hence repeat numbers, to be determined. DNA was visualized by ethidium bromide staining.
Nucleotide Sequencing-PCR products representing each allele were isolated by TA cloning (Invitrogen) following insertion into pCR2.1-TOPO and expansion in TOP10-competent Escherichia coli, according to the manufacturer's instructions. The identity and size of each cloned PCR product was verified by PCR amplification and agarose gel electrophoresis, as described above. Each plasmid was linearized using EcoRI and then sequenced by Big Dye terminator cycle sequencing on an ABI Prism 3100 DNA sequencer. All sequence reactions were repeated two times.
Source of Cartilage-Articular cartilage was obtained from the femoral condyles at the time of autopsy from individuals ranging in age from the newborn to the mature adult. All tissue was collected within 20 h of the time of death.
Preparation of Aggrecan-Cartilage was finely divided and then extracted with 4 M guanidinium chloride, 0.1 M sodium acetate, pH 6.0, containing proteinase inhibitors (11). Extraction was for 48 h at 4°C with continuous stirring. The extract was then filtered through glass wool to remove the cartilage residue, and CsCl was added to give a density of 1.5 g/ml (11). This mixture was subjected to density gradient centrifugation at 100,000 × g(av) for 48 h at 10°C (15). The resulting density gradients were fractionated, and fractions with a density greater than 1.55 g/ml were pooled and then dialyzed exhaustively against water before freeze drying.
Trypsin Digestion-Aggrecan was dissolved at 1 mg/ml in 50 mM Tris/HCl, pH 7.5, containing 150 mM NaCl and 1 mM CaCl2. Trypsin (tosylamide phenylethyl chloromethyl ketone-treated, Sigma) was dissolved at 1 mg/ml in the above buffer and added to the aggrecan solution at 10 µl/ml (10 µg of trypsin/mg of aggrecan) (16). The mixtures were incubated at 37°C for 20 h. The reaction was terminated by the addition of soybean trypsin inhibitor (Sigma) at 5 µg/µg of trypsin.
DEAE Chromatography-The DEAE-Sephacel (Amersham Biosciences) was packed into a 17 × 1.5-cm column (Bio-Rad) to give a 5-ml bed volume. The resin was equilibrated in 50 mM Tris/HCl, pH 7.5, containing 150 mM NaCl. The trypsin-digested aggrecan was loaded in the same buffer, and bound material was eluted with a 0.15-1.5 M NaCl gradient (100 ml) at a flow rate of 27 ml/h. Fractions (2 ml) were collected and analyzed for GAG content. Selected fractions were also analyzed by agarose gel electrophoresis and immunoblotting.
Sepharose CL-4B Chromatography-The Sepharose CL-4B (Amersham Biosciences) was packed into a 120 × 1-cm column with a bed volume of 90 ml. The resin was equilibrated in 200 mM sodium acetate, pH 5.5. The trypsin-digested aggrecan (1 mg in 1 ml of the same buffer) was loaded on the column and then eluted at a flow rate of 6 ml/h. Fractions (1 ml) were collected and analyzed for GAG content. Selected fractions were also analyzed by agarose gel electrophoresis and immunoblotting.
Agarose Gel Electrophoresis-Samples of intact or trypsin-digested aggrecan (5 µl of 1 mg/ml) were analyzed by agarose gel electrophoresis (17). Agarose gels (1.2%) were prepared by dissolving 0.48 g of agarose (Seakem) in 40 ml of 0.1 M Tris acetate, pH 7.3. Gels were cast in 10 × 10-cm plates using 3-mm spacers and 10-well combs, and electrophoresis was performed using a Bio-Rad mini gel system. Samples were loaded in 20 µl of 0.5% agarose, 50 mM Tris, 0.384 M glycine, pH 8.3, containing 0.005% bromphenol blue, which was allowed to gel before electrophoresis at 90 V for about 1.5 h. Electrophoresis was terminated when the tracking dye had moved about 2.5-3 cm. Gels were stained with 0.02% toluidine blue (Chroma-Gesellschaft), 0.5% Triton X-100 in 3% acetic acid for 1-2 h and then destained initially in 3% acetic acid and finally in water. When the background was clear, gels were dried onto GelBond (Mandel). Alternatively, gels were processed for immunoblotting.
Immunoblotting-Nitrocellulose membranes (Bio-Rad) were wetted and then incubated for 5 min in 1% cetyl pyridinium chloride (Sigma) dissolved in 30% 1-propanol (18). 0.15 M NaCl was then added, and incubation continued with gentle agitation for 15 min. The derivatized membrane was rinsed repeatedly with 0.15 M NaCl with vigorous shaking until there was no more foaming. Aggrecan or trypsin-generated fragments were then transferred to the derivatized membrane by capillary blotting overnight at room temperature. Membranes were then blocked overnight in 5% skim milk (Carnation instant skim milk powder) dissolved in Tris-buffered saline (20 mM Tris/HCl, pH 7.6, 137 mM NaCl) containing 0.1% Tween 20 (BDH Laboratory Supplies). The blocked blots were probed with a rabbit anti-CS1 antiserum at a dilution of 1:1,000 or a mouse anti-KS antibody (19) at a dilution of 1:5,000 followed by immunolocalization using the ECL system (Amersham Biosciences). The rabbit anti-CS1 antiserum was raised to a synthetic peptide, GRIEWPSTPTVGELGC, conjugated to ovalbumin (20). Residues 2-14 of the peptide correspond to an amino acid sequence at the amino-terminal end of the human CS1 domain.
GAG Digestion-For keratanase II and endo-β-galactosidase (BioLynx Inc.) digestion following trypsin treatment, the samples were dialyzed into 10 mM sodium acetate, pH 6.0. Enzymes were then added to the sample to give a final concentration of 0.005 microunits of enzyme/µg of proteoglycan (21). Incubation was carried out at 37°C for 4 h. For chondroitinase ABC (MP Biomedicals Inc.) digestion following trypsin treatment, the samples were dialyzed into 100 mM Tris, 100 mM sodium acetate, pH 7.3. Chondroitinase was then added at 0.05 microunits/µg of proteoglycan (21), and the samples were incubated overnight at 37°C.
GAG Analysis-GAG analysis was carried out using the dimethyl methylene blue (DMMB) dye binding assay (22) adapted for use with 96-well plates (23).
CS and KS Analysis-CS and KS present in trypsin-generated proteoglycan fragments were analyzed by fluorophore-assisted capillary electrophoresis (24,25) following digestion of the samples with chondroitinase ABC or keratanase II/endo-β-galactosidase, respectively.
FIGURE 1. The aggrecan core protein is depicted as possessing three globular regions (G1, G2, and G3) and intervening extended domains. G1 and G2 are separated by an interglobular domain (IGD), and G2 and G3 are separated by three adjacent glycosaminoglycan-attachment domains (KS, CS1, and CS2). The human CS1 domain is composed of multiple 19-amino acid (AA) repeats and exhibits size polymorphism as the repeat number may vary from 13 to 33. Within each repeat, there may be variation in amino acid residues 6, 11, and 12, depending on the location of the repeat within the CS1 domain.

RESULTS
CS1 Polymorphism-Genomic DNA from different individuals was analyzed by PCR using primers spanning the CS1-encoding domain. The length of the PCR product was then assessed by agarose gel electrophoresis and used to determine the number of 57-bp repeats. This indicated that the number of 19-amino acid repeats present in the CS1 domain ranged from 20 to 29 in the population under study.
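The conversion from amplicon length to repeat number can be sketched as follows. This is a minimal illustration, not the procedure actually used: the 57-bp repeat unit follows from the 19-amino acid repeat, but `flank_bp` (the non-repeat sequence lying between the primers and the repeat array) is a hypothetical parameter whose value is not given in the text, and the numbers in the usage lines are invented for demonstration only.

```python
REPEAT_BP = 19 * 3  # one 19-amino acid repeat is encoded by 57 bp


def repeats_from_amplicon(amplicon_bp, flank_bp):
    """Estimate the CS1 repeat number from a PCR product length.

    amplicon_bp: product length read from the gel against the DNA ladder.
    flank_bp:    assumed length of non-repeat sequence inside the amplicon
                 (hypothetical; not stated in the text).
    """
    if amplicon_bp < flank_bp:
        raise ValueError("amplicon is shorter than the assumed flanking sequence")
    # Gel-based sizing is approximate, so round to the nearest whole repeat.
    return round((amplicon_bp - flank_bp) / REPEAT_BP)


# Illustrative values only: a 100-bp flank is an assumption.
example_flank = 100
example_amplicon = 27 * REPEAT_BP + example_flank
print(repeats_from_amplicon(example_amplicon, example_flank))
```

Rounding to the nearest integer reflects the limited resolution of agarose gel sizing; alleles differing by a single repeat differ by only 57 bp in a product of 1-2 kbp.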
Nucleotide sequence analysis was then used to determine the structural heterogeneity present within the repeats (Fig. 2). The published aggrecan DNA sequence contains 19 repeats (6), with 5, 11, and 3 repeats being in the amino-terminal, central, and carboxyl-terminal subdomains, respectively. Repeat number variation was shown to be present in all three subdomains. Of the nine alleles analyzed, the number of amino-terminal repeats ranged from three to six, and the number of central repeats ranged from 11 to 22. There were always two or three carboxyl-terminal repeats, with the repeat adjacent to the central subdomain being absent or present. There appeared to be no relationship between total repeat number and the number of repeats within the subdomains, and alleles with the same total number of repeats could show different patterns of heterogeneity.
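The subdomain assignment described above can be expressed as a small classification rule on residues 6, 11, and 12 of each repeat. The sketch below is an illustration under assumptions: the function names are hypothetical, and the order and identity of the individual carboxyl-terminal signatures in the example allele are invented; only the 5/11/3 subdomain counts for the published allele come from the text.

```python
from collections import Counter


def classify_repeat(signature):
    """Assign a repeat to a CS1 subdomain from its X6-X11-X12 signature.

    signature: three-letter string giving the residues at positions
    6, 11, and 12 of the repeat, e.g. "TED".
    """
    x6, x11, x12 = signature
    if x6 == "T" and x12 == "D":
        return "amino-terminal"
    if x6 == "A" and x12 == "D":
        return "central"
    if x6 == "T" and x12 == "E" and x11 in ("D", "E"):
        return "carboxyl-terminal"
    return "unclassified"


def subdomain_counts(signatures):
    """Count the repeats falling in each subdomain of one allele."""
    return Counter(classify_repeat(s) for s in signatures)


# Illustrative allele with the 5/11/3 distribution reported for the
# published sequence; the carboxyl-terminal signatures are assumed.
published_like = ["TED"] * 5 + ["AED"] * 11 + ["TDE", "TEE", "TEE"]
counts = subdomain_counts(published_like)
print(dict(counts), sum(counts.values()))
```

Two alleles with the same total repeat number can give different `subdomain_counts`, which is the sense in which CS1 domains of equal length may differ in sequence.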
GAG Distribution-Aggrecan from adult human articular cartilage showed size heterogeneity when analyzed by agarose gel electrophoresis, with a three-component pattern being visualized by toluidine blue staining (Fig. 3A). Immunoblotting revealed that each of these components possesses the CS1 domain (Fig. 3B) and also possesses KS (Fig. 3C). It is therefore likely that the components differ by proteolytic processing within the CS2 domain. Trypsin digestion was carried out to further analyze the GAG distribution among the different domains within the GAG-attachment region. In human aggrecan, there are no lysine or arginine residues within the CS1 domain, and hence, it remains intact following trypsin treatment (6). In contrast, the KS-rich domain and the CS2 domain are cleaved at 3 and 17 sites, respectively, to yield fragments with 20-140 amino acid residues. The CS1 domain will vary in size depending on the number of repeats that are present, and it is contained within a fragment ranging from 254 to 734 amino acid residues (14). Hence, the CS1 fragment is always much larger than any of the trypsin-generated fragments from the KS-rich or CS2 domains. Upon agarose gel electrophoresis, the trypsin-generated aggrecan fragments showed a three-component pattern with all components being of greater mobility than any of the components present in the intact molecule (Fig. 3A). Upon immunoblotting, only the slowest migrating of the trypsin-generated components was shown to contain the CS1 domain (Fig. 3B). In contrast, the KS chains migrated in the vicinity of the component of intermediate mobility (Fig. 3C).
Trypsin digestion was also used to analyze aggrecan from individuals of different ages from the neonate to the mature adult. In all cases, the three-component pattern was observed on toluidine blue staining (Fig. 4A), although the mobility of each component increased with age. In addition, the staining intensity for the intermediate component increased with age, and that for the most mobile component decreased with age. Chondroitinase digestion eliminated toluidine blue staining in the fastest and slowest migrating components, indicating the presence of CS, and diminished staining in the intermediate component, suggesting the presence of both CS and KS. Immunoblotting verified the presence of KS only in the region of the intermediate migrating components (Fig. 4B) and the presence of the CS1 domain in the slower migrating component (data not shown). It was apparent that the degree of anti-CS1 immunoreactivity varied between different individuals, probably indicative of sequence variation within the peptide epitope.
To determine whether CS and KS were present on the same peptide fragments, trypsin-generated fragments from a mature adult were treated with chondroitinase, keratanase, or endo-β-galactosidase prior to analysis by agarose gel electrophoresis and immunoblotting. The mature adult was chosen as it contains the maximal content of KS, although similar results were obtained at all ages. Analysis of the CS1 domain showed no change in mobility upon treatment with either keratanase or endo-β-galactosidase, supporting the absence of KS (Fig. 5A). In contrast, the KS-containing fragments showed no change in mobility upon chondroitinase treatment (Fig. 5B), suggesting that KS and CS do not occur on the same peptide. As expected, both keratanase and endo-β-galactosidase abolished KS immunoreactivity.
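The reasoning that a region lacking lysine and arginine survives trypsin intact can be illustrated with a simplified in-silico digestion. This sketch assumes the basic rule that trypsin cleaves on the carboxyl side of every lysine (K) or arginine (R) and ignores refinements such as reduced cleavage before proline; the toy sequence is invented for illustration and is not the aggrecan sequence.

```python
def trypsin_digest(sequence):
    """Split a protein sequence after every K or R (simplified trypsin rule)."""
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    # Keep any carboxyl-terminal remainder that does not end in K or R.
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


# Toy example: a K/R-free stretch flanked by short K/R-containing segments.
kr_free_block = "SGEGSG" * 4
toy_sequence = "AGK" + kr_free_block + "RTTK"
fragments = trypsin_digest(toy_sequence)
largest = max(fragments, key=len)
print(fragments)
print(kr_free_block in largest)
```

In the toy case the K/R-free block is recovered whole within the largest fragment, mirroring how the CS1 domain is retained in a single large fragment while the flanking domains are cut into small pieces.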
To further verify the unique migration of the CS1 domain and keratan sulfate, the trypsin-generated aggrecan fragments were fractionated according to their size by gel filtration chromatography through Sepharose CL-4B. A bimodal profile was observed upon analysis of the column fractions for GAG content (Fig. 6A). Immunoblotting revealed that the CS1 domain was present in only the less retarded pool (Fig. 6B) and was distinct from the KS-containing fragments, which were more retarded (Fig. 6C). In addition, the trypsin-generated fragments were also analyzed by ion exchange chromatography through DEAE-Sephacel. GAG analysis via the DMMB dye binding assay revealed that elution was mainly in one peak with a trailing edge (Fig. 7A). Toluidine blue staining of the fragments in the fractions following agarose gel electrophoresis showed that the fastest and slowest migrating components were present in the major peak (Fig. 7B), with the slowest migrating component representing the CS1 domain (Fig. 7C). In contrast, all the KS-containing fragments were present in the trailing edge (Fig. 7D).
FIGURE 2. Analysis of aggrecan CS1 organization. Aggrecan alleles were analyzed for total repeat number and the number of repeats possessing equivalent amino acid sequence based on repeat residues 6, 11, and 12 (X6X11X12). These repeats can be divided into amino-terminal (X6X11X12 = TED), central (X6X11X12 = AED), and carboxyl-terminal (X6X11X12 = TE/DE) subdomains. Alleles analyzed in the present study (A-I) are compared with the published aggrecan allele.
Thus, it appears that trypsin generates aggrecan fragments that contain either CS or KS but not both GAGs.
GAG Structure-To determine whether CS chain composition may vary between the CS1 and CS2 domains, the two pools of trypsin-generated fragments separable by Sepharose CL-4B chromatography were subjected to carbohydrate analysis following chondroitinase digestion (Table 1). As expected, the ratio of 6-sulfated to 4-sulfated disaccharides increased with age. There was, however, no difference in this ratio between the larger pool containing the CS1 domain and the smaller pool containing the CS2 fragments derived from the same aggrecan preparation. Differences were noted in the sulfation pattern of N-acetyl galactosamine residues at the non-reducing terminus of the CS chains. Although the CS1 domains always contained 4-sulfation exclusively, the CS2 domains derived from adult aggrecan also contained 4,6-disulfation. With the exception of the newborn, it was also apparent that CS chain length was lower in the fragments derived from the CS2 domain than in those derived from the CS1 domain.
The KS chains present in the two Sepharose CL-4B pools were also analyzed for composition changes with age following keratanase and endo-β-galactosidase digestion (Table 2). However, in this case, the GAG chains are not on CS1- or CS2-derived fragments and probably represent different sized fragments derived mainly from the KS-rich domain. In both juvenile and adult specimens, the major disaccharide was 6-sulfated on the N-acetyl glucosamine residues with a minor proportion also possessing fucose substituted on the N-acetyl glucosamine residues. The remainder of the disaccharides (about 30-40%) were disulfated, possessing 6-sulfation on both the galactose and the N-acetyl glucosamine residues. This composition is similar to that present in corneal keratan sulfate and accounts for the immunoreactivity of the KS chains. The ratio of disaccharides to sialic acid, an indicator of KS chain length, was, however, smaller in the cartilage KS chains.

DISCUSSION
The present results demonstrate that trypsin digestion of human aggrecan generates fragments that possess either CS or KS but not both GAGs. The ability to isolate the CS1 domain of aggrecan in an intact state allows its separation from all other fragments by virtue of its larger size, and this enables its deficiency in KS to be conclusively established. With respect to the CS2 domain, the deficiency in KS cannot be proven categorically as it is not possible to isolate this domain in an intact state. One could argue that CS and KS chains are present within the CS2 domain but at discrete sites that just happen to be separated by trypsin cleavage sites. This is, however, an extremely improbable scenario based on the amino acid sequence of the CS2 domain (6), and it is most likely that the majority of KS chains within the GAG-attachment region of aggrecan are confined to the KS-rich domain. Such a scenario has previously been suggested to occur in aggrecan derived from chick cartilage (26). KS chains may also be present in the G1, interglobular domain, and G2 regions (27,28). The results also imply that there are no CS chains within the KS-rich domain. Thus, it would appear that the bulk of the increase in KS associated with increasing age in cartilage aggrecan is due to a combination of increased substitution within the KS-rich domain and proteolytic processing decreasing the size of the CS2 domain.
Although KS appears to be absent from the human aggrecan CS1 and CS2 domains, it is not clear that CS represents the only type of glycosylation that is present. It is known that aggrecan is substituted with O-linked oligosaccharide chains that resemble the linkage region through which KS is attached to serine or threonine residues on the core protein (4). It is thought that with increasing age, some O-linked oligosaccharides are replaced by keratan sulfate to give an increased core protein substitution (29). Presumably, such O-linked oligosaccharides are located within the KS-rich domain, but one cannot unequivocally exclude their presence also in the CS1 or CS2 domains.
It is interesting to note that CS chain termination and length can vary between the CS1- and CS2-derived trypsin fragments in the same aggrecan preparation, particularly in the adult. This raises the question of whether CS chains on an individual aggrecan molecule are different. As CS chain composition varies with age, one factor that could potentially influence the analysis of CS structure is the increased proteolytic processing of the CS2 domain that takes place throughout life (30). Such processing results in the loss of CS2-derived fragments from the cartilage, and hence, the CS2 trypsin-generated fragments studied in this work might be expected to possess a greater contribution from the more intact proteoglycans synthesized later in life. One would then predict that CS within the CS2 fragments should contain an increased 6:4 sulfation ratio relative to CS in the CS1 domain. This, however, was not the case. In addition, proteolytic processing cannot account for the observed variation in CS chain termination. Thus, on a given aggrecan molecule, the CS chains in the CS2 domain may indeed be structurally distinct from those in the CS1 domain. This being the case, the CS chains in the CS2 domain may also be immunologically distinct from those in the CS1 domain and behave differently in immunoassays of aggrecan fragments that are based on CS epitopes (31-33).
The CS1 domain of human aggrecan shows size polymorphism due to variability in the number of adjacent 19-amino acid repeats (14). It has been suggested that this may be of functional consequence due to variation in the number of CS chains that may be present on aggrecan molecules. Aggrecan molecules possessing the longer CS1 domains would possess an increased fixed charge density and presumably enhanced osmotic properties. One might therefore predict that tissue possessing aggrecan with shorter CS1 domains may be functionally inferior and more susceptible to degeneration, and such an association has been reported (34-36). From the present work, it is apparent that the variation in CS1 size is not due to a single mechanism in which repeat variation occurs at a common site, such as within the long central subdomain. It is apparent that repeat deletion or insertion may occur throughout the CS1 domain so that individuals with the same number of CS1 repeats may have different CS1 amino acid sequences. It remains to be established whether such amino acid sequence variation is of any functional consequence in relation to altered immunogenicity.
Finally, one must consider whether the results obtained with human aggrecan also apply to other species. With respect to CS1 size polymorphism, there is no evidence to date for its occurrence in any other species, and hence, its functional relevance, if any, may be confined to the human. It is interesting to speculate whether an increase in CS1 domain length might be an evolutionary response to bipedalism as an adaptation to provide more functional aggrecan to resist increased loading in weight-bearing cartilages, although such a relationship awaits biomechanical validation. With respect to GAG distribution, it is not possible to perform similar analyses in other species as the human is unique in its absence of lysine and arginine residues within the CS1 domain. Hence, this domain cannot be isolated in an intact state from other species and so separated from trypsin-derived KS domain and CS2 domain fragments. However, there are data in the literature from clostripain digestion of chick aggrecan suggesting that KS occurs in a unique region of the core protein (26), in agreement with the present work. There are also data suggesting that some fragments derived from trypsin digestion of bovine aggrecan may possess both KS and CS on the same peptide (12,13), implying that some KS is present within the CS1 and/or CS2 domains. This may represent a true species difference, or it could relate to the identification of KS by glucosamine analysis, which cannot distinguish KS from O-linked oligosaccharides. Thus, these data may provide evidence that O-linked oligosaccharides are present within the CS-rich domains of aggrecan.