Identification of the N-linked oligosaccharide sites in chick corneal lumican and keratocan that receive keratan sulfate.

Corneal proteoglycans have chondroitin/dermatan and keratan sulfate (KS) chains and belong to the leucine-rich proteoglycan gene family. Corneal KS is N-linked to Asn of an NX(S/T) site through a complex oligosaccharide linkage region. Only some sites receive KS, whereas others remain in a high mannose form. To determine whether the attachment of KS was biased toward specific sites, we isolated trypsin-digested KS-containing fragments of chick corneal proteoglycans and sequenced the peptides. Results showed that all of the peptides sequenced aligned to the deduced amino acid sequence of either chick lumican or chick keratocan at the first, third, and fourth potential N-linked sites. Sites 1 and 4 in lumican and keratocan are in a homologous location. By analogy with the structure of ribonuclease inhibitor (a Leu-rich repeat containing protein), the KS chains would extend outward on the outer face of a horseshoe-like structure. The amino acid sequences surrounding the potential N-linked sites were also compared. Sites receiving KS tend to have a higher occurrence of aromatic residues, in particular Phe, located within 3 amino acids of NX(S/T). These conserved Phe residues may have a role in the conversion of high mannose N-linked oligosaccharides to polylactosamine and/or keratan sulfate.

The corneal stroma contains an extensive matrix consisting of collagen and proteoglycans that interact with each other to produce a transparent structure. The proteoglycan constituents of the corneal stroma were originally designated as chondroitin/dermatan sulfate or keratan sulfate (KS) proteoglycans, but a number of these core proteins have since been cloned. They are now recognized as decorin (1), which contains a single chondroitin/dermatan sulfate side chain, and lumican (2)(3)(4), osteoinductive factor (5, 6), or osteoglycin (7) and keratocan (5), which contain KS side chains. In adult chicken corneas, a portion of the population of decorin also receives KS to make a decorin isoform containing both chondroitin/dermatan sulfate and KS (8).
KS is a repeating disaccharide of glucosamine and galactose that bears sulfate esters. In cornea, KS is N-linked to an Asn residue of an NX(S/T) site through a linkage region that is a complex-type oligosaccharide (9). Previous studies on corneal KSPGs showed N-linked sites were substituted with either a high mannose form or processed to a keratan sulfate containing complex form (9,10). Since complex type oligosaccharides as such were not found, it was proposed that all N-linked sites processed to a complex type were also further processed to receive keratan sulfate. Biochemical analysis indicated that the N-linked sites in the corneal KSPGs are substituted with either high mannose oligosaccharides or KS chains (i.e. no sites are left unsubstituted) (9,11). Analysis of core protein sequence deduced from cDNA clones for lumican and keratocan indicates there are four to five potential N-linked oligosaccharide sites, depending upon the species (2)(3)(4)(5). Characterization of KS containing proteoglycans purified from embryonic chick corneas reveals that, on the average, not all the N-linked sites are processed to contain KS and that only half of the total sites that are substituted remain as N-linked oligosaccharides (11). Analysis of KS proteoglycans from bovine corneas indicates that three of five N-linked sites on keratocan and one of the four sites on lumican and osteoinductive factor have KS (12). It is not known, however, which N-linked sites in any of the corneal proteoglycans receive KS.
The deduced amino acid sequences from cDNA clones to corneal proteoglycans reveal that they are homologous in structure and that they belong to the Leu-rich proteoglycan (LRP) gene family (1)(2)(3)(4)(5)(6)(7)(8). Furthermore, although some N-linked sites are in a homologous location on the core protein, others are not. We have previously developed methods to purify chick KSPG and have isolated cDNA clones to chick lumican (2,8). In the present study, we cloned chick keratocan and then isolated and sequenced the KS peptides from chick corneal proteoglycans to determine which N-linked oligosaccharides on chick lumican and keratocan receive KS.

EXPERIMENTAL PROCEDURES
Isolation of cDNA Clone to Chick Keratocan-A previously prepared chick corneal cDNA library (2) was screened using a 32P-radiolabeled cDNA clone to bovine keratocan (5) obtained by PCR. Briefly, the bovine keratocan clone was generated by reverse transcription-PCR using DNase-treated RNA isolated from cultured bovine corneal fibroblasts and the following primers: reverse transcription primer 5′-GACTTTAACTGCAAGAGTACGC-3′; forward primer 5′-TGGAGAACCTGACCCTTCTTGAC-3′; reverse primer 5′-ATCCAGACGGAGGTAGCGAAGATG-3′. The resulting 491-bp product was subcloned into the GenHunter vector (GenHunter Corp.) and sequenced to confirm its identity as bovine keratocan. The high specific activity PCR probe was generated in a standard 20-cycle PCR reaction containing 0.1 ng/µl plasmid DNA, forward and reverse keratocan primers, 20 µM dATP, dGTP, and dTTP, and 0.6 µM [α-32P]dCTP, 3000 Ci/mmol (NEN Life Science Products). Screening 140,000 plaque-forming units of the library under low stringency conditions yielded 250 positive clones; 74 phage plugs were extracted followed by PCR to determine which clones were full-length. Briefly, 5 µl of extracted phage was lysed in 35 µl of TE, 0.1% Tween 20 (GenHunter Corp.), boiled 5 min, and centrifuged 2 min; the supernatant was collected, and 2.0 µl was used in a 20-µl, 30-40-cycle PCR reaction with the T3 primer (Stratagene) and keratocan reverse primer. The five largest clones, determined by PCR, were used for the second library screen; positive clones were selected, and the pBluescript plasmid was rescued from the Uni-Zap XR vector as described previously (1). Restriction mapping of these five clones along with PCR reactions using the T3 primer and keratocan reverse primer confirmed which clones contained the largest keratocan cDNA inserts. Two clones of similar size were sequenced (Seqwright), and both were confirmed to be keratocan by homology with bovine keratocan.
One of the clones was full-length, and the other contained a longer alternative 3′ end but was lacking 496 bp from the 5′ end. The sequence reported in this publication is from the first, full-length clone.
Proteoglycan Purification-Corneas were removed from adult chicken eyes obtained from a local source and stored at −80°C. The proteoglycans were extracted from the corneas, and the keratan sulfate-containing proteoglycans were isolated and purified from the extract using previous procedures (8,10,13). 5 g of corneas were thawed and extracted at 4°C for 16 h in 10 volumes (w/v) of 4 M guanidine HCl containing proteolytic inhibitors. The extract was dialyzed against 6 M urea containing 0.05 M Tris, pH 6.8, 0.15 M NaCl, 0.1% Chaps (Tris-buffered urea) and applied to a column (2.5 × 5.0 cm) of DEAE-Sepharose (Amersham Pharmacia Biotech) equilibrated with the same solvent. The proteoglycans were eluted with a linear gradient (0.15-1.15 M) of NaCl, and their elution position was determined with dimethylmethylene blue (DMB) (14). The tubes containing proteoglycans were pooled, dialyzed against deionized water, and lyophilized to dryness. The proteoglycans were dissolved in 3 ml of 4 M guanidine HCl containing 0.02 M Tris, pH 6.8, 0.1% Chaps and applied to a column (1.5 × 100 cm) of Sepharose CL-4B (Amersham Pharmacia Biotech) equilibrated and eluted with the same solvent. The elution position of the proteoglycans was determined as above, and those tubes containing proteoglycans were pooled, dialyzed against deionized water, and lyophilized to dryness.
The proteoglycans (42 mg) isolated from CL-4B were reconstituted in 5.0 ml of chondroitinase ABC buffer containing proteolytic inhibitors and digested with 5 units of chondroitinase ABC (Seikagaku America, Inc.) at 37°C for 3 h. The digest was dialyzed against Tris-buffered urea overnight, applied to a column (1.0 × 8.0 cm) of Source 15Q (Amersham Pharmacia Biotech) equilibrated with Tris-buffered urea, and eluted with a NaCl gradient by FPLC. The elution position of the proteoglycans (now only those containing keratan sulfate) was determined as above, and the corresponding fractions were pooled, dialyzed against deionized water, and lyophilized to dryness. The keratan sulfate-containing proteoglycans were dissolved in 2.0 ml of 4 M guanidine containing 0.02 M Tris, pH 6.0, 0.1% Chaps and applied to a column (1.6 × 50 cm) of Superose 6 (Amersham Pharmacia Biotech) equilibrated and eluted with the same solvent. The tubes containing the keratan sulfate proteoglycans were pooled, dialyzed against deionized water, and lyophilized to yield 18 mg. The keratan sulfate proteoglycans were dissolved at 10 mg/ml in 4 M guanidine, dialyzed against deionized water, and stored at −20°C as the KSPG stock. Digestion of 5 µl of the KSPG stock with keratanase (Seikagaku America, Inc.) was performed as described previously (15), and analysis by SDS-polyacrylamide gel electrophoresis with Coomassie Blue staining resulted in a broad band at ~Mr = 52,000 (not shown), the core glycoprotein size previously shown (13) for corneal keratan sulfate proteoglycans treated with keratanase.
Identification of Keratan Sulfate Attachment Sites-6 mg (600 µl) of the KSPG stock was dialyzed against 0.1 M Tris, pH 8.0, and then digested with 60 µg of L-1-tosylamide-2-phenylethyl chloromethyl ketone-treated trypsin (Sigma) for 4 h at 37°C. The sample was applied to a column (0.5 × 4 cm) of Source 15Q (Amersham Pharmacia Biotech) in 0.02 M Tris, pH 8.0, and eluted with a gradient (0.0-1.0 M) of NaCl on FPLC. The tubes containing the KS peptides were identified by the DMB assay, and the amount was calculated using KS (Seikagaku America) as a standard. The KS-containing fractions were pooled, dialyzed against deionized water, and lyophilized to dryness. The KS peptides were reconstituted in 400 µl of 4 M guanidine HCl containing 0.02 M Tris, pH 6.8, 0.1% Chaps and then applied to a column (1.0 × 30 cm) of Superose 12 (Amersham Pharmacia Biotech) equilibrated and eluted in the same solvent. The tubes containing the KS peptides, determined as above, were pooled and dialyzed against deionized water. A portion of the KS peptides was further digested with keratanase, as above, followed by reversed phase chromatography. KS-containing peptides were purified by application to a Vydac C-18 reversed phase column (30 × 2.1 mm) in 0.1% trifluoroacetic acid. Elution was with a linear gradient of acetonitrile (0-70% over 45 min) at a flow rate of 0.2 ml/min, and the eluate was monitored at 220 nm. Glycopeptide-containing samples were compared with a buffer- and enzyme-containing blank, and peaks unique to the peptide-containing sample were analyzed by Edman degradation on an Applied Biosystems/Perkin-Elmer 477A sequencer with on-line detection of phenylthiohydantoin derivatives. Samples exhibited a prominent peak of Chaps which eluted later than the peptides. As is typical for glycopeptides, peaks were often rather broad and heterogeneous.

RESULTS
140,000 recombinants of a Uni-Zap XR cDNA chick corneal fibroblast library were screened using a high specific activity 32P-bovine keratocan probe generated by PCR. Full-length clones were determined using phage lysis and PCR with the T3 primer, which anneals to the vector, and the bovine keratocan reverse primer. PCR-selected phage clones were purified and rescued in the pBluescript plasmid from the Uni-Zap vector. Two putative clones, which were similar in insert size, were sequenced. The first clone consisted of a 2455-bp cDNA sequence containing a 138-bp 5′-untranslated region, a 353-amino acid open reading frame, and a 1257-bp 3′-untranslated region including the polyadenylated tail (Fig. 1). The deduced amino acid sequence of the full-length clone contains a signal peptide, underlined, estimated by the "-3, -1" rule (16), of 21 residues in length with a cleavage site between the Thr and Arg residues (Fig. 1). The sequence contains 7 Cys residues, one in the signal peptide, four near the N terminus and two near the C terminus. In addition, there are five potential N-linked oligosaccharide attachment sites, determined by the Asn-Xaa-(Ser/Thr) consensus sequence, located between the N- and C-terminal Cys residues. Although the polyadenylated tail was sequenced (A32), the highly conserved AATAAA signal was not present 11-30 nucleotides upstream from the site of poly(A) addition. However, a similar sequence, AAATAA, was found located within this region. The second cDNA clone contained a 2456-nucleotide cDNA sequence that was identical to the sequence of the first clone except it contained additional 3′-untranslated sequence (not shown) and lacked the 5′ 496 nucleotides which encoded the N-terminal 119 amino acids. Interestingly, PRELP, another member of the Leu-rich proteoglycan (LRP) family, has been shown to have several different 3′-untranslated regions which account for the mRNA sizes of 1.7, 4.6, and 6.7 kilobase pairs (17).
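The Asn-Xaa-(Ser/Thr) scan used above to count potential N-linked sites can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name `find_sequons` and the toy sequence are hypothetical (the toy sequence is not chick keratocan), and the optional exclusion of Pro at the Xaa position reflects the commonly used form of the consensus rather than anything stated in this paper.

```python
import re

def find_sequons(protein, exclude_pro=True):
    """Return 1-based positions of Asn residues in Asn-Xaa-(Ser/Thr) sequons.

    exclude_pro=True skips Asn-Pro-(Ser/Thr), a common convention that is an
    assumption here (the text only states the NX(S/T) consensus).
    """
    middle = "[^P]" if exclude_pro else "."
    # A lookahead is used so that overlapping sequons (e.g. NNTS) are all found.
    pattern = re.compile(rf"N(?={middle}[ST])")
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]

# Hypothetical toy sequence, for illustration only.
toy = "MKLNFSAANPTWNGSNNTS"
print(find_sequons(toy))                     # Pro at Xaa excluded
print(find_sequons(toy, exclude_pro=False))  # plain NX(S/T) rule
```

Applied to a deduced core protein sequence, a scan of this kind would list the candidate sites that are then numbered from the N terminus.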
A comparison of the deduced amino acid sequence of the first cDNA clone to sequences in GenBank™ revealed the highest homology to bovine keratocan, accession number U48360 (Fig. 2). There are only two regions where major variations exist, in the signal peptide and in the C-terminal domain at amino acids 306-320. The remainder of the protein is highly conserved with a 72.3% amino acid identity between chick and bovine keratocan. In comparison between chick and bovine species, keratocan was more conserved than lumican but not as conserved as decorin. Alignment of keratocan with other members of the LRP family revealed that both chick and bovine keratocan have the highest homology with human PRELP (18), 53.3 and 54.5% amino acid identity, respectively. Chick keratocan had a slightly higher amino acid identity with bovine fibromodulin (19) than with chick lumican (2) and had a much lower homology with chick decorin (1), 39.8, 37.7, and 26.8%, respectively. It appears that keratocan, PRELP, fibromodulin, and lumican are all within the same gene family. This family can be divided further into two subgroups where keratocan and PRELP share >53% homology and lumican and fibromodulin share >45% homology between them, respectively.
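For readers unfamiliar with how an amino acid identity figure is derived from a pairwise alignment, the following Python sketch shows one common definition. It is an assumption-laden illustration: the paper does not state which alignment program or which denominator was used, so the choice here (identical residues divided by aligned, non-gap columns) is only one of several conventions, and the function name and toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs must be rows of the same pairwise alignment (equal length).
    The denominator (non-gap columns) is an assumed convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared

# Hypothetical toy alignment: 5 identical residues out of 6 compared columns.
print(round(percent_identity("MKL-FSAN", "MRLAFS-N"), 1))
```

Other denominators (for example, the length of the shorter sequence or the full alignment length) give somewhat different percentages, which is worth keeping in mind when comparing identity values across studies.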
Chick keratocan contains 11 Leu-rich repeats, all of which are of the B-type (20), which is defined by a highly conserved Asn residue in the 10th position of the repeat. The Leu-rich repeat as well as the surrounding amino acids follow a pattern in which certain residues within each repeat are conserved at the same position. Therefore, the first 9 Leu-rich repeats of chick keratocan can be organized into equally spaced motifs and units as shown in Fig. 3. The motifs 1, 2, and 3 are joined tandemly to form a unit. Each of the motifs has unique spacing and a unique Leu-rich repeat consensus sequence as follows: motif 1 has a spacing of 24 amino acid residues, motif 2 is the longest at 25-26 residues, and motif 3 is the shortest with 20-21 amino acids. This arrangement produces a triplet where the three motifs are aligned tandemly, and this arrangement is repeated three times. In addition to each motif having a unique consensus sequence within the Leu-rich repeat, there are also sequence identities that surround these repeats. For instance, Pro residues in position 16 of motif 1 and in positions 16 and 20 of motif 3 may give the motif a specific secondary structure. There are also five Phe residues, in motif 1 at position 20 and in motif 2 at position 22, that are not only highly conserved (see Fig. 2) but are located 5-6 amino acids N-terminal of a Leu-rich repeat. In addition to the 9 Leu-rich repeats shown in Fig. 3, there are two additional repeats that are present in chick keratocan. They are excluded from the units because their spacing is altered, and Leu-repeat 11 is located between the two C-terminal Cys residues, making it structurally unique. The Asn residues in chick corneal proteoglycans that receive KS were also determined.
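As a rough illustration of how Leu-rich repeat cores can be located in a sequence, the Python sketch below searches for the LXXLXLXXN pattern described for these proteins. The function name, the toy sequence, and the `leu_like` option (which lets other residues such as Ile stand in for Leu) are assumptions made for illustration; real repeats tolerate more substitutions than a strict pattern match allows, so this is not the procedure used to build Fig. 3.

```python
import re

def find_lrr_cores(protein, leu_like="L"):
    """Return (1-based start, matched segment) for each LXXLXLXXN-like core.

    leu_like lists the residues accepted at the three Leu positions,
    e.g. "LI" to also accept Ile (an illustrative relaxation).
    """
    cls = f"[{leu_like}]"
    # Capturing group inside a lookahead so overlapping candidates are reported.
    pattern = re.compile(rf"(?=({cls}..{cls}.{cls}..N))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein)]

# Hypothetical toy sequence, for illustration only.
toy = "GSLKELDLSHNAAIPTLRIENNQ"
print(find_lrr_cores(toy))                 # strict Leu-only pattern
print(find_lrr_cores(toy, leu_like="LI"))  # Leu or Ile accepted
```

Relaxing the pattern finds more candidate repeats at the cost of more false positives, which is why repeat assignments are normally checked against spacing and alignment with homologous proteins.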
Adult chicken corneas were extracted with 4 M guanidine in the presence of protease inhibitors, and the corneal KSPGs were purified using both molecular sieve and ion exchange chromatography as well as digestion with chondroitinase ABC according to previous studies (8). This KSPG stock was then used to determine the KS attachment sites. The KSPGs were digested with trypsin, fractionated over a Source 15Q FPLC column, and eluted with NaCl, and the elution position of KS was determined using the DMB assay (Fig. 4). The DMB-positive fractions 15-30 were pooled and further purified on a Superose 12 FPLC column (Fig. 5). There were three KS-containing peaks after Superose 12. The peak at fraction 10 had a very low DMB reactivity and, given its larger size, may contain partially digested core protein. The peak in tubes 21/22 contained a high concentration of contaminating peptides and very small DMB-positive material; therefore, only the fractions in the largest main peak, fractions 12-17, were pooled and used for further analysis including amino acid sequencing.
The experimentally determined sequences from the KS peptides were then aligned with the deduced sequence from the cDNA clones (Fig. 6). The sequences containing potential N-linked glycosylation sites, starting at the estimated trypsin digestion site, are marked with arrows, and any amino acid sequence, obtained by Edman degradation, of the KS-containing peptides is listed below. There were a total of 11 peptide fragments that were sequenced, six of which aligned to chick and keratocan, it is still possible that sites 2 and 5 receive KS and that we did not sequence these fragments due to either low abundance (KS attachment at these sites is a rare event) or that the regions surrounding these sites were trypsin-resistant (in which the site would be contained in a large fragment such as the peak in fraction 10 after Superose 12 (Fig. 5)).
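The idea of an "estimated trypsin digestion site" upstream of each potential N-linked site can be sketched in Python as an in silico digest. The sketch assumes the textbook trypsin rule (cleavage C-terminal to Lys or Arg, but not when the next residue is Pro) and ignores missed cleavages and any steric effect of attached glycans; the function names and the toy sequence are hypothetical.

```python
import re

def trypsin_digest(protein):
    """Split a sequence after Lys/Arg, except when the next residue is Pro."""
    # Zero-width split point: preceded by K or R and not followed by P.
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]

def sequon_peptides(protein):
    """Return (1-based start, peptide) for tryptic peptides with an NX(S/T) site.

    A sequon interrupted by a cleavage site (Xaa = Lys or Arg) is not
    reported, because it is split between two peptides.
    """
    found = []
    start = 1
    for pep in trypsin_digest(protein):
        if re.search(r"N[^P][ST]", pep):
            found.append((start, pep))
        start += len(pep)
    return found

# Hypothetical toy sequence, for illustration only.
toy = "AAKNFSRPLLKGGNGTR"
print(trypsin_digest(toy))
print(sequon_peptides(toy))
```

Comparing the predicted peptide start positions with the first residues obtained by Edman degradation is one way to assign a sequenced glycopeptide to a specific site.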
Although both proteoglycans receive KS attachment on sites 1, 3, and 4, only sites 1 and 4 are located in a homologous region of the core protein (see Fig. 8). Several peptides contained one or two residues that were undetermined, indicated by the X, during cycle sequencing which may be due to carbohydrate linked to the core protein at that site. In addition, there was one discrepancy between the deduced amino acid sequence and the peptide sequencing. The two peptides sequenced which aligned to lumican, residues 245-259, contain a Ser at position 247, whereas the cDNA deduced amino acid sequence indicates that this position should contain a Thr residue. The data on the amino acid sequencing were clear, and it is possible that there was a single nucleotide change during library construction or that this difference is due to a polymorphism. All other peptide fragments matched lumican and keratocan exactly. None of the peptides aligned to other published sequences, and no peptide sequence was obtained that did not match either lumican or keratocan.
The sequences flanking the NX(S/T) sites containing KS in lumican and keratocan were compared with the sequences flanking the NX(S/T) sites that did not contain detectable KS (Fig. 7A). One striking difference in the sequences was the presence of aromatic residues surrounding most of the NX(S/T) sites that contain KS. The sequences surrounding the KS attachment sites contained 7 aromatic residues compared with only 2 aromatic residues surrounding the sites that were not found to contain KS. In particular, most of the sites that contain KS have a Phe residue present at a position within 3 amino acids N-terminal to the Asn residue. The only exception was site 3 in lumican, which did not contain any aromatic residues in this region.

DISCUSSION
The results of this study show that chick keratocan has a message size of 2.45 kilobase pairs that encodes a 353-amino acid protein which has high homology with bovine keratocan (72.3%). As is typical of proteins in the LRP family, chick keratocan contains 11 Leu-rich repeats that can be arranged into three units comprised of three motifs. This arrangement reveals the unique consensus sequence for each unit's Leu-rich repeat and the amino acids surrounding it in the motif.
This study also shows that both chick lumican and chick keratocan have three out of five potential N-linked oligosaccharide sites substituted with KS chains. N-Linked oligosaccharides at sites 1, 3, and 4 in both chick lumican and keratocan are modified to KS. There were, however, some KS containing peptides that were exceptionally large (Fig. 5) that we did not attempt to characterize, and they may contain additional sequences. Previous studies on KSPGs from bovine cornea indicate that keratocan has three KS chains, whereas lumican and osteoglycin contained only one KS chain (12). In the present study, all the sequences we detected aligned with either lumican or keratocan. It may be that chick corneal osteoglycin is present in too low abundance to be detected in the material that was characterized. In support of this, bovine KS core proteins were found to be present at a 3:6:2 ratio, for keratocan, lumican, and osteoglycin, respectively (4,12).
Comparison of N-linked sites that have KS chains to sites that were not found in the KS-containing peptide fractions shows that KS-containing sites tend to have a nearby aromatic residue, usually a Phe. More specifically, sites 1, 3, and 4 of keratocan and sites 1 and 4 of lumican all have Phe located within 3 residues N-terminal to the Asn. Conversely, all of the sites that were not found to contain KS chains lack Phe in this region. In support of the possible role of aromatic residues potentially enhancing KS attachment, fibromodulin, which can have KS chains on the first 4 out of 5 N-linked oligosaccharide sites (21), has at least one aromatic residue located near (within 5 amino acids) the Asn in the NX(S/T) consensus site ( whereas chick lumican site 2 contains a Tyr residue two amino acids N-terminal to the Asn but was not found in the GAG attachment fractions. Thus, the presence and proximity of aromatic residues, in particular Phe, near the Asn may enhance or facilitate KS attachment but is not required for this processing to occur. Most of the Phe residues in close proximity to KS-substituted sites are also located 5-6 amino acids N-terminal to a Leu-rich repeat (see Fig. 3). This occurs with sites 1 and 4 in lumican, keratocan, and fibromodulin and may indicate that the potential role of these Phe residues is to support the secondary structure necessary for the β-sheet (i.e. at the top of the coil, see Fig. 8) as well as KS attachment. Regardless of the signals present in the core protein which may enhance or signal KS attachment, the developmental or tissue-specific expression of the transferases and sulfotransferases required for KS synthesis will regulate whether the core protein can be processed to the proteoglycan form.
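The comparison of aromatic residues near substituted and unsubstituted sites amounts to inspecting a short window N-terminal to each Asn. The Python sketch below is a hypothetical illustration of that bookkeeping: the window definition (the three residues immediately preceding the Asn), the function names, and the toy sequence with its site positions are assumptions, not the sequences or the exact counting scheme behind Fig. 7A.

```python
AROMATIC = set("FWY")  # Phe, Trp, Tyr

def n_terminal_window(protein, asn_pos, width=3):
    """Residues in the `width` positions immediately N-terminal to a 1-based Asn."""
    start = max(0, asn_pos - 1 - width)
    return protein[start:asn_pos - 1]

def aromatic_near_sites(protein, asn_positions, width=3):
    """Map each Asn position to the aromatic residues found in its window."""
    return {
        pos: [res for res in n_terminal_window(protein, pos, width) if res in AROMATIC]
        for pos in asn_positions
    }

# Hypothetical toy sequence with sequon Asn residues at positions 5, 12, and 17.
toy = "ALFPNGTKKYANLSGGNVT"
print(aromatic_near_sites(toy, [5, 12, 17]))
```

Widening the window (for example `width=5`, as in the fibromodulin comparison) changes which residues are counted, so the window size should be stated alongside any such tally.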
The locations of the KS attachment sites in lumican and keratocan were compared in a model based on the x-ray crystallography structure of porcine ribonuclease inhibitor, a protein comprised entirely of Leu-rich repeats (20,22). A major feature of this model is that the Leu-rich repeats are coiled in a spiral with the Leu-rich regions stacked in a parallel β-sheet array. Fig. 8 shows all the potential N-linked sites and those substituted with KS in a two-dimensional model for lumican and keratocan. The first 10 Leu-rich repeats are contained in the central coiled domain between the N- and C-terminal globular domains. The 11th Leu-rich repeat for both lumican and keratocan is located between the two Cys within the C-terminal globular domain. The putative β-sheet, comprised of residues LXXLXLXXN within the Leu-rich repeat, is shown as the top third of the coil in which XLX (i.e. amino acids YLY in repeat 1) are located at the top, exposed surface of the spiraled coil. Based on the structure of ribonuclease inhibitor, these coils would bend into a horseshoe-like structure (as implied in Fig. 8) with the Leu-rich repeats in closer proximity to each other, thereby forming the inner, concave surface of the horseshoe. This hypothetical configuration also allows all three of the KS attachment sites in lumican and keratocan to extend outward on the outer, convex surface of the horseshoe structure. It is possible that the inner, Leu-rich surface of the horseshoe-like structure is involved in protein-protein interaction, possibly collagen binding, whereas the outer surface, with the protruding GAG side chains, is involved in "space filling" between the collagen molecules.
Comparison of the GAG attachment sites within keratocan, lumican, and fibromodulin indicates that sites 1 and 4 in each of these proteoglycans are located in the same homologous region of the core protein, and all of these proteoglycans support KS attachment at these sites. These two sites are located three residues N-terminal to a Leu-rich repeat which, according to our structural model shown in Fig. 8, allows the GAG chain to protrude outwards from the core protein. Site 3, which also receives KS chains, is located in a different position relative to the Leu-rich repeat than sites 1 and 4. Lumican and fibromodulin have a homologous site 3 which is located at position 1 of a Leu-rich repeat. Keratocan has a unique site 3 which is located at position 9 within the Leu-rich repeat. Interestingly, this repeat is "unconventional" due to a Thr residue being present at the first position of the consensus LXXLXLXXNXL repeat instead of a Leu or Ile residue. This may represent a structure unique to keratocan. Despite these differences with site 3 between lumican and keratocan, our model in Fig. 8 indicates that the predicted three-dimensional structure would place site 3 in a similar location, and the GAG chains may also extend in the same direction so that lumican and keratocan may have a very similar three-dimensional appearance.