The primary structure of a basic leucine-rich repeat protein, PRELP, found in connective tissues.

We have determined the primary structure of a connective tissue matrix protein from the nucleotide sequence of a clone isolated from a human articular chondrocyte cDNA library. The major part of the amino acid sequence has also been determined by direct protein sequencing. The translated primary sequence corresponds to 382 amino acid residues, including a 20-residue signal peptide. The molecular mass of the mature protein is 41,646 Da. The main part of the protein consists of 10 leucine-rich repeats ranging in length from 20 to 26 residues, with asparagine at position 10 (B-type). The N-terminal part is unusual in that it is basic and rich in arginine and proline. There are four potential N-linked glycosylation sites present. In three of these sites, post-translational modifications are likely to be present since Asn was not found by direct protein sequencing. The amino- and carboxyl-terminal parts contain four and two cysteine residues, respectively, probably forming disulfide bonds by analogy with the other members of this family. The protein shows highest identity (36%) to fibromodulin and 33% to bovine lumican, two other leucine-rich repeat connective tissue proteins. Northern blot analysis showed the presence of a 3.8-kilobase mRNA in different types of bovine cartilage and cultured osteoblasts, whereas RNAs isolated from bovine kidney, skin, spleen, thymus, and trabecular bone and rat calvaria were negative. Human articular chondrocyte and rat chondrosarcoma cell RNAs contained an additional mRNA of 1.6 and 1.8 kilobases, respectively.

then, the number of identified primary sequences for LRR proteins in connective tissues has increased. Connective tissue proteins of the extracellular matrix with LRRs so far known are biglycan (3), decorin and fibromodulin (4), lumican (5), chondroadherin (6), proteoglycan-Lb (7), and osteoinductive factor (8). Except for chondroadherin, all of them are proteoglycans with one or a few glycosaminoglycan chains, and several of these molecules have been shown to bind components of the extracellular matrix, e.g. collagen (9), growth factors (10), and cells (11).
Intra- and extracellular LRR-containing proteins are found in almost any system studied, e.g. in mammalian and plant cells, yeast, and prokaryotes (12). The number of residues in a given LRR is between 20 and 29, and the consensus sequence derived from all known LRR proteins contains leucine or other aliphatic residues at positions 2, 5, 7, 12, 16, 21, and 24 and asparagine (B-type repeat), cysteine (A-type repeat), or threonine at position 10. Recently, the three-dimensional structure of the LRR-containing porcine ribonuclease inhibitor was determined, first for the protein and then for its complex with ribonuclease (13). It is likely that other LRR proteins have a similar structure in their repeat region. The three-dimensional structure shows that each LRR is composed of a β-sheet and an α-helix. In the ribonuclease inhibitor, the β-sheets that form the consensus part of the LRR are arrayed on one face of the protein, while the less conserved helices are arrayed on the opposite face. LRR-containing proteins are known to participate in protein-protein interactions. The specificity and the diversity of the protein-protein interactions probably arise from the non-consensus residues.
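The consensus described above can be expressed as a small check on a single repeat. The following is only an illustrative sketch: the function names, the choice of Leu/Ile/Val/Met as the "aliphatic" set, and the toy repeat used in the usage note are assumptions made for this example and are not taken from the sequence data.

```python
# Sketch of the LRR consensus described in the text (all names are hypothetical).
# Positions are 1-based, as in the text.
ALIPHATIC_POSITIONS = (2, 5, 7, 12, 16, 21, 24)

# Assumption: "leucine or other aliphatic residues" is taken as L, I, V, or M.
ALIPHATIC = set("LIVM")

# Repeat type is decided by the residue at position 10.
TYPE_BY_RESIDUE_10 = {"N": "B-type", "C": "A-type", "T": "threonine-type"}


def classify_repeat(repeat: str) -> str:
    """Return the repeat type implied by the residue at position 10."""
    if not 20 <= len(repeat) <= 29:
        raise ValueError("an LRR is expected to contain 20 to 29 residues")
    return TYPE_BY_RESIDUE_10.get(repeat[9].upper(), "non-consensus")


def aliphatic_matches(repeat: str) -> tuple[int, int]:
    """Return (hits, positions checked) for the aliphatic consensus positions.

    Consensus positions beyond the end of a short repeat are not counted.
    """
    checked = [p for p in ALIPHATIC_POSITIONS if p <= len(repeat)]
    hits = sum(repeat[p - 1].upper() in ALIPHATIC for p in checked)
    return hits, len(checked)
```

For a made-up 24-residue repeat such as `PLSALTLDGNQLSSGLFAGSLPKL`, `classify_repeat` reports a B-type repeat and `aliphatic_matches` finds an aliphatic residue at all seven consensus positions; truncating the same string to 20 residues leaves only five consensus positions to check.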
The protein described here was originally purified as a prominent component of bovine articular cartilage with a molecular mass of 58 kDa (14). The amino acid composition of the 58-kDa protein was similar to that of fibromodulin, with a high content of leucine and aspartic acid/asparagine residues. The 58-kDa protein was also rich in proline. The proteins showed different ionic properties, with only fibromodulin binding to DEAE-cellulose. Studies of the distribution among tissues by radioimmunoassays showed that the protein was present in many types of cartilage and also in non-cartilage connective tissues such as aorta, sclera, cornea, kidney, liver, skin, and tendon. It was not detected in bone extracts.
To further characterize the 58-kDa protein, we have determined its primary structure. This reveals that the protein belongs to the LRR family of connective tissue proteins with four potential N-linked glycosylation sites and, in contrast to those previously described, a rather basic N-terminal extension rich in arginine and proline. We therefore propose to refer to the protein as PRELP (proline arginine-rich end leucine-rich repeat protein).

MATERIALS AND METHODS
Protein Purification-PRELP was isolated from bovine articular cartilage according to the method described by Heinegård et al. (14).
N-terminal Sequencing-N-terminal sequencing of PRELP indicated that the N terminus was blocked. Several attempts were made to remove the putative N-terminal pyroglutamic acid by digestion with pyroglutamyl aminopeptidase (EC 3.4.19.3) using methods described by the supplier (Boehringer Mannheim). In no case were clear sequence data obtained, although there were indications that some cleavage had occurred in the N-terminal region, probably due to contamination with minor proteases. The possibility exists, therefore, that the protein is blocked in some other way, such as acylation.
Peptide Isolation and Characterization-Peptides were isolated by digestion of purified protein with endoprotease Lys-C or endoprotease Glu-C (Boehringer Mannheim) at enzyme/substrate ratios of 1:25. Peptides were separated by reversed-phase high performance liquid chromatography (HPLC) as described previously (6) with a gradient of 0-70% acetonitrile over 90 min. Peptides were sequenced on an Applied Biosystems 477A automated sequencer with on-line analysis of phenylthiohydantoin derivatives on an Applied Biosystems 120A microbore HPLC apparatus.
Two-dimensional chromatography was performed on an endoprotease Lys-C digest by preceding the reversed-phase separation of peptides with a gel filtration step on Superdex 75 (Pharmacia Biotech Inc.) equilibrated in 4 M guanidine HCl, 25 mM phosphate, pH 6.5. Peptides were isolated from individual fractions by reversed-phase HPLC.
RNA Extraction-Chondrocytes from bovine tracheal cartilage were isolated by collagenase digestion (15). Total RNA was extracted from freshly isolated cells with guanidine isothiocyanate essentially according to Adams et al. (16). Primary bovine osteoblasts were prepared according to Robey and Termine (17). Rat chondrosarcoma cells were provided by Dr. James Kimura (Henry Ford Hospital, Detroit, MI). Total RNAs from these cells were extracted by the same method. Various tissues from a ~2-trimester-old bovine fetus and calvariae from 5-day-old rats were homogenized, and total RNA was extracted similarly. Total RNA from isolated human articular chondrocytes was provided by Dr. Michael Bayliss (Kennedy Institute of Rheumatology, London).
cDNA Synthesis and PCR-cDNA synthesis on total RNA isolated from bovine tracheal chondrocytes with both oligo(dT) and random hexamers as primers and the first PCR amplification with degenerate primers were performed essentially as described by Lee and Caskey (18). The resulting mixture was diluted 100 times and run in a second PCR with nested degenerate primers under the same conditions. The PCR product was run on a 1% agarose gel and isolated with a QIAEX gel extraction kit (QIAGEN Inc.). The isolated cDNA was ligated into the pCR-Script SK(+) vector (Stratagene). The DNA sequence of the fragment was determined, and the resulting primary sequence was compared with the peptide sequences to verify that the amplified fragment represented a cDNA for PRELP.
Screening of cDNA Libraries and DNA Sequencing-The PCR fragment corresponding to bovine PRELP was used as a probe for screening of cDNA libraries. Approximately 700,000 plaque-forming recombinants were screened from a bovine articular (19) and 250,000 from a bovine tracheal (20) chondrocyte λgt11 cDNA library with no positive clones found. Approximately 250,000 plaque-forming recombinants were screened from a human articular chondrocyte ZAPII cDNA library (obtained through Dr. Michael Bayliss) (21) with several positive clones found. The pBluescript SK(+) plasmid containing the cDNA insert was rescued from the ZAP vector by the use of in vivo excision as described in the ZAP-cDNA® synthesis kit (Stratagene). The cDNA sizes were estimated by electrophoresis on a 1% agarose gel after digestion with EcoRI or XhoI/XbaI. The longest clone (1.6 kbp) was purified with a Plasmid Midi kit (QIAGEN Inc.). The cDNA was digested to various lengths by the use of the Erase-a-Base® system (Promega). DNA sequencing was performed in full on both strands by standard double-stranded dideoxy termination sequencing using T3, T7, and synthetic internal specific primers.
Northern Blots-10 μg of total RNAs from various tissues and species were run on a 1% formaldehyde-agarose gel. Northern blots were applied to a nitrocellulose filter (NitroPure, Micron Separations). The PCR-generated bovine cDNA fragment was random primer-labeled (Random Primed DNA Labeling kit, Boehringer Mannheim) with [α-32P]dCTP (Redivue™, Amersham Corp.). This probe was used for filters with bovine total RNA. A StyI fragment from the human cDNA of ~0.6 kbp was used as probe for filters with rat or human total RNA. The Northern blot of human chondrocytes was washed in 0.1× SSC, 0.1% SDS at 65°C. The Northern blot with various bovine fetus tissues was washed in 0.1× SSC, 0.1% SDS at 55°C. The other blots were washed in 0.2× SSC, 0.1% SDS at 55°C prior to detection of radiolabel with the FUJIX Bio-Imaging BAS2000 analyzer.

RESULTS
Amino Acid Sequencing-The protein isolated from bovine articular cartilage was cleaved with endoprotease Lys-C. The resulting peptides were purified by reversed-phase HPLC, and the amino acid sequences were determined (Table I). A data base search with the peptide sequences showed similarity to fibromodulin and biglycan. A number of the peptides could be aligned with similar regions of fibromodulin (data not shown). On the basis of the alignment of the peptides with fibromodulin, a set of nested degenerate PCR primers was designed (Table II).
Reverse Transcriptase-PCR-Total RNA from bovine tracheal chondrocytes was prepared and used for cDNA synthesis with both oligo(dT) and random hexamers as primers. The cDNA mixture was used for PCR amplification with degenerate primers, designed from the amino acid sequences. The resulting fragment (0.39 kbp) was isolated, ligated into pCR-Script, and sequenced. The translated primary structure of the fragment corresponded to peptide sequences and contained five LRRs with one potential N-linked glycosylation site (data not shown).
cDNA Cloning and Sequencing-Screening of cDNA libraries from bovine tracheal and articular chondrocytes with the bovine cDNA fragment gave no positive results. Therefore, a ZAP-cDNA library from human articular chondrocytes was screened with the same probe. Three clones with ~1.6-kbp inserts and two containing 0.9-kbp inserts were isolated. One of the longer clones was chosen for sequencing. The resulting DNA sequence and the translated primary structure are shown in Fig. 1 and represent 382 amino acid residues, including a 20-residue signal peptide. This gives a calculated molecular mass of 41,646 Da for the mature protein. A polyadenylation signal is not present, which indicates that it is not a full-length clone and that the mRNA is further extended with noncoding sequence at the 3'-end.
Secretory Signal Sequence-A prediction of the eukaryotic secretory signal sequence by the PC/GENE program indicated a putative cleavage site between amino acid residues 20 and 21 (Fig. 1). This site is in good agreement with the predicting rules proposed by Kozak (24) and von Heijne (25).
Structure of LRRs-The translated sequence of PRELP contains 10 well conserved LRRs (Table III). These are ordered in a pattern with two longer repeats (24-26 amino acids) followed by a shorter repeat (20-21 amino acids), with some uncertainty regarding the length of the last repeat. The consensus sequences for the repeats are shown in Table III. All the LRRs are B-type, with an asparagine residue at position 10.
Glycosylation-The amino acid sequence contains four potential N-linked glycosylation sites (Fig. 1). After digestion with endoprotease Lys-C, three peptides (58-11, 58-12, and 58-13) (Table I) were found that contained putative N-glycosylation sites. All of these gave blanks on sequence analysis that were consistent with N-glycosylation sites. A peptide obtained by digestion with endoprotease Glu-C (V8) that contained a putative N-glycosylation site (HLYLNNNSI) did not have a blank cycle on the second Asn, indicating that Asn 300 is not substituted.
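Potential N-linked sites of this kind are conventionally located by scanning for the sequon Asn-Xaa-Ser/Thr. The sketch below is an illustration only, not the procedure used in this work; the function name and the optional `exclude_proline` flag (the commonly applied refinement that Xaa is not proline) are assumptions introduced for the example.

```python
import re


def find_sequons(sequence: str, exclude_proline: bool = False) -> list[int]:
    """Return 1-based positions of Asn residues in Asn-Xaa-Ser/Thr sequons.

    A lookahead is used so that overlapping candidates (e.g. runs of Asn)
    are all examined.
    """
    xaa = "[^P]" if exclude_proline else "[A-Z]"
    pattern = re.compile(f"(?=N{xaa}[ST])")
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]


# The Glu-C peptide quoted in the text contains three consecutive Asn residues
# (peptide positions 5, 6, and 7); only the second one starts a sequon.
print(find_sequons("HLYLNNNSI"))  # [6]
```

Applied to the peptide HLYLNNNSI, the scan singles out the second of the three Asn residues, which is the residue examined for a blank cycle above.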
There is some evidence for a substituent, possibly an O-linked oligosaccharide, on Thr 23. A relatively high molecular mass peptide (10-15 kDa) could be isolated after digestion with endoprotease Lys-C. This peptide included the Thr residue and gave a blank on Edman degradation at this position. However, the background in the sequence data was high, and this peptide probably represented the blocked N-terminal peptide. A more detailed description of the N-terminal region of this protein will require further analysis.
Removal of N-linked oligosaccharides by N-glycosidase digestion resulted in a band with a mobility corresponding to ~48 kDa (Fig. 2). Digestion with keratanase resulted in a small, but clearly visible shift on an 8% SDS-polyacrylamide gel, thus indicating the possible existence of keratan sulfate or polylactosamine on the protein (Fig. 2). O-Glycosidase digestion with and without prior digestion with neuraminidase resulted in no shift (data not shown).
Similarity to Other Proteins-A comparison of the primary structures of PRELP and other LRR-containing connective tissue proteins was made. A dendrogram constructed using the pair group maximum averages method (26,27) shows that PRELP has the highest similarity to fibromodulin and lumican (Fig. 3). The identity to human fibromodulin is 36% and to bovine lumican 33%. An alignment of human PRELP, human fibromodulin (SWISS-PROT accession no. Q06828), and bovine lumican (PIR accession no. A46743) by the method of Myers and Miller (28) is shown in Fig. 4. The proteins show a perfectly conserved cysteine pattern in the amino- and carboxyl-terminal domains. The proteins all contain 10 LRRs, with an Asn residue at position 10 in all the LRRs.
Alternative Forms of mRNA from Different Species-Northern blotting of total RNA from bovine tracheal chondrocytes was performed, showing a single mRNA species of ~3.8 kilobases (Fig. 5A). Since the isolated human cDNA did not correspond to this size, Northern blots from other species and tissues were tried. Hybridization to total RNA blots from human articular chondrocytes indicated two mRNA sizes, one of ~1.8 kilobases and a fainter band slightly larger than the mRNA from bovine tracheal chondrocytes (Fig. 5A). Hybridization to total RNA isolated from rat chondrosarcoma cells also resulted in two sizes of mRNA, one of ~1.6 kilobases and one slightly smaller than the mRNA from bovine tracheal chondrocytes (Fig. 5A), with the larger band produced in the highest amount.
Tissue-specific Transcription-Total RNAs isolated from different tissues from a second trimester bovine fetus were probed in Northern blots. RNAs from articular and epiphyseal cartilage, trabecular bone, skin, kidney, liver, spleen, and thymus were included. mRNA for PRELP was only detected in articular and epiphyseal cartilage (Fig. 5B). A signal for PRELP message was also detected in total RNA isolated from primary cultures of bovine osteoblasts (Fig. 5A). No signal was detected in total RNA from calvariae of 5-day-old rats (Fig. 5A).

DISCUSSION
The primary structure of mature human PRELP connective tissue protein represents 362 amino acid residues, which correspond to a calculated molecular mass of 41,646 Da. Laser desorption mass spectrometry indicates that the mass of the intact protein falls into a broad range of 52,000 ± 2500 Da. This difference in size compared with the protein isolated from articular cartilage is at least partly due to N-linked oligosaccharide modifications since N-glycosidase digestion of the protein resulted in an apparent molecular mass of 48 kDa. The discrepancy in size of the protein after N-glycosidase digestion compared with the translated amino acid sequence is likely to be caused by the presence of other post-translational modifications of the protein. An indication of other modifications was obtained from a Lys-C digest that showed that the threonine at position 23 could not be detected by Edman degradation, indicating the presence of an O-glycosidically linked carbohydrate moiety. Direct analysis for O-glycosidically linked oligosaccharides by enzyme digestion was, however, negative.
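The size comparison in this paragraph amounts to simple bookkeeping, which can be laid out as follows. This is a rough sketch using only the figures quoted in the text; the variable and function names are invented for the example, and subtracting an apparent gel mass (48 kDa) from a mass spectrometry value is an approximation rather than a measurement.

```python
# Figures quoted in the text (all masses in Da).
PRECURSOR_RESIDUES = 382
SIGNAL_PEPTIDE_RESIDUES = 20
CALCULATED_MATURE_MASS = 41_646   # from the translated sequence
INTACT_MASS = 52_000              # laser desorption mass spectrometry
INTACT_MASS_UNCERTAINTY = 2_500
DEGLYCOSYLATED_APPARENT = 48_000  # apparent mass after N-glycosidase digestion


def mass_budget() -> dict[str, int]:
    """Split the excess mass of the intact protein into two rough parts."""
    return {
        "mature_residues": PRECURSOR_RESIDUES - SIGNAL_PEPTIDE_RESIDUES,
        # Everything not accounted for by the polypeptide chain.
        "total_excess": INTACT_MASS - CALCULATED_MATURE_MASS,
        # Part removed by N-glycosidase (approximate, mixed measurement types).
        "n_linked_estimate": INTACT_MASS - DEGLYCOSYLATED_APPARENT,
        # Remainder attributed in the text to other modifications.
        "unexplained_estimate": DEGLYCOSYLATED_APPARENT - CALCULATED_MATURE_MASS,
    }


for name, value in mass_budget().items():
    print(f"{name}: {value}")
```

Under these assumptions, roughly 4 kDa of the approximately 10.4-kDa excess would be attributable to N-linked oligosaccharides, leaving about 6.4 kDa for other modifications, although the ±2500 Da uncertainty on the intact mass makes this split very approximate.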

TABLE III
LRRs in PRELP and the consensus sequence for the repeats
Conserved residues (present in 50% or more of the repeats) are shown in boldface. The leucine residues in the consensus sequence are, in some cases, replaced by other aliphatic residues such as valine, isoleucine, or methionine. In two cases, Arg and Gly appear in the consensus position. The positions of the conserved Leu and Asn residues in the LRRs are shown by boxes.

The main part of PRELP consists of 10 LRRs. The β-sheet-forming part of the repeats is highly conserved, whereas residues from position 15 in the repeats to the end of the repeats are less well conserved. The length of the repeats ranges from 20 to 26 residues. They show a periodicity beginning with two 24-26-residue-long repeats followed by a shorter 20-21-residue-long repeat. The end of the last repeat is, however, difficult to predict. This periodicity is also present in fibromodulin, lumican, decorin, and biglycan. In analogy with the other related proteins (6,29), disulfide bridges are likely in the amino-terminal part between cysteine residues at positions 53 and 69 and in the carboxyl-terminal part between cysteine residues at positions 312 and 353. Several other LRR-containing proteins show a similar pattern, with PRELP being most homologous to fibromodulin and lumican. Proteoglycan-Lb (7) and osteoinductive factor (8) are shorter proteins with fewer LRRs, but with the same conserved cysteine residues in the amino-terminal part and a pair of cysteines in the carboxyl-terminal part. Chondroadherin diverges partly from this pattern of cysteines at its carboxyl-terminal end, with four cysteines forming two disulfide bridges (6).
The post-translational modifications of the LRR-containing connective tissue proteins differ. All of them, except for chondroadherin, appear to have oligosaccharide substitutions, but with different content and in different numbers. Biglycan and decorin have two and one chondroitin/dermatan sulfate chains, respectively, close to the amino terminus (30). Fibromodulin and lumican have at least one and at the most four keratan sulfate chains positioned at the N-linked oligosaccharide substitutions (31). The four N-glycosylation sequences are located at conserved positions in LRR-1, -3, -5, and -8 in both fibromodulin and lumican. In fibromodulin, the four sites are identified as hexosamine-rich, but whether they all contain keratan sulfate is uncertain. The sulfate substitutions show variations in different tissues (32). Arterial lumican appears to be unsulfated, whereas corneal lumican is highly sulfated. Fibromodulin contains sulfated tyrosine residues in the amino-terminal part (30), and lumican contains consensus sites for tyrosine sulfation. The sulfate substitutions of fibromodulin and lumican contribute acidic properties to the proteins. However, in addition, the primary sequences show low pI values. PRELP differs considerably from lumican and fibromodulin in that its basic amino-terminal region lacks consensus sites (33,34) for tyrosine sulfation. Whether PRELP is a proteoglycan with keratan sulfate chains is not clear. Four potential N-linked glycosylation sites are present according to the consensus sequence Asn-Xaa-Ser/Thr. In three of the sites, post-translational modifications are likely to be present as amino acid sequence analysis gave blank cycles at these positions. Two of the N-linked oligosaccharide sites are situated in the same position in the LRRs as in fibromodulin and lumican (LRR-1 and -8), whereas the last substituted glycosylation site is positioned in the last repeat. 
The behavior on DEAE-cellulose chromatography shows that PRELP has basic properties. However, because the basic residues in the amino-terminal part raise the pI calculated from the primary structure (9.7 with this region versus 8.3 without it), a shorter keratan sulfate chain might nevertheless be present. Keratanase digestion showed a small shift on an SDS-polyacrylamide gel, which might indicate either a keratan sulfate chain or a non-sulfated polylactosamine. Carbohydrate analysis (14) does not exclude keratan sulfate/polylactosamine substitutions. Attempts to identify keratan sulfate chains on the intact protein and on peptide fragments of the protein by the use of several monoclonal antibodies to keratan sulfate were not conclusive.
Residues 4-47 contain an arginine- and proline-rich segment followed by a proline-rich segment. Inserted between these is a short acidic segment. The proline-rich segment is reminiscent of three turns of an extended collagen-type helix and is therefore likely to form an extended structure. The arginine- and proline-rich segment is also likely to form an extended structure due to steric occlusion and/or charge repulsion of the side chains. The N terminus may, therefore, form an extended structure or loop back on itself in a hairpin, depending on whether the basic and acidic residues interact with each other. The basic region of the amino-terminal part in PRELP contains two T/GRRPRP sequences. This sequence corresponds to the proposed sequence for protein-glycosaminoglycan interactions: X-B-B-X-B-X, where B denotes a basic residue (35). The consensus sequence was derived from 12 known heparin-binding sequences in vitronectin, apolipoproteins E and B-100, and platelet factor 4. It has also been suggested that the noncollagenous NC4 domain of collagen type IX, which is basic and has one consensus sequence at the N-terminal end, may interact with polyanionic glycosaminoglycan in cartilage (36). The existence of an interaction between PRELP and glycosaminoglycans, however, has to be experimentally verified.
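The correspondence between T/GRRPRP and the X-B-B-X-B-X pattern can be checked mechanically. The following is a minimal sketch under stated assumptions: B is taken as Arg or Lys only, X is allowed to be any residue, and the test string in the usage note is a made-up fragment rather than the PRELP sequence; the function name is likewise invented for the example.

```python
import re

# Assumption: basic residues (B) are Arg and Lys; X may be any residue.
XBBXBX = re.compile(r"(?=([A-Z][KR][KR][A-Z][KR][A-Z]))")


def find_xbbxbx(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, hexapeptide) for every X-B-B-X-B-X match.

    The pattern is wrapped in a lookahead so overlapping matches are reported.
    """
    return [(m.start() + 1, m.group(1)) for m in XBBXBX.finditer(sequence.upper())]


# Both variants of the T/GRRPRP sequence fit the pattern.
print(find_xbbxbx("TRRPRP"))  # [(1, 'TRRPRP')]
print(find_xbbxbx("GRRPRP"))  # [(1, 'GRRPRP')]
```

On a hypothetical fragment such as `AATRRPRPAAGRRPRP`, the scan reports the two hexapeptides at positions 3 and 11 and nothing else. A match of this kind says only that the spacing of basic residues fits the proposed pattern; it does not demonstrate binding.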
Most of the LRR-containing proteins have been shown to participate in protein-protein interactions probably mediated through the LRR structure. Chondroadherin is cell binding (11). Fibromodulin and decorin bind to collagens I and II and affect fibril formation (9,37,38), and biglycan binds to collagen VI.² Biglycan, decorin, and fibromodulin have all been shown to bind to transforming growth factor-β (10). Whether PRELP has any of these properties has yet to be determined.
In Northern blot analysis, PRELP seems to be synthesized in high amounts only in cartilage since in RNA isolated from tissues of a bovine fetus (liver, kidney, skin, spleen, and thymus), no mRNA corresponding to PRELP was detected. The detection of PRELP mRNA in cultured bovine osteoblasts may be the effect of up-regulated PRELP expression under culture conditions since the bovine trabecular bone and rat calvarial RNAs gave no positive signal. In the radioimmunoanalysis carried out by Heinegård et al. (14), no PRELP was detected in bone extracts. However, the radioimmunoanalysis indicated the presence of PRELP in bovine kidney, liver, and skin extracts. One possible explanation could be the age of the tissues. In the literature, a 55-kDa protein with similar properties to PRELP has been described (39). This protein appears to be identical to PRELP as partial peptide sequences of the 55-kDa protein are identical to PRELP at positions 138-149 and 217-228. The 55-kDa protein seems to be deficient in newborns and accumulates in cartilage with age rather than being destroyed and resynthesized by the chondrocytes.