A Novel 3-Hydroxyproline (3Hyp)-rich Motif Marks the Triple-helical C Terminus of Tendon Type I Collagen*

Because of its unique physical and chemical properties, rat tail tendon collagen has long been favored for crystallographic and biochemical studies of fibril structure. In studies of the distribution of 3-hydroxyproline in type I collagen of rat bone, skin, and tail tendon by mass spectrometry, the repeating sequences of Gly-Pro-Pro (GPP) triplets at the C terminus of α1(I) and α2(I) chains were shown to be heavily 3-hydroxylated in tendon but not in skin and bone. By isolating the tryptic peptides and subjecting them to Edman sequence analysis, the presence of repeating 3-hydroxyprolines in consecutive GPP triplets adjacent to 4-hydroxyproline was confirmed as a unique feature of the tendon collagen. A 1960s study by Piez et al. (Piez, K. A., Eigner, E. A., and Lewis, M. S. (1963) Biochemistry 2, 58–66) in which they compared the amino acid compositions of rat skin and tail tendon type I collagen chains indeed showed 3–4 residues of 3Hyp in tendon α1(I) and α2(I) chains but only one 3Hyp residue in skin α1(I) and none in α2(I). The present work therefore confirms this difference and localizes the additional 3Hyp to the GPP repeat at the C terminus of the triple-helix. We speculate on the significance in terms of a potential function in contributing to the unique assembly mechanism and molecular packing in tendon collagen fibrils and on mechanisms that could regulate 3-hydroxylation at this novel substrate site in a tissue-specific manner.

Prolyl 3-hydroxylation, a long recognized, quantitatively minor post-translational modification of collagen (1), has received much attention in the last few years after gene mutations affecting its formation were found to cause recessive forms of osteogenesis imperfecta (2-5). A single primary site of 3-hydroxyproline (3Hyp)² is present in normal collagen α1(I) and α1(II) chains at Pro-986 of the triple-helix (6, 7) but is not hydroxylated in the tissues of mice and humans with recessive osteogenesis imperfecta caused by mutations in CRTAP or LEPRE1 (2-5). The LEPRE1 gene encodes P3H1, which is one of three prolyl 3-hydroxylases (P3H1, P3H2, and P3H3) in the mammalian genome. CRTAP encodes a protein that is homologous to the N-terminal half of P3H1, with which it associates, together with cyclophilin B, to form the functional enzyme complex required for Pro-986 3-hydroxylation of unfolded collagen chains in the endoplasmic reticulum (8).
Using ion trap tandem mass spectrometry, we interrogated all candidate GPP triplets in collagen α1(I) and α2(I) chains from rat skin, bone, and tail tendons for the presence of additional hydroxyls (+16-dalton mass). The results revealed that the tendon type I collagen chains were distinguished by having up to four 3Hyp residues in series in the (GPP)ₙ motif at the extreme C terminus of the triple-helical domain, whereas bone and skin type I collagen had none. The findings may be fundamentally important for understanding the mechanisms that control the unique features of cellular assembly, deposition, and structure of the parallel arrays of collagen fibrils secreted by tendon cells (11,12).

EXPERIMENTAL PROCEDURES
Tissue Collagen Preparation-Skin, bone, and tail tendons were dissected from the carcasses of normal adult Sprague-Dawley laboratory rats obtained as a byproduct of approved and completed animal studies. Bovine Achilles tendon, dissected from an adult steer (18 months), was obtained from a local abattoir. All tissues were stored at −20°C prior to analysis. Tendons were pulled from proximal tail ends using the usual two-hemostat repetitive method (13). Skin, bone, and bovine tendon were scraped clean and defatted in chloroform/methanol (3:1 v/v); bone was demineralized for several days in 0.5 M EDTA according to established methods (14), all at 4°C.
Collagen was solubilized from each tissue matrix by heat denaturation for 2-3 min at 100°C directly in SDS-PAGE sample buffer. Acid-extracted collagen (3% acetic acid v/v, 4°C) was also prepared and digested with trypsin for chromatographic resolution of individual peptides for N-terminal sequence analysis.
* This work was supported, in whole or in part, by National Institutes of Health Grants AR37694 and AR37318 from the NIAMS (to D. R. E.). This work was also supported by funds from the Ernest M. Burgess Endowed Chair research program of the University of Washington.
¹ To whom correspondence should be addressed: Box 356500, Seattle, WA 98195-6500. Fax: 206-685-4700; E-mail: deyre@u.washington.edu.
² The abbreviations used are: 3Hyp, 3-hydroxyproline; GPP, Gly-Pro-Pro; 4Hyp, 4-hydroxyproline; CRTAP, cartilage-associated protein; P3H, prolyl 3-hydroxylase.
SDS-Polyacrylamide Electrophoresis and In-gel Trypsin Digestion-The method of Laemmli (15) was used with 6% gels for denaturant extracts of tissue collagen. Collagen α-chains were cut from SDS-PAGE gels and subjected to in-gel trypsin digestion (7,16).
Tandem Mass Spectrometry of Tryptic Peptides-Electrospray MS was performed on the tryptic peptides using an LCQ Deca XP ion trap mass spectrometer equipped with in-line liquid chromatography (LC) (Thermo-Finnigan) using a C8 capillary column (300 μm × 150 mm; Grace Vydac 208MS5.315) eluted at 4.5 μl/min. The LC mobile phase consisted of buffer A (0.1% formic acid in MilliQ water) and buffer B (0.1% formic acid in 3:1 acetonitrile:n-propyl alcohol v/v). The LC sample stream was introduced into the mass spectrometer by electrospray ionization with a spray voltage of 3 kV. The machine is normally run in automatic triple play mode cycling through a full scan, zoom scan, and MS/MS every few milliseconds. The machine can also be made to target specific low abundance ions by narrowing the selecting mass range. Sequest search software (Thermo-Finnigan) was used for peptide identification using the NCBI Protein Database. Large collagenous peptides not found by Sequest were identified manually by calculating the possible MS/MS ions and matching these to the actual MS/MS spectrum (17).
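The manual identification step, in which possible MS/MS ions are calculated and compared with the observed spectrum, can be illustrated with a minimal sketch. It is not the procedure or software used in this study. The function names and the short test peptide are hypothetical, only singly charged b and y ions are computed, and the residue masses are standard monoisotopic values. A hydroxylated proline is modeled simply as proline plus one oxygen.

```python
# Standard monoisotopic residue masses (Da) for a few amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "P": 97.05276, "K": 128.09496, "R": 156.10111,
}
PROTON = 1.00728     # mass of a proton
WATER = 18.01056     # H2O carried by every y ion
HYDROXYL = 15.99491  # one added oxygen (Pro -> Hyp)


def residue_masses(peptide, hydroxylated=()):
    """Residue masses, adding one oxygen at each 0-based index in `hydroxylated`."""
    return [
        RESIDUE_MASS[aa] + (HYDROXYL if i in hydroxylated else 0.0)
        for i, aa in enumerate(peptide)
    ]


def b_y_ions(peptide, hydroxylated=()):
    """Return (b, y) lists of singly charged fragment m/z values."""
    masses = residue_masses(peptide, hydroxylated)
    b, running = [], 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), read from the N terminus
        running += m
        b.append(running + PROTON)
    y, running = [], 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), read from the C terminus
        running += m
        y.append(running + WATER + PROTON)
    return b, y


# Hypothetical example: a single Gly-Pro-Pro triplet with and without
# a hydroxyl on the X-position proline (index 1).
plain_b, plain_y = b_y_ions("GPP")
hyp_b, hyp_y = b_y_ions("GPP", hydroxylated={1})
print(round(hyp_b[1] - plain_b[1], 5))  # shift carried by b2
```

Only fragments that contain the modified residue carry the extra mass, which is what allows the b and y series to place each additional hydroxyl on a particular proline.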
Edman N-terminal Sequence Analysis-N-terminal sequence analysis of the α1(I) GPP repeat-containing tryptic peptide was carried out by Edman chemistry on a Porton 2090E machine equipped with on-line HPLC analysis of phenylthiohydantoin derivatives as reported (7).

RESULTS
Tandem mass spectrometry revealed a repeating 16-dalton difference across a ladder of molecular ions given by a tryptic peptide from the C terminus of the α1(I) chain of tendon type I collagen but not from its equivalent peptide from skin or bone (Fig. 1). The MS/MS fragmentation pattern established that the peptide consisted of the C terminus of the triple-helix running into the C-telopeptide and ending at the cross-linking lysine. Peptides that originated from chains in which the lysine had been converted to lysine aldehyde by lysyl oxidase or remained as lysine were both identified in the LC/MS profiles, with a similar 16-dalton ladder distinguishing tendon. Results are shown for the unmodified lysine-containing (and for bone, lysine- and hydroxylysine-containing) peptides. The MS/MS fragmentation profiles established that the source of the 16-dalton ladder was the (GPP)₅ repeat, specifically the X-position prolines next to Y-position 4-hydroxyprolines. This strongly suggested that prolyl 3-hydroxylation was responsible. The relative quantities of the individual sequences making up the ion ladder indicated an average of 2.8 3Hyp residues per collagen α1(I) chain in its (GPP)₅ C-terminal sequence. The MS/MS data showed the highest occupancy on the N-terminal GPP triplet, tailing off C-terminally, with a maximum of four 3Hyp residues in a repeating series detected in any one peptide and hence in any one α1(I) chain.
FIGURE 1. Tandem mass spectrometry of tryptic peptides from the α1(I) chain reveals a 3Hyp repeat in the C-terminal sequence from the triple-helical domain of tendon but not skin or bone type I collagen. a, the three collagen preparations were run on SDS-PAGE for in-gel trypsin digestion of the excised α1(I) chains. Also identified in a are the principal β dimer components common to all three tissues and a slower β11 band prominent in extracts of skin collagen (see "Discussion"). b, the parent ion ladders given by the C-terminal tryptic peptide from the α-chain of each tissue. These full scan mass spectra are taken from the tryptic peptide LC/MS profile of each α-chain, scrolling across an elution window that combines all post-translational variants of this peptide. c, the MS/MS fragmentation spectrum of the parent ion (1876.3²⁺) of the tendon peptide with three additional hydroxyls on X-position prolines. The sequence is shown with b and y ion breakages that establish the proline residues bearing the additional 16-dalton masses. P#, 3Hyp; P*, 4Hyp.
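The average number of 3Hyp residues per chain quoted above is an intensity-weighted mean over the ladder of parent ions. The sketch below shows that arithmetic only. The intensities are hypothetical placeholders, not the measured values behind the 2.8 figure, and the calculation assumes (as an illustration, not a claim from this study) that all variants of the peptide ionize with equal efficiency.

```python
def mean_3hyp_per_chain(ladder):
    """Intensity-weighted mean number of extra hydroxyls.

    `ladder` maps the number of additional hydroxyls on the peptide
    (0, 1, 2, ...) to the relative intensity of that parent ion.
    """
    total = sum(ladder.values())
    if total <= 0:
        raise ValueError("ladder must contain positive total intensity")
    return sum(n * intensity for n, intensity in ladder.items()) / total


# Hypothetical relative intensities for variants bearing 0 to 4 hydroxyls.
example_ladder = {0: 5.0, 1: 10.0, 2: 20.0, 3: 40.0, 4: 25.0}
print(round(mean_3hyp_per_chain(example_ladder), 2))  # prints 2.7
```

The mean says nothing about which triplet is occupied; site assignment still depends on the MS/MS fragment ions.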
In contrast, no extra hydroxyls (and hence potential 3Hyp residues) were detected in the equivalent peptides from rat skin or bone (Fig. 1). The latter two differed from each other in their MS profiles because in bone the α1(I) C-telopeptide lysine is partially hydroxylated, whereas in skin and tail tendon it is not. The MS/MS fragmentation spectra clearly reveal this because the y-ion fragments from the C terminus of the peptide show only a lysine mass from skin and tendon but the presence of both hydroxylysine-containing and lysine-containing y-ions from bone. This explains why the +16-dalton parent ion (1860²⁺) is prominent for the bone α1(I) C-terminal tryptic peptide but not for that from skin.
The equivalent peptide purified from a trypsin digest of acid-soluble rat tail tendon collagen by reverse-phase HPLC (not shown) was subjected to N-terminal Edman sequencing. The results confirmed that the additional +16 daltons on the X-position prolines of the (GPP)₅ repeat were due to 3-hydroxyproline, based on its distinctive phenylthiohydantoin amino acid HPLC signature as compared with 4Hyp or Pro, as we previously reported for a secondary 3Hyp site at Pro-944 in the α1(II) chain of bovine type II collagen (7).
We similarly interrogated the tryptic peptide LC/MS profiles given by the α2(I) chains of tendon, skin, and bone for a peptide with a +16-dalton ladder derived from the (GPP)₄ repeat that forms the C terminus of its triple-helical domain. Again the results show a high occupancy with an average of 2.8 3Hyp residues per (GPP)₄ in tendon α2(I) but none from skin or bone α2(I) (Fig. 2a). The mass spectral parent ion results clearly show this, but the α2(I) peptide MS/MS fragmentation profiles (Fig. 2) lack a strong C-terminal y-ion contribution because no lysine (or arginine) is present in the tissue collagen C-telopeptide sequence to produce a charged C terminus on trypsin cleavage.
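Because the parent ions discussed here are doubly charged, each added hydroxyl (+16 daltons) moves the observed m/z by about 8 units. The sketch below works through that arithmetic. The unhydroxylated base value is inferred by subtracting three oxygens from the 1876.3²⁺ ion and is not a value reported in the text; the function name is hypothetical.

```python
OXYGEN = 15.99491  # monoisotopic mass of one added oxygen (Da)


def hydroxyl_ladder(base_mz, charge, max_hydroxyls):
    """Expected m/z values for 0..max_hydroxyls added hydroxyls at a fixed charge."""
    return [base_mz + n * OXYGEN / charge for n in range(max_hydroxyls + 1)]


# Inferred base: the 2+ ion carrying three extra hydroxyls minus three oxygens.
base = 1876.3 - 3 * OXYGEN / 2
for n, mz in enumerate(hydroxyl_ladder(base, charge=2, max_hydroxyls=4)):
    print(n, round(mz, 1))
```

On this reckoning a single added hydroxyl falls near m/z 1860, in line with the +16-dalton parent ion noted for bone, where the extra oxygen resides on hydroxylysine and not on proline. A full-scan spectrum alone therefore cannot distinguish the two modifications; the MS/MS fragment ions are needed to locate the hydroxyl.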
Although rodent tail tendon type I collagen appears to be especially highly 3-hydroxylated at its C terminus, the phenomenon extends to tendon in general. For example, results from a bovine tendon α2(I) chain preparation are shown in Fig. 2b. The α1(I) chain from the same collagen sample showed a similar +16-dalton 3Hyp ladder for the tryptic peptide from its C terminus (not shown; the α2(I) results presented illustrate the phenomenon). Human Achilles tendon type I collagen also showed an average of one 3Hyp residue per C-terminal (GPP)ₙ domain in α1(I) and α2(I) (not shown). Precise 3Hyp percentages at each of the GPP triplets of the (GPP)ₙ patch cannot simply be derived from the MS data, but it was clear for both α1(I) and α2(I) from tendon of all species examined that the highest occupancy was on the N-terminal GPP, with falling percentages C-terminally. Tendons in general, therefore, seem to express this post-translational modification as a marker of their phenotype, although other collagens, notably type II collagen, have a GPP repeat at their triple-helix C terminus and so are potential substrates.
FIGURE 2. ... Fig. 1, cut out, and subjected to in-gel trypsin digestion and LC/MS analysis. a, the parent ion ladder from the C-terminal tryptic peptide of rat tail tendon α2-chain is shown, and below it is the MS/MS fragmentation profile from the ion bearing three additional 16-dalton masses on X-position prolines. The position of predicted 3Hyp residues was determined from the b and y ion masses as indicated for this variant and also for the other variant parent ions in the ladder. The sequence determined matches that in the Ensembl rat genomic database (ENSRNOT00000016423). b, MS results from the α2(I) chain of mature bovine Achilles tendon show a similar +16-dalton series of molecular ions for this peptide. The bovine Achilles α1(I) chain gave a similar ladder (data not shown). P#, 3Hyp; P*, 4Hyp.
Screening of type II collagen from various tissues indicates that some 3Hyp may be present, the level depending on the tissue (data not shown). Prolyl 3-hydroxylation of the (GPP)ₙ repeat motif may therefore not be restricted completely to tendon type I collagen, although the latter, particularly in rodents, is uniquely hyperhydroxylated.

DISCUSSION
The findings establish a previously unrecognized site and sequence motif for 3-hydroxyproline formation that appears to be a phenotypic characteristic of tendon collagen. In rat tail tendon α1(I), about 3 residues of 3Hyp are present on average per C-terminal (GPP)ₙ repeat together with 1 residue at Pro-986. However, in skin and bone α1(I), no 3Hyp is present in the C-terminal (GPP)ₙ repeat, just the 1 residue at Pro-986. The α2(I) chain showed a similar tissue-dependent difference in the GPP repeat hydroxylation. This is consistent with results in a classic study by Piez et al. (19) in which they reported the amino acid compositions of rat skin and tail tendon α1(I) and α2(I) chains. Their data show 4 residues of 3Hyp in each α1(I) and α2(I) chain from tendon but only one in α1(I) and none in α2(I) from skin. The additional 3Hyp residues they observed from tendon appear, therefore, to be in the C-terminal triple-helical (GPP)ₙ repeat.
Based on their sequence motifs, the three classes of 3Hyp substrate site are: class 1, the unique Pro-986 motif; class 2, which includes all the other sites (which tend to be preceded N-terminally by a phenylalanine residue (7)); and class 3, the new (GPP)ₙ site at the C terminus of the triple-helix of tendon collagen. This apparent class distinction may be significant as three P3H enzymes are present in the mammalian genome (20). P3H1 is known to be required for Pro-986 3-hydroxylation in types I and II collagens (4) as part of a complex with CRTAP protein and cyclophilin B (8). Recombinant P3H2 was shown using synthetic peptide substrates to actively 3-hydroxylate (Gly-Pro-4Hyp)₅ and to prefer known 3Hyp motifs in type IV collagen (which resemble in sequence those of the above class 2 sites of fibrillar collagens) more than the α1(I) Pro-986 sequence (21). It seems likely, therefore, that each of the three enzymes P3H1, P3H2, and P3H3 will prove to have preferred specificities among the above defined three classes of substrate site in fibril-forming collagens.
A key question remains, however. What is the function of 3Hyp residues in collagen? Prolyl 3-hydroxylation is clearly an ancient post-translational modification, found in the most primitive invertebrates including sponge (Porifera) fibrillar collagens (22) and prominent in basement membrane type IV collagens in which about 1 in 10 hydroxyprolines are 3Hyp (23). This implies biological benefits through a fundamental contribution to the properties of collagen structure itself. We recently suggested, based on the observed D-periodic spacing (7) and externally directed 3(S)OH (24), that inter-triple-helical hydrogen bonding may be involved, for example in helping fine-tune the polymeric assembly of fibrils and basement membrane networks. With this concept in mind, the possible consequences of a 3Hyp repeat at the junction of the triple-helix and the C-terminal telopeptide/propeptide are worth considering.
The triple-helix folds from the C terminus after the propeptides have formed a complex in the endoplasmic reticulum, and the (GPP)ₙ repeat is important for its nucleation (25). The chain order in the type I collagen heterotrimer has not been defined (α1α1α2, α1α2α1, or α2α1α1) for any tissue (26). Possibly the tertiary structure of the triple-helix initiated when 3Hyp residues are present in the GPP repeat could direct a particular chain order in tendon that results in an assembled molecule, polymer, and subsequent cross-linking interactions that differ from those of skin and bone collagen type I fibrils. Tendon cells secrete collagen fibrils in a linear direction through what appear to be cellular invaginations, structures that Kadler and colleagues (27,28) have described ultrastructurally and referred to as fibripositors. The unique post-translational phenotype of tendon type I collagen may have evolved in conjunction with this cellular machinery to assemble fibrils better suited to tendon growth and function. Indeed, the cross-linking chemistry of tendon collagen differs from that of skin despite both using exclusively the lysine aldehyde pathway (29). The placement of intermolecular cross-links also appears to differ between tendon and skin type I collagen, specifically for those formed from the α1(I) C-telopeptide lysine aldehyde.
We have evidence for an intramolecular aldol cross-link at the C terminus of collagen type I in skin but not tendon. An effect of this is apparent in Fig. 1 with a slow mobility form of β11 dimer from skin but not tendon collagen. This component appears to be a β11 dimer in which the α1(I) chains are linked at both ends by allysine aldol.³ We suspect that an altered triple-helix chain order or another effect of the neighboring 3Hyp C-terminal repeat is responsible for the altered cross-linking properties.
FIGURE 3. Molecular location of the novel 3Hyp repeat at the C terminus of the triple-helix in relation to known 3Hyp sites. The axial positioning of the three classes of 3Hyp that can be defined by their sequence context is shown in the upper procollagen molecule. How these are positioned relative to each other in a fibril of D-staggered molecules is shown below. Sequences of the C-terminal GPP repeat domain for all clade A and clade B collagen α-chains from the human genome (Ensembl and NCBI genomic databases) are aligned for comparison and to put the rat sequences studied here in a broader context.
The 3Hyp repeat could also present a binding site for noncollagenous fibril-associated proteins, for example, members of the small leucine-rich repeat proteoglycan (SLRP) family, which are known to bind at or close to the junctions between hole and overlap domains on the surface of collagen fibrils. Such molecules are thought to influence fibril size and collagenase susceptibility (30-32). Fig. 3 illustrates the site of the (GPP)ₙ class 3 3Hyp motif in the collagen molecule and assembled fibril.
Lastly, the mechanisms that regulate (GPP)ₙ 3Hyp formation in tendon cells and prevent it in dermal fibroblasts and osteoblasts will be important to define. Conceivably, pathological conditions may exist in which misregulation results in lack of C-terminal 3Hyp and defective tendon properties or presence of C-terminal 3Hyp in non-tendon tissues with negative consequences. It will be important therefore to determine which of the three P3H enzymes is primarily responsible for (GPP)ₙ prolyl 3-hydroxylation and whether other proteins are required in a complex for the selective activity.