Location of 3-Hydroxyproline Residues in Collagen Types I, II, III, and V/XI Implies a Role in Fibril Supramolecular Assembly

Collagen triple helices are stabilized by 4-hydroxyproline residues. No function is known for the much less common 3-hydroxyproline (3Hyp), although genetic defects that inhibit its formation cause recessive osteogenesis imperfecta. To help understand the pathogenesis, we used mass spectrometry to identify the sites and local sequence motifs of 3Hyp residues in fibril-forming collagens from normal human and bovine tissues. The results confirm a single, essentially fully occupied 3Hyp site (A1) at Pro986 in the A-clade chains α1(I), α1(II), and α2(V). Two partially modified sites, A2 at Pro944 in α1(II) and α2(V) and A3 at Pro707 in α2(I) and α2(V), differed from A1 in sequence motif. Significantly, the distance between sites A2 and A3, 237 residues, is close to the collagen D-period (234 residues). A search for additional D-periodic 3Hyp sites revealed a fourth site (A4) at Pro470 in α2(V), 237 residues N-terminal to site A3. In contrast, human and bovine type III collagen contained no 3Hyp at any site, despite a candidate proline residue and a recognizable A1 sequence motif. A histidine conserved at the A1 site in mammalian α1(III) may prevent 3-hydroxylation, because in chicken type III collagen, in which tyrosine replaces this histidine, the site was fully hydroxylated. All three B-clade type V/XI collagen chains carried the same three 3Hyp sites, but at loci and in sequence contexts different from those in A-clade chains. Two of these B-clade sites were 231 residues apart. From these and other observations, we propose a fundamental role for 3Hyp residues in the ordered self-assembly of collagen supramolecular structures.

Collagens are the most abundant and ubiquitous proteins in multicellular animals. It is well established that 4-hydroxyproline (4Hyp) residues stabilize the collagen triple helix through water-bridged intramolecular hydrogen bonding (1). However, the function of the much less abundant 3-hydroxyproline (3Hyp), although it was discovered 50 years ago, is unknown (2). Only 1-2 residues of 3Hyp occur per chain in collagen types I and II, and 3-6 residues per chain in collagen types V and XI. The content is highest in the type IV collagens of basement membranes, in which 3Hyp can account for 10% of the total hydroxyproline (3).
Specific prolyl 3-hydroxylases (P3Hs) are responsible for 3Hyp synthesis. The human genome contains three genes, encoding P3H1, P3H2, and P3H3, which show tissue-specific expression (4,5). Substrate proline residues occur in the prerequisite sequence -Pro-4Hyp-Gly-. The α1(I) chain has only one established 3Hyp site, at Pro986, in a motif conserved across vertebrate species (human GLPGPIGPPGPR); a close variant of this motif also occurs in type II collagen (human GIPGPIGPPGPR).
Renewed interest in 3Hyp was recently sparked by the discovery that a recessive form of osteogenesis imperfecta (OI) is caused by mutations in CRTAP. This gene encodes cartilage-associated protein, which is bound to P3H1 and cyclophilin B in the endoplasmic reticulum and is required for prolyl 3-hydroxylation at the Pro986 site in collagen α1(I) and α1(II) chains (6,7). Further studies showed that mutations in P3H1 itself also cause recessive, severe OI (8,9). A key question is whether the brittle bone phenotype in OI is caused by the absence of 3Hyp in bone matrix collagen, by an intracellular assembly and transport defect caused by the malfunctioning enzyme complex, or by both.
Because little is known about the distribution of 3Hyp in normal fibril-forming collagens beyond the single Pro986 site in α1(I) and α1(II) chains, we used protein mass spectrometry to locate further sites in all A-clade and B-clade gene products used in vertebrate collagen fibril formation. Collagen type I fibrils are assembled on a filamentous template of collagen type V, and collagen type II fibrils on a template of collagen type XI (10). To provide a basis for understanding the overall post-translational effects of mutations in CRTAP, LEPRE1 (which encodes P3H1), and other genes involved in collagen prolyl 3-hydroxylation, it is important to identify all of the sites of prolyl 3-hydroxylation in normal collagens from human and other vertebrate tissues.
Our results reveal several partially hydroxylated 3Hyp sites in the various fibrillar collagen chains, in addition to the usually fully hydroxylated primary site at Pro986 in α1(I), α1(II), and α2(V). All of the additional sites lack the distinctive sequence motif of Pro986 but share common features with known 3Hyp-containing sequences in type IV collagen. One important finding is D-periodic spacing between 3Hyp sites: between sites A2 and A3, and between sites A3 (α2(I) and α2(V)) and A4 (α2(V)), of A-clade chains (α1(II), α2(V), and α2(I)), and between sites B2 and B3 of B-clade chains (α1(V), α1(XI), and α2(XI)). In contrast, mammalian type III collagen lacks any 3Hyp despite having a recognizable primary-site motif at Pro986. From the conserved sites, sequence motifs, and spacing of 3Hyp sites along the collagen chains, we speculate that 3Hyp mediates inter-triple-helical interactions and aids the supramolecular assembly of collagen.

EXPERIMENTAL PROCEDURES
Source of Tissues. Adult human bone, cartilage, and meniscus (20-40 years old) were purchased from the Northwest Tissue Center (Seattle, WA). Fetal human bone and cartilage were obtained from the Birth Defects Research Laboratory of the University of Washington with Internal Review Board (IRB) approval. Human intervertebral disc tissue was obtained from normally discarded surgical tissue with informed patient consent and IRB approval. Chicken skin (12-14 weeks old) was dissected from chicken wings purchased at a local supermarket. Bovine vitreous was dissected from the eyes of adult steers (18 months old) obtained from a local abattoir.
Preparation of Collagens. Types I and V collagens were prepared from adult and fetal human bone. Powdered bone was defatted at 4°C in methanol/chloroform (1/3 v/v) and demineralized at 4°C in 0.5 M EDTA, 0.05 M Tris-HCl, pH 7.5. Type III collagen was prepared from defatted chicken skin and adult human meniscus. Type II collagen was prepared from human adult articular cartilage, fetal epiphyseal cartilage, and adult nucleus pulposus, and from bovine meniscus and vitreous (18 months old). Type XI collagen was prepared from fetal human articular cartilage. Proteoglycans were removed from cartilaginous tissues with 4 M guanidine HCl, 0.05 M Tris-HCl, pH 7.5, containing protease inhibitors (5 mM 1,10-phenanthroline and 2 mM phenylmethylsulfonyl fluoride) for 24 h at 4°C, and the residue was washed thoroughly. Collagens from all of the tissues were solubilized with pepsin (1:20 w/w, pepsin/dry tissue) in 3% acetic acid for 24 h at 4°C (11). Serial precipitations of solubilized bone collagen with 0.7 and 1.8 M NaCl separated types I and V collagens, respectively. Serial precipitations of solubilized articular cartilage, nucleus pulposus, and vitreous collagens with 0.8 and 1.2 M NaCl separated type II and type XI collagens, respectively. Skin type III collagen was precipitated at 0.8 M NaCl. Meniscus collagens were serially precipitated with 0.7, 0.9, and 1.2 M NaCl to separate types I/III, type II, and types V/XI, respectively. Collagen type II is a minor component of the meniscus and is highly modified post-translationally, which causes it to precipitate at 0.9 M NaCl, separately from the bulk type I collagen (12). Portions of demineralized bone and guanidine HCl-extracted cartilage residue were digested with CNBr in 70% formic acid at room temperature for 24 h (13), and the resulting CB peptides were freeze-dried. For microsequence analysis, α1(II) CB9,7 was prepared from bovine nucleus pulposus and digested with trypsin, and individual peptides were resolved by reverse phase HPLC.
SDS-PAGE. The method of Laemmli (14) was used, with 6% gels for pepsinized collagen and 12.5% gels for CNBr peptides.
Microsequence Analysis. N-terminal sequence analysis was carried out by Edman chemistry on a Porton 2090E machine equipped with on-line HPLC analysis of the cleaved phenylthiohydantoin amino acids.
Peptide Mass Spectrometry. Collagen α-chains or CB peptide bands were cut from SDS-PAGE gels (15) and subjected to in-gel trypsin digestion (16,17). Electrospray MS was performed on the tryptic peptides using an LCQ Deca XP ion trap mass spectrometer equipped with in-line liquid chromatography (LC) (ThermoFinnigan) and a C8 capillary column (300 μm × 150 mm; Grace Vydac 208MS5.315) eluted at 4.5 μl/min. The LC mobile phase consisted of Buffer A (0.1% formic acid in MilliQ water) and Buffer B (0.1% formic acid in 3:1 acetonitrile:n-propanol, v/v). An electrospray ionization source introduced the LC sample stream into the mass spectrometer with a spray voltage of 3 kV. The machine is normally run in triple play mode with ion exclusion turned on: it performs a full scan, then a zoom scan and MS/MS of the most abundant ion several times, and then switches to the next most abundant ion. It can also be made to target specific low-abundance ions by narrowing the selected mass range. Sequest search software (ThermoFinnigan) was used for peptide identification against the NCBI protein database. Many large collagenous peptides were not found by Sequest and had to be identified manually by calculating the possible MS/MS ions and matching them to the observed MS/MS spectrum. Hydroxyl differences were searched for manually by scrolling through or averaging the full scan over several minutes, so that all of the post-translational variants of a given peptide appear together in the full scan.
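The manual step described above, calculating the possible MS/MS ions and matching them to the observed spectrum, can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it assumes standard monoisotopic residue masses, singly charged b and y ions, and models each prolyl hydroxylation as one added oxygen (+15.995 Da). The peptide GLPGPIGPPGPR is the A1 motif quoted in the introduction, used only as a stand-in for the longer tryptic peptide actually analyzed; the function names are hypothetical.

```python
"""Theoretical b and y fragment ions for a collagen tryptic peptide (sketch).

Assumptions: monoisotopic residue masses, singly charged fragments, and each
prolyl hydroxylation (3Hyp or 4Hyp) modelled as +1 oxygen.
"""

# Monoisotopic residue masses (Da); extend as needed for other peptides.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "T": 101.04768, "I": 113.08406, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "H": 137.05891, "F": 147.06841, "R": 156.10111,
    "Y": 163.06333,
}
OXYGEN = 15.99491  # mass added by one hydroxylation
PROTON = 1.00728
WATER = 18.01056


def residue_masses(sequence, hydroxylated):
    """Per-residue masses; `hydroxylated` holds 0-based indices of Hyp residues."""
    masses = []
    for i, aa in enumerate(sequence):
        mass = RESIDUE_MASS[aa]
        if i in hydroxylated:
            if aa != "P":
                raise ValueError(f"residue {i} is {aa}, not proline")
            mass += OXYGEN
        masses.append(mass)
    return masses


def b_y_ions(sequence, hydroxylated=frozenset()):
    """Return (b, y) lists of singly charged ions; b[k] is b(k+1), y[k] is y(k+1)."""
    masses = residue_masses(sequence, hydroxylated)
    n = len(masses)
    b = [sum(masses[: k + 1]) + PROTON for k in range(n - 1)]
    y = [sum(masses[n - k - 1:]) + WATER + PROTON for k in range(n - 1)]
    return b, y


if __name__ == "__main__":
    peptide = "GLPGPIGPPGPR"
    b_pro, y_pro = b_y_ions(peptide, {8})      # 4Hyp at index 8 only
    b_hyp, y_hyp = b_y_ions(peptide, {7, 8})   # plus 3Hyp at index 7
    for k, (p, q) in enumerate(zip(b_pro, b_hyp), start=1):
        print(f"b{k}: {p:9.4f}  {q:9.4f}  shift {q - p:+.4f}")
```

Comparing the two forms, b1 to b7 and y1 to y4 are unchanged while b8 and y5 onward shift by +15.995 Da, which places the extra hydroxyl on the eighth residue. Note that this localizes the +16 but cannot by itself say whether it is 3Hyp or 4Hyp.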
In addition, tryptic peptides prepared by in-gel trypsin digestion of individual CB peptides and whole α-chains were surveyed for mass variants (+16 Da) indicating 3Hyp at other GPP sites. In α1(II), a second partially hydroxylated site was identified at Pro944 in the CB peptide CB9,7 from human and bovine articular cartilage (Fig. 2; results for bovine shown). Table 1 lists the fragment y and b ions used to interpret the MS/MS spectra in Fig. 2; it also serves as a guide for interpreting the spectra in Figs. 1, 2, and 4-7. We refer to this site and subsequent sites as A2, A3, etc., with A1 as the primary site. No other sites were found in α1(II) on close inspection of all identifiable peptides containing candidate -GPP- sequences in the various CB peptides or whole α-chains. The partial hydroxylation of Pro944 in bovine articular α1(II) prompted a survey of other tissue sources of type II collagen, which revealed consistent species- and tissue-dependent variation at this site. These analytical results are summarized in Fig. 2. The degree of Pro944 3-hydroxylation, estimated from the mass ratios (with the site of the +16 addition established by the MS/MS fragmentation profile), ranged consistently from more than 80% in bovine vitreous type II collagen to less than 20% in bovine articular type II collagen.
Figure legend fragment: See Table 1 for a guide to how fragment ions establish the sequence and position of the 3Hyp residue. P#, 3Hyp; P*, 4Hyp.
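The +16 Da survey and the occupancy estimate from mass ratios can be sketched as follows, assuming a summed full scan reduced to (m/z, intensity) pairs and a known charge state z. One extra oxygen shifts a z+ ion by 15.995/z in m/z (about 5.33 for a 3+ ion). Reading the intensity ratio as the fraction hydroxylated assumes that the proline and hydroxyproline forms ionize equally well; that assumption, the 0.02 m/z tolerance, the peak list, and the function names are ours, and a mass match alone does not locate the hydroxyl, which is why MS/MS confirmation is needed.

```python
"""Pair full-scan peaks one hydroxyl apart and estimate % hydroxylation (sketch).

Assumptions: peaks are (m/z, intensity) tuples, the charge z is known, and the
proline and hydroxyproline forms ionize with equal efficiency.
"""

OXYGEN = 15.99491  # Da added by one extra hydroxyl


def strongest_peak(peaks, mz, tol):
    """Most intense peak within `tol` of `mz`, or None."""
    near = [p for p in peaks if abs(p[0] - mz) <= tol]
    return max(near, key=lambda p: p[1]) if near else None


def percent_hydroxylated(peaks, mz, z, tol=0.02):
    """Percent of a peptide (unmodified form at `mz`, charge z) carrying +O."""
    base = strongest_peak(peaks, mz, tol)
    if base is None:
        raise ValueError("unmodified form not found")
    partner = strongest_peak(peaks, mz + OXYGEN / z, tol)  # +16 Da appears as +16/z
    modified = partner[1] if partner else 0.0
    return 100.0 * modified / (base[1] + modified)


if __name__ == "__main__":
    # Hypothetical 3+ peptide: the +O form sits 15.995 / 3 = 5.332 m/z higher.
    peaks = [(900.0, 2.0e6), (905.3316, 8.0e6), (910.0, 1.0e5)]
    print(f"{percent_hydroxylated(peaks, 900.0, 3):.0f}% hydroxylated")
```

With the hypothetical peak list above the sketch reports 80%, since the +O partner carries four-fifths of the combined intensity.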
We know that 4Hyp occurs only in the Y position of the (GXY)n repeat of collagens, so a hydroxylated proline in the X position is in itself strong evidence for 3Hyp. Because mass spectrometric results alone cannot distinguish 3Hyp from 4Hyp, Edman microsequencing was applied to the isolated tryptic peptide containing hydroxylated site A2 from bovine α1(II) to rule out 4Hyp. The results for the fully 3-hydroxylated peptide isolated from calf nucleus pulposus α1(II) are shown in Fig. 3. At cycle 11, the reverse phase HPLC profile of the phenylthiohydantoin derivatives is similar to that reported as characteristic of 3Hyp phenylthiohydantoin degradation products (19) and quite distinct from that given by 4Hyp (see cycle 12).
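The positional argument above (a hydroxyproline in the X position of Gly-X-Y implies 3Hyp, one in the Y position is taken as 4Hyp) can be written as a small check. This sketch assumes the peptide string starts at the Gly of a triplet; it is an inference aid only, and, as the Edman data show, chemical confirmation is still needed. The function name is hypothetical, and the example reuses the α1(I) A1 motif GLPGPIGPPGPR with hydroxylated prolines at indices 2, 7, and 8 chosen for illustration.

```python
"""Call 3Hyp vs 4Hyp from position in the Gly-X-Y repeat (sketch).

Assumption: the sequence starts at the Gly of a Gly-X-Y triplet, so index % 3
gives the triplet position (0 = Gly, 1 = X, 2 = Y).
"""


def classify_hyp(sequence, hyp_positions):
    """Map 0-based hydroxyproline indices to '3Hyp' (X position) or '4Hyp' (Y)."""
    if not sequence.startswith("G"):
        raise ValueError("sequence must start at the Gly of a Gly-X-Y triplet")
    calls = {}
    for i in sorted(hyp_positions):
        if sequence[i] != "P":
            raise ValueError(f"residue {i} is {sequence[i]}, not proline")
        phase = i % 3
        if phase == 0:
            raise ValueError(f"residue {i} falls in a Gly position; check the register")
        calls[i] = "3Hyp" if phase == 1 else "4Hyp"
    return calls


if __name__ == "__main__":
    # A1 motif from alpha1(I); hydroxylated indices chosen for illustration.
    print(classify_hyp("GLPGPIGPPGPR", {2, 7, 8}))
```

For this input the sketch returns {2: '4Hyp', 7: '3Hyp', 8: '4Hyp'}, matching the GP#P* pattern of the A1 motif.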
Manual evaluation of the mass spectra of all tryptic peptides from the α2(I) chain of bone type I collagen (Fig. 4) revealed a third 3Hyp site at Pro707 (site A3). (No candidate -GPP- or sequence motif was recognizable where site A1 or A2 would be in α2(I) of human or the other vertebrate species we examined; Ensembl entry ENSG00000164692.) Residue Pro707 was 80% hydroxylated in human α2(I), and similarly in bovine α2(I). When human α2(V) was screened in the same way, its Pro707 locus was hydroxylated to a similar extent. Thus, in α2(V) all three sites, A1, A2, and A3, were heavily hydroxylated.
The near D-periodic spacing between Pro707 and Pro944 (237 residues versus D = 234) (20) prompted us to search by tandem mass spectrometry for further 3Hyp sites one or two D-periods farther N-terminal. Fig. 5 shows the analysis of a tryptic peptide from bone α2(V) containing a candidate proline at Pro470 (site A4). The residue was indeed partially hydroxylated, as confirmed by MS/MS fragment analysis of the 3Hyp and Pro versions of the peptide. Additional analyses showed evidence of variable levels of 3-hydroxylation of a proline in the equivalent tryptic peptide from the α1(I) chain, but only in collagen from cell culture, so the biological significance for α1(I) is at present unclear (results not shown).
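The spacing argument reduces to arithmetic on the residue numbers given above for the four α2(V) sites (A4 Pro470, A3 Pro707, A2 Pro944, A1 Pro986), compared against D = 234 residues. The ±5-residue tolerance for calling a gap "near D-periodic" is an arbitrary assumption of this sketch, not a criterion used in the paper, and the names are illustrative.

```python
"""Compare spacings between 3Hyp sites with the collagen D-period (sketch)."""

D_PERIOD = 234  # residues per D-period


def spacings(sites):
    """(N-site, C-site, gap, gap - D) for consecutive sites ordered N to C."""
    ordered = sorted(sites.items(), key=lambda item: item[1])
    return [
        (n_name, c_name, c_pos - n_pos, c_pos - n_pos - D_PERIOD)
        for (n_name, n_pos), (c_name, c_pos) in zip(ordered, ordered[1:])
    ]


# alpha2(V) carries all four A-clade sites (residue numbers from the text).
ALPHA2_V = {"A4": 470, "A3": 707, "A2": 944, "A1": 986}
TOLERANCE = 5  # hypothetical cut-off for "near D-periodic"

if __name__ == "__main__":
    for n_name, c_name, gap, offset in spacings(ALPHA2_V):
        tag = "near D-periodic" if abs(offset) <= TOLERANCE else ""
        print(f"{n_name} -> {c_name}: {gap} residues ({offset:+d} vs D) {tag}")
```

The A4 to A3 and A3 to A2 gaps are each 237 residues (+3 relative to D), whereas A2 and A1 are only 42 residues apart, so only the first two intervals look D-periodic.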
Lack of 3Hyp in Mammalian Type III Collagen. The protein sequence of human and other mammalian type III collagens (Ensembl entry ENSG0000168542) shows a recognizable motif and a GPP at the primary site Pro986. Mass spectrometry, however, showed peptides with the mass of the proline form but none with that of the 3Hyp form in bovine and human collagen III prepared from skin, aorta, and other tissues (Fig. 6 shows results from human meniscus α1(III)). On comparison of all available mammalian COL3A1 sequences in the genomic database (Ensembl), all had GHX in place of the GLP or GIP triplet that precedes GPIGPP, predicting a lack of substrate recognition by prolyl 3-hydroxylase (see Fig. 6 for sample sequences). On inspection of a broader range of vertebrate COL3A1 sequences, chicken stood out, with GYP rather than GHP (Fig. 6). To see whether this enabled 3Hyp formation at the neighboring GPP in vivo, collagen III was purified from chicken skin and analyzed by mass spectrometry. As shown in Fig. 6, the candidate tryptic peptide from chicken α1(III) was 100% hydroxylated at the homologous locus, Pro989. It appears, therefore, that a hydrophobic residue (Ile, Leu, or Tyr), or at least not a histidine, is required at residue 980 for the P3H complex to recognize the proline as a substrate.
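The substrate rule suggested above can be expressed as a simple sequence check: find the G-X-Y triplet that precedes GPIGPP and read the X residue (residue 980). The sequences GLPGPIGPPGPR and GIPGPIGPPGPR are the α1(I) and α1(II) motifs quoted in the introduction, and GTSGYPGPIGPPGPR is the non-mammalian type III sequence given in the Discussion; GHPGPIGPPGPR is a hypothetical mammalian-like string built by substituting the GHP triplet, since the full mammalian α1(III) sequence is not reproduced here. Treating this heuristic as a general predictor is an assumption; it rests on a handful of sequences, and the function name is ours.

```python
"""Flag whether an A1-type site looks like a P3H substrate (heuristic sketch).

Rule taken from the observations in the text: a hydrophobic X residue (Ile,
Leu, or Tyr) in the triplet preceding GPIGPP goes with 3-hydroxylation, His
with none. Other residues are left uncalled.
"""

import re

PERMISSIVE = {"I", "L", "Y"}   # seen at hydroxylated A1 sites
BLOCKING = {"H"}               # seen in mammalian alpha1(III)


def a1_prediction(sequence):
    """Return (X residue, call) for the first G-X-Y-GPIGPP match, or None."""
    match = re.search(r"G(.)(.)GPIGPP", sequence)
    if match is None:
        return None
    x = match.group(1)
    if x in PERMISSIVE:
        return x, "predicted 3-hydroxylated"
    if x in BLOCKING:
        return x, "predicted not 3-hydroxylated"
    return x, "no prediction"


if __name__ == "__main__":
    for seq in ("GLPGPIGPPGPR", "GIPGPIGPPGPR", "GTSGYPGPIGPPGPR", "GHPGPIGPPGPR"):
        print(seq, a1_prediction(seq))
```

Only the X residue is considered; the sketch ignores everything else that might influence recognition by the P3H1/CRTAP/cyclophilin B complex.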
3Hyp in B-clade Collagen Chains α1(V), α1(XI), and α2(XI). Collagen type V and collagen type XI, prepared from human bone and articular cartilage, respectively, were similarly analyzed. Manual inspection of the tryptic peptide profiles produced by LC-MS unequivocally revealed three 3Hyp sites, at Pro434, Pro665, and Pro692 (Fig. 7), which we refer to here as sites B3, B2, and B1, numbered from right to left (from the C-terminal end) as for the A-clade sites. Sites B1 and B2 are in the same tryptic peptide, 27 residues apart (Fig. 7). The results are shown only for α1(V).
The MS/MS fragmentation patterns of the 3+ parent ions established the specific locations of the added hydroxyl groups (+16). The equivalent α1(XI) and α2(XI) peptides from a cartilage type XI collagen preparation showed a similar degree of hydroxylation at these two sites. In Fig. 6B, site B3 (Pro434) mass spectral results are shown for α1(XI) from cartilage, but again α1(V) and α2(XI) gave very similar levels of 3-hydroxylation at this site. The early literature reporting amino acid compositions of isolated chains and derived cyanogen bromide peptides is consistent with one residue per chain at the single site in α1(I) and α2(I) (21-23), one or two residues in α1(II) (24), no 3Hyp in α1(III) (25), three or four residues in α1(V) (26,27), and two or three residues in α2(V) (26). The present results show some evidence of clustering, for example 3Hyp sites B1 and B2 spaced 27 residues apart. Also, the proline underlined in Fig. 8 within the GP#P*GPP* sequence at site B3 (where P# indicates 3Hyp and P* indicates 4Hyp) showed significant hydroxylation (≤50%) in the α1(V) chain prepared from bovine meniscus but not in that from bone (data not shown). Notably, α1(II) from meniscus was consistently more hydroxylated at site A2 than α1(II) from articular cartilage (Fig. 8).

DISCUSSION
Our findings establish several sites of prolyl 3-hydroxylation in fibril-forming collagens that had not previously been identified. Most of the data on 3Hyp in collagen in the literature were gathered from amino acid analyses as the different chain types were discovered and their cyanogen bromide-derived peptides were characterized (21-27). The present results are consistent with these original quantitative measurements, which showed, for example, one residue of 3Hyp per α1(I) and α2(I) chain of type I collagen (22,23). The primary site (site A1) in the α1(I) chain at Pro986 was originally established from Edman sequencing of tryptic peptides from calf skin α1(I) CB6 (18). The single 3Hyp residue in α2(I) was found in the C-terminal third of the chain (α2(I)CB5) (23), but to our knowledge its exact sequence context has not been established. We show here that it lies at α2(I) Pro707 (site A3), near the N terminus of CB5, and that its sequence motif is unlike that of the A1 site in α1(I). The α2(I) sequence has no recognizable A1 proline site. As shown here and previously, the α1(II) chain has an A1 site that is almost fully 3-hydroxylated in cartilage tissue (6). The lack of any 3-hydroxylation of the A1 site motif at Pro992 in α1(III) (Fig. 6), despite a candidate proline, is consistent with the earlier reported absence of 3Hyp from human and bovine type III collagen based on amino acid composition and Edman sequencing analyses (25).
Lack of 3Hyp in Mammalian Type III Collagen. That the Pro986 primary site is fully hydroxylated in chicken α1(III) but not in mammals (Fig. 6) most probably reflects the lack of a recognizable substrate sequence in mammals. This appears to be an evolutionary loss in mammals. The COL3A1 sequences of the zebra finch, the only other bird in the genomic database (Ensembl entry ENSTGVG00000010995); the anole lizard, a reptile (Ensembl entry ENSACAG00000015062); and Xenopus tropicalis, an amphibian (Ensembl entry ENSXETG00000010783), all show the same GTSGYPGPIGPPGPR sequence at site A1. By analogy with the chicken versus mammalian sequences in Fig. 6, this predicts that the Pro986 site in their tissue type III collagen will be 3-hydroxylated.
A key question, therefore, is whether the lack of 3Hyp in type III collagen of mammals has any consequences in terms of the functional behavior of type III collagen in mammalian extracellular matrices. Collagen III does not form thick fibrils in its own right but occurs copolymerized on the surface of type I collagen fibrils in skin and other tissues (28) and on type II collagen fibrils in mature articular cartilage (29,30). The main roles for type III collagen appear to be in wound healing, matrix repair, and tissue development and as a structural component of mechanically pliable "soft" tissues, such as arterial walls. It always coexists, it seems, as a component of fibrils formed from more abundant type I and/or type II collagens, at least in mammals. Whether collagen III can function in a more independent fibrillar role in species in which its A1 site can be 3-hydroxylated is an interesting question. For example, perhaps it could polymerize independently on a template of collagen V/XI as do types I and II fibrils (10).
FIGURE 6 (legend, partial). In each panel, the upper spectrum is the full-scan mass spectrum over an LC elution window that would combine all post-translational forms of the peptide shown, and the lower spectrum is an MS/MS analysis of the single peptide so found. The 755 ion is from an unrelated α1(III) tryptic peptide. In human and bovine (not shown) α1(III), Pro992 was not hydroxylated, but in chicken the homologous Pro989 was 100% hydroxylated. P#, 3Hyp; P*, 4Hyp.
Comparison of A-clade and B-clade 3Hyp Sites. The A1 sequence motif is evident in α1(I), α1(II), α1(III), and α2(V), all A-clade collagen gene products. Their common motif is GXXGPIGP#P*GPR. The other fibrillar collagen sites (Fig. 8, sites A2-A4 and B1-B3) lack this sequence but share some common features with each other and with known prolyl 3-hydroxylation sites in type IV collagen (19). Their most recognizable feature, beyond the -PP*G- requirement, is a phenylalanine residue no more than nine residues N-terminal to the substrate proline. This can be seen at sites A2, A3, B2, and B3 in Fig. 8. Site B1 lacks such a phenylalanine but follows closely after B2 in the same tryptic peptide. Such placement of 3Hyp residues after a phenylalanine is evident at two sites previously reported in the α1(IV) chain, in the homologous sequence -GFXGP#P*GP- (19). Whether phenylalanine is required for enzyme recognition or is simply a coincident feature of the recognized substrate sequence remains to be seen. Also relevant is the observed importance of phenylalanine in model triple-helical collagen peptides in promoting higher order structures through interactions with Pro/Hyp in neighboring molecules (31).
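The feature shared by the non-A1 sites, a phenylalanine no more than nine residues N-terminal to the substrate proline of a -PP*G- stretch, can be expressed as a simple sequence scan. This is a descriptive filter under stated assumptions (the candidate 3Hyp is the first proline of each PPG match, and the window is the nine residues immediately preceding it), not a validated predictor of P3H activity. The test sequences are hypothetical: GFAGPPGP mimics the α1(IV) pattern -GFXGP#P*GP- with X set to Ala, and the function name is ours.

```python
"""Scan for Phe-flanked -PPG- candidates for non-A1 3Hyp sites (sketch).

Assumptions: the candidate 3Hyp is the first P of each 'PPG' match, and
'within nine residues' means the nine residues immediately N-terminal to it.
"""

import re


def phe_flanked_candidates(sequence, window=9):
    """Return 0-based indices of candidate prolines with a Phe upstream."""
    hits = []
    for match in re.finditer(r"PPG", sequence):
        i = match.start()
        upstream = sequence[max(0, i - window):i]  # residues N-terminal to i
        if "F" in upstream:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Second PPG (index 16) is more than nine residues from the Phe at index 1.
    print(phe_flanked_candidates("GFAGPPGAPGAPGAPGPPGPR"))
```

In the example only the first PPG is reported; the second lies beyond the nine-residue window. A PPG motif cannot overlap itself, so plain non-overlapping matching is sufficient here.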
Recent studies imply that the enzyme variant P3H2 is responsible for prolyl 3-hydroxylation of type IV collagen (5). It is therefore possible that the non-A1 sites in A-clade and B-clade collagen chains are not hydroxylated by P3H1. However, P3H1 does seem to be the main isoform expressed by cells that make fibrillar collagens, whereas P3H2 is most prominently expressed in basement membrane-rich tissues (5). Alternatively, the non-A1 3Hyp sites may not require the same enzyme complex as site A1. The latter site is normally hydroxylated by a trimeric protein complex of P3H1, CRTAP protein, and cyclophilin B (32). Studies on the CRTAP-null mouse (6) and on human CRTAP-null OI patients (9) show that without CRTAP, P3H1 fails to hydroxylate site A1 in collagen α1(I) and α1(II). The situation is less clear from analyses of P3H1 (LEPRE1)-null human cells, even for site A1, where some residual hydroxylation was observed in cell culture (8,9). Perhaps P3H2, if expressed, can act to some extent as a 3-hydroxylase for both A1 and non-A1 sites in A-clade and B-clade collagen chains, as well as for type IV collagen, because it appears not to form a complex with CRTAP and cyclophilin B (5). Clearly, analyses of the differential effects of CRTAP and LEPRE1 mutations on the various prolyl 3-hydroxylation sites should help in understanding the significance of 3Hyp formation for normal collagen biology and of its defective formation in the pathogenesis of recessive OI.
Origin of the A1 Site as a Substrate. It is tempting to speculate that the A1 site in fibrillar collagen chains appeared quite late in eukaryote evolution, just prior to the emergence of vertebrates. Although 3Hyp is present in invertebrate collagens as far back as the Porifera (sponges), the most primitive extant multicellular animals (33), 3-hydroxylation of a recognizable A1 site sequence motif first appears in primitive vertebrates. A single P3H gene is present in the genome of the ascidian Ciona intestinalis (a primitive chordate) and, ancestrally, at least as far back as the Cnidaria (34). Because ancestors of basement membrane type IV collagen and of fibril-forming collagens are recognizable in sponges (Porifera) (33), we speculate that the A1 sequence is, in evolutionary terms, a relatively new substrate for P3H that became recognizable perhaps when the P3H1/CRTAP/cyclophilin B complex (or its ancestral form) first appeared. Presumably the event that created hydroxylation activity at this site occurred before the series of whole or partial genome duplications that led to the divergence of the A-clade collagen genes (α1(I), α2(I), α1(II), α1(III), and α2(V)) in vertebrates (35), and perhaps also before or soon after the ancestral leprecan (P3H) gene was duplicated twice and eventually diverged into three copies (4,34,36,37). Because the sequence motif at site A1 differs from that at sites A2-A4 and B1-B3 (Fig. 8) and from the 3Hyp motifs in type IV collagen, a gain of function in P3H activity, perhaps through P3H1 associating with CRTAP, seems more likely than a collagen sequence change alone. Such an explanation would also fit the differences evident among vertebrate A-clade collagen chains in their relative prolyl 3-hydroxylation levels at sites A2, A3, and A4 (Fig. 8), which by the logic of this concept are more ancient substrates than the A1 site.
These findings are perhaps best explained by site-specific changes in A2, A3, and A4 substrate activities as their sequences in the five A-clade genes diverged.
The α2(V) chain shows the most complete pattern, with 3-hydroxylation at all four sites, A1, A2, A3, and A4. It should be noted that α2(V) is an A-clade gene product, but it functions exclusively in heterotrimers in combination with two B-clade chains, for example, two α1(V) chains, one α1(V) and one α1(XI) chain, or two α1(XI) chains, depending on the tissue (10). Because collagen V/XI acts as a template for the polymerization and growth of type I and type II collagen fibrils, it is tempting to suspect a role for the D-periodic spacing of 3Hyp in the A-clade chain of the V/XI oligomer in recruiting A-clade type I or II molecules to form a hybrid fibril.
Genetic Defects Affecting Prolyl 3-Hydroxylation. The importance of the Pro986 (site A1) 3Hyp site for normal bone and cartilage development was revealed in studies on the CRTAP-null mouse (6). Tandem mass spectral analysis of the tryptic peptide containing this known prolyl 3-hydroxylation site showed a complete absence of 3Hyp. This was not a complete surprise, because the CRTAP protein has strong sequence homology to the N-terminal half of P3H1 (but lacks the active site and so has no enzyme activity) and was known to be complexed with P3H1 and cyclophilin B in the endoplasmic reticulum (4). Further work showing that mutations in CRTAP and P3H1 cause recessive forms of human OI confirmed the association of disease expression with absent or diminished 3Hyp content at site A1 in collagen type I (6-9). Still not resolved, however, is whether a lack of 3Hyp in the extracellular tissue collagen of the mice or of recessive OI cases is in itself responsible for defective tissue. For example, does this 3Hyp domain present a binding site for a fibril-associated protein that might be necessary for collagen to mineralize properly? Or is the pathogenesis due to a collagen chaperone defect in the endoplasmic reticulum that causes a secondary cellular dystrophy and a consequent deficiency of adequately assembled matrix collagen?
Mutating the A1 site at Pro986 in α1(I) from proline to another amino acid in a transgenic mouse could test this.
Speculative Function for 3-Hydroxyproline in Collagen. In considering all that is known about 3Hyp in collagen biology, OI pathogenesis, and the function of 4Hyp in stabilizing the triple helix, we suspect that 3Hyp residues play a fundamental role in supramolecular assembly by forming hydrogen bonds between adjacent collagen triple helices. The D-spacing between 3Hyp residues (Fig. 8, sites A2 to A3, A3 to A4, and B2 to B3) suggests that interactions between 3Hyp-containing domains could help fine-tune the D-periodic relationship and form dimers in register through inter-triple-helical hydrogen bonds. There is good evidence that aggregates of aligned procollagen molecules exit the Golgi (38), and aggregates appear to be a better substrate for BMP-1, the procollagen C-propeptidase, than individual procollagen molecules (39). Initial studies with synthetic peptides suggested that 3Hyp might destabilize the triple helix (40), but further work concluded that it adds marginal stability (41). The crystal structure of a synthetic peptide containing 3Hyp and a Gly-Xaa-Xaa repeat showed that the 3-OH on proline points out from the triple helix and so could mediate hydrogen bonding to other protein molecules (42). One logical binding partner would be another triple helix, perhaps through a water molecule, analogous to how 4Hyp stabilizes the triple helix itself through interchain hydrogen bonding. (It is notable that crystals of collagen-like peptides can form inter-triple-helical hydrogen bonds between 4Hyp hydroxyls (43-46).) If so, mutual interactions might be strongest between adjacent 3Hyp-containing domains of neighboring molecules staggered by D-periods or in register with each other.
FIGURE 9. Speculated concept of fibril molecular packing in which the subunits are molecular dimers in register, staggered axially by D-periods. A shows the placement of complex intermolecular cross-links in skeletal tissue collagens. B illustrates a molecular model of fibril packing. Such an arrangement could result from the influence of inter-triple-helical hydrogen bonding between 3Hyp-containing loci. The location of site A1 at Pro986 and its potential for inter-helical bonding are illustrated. The remaining 3Hyp sites could similarly facilitate D-staggered helical association during the macromolecular assembly process.
Rather than driving the D-stagger itself, which at 234 residues (20) is three residues less than the observed A-clade 3Hyp interval (237 residues; Fig. 8), mutual 3Hyp hydrogen bonding between the hydroxyls and backbone carbonyls, directly or through water, could strengthen the relationship driven by electrostatic and hydrophobic forces. This hypothesis is particularly attractive because it could contribute forces that help fine-tune the assembly of molecules in a D-staggered array to form fibrils with an optimal placement of intermolecular cross-links. If registered dimers of procollagen molecules, rather than monomers, were the subunits for fibrillogenesis in the late Golgi and secretory vesicles, this would explain the efficient formation of mature intermolecular cross-links between the two nearest neighbor pairs of molecules in register and another staggered by 4D-periods (47) (Fig. 9). Such mature cross-linking is a hallmark of vertebrate skeletal tissues, particularly of bone and cartilage collagens (47).
These concepts are illustrated in Fig. 9. The packing arrangement of tetragonally packed dimers (Fig. 9B) (with potential supercoiling of dimeric subunits) was a model originally considered by Woodhead-Galloway (48) to be a better fit for the x-ray diffraction data and the measured protein density of collagen fibrils than are the more densely packed quasi-hexagonal arrays of monomers or pentafibrils that continue to be reference standards (49). A square packing arrangement of dimers is also attractive in considering how bone collagen fibrils can have the space to accommodate ordered internal plates of mineral crystallites, aligned between sheets of cross-linked collagen molecules, to form an intimate composite (50). Moreover, disruption of ordered molecular packing that specifically accommodates mineral crystallite deposition could be especially detrimental to bone properties, as evidenced in osteogenesis imperfecta (51).
In summary, we conclude that the advent of the A1 3Hyp site in an A-clade collagen founder gene was superimposed on a background of more ancient 3Hyp sites. To speculate, this new feature impacted the mechanism of collagen assembly at the threshold of vertebrate evolution with subsequent influence on tissue-related diversifications in collagen fibril subunit composition and cross-linking properties (10).