The Pro-α3(V) Collagen Chain: COMPLETE PRIMARY STRUCTURE, EXPRESSION DOMAINS IN ADULT AND DEVELOPING TISSUES, AND COMPARISON TO THE STRUCTURES AND EXPRESSION DOMAINS OF THE OTHER TYPES V AND XI PROCOLLAGEN CHAINS*

The low abundance fibrillar collagen type V is widely distributed in tissues as an α1(V)₂α2(V) heterotrimer that helps regulate the diameters of fibrils of the abundant collagen type I. Mutations in the α1(V) and α2(V) chain genes have been identified in some cases of classical Ehlers-Danlos syndrome (EDS), in which aberrant collagen fibrils are associated with connective tissue fragility, particularly in skin and joints. Type V collagen also exists as an α1(V)α2(V)α3(V) heterotrimer that has remained poorly characterized, chiefly due to inability to obtain the complete primary structure or nucleic acid probes for the α3(V) chain or its biosynthetic precursor, pro-α3(V). Here we provide human and mouse full-length pro-α3(V) sequences. Pro-α3(V) is shown to be closely related to the α1(V) precursor, pro-α1(V), but with marked differences in N-propeptide sequences, and collagenous domain features that provide insights into the low melting temperature of α1(V)α2(V)α3(V) heterotrimers, lack of heparin binding by α3(V) chains, and the possibility that α1(V)α2(V)α3(V) heterotrimers are incorporated into heterotypic fibrils. In situ hybridization of mouse embryos detects α3(V) expression primarily in the epimysial sheaths of developing muscles and within nascent ligaments

by 30–35 cycles of 94 °C/30 s, °C/30 s, 72 °C/2 min, final extension 72 °C/10 All cloned into pGEM-T. Primers for the 784-bp α3(V) Northern blot probe, which corresponded to 3′-UTR sequences, were 5′-TGAAGTTGTGAGGTGGGAAGGAAGCT-3′ (forward) and 5′-GAGCACAGTTCCTTGGTTTATTCT-3′ (reverse), with the probe excised from pGEM-T with SacII and SpeI. Primers for the 1,480-bp α3(V) in situ hybridization probe (nt 34–1513), which corresponded to N-propeptide/telopeptide sequences, were 5′-AGACCAGTCCACATCCCCCTTGGCCT-3′ (forward) and 5′-CTTTCATGGACAGCTGAGCCTGTTGCA-3′ (reverse).
Riboprobes were generated from this by linearizing with ApaLI and transcribing with polymerase (antisense) or by linearizing with NotI and transcribing with polymerase T7 (sense). Primers for the 1,206-bp α1(V) Northern blot probe, which corresponded to C-propeptide and 3′-UTR sequences, were 5′-GGAGAGCTACGTGGATTATGC-3′ (forward) and 5′-CCATCGGAAAGGCACGTGTGG-3′ (reverse), with the probe excised from pGEM-T with SpeI and ApaI. Primers for the 475-bp α1(V) in situ hybridization probe, to 3 (77). Col5a3 PCR of

Monomers of the low abundance fibrillar collagen types V and XI are incorporated into fibrils of the abundant collagen types I and II, respectively (1,2). In vitro fibrillogenesis experiments (3,4) and analysis of a type V mutation in transgenic mice (5) have indicated that type V collagen helps regulate the size and shape of type I/V heterotypic fibrils. Further evidence that type V collagen plays a role in regulating type I collagen fibrillogenesis in vivo comes from the heritable connective tissue disorder classical Ehlers-Danlos syndrome (EDS), in which type I collagen fibrils of abnormal shape and diameter have been shown to result from mutations in type V collagen genes (6-10). Similar evidence for an in vivo role for type XI collagen in regulating type II collagen fibrillogenesis comes from a study of chondrodysplasia, in which abnormal type II collagen fibrils were shown to result from defects in a type XI collagen gene (11).
Type V collagen is widely distributed in vertebrate tissues as an α1(V)₂α2(V) heterotrimer (12,13). However, other forms of type V collagen include an α1(V)₃ homotrimer that is secreted by a line of Chinese hamster cells (14) and which may also exist in normal tissues (15,16), and a poorly characterized α1(V)α2(V)α3(V) heterotrimer, isolated primarily from placenta (17,18), but also reported in uterus, skin, and synovial membranes (12, 19-21). Type XI collagen, in the form of an α1(XI)α2(XI)α3(XI) heterotrimer (22), was first characterized as a minor collagen of cartilage. However, findings of type XI chains in noncartilaginous tissues (23), of type V chains in cartilage (24), and of cross-type heterotrimers composed of α2(V) and α1(XI) chains (25,26) now suggest that type V and type XI chains constitute a single collagen type in which different combinations of chains associate in a tissue-specific manner.
Fibrillar collagens are synthesized as procollagen precursors with N- and C-propeptides that are proteolytically processed to yield mature monomers. Complete primary structures of the type V/XI procollagen chains pro-α1(V), pro-α1(XI), pro-α2(XI), and pro-α2(V) are now known (27-35). In addition, the primary structure of the pro-α3(XI) chain is known, in that it is thought to be an alternatively spliced product of the gene that encodes the pro-α1 chain of type II collagen (13,24). Full-length cDNA sequences have provided not only the inferred primary structure of each chain, but have also provided probes that have allowed fine mapping of the expression domains of cognate mRNAs (27, 36-41). Such studies are important, as the low levels of collagen type V/XI chains have limited biochemical and histochemical analyses of expression in developing and adult tissues. Nucleic acid probes have also enabled those studies which established the causal links between defects in type V/XI chains and genetic diseases (6-11). The only known type V/XI procollagen chain, or fibrillar procollagen chain, for which neither complete primary structure nor nucleic acid probes have been available is pro-α3(V). Thus, although NH₂-terminal sequencing of proteolytic fragments of α3(V) chains has yielded a third of the amino acid sequence of the major collagenous domain (42), the nature of this chain has remained relatively obscure. Nevertheless, a true understanding of the nature of collagen type V/XI and its roles in development, physiology, and disease requires characterization of the very low abundance and hitherto elusive pro-α3(V) chain, the limited distribution of which may reflect a more specialized role than those of the other type V/XI chains.

Here we report full-length pro-α3(V) cDNA sequences for human and mouse, use nucleic acid probes to analyze pro-α3(V) expression in developing and adult tissues, and map the chromosomal locations of the cognate mouse Col5a3 and human COL5A3 genes. Implications of pro-α3(V) sequences and expression domains are discussed in the context of type V/XI biology and the possible involvement of pro-α3(V) defects in human disease.

* This work was supported by National Institutes of Health Grants GM46846 and AR43621 (to D. S. G.) and FibroGen Inc., South San Francisco, CA. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) AF176645 and AF177941.

Determination of Full-length Human and Mouse Pro-α3(V) cDNA Sequences

A BLAST search of the dbEST database of expressed sequence tags, using query sequence LGPPGEDGAXGSVGPTGLPGDLGPPGDPGVSGIDG from a human α3(V) peptide TSK5/K1 (42), located 459 bp of α3(V) triple helix-encoding sequences from a mouse mammary gland EST (GenBank accession number AI021711). Primers 5′-GGTCCCACAGGACTCCCTGGAGATCT-3′ (forward, nt 3853-3878 of the full-length mouse pro-α3(V) cDNA sequences reported in the present study, AF176645) and 5′-TAGCCCAGGAGGTCCCAGGAGACCTG-3′ (reverse, nt 4209-4184), corresponding to EST sequences, amplified a 357-bp PCR product, using a mouse 17-day postcoitus (dpc) embryo cDNA 5′-stretch λgt10 library (CLONTECH) as template. This product was used to screen the same λgt10 library, yielding one positive clone (ME7) with a 1742-bp insert. The EST clone, IMAGE clone 1366609, was obtained from the IMAGE consortium, sequenced in its entirety, and found to contain an insert of 2259 bp corresponding to roughly the 3′-most third of the final full-length mouse pro-α3(V) cDNA sequence (nt 3850-6108). Sequences of clone ME7 overlapped those of the EST clone and contained an additional 422 bp at the 5′-end. A 304-bp EcoRI fragment from the 5′-portion of the clone ME7 insert was used as a probe for further screening of the 17 dpc embryo library, yielding two additional clones, ME8-11 (1059-bp insert) and ME3-5 (876-bp insert), with 606 and 423 bp of additional 5′ sequences, respectively. Next, 5′-RACE was performed with two nested pro-α3(V)-specific reverse primers, 5′-CCTTCAAACCAATGGGTCCTGGGTCT-3′ (nt 3061-3036) and 5′-CAATGCCACCAGAGGGGCCTACAGGA-3′ (nt 3142-3117), corresponding to sequences near the 5′-end of clone ME8-11, using the Marathon cDNA Amplification Kit and mouse brain Marathon-Ready cDNA template, according to the manufacturer's protocol (CLONTECH). This nested 5′-RACE produced a 613-bp product.
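The primer coordinates and product sizes quoted above can be cross-checked with simple interval arithmetic. The following is a minimal sketch rather than part of the original protocol. The helper names are hypothetical, coordinates are assumed to be 1-based and inclusive (which is consistent with the quoted sizes), and the line-break hyphen in the printed reverse primer is assumed not to belong to the sequence.

```python
# Primer sequences as quoted in the text (5'->3').
FWD_PRIMER = "GGTCCCACAGGACTCCCTGGAGATCT"  # nt 3853-3878
REV_PRIMER = "TAGCCCAGGAGGTCCCAGGAGACCTG"  # nt 4209-4184 (reverse strand)


def span_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval, given in either orientation."""
    return abs(end - start) + 1


def amplicon_length(fwd_5prime: int, rev_5prime: int) -> int:
    """Product size from the 5' coordinates of forward and reverse primers."""
    return rev_5prime - fwd_5prime + 1


# Each printed primer length should match its coordinate span.
print(len(FWD_PRIMER), span_length(3853, 3878))
print(len(REV_PRIMER), span_length(4209, 4184))
# Predicted PCR product size, and the EST insert spanning nt 3850-6108.
print(amplicon_length(3853, 4209))
print(span_length(3850, 6108))
```

Under these assumptions the computed values agree with the 26-nt primers, the 357-bp product, and the 2259-bp EST insert reported in the text.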
To obtain further mouse sequences, two pro-α3(V)-specific reverse primers corresponding to sequences near the 5′-end of the 613-bp 5′-RACE product, 5′-CTTTCTCCCCCAGTGGTCCCAAGGGT-3′ (primer MSP3, nt 2530-2505) and 5′-CCGGTGTGCCGCGTTCTCCTTCCTCT-3′ (primer MSP4, nt 2584-2559), were used both for a further nested 5′-RACE, performed as above, but in addition using Advantage-GC cDNA Polymerase Mix (CLONTECH); and for nested PCR using 17 dpc embryo λgt10 library cDNA as template and a λgt10 vector-specific primer, 5′-TCCCCACCTTTTGAGCAAGTTCAGCCT-3′. Nested PCR with the λgt10 primer and library yielded a product with 898 bp of pro-α3(V) sequences. The 5′-RACE products were subcloned into the pGEM-T vector (Promega). A forward PCR primer, 5′-GTGACAGGGAGTGATGGCGCACCA-3′ (nt 1930-1953), corresponding to sequences within the 898-bp PCR product, and reverse primer MSP3 (see above) were used as a primer set for PCR screening of the pGEM-T clones of the 5′-RACE product. One clone, which contained a 2530-bp PCR insert, was found to contain the remainder of mouse pro-α3(V) coding sequences plus 81 bp of the 5′-untranslated region (UTR).
First rounds of nested RACE PCRs were performed in 50-µl reactions with 20 pmol of each primer, 5 µl of Marathon cDNA, and 1 µl of Advantage cDNA Polymerase Mix (CLONTECH) at 95 °C/3 min followed by 40 cycles of 95 °C/20 s, 68 °C/30 s, 72 °C/2-4 min and final extension at 72 °C/7 min. When Advantage-GC cDNA Polymerase Mix was used, GC-Melt was added to a final concentration of 1 M per reaction. First rounds of nested PCRs using λgt10 primers were performed the same way as first round RACE PCRs, except that the annealing temperature was 70 °C, and template was 5 µl of a λgt10 library that had been diluted 12-fold with water and heat-denatured by boiling for 10 min. The second nested rounds of RACE PCRs and second nested rounds of PCRs using λgt10 primers were performed the same way as first rounds, except that 25, rather than 40, cycles were used and template was 5 µl of first round PCR products diluted 50-fold with water.
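The reaction conditions above imply a few derived quantities that are easy to tabulate: primer concentration, overall template dilution in the second round, and nominal cycling time. The sketch below is illustrative only. The function names are hypothetical, ramp times between temperature steps are ignored, and the second round is assumed to use the same initial denaturation and final extension as the first.

```python
def primer_concentration_um(pmol: float, reaction_volume_ul: float) -> float:
    """Primer concentration in micromolar (pmol per microliter equals µM)."""
    return pmol / reaction_volume_ul


def overall_dilution(pre_dilution: float, template_ul: float,
                     reaction_ul: float) -> float:
    """Fold dilution of the original material once it is in the reaction."""
    return pre_dilution * reaction_ul / template_ul


def program_seconds(initial_s: int, cycles: int, steps_s: tuple,
                    final_s: int) -> int:
    """Nominal hold time of a cycling program, ignoring ramp times."""
    return initial_s + cycles * sum(steps_s) + final_s


# 20 pmol of each primer in a 50-µl reaction.
print(primer_concentration_um(20, 50), "µM")
# First-round product diluted 50-fold, then 5 µl used in a 50-µl reaction.
print(overall_dilution(50, 5, 50), "fold")
# First round: 95 °C/3 min, 40 x (20 s, 30 s, 2-4 min), 72 °C/7 min.
print(program_seconds(180, 40, (20, 30, 120), 420) / 60, "min (2-min extension)")
print(program_seconds(180, 40, (20, 30, 240), 420) / 60, "min (4-min extension)")
# Second round: identical except for 25 cycles.
print(program_seconds(180, 25, (20, 30, 120), 420) / 60, "min (2-min extension)")
```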
Blots were hybridized to random primed probes in ExpressHyb (CLONTECH) at 65 °C. Northern blots were washed in 2× SSC, 0.1% SDS at 65 °C, followed by 0.1× SSC, 0.1% SDS at 55 °C; and dot blots were washed following the manufacturer's instructions (CLONTECH). For in situ hybridization, uniform labeling of riboprobes with [³⁵S]UTP, tissue preparation, and hybridization were performed as described (44), except that sections were 5 µm thick and mounted two to six per slide. For histological analysis, sections were prepared and stained with hematoxylin, eosin, and Alcian blue as described previously (45). Slides were analyzed using light- and dark-field optics of a Zeiss Axiophot 2 microscope.

The Primary Structure of the Pro-α3(V) Collagen Chain

The full-length mouse and human prepro-α3(V) collagen chain sequences, inferred from cDNA clones and PCR products described under "Experimental Procedures," are presented in Fig. 1. The human and mouse prepro-α3(V) chains comprise 1745 and 1739 amino acid residues, respectively. This includes, for each, a 1011-amino acid major collagenous domain (COL1), which is shorter than the COL1 domains of the other known vertebrate fibrillar collagen chains. The latter COL1 domains range in length from 1014 amino acids, for the α1(I), α2(I), α1(II), α1(XI), α2(XI), and α1(V) chains; to 1017 amino acids, for the α2(V) chain; to 1029 amino acids, for the α1(III) chain. As predicted, based on partial amino acid sequences obtained from proteolytic fragments of the human α3(V) COL1 region (42), the pro-α3(V) chain is most similar among fibrillar procollagens to the pro-α1(V), pro-α1(XI), and pro-α2(XI) chains. The latter three chains form a subgroup among fibrillar procollagen chains on the basis of sequence similarities, structures of cognate genes, and size and configuration of N-propeptides (27,28,30,33,34,42,48,49). As in the pro-α1(V), pro-α1(XI), and pro-α2(XI) chains, the pro-α3(V) NH₂-terminal region that lies between the signal peptide and COL1 domain is relatively large (comprising 412 amino acid residues in both mouse and human) and can be divided into four subdomains (Figs. 1 and 2A). Immediately upstream of the COL1 domain is a short non-collagenous linker region, and immediately NH₂-terminal of this is a short collagenous domain. In the pro-α1(V), pro-α1(XI), and pro-α2(XI) chains these regions have been referred to as the NC2 (noncollagenous 2) and COL2 domains, respectively (33).
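Because the collagenous domain is built from Gly-X-Y triplets, the COL1 lengths quoted above translate directly into triplet counts. The sketch below simply restates the lengths given in the text. The dictionary and function names are hypothetical, and the calculation assumes each COL1 domain is an uninterrupted run of triplets.

```python
# COL1 lengths (amino acids) as quoted in the text.
COL1_LENGTHS = {
    "α3(V)": 1011,
    "α1(I)": 1014, "α2(I)": 1014, "α1(II)": 1014,
    "α1(XI)": 1014, "α2(XI)": 1014, "α1(V)": 1014,
    "α2(V)": 1017,
    "α1(III)": 1029,
}


def gly_x_y_triplets(length_aa: int) -> int:
    """Number of Gly-X-Y triplets in an uninterrupted collagenous domain."""
    if length_aa % 3 != 0:
        raise ValueError("length is not a whole number of triplets")
    return length_aa // 3


reference = gly_x_y_triplets(COL1_LENGTHS["α1(V)"])
for chain, length in COL1_LENGTHS.items():
    triplets = gly_x_y_triplets(length)
    # Report the difference in triplets relative to the α1(V) COL1 domain.
    print(f"{chain}: {triplets} triplets ({triplets - reference:+d} vs α1(V))")
```

On this reckoning the α3(V) COL1 domain is one triplet shorter than that of α1(V).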
Although COL2 has been described as being divided into three small triple helical motifs by two short noncollagenous interruptions, in the pro-α1(V), pro-α1(XI), and pro-α2(XI) chains (13), it has also been argued (34) that the short length of the most COOH-terminal 3 Gly-X-Y triplets and low imino acid content makes it unlikely that this last set of repeats participates in triple helix formation in the pro-α1(XI) COL2 domain. In the case of the human pro-α3(V) chain, this COOH-terminal set of repeats is reduced to a single Gly-X-Y triplet devoid of imino acids, and thus quite unlikely to participate in triple helix formation. In the mouse pro-α3(V) chain, even this final single Gly-X-Y triplet is missing. The most NH₂-terminal set of COL2 repeats, comprising 4 Gly-X-Y repeats in pro-α1(XI) and pro-α2(XI) and 5 Gly-X-Y repeats in pro-α1(V), is reduced to 3 Gly-X-Y triplets in both human and mouse pro-α3(V) (Figs. 1 and 2A). Thus, the triple helix formed by the pro-α3(V) COL2 domain is likely to be shorter than those formed by the COL2 domains of the other procollagen chains of this subfamily.
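The two properties weighed above, the number of consecutive Gly-X-Y triplets and the imino acid content of a segment, can both be read off a one-letter amino acid sequence. The following sketch shows one way to do so. The function names and the example peptides are hypothetical and are not taken from Fig. 1, and proline is counted as the only imino acid because hydroxyproline cannot be distinguished in a cDNA-derived sequence.

```python
def count_gly_x_y(seq: str, start: int = 0) -> int:
    """Count consecutive Gly-X-Y triplets beginning at index `start`."""
    count = 0
    pos = start
    # A triplet is accepted when a full three residues remain and the
    # first of them is glycine.
    while pos + 3 <= len(seq) and seq[pos] == "G":
        count += 1
        pos += 3
    return count


def imino_fraction(seq: str) -> float:
    """Fraction of residues that are proline."""
    return seq.count("P") / len(seq) if seq else 0.0


# Hypothetical segments for illustration only.
for peptide in ("GPPGAPGEKAAS", "GASLLL"):
    n = count_gly_x_y(peptide)
    collagenous = peptide[: 3 * n]
    print(peptide, n, round(imino_fraction(collagenous), 2))
```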
Between the pro-α3(V) COL2 domain and signal peptide is a large globular region. Similar globular regions in the pro-α1(V), pro-α1(XI), and pro-α2(XI) chains have been referred to as NC3 domains (33) and, as in these other procollagen chains, the pro-α3(V) NC3 domain can be roughly divided into two subdomains (Figs. 1 and 2A). The amino-terminal portion of NC3, which extends from the signal peptide to the vicinity of two clustered cysteines in all 4 chains, was first designated the PARP (proline/arginine-rich protein) domain for the pro-α2(XI) chain (33,50). PARP domains show some conservation of sequences between pro-α3(V), pro-α1(V), pro-α1(XI), and pro-α2(XI) chains (Fig. 2A) and also have homologies to similar modules found in FACIT collagens, such as types IX and XII, and to the NH₂-terminal heparin-binding domain of thrombospondin (51). Four cysteines, which have been shown to form two intramolecular disulfide bonds in the pro-α2(XI) PARP domain (50), are perfectly conserved in the PARP domains of pro-α1(V), pro-α1(XI), and pro-α3(V) (Figs. 1 and 2A), suggesting similar conformations for this module in the different chains. However, the highly basic pI predicted by the sequence of the pro-α2(XI) PARP domain (10.4) is replaced by somewhat acidic pI values predicted by the sequences of the PARP domains of pro-α1(XI) and pro-α1(V) (6.0 and 5.9, respectively) and by a markedly acidic pI of 4.4 predicted by the sequence of the PARP domain of pro-α3(V). Thus, despite similarities, the PARP domains of the various type V/XI procollagen chains are predicted to differ in physical properties, which may reflect functional differences. Subsequent to cleavage from the rest of the pro-α2(XI) chain, the pro-α2(XI) PARP domain persists intact at relatively high concentrations in some cartilage (50), suggestive of a physiological role for this isolated module.
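Predicted pI values such as those compared above are typically obtained by finding the pH at which the summed Henderson-Hasselbalch charges of the ionizable groups cancel. The text does not state which program or pKa set was used, so the following is only a sketch of the general approach. The pKa values are assumed textbook-style approximations, the function names are hypothetical, and the test peptides are artificial, so the output is not expected to reproduce the published figures.

```python
# Assumed approximate side-chain pKa values (not taken from the paper).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # free amino terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # free carboxyl terminus
    for aa in seq:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def estimate_pi(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH 0-14 for the point where the net charge crosses zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Net charge falls monotonically as pH rises.
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Artificial peptides, used only to show basic versus acidic behaviour.
print(round(estimate_pi("GKRKGPRGKK"), 1))
print(round(estimate_pi("GDEDGPEGDE"), 1))
```

A domain rich in lysine and arginine yields a high estimate, while one rich in aspartate and glutamate yields a low one, which is the kind of contrast drawn between the pro-α2(XI) and pro-α3(V) PARP domains.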
The NC3 domains of pro-α1(XI) (52) and pro-α1(V) (12,53,54) also appear to be processed at a site just downstream of the PARP domain and, thus, the pro-α1(XI) and pro-α1(V) PARP regions may also be released as intact modules that serve functional roles in the extracellular compartment. In vitro assays have suggested that the PARP domain may be cleaved from the pro-α1(V) chain by the astacin-like protease bone morphogenetic protein-1 (BMP-1), and various residues found at the BMP-1 cleavage site of the human pro-α1(V) chain are con- age site, are not conserved at similar positions in either the mouse or human pro-α3(V) chain (Fig. 2A). Astacin-like proteases are generally not highly specific for residues immediately flanking cleavage sites (55) and, thus, it is possible that the pro-α3(V) PARP domain is cleaved by BMP-1. Alternatively, the unique string of basic residues immediately COOH-terminal to the pro-α3(V) PARP region (Figs. 1 and 2A) suggests the possibility that cleavage may occur via a furin-like proprotein convertase (56,57). Further insights into the nature of processing of the pro-α3(V) N-propeptide region may be obtained from in vitro assays similar to those used to study NH₂-terminal processing of the pro-α1(V) chain (53).
Between PARP and COL2 is a region of the NC3 domain that has been designated the variable region (33), since little to no homology exists in this region between pro-α1(V), pro-α1(XI), and pro-α2(XI) chains (27,28,33,34) (Fig. 2A). Unlike the procollagen chains of collagen types I-III, type V/XI procollagen chains retain some NH₂-terminal globular sequences (12, 15, 52, 54, 58-61), and a number of studies suggest that retained NH₂-terminal sequences include the variable regions of the pro-α1(V), pro-α1(XI), and pro-α2(XI) chains (12,33,50,52-54). These retained sequences appear to be of functional importance since, as shown for type V collagen, they protrude beyond the surface of heterotypic fibrils and may directly control fibrillogenesis by sterically hindering the further addition of collagen monomers to the fibril surface (54). These protruding sequences may also help modulate interactions between heterotypic collagen fibrils and other components of the extracellular matrix. Thus, the divergence of sequences in variable regions may reflect differences in the biological activities of the pro-α1(V), pro-α1(XI), and pro-α2(XI) chains. The diversity of variable region sequences is further increased by a complex and tissue-specific pattern of alternative splicing of sequences in the variable domains of the pro-α1(XI) (62, 63) and pro-α2(XI) (31,62) chains, predicted to produce pro-α1(XI) and pro-α2(XI) chains with variable regions in which highly acidic stretches of amino acid residues are either present or absent or, in the case of pro-α1(XI), are replaced by stretches of basic residues. Such alternative splicing does not appear to occur within the pro-α1(V) variable region (27,62), which has a highly acidic predicted pI of 3.4, and which is rich in tyrosines that are sulfated (12), further acidifying this domain.
Variants of pro-α1(XI) and pro-α2(XI) chains with variable regions that retain stretches of acidic amino acids are also rich in tyrosine residues (Fig. 2A). In contrast to these other chains, the pro-α3(V) variable domain has a highly basic predicted pI (e.g. 10.3 for the human sequence) and a total absence of tyrosines (Figs. 1 and 2A). Moreover, PCR with a number of different primer sets in this region of mouse and human pro-α3(V), using various templates from different tissues and developmental stages, gave no evidence for alternative splicing (data not shown). The basic and acidic predicted pI values of the pro-α3(V) and pro-α1(V) variable regions, respectively, indicate that the retained NH₂-terminal sequences of α1(V)α2(V)α3(V) heterotrimers will have far different charge properties than those of α1(V)₂α2(V) heterotrimers, providing heterotypic fibrils which incorporate the different molecules with far different surface characteristics.
It has previously been shown that homotypic covalent cross-links between type V/XI collagen chains involve lysines that are 24 residues NH₂-terminal of COL1, within the NC2 domain, and at residue 924 of the major collagenous domain (COL1), in the pro-α1(V), pro-α1(XI), and pro-α2(XI) chains (64,65). Heterotypic cross-linking of type V/XI chains to type I or type II collagen chains involves lysines at residue 84 of the pro-α1(V) and pro-α1(XI) COL1 domains (64,65). Lysyl residues are conserved at the same three positions within the pro-α3(V) chain (Figs. 1 and 2A), suggesting that the pro-α3(V) chain may participate in the same types of homo- and heterotypic cross-linking already characterized for the other members of this subfamily of procollagen chains. Interestingly, the indication that α3(V) chains may be involved in heterotypic cross-links further suggests that α1(V)α2(V)α3(V) heterotrimers, like α1(V)₂α2(V) heterotrimers, may be incorporated into heterotypic fibrils. An RGD sequence juxtaposed to the lysine at COL1 position 84 in the pro-α1(V) and pro-α2(XI) chains is conserved at the same position in pro-α3(V) (Fig. 1). Such RGD sequences are conceivably involved in interactions with cell surfaces, as it has been reported that cells may adhere to type V collagen via RGD-integrin interactions (66). A second RGD is found in the mouse pro-α3(V) sequence at COL1 position 360, but is not conserved in the human pro-α3(V) sequence, or in any of the other chains.
The COL1 domain of type V collagen has been shown to possess a site which binds heparin/heparan sulfate under physiological conditions (68,69). This site has been located within the NH₂-terminal half of the α1(V) COL1 domain and contains a cluster of basic residues which appear to be necessary for heparin/heparan sulfate binding (69,70). Unlike isolated α1(V) chains, α2(V) and α3(V) chains do not bind heparin under physiological or denaturing conditions (69-71). Similarly, triple helical type V collagen trimers bind to heparin with decreasing affinity in the order α1(V)₃ indicating that α1(V) chains mediate heparin binding, while α2(V) and α3(V) chains do not (70,71). It has been suggested that α2(V) chains do not bind heparin because the region which corresponds to the α1(V)-binding site is less basic (69,70). It has similarly been suggested that type XI collagen binds heparin due to high basicity in the corresponding region in type XI chains (69). However, the corresponding α3(V) sequence has not previously been available for comparison. In Fig. 2B, alignment of the cluster of basic amino acids in the heparin-binding domain of α1(V) with the corresponding regions of the α3(V), α1(XI), α2(XI), and α2(V) chains shows that α3(V), like α2(V), has fewer basic residues in this region than do α1(V), α1(XI), or α2(XI). Moreover, α3(V), like α2(V), has more acidic residues in this region than do the other chains (Fig. 2B), further reducing localized basicity. Thus, the α3(V) sequence is consistent with predictions (69-71) that basicity in the region depicted in Fig. 2B is a determinant of heparin/heparan sulfate binding in type V/XI collagen chains.
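The comparison of localized basicity described above amounts to counting basic and acidic residues within aligned segments. A minimal sketch of such a tally follows. The function name and the example segments are hypothetical and do not reproduce the sequences of Fig. 2B, and histidine is deliberately left out of the basic set as a simplifying assumption.

```python
BASIC = set("KR")    # lysine and arginine
ACIDIC = set("DE")   # aspartate and glutamate


def residue_balance(segment: str) -> tuple:
    """Return (basic count, acidic count, basic minus acidic) for a segment."""
    basic = sum(aa in BASIC for aa in segment)
    acidic = sum(aa in ACIDIC for aa in segment)
    return basic, acidic, basic - acidic


# Hypothetical aligned segments, used only to illustrate the comparison.
segments = {
    "more basic example": "KRKGPPGDE",
    "less basic example": "GDEGPPGEK",
}
for name, segment in segments.items():
    print(name, residue_balance(segment))
```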
In contrast to type I-III procollagen chains, in which C-propeptides are cleaved by BMP-1 (72), the pro-α1(V) C-propeptide is not cleaved by BMP-1, but instead appears to be cleaved by a furin-like proprotein convertase (53). This cleavage occurs just COOH-terminal of the COL1 domain, immediately downstream of the sequence RTRR (53), a canonical RX(K/R)R furin cleavage site (56,57). As can be seen (Fig. 2C), both human and mouse pro-α3(V) chains have canonical RX(K/R)R sites that align with that of the pro-α1(V) chain, and with the sequence KKTRR in pro-α1(XI) and pro-α2(XI) chains, which is also suitable for cleavage by furin-like proprotein convertases (56). Thus, the C-propeptides of the α1/α3(V)/α1/α2(XI) subfamily of procollagen chains may all be cleaved by the same, or by similar furin-like proprotein convertases. Within the pro-α3(V) C-propeptide, or NC1 domain, are 7 cysteine residues conserved at similar positions in the C-propeptides of all previously characterized fibrillar procollagen chains (Fig. 2C). It has been predicted (29) that fibrillar procollagen chains capable of forming homotrimers will have 8, rather than 7 C-propeptide cysteines, as is the case for the pro-α1(V) chain. If so, the presence of 7 C-propeptide cysteines would be consistent with reports that α3(V) chains are found in tissues only in the context of α1(V)α2(V)α3(V) heterotrimers (17,18,71), but inconsistent with reports of α3(V)₃ homotrimers (73). Alignment of sequences shows that the pro-α3(V) C-telopeptide is shortened compared with those of the pro-α1(V), pro-α1(XI), and pro-α2(XI) chains, as is the portion of the pro-α3(V) C-propeptides immediately adjacent to the C-telopeptide (Fig. 2C). Both of these regions have previously been noted as areas of relative sequence variability between procollagen chains (74).
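The RX(K/R)R consensus discussed above can be written as a regular expression and used to scan a protein sequence for candidate proprotein convertase sites. The sketch below is illustrative only. The relaxed pattern, which also allows a basic residue at the first position so that KTRR within KKTRR is reported, is an assumption introduced here and is not a definition taken from the cited work, and the test strings are short fragments built around the motifs quoted in the text.

```python
import re

# Canonical furin consensus as written in the text: R-X-(K/R)-R.
CANONICAL = re.compile(r"(?=(R.[KR]R))")
# Assumed relaxed form that tolerates K or R at the first position.
RELAXED = re.compile(r"(?=([KR].[KR]R))")


def find_sites(seq: str, pattern: re.Pattern) -> list:
    """Return (1-based start, matched motif) pairs, including overlaps."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]


# Short artificial contexts around the motifs named in the text.
for fragment in ("AARTRRQQ", "AAKKTRRQQ"):
    print(fragment,
          "canonical:", find_sites(fragment, CANONICAL),
          "relaxed:", find_sites(fragment, RELAXED))
```

The lookahead form is used so that overlapping candidate sites are not skipped. A motif match only flags a candidate, and whether a site is actually cleaved has to be established experimentally.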
A potential site for Asn-linked glycosylation that precedes cysteine 6 of the C-propeptide is highly conserved between members of the pro-α1(I)/pro-α2(I)/pro-α1(II)/pro-α1(III)/pro-α2(V) subfamily of procollagen chains, in which it is thought to be of some functional significance (74), and is also conserved in the pro-α1(V) and pro-α1(XI) chains (27-29). However, it is absent from the pro-α2(XI) chain (30) and, although it is found in the human pro-α3(V) sequence, it is absent in mouse. Thus, this site would not seem to be of great functional significance for either pro-α2(XI) or pro-α3(V) chains. In contrast, a potential glycosylation site (NQT) that is conserved in mouse and human pro-α3(V) sequences, between C-propeptide cysteines 6 and 7, is not found in any other fibrillar procollagen C-propeptide and, thus, may be of specific importance to the structure/function of pro-α3(V) chains. Availability of the pro-α3(V) sequence also demonstrates that the potential glycosylation site NFT, which occurs immediately downstream of C-propeptide cysteine 4, is conserved in all members of the pro-α1(V)/pro-α1(XI)/pro-α2(XI)/pro-α3(V) subfamily of procollagen chains. This site is not conserved in any member of the pro-α1(I)/pro-α2(I)/pro-α1(II)/pro-α1(III)/pro-α2(V) subfamily of chains, and thus may represent some fundamental structural/functional difference between the C-propeptides of the two subclasses of fibrillar procollagen chains.

human tissues and compared with the distributions of mRNAs for the pro-α1(V), pro-α2(V), pro-α1(XI), and pro-α2(XI) chains in the same array (Fig. 3). Particularly high pro-α3(V) expression was detected in mammary gland, correlating with the initial isolation of pro-α3(V) sequences as a mouse mammary gland EST (see "Experimental Procedures") and suggesting a role for pro-α3(V) chains in this tissue in humans and mice.
Relatively high pro-α3(V) mRNA levels were also seen in placenta and uterus, consistent with the results of previous protein studies (12,17-19). In addition, high expression of pro-α3(V) mRNA was found in fetal heart and lung, and moderately high levels were detected in certain structures of adult human heart (Fig. 3). Relatively high levels of pro-α1(V) and pro-α2(V) RNA were found in most of the same tissues just noted for pro-α3(V) expression, suggesting the presence of α1(V)α2(V)α3(V) heterotrimers in these tissues. An exception was adult brain, in which relatively high levels of pro-α3(V) mRNA expression were not matched by high levels of either pro-α1(V) or pro-α2(V) mRNA. The significance of this finding is unknown, although these data are consistent with the possibility that pro-α3(V) chains may combine with other procollagen chains or form homotrimers in these regions of adult human brain. In the same dot-blot array, highest levels of pro-α1(XI) and pro-α2(XI) mRNA were seen in trachea, probably reflecting the hyaline cartilage content of this structure. Surprisingly high levels of pro-α1(XI) and especially high levels of pro-α2(XI) mRNA were also found in structures of adult human brain. However, although this may suggest the possibility of heterotrimer formation between pro-α3(V) and one or both type XI procollagen chains in brain, it must be noted that distributions of both type XI procollagen mRNAs in the different brain structures are quite different from that of pro-α3(V) mRNA.

Distributions of Expression of Pro-α3(V) RNA in Adult and

Expression patterns of pro-α3(V) mRNA in adult human tissues, and comparison to the expression patterns of other type V/XI chains, were further characterized by Northern analysis of poly(A)+ RNA from a subset of the tissues analyzed by dot-blot array. As can be seen (Fig. 4, A and B), the distribution of pro-α3(V) expression detected by the Northern blots was generally consistent with that detected by the dot-blot array, with particularly high levels of expression of a ~6.0-kb band detected in heart, placenta, and uterus. Interestingly, pro-α3(V) mRNA in liver had a somewhat faster mobility (~5.5 kb) than that detected in the other tissues just noted for pro-α3(V) expression, while the relatively high levels of pro-α3(V) mRNA in brain were found to be in the form of a considerably smaller ~4.2-kb band. The reason for the smaller size of pro-α3(V) transcripts in liver and brain is, at present, unknown. In particular, the nature of the ~4.2-kb transcript in brain is rather mysterious, as the full-length pro-α3(V) coding sequence is 5235 bp, while PCR with a number of different primer sets, using mouse and human brain RNA as templates, found no evidence for pro-α3(V) N-propeptide alternative splicing (data not shown). As in the dot-blot array, Northern blot analysis found coexpression of pro-α1(V), pro-α2(V), and pro-α3(V) mRNAs in heart, placenta, and uterus, but low to undetectable levels of both pro-α1(V) and pro-α2(V) mRNAs in brain, and readily detectable levels of pro-α1(XI) and pro-α2(XI) mRNAs in the latter tissue. Thus, the nature of pro-α3(V) expression in brain and the possible interaction of pro-α3(V) chains with other type V/XI procollagen chains in this tissue appear to be unique and merit further investigation. An interesting and, to our knowledge, novel observation in both dot-blot array and Northern blot analysis was the high expression of pro-α1(XI) and pro-α2(XI) mRNA in testis (Figs. 3 and 4B), suggesting roles for these chains in that tissue.
To begin characterizing the temporal expression pattern of pro-α3(V) during development, and to compare this pattern to the temporal expression patterns of other type V/XI procollagen chains, pro-α3(V)-, pro-α1(V)-, pro-α2(V)-, pro-α1(XI)-, and pro-α2(XI)-specific probes were hybridized to a Northern blot containing poly(A)+ RNA from 7, 11, 15, and 17 dpc mouse embryos (Fig. 4C). The pro-α3(V) probe hybridized to a single ~6.3-kb band that was at readily detectable levels in the RNA of 7 dpc mid-gastrulation mouse embryos. This pro-α3(V) ~6.3-kb mRNA disappears at 11 dpc (Fig. 4C) and was not visible even upon prolonged exposure of the blot, nor was signal for pro-α3(V) RNA detectable at this stage by in situ hybridization of 11 dpc mouse embryos (not shown). Pro-α3(V) mRNA reappears at 15 dpc and is further increased in abundance at 17 dpc, during a period of post-organogenesis fetal growth and development. Strong expression of both pro-α1(V) and pro-α2(V) mRNAs accompanies that of pro-α3(V) mRNA at 15 and 17 dpc. However, although strong pro-α2(V) mRNA expression is evident at 7 dpc, expression of pro-α1(V) is not readily detectable at this stage of development (Fig. 4C), with low levels of pro-α1(V) mRNA just visible upon prolonged exposure of the blot (not shown). Pro-α1(XI) and pro-α2(XI) mRNAs are also readily detectable at 15 and 17 dpc, but even prolonged exposure of the blot (not shown) did not reveal detectable levels at 7 and 11 dpc. These results suggest a role for type V, but not type XI, collagen chains in mid-gastrulation mouse embryos. The results are also consistent with the possibility that pro-α3(V) chains may exist either as homotrimers or in heterotrimeric combination with pro-α2(V) chains, in the absence of pro-α1(V) chains, at this time. However, the possibility that α3(V) chains are found only in the context of α1(V)α2(V)α3(V) heterotrimers at 7 dpc, despite wide differences in RNA levels for the various chains, has certainly not been excluded.
To determine the distribution of expression of pro-α3(V) during mouse development, and to compare this to the expression domains of other type V/XI procollagen chains, a series of in situ hybridizations was performed on serial sagittal and parasagittal sections of 13.5 and 15.5 dpc mouse embryos using antisense, and sense control, riboprobes specific for pro-α3(V), pro-α1(V), pro-α1(XI), and pro-α2(XI) sequences. At 13.5 dpc pro-α3(V) RNA expression was barely detectable, although pro-α1(V) RNA expression was widely distributed throughout developing mesenchyme and intense pro-α1(XI) and pro-α2(XI) signals were already visible in nascent chondrified cartilaginous elements (data not shown). At 15.5 dpc, however, pro-α3(V) expression was readily discernible, and the pro-α3(V) expression domain was seen to be a subset of that of pro-α1(V) (Figs. 5 and 6). Interestingly, pro-α1(V) expression was widely distributed throughout developing connective tissues, with especially high levels seen in the perichondrium associated with cartilaginous primordia of future bones. In contrast, expression of pro-α3(V) was not detected in perichondrium or other regions of bone primordia. It was instead most readily detectable in the superficial fascia and in the epimysia, or connective tissue sheaths, tracing the outlines of the developing muscles of the anterior chest wall, the cutaneous panniculus carnosus muscle, and the developing musculature of the neck. In addition to its expression in epimysium, pro-α3(V) expression was also seen in the connective tissue sheath, or epineurium, of some nerves (Fig. 6).
Although pro-α3(V) was not expressed in perichondrium, high pro-α3(V) expression was observed closely apposed to the cartilage primordia of future bones in the soft tissue associated with a number of joints, in what appeared to be incipient ligamentous attachments (formation of ligaments and tendons first begins in mouse development as mesenchymal condensations at 14 dpc; Ref. 75). In Figs. 5 and 6, pro-α3(V) expression in nascent ligamentous attachments can be seen (i) between the cartilage primordia of the basioccipital bone at the base of the skull and the first two cervical vertebrae, C1 (atlas) and C2 (axis), (ii) apposed to the cartilage primordium of the exoccipital bone, and (iii) between the cartilage primordia of the femoral head and acetabulum of the hip joint. Pro-α3(V) signal was also detectable in forming tendons within the hindlimb (Fig. 5).
Chromosomal Assignment of the Murine Col5a3 and Human COL5A3 Pro-α3(V) Genes-Previously, mutations in the human COL5A1 and COL5A2 genes, which encode the pro-α1(V) and pro-α2(V) chains, respectively, have been identified as the underlying defects in cases of the heritable connective tissue disorder classical Ehlers-Danlos syndrome (EDS) (6-10) (formerly EDS types I and II, see Ref. 76). However, both COL5A1 and COL5A2 have been excluded in some cases of classical EDS I, while a locus has yet to be identified for the hypermobility type of EDS (formerly EDS type III). As a first step toward examining possible involvement of pro-α3(V) in human disease and in abnormal phenotypes in mice, chromosomal positions were established for the human COL5A3 and mouse Col5a3 pro-α3(V) genes. The location of COL5A3 was determined by radiation hybrid mapping (46), using PCR analysis of the Genebridge 4 radiation hybrid panel (Research Genetics) (see "Experimental Procedures"). Scoring, submitted to the WICGR Mapping Service at the Whitehead Institute/MIT Center for Genome Research, clearly mapped COL5A3 to chromosome 19p, 6.19 cR from WI-8049 and 2.02 cR from WI-7557 (Lod 2.68 relative to most likely). According to the Genome Database, WI-7557 amplifies from the gene DNMT1, which has been cytogenetically mapped to 19p13.2 (77). Col5a3 was mapped by PCR analysis of 94 progeny of the C57BL/6J × Mus spretus (BSS) backcross from the Jackson Laboratory (47) (see "Experimental Procedures") to a region of proximal chromosome 9 homologous to human 19p13.2. Mapping of the human and mouse sequences reported herein to homologous positions in the human and murine genomes supports the contention that they are human and mouse homologues of the same gene, rather than genes for related, but genetically distinct, procollagen chains.
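As an illustration of how linkage between two loci is typically evaluated in a backcross panel such as the 94-progeny BSS panel mentioned above, the following minimal Python sketch estimates the recombination fraction, its binomial standard error, and a two-point LOD score against the null hypothesis of no linkage (recombination fraction 0.5). The function name and the recombinant count in the usage line are hypothetical, not taken from the mapping data reported here, and the Jackson Laboratory analysis may use different software and multi-point methods.

```python
import math


def backcross_linkage(n_progeny: int, n_recombinant: int):
    """Two-point linkage statistics for a backcross (illustrative sketch).

    Returns (theta, standard_error, lod), where theta is the estimated
    recombination fraction and lod compares the likelihood at theta with
    the likelihood at 0.5 (no linkage).
    """
    if n_progeny <= 0 or not 0 <= n_recombinant <= n_progeny:
        raise ValueError("counts must satisfy 0 <= n_recombinant <= n_progeny, n_progeny > 0")

    theta = n_recombinant / n_progeny
    # Binomial standard error of the estimated recombination fraction.
    se = math.sqrt(theta * (1.0 - theta) / n_progeny)

    # log10 likelihood at the estimate; terms with a zero count contribute 0.
    log_l_hat = 0.0
    if n_recombinant > 0:
        log_l_hat += n_recombinant * math.log10(theta)
    if n_recombinant < n_progeny:
        log_l_hat += (n_progeny - n_recombinant) * math.log10(1.0 - theta)

    # log10 likelihood under no linkage (theta = 0.5).
    log_l_null = n_progeny * math.log10(0.5)
    return theta, se, log_l_hat - log_l_null


# Hypothetical example: 94 progeny typed, 2 recombinants (invented count).
theta, se, lod = backcross_linkage(94, 2)
print(f"theta = {theta:.3f} +/- {se:.3f}, LOD = {lod:.2f}")
```

With no recombinants among 94 progeny the LOD score reaches its maximum for this panel size, 94 × log10(2), which is why a panel of this size can place a new marker relative to flanking loci with high confidence.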
Connective tissue or musculoskeletal disorders that might arise in an obvious way from defects in the pro-α3(V) chain have yet to be mapped to the same chromosomal region as either COL5A3 or Col5a3. However, the highly polymorphic simple sequence (CA) repeat D19S413, with a maximum heterozygosity of 0.78 (78) has, like COL5A3, been mapped to the ~3.6 centimorgan interval between WI-8049 and WI-7557 and, thus, should be of use in the initial analysis of linkage between COL5A3 and disease phenotypes in EDS and other affected families.
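The usefulness of a marker such as D19S413 for linkage analysis depends on its heterozygosity. As a minimal sketch, assuming the reported value is of the usual expected-heterozygosity kind, H = 1 − Σ pᵢ², the Python function below computes H from a set of allele frequencies. The function name and the allele frequencies in the usage example are hypothetical and are not data for D19S413.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) for one locus.

    allele_freqs: iterable of allele frequencies that must sum to 1.
    """
    freqs = list(allele_freqs)
    if any(p < 0 for p in freqs):
        raise ValueError("allele frequencies must be non-negative")
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two randomly drawn alleles differ.
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical five-allele (CA)n repeat; frequencies invented for illustration.
example_freqs = [0.35, 0.25, 0.20, 0.10, 0.10]
print(round(expected_heterozygosity(example_freqs), 3))
```

The more alleles a marker has and the more evenly they are distributed, the closer H comes to 1, and the more often a parent in an affected family will be informative for linkage.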
Summary-Here we have presented the primary structure of the pro-α3(V) collagen chain, a hitherto elusive molecule. These sequences show pro-α3(V) to be closely related to the pro-α1(V), pro-α1(XI), and pro-α2(XI) chains, with sequence similarities and differences that provide insights into the nature and biology of the pro-α3(V) chain. As an example, a conserved Lys at position 84 of the COL1 domain suggests that α1(V)α2(V)α3(V) heterotrimers may be incorporated into heterotypic fibrils, while differences in N-propeptide/telopeptide sequences suggest that such heterotypic fibrils would have different surface charge properties than heterotypic fibrils that incorporate α1(V)₂α2(V) heterotrimers, a difference likely to influence fibril shape/diameters and interactions with other macromolecules. An unexpected finding of great potential interest was the expression of pro-α3(V) RNA primarily in the connective tissue sheaths (epimysia) of forming muscles and in the rudiments of ligamentous attachments adjacent to forming bones and within nascent joints during development. Aside from providing insights into pro-α3(V) biology, this observation may have special meaning in regard to COL5A3 as a possible candidate locus for human diseases. Both classical and hypermobility types of EDS are marked by hypermobility/laxity of the joints, muscular hypotonicity in early childhood, premature degenerative joint disease, and joint pain (79, 80). In particular, the hypermobility type, which may be the most common type of EDS (80), is marked by gross joint laxity and recurrent joint dislocation, plus chronic diffuse muscle pain not attributable to joint involvement. The fact that pro-α3(V) chains combine with pro-α1(V) and pro-α2(V) chains to form heterotrimers, and demonstration here of the distribution of pro-α3(V) expression in developing ligaments and tendons, within joints, and in the epimysia lining developing muscles, suggest the human pro-α3(V) gene, COL5A3, as a good candidate locus for at least some cases of classical EDS in which COL5A1 and COL5A2 have been excluded, and for at least some cases of the hypermobility type of EDS. Expression of pro-α3(V) in epimysium also raises the possibility that defects in COL5A3 and Col5a3 might result in some myopathies, as has recently been shown to be the case with the genes for type VI collagen (81). Importantly, sequences and cDNA clones presented here will enable further analysis of the roles of type V/XI collagen in development, homeostasis, and disease.

[Figure legend, beginning truncated] ... Fig. 5, stained with hematoxylin-eosin and Alcian blue (H&E, Alcian) or hybridized to pro-α3(V) or pro-α1(V) riboprobes, are shown at high magnification (×50) to more clearly show various relevant structures. An asterisk denotes pro-α3(V) signals in the superficial fascia and epimysium of the panniculus carnosus. Arrows mark pro-α3(V) signals in presumptive ligamentous attachments apposed to cartilage primordia of the basioccipital bone and vertebrae C1 and C2 (bc), and hip joint (hj); and in the epineurium surrounding spinal nerves (ep). m marks a muscle in the neck region, representative of numerous muscles in that region, in which strong pro-α3(V) signal is seen in surrounding epimysia. Arrows also mark strong pro-α1(V) signal in the perichondria of vertebrae (pv), ribs (pr), and bones of the hip joint (ph), structures in which matching pro-α3(V) signals were not observed.