Extensive Alternative Splicing within the Amino-propeptide Coding Domain of α2(XI) Procollagen mRNAs EXPRESSION OF TRANSCRIPTS ENCODING TRUNCATED PRO-α CHAINS

Heterogeneity in type XI procollagen structure is extensive because all three α(XI) collagen genes undergo complex alternative splicing within the amino-propeptide coding domain. Exon 7 of the human and exons 6-8 of the mouse α2(XI) collagen genes, encoding part of the amino-propeptide variable region, have recently been shown to be alternatively spliced. We show that exon 6-containing mRNAs for human α2(XI) procollagen are expressed at 28 weeks in fetal tendon and cartilage but not at 38-44 days or 11 weeks. In the mouse, exon 6 is expressed in chondrocytes from 13.5 days onward. We recently identified conserved sequences within intron 6 of the human and mouse α2(XI) collagen genes, containing additional consensus splice acceptor and donor sites that potentially increase the size of exon 7, dividing it into three parts, designated 7A, 7B, and 7C. We show by reverse transcription polymerase chain reaction and in situ hybridization that these potential splice sites are used to yield additional α2(XI) procollagen mRNA splice variants that are expressed in fetal tissues. In human, expression of exon 7B-containing transcripts may be developmental stage-specific. Interestingly, inclusion of exon 7A or exon 7B in human and mouse α2(XI) procollagen mRNAs, respectively, would result in the insertion of an in-frame termination codon, suggesting that some of the additional splice variants encode a truncated pro-α2(XI) chain.

The collagens are an important class of extracellular matrix components because the structural integrity and functional properties of different tissues are influenced by the characteristic combinations and amounts of these molecules (1). These triple helical glycoprotein molecules belong to a large family consisting of 19 different types with 30 genes encoding their constituent α chains (2,3). Diversity within a single type of collagen can arise not only through varying the combinations of α chains within heterotrimers, but also from the synthesis of different isoforms arising from alternative splicing of mRNAs. There is increasing evidence that alternatively spliced transcripts give rise to different isoforms within several collagen types, including types II, VI, IX, XI, XIII, XIV, and XVIII (4-7). In addition, alternative splicing of α2(I) and α1(III) collagen genes produces mRNAs that could encode a non-collagenous protein and/or truncated collagen, suggesting that some collagen genes may have alternative functions (8,9).
Type XI collagen, a fibril-forming collagen, is a heterotrimer composed of α1(XI), α2(XI), and α3(XI) subunits (10) and is synthesized as a precursor procollagen molecule with amino- and carboxyl-terminal globular extensions (propeptides). The amino-propeptides in both α1(XI) and α2(XI) procollagen consist of several subdomains: a proline- and arginine-rich peptide region (PARP), a variable region (VR), and a constant region (CR). For both of these procollagens the PARP and CR subdomains are similar in size, but the VR subdomain is highly variable both in length and amino acid sequence, raising questions about possible different functional roles of the VR (11,12).
We have recently found that heterogeneity in type XI collagen structure may arise from differing combinations of the three α(XI) chains within heterotrimers and homotrimers (13). The heterogeneity in type XI procollagen structure may be more extensive because all three genes for type XI collagen have been shown to undergo alternative splicing within the amino-propeptide coding domain. The VR in the amino-propeptide of α2(XI) procollagen is encoded by exons 5-9 in the gene. Within this coding region, exon 7 of the human and exons 6-8 of the mouse α2(XI) collagen genes have been shown to undergo complex alternative splicing (14,15). Alternative processing has also been demonstrated in the VR of α1(XI) procollagen in human, chick, and rat (14,16). The pro-α3(XI) chain of type XI procollagen is an overglycosylated variant of pro-α1(II) and therefore is the gene product of the α1(II) collagen gene (17). Exon 2 of the α1(II) collagen gene is also alternatively spliced, yielding two forms of α1(II) procollagen mRNAs which either include or exclude an exon encoding a cysteine-rich domain in the amino-propeptide (18). Assembly of the translation products of these different pro-α(XI) alternatively spliced transcripts within type XI procollagen heterotrimers would result in extensive heterogeneity in the structure of the amino-propeptide of type XI procollagen.
The genomic organization, intron-exon boundaries, and sequence conservation for the amino-propeptide coding domain of the human and mouse α2(XI) collagen genes have been determined recently, partly by comparing cDNA and genomic sequences (14,15,19) (Fig. 1). In assessing the degree of interspecies sequence conservation for the α2(XI) collagen gene, we had noted a region of conserved sequences containing motifs corresponding to consensus splice acceptor and donor sites within intron 6 of the human and mouse genes (19). In the present study, we have aimed to assess whether these splice signals can function to generate alternative transcripts using RT-PCR and in situ hybridization. We show that these potential splice sites are used to yield additional α2(XI) procollagen mRNA splice variants that are expressed in human and mouse fetal tissues. Interestingly, inclusion of one of these additional exons in α2(XI) procollagen mRNAs would result in the insertion of an in-frame termination codon within the VR, suggesting that some of the additional splice variants encode a truncated pro-α2(XI) chain.
* This work was supported by the Arthritis and Rheumatism Council (United Kingdom), the Croucher Foundation, and a Hong Kong University Strategic Research grant. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank TM

Human Fetal Material
Human fetal tissues were obtained from elective therapeutic terminations of pregnancy. Approval was obtained from the Ethical Committee of the Medical Faculty, Hong Kong University, for work with fresh embryonic materials. The gestational age of specimens was determined by assessing the centers of ossification (20) and the histologic characteristics of the stained sections. The crown-rump length was also measured for intact fresh specimens (21).

DNA Sequencing and Sequence Analyses
Nucleotide sequences were determined on both strands by the dideoxy chain termination method (22) on double-stranded DNA using Sequenase 2.0 (U. S. Biochemical Corp.) following the manufacturer's protocol. Longer contiguous stretches of DNA sequence were obtained using the exonuclease/mung bean nuclease method of generating overlapping clones (23). Sequence comparison analyses were performed using the UWGCG computer program (24).

Oligonucleotide Primers
The following pairs of oligonucleotide primers were used for cDNA synthesis and PCR. To facilitate cloning of the PCR products, a HindIII adaptor (underlined) was added to the 5′ ends of primers. The locations of these primers are also shown in Figs. 1 and 2.

Cloning of RT-PCR Products
First strand cDNA was transcribed from 20 μg of total human RNA or 10 μg of total mouse RNA using avian myeloblastosis virus reverse transcriptase (35 units, Seikagaku America Inc. or 200 units, Superscript II, Life Technologies, Inc.) with a corresponding antisense primer as described previously (25). One-tenth of the reaction product was used for second strand DNA synthesis and amplification by PCR with the corresponding 5′ sense primer. PCRs were carried out for 30 cycles of 1-min denaturation at 94°C, 1-min annealing at 55°C, and 1-min extension at 72°C. As controls, PCRs were performed with the same set of RNAs but without the addition of reverse transcriptase to test for genomic contamination of RNA samples. Second round PCR amplifications were performed with one-fifteenth of the first round amplified products under the same conditions as the first round. RT-PCR products were either directly cloned into the EcoRV site of pUBS (gift of G. Murphy, AFRC, Norwich) or digested with HindIII (New England Biolabs) and subcloned into the HindIII site of pBluescribe (Stratagene).

RNA Isolation and in Situ Hybridization
Total human and mouse RNAs were prepared by the lithium chloride-urea differential precipitation method (26). RNAs were isolated from week 11 human fetal cartilage and from week 28 human fetal cartilage and tendon. Mouse RNAs were isolated from 1-day-old neonatal cartilage and from day 15.5 whole mouse fetuses. Two human fetuses of 38-42 days and 42-44 days gestation and day 16.5 CBA/n mouse fetuses were collected and processed for in situ hybridization as described (27). Single-stranded 35S-labeled sense and antisense riboprobes were generated from subclones containing human and mouse pro-α2(XI) collagen gene exons as described previously (25).
In situ hybridization slides were stained in Harris hematoxylin and eosin, and K5 photographic emulsion (Ilford) was used for autoradiography. Photographs of in situ hybridizations were taken using Kodak Ektachrome ASA 64 film on a Zeiss Axioskop microscope under darkfield illumination.

RESULTS AND DISCUSSION
Additional Splice Sites Flanking Exon 7 within the Variable Region Coding Domain-A comparison of the genomic sequences of the human and mouse α2(XI) collagen genes (19) showed that there was little sequence conservation for the introns between exons 5 and 7. An exception was a highly conserved region in the intron sequence between exons 6 and 7, immediately upstream of the splice junction of exon 7 (Fig. 2 and Table I). Interestingly, this conserved region contained additional potential splice acceptor and donor sequences similar to the consensus splice site signals 5′-T/CT/CTTT/CT/CT/CT/CT/CNCAG|G-3′ (acceptor) and 5′-AG|GTA/GAGT-3′ (donor) (28). One site (site 1, 5′-TCTCTGCTGCAG|CG-3′ in human, 5′-TCTCTCACTGCAG|CG-3′ in mouse) is a potential acceptor site located 450 and 476 bp upstream of the published splice acceptor site for exon 7. The other site (site 2) is a potential donor site (5′-TTACCGGCA|GTAGAG-3′ in human and 5′-TCGGCAG|GTAGAG-3′ in mouse) 227 and 198 bp 5′ of the exon 7 splice acceptor site in human and mouse, respectively (Fig. 2). Two  (29), suggesting that these signals could be functional.
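As a rough illustration of how the candidate acceptor (site 1) compares with the consensus quoted above, the following Python sketch summarises the two site 1 sequences given in the text. The function name `acceptor_features`, the decision to treat the four bases before the junction as the N-C-A-G positions, and the use of a simple pyrimidine fraction are assumptions made for this example; it is not the sequence analysis used in the study.

```python
# Hypothetical helper: summarise a candidate 3' splice acceptor site.
# Sites are written as "INTRON|EXON", with "|" marking the splice junction.

SITE1 = {
    "human": "TCTCTGCTGCAG|CG",   # site 1 sequence quoted in the text
    "mouse": "TCTCTCACTGCAG|CG",  # site 1 sequence quoted in the text
}


def acceptor_features(site: str) -> dict:
    """Return simple consensus-related features of a candidate acceptor."""
    intron, exon = site.upper().split("|")
    # Assumption: the last four intron bases occupy the N-C-A-G positions,
    # and everything upstream of them is the pyrimidine tract.
    tract = intron[:-4]
    pyrimidines = sum(base in "CT" for base in tract)
    return {
        "ends_in_CAG": intron.endswith("CAG"),
        "tract_length": len(tract),
        "pyrimidine_fraction": pyrimidines / len(tract),
        "first_exon_base": exon[0],  # consensus has G at this position
    }


if __name__ == "__main__":
    for species, site in SITE1.items():
        print(species, acceptor_features(site))
```

Under these assumptions both sites end in CAG and carry a mostly pyrimidine tract, while the first exon base (C) differs from the consensus G.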
Consequences of Utilizing Additional Splice Signals within Intron 6 of the α2(XI) Collagen Gene-The presence of additional signals within intron 6 of the α2(XI) collagen gene potentially increases the size of exon 7 and, if used, would divide it into three parts, which we have designated 7A, 7B, and 7C as shown in Figs. 1 and 2. Exon 7C corresponds to the alternatively spliced exon 7 reported previously in human and mouse (14,15). Utilization of 5′ acceptor site 1 in conjunction with the closest donor (site 2) would yield a separate exon 7A of 250 and 252 bp in human and mouse, respectively. Acceptor site 1 could also be used in conjunction with the donor site at the 3′ end of exon 7C to yield an enlarged exon 7 consisting of parts A, B, and C, 540 bp in size for human α2(XI) collagen mRNA (513 bp for the mouse homolog). Since the sequence at the 5′ end of part 7B was more typical of a donor site than an acceptor, splicing of this exon as a separate exon was probably rare. The third splice variation would result from the use of previously reported splice signals for exon 7 flanking part C, yielding a separate exon 7C of 63 bp (14,15).
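The sizes quoted above can be checked with simple arithmetic. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the size of part 7B is taken to be the distance between donor site 2 and the exon 7C acceptor (227 bp in human, 198 bp in mouse), as given in the text.

```python
# Part sizes (bp) taken from the text; names are hypothetical.
EXON7_PARTS_BP = {
    "human": {"7A": 250, "7B": 227, "7C": 63},
    "mouse": {"7A": 252, "7B": 198, "7C": 63},
}


def enlarged_exon7_bp(species: str) -> int:
    """Size of the enlarged exon 7 made of parts A, B, and C."""
    return sum(EXON7_PARTS_BP[species].values())


def preserves_frame(species: str, part: str) -> bool:
    """True if the part length is a multiple of 3, so that inserting it
    leaves the reading frame of downstream exons unchanged."""
    return EXON7_PARTS_BP[species][part] % 3 == 0


if __name__ == "__main__":
    for species in EXON7_PARTS_BP:
        print(species, enlarged_exon7_bp(species), "bp")
```

The sums reproduce the 540 bp (human) and 513 bp (mouse) figures. The modulo check is a length-only test: it says nothing about stop codons inside the part, which have to be found by translating the sequence itself.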
Translation of the combined sequences within parts A, B, and C of exon 7 for the human gene revealed an open reading frame, in-frame with that for α2(XI) procollagen, which ended in exon 7A because of the presence of the termination codon TGA. Exon 7B also contained a termination codon TAA. In the mouse gene the open reading frame ended within exon 7B with the termination codon TGA. The difference in length of the open reading frame in human and mouse lies in the replacement of the human TGA codon in exon 7A with CCA in the mouse. There were no canonical polyadenylation signals within the intron 7 sequence; however, cleavage and polyadenylation of transcripts containing exon 7A/7B within intron 7 cannot be excluded since in other genes RNA processing can occur even in the absence of a clear consensus sequence (30).
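The reasoning above amounts to reading codons in the procollagen frame until the first termination codon. A minimal Python sketch of that scan follows. The two short sequences are invented toy examples, not the real exon 7A sequences; they only mimic the reported difference between a TGA codon (human) and a CCA codon (mouse) at the same in-frame position.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq: str, frame: int = 0):
    """Return (position, codon) of the first in-frame stop, or None.

    `frame` is the 0-based offset of the first complete codon.
    """
    seq = seq.upper()
    for pos in range(frame, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            return pos, codon
    return None


# Toy sequences (hypothetical), differing only in the third codon.
TOY_HUMAN_LIKE = "ATGGCCTGAGGG"  # ATG GCC TGA GGG
TOY_MOUSE_LIKE = "ATGGCCCCAGGG"  # ATG GCC CCA GGG

if __name__ == "__main__":
    print(first_in_frame_stop(TOY_HUMAN_LIKE))
    print(first_in_frame_stop(TOY_MOUSE_LIKE))
```

In the toy human-like sequence the scan stops at the TGA codon, whereas the mouse-like sequence reads through, which is the pattern the text describes for exon 7A.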
The absence of an alternative long open reading frame in the combined sequence of exons 1-7C suggests that human pro-α2(XI) mRNAs containing exon 7A or 7B and mouse transcripts containing exon 7B would encode a truncated α2(XI) procollagen containing approximately two-thirds of the intact amino-propeptide that terminates in the variable region. Since assembly of trimeric procollagen molecules requires association of the carboxyl-propeptides, these truncated pro-α2(XI) chains are not expected to associate within trimers. Instead, a non-collagenous polypeptide of 312 and 395 amino acids would be synthesized for human and mouse, respectively. However, in the mouse, alternatively spliced transcripts containing exon 7A and not 7B would encode full-length pro-α2(XI) collagen chains. The sequence identity between human and mouse for these alternative exons was high at the amino acid and nucleotide levels (Table I), suggesting a functional role for the truncated pro-α2(XI) chain.

Expression of Additional Splice Variants of Exon 7 in Human and Mouse-Reverse transcription PCR and in situ hybridization were used to determine if the potential additional splice variants of exon 7 could be expressed in vivo in human and mouse fetal tissues.
RT-PCR Analyses-RT-PCR assays using human fetal RNA and oligonucleotide primers to exons 5 and 7C (see "Experimental Procedures") generated two fragments of 130 and 50 bp from week 28 cartilage and tendon (Fig. 3A). Only the 50-bp fragment was obtained from week 11 cartilage RNA. The sizes of these fragments were consistent with an exon composition of exon 5-6-7C (130 bp) and exon 5-7C (50 bp). The presence of exon 6 within the 130-bp fragment was confirmed by hybridization with an exon 6-specific probe (pVL50; Fig. 3A). Further confirmation of the exon composition of these fragments was obtained by cloning and sequencing the RT-PCR fragments (data not shown). Comparison of the RT-PCR sequence with the human genomic sequence showed that the 130-bp fragment contained exons 5-6-7C and the 50-bp fragment, exons 5-7C; however, no RT-PCR fragments containing exons 7A and 7B were obtained. Although alternative transcripts containing exon 6 have been demonstrated in the mouse (15), expression in human has not been reported (14).
To determine if the failure to detect exons 7A and 7B in the RNA samples truly reflected their absence, primers specific to exons 7A and 7B were also used for RT-PCR (see "Experimental Procedures"). No RT-PCR product was obtained using primers to exons 5 and 7B (data not shown); however, two fragments of 190 and 110 bp were generated from week 28 cartilage and tendon RNA using the exon 7A primer (Fig. 3B). For week 11 cartilage RNA only the 110-bp fragment was obtained (Fig. 3B). The 190-bp fragment could not be cloned because only a trace amount was obtained; however, hybridization of the 190-bp fragment with both exon 6- and exon 7A-specific probes (Fig. 3B) is consistent with an exon 5-6-7A composition. The 110-bp fragment contained exons 5-7A since it hybridized only with the exon 7A probe and not with the exon 6 probe. This predicted composition of the 110-bp fragment was confirmed by cloning and sequencing the RT-PCR product (data not shown). Since the presence of the termination codon in exon 7A (Fig. 2), predicted from the genomic sequence, was confirmed, this fragment represents a transcript encoding a truncated α2(XI) procollagen. Although the relative amounts of the different exon 7-containing transcripts were not measured, in human, mRNAs containing exon 7A were probably of low abundance at the developmental stage studied, since they could only be revealed using exon 7A-specific primers.
Total RNAs from mouse fetuses 15.5 days postcoitus and from 1-day-old neonatal mouse cartilage were also analyzed by RT-PCR using oligonucleotide primers to exons 5 and 7C (see "Experimental Procedures"). For both RNA samples, amplified products of 490, 290, and 130 bp were obtained (Fig. 4). These fragments hybridized with probe pVL5E1, containing exons 5-7 of the human COL11A2 gene (Fig. 3), suggesting complex alternative splicing within this region of the gene. The exon composition within these fragments was determined by cloning and sequencing the RT-PCR fragments (data not shown) and by comparing the DNA sequences with the mouse genomic sequence.
Sequence analyses of the cloned RT-PCR fragments revealed that the compositions of the different fragments were as follows: the 490-bp fragment, exons 5-7A-7B-7C; the 290-bp fragment, exons 5-7A-7C; and the 130-bp fragment, exons 5-6-7C (data not shown). These analyses showed that mouse Col11a-2 exon 6 was 78 bp, in agreement with the previous report (15). The sizes of exons 7A and 7B, 252 and 198 bp, respectively, were as predicted from the genomic sequence (Fig. 2). Transcripts containing exons 5-7B-7C were not detected, as may be expected if the junction of 7A and 7B acted mainly as a donor site (Fig. 2). Because of the termination codon in exon 7B, the transcript represented by the 490-bp fragment would encode truncated α2(XI) procollagen.
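The fragment sizes reported in the RT-PCR analyses can be cross-checked against the exon sizes: two products amplified with the same primer pair should differ by the length of the extra exon. The sketch below illustrates that check. The function names and the 10-bp tolerance for gel-estimated sizes are assumptions made for this example, not values from the study.

```python
def insert_size_bp(with_exon_bp: int, without_exon_bp: int) -> int:
    """Size difference between two products from the same primer pair."""
    return with_exon_bp - without_exon_bp


def consistent(observed_bp: int, expected_bp: int, tolerance_bp: int = 10) -> bool:
    """True if a gel-estimated difference agrees with a sequenced exon size.
    The tolerance is an assumed allowance for rounding of gel estimates."""
    return abs(observed_bp - expected_bp) <= tolerance_bp


# Mouse, primers to exons 5 and 7C: 490 bp (5-7A-7B-7C) vs 290 bp (5-7A-7C).
mouse_7b_estimate = insert_size_bp(490, 290)

# Human: 130 vs 50 bp (exon 5/7C primers) and 190 vs 110 bp (exon 5/7A primers);
# in both pairs the larger product additionally contains exon 6.
human_exon6_estimates = (insert_size_bp(130, 50), insert_size_bp(190, 110))

if __name__ == "__main__":
    print("mouse 7B estimate:", mouse_7b_estimate, "bp (sequenced size 198 bp)")
    print("human exon 6 estimates:", human_exon6_estimates, "bp")
```

The mouse difference of 200 bp agrees with the 198-bp exon 7B within the assumed tolerance, and both human primer pairs give the same 80-bp increment, as expected if the same exon accounts for the larger product in each case.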
In Situ Hybridization Analyses-The fetal tissue pattern of expression of transcripts containing alternatively spliced exons was studied by in situ hybridization using antisense riboprobes specific for each of the individual exons 6, 7A, or 7B (Fig. 5). Hybridization signals were at background level for the sense controls for all three riboprobes (reviewed, data not shown). Despite the high sequence homology (80%) between human and mouse exon 7B, only background hybridization signals were obtained for the day 42-44 human fetal tissues using the mouse exon 7B probe. In contrast, a strong hybridization signal was obtained in the vertebral bodies using the human exon 7B probe (data not shown), demonstrating the species and sequence specificity of the antisense riboprobes used.
Transcripts containing exon 6 were not detected in human fetal chondrocytes at 38-42 days and at 42-44 days (Fig. 5, A and B). It is interesting that exon 6-containing transcripts were also not expressed by week 11 chondrocytes but were found in week 28 cartilage and tendon (Fig. 4). In human, expression of exon 6 transcripts may be restricted to later developmental stages, unlike the mouse, where exon 6-containing mRNAs were clearly found in fetal chondrocytes at 16.5 days (Fig. 5C) and at 1 day postpartum (Fig. 4). The expression of mRNAs containing exon 6 by mouse fetal chondrocytes is in contrast to another study where most of the mouse Col11a-2 mRNAs in 3-week-old neonate cartilage lacked exon 6, and expression was found mainly in non-chondrogenic tissues such as muscle, brain, heart, and calvaria (15). However, exon 6 mRNAs were reported in developing forelimb bud in embryos between 10 and 14 days. Differences in our findings may reflect stage-specific transient expression of alternative transcripts containing exon 6.
Exon 7A and 7B probes hybridized to chondrogenic tissues such as the chondrocytes within the cartilaginous masses of the vertebral bodies in both human and mouse (Fig. 5, D-I).
The sensitivity of in situ hybridization is usually insufficient for the detection of splicing precursors. The strong hybridization signals for these exons suggest that the inclusion of exons 7A and 7B in transcripts is the result of alternative splicing and does not represent intermediates of mRNA processing. Expression of transcripts containing exon 7B may be stage-specific for human, since exon 7B could only be detected in mRNAs at earlier stages of embryonic development (summarized in Table II). In contrast, in the mouse, exon 7B was expressed throughout chondrogenesis from 13.5 days onward (this study).
Synthesis of a Truncated α2(XI) Amino-propeptide-Although the relative levels of the different exon 7 variants were not determined, it is interesting that exon 7C contains a conserved purine-rich sequence 5′-GAAGAGGAAGAA-3′ (Fig. 2), which is similar to that found to act as an exon splicing enhancer in some alternatively spliced genes (for review, see Refs. 31-33). Purine-rich motifs within exons have also been shown to act as splicing silencers in viral genes (34). Exons 7A and 7B also contained purine-rich motifs, but these were shorter. It would be important to determine if these motifs have a role in regulating usage of the alternative exon 7 splice sites.
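Purine-rich stretches such as the exon 7C motif can be located with a simple scan for uninterrupted runs of A and G. The following Python sketch shows one way to do this. The function name is hypothetical and the flanking bases in the test sequence are invented for illustration; only the 12-base motif itself comes from the text.

```python
import re

EXON_7C_MOTIF = "GAAGAGGAAGAA"  # purine-rich sequence quoted in the text


def longest_purine_run(seq: str) -> str:
    """Return the longest uninterrupted run of purines (A or G) in seq."""
    runs = re.findall(r"[AG]+", seq.upper())
    return max(runs, key=len) if runs else ""


# Invented flanking bases (pyrimidines) around the real motif.
TOY_SEQUENCE = "CTTC" + EXON_7C_MOTIF + "TCCT"

if __name__ == "__main__":
    run = longest_purine_run(TOY_SEQUENCE)
    print(run, len(run))
```

A run-length criterion of this kind would allow the shorter motifs in exons 7A and 7B to be compared with the 12-base run in exon 7C, although run length alone does not establish enhancer or silencer activity.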
The presence of premature termination codons in genes has been shown to accelerate the decay rate of mRNAs, a phenomenon called nonsense-mediated mRNA decay (35,36). Skipping of exons containing a premature termination codon has also been observed for fibrillin and ornithine δ-aminotransferase genes (37). The presence of the premature termination codon in mRNAs that include exon 7A and 7B therefore raises questions about the stability and the efficiency of translation of these transcripts and whether levels of the truncated pro-α2(XI) polypeptide may be low. An additional question would be whether the premature termination affects splice site selection frequency. The in situ hybridization data did not show an especially low level of exon 7A- or exon 7B-containing transcripts in human and mouse. The presence of the termination codon therefore probably had not affected mRNA stability. Although the physiologic significance of the truncated α2(XI) collagen chain is unknown, its synthesis may be important since it is conserved among different species. Another question arising would be whether the PARP peptide previously thought to be a processed product of α2(XI) procollagen (38) is the product of mRNAs encoding a truncated pro-α chain. This truncated polypeptide, although larger, includes the PARP domain, and PARP peptide could be derived from it by further processing.
Heterogeneity within the Amino-propeptide of α2(XI) Procollagen Created by Alternative Splicing-Our data have shown by RT-PCR and in situ hybridization analyses that the splice signals present in intron 6 are utilized in both human and mouse, giving rise to additional splice variants of α2(XI) procollagen mRNA. Interestingly, the expression of exons 6 and 7B may be developmental stage-specific in human (Table II). In addition to the alternatively spliced exons 6, 7A, and 7B identified in the present study, exons 7C and 8 have also been shown to be alternatively spliced in the mouse (15). With the exception of the alternative transcripts that encode truncated α2(XI) procollagen, the added differential usage of these exons could generate great diversity in the structure of the amino-propeptide of α2(XI) procollagen and therefore in heterotrimeric type XI procollagen.
Several functions have been proposed for the amino-propeptide of fibril-forming collagens. Proposals based on studies on types I and III procollagen include a role of retained propeptide in preventing premature fibril formation intracellularly and in regulating fibril diameter (39,40). Biosynthetic studies have shown that parts of the amino-propeptides of pro-α1(XI) and pro-α2(XI) are retained within fibrils (41,42), which may be important for their proposed roles in regulating fibrillogenesis and modulating fibril diameter (43-45).
It is also interesting that the amino-propeptides of type I and III procollagen have been shown to be capable of down-regulating collagen synthesis by cultured fibroblasts and in cell-free translation assays (39,46,47). The possible synthesis of truncated amino-propeptide for α2(XI) procollagen raises further questions about potential roles for these polypeptides in regulating biosynthesis at the translational level.
The complex alternative splicing found for the α1(XI) and α2(XI) collagen genes, with its resulting structural diversity within the amino-terminal region and the developmental stage differences in expression, raises questions about possible differing functional roles for the amino-propeptide. The roles are expected to differ depending on whether the amino-propeptide is retained, cleaved, or synthesized as a separate polypeptide.