Structural Organization of Distinct Domains within the Non-collagenous N-terminal Region of Collagen Type XI*

Collagen XI is a heterotrimeric molecule found predominantly in heterotypic cartilage fibrils, where it is involved in the regulation of fibrillogenesis. This function is thought to involve the complex N-terminal domain. The goal of this study was to examine its structural organization to further elucidate the regulatory mechanism. The amino-propeptide (α1-Npp), alone or with isoforms of the variable region, was recombinantly expressed and purified by affinity and molecular sieve chromatography. Cys-1–Cys-4 and Cys-2–Cys-3 disulfide bonds were detected by liquid chromatography-tandem mass spectrometry. This pattern is identical to that of the homologous α2-Npp, indicating that the recombinant proteins were folded correctly. Anomalous elution on molecular sieve chromatography suggested that the variable region was extended, which was confirmed using rotary shadowing; the α1-Npp formed a globular “head” and the variable region an extended “tail.” Circular dichroism spectra analysis determined that the α1-Npp comprised 33% β-sheet, whereas the variable region largely comprised non-periodic

Collagen XI is a heterotrimeric molecule consisting of α1, α2, and α3 collagen chains (6). The α3(XI) chain is an overglycosylated form of the α1(II) collagen chain (7), whereas the α1(XI) and α2(XI) chains are distinct gene products (8). The α1(XI) and α2(XI) chains contain similar large, N-terminal domains (NTDs),¹ comprising a "PARP" or "PARP-like" domain, a variable region, and a minor triple helix (Fig. 1a; Refs. 9 and 10). The PARP (proline-arginine rich protein) domain was originally isolated as a distinct protein from bovine cartilage (11) but was later demonstrated to be a fragment of the N terminus of the α2(XI) chain (12). Recently, the proteolytic processing site of the α1(XI) chain has been identified (13,14), and since nearly all of the domain is removed by this processing event, both the PARP-like and PARP domains are now termed the amino-propeptides (Npp). The Npp domains are structurally homologous to modules found in seven other collagen chains, as well as the glycoproteins laminin and thrombospondin (15,16).
Both the α1 and α2 chains contain a variable region that lies between the Npp and the minor helix, and alternative splicing of the mRNA within these variable domains generates considerable sequence diversity (10,17,18). During chondrocyte differentiation, splicing of the α2(XI) mRNA rapidly converges to expression of a single protein isoform that lacks all segments encoded by alternatively spliced exons (19). Conversely, alternative splicing of the α1(XI) mRNA at two separate sites during the differentiation process results in divergence from a single species to a complex set of protein isoforms. Six possible isoforms are synthesized, which are named according to the peptide encoded by the exon: p6a, p6b, p8, p[6a+8], p[6b+8] and p0, an isoform that contains constitutively expressed regions only (Fig. 1a).² Alternative splicing leads to considerable structural complexity, since the p6a and p8 peptides are acidic while the p6b peptide is extremely basic (18).
Although collagen XI comprises a relatively small amount of the total collagen present in cartilage fibrils, its presence is essential for the regulated assembly, organization, and development of the cartilage. This is evidenced by studies on the autosomal recessive chondrodysplasia (Cho) mouse, where a frameshift in the α1(XI) gene leads to a functional knock out of the gene product (20). The cartilage of these mice is atypical in that the collagen fibrils are much thicker than normal (21,22), an observation that is consistent with a regulatory role for type XI collagen in fibril assembly. It has been proposed that the mechanism by which this occurs is similar to that by which collagen V limits the diameter of type I/V fibrils in the cornea (23,24). Here, the NTD of type V collagen at the fibril surface inhibits further accretion of collagen I monomers by steric hindrance or by electrostatic mechanisms (25). In accordance with this model, the NTD of type XI collagen would limit the diameter of heterotypic cartilage fibrils and this may rely in part on the shape and dimensions of the NTD. The goal of this study, therefore, was to examine the structure of the Npp alone or with the alternatively spliced peptides, p8 and p[6a+8], to further elucidate the regulatory mechanism. These two isoforms are representative of both the non-cartilage form and up to 30% of the α1(XI) in cartilage (26).

Construction of the Mammalian pCEPD47-p8 and pCEPD47-p[6a+8] Expression Vectors-DNA constructs that code for the p8 and p[6a+8] isoforms were generated by reverse transcription-PCR using RNA from an immortalized rat chondrosarcoma cell line as a template (27). To amplify the sequence, the upstream primer used was 5′-ATT AGC TAG CTG CTC CAG TTG ATA TAC TG-3′, and the downstream primer was 5′-ATT ACT CGA GAC TAG TGA TGG TGA TGG TGA TGG GGT TCA ACT ACA GCA GGT TCC CC-3′ (Fig. 1). The upstream primer included a unique NheI site, while the downstream primer encoded six histidine codons in tandem, a stop codon, a SpeI site, and a unique XhoI site at the 3′ end. These primers were based on the published cDNA sequence of the rat α1(XI) NTD (18). The sequences were amplified using the Advantage HF PCR kit (CLONTECH) with the following amplification conditions: an initial denaturation at 94°C for 3 min, 30 cycles of denaturation (94°C for 45 s), annealing (65°C for 30 s), elongation (72°C for 1 min), and a final elongation for 10 min at 72°C. The PCR products were analyzed on a 2.5% Nusieve agarose gel (3:1; FMC Bioproducts) and were cloned into the NheI-XhoI-cut pCEP4 mammalian episomal expression vector (Invitrogen), which had been modified to include the signal peptide and 5′-untranslated region of the basement membrane protein BM40 (28). Secreted recombinant proteins retain four amino acids from this signal peptide at the N terminus. Positive clones were selected by ampicillin resistance in Escherichia coli and their identity verified by DNA sequencing. The plasmid was harvested and transfected into human embryonic kidney (HEK) 293 cells that express the EBNA-1 protein (Invitrogen) using Pfx-2 lipid (Invitrogen). Cells were grown in DMEM/F-12 containing 10% fetal calf serum and transformed cells were positively selected for by addition of hygromycin (400 µg/ml) and G418 (50 µg/ml). These cells were expanded for large scale culture in serum-free DMEM/F-12. Expression of recombinant protein was detected for up to 2 months, and medium was collected twice a week. Pooled medium was clarified by centrifugation before proceeding with protein purification.
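The design of the downstream primer can be checked by reverse-complementing it and translating the resulting sense strand. The following is a minimal sketch in Python, not part of the original methods; the function names are hypothetical, and the only outside facts assumed are the standard genetic code and the standard recognition sequences for XhoI (CTCGAG) and SpeI (ACTAGT).

```python
# Downstream primer from the text, written 5'->3' (it anneals to the sense strand).
PRIMER = "ATTACTCGAGACTAGTGATGGTGATGGTGATGGGGTTCAACTACAGCAGGTTCCCC"

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Standard genetic code, enumerated in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def translate(seq):
    """Translate in frame 1, stopping after the first stop codon ('*')."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


sense = reverse_complement(PRIMER)
print("sense strand:", sense)
print("encoded tail:", translate(sense))
print("XhoI site present:", "CTCGAG" in PRIMER)
print("SpeI site present:", "ACTAGT" in PRIMER)
```

Both restriction sites are palindromic, so searching the primer strand is equivalent to searching the sense strand. The reading frame used here (frame 1 of the reverse complement) is an assumption that happens to place the six histidine codons and the stop codon in frame.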
Construction of the Prokaryotic α1-Npp Recombinant Vector-The α1-Npp sequence was amplified with the upstream primer 5′-TAA CTA GCT CCA GTT GAT ATA CTG AAA GCT TTA GAT TTT-3′ (Fig. 1), which included a unique NheI restriction site. The downstream primer was 5′-ATT AGG ATC CAC TAG TGA TGG TGA TGG TGA TGC TGA GCG GCC TTG GAT GTT AAG TCA CA-3′, which encoded six histidine residues, a stop codon, a SpeI site, and a unique BamHI site at the 3′ end. The α1-Npp sequence was amplified using the Advantage HF PCR kit (CLONTECH), cloned into the pET 11a expression vector (Stratagene), and the identity was verified by sequencing. Positive clones were selected and plasmid DNA was used to transform BL21(DE3)pLysS-competent cells, which express T7 polymerase (Invitrogen). These were grown in LB medium, and α1-Npp was purified from the inclusion bodies of these cells by solubilizing in 8 M urea, 14 mM β-mercaptoethanol. The protein was refolded by rapid dilution in 20 mM sodium phosphate, pH 7.8, containing 0.25 M NaCl and then dialyzed against chelating column start buffer (see below).
Purification of Recombinant Proteins-Recombinant p8 and p[6a+8] proteins were precipitated from serum-free medium by 30% ammonium sulfate and resuspended in start buffer (1 M NaCl, 20 mM sodium phosphate, 2 M urea, pH 7.1) containing protease inhibitors (1 mM N-ethylmaleimide, 1 mM 4-(2-aminoethyl)-benzenesulfonyl fluoride hydrochloride, and 1 mM benzamidine). Insoluble material was pelleted by centrifugation (18,000 × g for 30 min) and the resultant supernatant filtered through a 0.8-µm filter. This was loaded onto a 5-ml HiTrap chelating affinity column (Amersham Pharmacia Biotech) charged with 0.2 M cobalt chloride and equilibrated in start buffer. The absorbance was monitored continuously at 280 nm, and the flow rate was 2 ml/min. Bound proteins were eluted by a step gradient to 0.25 M imidazole. Proteins in each fraction were analyzed by SDS-PAGE (7.5% gel, reducing conditions) and stained with Coomassie Blue. Fractions containing recombinant protein were pooled, concentrated to 1 ml by ultrafiltration (Centriprep, Amicon), and passed over a Superose 12 column (Amersham Pharmacia Biotech) equilibrated in 0.1 M NaCl, 20 mM sodium phosphate, 2 M urea, pH 7.2. The absorbance was monitored continuously at 280 nm, and the flow rate was 0.25 ml/min. Purified recombinant protein was then concentrated on a Centricon filter unit (Amicon). α2-Npp was extracted with guanidine HCl from bovine nasal cartilage and purified as described previously (11).
One hundred pmol of α1-Npp protein was digested with 2 pmol of sequencing-grade trypsin (Promega, Madison, WI) for 4 h in 50 mM ammonium bicarbonate. One pmol of the peptide digest was pressure-loaded onto a 75-µm capillary HPLC column for on-line analysis by liquid chromatography-tandem mass spectrometry (LC-MS/MS) in an ion-trap mass spectrometer (LCQ, Finnigan, San Jose, CA) as described (29). As peptides eluted from the column, their mass/charge ratios were measured in the mass spectrometer. A second stage of mass spectrometry was then performed automatically, where a selected peptide ion was fragmented to collect sequence information. Disulfide bonds are stable under the conditions used, and no bond breakage between cysteines occurred. A mass/charge ratio of 1315.8 ([M + 3H]³⁺), corresponding to a trio of disulfide-bonded peptides, was encountered during the run. A tandem mass spectrum was automatically collected, which was used to determine the correct order of disulfide arrangement. This trio was the only peak not present in an identical run where the peptide digest was reduced with 10 mM dithiothreitol for 1 h at 37°C.
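The relationship between the observed mass/charge ratio and the neutral mass of the disulfide-bonded peptide trio can be written out explicitly. The sketch below is illustrative only: it assumes the ion carries three protons as stated, uses the proton mass of about 1.00728 Da, and uses hypothetical function names; it says nothing about how the instrument software performed the assignment.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def neutral_mass(mz, charge):
    """Neutral mass M of an [M + zH]z+ ion observed at the given m/z."""
    return charge * (mz - PROTON_MASS)


def expected_mz(mass, charge):
    """m/z at which a species of neutral mass M appears with z protons."""
    return mass / charge + PROTON_MASS


# Triply protonated ion reported in the text.
mass = neutral_mass(1315.8, 3)
print(f"neutral mass of the peptide trio: {mass:.1f} Da")

# The same species would appear at other m/z values in other charge states.
for z in (2, 3, 4):
    print(f"z = {z}: m/z = {expected_mz(mass, z):.1f}")
```

After reduction with dithiothreitol the three peptides are no longer cross-linked, which is why the peak at this m/z would be expected to disappear from the reduced digest.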
Glycosylation Analysis-ScanProsite was used to identify potential N-glycosylation sites, while the NetOGlyc 2.0 prediction server was used to predict potential GalNAc O-glycosylation sites of α1(XI) NTD. To assess whether the recombinant p[6a+8] protein was glycosylated, a sample was dialyzed against 20 mM sodium phosphate, pH 7.2, and incubated overnight at 37°C with either 0.5 units of N-glycosidase F alone or in combination with 0.25 milliunits of O-glycosidase (Roche Molecular Biochemicals). Digested and undigested control samples were separated on a 7.5% SDS-PAGE gel and stained with Coomassie Blue.
Molecular Sieve Chromatography-For analysis of protein shape, p[6a+8] recombinant protein was passed over a Superose 12 molecular sieve column. This was performed in both denaturing (8 M urea, 14 mM β-mercaptoethanol, 0.1 M NaCl, 0.1 M Tris-HCl, pH 7.1, 0.1% Triton X-100) and non-denaturing conditions (0.1 M NaCl, 0.1 M Tris-HCl, pH 7.1). The absorbance was monitored continuously at 280 nm, and the flow rate was 0.25 ml/min. SDS-PAGE was used to analyze every second fraction of the denaturing buffer run to determine the elution position of p[6a+8]. The elution position was compared with a set of known globular proteins (670 to 1.35 kDa (Bio-Rad); 67 to 13.7 kDa (Amersham Pharmacia Biotech)) resuspended in identical buffers and run immediately prior to each of the experimental runs.
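An apparent molecular mass is conventionally read from such a column by fitting the logarithm of the standards' masses against their elution volumes and interpolating. The sketch below shows that calculation under stated assumptions: a straight-line fit of log10(mass) versus volume, hypothetical function names, and elution volumes that are invented for illustration and are not data from this study.

```python
import math


def fit_calibration(standards):
    """Least-squares line log10(mass) = intercept + slope * volume.

    `standards` is a list of (elution_volume_ml, mass_da) pairs.
    """
    xs = [volume for volume, _ in standards]
    ys = [math.log10(mass) for _, mass in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apparent_mass(fit, volume):
    """Apparent mass (Da) of a species eluting at `volume` ml."""
    slope, intercept = fit
    return 10 ** (intercept + slope * volume)


# Illustrative standards only: the volumes are made up, not measured values.
example_standards = [(9.0, 670_000), (11.5, 158_000), (13.5, 44_000),
                     (15.0, 17_000), (19.0, 1_350)]
fit = fit_calibration(example_standards)
print(f"apparent mass at 10.8 ml: {apparent_mass(fit, 10.8) / 1000:.0f} kDa")
```

Because the calibration is built from globular proteins, an extended molecule elutes earlier than a globular protein of the same mass, so the value returned is an apparent mass that reflects shape as well as size.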
Rotary Shadowing-The purified isoforms (10 µg/ml) were dialyzed against 0.2 M ammonium bicarbonate, sprayed onto mica, and shadowed with platinum (30). The images were viewed by transmission electron microscopy. Measurements were estimated from photomicrographs using a calibrated eyepiece. Only molecules with defined, extended tails were measured. To make the measurements more accurate, the size of the platinum particles, measured from the background, was subtracted from this value. Statistical analysis was performed and a standard error of the mean calculated.
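The correction and summary statistics described above amount to a short calculation. The following sketch assumes that a single platinum grain size is subtracted from every measurement and that the standard error is the sample standard deviation divided by the square root of the number of molecules; the function names and the example measurements are hypothetical.

```python
import math
import statistics


def corrected_lengths(measured_nm, grain_nm):
    """Subtract the platinum grain size from each measured dimension."""
    return [value - grain_nm for value in measured_nm]


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical tail measurements (nm) and grain size, for illustration only.
measured = [17.5, 18.2, 19.0, 18.4, 17.9]
mean, sem = mean_and_sem(corrected_lengths(measured, grain_nm=2.0))
print(f"tail length: {mean:.1f} (± {sem:.2f}) nm")
```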
Circular Dichroism-Purified recombinant isoforms (0.5-1 mg/ml) were dialyzed extensively against 0.1 M sodium phosphate, pH 7.2. Each sample was analyzed in the far-UV range (260-180 nm) at 24°C by CD spectroscopy on a Jasco J-500 spectropolarimeter interfaced to a computer, using a quartz cell with a path length of 100 µm. Spectra were smoothed, and buffer base lines were subtracted. The secondary structure was subsequently estimated using the variable selection method with an initial set of 33 basis spectra (31). This method of analysis is thought to be advantageous in estimating the β-sheet content of a protein (32). The concentration of each sample was determined in triplicate by amino acid analysis. Samples were gas-phase hydrolyzed using 300 µl of 6 N HCl, 2% phenol at 110°C for 22 h, and the hydrolysates were analyzed on a Beckman 6300 amino acid analyzer using sodium citrate buffers. For determination of the structure of the variable region, the difference between the normalized spectra of the α1-Npp and the p[6a+8] was obtained using a calculation that accounted for the different number of amino acids present in each domain. The resultant spectrum of the variable domain was then solved.
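One way to account for the different number of residues when forming a difference spectrum is to weight each mean residue ellipticity by its residue count. The sketch below is an assumption about the form of that calculation, not the authors' stated formula: it assumes both spectra are expressed as mean residue ellipticity on the same wavelength grid and that the contributions of the two domains are additive. The function name and all numbers are hypothetical.

```python
def variable_region_spectrum(theta_full, theta_npp, n_full, n_npp):
    """Residue-weighted difference spectrum of the variable region.

    Assumes additivity: n_full * theta_full = n_npp * theta_npp
                                              + n_var * theta_var
    at every wavelength, where n_var = n_full - n_npp.
    """
    n_var = n_full - n_npp
    if n_var <= 0:
        raise ValueError("full-length protein must be longer than the Npp")
    if len(theta_full) != len(theta_npp):
        raise ValueError("spectra must share the same wavelength grid")
    return [(n_full * full - n_npp * npp) / n_var
            for full, npp in zip(theta_full, theta_npp)]


# Hypothetical mean residue ellipticities at three wavelengths.
theta_full = [-5000.0, -4200.0, 1500.0]
theta_npp = [-3000.0, -3900.0, 6000.0]
print(variable_region_spectrum(theta_full, theta_npp, n_full=300, n_npp=200))
```

The residue weighting matters because a simple subtraction of mean residue ellipticities would understate the contribution of the shorter domain.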
Secondary Structure Predictions-The secondary structure of the α1-Npp and variable region of the α1(XI) NTD were predicted from the following programs: (a) information theory with nil decision constants (Gor1; Ref. 33), ... (40), and (i) nnPredict (41). These prediction programs can all be accessed at the EXPASY web server, where the default options, if any (e.g. for window lengths and similarity thresholds), were chosen. Not all these programs predicted the same secondary structure for each sequence, so the prediction was based on agreement of at least four out of the nine methods implemented.
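The agreement rule can be expressed as a per-residue vote. The sketch below is one possible reading of "at least four out of the nine methods", with assumptions stated: each method returns a string of one-letter states (H, E, C) of equal length, and a residue is left unassigned ("?") when no state reaches the threshold or when two states tie. The tie rule and the function name are hypothetical; the source does not specify them.

```python
from collections import Counter


def consensus(predictions, min_agree=4):
    """Per-residue consensus across several secondary structure predictions."""
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for states in zip(*predictions):
        counts = Counter(states)
        top = max(counts.values())
        winners = [state for state, count in counts.items() if count == top]
        # Assign a state only if it is the unique winner and meets the threshold.
        if top >= min_agree and len(winners) == 1:
            result.append(winners[0])
        else:
            result.append("?")
    return "".join(result)


# Nine hypothetical three-residue predictions.
methods = ["HEC", "HEC", "HEC", "HEC", "CEH", "CEH", "CEH", "EEH", "ECE"]
print(consensus(methods))
```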

RESULTS
Purification of Recombinant Protein-The α1-Npp domain alone or the complete p[6a+8] isoform (Fig. 1b) were expressed as recombinant proteins in prokaryotic and eukaryotic systems, respectively, as sufficient quantities could not be purified from cartilage. Recombinant p[6a+8] was initially purified by chelating affinity chromatography utilizing the 6-histidine tag incorporated into the construct (Fig. 2a). SDS-PAGE analysis and Western blotting with monoclonal antibodies directed against the 6-histidine tag indicated that the material in pool B contained the recombinant protein (results not shown). However, Coomassie Blue staining of the same material showed the presence of a high Mr contaminating protein (Fig. 2c, lane 2). The fractions containing partially purified recombinant protein were pooled, concentrated and passed over a Superose 12 molecular sieve column (Fig. 2b). SDS-PAGE analysis and Coomassie Blue staining of the peak fractions indicated that the high Mr contaminating protein was successfully removed (Fig. 2c, lane 3), leaving substantially pure recombinant p[6a+8] protein in the second peak (Fig. 2c, lane 4). These fractions were pooled and the protein concentrated for use in structural analyses. Recombinant α1-Npp was purified in an identical manner and the purified product is shown (Fig. 2c, lane 5). The typical yields of purified recombinant protein were 1 mg of p[6a+8]/4 liters of DMEM/F-12 medium or 1 mg of α1-Npp/liter of LB medium.
Analysis of Folding by Disulfide Mapping-The disulfide bonds formed by the recombinant p[6a+8] protein were analyzed by peptide mapping under non-reducing conditions. After digestion with cyanogen bromide and chymotrypsin, Edman sequence analysis of disulfide-linked peptides revealed that Cys-1 was linked to Cys-4 and that Cys-2 was linked to Cys-3 (Fig. 3a). There remains a possibility that a small amount of protein was not folded correctly, but as the protein migrated as a single species on SDS-PAGE and eluted as a single, symmetrical peak on molecular sieve chromatography, it is likely that most, if not all, of the protein was folded correctly.
The proper disulfide bond pairing of the recombinant α1-Npp was confirmed by LC-MS/MS. Disulfide analysis by LC-MS/MS of a tryptic digest of α1-Npp resulted in the detection of the mass/charge ratio for a trio of predicted disulfide cross-linked peptides with two possible cross-linked conformations (Fig. 3b). A tandem mass spectrum corresponding to this peptide revealed the correct disulfide order because disulfides did not break under the conditions employed and amino acid sequence could be verified across the S-S bonds. Only a single peak disappeared in a comparison between native and reduced tryptic digests, and no other masses corresponding to any other predicted disulfide-linked peptides were identified, suggesting that the conformation detected was the only one present in the recombinant protein. The disulfide bond pattern detected for both the eukaryotic and prokaryotic recombinant proteins was identical to that of the highly homologous α2-Npp domain isolated from cartilage (11), indicating that the recombinant proteins are likely to be correctly folded.

Glycosylation Analysis of Recombinant p[6a+8]-ScanProsite was used to analyze N-glycosylation sites in α1-Npp and the variable region. The sequence -NISE- was identified as a potential site within the p8 peptide. To assess whether the recombinant p[6a+8] was glycosylated, a sample was digested with N-glycosidase F, separated by SDS-PAGE, and stained with Coomassie Blue. It was found that the N-glycosidase-digested sample migrated slightly further on SDS-PAGE compared with undigested control samples (Fig. 4). This was also true for recombinant p8 protein, and in both cases double digestion with N- and O-glycosidase did not further increase the rate of migration (results not shown). Moreover, during the amino acid sequence analysis performed to map the disulfide bonds, the asparagine residue of the amino acid sequence -NISE- was absent, consistent with an N-glycosylation at this site within the p8 peptide.
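The kind of motif search that flags -NISE- can be reproduced with a regular expression. This sketch uses the PROSITE N-glycosylation consensus N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline); the function name and the test sequence are hypothetical, and the sequence is a toy string, not the p8 peptide.

```python
import re

# PROSITE-style consensus N-{P}-[ST]-{P}; the lookahead allows overlapping hits.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(sequence):
    """Return (1-based position, matched tetrapeptide) for each potential site."""
    return [(match.start() + 1, match.group(1))
            for match in SEQUON.finditer(sequence.upper())]


# Toy sequence containing the tetrapeptide discussed in the text.
print(find_sequons("GANISEL"))
# A proline in the second position abolishes the match.
print(find_sequons("GANPSEL"))
```

A match only marks a potential site; whether it is occupied has to be shown experimentally, as with the N-glycosidase F digestion described above.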
Molecular Sieve Analysis of Eukaryotic Recombinant p[6a+8]-Following the molecular sieve purification of the p[6a+8] isoform, standard globular proteins were passed over the column in identical buffer conditions. It was observed that, in non-denaturing buffer, the p[6a+8] eluted at a much greater apparent molecular mass (~280 kDa) than its mass calculated from its amino acid sequence (Fig. 5a). The elution of p[6a+8] as a single, sharp, symmetric and included peak in non-denaturing conditions was an indication that the greater apparent molecular mass was not due to multimer formation, but rather an asymmetric conformation. When chromatographed under denaturing conditions (8 M urea, 14 mM β-mercaptoethanol, 0.1% Triton X-100; Fig. 5b), the p[6a+8] eluted at a position corresponding to approximately 66 kDa, the same apparent molecular mass determined by SDS-PAGE. Moreover, the α1-Npp domain eluted at a position corresponding to its molecular mass in non-denaturing conditions (results not shown), suggesting that asymmetry resides in the variable region.
Rotary Shadowing-The indication of an asymmetric conformation of p[6a+8] was investigated further by use of rotary shadowing and electron microscopy. Rotary-shadowed images of both the recombinant α1-Npp and the α2-Npp domain showed a homogeneous population of small round particles, indicating formation of a compact, globular domain (Fig. 6, a and b). This observation is similar to the rotary shadowing images of the NC11 domain, a homologous domain found in type XVI collagen (42). Conversely, the p[6a+8] isoform, which contains the variable region in addition to the Npp domain, displayed two distinct folding domains: a globular "head" and an extended "tail" (Fig. 6c). Measurement of these images revealed that the α1-Npp and α2-Npp domains were similar in size, 8.4 (± 0.17) nm and 10 (± 0.16) nm in diameter, respectively (Fig. 6, a and b). These measurements are only approximate since the size of the Npp domain approaches the limits of resolution of this technique. The extended tail of the p[6a+8] isoform was approximately 16 (± 0.4) nm long (Fig. 6c). Rotary shadowing of p8 and p6b recombinant proteins also displayed a globular head and an extended tail (results not shown). It is possible that the rotary shadowing technique may exaggerate the appearance of extension and rigidity. However, molecular sieve chromatography indicated a highly asymmetric conformation in solution so that the variable region, although probably somewhat flexible, is quite extended on average.
Mass/Length Determination-The mass/length ratio of the variable region was determined from rotary-shadowed images and was found to be in the range of 1050-1530 Da/nm. When compared with other extended protein structures, the mass/length ratio of the variable region is similar to a two-stranded coiled-coil (1420 Da/nm; Ref. 43) and a collagen triple helix (1000 Da/nm), two of the most extended polypeptide structures. It should be noted, however, that in each of these comparisons more than one peptide chain contributes to the overall structure. Our results suggest that an equally extended structure can be achieved by a single polypeptide without benefit of significant amounts of periodic secondary structure such as α-helix and β-sheet (see below). We cannot exclude the possibility that, in vivo, additional structure results from an interaction between this region and the variable domain of the α2(XI) chain, although this seems unlikely as the α2(XI) protein isoform predominantly expressed in cartilage lacks the alternatively spliced variable region (19).
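The mass/length ratio is simply the mass of a segment divided by its measured length, which also means a reported ratio implies a mass for a given length. The sketch below is illustrative arithmetic only: it assumes the 16-nm tail length reported above applies across the quoted range, which the source does not state, and the implied masses are back-calculated rather than measured. Function names are hypothetical; the two reference ratios are those quoted in the text.

```python
# Reference ratios quoted in the text (Da/nm).
REFERENCE_RATIOS = {
    "two-stranded coiled-coil": 1420.0,
    "collagen triple helix": 1000.0,
}


def mass_per_length(mass_da, length_nm):
    """Mass/length ratio in Da/nm."""
    if length_nm <= 0:
        raise ValueError("length must be positive")
    return mass_da / length_nm


def implied_mass(ratio_da_per_nm, length_nm):
    """Mass (Da) implied by a mass/length ratio over a given length."""
    return ratio_da_per_nm * length_nm


# Assumption: a 16-nm tail; masses below are derived, not reported, values.
for ratio in (1050.0, 1530.0):
    print(f"{ratio:.0f} Da/nm over 16 nm implies {implied_mass(ratio, 16.0):.0f} Da")

for name, ratio in REFERENCE_RATIOS.items():
    print(f"{name}: {ratio:.0f} Da/nm")
```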
Circular Dichroism-Experimental evidence for the secondary structure of the recombinant α1-Npp and p[6a+8] proteins was obtained by CD spectroscopy (Fig. 7). The α1-Npp spectrum showed negative ellipticity at 216 nm and a peak of positive ellipticity at 198 nm, which together are indicative of β-sheet structure (Fig. 7a; Refs. 44 and 45). The lack of negative minima at 208 and 222 nm and positive ellipticity at 192 nm suggested that this protein contained relatively little α-helix. The α2-Npp spectrum was slightly different in that the curve was broader between 216 and 208 nm, and the positive ellipticity peak had shifted from 198 to 193 nm (Fig. 7b). Both these observations suggested that this protein contained a relatively higher amount of α-helix than the α1-Npp domain. Analysis of these spectra using the variable selection method (31) indicated that the α1-Npp domain contained 10% α-helix, 33% β-sheet, 23% β-turn, and 34% "other" structure while the α2-Npp domain contained 20% α-helix, 29% β-sheet, 21% β-turn, and 30% "other" structure (Table I). These structural calculations, including the higher proportion of α-helix in α2-Npp, were consistent with secondary structure predictions from the amino acid sequence (Table I).
The p[6a+8] spectrum is similar to that of the Npp domains, in that there is a lack of minima at the wavelengths indicative of α-helical structure (Fig. 7c). However, the spectra differ in that the positive ellipticity peak at 198 nm is absent. Analysis of the p[6a+8] spectrum estimated that it contained 9% α-helix, 22% β-sheet, 26% β-turn, and 44% "other" structure, again consistent with secondary structure predictions (Table I). This recombinant isoform contained proportionately less β-sheet than the α1-Npp domain alone, suggesting that the variable region is devoid of β-sheet structure (see below). The glycan chains did not significantly affect the secondary structure of the p[6a+8] isoform since their removal had no effect on CD spectra determinations (results not shown).
The secondary structure of the p[6a+8] variable region was deduced from the CD spectra of α1-Npp and p[6a+8]. A difference spectrum was generated from these two spectra (Fig. 7d; Table I), and the resulting analysis indicated that the p[6a+8] variable region was nearly devoid of periodic secondary structure and contained 75% β-turn and "other" structure. The results of this analysis were in good agreement with secondary structure predictions of the variable region, although somewhat more β-structure was detected than predicted (Table I). A similar difference analysis of the p[6a+8] and p8 spectra indicated that the p6a component contained mainly β-turn (results not shown).

FIG. 3. Disulfide mapping of p[6a+8] and α1-Npp. a, results of Edman sequence analysis of p[6a+8] revealed that Cys-1 is linked to Cys-4 and that Cys-2 is linked to Cys-3. A dash (-) denotes a blank in the sequence indicating the first half of a cysteine pair, and an asterisk (*) denotes a cysteine residue. A schematic representation of the primary structure of the α1-Npp domain shows the enzymatic digestion sites and the disulfide bond arrangement. Solid lines represent disulfide bonds, rectangles depict CNBr cleavage sites, and diamonds represent chymotrypsin cleavage sites. b, tandem mass spectrum of a trio of disulfide-bonded peptides with a precursor mass/charge ratio of 1315.8 ([M + 3H]³⁺). The two potential disulfide bonding scenarios are shown. This large ion was fragmented on average at a single peptide bond along the chain. The two resulting fragment (b and y) ions were recorded in the tandem mass spectrum and were later reconciled with the sequence. Because disulfide bonds are not broken here, the peptide sequence can be followed across the S-S bond, and the correct bond orientation determined. Arrows represent peptide bond cleavage between the two cysteines of the middle peptide producing b and/or y ions, which unambiguously determined disulfide bond assignment.

FIG. 4. Glycosylation analysis of p[6a+8]. Recombinant p[6a+8] protein was separated by SDS-PAGE (7.5% gel, reducing conditions), with (+) or without (−) prior treatment with N-glycosidase F. Coomassie Blue staining showed that the treated sample migrated faster, indicating the removal of N-linked glycan chains. Molecular size markers are shown (kDa).

DISCUSSION
The model for the regulation of fibrillogenesis by type XI collagen is related to that proposed for type V collagen, where the NTDs play the critical role in control of fibril diameter. For example, in the heterotypic collagen fibrils of the cornea, the triple helical domains of type V collagen are sequestered in the interior of the fibril, whereas the retained NTDs of the α1(V) chains are excluded from the interior of the fibril, accumulate on the surface, and thus eventually sterically hinder the further deposition of type I collagen molecules onto the fibril (23-25). A similar mechanism has been proposed for type XI collagen and is supported by in vitro fibrillogenesis assays with cartilage collagens that demonstrate uniformly thin collagen type II fibrils will assemble only when collagen XI is present (46). In the following discussion, emphasis has been placed on the effects of α1-NTD on fibrillogenesis because it is relatively long-lived in the tissue, compared with the α2-Npp, which is rapidly processed after synthesis (47).
The major triple helical domain of type XI collagen is also sequestered within the collagen fibril of fetal cartilage (1, 3). However, it is clear from the molecular dimensions of the α1-Npp, the variable region, and the minor triple helix that they would not be accommodated within the confines of the gap region of a fibril (1.5/2.5 × 32 nm; Fig. 8a; Ref. 46). Indeed, the α1-Npp has been localized on the surface of collagen fibrils in fetal cartilage (48), and the same is true for the variable region peptides.³ It is not known, however, where the minor helix resides, whether it remains in the interior of the fibril or functions to extend the NTD to the fibril surface, so both possibilities are shown (Fig. 8b). In terms of length, the minor helix alone (~20 nm) would be sufficient to exclude the α1-Npp from the interior of the 20-nm diameter fibrils, although the variable region (16 nm in the most extended form) would also suffice. The NTD may turn back on itself, as has been shown with the N-telopeptides of type I collagen (49), or it may adopt other orientations with respect to the fibril (Fig. 8c). All possibilities still allow the N-terminal region, especially the globular α1-Npp domain, to limit the appositional growth of the fibril by blocking the further accretion of collagen monomers.
However, a mechanism of regulation of fibrillogenesis based solely on consideration of the steric effect of the Npp may be simplistic. For example, despite their homology and similar molecular dimensions (including the Npp), 10% content of type XI collagen is sufficient to produce the 15-20-nm fibrils of cartilage (1), whereas 20-30% content of type V collagen is required to obtain the 25-nm fibrils of the corneal stroma (24). In addition, if fibril diameter is strictly governed by the steric effect of N-propeptide domains, then one would expect collagen fibrils composed of pN-collagen to be thinner than normal. However, in patients with Ehlers-Danlos syndrome type VIIB and in animals with dermatosparaxis, diseases where fibrils contain predominantly pN-collagen I molecules, normal diameter fibrils are found. Instead, their surfaces are highly convoluted as surface area is maximized to accommodate these extra propeptide domains (50,51). Thus it is not likely that a steric effect alone is sufficient to explain the effects of retained N-propeptides on fibrillogenesis in these tissues. Other factors clearly influence fibrillogenesis in all of these systems, factors such as rate and order of propeptide processing of the major collagens, types I and II (52), and the participation of fibril-associated proteins such as decorin, fibromodulin, and TRAMP (53-55).
The CD calculations presented here are consistent with previous secondary structure predictions, which indicate that the β-sheet of the α1-Npp domain could be distributed into nine β-strands (15,45). This region is structurally related to a domain of the glycoprotein thrombospondin that is termed the Tsp1 module (16). This module, which is compatible with an Ig-fold structure, has also been identified in other collagen chains, namely α1(V), α1(IX), α2(IX), α2(XI), α1(XII), α1(XIV), α1(XVI), and α1(XIX) (15,16,42). A superfamily of proteins that contain Ig-like folds has thus emerged, members of which are related due to their structural similarity rather than their sequence identity. Members of this family have been shown to contain 7-10 β-strands and are classified into five different subtypes (56). Although the β-structure of the Npp domains could be arranged into nine β-strands, it remains unclear as to which subset of Ig-fold they belong, and this will only be resolved by x-ray crystallography. Nevertheless, the results presented have broad implications for the structure and function of this motif in other extracellular matrices.

FIG. 5. [...] As the detergent interferes with absorbance at 280 nm, the elution position of p[6a+8] was determined by running every second fraction on SDS-PAGE. The molecular sizes of globular standard proteins in both buffers were plotted as a function of their elution volume, and the apparent molecular mass of p[6a+8] was calculated from the standard curves. In non-denatured buffer (a), p[6a+8] was found to elute at ~280 kDa, whereas, under denaturing conditions (b), it eluted at ~66 kDa, which is consistent with the position it migrates on SDS-PAGE. The anomalous migration of p[6a+8] under non-denatured conditions is indicative of an asymmetric conformation, rather than aggregation or multimerization. It should be noted that, although it looks like the protein elutes at the same position, this is a chance occurrence as urea changes the sieving capacity of the column.

FIG. 7. [...] and the p[6a+8] isoform (c). All spectra were smoothed, and buffer base lines were subtracted. d, a difference spectrum was generated from the α1-Npp and p[6a+8] spectra to determine the structure of the variable region alone.

The function of the variable region is unknown, but the data presented here suggest several possibilities. The location of the variable region on the fibril surface suggests that, in addition to facilitating the exclusion of the globular α1-Npp domain from the fibril core, it may also participate in interactions with other matrix components, thus greatly magnifying the steric effect or mitigating fibril fusion. Such an interaction could contribute to the biomechanical integrity of cartilage and is consistent with the loss of cohesion observed in the cartilage of the Cho mouse (21,22). The variable region peptides are either acidic (p6a and p8) or basic (p6b; Ref. 18), which suggests a potential for differential interactions, and it should be noted that the p6b peptide contains the minimal amino acid sequence for heparin binding (b-X7-b; Ref. 57). Alternatively, the identity of the variable region could influence the rate or extent of proteolytic removal of the α1-Npp. The proteolytic processing site is positioned seven amino acids distal to the variable domain at the sequence KAAQA↓QEPHIDE (13,14). Recent studies have implied that BMP-1 is responsible for processing as this enzyme cleaves the Npp from α1(V) at a homologous sequence (58). This enzyme is also responsible for cleavage of the C-propeptide from type I collagen, and molecular modeling shows that this region has a highly extended structure (59). This implies that BMP-1 requires an extended substrate structure for activity. Preliminary results suggest that the α1-Npp is cleaved more rapidly from an isoform containing an extended domain (p6b) than from p0, which contains no variable region. 4 Interestingly, this does not seem to affect fibril diameter, as there was no difference between fibrils that contained the p6b isoform and those that did not. 3 This suggests that the differential rate of processing may serve a different function.
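The b-X7-b heparin-binding motif mentioned above (a basic residue, any seven residues, then another basic residue) can be expressed as a simple pattern search. The sketch below is illustrative only: it assumes that "b" means lysine or arginine (some definitions also count histidine), the function name is hypothetical, and the example sequence is invented rather than taken from p6b.

```python
import re

# Assumption: "b" (basic residue) is Lys (K) or Arg (R) only.
BASIC = "KR"

# A lookahead is used so that overlapping occurrences are all reported.
MOTIF = re.compile(rf"(?=([{BASIC}][A-Z]{{7}}[{BASIC}]))")


def find_b_x7_b(sequence):
    """Return (0-based start, matched 9-mer) for every b-X7-b occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Invented one-letter sequence for demonstration; NOT the p6b sequence.
example = "AAK" + "A" * 7 + "RAA"
for start, nine_mer in find_b_x7_b(example):
    print(start, nine_mer)
```

A match only shows that the minimal spacing of basic residues is present; whether a peptide actually binds heparin would still need to be tested experimentally.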
In addition, the extended conformation of the p[6a+8] isoform suggests that the alternatively spliced peptides of the variable region are organized linearly. This arrangement would readily accommodate the complex modulation of its primary structure by alternative splicing and would also facilitate potential alterations in the function of each individual isoform.
In conclusion, the structural organization of the NTD presented here, the differential rate and extent of processing of the α1-Npp and α2-Npp (47), and the developmentally regulated distribution of isoforms based on alternative splicing within the variable region 3 together suggest that the NTD may contribute to functions of type XI collagen more diverse than regulation of fibrillogenesis.
The numerical values indicate the percentage amount of each type of structure. Each value is determined from either CD spectra or secondary structure predictions. Both methods agree that β-sheet is the most abundant regular structure of the α1-Npp and α2-Npp. The p[6a+8] contained proportionately less β-sheet than the α1-Npp alone, suggesting that the variable region lacked β-sheet structure. The bottom line of the table depicts the structure of the variable region deduced from the difference of the α1-Npp and p[6a+8] spectra. It was found that the variable region alone contained little periodic secondary structure.
FIG. 8. Schematic model of the structure of the NTD of type XI collagen. a, measurements taken from the rotary-shadowed images show that the α1-Npp domain is 8 nm in length and the variable region is 16 nm, while the minor helix was calculated to be ~20 nm, assuming a mass/length ratio of 1000 Da/nm. The α2(XI) PARP domain is rapidly cleaved during fibrillogenesis, while the α3 chain does not contain a large globular domain at the N terminus. b, the dimensions of the variable region indicate that the α1(XI) NTD is not fully accommodated within the gap region of the heterotypic fibril and thus the α1-Npp domain and part of the variable region must be forced out of the interior to the surface of the fibril. It is not known whether the minor helix resides on the interior of the fibril or is flexible and functions to extend the NTD to the fibril surface. c, three-dimensional view of a heterotypic collagen fibril showing the interior location of the triple helical portion of collagen XI and the surface location of the NTD. The conformation that the NTD adopts at the fibril surface is unclear. The NTD may run parallel with the fibril, or it may turn back on itself, or the variable region may lie perpendicular to the fibril. In all conformations, the NTD would sterically hinder the accretion of collagen II molecules to the surface and therefore function to limit the diameter of the growing fibril.
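The length arithmetic behind the dimensions quoted in the FIG. 8 legend can be written out as a short sketch. The 8 nm and 16 nm values and the 1000 Da/nm mass/length ratio come from the legend; the 20,000 Da mass used for the minor helix is a hypothetical figure back-calculated from the stated ~20 nm and is not reported in the text. The function names are illustrative.

```python
def helix_length_nm(mass_da, da_per_nm=1000.0):
    """Length of a triple-helical segment from its mass and a mass/length ratio."""
    if da_per_nm <= 0:
        raise ValueError("mass/length ratio must be positive")
    return mass_da / da_per_nm


def ntd_extent_nm(npp_nm, variable_nm, minor_helix_nm):
    """End-to-end length if the three regions were laid out fully extended.

    This is an upper bound: the legend notes that the NTD may fold back
    on itself or lie perpendicular to the fibril.
    """
    return npp_nm + variable_nm + minor_helix_nm


NPP_NM = 8.0        # from the rotary-shadowed images (legend, panel a)
VARIABLE_NM = 16.0  # from the rotary-shadowed images (legend, panel a)
MINOR_HELIX_MASS_DA = 20_000.0  # hypothetical, back-calculated from ~20 nm

minor_helix_nm = helix_length_nm(MINOR_HELIX_MASS_DA)
total_nm = ntd_extent_nm(NPP_NM, VARIABLE_NM, minor_helix_nm)
print(f"minor helix: {minor_helix_nm:.0f} nm, extended NTD: {total_nm:.0f} nm")
```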