Three Novel Collagen VI Chains, α4(VI), α5(VI), and α6(VI)*

We report the identification of three new collagen VI genes at a single locus on human chromosome 3q22.1. The three new genes, COL6A4, COL6A5, and COL6A6, encode the α4(VI), α5(VI), and α6(VI) chains, respectively. In humans, the COL6A4 gene has been disrupted by a chromosome break. Each of the three new collagen chains contains a 336-amino acid triple helix flanked by seven N-terminal von Willebrand factor A-like domains and two (α4 and α6 chains) or three (α5 chain) C-terminal von Willebrand factor A-like domains. In humans, mRNA expression of COL6A5 is restricted to a few tissues, including lung, testis, and colon. In contrast, the COL6A6 gene is expressed in a wide range of fetal and adult tissues, including lung, kidney, liver, spleen, thymus, heart, and skeletal muscle. Antibodies to the α6(VI) chain stained the extracellular matrix of human skeletal and cardiac muscle, lung, and the territorial matrix of articular cartilage. In cell transfection and immunoprecipitation experiments, the mouse α4(VI)N6-C2 chain co-assembled with endogenous α1(VI) and α2(VI) chains to form trimeric collagen VI molecules that were secreted from the cell. In contrast, α5(VI)N5-C1 and α6(VI)N7-C2 chains did not assemble with α1(VI) and α2(VI) chains and accumulated intracellularly. We conclude that the α4(VI)N6-C2 chain contains all the elements necessary for trimerization with α1(VI) and α2(VI). In summary, the discovery of three additional collagen VI chains doubles the collagen VI family and adds a layer of complexity to collagen VI assembly and function in the extracellular matrix.

Collagen VI is an extracellular component that is present in virtually all connective tissues, where it forms abundant and structurally unique microfibrils in close association with basement membranes. Collagen VI interacts with a range of extracellular matrix (ECM) components. However, its precise role is not clearly understood. Several recent studies have suggested that collagen VI functions to anchor the basement membrane to the pericellular matrix in muscle (1-3). Other data suggest a role for collagen VI in cell signaling and cell migration (4,5).
Three genetically distinct collagen VI chains, α1(VI), α2(VI), and α3(VI), encoded by the COL6A1, COL6A2, and COL6A3 genes, were first described more than 20 years ago (6-8). The COL6A1 and COL6A2 genes are located in tandem on chromosome 21q22.3. The α1(VI) and α2(VI) chains are similar in size and domain structure. They contain a short 335- or 336-amino acid triple helix with a glycine triplet repeat motif that is characteristic of all collagens. Flanking the triple helix are domains homologous to the A-type domains found in von Willebrand factor (VWA domains). α1(VI) and α2(VI) contain one VWA domain N-terminal to the triple helix (N1) and two VWA domains on the C-terminal flank of the helix (C1 and C2). In contrast, the α3(VI) chain, encoded by the COL6A3 gene on 2q37.3, is much larger with 10 N-terminal VWA domains (N1-N10), two C-terminal VWA domains (C1 and C2), and several other identifiable types of domains at the C terminus (C3-C5).
Although the assembly of collagen VI is a complex process, some of the major details have been described (6-16). The first step is monomer formation where the three chains trimerize in a 1:1:1 ratio. The exact mechanism of how this occurs is unknown, although there is evidence that the C1 domains immediately C-terminal to the triple helix are involved in initial chain selection (15,16). These heterotrimeric monomers then associate in an anti-parallel fashion to form dimers, followed by the lateral association of dimers into tetramers. Dimers and tetramers are stabilized by disulfide bonds formed between cysteine residues present in the triple helix of all three chains (17-19). The tetramers are then secreted from the cell where they associate in an end-to-end manner to form collagen VI microfibrils. These microfibrils have a characteristic beaded appearance of globular domains separated by a rod-like triple helix with 105 nm periodicity.
Mutations in any of the three collagen VI chains lead to two types of congenital myopathies, the relatively mild Bethlem myopathy (BM) and the more severe Ullrich congenital muscular dystrophy (UCMD) (reviewed in Ref. 20). It is increasingly recognized that UCMD and BM are part of the same clinical spectrum, and the effect of a particular mutation on the production of collagen VI determines the severity. Interestingly, in a significant proportion of individuals affected with these myopathies, mutations in COL6A1, COL6A2, or COL6A3 cannot be found (21,22) suggesting that other molecules are involved in the pathology of these diseases.
Here we report the identification and characterization of three additional collagen VI genes, COL6A4, COL6A5, and COL6A6. These genes are clustered at a single genomic locus in mammals and are located in humans on the q arm of chromosome 3. The new chains encoded by these genes, α4(VI), α5(VI), and α6(VI), are most similar to the α3(VI) chain. However, evidence is presented that only the α4(VI) chain encompassing the N6 to C2 domains, but not the α5(VI)N5 to C1 or α6(VI)N7 to C2 chains, is competent to co-assemble with α1(VI) and α2(VI) chains. This finding suggests either that alternative collagen VI assemblies are possible or that the regions critical for assembly lie outside N5 to C1 for the α5(VI) chain or N7 to C2 for the α6(VI) chain.

EXPERIMENTAL PROCEDURES
Cloning of Col6a4, COL6A5, and COL6A6 cDNA Sequences-Partial and full-length human COL6A5 and COL6A6 and mouse Col6a4 cDNAs were amplified by PCR from cDNA templates and cloned into the pCCR1 vector using the CopyControl PCR cloning kit (Epicenter Biotechnologies). Primer sequences were based on genomic and expressed sequence tag (EST) sequences in the public data bases. To ensure high fidelity of amplification and to minimize PCR error, the Expand Long Range PCR system (Roche Applied Science) was used for all PCRs. The cDNAs used as templates for the PCR were mouse uterus cDNA (Zyagen Laboratories) for Col6a4; human testis, small intestine, and colon cDNA (Clontech) for COL6A5; and human adult skeletal muscle cDNA (BioChain Institute) for COL6A6. The mouse Col6a4 coding region sequence was assembled from overlapping EST sequences obtained from Open Biosystems (clone IDs 3994905, 4501669, 4986515, 355451, and 5504348) and partial PCR sequences amplified from uterus cDNA. The human COL6A5 and COL6A6 coding sequences were deduced from overlapping partially amplified cDNA sequences. cDNA clones spanning N6 to C2 for Col6a4, N5 to C1 for COL6A5, and N7 to C2 for COL6A6 were cloned as single PCR products into pCCR1 using primers described in supplemental Table 1.
For transfection studies, cDNAs for α4(VI)N6-C2, α5(VI)N5-C1, and α6(VI)N7-C2 in pCCR1 were amplified using primers modified to include NotI restriction sites (see supplemental Table 1). Following digestion with NotI, the PCR products were subcloned into the pCEP4 expression vector modified to include the BM-40 signal sequence (23) for efficient secretion. The modified pCEP4 vector also includes a His6 tag immediately downstream from the BM-40 signal sequence for affinity purification of recombinant proteins.
mRNA Expression Analysis-Panels of mouse (Zyagen Laboratories) and human adult and fetal (Clontech) cDNAs were subjected to PCR using the Expand High Fidelity PCR system (Roche Applied Science) according to the manufacturer's instructions. Primers were designed to span at least one intron-exon boundary to prevent amplification from contaminating genomic DNA. The mouse Col6a3, Col6a4, Col6a5, and Col6a6 cDNAs were amplified using the mcol6a3f1 and mcol6a3r1, mcol6a4f15 and mcol6a4r5, mcol6a5f1 and mcol6a5r1, and mcol6a6f1 and mcol6a6r1 primers, respectively. For amplification of human COL6A3, COL6A5, and COL6A6 cDNAs, the COL6A3f1 and COL6A3r1, COL6A5f10 and COL6A5r4, and COL6A6f2 and COL6A6r6 primer pairs were used, respectively (supplemental Table 1). The glyceraldehyde-3-phosphate dehydrogenase forward and reverse primers were used in control amplifications.

Fig. 1 (partial legend). The three genes are also present in the platypus and lizard genomes, although these genomic segments have yet to be assigned chromosomes. B, the same locus in the human and chimpanzee genomes has been disrupted by a chromosome break in the COL6A4 gene. In these species, genomic DNA on the telomeric side of the break is retained on 3q, contiguous with COL6A5 and COL6A6. Note that the genes now on the centromeric side of the break, TMCC1 and TRH, are not present in the original locus and are shaded in lighter gray. Numbers above the sequence refer to the start and end of the coding regions in the human genome. The approximate position of a stop codon in the predicted COL6A4 amino acid sequence is indicated by an asterisk. C, a block of genomic DNA on the centromeric side of the break has been translocated to the p arm of chromosome 3 in human and chimpanzee. Note that the two genes adjacent to the chromosome break in COL6A4, NR2C2 and ZFYVE20, are not present at the original locus and are in lighter gray. The approximate positions of two stop codons in the predicted COL6A4 amino acid sequence are indicated by asterisks.
Preparation of Recombinant α5(VI) and α6(VI) and Generation of Antibodies-Polyclonal antiserum was raised in rabbits against the C1 domain of the α6(VI) chain. The C1 antigen was expressed using the His-Patch ThioFusion bacterial expression system (Invitrogen). The exon encoding the C1 domain of COL6A6 was amplified using the COL6A6fnotI and COL6A6rsalI primers, which were modified to include 5′ NotI and 3′ SalI restriction sites. Following digestion with NotI and SalI, the PCR products were cloned into the pTHIO-his(A) bacterial expression vector in-frame with the bacterial thioredoxin gene. Following transformation and selection of transformants, individual colonies were expanded and screened for the presence of recombinant protein. The recombinant α6(VI)-C1 domain was found to be in the insoluble bacterial fraction. Therefore, to prepare the protein for antibody production, the insoluble bacterial lysate was resolved on SDS-polyacrylamide gels, and the thioredoxin-C1 domain fusion protein was excised from the gel. The recombinant protein was electroeluted from the gel slice. This purified eluted antigen was then used to raise antisera in rabbits (Rockland Immunochemicals, Inc.). The resulting antisera were further affinity-purified against the recombinant antigen immobilized on a nitrocellulose membrane as described (24,25). Serum from the same rabbit, taken prior to immunization with the α6(VI) antigen, was used as a negative control.
Immunoblot Analysis-A fetal human skeletal muscle extract enriched for structural proteins was obtained commercially (Biochain, catalog number P5244171). A sample was digested in 10 units of bacterial collagenase for 1 h at 37°C in digestion buffer (150 mM NaCl, 50 mM Tris, pH 8.0, 5 mM CaCl2). 2 μg of the undigested and collagenase-digested extract were electrophoresed on a 5% (v/v) SDS-polyacrylamide gel and transferred to an Immobilon-P polyvinylidene difluoride membrane (Millipore). The membrane was blocked in 5% milk powder in Tris-buffered saline (TBS, 150 mM NaCl, 50 mM Tris-HCl, pH 7.4) and then incubated with α6(VI) chain antisera (1:5000 dilution) overnight in antibody buffer (0.5% milk powder in TBS with 0.1% Tween 20). Following three washes in antibody buffer, anti-rabbit IgG-horseradish peroxidase secondary antibody (Dako) was added at a dilution of 1:10,000 and incubated for 1 h in antibody buffer. Following washing, the signal was developed using the Western Lightning chemiluminescence system (PerkinElmer Life Sciences) and autoradiography using X-Omat film.
Immunohistochemistry-Affinity-purified α6(VI)C1 antiserum was used to stain frozen sections of human articular cartilage obtained from surplus allograft material and a normal adult human tissue panel (Biochain Institute). Sections were blocked with 1% bovine serum albumin in PBS and incubated with dilutions of α6(VI)-C1 antiserum in PBS for 1 h. To facilitate antibody penetration, the cartilage sections were treated with 2 mg/ml hyaluronidase (bovine, type IV; Sigma) and then treated with 0.3% (v/v) H2O2 prior to incubation with primary antibodies. After blocking with 1% bovine serum albumin in PBS, the sections were immunolabeled with the α6(VI)-C1 or preimmune antiserum in PBS for 1 h. Following washing, the sections were incubated with either goat anti-rabbit or anti-mouse secondary antibodies. Following a further round of washing, the color was developed using the Vectastain Elite ABC kit (Vector Laboratories). Bound antibodies were visualized using the 3,3′-diaminobenzidine peroxidase substrate kit (Vector Laboratories), and sections were counterstained with hematoxylin and mounted in aqueous mounting media. As a comparison, a monoclonal antibody directed against the α3(VI) chain (3C4, Santa Cruz Biotechnology, catalog number sc-47712) was also used to stain tissues. An additional polyclonal antiserum raised against human collagen VI (Fitzgerald Industries, catalog number 70-XR95) was used to stain human cartilage.
For immuno-gold electron microscopy, native suprastructural fragments were isolated from human articular cartilage and placed on grids. Samples were doubly immunostained with an antiserum against the α6(VI)C1 domain and a monoclonal antibody against collagen VI (AF6210, Acris Antibodies), and colloidal gold-labeled secondary antibodies of different sizes were analyzed with transmission electron microscopy as described (26).
Saos-2 Cell Transfections and Immunoprecipitations-cDNAs for the his-Col6a4N6-C2, his-COL6A5N5-C1, and his-COL6A6N7-C2 constructs in the pCEP4 vector were transfected into Saos-2 cells using FuGENE 6 transfection reagent (Roche Applied Science) and grown as described (24,27). Transfected cells were grown to confluence in 6-well multiwell plates and labeled for 18 h with 100 μCi of L-[35S]methionine (GE Healthcare) in Cys/Met-free Dulbecco's modified Eagle's medium. The cell and media fractions were collected in the presence of protease inhibitors, and the His6-tagged α6(VI) chains were immunoprecipitated from the cell and media fractions with an anti-His antibody (0.5 μg/ml; Roche Applied Science) or collagen VI polyclonal antisera (Fitzgerald Industries, catalog number 70-XR95) and protein A-Sepharose overnight. Following washing, the immunoprecipitated material was denatured in the presence of 20 mM dithiothreitol and fractionated on a 10% (v/v) SDS-polyacrylamide gel. For some experiments dithiothreitol was omitted. Following electrophoresis, the gels were fixed, subjected to fluorography, dried, and exposed to x-ray film.

Fig. 3 (partial legend). … (29). The amino acid sequence is derived from the COL6A5 DNA sequence (supplemental Fig. S1). B, domain structure of the α5(VI) chain. VWA domains are boxed and numbered N1-N7 and C1-C3. The triple helical domain is indicated by a line between N1 and C1. The signal peptide at the N terminus is a black box, and unique domains between C2 and C3 and at the C terminus are represented by empty boxes.

RESULTS
Identification of Three Novel Collagen VI Genes-Data base searching using a VWA sequence from the α3 chain of collagen VI as a probe identified an uncharacterized sequence on human chromosome 3q22.1 that encoded two VWA domains. Closer inspection of the surrounding sequence revealed that the gene was more extensive and included a collagen triple helix-like sequence flanked by multiple, additional VWA domains. Further analysis of the corresponding region in the mouse, rat, dog, chimpanzee, and rhesus monkey genomes indicated that three distinct collagen-like sequences were clustered at this locus (Fig. 1A). The cDNAs for these genes were cloned by RT-PCR using primers based on exonic genomic DNA and sequenced. The amino acid sequences revealed that each protein contained seven N-terminal VWA domains, a 336-amino acid triple helix, and two or three C-terminal VWA domains. The size of the triple helix corresponds to the size of the helices in the known collagen VI chains, which are 335 or 336 amino acids in size (Fig. 2).
These similarities in domain organization and triple helix length strongly suggest that these three collagen chains belong to the collagen VI subfamily.
New Collagen VI Gene Nomenclature-The three existing collagen VI genes/proteins are named COL6A1/α1(VI), COL6A2/α2(VI), and COL6A3/α3(VI). COL6A1 and COL6A2 are arranged in tandem on chromosome 21 and oriented in the same direction with COL6A1 being the most 5′ gene of the pair. Because all three genes at the 3q22.1 locus are in the same orientation, we propose that the first gene in the tandem array be designated COL6A4, followed by COL6A5, and finally COL6A6. These genes encode the α4(VI), α5(VI), and α6(VI) chains, respectively. Recently, several partial sequences for these genes have appeared in the data bases, and the naming of these is consistent with the nomenclature described here.
Genomic Organization of the Human Collagen VI Locus-Identifiable orthologues of COL6A4, COL6A5, and COL6A6 are present in all mammalian species for which genome data are currently available, including macaques (rhesus monkey), canines (dog), rodents (rats and mice), and monotremes (platypus) (Fig. 1A). Interestingly, the wider chromosomal locus, including the flanking genes, is also present in a reptile species (green anole lizard), indicating that the locus is evolutionarily conserved. Therefore, this locus predates the split of reptiles from the main vertebrate lineage, which occurred more than 300 million years ago (28). The human and chimpanzee genomes contain orthologues of COL6A5 and COL6A6. However, when compared with the other genomes, these two species contain only a partial COL6A4 gene at the 3q22.1 locus representing the 3′ half of the gene (131.414-131.877 Mb) (Fig. 1B). Bioinformatic analysis using the mouse Col6a4 cDNA and the University of California, Santa Cruz, Genome Browser reveals that the 5′ half of COL6A4 is located on 3p24.3 (15.162-15.195 Mb) (Fig. 1C). The chromosome break occurred in the region that encodes the triple helix of COL6A4. Furthermore, both halves of the gene have accumulated at least one stop codon in exons that are predicted to be protein-encoding. Because humans produce no α4(VI) chain, the two halves of the gene are designated COL6A4P1 and COL6A4P2 to indicate their status as pseudogenes. The human COL6A5 and COL6A6 and mouse Col6a4 cDNA sequences are presented as supplemental Figs. S1, S2, and S3, respectively.
Analysis of the Human COL6A5/α5(VI) and COL6A6/α6(VI) Sequences-An alignment of the triple helical domains of all six collagen VI α-chains reveals several interesting features (Fig. 2).
First, the α3, α4, α5, and α6 chains have a Cys residue at amino acid 50 of the helix in contrast to the α1 and α2 chains that have a Cys residue at a different relative position, amino acid 89. Second, the α4, α5, and α6 chains have an interruption in the triple helix between amino acids 125 and 128, which is in the same position as in the α3 chain. The triple helix domains from the α1 and α2 chains do not have this interruption. These sequence features indicate that the three new chains are most similar to the α3(VI) chain. This is supported by a phylogenetic analysis of the triple helical domains based on the alignment shown in Fig. 2 (supplemental Fig. S4). The analysis clusters the α3, α4, α5, and α6 chains in a separate group from the α1 and α2 chains.
The α5(VI) chain has several imperfections in the Gly-X-Y repeat motif, where the Gly residue is substituted by another amino acid at amino acids 13, 153, and 306 in the α5(VI) triple helix (Fig. 2). Similarly, the α6(VI) helical domain has several imperfections in the Gly-X-Y repeat motif at amino acids 139, 156, 159, 306, and 315 in the α6(VI) triple helix. Neither the α3(VI) nor α4(VI) chains have Gly-X-Y imperfections. Each of the new chains contains a single RGD cell-binding signal.
Overall, the human COL6A5 gene is composed of 39 exons with a coding region 7836 bp in size (supplemental Fig. S1). The gene encodes a protein of 2611 amino acids (Fig. 3) with a predicted molecular mass of 287 kDa. The N terminus is predicted to contain a signal peptide cleavage site between amino acids 18 and 19 (TLA-DQ) (SignalP 3.0) (29). The signal peptide is followed by seven VWA domains, a 336-amino acid collagen triple helix, and then three more VWA domains. On either side of the C3 domain are regions that are 134 and 129 amino acids in size that do not show homology with any other protein.
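The Gly-X-Y imperfections described above are positions where the first residue of a triplet is not glycine. A minimal sketch of how such positions could be located is given here; the function name `glyxy_imperfections`, the `frame` parameter, and the toy sequence are hypothetical and are not taken from the paper. The sketch assumes a single uninterrupted reading frame, so it would need to be restarted after an interruption such as the one reported between amino acids 125 and 128.

```python
def glyxy_imperfections(helix: str, frame: int = 0) -> list[int]:
    """Return 1-based positions where a Gly is expected but absent.

    helix: one-letter amino acid sequence of a triple helical domain.
    frame: 0-based index of the first expected Gly (assumed constant).
    """
    helix = helix.upper()
    # Every third residue, starting at `frame`, should be glycine.
    return [i + 1 for i in range(frame, len(helix), 3) if helix[i] != "G"]


# Toy 12-residue sequence (illustrative only): the third triplet
# begins with Ala instead of Gly.
toy = "GPPGEKAPPGLP"
print(glyxy_imperfections(toy))  # [7]
```

Applied to the aligned helices of Fig. 2, a scan of this kind would be expected to flag the substituted positions listed for the α5(VI) and α6(VI) chains, provided the frame is reset after the shared interruption.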
The COL6A6 gene has 36 exons and a coding region of 6789 bp (supplemental Fig. S2), which codes for a 2262-amino acid protein (Fig. 4). The signal peptide is predicted to be cleaved between amino acids 18 and 19 (VNQ-DS) to give a molecular mass for the secreted protein of 245 kDa. Following the signal peptide are seven VWA domains and a 336-amino acid triple helix and then two additional VWA domains. As with the α5(VI) protein, there is a region following C2 of 91 amino acids that shows no homology with other proteins.
Because the human COL6A4 gene is disrupted, we cloned and sequenced the mouse Col6a4 cDNA sequence (supplemental Fig. S3). The mouse Col6a4 gene has an open reading frame of 6930 bp encoding a 2309-amino acid protein with a predicted molecular mass of 248 kDa. As with the human α5(VI) and α6(VI) chains, the α4(VI) chain has seven VWA domains followed by a collagen triple helix and two C-terminal VWA domains.
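The reported coding-region and protein lengths can be cross-checked with simple arithmetic. The sketch below assumes that each quoted coding region includes the stop codon, so that the length in base pairs equals three times the number of residues plus three; that assumption, the dictionary `reported`, and the function name `cds_matches_protein` are illustrative and are not stated in the text.

```python
# (coding region in bp, protein length in amino acids) as quoted in the text
reported = {
    "human COL6A5": (7836, 2611),
    "human COL6A6": (6789, 2262),
    "mouse Col6a4": (6930, 2309),
}


def cds_matches_protein(cds_bp: int, protein_aa: int) -> bool:
    """True if cds_bp equals three bases per residue plus one stop codon.

    Assumption: the quoted coding-region length includes the stop codon.
    """
    return cds_bp == 3 * (protein_aa + 1)


for gene, (bp, aa) in reported.items():
    print(gene, cds_matches_protein(bp, aa))
```

Under this assumption all three pairs of values are internally consistent, for example 3 × (2611 + 1) = 7836 for COL6A5.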
Expression of COL6A5 and COL6A6 mRNA-RT-PCR was conducted to determine the distribution of Col6a4, Col6a5/COL6A5, and Col6a6/COL6A6 in mouse and human tissues. In mouse, Col6a4 mRNA was present in uterus and ovary as faint bands but not in any other tissues examined (Fig. 5A). In contrast, mRNA for Col6a5 and Col6a6 was present in a much wider range of tissues. Col6a5 was expressed in all tissues examined except prostate, although it was expressed only weakly in muscle and trachea. Col6a6 mRNA was present in all tissues except prostate and expressed only weakly in bone, small intestine, and placenta.
In human tissues, mRNA for COL6A5 was only present at high levels in lung (adult) and testis, and at lower levels in small intestine and colon (Fig. 5B). This is in contrast to the more widespread expression pattern in the mouse. All other tissues were negative for COL6A5 mRNA, including fetal lung. To confirm the low overall expression of COL6A5, the PCRs were repeated with a different primer set with identical results (data not shown). COL6A6 mRNA was found to be expressed more widely with high levels of expression in fetal and adult lung, heart, and spleen. In kidney, thymus, and skeletal muscle, there was more mRNA expression in fetal compared with adult tissue. Conversely, in adult liver mRNA levels were higher than in fetal liver. Several other tissues expressed COL6A6 mRNA at relatively low levels including pancreas, testis, and uterus.
Expression of α6(VI) in Human Tissues-To assess the distribution of α6(VI) chains in human tissues, polyclonal antiserum was raised against the C1 domain of α6(VI). The crude antiserum was then affinity-purified using the recombinant antigen. The affinity-purified antiserum was used in an immunoblotting experiment on a human fetal muscle extract (Fig. 6). The antiserum recognized a single band at 240 kDa corresponding to the molecular weight of the α6(VI) chain. The band was digested by collagenase treatment, confirming that it is a collagen. The antibody was then used to stain a range of human tissues (Fig. 7). The α6(VI) chain was present in the extracellular matrix of kidney (not shown), skeletal and cardiac muscle, lung, and blood vessels (Fig. 7, C, G, K, and Q). Lower levels of α6(VI) chain were present in pancreas and spleen (Fig. 7, R and T). Notably, significant ECM staining for α6(VI) protein was absent from liver (Fig. 7O). An α3(VI) monoclonal antibody detected extracellular collagen VI in several tissues assayed, including skeletal muscle (Fig. 7A), cardiac muscle (E), lung (I), and liver (M), reflecting the wide distribution of collagen VI. The absence of significant ECM staining for the α6(VI) chain in liver, a tissue that is rich in collagen VI (compare Fig. 7M with Fig. 7O), indicates that the affinity-purified α6(VI) antibody does not cross-react with other collagen VI chains.
Collagen VI is known to be a component of the chondrocyte pericellular matrix (30,31). Because Col6a6 mRNA was shown to be present in mouse articular cartilage (Fig. 5A), the expression of collagen VI in human articular cartilage was assessed (Fig. 8). Pericellular staining for collagen VI is clearly shown in Fig. 8, panels F and H, where human articular cartilage was immunostained with two collagen VI antibodies. In contrast, the staining pattern of the α6(VI)-specific antibody extends significantly farther into the territorial matrix with little or no discrete pericellular staining. This can be seen in the superficial layer but is especially clear in the deeper zone.
Immunogold electron microscopy analysis also confirmed that α6(VI) chains are present in human articular cartilage (Fig. 8B). Here, antibodies to the α6(VI) chain co-localize with extrafibrillar material that contains collagen VI, demonstrating that the labeled material is part of the same suprastructures.
Assembly of α4(VI), α5(VI), and α6(VI) in Transfected Cells-To assess whether the three new collagen VI chains are competent to assemble with α1(VI) and α2(VI) chains, transfection studies in SaOs-2 osteosarcoma cells were conducted (Fig. 9). These cells synthesize α1 and α2 chains but not any other collagen VI chains, including COL6A5 and COL6A6 (data not shown). Cells were transfected with cDNAs for mouse Col6a4 (containing domains N6-C2), human COL6A5 (domains N5-C1), or human COL6A6 (domains N7-C2). Following selection in hygromycin, the cultures were labeled with [35S]methionine, and the labeled material in the cell layer and medium fractions was immunoprecipitated with an anti-His6 monoclonal antibody (Fig. 9A) or a commercially available collagen VI antibody (Fig. 9B). Bands corresponding to the molecular weights (~230-240 kDa) of the α4(VI)N6-C2, α5(VI)N5-C1, and α6(VI)N7-C2 chains were present in the cell fractions of their respective transfections (see Fig. 9A, lanes 3, 5, and 6). No band at this molecular weight is present in the cell layer of untransfected cells (Fig. 9A, lane 1). This confirms that the cells synthesize all three chains from their respective transfected cDNAs. The α4(VI)N6-C2 chain was present in the medium fraction (Fig. 9A, lane 4), indicating that it is secreted and competent to co-assemble with α1(VI) and α2(VI). This was confirmed in the reverse experiment where the metabolically labeled cultures were immunoprecipitated with antiserum that recognizes the human α1(VI) and α2(VI) chains (Fig. 9B). Again, only in the cells expressing the α4(VI)N6-C2 chain was a band present in the transfected cell medium. The detection of secreted α4(VI)N6-C2 chain when immunoprecipitating with α1/α2 antibody is good evidence that the α4 chain co-assembles with these two chains.
Conversely, the failure to detect α5(VI)N5-C1 and α6(VI)N7-C2 is surprising and suggests that these chains are not competent to co-assemble with α1(VI) and α2(VI) chains in this system. Longer exposure of the gels failed to detect any secreted α5(VI)N5-C1 and α6(VI)N7-C2 chains in the media fractions. To confirm that the secreted α4(VI) chain is triple helical, an immunoprecipitated sample from the medium of cells transfected with His6-Col6a4N6-C2 cDNA was resolved under nonreducing conditions (Fig. 9C). The 240-kDa α4(VI) chain migrates at the top of the gel in the absence of reduction. The band corresponding to the co-precipitated α1 and α2 chains, which is present with reduction, also disappears in the nonreduced sample. Together these data suggest that α1, α2, and α4 chains co-assemble into disulfide-bonded triple helical collagen VI heterotrimers that are secreted from the cell.

DISCUSSION
The discovery of three additional chains, α4(VI), α5(VI), and α6(VI), doubles the size of the collagen VI family from three to six members. A schematic, updated to show the domain structures of the three existing (α1, α2, and α3) and three new (α4, α5, and α6) human collagen VI chains, is presented in Fig. 10. Overall, the domain organization of the collagen VI chains is similar with each having a short triple helix flanked by a variable number of VWA domains. The α1 and α2 chains are smallest with just one N- and two C-terminal VWA domains. The α3 chain is largest with up to 10 N- and 2 C-terminal VWA domains as well as several domains not related to VWA domains at the C terminus. The new α4, α5, and α6 chains are closer in size to the α3 chain with seven N-terminal VWA domains and two or three C-terminal VWA domains. Several features within the triple helix domain suggest that the three new chains are more closely related to the α3(VI) chain than to α1(VI) and α2(VI). The α3, α4, α5, and α6 chains have the same interruption in the Gly-X-Y repeat motif in the triple helix, which is not present in the α1 and α2 chains. In addition, α3(VI) and the three new collagen VI chains contain a single cysteine at amino acid 50 of the triple helix (Fig. 2). In contrast, the α1(VI) and α2(VI) chains have a helix cysteine at a different relative position, amino acid 89. Analysis of disease-causing mutations in the collagen VI chains has demonstrated that these cysteine residues are crucial for forming interchain disulfide bonds and stabilizing collagen VI assembly intermediates (10, 17-19, 32, 33). These sequence features suggest that the COL6A3, COL6A4, COL6A5, and COL6A6 genes share a common evolutionary history.
The collagen VI locus at 3q22.1 is evolutionarily conserved and predates the split of the reptile lineage, although in humans the COL6A4 gene is disrupted by a chromosome break. Rhesus monkeys appear to possess three intact collagen VI genes at this locus, but the human and chimpanzee genomes only have two intact genes, COL6A5 and COL6A6. Therefore, the chromosome translocation that disrupted the COL6A4 gene must have occurred following the split of the hominoids (apes and humans) from the old world monkey lineage (macaques, gibbons, and vervets) estimated to be 30 million years ago (34). The break occurred in the domain that encodes the triple helix and distributed the 5′ end of COL6A4 to the p arm of chromosome 3. This is a major disruption to the gene and is highly likely to render the gene functionally inactive, although a careful analysis will need to be conducted to exclude the possibility that transcripts derived from the split gene have functional significance. This is a possibility because such a recent chromosome rearrangement may have left the regulatory regions of the 5′ end of the gene intact. If a partial protein for the α4(VI) chain is produced, it may exert a dominant negative effect on collagen VI function. There are 11 ESTs for the 5′ half of the gene, suggesting that this portion may be transcriptionally active.
In mouse, Col6a5 and Col6a6 are transcribed in a wide range of tissues. However, expression of Col6a4 is confined to a few tissues, including uterus and ovary. In humans, there is also widespread expression of COL6A6 but restricted expression of COL6A5. The reason for the difference in expression of COL6A5/Col6a5 between human and mouse is not known. One possibility is that the chromosome break altered the genomic context of the locus such that cis-acting regulatory elements required for widespread expression of COL6A5 were removed. Nonetheless, there are multiple tissues in humans and mouse where at least one of the new chains is expressed.
With the existence of six chains, the assembly of collagen VI is more complex than previously thought, which implies that a chain selection mechanism must exist to distinguish between chains in cells that express more than three chains. The trimerization of the fibrillar collagens and collagen IV, which also has six chains, has been well studied (reviewed by Khoshnoodi et al. (35)). The common theme is that initial chain selection involves elements C-terminal to the triple helix, and this appears to be the case for the collagen VI chains as well (12,15,16). However, the motifs responsible for chain selection in collagen VI may be different because there is no clear C-terminal domain homology among the collagen IV, collagen VI, and fibrillar collagen subfamilies. To investigate the ability of the new chains to co-assemble with α1(VI) and α2(VI), transfection and immunoprecipitation experiments were conducted in SaOs-2 osteosarcoma cells. SaOs-2 cells express mRNA for COL6A1 and COL6A2 but not COL6A3, COL6A5, or COL6A6 (see Refs. 11, 14 and data not shown), making them an ideal cell line to assess whether these chains assemble into collagen VI molecules. α1(VI) and α2(VI) are not competent to assemble into heterotrimeric collagen VI molecules alone and require a third chain for collagen VI assembly (11,36). This allowed us to test the model that the new chains substitute for α3(VI) and co-assemble with α1(VI) and α2(VI). However, this model may be an oversimplification of the in vivo situation because, although we show that the mouse α4(VI)N6-C2 chain can co-assemble with α1 and α2, the α5(VI)N5-C1 and α6(VI)N6-C2 chains do not associate with α1 and α2 and are retained inside the cell. Both chains contain the C1 domain, which is reported to be critical for collagen VI chain association (12,15,16).
However, the COL6A5N5-C1 cDNA used in our transfection study lacked the exon for the C2 domain, and we cannot exclude the possibility that this domain is important for α5(VI) co-assembly with α1(VI) and α2(VI). Overall, these data indicate that the α5(VI) and α6(VI) chains are not competent to assemble with α1 and α2 chains and suggest that these chains assemble into collagen VI heterotrimers that are different from α1-α2-α3 and α1-α2-α4 protomers. Future studies will examine the possibility of alternative collagen VI chain assemblies that do not require α1(VI) and/or α2(VI).
Interestingly, the distribution of α6(VI) in cartilage is different from that of the α3(VI) chain. The α3(VI) chain is concentrated around the cell in the pericellular matrix, and the α6(VI) chain is predominantly in the territorial matrix, although there is overlap with the pericellular matrix (see Fig. 8B). This finding raises the possibility that not all collagen VI microfibrils contain α3(VI) chains. This suggests that different collagen VI macromolecular structures may exist.
A recent report mapped the collagen XXIX gene, COL29A1, to human chromosome 3q22.1 and described its genetic association with atopic dermatitis (37). The COL29A1 gene appears to be the COL6A5 sequence reported here. Based on sequence similarities with the collagen VI genes, we argue that the COL29A1 gene is not a new collagen subtype but belongs to the collagen VI family. Finally, in ~40% of individuals with Bethlem myopathy and UCMD, mutations are not present in the COL6A1, COL6A2, or COL6A3 genes (21,22). Our finding that COL6A6 is expressed in skeletal muscle raises the exciting possibility that mutations in this gene are involved in the pathophysiology of these and possibly other, related, congenital muscular dystrophies.
Table footnote: Size of the amino acid sequence represents the maximum possible size and does not take into account splice variants that are known to exist for the COL6A2 and COL6A3 genes. The molecular weight is the predicted molecular weight of the mature protein following signal peptide cleavage. The chromosomal location information was derived from the University of California, Santa Cruz, Genome Browser (March 2006 assembly). DNA sequences for human COL6A5, COL6A6, and mouse Col6a4 are presented in supplemental Figs. S1, S2, and S3, respectively.