Three Novel Collagen VI Chains with High Homology to the α3 Chain*

Here we describe three novel collagen VI chains, α4, α5, and α6. The corresponding genes are arranged in tandem on mouse chromosome 9. The new chains structurally resemble the collagen VI α3 chain. Each chain consists of seven von Willebrand factor A (VWA) domains followed by a collagenous domain, two C-terminal VWA domains, and a unique domain. In addition, the collagen VI α4 chain carries a Kunitz domain at the C terminus, whereas the collagen VI α5 chain contains an additional VWA domain and a second unique domain. The size of the collagenous domains and the position of the structurally important cysteine residues within these domains are identical in the collagen VI α3, α4, α5, and α6 chains. In mouse, the new chains are found in or close to basement membranes. Collagen VI α1 chain-deficient mice lack the new collagen VI chains, implying that the new chains may substitute for the α3 chain, probably forming α1α2α4, α1α2α5, or α1α2α6 heterotrimers. As a result of a large-scale pericentric inversion, the human COL6A4 gene on chromosome 3 was broken into two pieces and became a non-processed pseudogene. COL6A5 was recently linked to atopic dermatitis and designated COL29A1. The identification of novel collagen VI chains has implications for the etiology of atopic dermatitis as well as of Bethlem myopathy and Ullrich congenital muscular dystrophy.

Members of the collagen protein superfamily play important roles in maintaining extracellular matrix structure and function. To date, 28 family members are known (1,2), among which the fibril-forming collagens and the FACIT collagens form large subgroups. In addition, several collagens have highly specific functions. Among these, collagen VI forms a distinct network of microfibrils in most connective tissues. Electron microscopy revealed a beaded-filament structure of the microfibrils (3). The α1, α2, and α3 chains of collagen VI form heterotrimeric monomers that assemble intracellularly into dimers and tetramers (4,5). After secretion, filaments are formed by end-to-end interactions of the preassembled tetramers.
The three previously known collagen VI chains contain a relatively short collagenous domain of about 335 residues together with von Willebrand factor A (VWA) domains, which are the characteristic non-collagenous domains of collagen VI. A common feature of VWA domains is their involvement in the formation of multiprotein complexes (6). Whereas all three collagen VI chains contain two C-terminal VWA domains, the α1 and α2 chains carry only one and the α3 chain ten VWA domains at the N terminus (7,8). In addition, the α3 chain contains a unique domain with similarities to salivary gland proteins, a fibronectin type III repeat, and, at the C terminus, a domain of the bovine pancreatic trypsin inhibitor/Kunitz family of serine protease inhibitors (Kunitz domain) (8). It has been suggested that the VWA domains play a role in the assembly of collagen VI (9-11). However, a recent analysis of lysyl hydroxylase 3-deficient mouse embryos indicated that the loss of potentially glycosylated hydroxylysine residues also prevents the intracellular formation of collagen VI tetramers and impairs the secretion of collagen VI (12).
Collagen VI has been shown to interact with several other extracellular matrix components, including collagens I (13), II (14), and XIV (15), perlecan (16), and the microfibril-associated glycoprotein MAGP1 (17). The N-terminal globular domains of the collagen VI molecules bind the small leucine-rich repeat proteoglycans decorin and biglycan, which in turn interact with matrilins, mediating contacts to further binding partners (18).
Studies on collagen VI have often focused on its function in skeletal muscle because of the patient phenotypes, Bethlem myopathy and Ullrich congenital muscular dystrophy, observed when the α1, α2, or α3 chain carries a mutation (for review see Ref. 19). In mice in which the gene coding for the collagen VI α1 chain has been inactivated, the α2 and α3 chains are also not secreted, showing that heterotrimeric assembly is required (20). These mice show muscular weakness and histological signs of muscle fiber necrosis. Recent studies indicate that the myopathy is due to mitochondrial dysfunction (21,22). A possible explanation could be decreased integrin-mediated signaling from collagen VI to the cells (23), but the downstream events are still not known in detail.
Here we describe three new collagen VI chains that have the potential to replace the collagen VI α3 chain in collagen VI assemblies and thereby to increase the structural and functional versatility of collagen VI.

MATERIALS AND METHODS
RT-PCR-RT-PCR was used to clone the mouse and human collagen VI cDNAs. Primers were designed according to EST and genomic sequences deposited in the data bases (supplemental Table 1). To minimize PCR-introduced errors we used the Expand High Fidelity PCR system (Roche Applied Science). The cDNAs for the α4 chain were amplified from mRNA isolated from adult mouse uterus and newborn mouse brain, and the cDNAs for the α5 and α6 chains were amplified from mRNA from newborn mouse lung. The human cDNAs for the α5 chain were cloned from mRNA prepared from HT1080 or HEK293-EBNA cells, and the cDNAs for the α6 chain were cloned from mRNA prepared from fetal brain, using the primer pairs indicated in supplemental Table 1.
Northern Blot Analysis-Total RNA was extracted from various tissues of newborn and adult C57BL/6J mice by the guanidinium-thiocyanate method. mRNA was prepared using the Oligotex mRNA Mini Kit (Qiagen). Aliquots were electrophoresed on a 0.8% denaturing agarose-formaldehyde gel, blotted, and hybridized with digoxigenin-labeled RNA probes. The last two wash steps were performed in 0.1× SSC, 0.1% SDS at 68°C for 15 min each. The blots were developed using CDP-Star (Roche) according to the manufacturer's instructions.
Bioinformatic Analysis-The non-redundant NCBI genomic data bases for mouse (Build 37.1) and human (Build 36.2) were screened for new genes using collagen and matrilin sequences as queries. The exon-intron boundaries of each new gene were determined using the NCBI Evidence Viewer together with the cloned cDNA sequences. The potential signal peptide and the domain structure of each protein were predicted by SignalP v3.1 and SMART, respectively. However, the N1 domain of the α5 chain was assigned manually based on sequence signature motifs, because none of the available domain prediction programs could locate it. Multiple sequence alignments were performed using CLUSTAL X (v1.81), and figures were prepared with BOXSHADE v3.2. The protein sequence identities of the new chains were calculated using BOXSHADE. The phylogenetic analysis was done by protein distance and protein parsimony methods as implemented in PHYLIP v3.66.
Expression and Purification of Recombinant N-terminal Collagen VI α4, α5, and α6 Chain VWA Domains-cDNA constructs were generated by RT-PCR on mRNA. For the collagen VI α4, α5, and α6 chains, the domains N3-N6, N3, and N1-N7 were chosen, respectively. Suitable primers introduced 5′-terminal NheI and 3′-terminal BamHI, BglII, or XhoI restriction sites (supplemental Table 1). The amplified PCR products were inserted into a modified pCEP-Pu vector (16) containing an N-terminal BM-40 signal peptide and either a C-terminal His8-tag or a C-terminal tandem StrepII-tag (17) downstream of the restriction sites. The recombinant plasmids were introduced into HEK293-EBNA cells (Invitrogen) using FuGENE 6 transfection reagent (Roche). The cells were selected with puromycin (1 μg/ml), and the cells producing His8-tagged protein were transferred to serum-free medium for harvest of the recombinant protein. The tandem StrepII-tagged protein was purified directly from serum-containing cell culture medium. After filtration and centrifugation (1 h, 10,000 × g), the cell culture supernatants were applied either to a Strep-Tactin column (1.5 ml, IBA GmbH) and eluted with 2.5 mM desthiobiotin, 10 mM Tris-HCl, pH 8.0, or to a TALON metal affinity column (Clontech) and eluted following the supplier's protocol.
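The cloning strategy above relies on restriction sites added by the primers. As a hedged illustration, not the authors' procedure, the following Python sketch scans a DNA sequence for the recognition sequences of the four enzymes named (NheI GCTAGC, BamHI GGATCC, BglII AGATCT, XhoI CTCGAG). The function name `find_sites` and the example sequences are hypothetical.

```python
# Illustrative sketch; the function name and example sequences are hypothetical.
# Recognition sequences (5'->3') of the enzymes named in the cloning strategy.
# All four are palindromic, so scanning the top strand also covers the bottom strand.
RESTRICTION_SITES = {
    "NheI": "GCTAGC",
    "BamHI": "GGATCC",
    "BglII": "AGATCT",
    "XhoI": "CTCGAG",
}


def find_sites(seq: str) -> dict:
    """Map each enzyme to the 0-based start positions of its site (enzymes without hits are omitted)."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in RESTRICTION_SITES.items():
        positions = [
            i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site
        ]
        if positions:
            hits[enzyme] = positions
    return hits


# Hypothetical primer-like sequences, for illustration only.
print(find_sites("aaGCTAGCacgtacgtacgt"))  # {'NheI': [2]}
print(find_sites("GGATCCTTAGATCT"))        # {'BamHI': [0], 'BglII': [8]}
```

A plain substring scan is sufficient here because each site is a fixed, non-degenerate sequence.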
Preparation of Antibodies against the New Collagen Chains-The purified recombinant collagen VI fragments were used to immunize rabbits and guinea pigs. The antisera obtained were purified by affinity chromatography on a column with antigen coupled to CNBr-activated Sepharose (GE Healthcare). The specific antibodies were eluted with 0.1 M glycine, pH 2.5, and the eluate was neutralized with 1 M Tris-HCl, pH 8.8. The antiserum raised against the domains N1-N7 of the collagen VI α6 chain was affinity-purified on a column coupled with the collagen VI α6 chain N2-N6 domains, to prevent cross-reactivity due to the nearly identical N7 domains of collagen VI α5 and α6. The lack of extensive cross-reactivity between the new chains was demonstrated by ELISA.
Immunohistochemistry-Immunohistochemistry was performed on frozen sections of embedded tissue from adult wild type and collagen VI α1 chain-deficient mice (20). The sections were treated with ice-cold methanol for 2 min, blocked for 1 h with 5% normal goat serum in phosphate-buffered saline containing 0.2% Tween 20, and incubated with the primary antibodies overnight at 4°C, followed by Alexa Fluor 488-conjugated goat anti-rabbit IgG (Molecular Probes), Alexa Fluor 546-conjugated goat anti-rabbit IgG (Molecular Probes), or Alexa Fluor 488-conjugated goat anti-guinea pig IgG (Molecular Probes). Collagen VI α1, α2, and α3 chains were detected using a polyclonal antibody (AB7821, Chemicon). A polyclonal antibody against native human laminin-332 (24) was kindly provided by R. E. Burgeson.
Purification of Collagen VI-Native collagen VI was purified from newborn mice. Proteins were extracted by urea treatment, and collagen VI was isolated by molecular sieve column chromatography as described previously (25).

RESULTS
Cloning of cDNAs Coding for Three New Mouse Collagen VI Chains-In a screen of the genomic data base with collagen and matrilin sequences as queries, three genes were identified in the mouse genome that code for new VWA domain-containing collagens. Because of their homology to the α3 chain of collagen VI and their arrangement in the genome, these were designated the α4, α5, and α6 chains of collagen VI. The corresponding cDNAs were cloned as overlapping partial clones by RT-PCR, using primers deduced from the genomic sequence, and sequenced. The cloned mouse α4 cDNA of 7084 bp (accession numbers AM231151-AM231153) contains an open reading frame of 6927 bp, encoding a protein of 2309 amino acid residues preceded by a signal peptide of 22 residues, as predicted by methods based on neural networks or hidden Markov models (13). The mature secreted protein has a calculated Mr of 248,389 (Fig. 1). At least nine EST clones extend 207 bp in the 3′ direction and contain an ATTAAA polyadenylation signal at their 3′ ends. In addition, a partial RIKEN cDNA clone (AK159050) extends 1219 bp and also contains an ATTAAA polyadenylation signal at its 3′ end, indicating the presence of different 3′-UTRs.
The cloned mouse α5 chain cDNA of 8298 bp (accession numbers AM748256-AM748258) contains an open reading frame of 7920 bp, encoding a protein of 2640 amino acid residues preceded by a signal peptide of 18 residues (13). The mature secreted protein has a calculated Mr of 287,502 (Fig. 1). A partial RIKEN clone (AK134435) extends 751 bp at the 3′ end but does not contain a polyadenylation signal.
The cloned mouse collagen VI α6 chain cDNA of 7097 bp (accession numbers AM748259-AM748262) contains an open reading frame of 6795 bp, encoding a protein of 2265 amino acid residues preceded by a signal peptide of 18 residues (13). The mature secreted protein has a calculated Mr of 244,260 (Fig. 1).
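The reported numbers can be cross-checked with simple arithmetic: each ORF length is an exact multiple of three, and the number of codons equals the stated residue count. One reading consistent with this, which is our assumption rather than something stated in the text, is that the ORF lengths exclude the stop codon and the residue counts include the signal peptide. The sketch below performs the check; the names `REPORTED` and `codons_in_orf` are hypothetical.

```python
# Illustrative check; names are hypothetical, numbers are those reported above.
REPORTED = {
    "alpha4": {"orf_bp": 6927, "residues": 2309},
    "alpha5": {"orf_bp": 7920, "residues": 2640},
    "alpha6": {"orf_bp": 6795, "residues": 2265},
}


def codons_in_orf(orf_bp: int) -> int:
    """Number of codons in an ORF; raises ValueError if the length is not a multiple of 3."""
    if orf_bp % 3 != 0:
        raise ValueError(f"ORF length {orf_bp} bp is not a multiple of 3")
    return orf_bp // 3


for chain, values in REPORTED.items():
    codons = codons_in_orf(values["orf_bp"])
    # Prints True for all three chains: codon count equals the stated residue count
    print(chain, codons, codons == values["residues"])
```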
Domain Structure-The domain structures of the new chains are very similar to that of the collagen VI α3 chain (Fig. 2). For comparison with the already known collagen VI chains we use the nomenclature introduced by Chu et al. (8): domains N-terminal to the collagenous domain are designated N, those C-terminal to it C, and numbering starts at the collagenous domain. At the N terminus, all three mature proteins contain seven VWA domains (N7-N1), followed by a collagen triple-helical domain of 336 amino acid residues. Toward the C terminus they have two VWA domains (C1 and C2), followed by a unique sequence (C3) that in the new α6 chain also forms the C-terminal end. In mouse, the α4 chain carries a short stretch of 17 amino acid residues at the C-terminal end (C4) that resembles a Kunitz domain. When the genomic data bases were searched for exons coding for a complete Kunitz domain, such a domain could be identified at this position in orthologous genes of several species. Only in mouse and rat do the sequences contain a premature stop codon, indicating that, except in rodents, a full Kunitz domain is present at the C terminus of the collagen VI α4 chain (Fig. 3A). In the α5 chain the C-terminal end contains a third VWA domain (C4) followed by another unique domain (C5). A major difference between the new chains and the collagen VI α3 chain is the presence of three additional VWA domains at the N-terminal end of the α3 chain. Interestingly, a splice variant of the collagen VI α3 chain (AAC23667) lacks the first, second, and fourth VWA domains and thereby, like the new chains, contains seven N-terminal VWA domains. The overall amino acid identity is highest between the α5 and α6 chains (44.7%) and lowest between the α4 and α5 chains (28.0%). The overall identity between each of the three new chains and the α3 chain ranges from 25.9 to 26.7%.
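Identity percentages such as those above depend on how alignment gaps are counted, and the exact convention used by BOXSHADE is not stated here. The sketch below implements one common convention for a pair of already-aligned sequences: columns in which both sequences have a gap are ignored, and identical residues are divided by the remaining columns. The function name `percent_identity` and the toy alignment are hypothetical.

```python
# Illustrative sketch of one percent-identity convention; name and toy data are hypothetical.
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two equal-length aligned sequences.

    Columns where both sequences carry a gap are skipped; a residue facing a gap
    counts as a mismatch. This is one convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)


# Toy alignment: 7 identical columns out of 9
print(round(percent_identity("GPPGPA-GE", "GPAGPAKGE"), 1))  # 77.8
```

Other conventions divide by the length of the shorter sequence or by the number of gap-free columns. For divergent sequences this choice can shift the value by several percentage points.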
Alternative Splicing-In mouse, two different splice variants of the collagen VI α4 mRNA with premature stop codons can be deduced from EST clones. First, the ESTs AU023415 and BG068629 contain a stop codon in an alternative exon following the exon coding for the N4 domain. If translated, this transcript would yield a protein containing only the first four VWA domains. A second splice variant was detected in the three EST clones BX520360, AI427280, and W48310. Here, an alternative splice donor site in exon 35, which codes for the C2 domain, and an alternative splice acceptor site in exon 37, which codes for the unique domain, are used. Because this shifts the codon phase, the downstream sequence is read in a different frame and contains a stop codon 101 bp downstream of the alternative splice site. If translated, this transcript would give a protein that lacks nearly one-half of the C2 domain and the unique domain. Interestingly, the alternative splice site contains a non-canonical GC-AG motif.
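The effect of the alternative splice sites can be pictured as a change of reading frame: once the codon phase shifts, downstream codons are read in another frame, where a stop codon may occur early. The following sketch finds the first stop codon read in frame from a given start position. The function name and the sequence are hypothetical toy examples, not collagen VI sequence.

```python
# Illustrative sketch; the function name and toy sequence are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq: str, start: int = 0):
    """Return the 0-based position of the first stop codon read from `start` in steps of 3, or None."""
    seq = seq.upper()
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


toy = "CTGAAAGGT"
print(first_in_frame_stop(toy, 0))  # None: codons CTG AAA GGT
print(first_in_frame_stop(toy, 1))  # 1: TGA in the frame shifted by one nucleotide
```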
A RIKEN cDNA clone coding for the collagen VI α6 chain (AK054356) shows alternative splicing in the 5′-UTR, indicating the presence of two different promoters. In addition, alternative splicing of exon 6 yields a much shorter open reading frame that would generate a protein containing only the first six VWA domains and lacking the seventh VWA domain, the collagenous domain, and the C-terminal non-collagenous domains.
Analysis of the Collagenous Domains-The 336-residue collagenous domains have exactly the same size as that of the collagen VI α3 chain (Fig. 3B). The identity between the collagenous domain of the α3 chain and those of the α4, α5, and α6 chains is 53.3, 49.1, and 51.8%, respectively. A cysteine residue that is present in the collagenous domain of the collagen VI α3 chain and appears to be involved in tetramer formation and stability (19) is conserved in all new chains.
The locations of the two imperfections in the Gly-Xaa-Yaa repeat found in the collagen VI α3 chain are conserved in all new chains, whereas the α5 and α6 chains have additional imperfections. In both of these chains, a glycine residue in a Gly-Xaa-Yaa repeat close to the C terminus of the collagenous domain is replaced by a leucine or a valine residue, respectively, introducing another imperfection. Interestingly, this position coincides with an imperfection found in the α1 and α2 chains. In addition, an imperfection is present at the center of the collagenous domains of the collagen VI α5 and α6 chains, where one or two glycine residues of Gly-Xaa-Yaa repeats are lacking, respectively.
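Imperfections are interruptions of the Gly-Xaa-Yaa register. A minimal way to locate them, sketched below under our own assumptions (a minimum run length of three triplets; hypothetical function names and toy sequences), is to find maximal runs of consecutive Gly-Xaa-Yaa triplets with a regular expression and report the spans between runs. Both a glycine substitution, as near the C-terminal end of the α5 and α6 collagenous domains, and missing glycines, as at their center, then appear as gaps between runs.

```python
import re

# Illustrative sketch; function names, threshold, and toy sequences are hypothetical.


def gxy_runs(seq: str, min_triplets: int = 3) -> list:
    """Spans (start, end) of uninterrupted Gly-Xaa-Yaa runs of at least `min_triplets` triplets."""
    pattern = re.compile(r"(?:G..){%d,}" % min_triplets)
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


def imperfections(seq: str, min_triplets: int = 3) -> list:
    """Spans (start, end) between consecutive Gly-Xaa-Yaa runs, i.e. candidate imperfections."""
    runs = gxy_runs(seq, min_triplets)
    return [(prev_end, start) for (_, prev_end), (start, _) in zip(runs, runs[1:])]


substitution = "GPPGPPGPP" + "LPP" + "GPPGPPGPP"  # Gly replaced by Leu
missing_gly = "GPPGPPGPP" + "PP" + "GPPGPPGPP"    # one Gly absent
print(imperfections(substitution))  # [(9, 12)]
print(imperfections(missing_gly))   # [(9, 11)]
```

A Gly that happens to sit at an X or Y position next to an interruption can shift the reported boundaries by a residue or two, so the output marks candidate regions rather than exact positions.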
In contrast to the collagenous domain of the collagen VI α3 chain, which contains five potentially integrin-binding RGD sequences, each of the new chains contains only one RGD motif. In the collagen VI α4 and α6 chains the motif is found at exactly the same position as an RGD in the collagen VI α3 chain (Fig. 3B). The content of proline or hydroxyproline in the X and Y positions (17.4-20.5%) is lower than in the fibril-forming collagen I α1 or collagen II α1 chains (26). N- and C-terminal to the collagenous domains several cysteine residues are present, which might form intermolecular disulfide bridges that enhance the stability of the trimeric collagens. In phylogenetic analyses using protein distance and protein parsimony, the collagenous domains of the α3, α4, α5, and α6 chains group in one clade (Fig. 4, A and B).
FIGURE 3. The sequences for the Kunitz domains of rhesus monkey, dog, and rat were deduced from genomic sequences. The sequences were aligned by CLUSTAL X using the default parameters. The residues forming the trypsin interaction site in the original bovine pancreatic trypsin inhibitor (BPTI) (34) are marked with a number sign, the cysteine residues with asterisks, and the RGD sequences with dots. Imperfections in the collagenous domains are boxed and numbered I1-I4. The conserved metal ion-dependent adhesion site (42) and the conserved hydrophobic moieties (43) are marked with separate symbols.
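The two sequence features compared in the collagenous domains, RGD motifs and the proline (or hydroxyproline) content at the X and Y positions, can be computed directly from a sequence. The sketch below assumes the input starts at a Gly of the Gly-Xaa-Yaa register and contains no imperfections; function names and the toy sequence are hypothetical. Within a perfect repeat, an RGD necessarily spans two triplets, with Arg at a Y position and Asp at the following X position.

```python
# Illustrative sketch; function names and toy sequence are hypothetical.


def rgd_positions(seq: str) -> list:
    """0-based start positions of every RGD tripeptide in a protein sequence."""
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "RGD"]


def xy_proline_fraction(gxy: str) -> float:
    """Fraction of X and Y positions occupied by 'P' in a sequence starting at a Gly of the repeat.

    A sequence deduced from cDNA cannot distinguish Pro from Hyp, so 'P' covers both.
    """
    xy = [aa for i, aa in enumerate(gxy) if i % 3 != 0]
    return sum(aa == "P" for aa in xy) / len(xy)


toy = "GPRGDAGPP"  # triplets GPR, GDA, GPP
print(rgd_positions(toy))        # [2]
print(xy_proline_fraction(toy))  # 0.5
```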
Analysis of the VWA Domains-Of the 28 VWA domains present in the new collagen VI chains, only 8 have a fully conserved metal ion-dependent adhesion site (MIDAS) motif (DXSXSXnTXnD, where n represents a variable number of amino acid residues) (Fig. 3D). Sequence alignment of the VWA domains of the new chains with their counterparts in the collagen VI α1-α3 chains highlights the homology (Fig. 3D and supplemental Fig. 1). The highest sequence identity between two VWA domains of the new chains is 92.1%, for α5N7 and α6N7. High identity values were also obtained for α5N4 and α6N4 (64.5%), α5N5 and α6N5 (51.9%), α5C2 and α6C3 (52.9%), α5C1 and α6C1 (50.5%), and α5N1 and α6N1 (50.3%). Among the VWA domains of the collagen VI α1-α3 chains, the N10 domain of the collagen VI α3 chain shows the highest identity, 39.5%, to the N7 domain of the α4 chain. Similar identity values were obtained for α3N9 and α4N7 (34.7%) and for α3C1 and α4C1 (34.5%). Identity values between the α3 chain VWA domains and the α5 and α6 chain VWA domains do not exceed 28.4 and 28.9%, respectively. The identity between the VWA domains of the new chains and those of the collagen VI α1 and α2 chains is always below 24.0%. In phylogenetic analyses using protein distance and protein parsimony, all VWA domains of the α5 and α6 chains pair up with their counterparts in the other chain (Fig. 4, C and D). The C-terminal VWA domains of the α3, α4, α5, and α6 chains group in a distinct branch, in which the C1 domains form one subbranch and the C2 domains together with the C4 domain of the α5 chain form another. Similarly, the N1 domains of the α3, α4, α5, and α6 chains all cluster together (Fig. 4, C and D).
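Deciding whether the MIDAS motif (DXSXSXnTXnD) is fully conserved cannot rest on pattern matching alone, because the variable spacers would let a naive regular expression match almost any long sequence. A more robust approach, sketched below under stated assumptions, first locates candidate D-x-S-x-S segments. It then checks the residues at the five MIDAS positions, which in practice would be taken from an alignment such as that in Fig. 3D. The function names, toy sequence, and positions are hypothetical.

```python
import re

# Illustrative sketch; function names, toy sequence, and positions are hypothetical.
MIDAS_RESIDUES = ("D", "S", "S", "T", "D")


def dxsxs_starts(domain: str) -> list:
    """0-based starts of every (possibly overlapping) D-x-S-x-S match."""
    return [m.start() for m in re.finditer(r"(?=D.S.S)", domain)]


def midas_conserved(domain: str, positions: tuple) -> bool:
    """True if the residues at the five given 0-based positions read D, S, S, T, D.

    `positions` would come from an alignment against a reference VWA domain.
    """
    if len(positions) != len(MIDAS_RESIDUES):
        raise ValueError("expected five MIDAS positions")
    return all(
        0 <= pos < len(domain) and domain[pos] == aa
        for pos, aa in zip(positions, MIDAS_RESIDUES)
    )


toy = "AADLSGSAATAAD"
print(dxsxs_starts(toy))                       # [2]
print(midas_conserved(toy, (2, 4, 6, 9, 12)))  # True
```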
Analysis of the Unique Domains-The unique sequences at the C-terminal end directly follow the second C-terminal VWA domain (C2). In the collagen VI α5 chain a second unique domain is present C-terminal to the C4 domain. The unique domains are 99-111 amino acid residues long. The unique domain of the α4 chain and the first unique domain of the α5 chain, as well as the second unique domain of the α5 chain and the unique domain of the α6 chain, share some pairwise similarity (31.6 and 26.1%, respectively). However, a stretch of 15 amino acid residues at the beginning of each domain is highly identical in all four unique domains and ends with a cysteine residue (Fig. 3C). The unique sequence of the collagen VI α3 chain, C-terminal to the C2 domain, also shares some homology with the unique domains of the new chains, most clearly in the C-terminal portions, and in particular the cysteine residue is conserved (Fig. 3C). Shortly after the highly homologous stretch, an RGD motif is present in both the α4 chain and the first unique domain of the α5 chain, whereas this motif is missing in the α6 chain and in the second unique domain of the α5 chain (Fig. 3C). Apart from the single RGD motifs in each of the collagenous domains, these two RGD motifs are the only ones found in the new collagen VI chains. The unique domain of the collagen VI α3 chain lacks an RGD motif. BLAST searches with the unique sequences revealed weak homologies to intracellular proteins such as REST corepressor 1 (α4, 35/83 (42%)), ubiquitin D (α5C3, 22/32 (68%)), protein-tyrosine phosphatase (α5C5, 34/71 (47%)), and dynein cytoplasmic 2 heavy chain 1 (α6, 26/60 (43%)).
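The percentages in parentheses follow from the identity counts. A quick check, sketched below, shows that all four equal the integer part of 100 × identities / length, whereas conventional rounding would give 69% and 48% for the ubiquitin D and protein-tyrosine phosphatase hits. This is a check of the arithmetic only and makes no claim about how the BLAST software formats its output; the names `HITS`, `truncated_percent`, and `rounded_percent` are hypothetical.

```python
# Illustrative arithmetic check; names are hypothetical, counts are those quoted above.
# (identities, aligned length, percentage as quoted in the text)
HITS = {
    "alpha4 / REST corepressor 1": (35, 83, 42),
    "alpha5C3 / ubiquitin D": (22, 32, 68),
    "alpha5C5 / protein-tyrosine phosphatase": (34, 71, 47),
    "alpha6 / dynein cytoplasmic 2 heavy chain 1": (26, 60, 43),
}


def truncated_percent(identical: int, length: int) -> int:
    """Integer part of 100 * identical / length (integer arithmetic, no rounding)."""
    return 100 * identical // length


def rounded_percent(identical: int, length: int) -> int:
    """Conventionally rounded percentage."""
    return round(100 * identical / length)


for name, (ident, length, quoted) in HITS.items():
    print(name, truncated_percent(ident, length) == quoted, rounded_percent(ident, length) == quoted)
```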
Structure of the Murine Collagen Col6a4-Col6a6 Genes-The new mouse collagen VI genes map to chromosome 9 (9F1) (Fig. 5). The genomic sequences are completely contained in the public data bases (NT_039477 and NW_001030918). The genes lie head to tail in tandem orientation on the minus strand. The Pik3r4 gene and the Mirn135a1 gene are located downstream and upstream of the new collagen genes, respectively. We identified exons by flanking consensus splice signals and by comparison with the respective cDNAs. The exon/intron organization of the three genes is very similar (Fig. 6 and supplemental Tables 2-4) with regard to gene size, exon and intron length, and codon phase. The Col6a4 and Col6a5 genes are 112 kb long, and the Col6a6 gene is 104 kb long. They consist of 38, 44, and 37 exons, respectively, that code for the translated part of the mRNA (Fig. 6). The first exon in each gene consists entirely of 5′-UTR. Each second exon codes for the signal peptide, followed by six exons coding for the first six VWA domains (N7-N2), whereas the VWA domain N1 is encoded by three exons. The collagenous domains are encoded by exons 12-30. Intron 24 of the Col6a4 gene is a GC-AG-type intron. Exons 31 and 32 code for short spacer regions. The VWA domains C1 and C2 are encoded by exons 33/34 and 35, respectively. The structures of the three genes differ at the 3′ end. In Col6a4 the unique sequence is encoded by two exons, followed by a last exon coding for the truncated Kunitz domain and the 3′-UTR. In Col6a6 the last two exons code for the unique domain and the 3′-UTR. The more complex structure of the C-terminal end of the collagen VI α5 chain is also reflected in the Col6a5 gene structure: the unique domain between the VWA domains C2 and C4 is encoded by two exons, followed by the exon coding for the additional VWA domain C4. As in Col6a6, the last two exons of Col6a5 code for the unique domain and the 3′-UTR.
Although the unique domains of the new collagen chains are only partially homologous, each unique domain is encoded by two exons, the first always about 95 bp and the second about 200 bp long, suggesting a common ancestor.
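The exon assignments above can be checked for internal consistency by tallying them. The sketch below encodes the layout described for Col6a4 and Col6a6, using the number of exons per feature given in the text. It reproduces both the stated totals of 38 and 37 exons and the assignment of exons 12-30 to the collagenous domain. The 3′ end of Col6a5 is not broken down exon by exon in the text, so that gene is left out. All variable and function names are hypothetical.

```python
# Illustrative consistency check; names are hypothetical, exon counts are from the text.
SHARED_5PRIME = [
    ("5'-UTR", 1),
    ("signal peptide", 1),
    ("VWA N7-N2", 6),
    ("VWA N1", 3),
    ("collagenous domain", 19),
    ("spacer regions", 2),
    ("VWA C1", 2),
    ("VWA C2", 1),
]
COL6A4 = SHARED_5PRIME + [("unique domain", 2), ("truncated Kunitz domain + 3'-UTR", 1)]
COL6A6 = SHARED_5PRIME + [("unique domain + 3'-UTR", 2)]


def exon_ranges(layout):
    """Map each feature to its (first, last) exon number, numbering exons from 1."""
    ranges, first = {}, 1
    for feature, n_exons in layout:
        ranges[feature] = (first, first + n_exons - 1)
        first += n_exons
    return ranges


def exon_total(layout):
    """Total number of exons in a layout."""
    return sum(n for _, n in layout)


print(exon_total(COL6A4), exon_total(COL6A6))    # 38 37
print(exon_ranges(COL6A4)["collagenous domain"])  # (12, 30)
```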
New Collagen VI Genes in Man-The orthologs of the new mouse genes map to human chromosome 3q21 (Fig. 5). The tandem orientation of the genes is conserved, but the gene coding for the α4 chain is broken into two pieces, and its 5′ region is located at 3p24.3. Only the region downstream of the new collagen VI genes, coding for PIK3R4, is syntenic in man and mouse. The breakpoint corresponds to the large-scale pericentric inversion that occurred in the common ancestor of the African apes and is present in modern human chromosome 3 as well as in the chimpanzee and gorilla orthologs, but not in orangutan or Old World monkeys (27). In contrast to the rhesus macaque gene, the human COL6A4 is interrupted after the first exon coding for the collagenous domain, and EST clones for both parts of the gene can be found in the data bases. However, because stop codons are distributed over the sequence, both parts of the human COL6A4 are likely to be transcribed non-processed pseudogenes. The corresponding cDNAs of the human COL6A5 and COL6A6 were cloned as overlapping partial clones by RT-PCR using primers deduced from the genomic sequence and sequenced (accession numbers AM774225-AM774227 and AM906078-AM906084). The human collagen VI α5 chain has 73.1% amino acid identity to the mouse ortholog (supplemental Fig. 2). The differences are not evenly distributed over the sequence. A 32-residue proline-rich stretch at the C terminus of C1 is missing in man, and the unique domains are highly divergent. In addition, at two positions in the C-terminal part an amino acid residue is deleted, and at three positions an amino acid residue is inserted in the human α5 chain. Most of the cysteine residues are conserved, but there is an additional cysteine in the collagenous domain of the human α5 chain. However, this cysteine codon corresponds to an SNP (rs1497312) that causes a non-synonymous exchange to a serine codon.
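A cysteine-to-serine exchange caused by a single-nucleotide polymorphism is possible from either cysteine codon (TGT or TGC), by a change at the first or the second codon position. The sketch below enumerates such substitutions using the standard genetic code. It does not imply which position rs1497312 actually affects, which is not stated here; the function name is hypothetical.

```python
from itertools import product

# Illustrative sketch; the function name is hypothetical. Standard genetic code only.
BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_nucleotide_changes(codon: str, target_aa: str) -> list:
    """List (codon position 1-3, old base, new base, new codon) for substitutions giving target_aa."""
    hits = []
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == target_aa:
                hits.append((pos + 1, codon[pos], base, mutant))
    return hits


for cys_codon in ("TGT", "TGC"):
    print(cys_codon, single_nucleotide_changes(cys_codon, "S"))
```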
The positions and sizes of the imperfections in the collagenous domain are identical to those in mouse, whereas the RGD motif in the collagenous domain of the α5 chain is lost. Instead, there is a new RGD motif at the N terminus of the collagenous domain. The two RGD motifs present in the unique domain in mouse are also missing in man. Another SNP (rs11355796), corresponding to the deletion of a thymidine in the region coding for the C terminus, creates a premature stop codon, leading to a full-length protein of 2590 residues (supplemental Fig. 2). No information is available on its population frequency, but both variants are found in the TRACES-WGS data base. Interestingly, the deletion is present in the alternative Celera assembly of the human genome, whereas the thymidine is present in the reference assembly, leading to a longer protein. In contrast to murine Col6a5, human COL6A5 contains an additional intron in the 3′-UTR that gives rise to three different C termini by alternative splicing, resulting in full-length α5 chain isoforms of 2526, 2614, or 2615 amino acid residues (supplemental Fig. 2). The human collagen VI α6 chain has 83.4% amino acid identity to the mouse ortholog, and only the last 30 amino acids at the C terminus show some differences (supplemental Fig. 3). The positions of the signal peptide cleavage site, the RGD motif, and all cysteine residues in the mature protein are completely conserved.
Interestingly, variants of the human COL6A5 gene were recently shown to be associated with atopic dermatitis (28). The authors of that study designated the human COL6A5 as COL29A1. Although not present in the original publication, the sequence was recently deposited in the data base (accession number EU085556). Whereas the publication gives the length of the protein as 2614 amino acids, the sequence submitted to the data base is 2615 amino acids long. The reason for this difference is unclear, but the amino acid sequence is nearly identical to the third variant presented here (supplemental Fig. 2), the only difference being that residue 2560 is a serine instead of the asparagine found by us. Nevertheless, the second splice variant presented here contains 2614 amino acids. Surprisingly, the 5′-UTR region of EU085556 contains a duplication of the 19-bp sequence GTGCGGCGCGGACCAGGGC that is not present in our sequence and is found neither in the alternative Celera assembly of the human genome nor in any of the 21 TRACE-WGS clones that cover this region, but it is present in the reference assembly of the human genome.
Expression of the New Mouse Collagen VI Genes-To determine the length of the new collagen VI mRNAs we performed Northern hybridization with total RNA or mRNA (Fig. 7A). The mRNA coding for the α6 chain was readily detected as a 9.7-kb band in total RNA from the lungs of newborn mice. Several messages coding for the α4 chain were detected in total RNA from the same source, probably indicating alternative splicing. The most prominent band was 8.4 kb long, whereas weaker bands appeared at 11.7, 6.7, and 5.0 kb. A 9.5-kb message coding for the α5 chain was detected in purified mRNA from sterna of 4-week-old mice. RT-PCR was performed to screen the tissue distribution of the new collagen VI chains (Fig. 7, B and C). Products corresponding to the mRNAs for the α5 and α6 chains were detected in lung, heart, kidney, muscle, brain, intestine, skin, femur, and sternum of newborn mice. In addition, α6 chain mRNA was detected in calvaria. The α4 chain mRNA shows a more restricted tissue distribution and was detected in lung, kidney, brain, intestine, skin, sternum, and weakly in calvaria (Fig. 7B). In adult mice, expression of the α4 chain is lost in most tissues, and RT-PCR gave a clear signal only in ovary and a very weak signal in spleen, lung, uterus, and brain. In contrast, the α5 chain is also widely expressed in adult mice, with mRNA detected in lung, heart, kidney, spleen, muscle, ovary, uterus, brain, skin, liver, and sternum, whereas α6 chain expression is more restricted and was detected in lung, heart, muscle, ovary, brain, liver, and sternum (Fig. 7C).
The New Collagen VI Chains Copurify with α1, α2, and α3 Chain-containing Collagen VI Prepared from Newborn Mice-If the new collagen VI chains assemble with the known collagen VI chains, they should be present in conventional collagen VI preparations. We therefore isolated collagen VI from newborn mouse carcasses (25) and tested this preparation for the presence of the new chains by immunoblot. For this purpose, we generated antibodies specific for the new chains. Tagged versions of different N-terminal VWA domains were recombinantly expressed in EBNA293 cells, and the recombinant proteins were purified by affinity chromatography and used to immunize rabbits. The antisera were affinity-purified before use, and cross-reactivity among the new collagen VI chains was tested by ELISA (Fig. 8 and not shown). After reduction of the collagen VI preparation, all three new chains were detected as major bands running above the 220-kDa marker (Fig. 8), consistent with the calculated molecular masses. For the collagen VI α4 and α5 chains, additional faster-migrating bands were detected (Fig. 8), indicating alternative splicing or proteolytic processing. The weak smear of lower mobility seen for the α4 chain (Fig. 8) could indicate the presence of non-reducible cross-linked molecules.
The Collagen VI α5 and α6 Chains, but Not the α4 Chain, Are Deposited in the Extracellular Matrix of Skeletal Muscle-Mutations in collagen VI lead to muscular dystrophies in humans, and mice lacking collagen VI display a myopathic phenotype affecting skeletal muscle (20). We therefore tested cryostat sections of adult mouse quadriceps femoris muscle for the expression of the new collagen VI chains (Fig. 9). By immunohistochemistry using the polyclonal antibodies, collagen VI α5 and α6 were readily detected in skeletal muscle of adult mice. Because the antibodies specific for the collagen VI α4 chain did not stain skeletal muscle, we tested their reactivity on sections of small intestine, where this chain was strongly detected below the mucosal layer (Fig. 9). The strong reactivity of the antibody in intestine indicates that the lack of staining in skeletal muscle reflects a true absence of the collagen VI α4 chain rather than a failure of the antibody.
FIGURE 7. Northern blot (A) and RT-PCR (B and C) analysis of the new mouse collagen VI chain mRNA species. A, Northern hybridization was performed for the collagen VI α4 and α6 chains with 5 μg of total RNA from lung of newborn mice and for the α5 chain with 1 μg of poly(A)+ RNA from sternum of 4-week-old mice. Probes were generated using primers α4m8 and α4m9 for the α4 chain, α5m2 and α5m7 for the α5 chain, and α6m6 and α6m10 for the α6 chain. Positions of size markers are indicated on the left. B and C, RT-PCR analysis was performed using primer pairs α4m6 and α4m7 for the α4 chain, α5m4 and α5m5 for the α5 chain, and α6m2 and α6m9 for the α6 chain. Template RNA was isolated from newborn (B) and adult mice (C). The 1-kb ladder from Invitrogen was used as a reference.
The targeted interruption of the Col6a1 gene in mouse completely abolishes secretion of the collagen VI α2 and α3 chains (20), showing that heterotrimeric assembly is required to form a functional collagen VI molecule. To determine how the lack of the collagen VI α1 chain affects the assembly of the new collagen VI chains, we analyzed their occurrence in α1 chain-deficient mice (20). The new collagen VI chains could not be detected by immunohistochemistry in quadriceps femoris of collagen VI α1 chain-deficient mice (Fig. 9), indicating that the α1 chain participates in the assembly of collagen VI molecules containing the new chains. The absence of the α5 and α6 chains in Col6a1 knock-out mice was also confirmed by immunoblot analysis of diaphragm extracts (Fig. 10). For wild type mice, incubation with antibodies specific for either the collagen VI α5 or α6 chain resulted in clearly identifiable bands above 220 kDa. When the same method was applied to diaphragm from collagen VI α1 chain-deficient mice, no bands were detected, supporting the immunohistochemical results.

DISCUSSION
We report the identification and initial biochemical characterization of three new collagen VI chains, named collagen VI α4, α5, and α6. The mouse Col6a4-Col6a6 genes are arranged in tandem on chromosome 9 and were numbered according to their order from 5′ to 3′ on the coding strand. Cloning of the cDNAs by RT-PCR, together with immunohistochemistry and immunoblotting using antibodies raised against recombinant fragments, confirmed expression of the new collagen VI genes in mouse. These genes had previously been only incompletely annotated or incorrectly predicted by conceptual translation or gene prediction programs.
The sequences and domain structures of the new proteins show that they represent new collagen VI chains, which probably arose from a duplication of a common ancestor of the collagen VI α3 and new α chain genes, followed by additional duplications. The identical size of the collagenous domains of the new α chains and of the α3 chain implies that the new chains could substitute for the α3 chain, probably forming α1α2α4, α1α2α5, or α1α2α6 heterotrimers. The close relation between the α3 chain and the new chains is also reflected by the almost identical exon/intron organization of the gene portions encoding the collagenous domains: with the exception of the last exon, the exon sizes are the same. Alternative splicing has been reported for the α3 chain mRNA (29), leading to shorter protein isoforms with molecular sizes similar to those of the new chains. The collagen VI molecule known to date is composed of α1, α2, and α3 chains, which associate intracellularly in a stoichiometric ratio to form triple helical monomers. Monomers then assemble into dimers and tetramers, which are secreted and deposited in the extracellular matrix, where they form beaded filaments through interactions of their non-collagenous domains (3). There is good evidence that α3 chain expression is essential for the formation of functional collagen VI molecules, as human SaOS-2 cells that are deficient in α3 chain expression do not produce triple helical collagen VI (30). Although the collagenous domain of the collagen VI α1 chain is also identical in length to those of the new chains, and that of the α2 chain is only one amino acid residue shorter, other criteria clearly show that the new chains are most closely related to the α3 chain. First, the exact position of the cysteine residue within the collagenous domain is conserved in the α3, α4, α5, and α6 chains.
In the α3 chain this cysteine appears to be involved in tetramer formation and stability (19). The α1 and α2 chains also contain one cysteine each, but at a different position, and these appear to be involved in stabilizing the supercoil formed during antiparallel dimer formation (31). Second, it has been suggested that the supercoiled dimer is partially stabilized by ion pairs between different segments along the supercoil (32). In the α1α2α3 triple helical monomer, the supercoiled part of the α1 chain carries a high negative net charge, whereas that of the α3 chain has a high positive net charge and that of the α2 chain is close to neutral. All three new chains carry a positive net charge that is even higher than that of the α3 chain. In addition, the positions of the two Gly-Xaa-Yaa imperfections present in the α3 chain, which give the supercoil a clearly segmented character (32), are conserved. Third, phylogenetic analyses of the collagenous domains, using protein distance and protein parsimony methods, cluster the new chains with the collagen VI α3 chain, whereas the α1 and α2 chains form a separate branch (Fig. 4, A and B). In addition, the VWA domains C1 and C2 of the α3, α4, α5, and α6 chains also cluster in common branches (Fig. 4, C and D).

FIGURE 8. Carcasses of newborn mice were extracted, and collagen VI was isolated by molecular sieve column chromatography as described previously (25). The proteins were subjected to SDS-PAGE on 4-12% polyacrylamide gradient gels under reducing conditions. A, Coomassie Brilliant Blue-stained gel. B, immunoblot of the same preparation with antibodies against collagen VI (α1, α2, α3; lane 1), the collagen VI α1 chain (lane 2), and the new collagen VI chains α4, α5, and α6 (lanes 3-5). C, reactivity of affinity-purified antibodies against the collagen VI α6 chain, and lack of reactivity of antibodies against the α4 and α5 chains, toward the collagen VI α6 chain N2-N6 domains. ELISA plates were coated with 0.5 μg of recombinant collagen VI α6 N2-N6 protein per well, and antibodies were diluted as indicated.
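The net-charge and Gly-Xaa-Yaa comparisons of the collagenous domains discussed above can be illustrated with a simple sequence scan. The Python sketch below is an approximation under explicit assumptions, not the analysis used in the cited work (32): net charge is estimated by counting Lys and Arg as +1 and Asp and Glu as -1 (ignoring His, the termini, and local pKa shifts), and imperfections are reported as gaps between maximal uninterrupted (Gly-Xaa-Yaa)n runs found by a greedy left-to-right scan. Function names and the toy sequence are hypothetical.

```python
from typing import List, Tuple


def net_charge(segment: str) -> int:
    """Rough net charge near neutral pH: +1 per Lys/Arg, -1 per Asp/Glu.

    His, the chain termini and local pKa shifts are ignored (assumption).
    """
    s = segment.upper()
    return (s.count("K") + s.count("R")) - (s.count("D") + s.count("E"))


def triplet_runs(helix: str) -> List[Tuple[int, int]]:
    """Maximal uninterrupted (Gly-Xaa-Yaa)n runs as half-open (start, end) pairs.

    Greedy scan: a Gly in an Xaa or Yaa position can start a run in the
    wrong frame, so this is only a first approximation.
    """
    s = helix.upper()
    runs = []
    i = 0
    while i < len(s):
        if s[i] == "G" and i + 3 <= len(s):
            start = i
            # Extend while a full triplet remains and its first residue is Gly.
            while i + 3 <= len(s) and s[i] == "G":
                i += 3
            runs.append((start, i))
        else:
            i += 1
    return runs


def imperfections(helix: str) -> List[Tuple[int, int]]:
    """Gaps (start, end) between consecutive runs, i.e. where the repeat breaks."""
    runs = triplet_runs(helix)
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(runs, runs[1:])]


# Hypothetical toy helix with a single inserted Ala interrupting the repeat.
toy = "GPPGKPGEP" + "A" + "GRPGPKGAR"
print(triplet_runs(toy), imperfections(toy), net_charge(toy))
```

For the toy sequence, the scan reports two runs separated by a one-residue interruption and a net charge of +3. Applied to real supercoiled segments, the same counts would only indicate the sign and rough magnitude of the charge, not ion-pair geometry.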
All three new chains could be detected in collagen VI preparations from newborn mouse carcasses. The Coomassie-stained gel clearly showed the distinct α1 and α2 bands and a very heterogeneous distribution of bands above 220 kDa (Fig. 8A). Immunoblotting with affinity-purified antibodies identified the new chains as part of this purified collagen VI preparation (Fig. 8B), indicating that trimeric assemblies containing the new chains are present in tissues. Further evidence for the assembly of the new chains into collagen VI came from the study of their expression in Col6a1 knock-out mice. It was shown earlier that absence of the collagen VI α1 chain also prevents secretion of the α2 and α3 chains (20), indicating that the α1 chain is essential for the assembly of collagen VI molecules. The complete lack of the new chains in Col6a1 knock-out mice clearly shows that the collagen VI α1 chain is also a prerequisite for their secretion. This observation strongly suggests that the new chains assemble in a similar way to that proposed for α1, α2, and α3 chain-containing collagen VI.
It has been shown that the N5 and C5 domains of the collagen VI α3 chain are critical for microfibril formation (9,33). From the sequence information alone, it is not clear which domains of the new chains correspond to the N5 domain of the α3 chain. In some species, only the α4 chain among the new chains contains a domain distinctly homologous to the α3 chain C5 domain; as in the α3 chain, this domain resembles a Kunitz domain. Interestingly, the Kunitz domain of the α4 chain is truncated in mouse and rat, but its N-terminal part, which contains the trypsin interaction site in the prototypic bovine pancreatic trypsin inhibitor (34), is still present in the truncated molecules and could serve as an interaction module. The α5 and α6 chains lack a Kunitz domain, which may indicate differences in the assembly of α5 and α6 chain-containing microfibrils. It will therefore be interesting to study how collagen VI of different compositions assembles. Do fibrils contain α1, α2, and only one of the four related α3, α4, α5, and α6 chains, or are mixed assemblies possible? The latter alternative would allow a very large number of possible permutations.
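The scale of the permutation problem raised above can be estimated with elementary combinatorics. The Python sketch below rests on hypothetical assumptions: each triple helical monomer contains α1, α2, and exactly one α3-like chain, and a tetramer consists of four such monomers. It counts unordered compositions (multisets of chain types) and ordered arrangements that treat every monomer position as distinct; the latter is only an upper bound because it ignores the symmetry of the antiparallel tetramer. Chain labels and function names are illustrative.

```python
from itertools import combinations_with_replacement, product
from typing import List, Sequence, Tuple


def compositions(chains: Sequence[str], monomers: int = 4) -> List[Tuple[str, ...]]:
    """Unordered choices of one alpha3-like chain per monomer (multisets)."""
    return list(combinations_with_replacement(sorted(chains), monomers))


def arrangements(chains: Sequence[str], monomers: int = 4) -> List[Tuple[str, ...]]:
    """Ordered choices with every monomer position distinct (upper bound,
    ignoring the tetramer's symmetry)."""
    return list(product(sorted(chains), repeat=monomers))


mouse = ["a3", "a4", "a5", "a6"]
human = ["a3", "a5", "a6"]  # COL6A4 is a pseudogene in Homininae

for label, chains in (("mouse", mouse), ("human", human)):
    homotypic = len(chains)  # tetramers built from a single alpha3-like chain type
    print(label, homotypic, len(compositions(chains)), len(arrangements(chains)))
# mouse 4 35 256
# human 3 15 81
```

Under these assumptions, allowing mixed assemblies raises the number of tetramer types from 4 homotypic ones to 35 compositions in mouse; filaments built from several such tetramers would increase the count further.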
The human orthologs lie in a region of chromosome 3 that is of considerable evolutionary interest. A large pericentric inversion occurred some time after the split of Homininae and Ponginae (27). The 3′ breakpoint of the inversion lies within COL6A4 and led to its inactivation. Although both parts of COL6A4 are still present and can easily be identified by their sequence, both have become transcribed non-processed pseudogenes. Homininae have thereby become natural COL6A4 knock-outs (Fig. 5). This raises the question of whether one of the remaining genes has taken over the function of the lost one. The major structural difference between the collagen VI α4 chain and the α5 and α6 chains is at the C terminus, where the fibronectin type III domain and the Kunitz domain occur only in the α4 chain. However, a comparison of human and mouse collagen VI α5 and α6 chains reveals higher divergence at the C terminus of the α5 chain, which in addition shows alternative splicing; this could represent an adaptation to the need to replace the α4 chain in Homininae.
Recently, human COL6A5 was associated with atopic dermatitis in a linkage study and designated COL29A1 (28). Although neither DNA nor protein sequences were published in that report, the cDNA sequence later became available in the database. Except for a single amino acid exchange, the protein sequence is identical to the third potential splice variant presented here. As we have shown for mouse by RT-PCR, the collagen VI α5 chain is also expressed in human skin (28). Expression in other tissues overlaps only partially between the two species; however, intestine and lung show expression in both. A variety of nonsynonymous coding SNPs were described, but none could explain the association of COL6A5 with atopic dermatitis on its own. It was therefore proposed that several variants, or combinations of variants, associated with the most common COL6A5 haplotype are involved in the etiology of the disease. In addition, a strongly maternal transmission pattern was found, which could be due either to imprinting or to maternal effects through an interaction of the child's genotype with the maternal environment during prenatal life. Another susceptibility locus for atopic dermatitis is linked to 3p24-22 (35), exactly the breakpoint region where the 5′ part of the COL6A4 pseudogene is located. The mechanism leading to atopic dermatitis may therefore be more complex, with expression of the non-processed α4 chain pseudogenes influencing α5 chain expression by a yet unknown mechanism. The maternal transmission pattern could point to such a mechanism. A number of pseudogenes have been described in which gene conversion between a functional copy of a gene and a neighboring pseudogene causes disease (36). In the present case, however, the mechanism is likely to be more complex.

FIGURE 9. Immunohistochemical analysis of wild type and Col6a1 knock-out mice. Immunohistochemistry was performed on frozen sections of quadriceps femoris muscle (A-H) and small intestine (I-L) from wild type (wt) (A-D and I-K) or Col6a1 knock-out (ko) mice (E-H and L). Sections were incubated with affinity-purified antibodies against the collagen VI α4 N3-N6 domain (B, F, J, and L), the collagen VI α5 N3 domain (C and G), the collagen VI α6 N2-N6 domain (D and H), human collagen VI from placenta (detecting the classical collagen VI chains α1α2α3; A, E, and I), and laminin-332 (K), followed by AlexaFluor 488-conjugated goat anti-rabbit IgG (green, A-H), AlexaFluor 546-conjugated goat anti-rabbit IgG (red, I and K), or AlexaFluor 488-conjugated goat anti-guinea pig IgG (green, J and L). Antibodies against the classical collagen VI chains (α1α2α3) and against the α5 and α6 chains, but not those against the α4 chain, strongly stained the extracellular matrix surrounding the muscle fibers of wild type mice (A-D). In small intestine, antibodies against classical collagen VI (α1α2α3) (I) and against the collagen VI α4 chain (J) co-stain with those against the basement membrane marker laminin-332 (K). In collagen VI α1 chain-deficient mice, staining for the new collagen VI chains is absent (L). Nuclei were counterstained with 4′,6-diamidino-2-phenylindole (blue, A-H and L). Scale bar, 100 μm.
The newly identified atopic dermatitis locus in COL6A5 could be related to another susceptibility locus, found on chromosome 21p21 in a Swedish population, that may contain a gene modulating the severity of atopic dermatitis, especially in combination with asthma (37). Both 21p21 and 3p24 have also been described as asthma susceptibility loci (38). Interestingly, COL6A1 and COL6A2 are located on 21p21, which could point to a more general role of collagen VI in the development of atopic dermatitis or asthma. In contrast, there is clear evidence for a role of collagen VI in the etiology of Bethlem myopathy and Ullrich congenital muscular dystrophy. A variety of mutations in all three previously known collagen VI chains have been identified (for review see Ref. 19). Interestingly, patients with phenotypes typical of Bethlem myopathy or Ullrich congenital muscular dystrophy, but in whom no mutations in the collagen VI α1, α2, and α3 chains could be detected, have been described (39-41). It is therefore tempting to speculate that mutations in the new collagen VI chains may cause muscular disease. Indeed, in mouse skeletal muscle the affinity-purified antibodies detected two of the new chains, α5 and α6, but not α4, associated with the extracellular matrix. The α4 chain, in contrast, was absent from mouse muscle but was detected close to the basement membrane underlying the mucosal cell layer in small intestine. This differential distribution indicates that the new chains may have tissue-specific functions that allow modulation of collagen VI properties.