Collagen XXVIII, a Novel von Willebrand Factor A Domain-containing Protein with Many Imperfections in the Collagenous Domain*

Here we describe a novel collagen belonging to the class of von Willebrand factor A (VWA) domain-containing proteins. This novel protein was identified by screening the EST data base and was subsequently recombinantly expressed and characterized as an authentic tissue component. The COL28A1 gene on human chromosome 7p21.3 and on mouse chromosome 6A1 encodes a novel protein that structurally resembles the beaded filament-forming collagens. The collagenous domain contains several very short interruptions arranged in a repeat pattern. As shown for other novel minor collagens, the expression of collagen XXVIII protein in mouse is very restricted. In addition to small amounts in skin and calvaria, the major signals were in dorsal root ganglia and peripheral nerves. By immunoelectron microscopy, collagen XXVIII was detected in the sciatic nerve, at the basement membrane of certain Schwann cells surrounding the nerve fibers. Even though the protein is present in the adult sciatic nerve, collagen XXVIII mRNA was only detected in sciatic nerve of newborn mice, indicating that the protein persists for an extended period after synthesis.

The collagen superfamily covers a variety of subclasses. Most of these genes are unique to vertebrates and only some collagens are conserved throughout the animal kingdom (1), indicating that most collagens have evolved in parallel with the appearance of an internal skeleton. To date, 42 genes of this family have been described and the different polypeptides can be assembled into at least 28 distinct trimeric collagens. The importance of collagens is well established through the study of more than 1300 different mutations found in collagen genes (1). The first collagens were identified through biochemical purification and amino acid sequencing. In a second wave, with the help of cDNA library screening methods, several additional collagens were found. Recently, the information provided by genome projects allowed the identification of previously undescribed members. One of the reasons for the late discovery of these collagens is that they are minor components and are often expressed at very specific sites.
VWA domains are found in a variety of proteins, e.g. the prototype von Willebrand factor, collagens, matrilins, and integrins (2). The function of the VWA domains is to facilitate protein-protein interactions. For example, the VWA domains present in the integrin α1β1 and α2β1 receptors are responsible for the interaction with fibrillar collagens (3). VWA domains are also found in collagen VI (beaded filament-forming collagen) as well as in collagen VII (anchoring fibril-forming collagen). In addition, several members of the fibril-associated collagens with interrupted triple helices (FACIT collagens) also contain VWA domains (1).
Collagens can be found in all tissues throughout the vertebrate body and are important for tissue integrity. In the peripheral nervous system, they are present in the endoneurium in the form of fibrillar collagens and as components of the basement membranes surrounding the processes of Schwann cells and perineurial cells. One of the fibrillar collagen chains, collagen α4(V), is predominantly expressed in the developing nerve (4). The basement membrane is essential for the integrity of myelinated nerves. As demonstrated in several studies, the removal of laminin chains or of the corresponding receptors leads to neuropathy (demyelinated type), characterized by the presence of naked axons (5). In this study we identified and characterized the novel collagen XXVIII in mouse and human and showed that it is mainly a component of the basement membranes around Schwann cells in the peripheral nervous system.

MATERIALS AND METHODS
RT-PCR and 5′ Rapid Amplification of cDNA Ends-RT-PCR was used to clone the mouse and human collagen XXVIII cDNAs. Primers were designed according to EST and genomic sequences deposited in the data bases (Table 1). To prevent mutations in the RT-PCR we used the Expand High Fidelity PCR System (Roche Diagnostics). The mouse full-length clone was amplified using primer pairs c28m-1/c28m-5 and c28m-6/c28m-7 and the partial sequence of the alternatively spliced mouse cDNA AJ890450 was amplified using the primer pair c28m-8/c28m-9 from cDNA of epiphyseal cartilage of newborn mice. The human cDNA was cloned from mRNA prepared from human lung using primer pairs c28h-1/c28h-2, c28h-3/c28h-4, and c28h-5/c28h-6.
Northern Blot Analysis and RT-PCR-Total RNA was extracted from various tissues of newborn and adult C57BL/6J mice by the guanidinium thiocyanate method. mRNA from limbs of newborn mice was prepared by using the QuickPrep Micro mRNA Purification Kit (Amersham Biosciences). Aliquots (3 μg) were electrophoresed on a 0.8% denaturing agarose-formaldehyde gel, blotted, and hybridized with a digoxigenin-labeled RNA probe (nucleotides 2320-3243). The conditions for the last two wash steps were: 0.1× SSC, 0.1% SDS at 68°C for 15 min each. The blots were developed using CDP-Star (Roche) according to the manufacturer's instructions.
Expression and Purification of Recombinant Collagen XXVIII and Its VWA2 Domain-VWA domains expressed in bacteria are often correctly folded after purification (6) and suited for the generation of specific antisera. Therefore the collagen XXVIII VWA2 cDNA was generated by PCR on the full-length cDNA. The primer pair c28m-10/c28m-11 introduced a 5′-terminal XbaI and a 3′-terminal BamHI restriction site. The cDNA was cloned into a modified pGEX vector carrying a glutathione S-transferase His6 tag with a thrombin cleavage site. Escherichia coli cells (RosettaDE3, Novagen) were transformed with the recombinant plasmid. The bacteria were induced by 0.5 mM isopropyl thiogalactoside and grown for 16 h at 28°C. The cells were treated with lysozyme and sonicated in phosphate-buffered saline, pH 7.5, containing protease inhibitors (Complete, Roche). After centrifugation (30 min, 20,000 × g, 4°C) the supernatant was concentrated on glutathione SuperFlow Resin (Clontech) and the protein was eluted with 10 mM glutathione, 150 mM NaCl, 10 mM Tris-HCl, pH 8.0. Thrombin cleavage was performed overnight at room temperature (1 unit/mg thrombin, 5 mM CaCl2) and the cleaved-off glutathione S-transferase His6 tags were removed by passing the solution over a glutathione SuperFlow column.
For the expression of full-length murine collagen XXVIII, cDNA was generated by PCR on the full-length cDNA. Suitable primers (c28m-3/c28m-4) introduced a 5′ SpeI and a 3′ NotI restriction site. The amplified PCR product was inserted into the expression vector pCEP-Pu (7) downstream of the cytomegalovirus promoter. The vector contained an N-terminal His6 tag (7) upstream of the SpeI site.
The recombinant plasmid was introduced into human embryonic kidney 293-EBNA cells (Invitrogen) by electroporation. The cells were selected with puromycin (1 μg/ml) and transferred to serum-free Dulbecco's modified Eagle's medium for harvest of the recombinant protein. After filtration and centrifugation (1 h, 10,000 × g), the cell culture supernatant was applied to TALON Metal Affinity Resin (BD Biosciences) and the protein was eluted with 100 mM imidazole, 150 mM NaCl, 50 mM Tris-HCl, pH 8.0.
Collagenase Digestion-For assessment of the domain structure of collagen XXVIII, the recombinant protein was subjected to collagenase digestion. The incubation with 40 units/ml of highly purified bacterial collagenase (CLOSA, Worthington Biochemicals) was carried out in 50 μl of elution buffer containing 5 mM CaCl2 and 1 mM Pefablock (Roche) for 4 h at 37°C. The reaction was stopped by adding EDTA to a final concentration of 20 mM.
Preparation of Antibodies to the Collagen XXVIII VWA2 Domain-The purified recombinant VWA2 protein was used to immunize a rabbit. The antiserum obtained was purified by affinity chromatography on a column with antigen coupled to CNBr-activated Sepharose (Amersham Biosciences). The specific antibody, termed pAb KR43, was eluted with 150 mM NaCl, 0.1 M triethylamine, pH 11.5, and the eluate was neutralized with 1 M Tris-HCl, pH 6.8.
Sequential Extraction of Collagen XXVIII from Mouse Sciatic Nerves-The extraction was performed as described before (8). Sciatic nerves were weighed and frozen at −80°C. On the day of extraction, the specimens were cut into 1-mm³ pieces. Ten volumes (ml/g wet tissue) of chilled buffer 1 (150 mM NaCl, 50 mM Tris, pH 7.4) were added, and the tissue was extracted for 7-10 h at 4°C with continuous mixing. The extracts were clarified by centrifugation and the supernatants stored at −20°C. The pellets were re-extracted in an identical manner with buffer 2 (1 M NaCl, 10 mM EDTA, 50 mM Tris, pH 7.4) and the remaining insoluble material with buffer 3 (4 M guanidine HCl, 10 mM EDTA, 50 mM Tris, pH 7.4). All extraction buffers contained 2 mM phenylmethylsulfonyl fluoride and 2 mM N-ethylmaleimide.
Aliquots (100 μl) extracted with buffers 1, 2, or 3 were precipitated with 1 ml of 96% ethanol overnight at 4°C. The precipitates were washed with a mixture of 9 volumes of 96% ethanol and 1 volume of Tris-buffered saline for 2 h at 4°C with gentle agitation. After centrifugation the pellets were air dried and suspended in 150 μl of water and the same volume of 2× SDS-PAGE sample buffer was added. Aliquots were applied to 4-12% SDS-polyacrylamide gels.
SDS-PAGE, Immunoblotting, and Determination of the N-terminal Sequence-SDS-polyacrylamide gel electrophoresis was performed as described by Laemmli (9). For immunoblots the proteins were transferred to nitrocellulose and incubated with the appropriate affinity-purified rabbit antibody diluted in Tris-buffered saline containing 5% low fat milk powder. Bound antibodies were detected by luminescence using peroxidase-conjugated swine anti-rabbit IgG (Dako), 3-aminophthalhydrazide (1.25 mM), p-coumaric acid (225 μM), and 0.01% H2O2.
Immunoelectron Microscopy-The sciatic nerve was diced into 1 × 1-mm pieces and immersed in primary KR43 antibody diluted 1:5 in serum-free Dulbecco's modified Eagle's medium overnight at 4°C. The tissue was washed in medium for 4 h, and then immersed in 10-nm gold-labeled secondary antibody diluted 1:3 in medium overnight at 4°C. Following an extensive wash in medium, the tissue was immersed in gold enhancement solution, rinsed, fixed in 1.5% glutaraldehyde, 1.5% paraformaldehyde with 0.05% tannic acid, osmicated, and prepared for transmission electron microscopy by a standard protocol.

RESULTS
Cloning of Human and Mouse Collagen XXVIII cDNAs-In a screen of the genomic data base with collagen and matrilin sequences as query, a gene was identified in both the human and mouse genome that codes for a new VWA domain-containing collagen, which was designated as the α1 chain of the next available number in the collagen family, collagen XXVIII. The corresponding cDNAs were cloned by RT-PCR using primers deduced from the genomic sequence and sequenced. The mouse collagen XXVIII cDNA of 4,230 bp (accession number AJ890449) contains an open reading frame of 3,423 bp, encoding a protein consisting of 1,141 amino acid residues preceded by a signal peptide of either 20 or 23 residues, as predicted by a method using neural networks or hidden Markov models, respectively (13). The mature secreted protein has a calculated Mr of either 116,414 or 116,095 (Fig. 1A) and consists of an N-terminal VWA domain followed by a 528-amino acid residue collagenous domain, which has altogether 16 very short imperfections in the Gly-X-Y repeat. The C terminus is made up of a second VWA domain followed by a module without homology to any other protein domain and a domain related to the bovine pancreatic trypsin inhibitor/Kunitz family of serine protease inhibitors (Kunitz domain; Fig. 1, A and B). The human cDNA of 3,515 bp (accession number AJ890451) has an open reading frame of 3,375 bp and an overall identity of 85.9% to the mouse ortholog in the coding region (Fig. 1A). The human protein has a predicted signal peptide of 23 amino acid residues and the mature secreted protein an Mr of 113,917. The domain structure is identical to that of the mouse collagen XXVIII. The overall identity at the amino acid level is 86.5%. The identity is lowest (63.9%) in the unique domain, which in addition is 16 amino acid residues shorter in human because of a deletion of the splice donor site present in mouse. The identity is highest in the collagenous domain (91.5%) (Fig. 1A).
Alternative Splicing of Collagen XXVIII-Splice variants leading to premature stop codons were detected in human and mouse. In a partial mouse cDNA clone (AJ890450) intron 24 (Fig. 5) is not spliced, which if translated would lead to a protein that contains most of the collagenous domain but lacks the second VWA domain and the Kunitz domain. The human partial cDNA clone AJ890452, which splices to an alternative exon within intron 24 (Fig. 5), would encode a similar protein. The human partial cDNA clone AJ890453 contains an additional exon in intron 7 (Fig. 5) that includes a stop codon and would lead to a protein that lacks most of the collagenous domain. In addition, other mouse and human EST clones representing alternatively spliced mRNAs are present in the data bases. However, none of the splice variant clones or EST sequences is full-length and therefore their complete protein sequences are not known.
Analysis of the Collagenous Domain of Collagen XXVIII-The 528-amino acid residue long collagenous domain contains 12 GXG and four GXXXXG imperfections that are uniformly distributed along the sequence. The sizes and positions of these imperfections are completely conserved between man and mouse (Fig. 1). In mouse, eight of the amino acid residues at the X position of the 12 GXG motifs are hydrophobic and the sequence GIG occurs five times. In human, the isoleucine is replaced by valine in two positions. The length of the perfect GXY triplet segments between the imperfections varies between 19 and 82 residues. The longer non-interrupted sequences are equally distributed in the middle and at both ends of the collagenous domain (Fig. 1). At the Y position of the GXY triplets the lysine content is relatively high (20%), whereas the content of proline is normal in the X and Y positions (32%). The lysine residues are often part of a KG(D/E) motif that has been shown to contribute to the stability of the triple helix (14) and that occurs in 12.9% of the triplets of collagen XXVIII. At the N- and C-terminal ends of the collagenous domain a CXC motif is present, which could be used for formation of disulfide bonds between the chains. An unpaired cysteine residue is present in the longest noninterrupted N-terminal collagenous segment. An RGD motif is found at the C-terminal end of the collagenous domain in both mouse and man.
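The classification of imperfections described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study; the function name `scan_imperfections`, the greedy rule (a perfect Gly-X-Y step is always preferred), and the toy sequence are assumptions made for illustration only.

```python
def scan_imperfections(seq):
    """Walk a collagenous sequence from one in-register Gly to the next.

    Assumed greedy heuristic: a step of three residues (perfect Gly-X-Y)
    is preferred; otherwise a step of two (GXG) or five (GXXXXG) is
    recorded as an imperfection. Returns (start index, type) tuples.
    """
    if not seq or seq[0] != "G":
        raise ValueError("sequence must start with Gly")
    step_names = {2: "GXG", 5: "GXXXXG"}
    found = []
    i = 0
    while True:
        for step in (3, 2, 5):
            j = i + step
            if j < len(seq) and seq[j] == "G":
                if step != 3:
                    found.append((i, step_names[step]))
                i = j
                break
        else:
            # No Gly at any allowed distance: end of the scanned domain.
            return found


# Hypothetical toy sequence, not taken from collagen XXVIII:
# GPP GKD GI GPK GER GAPLV GPP GKE
toy = "GPPGKDGIGPKGERGAPLVGPPGKE"
print(scan_imperfections(toy))
```

Because glycine can also occupy X or Y positions, a greedy walk of this kind can misassign the register in real sequences, so its output would need to be checked against an alignment.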

Analysis of the Collagen XXVIII VWA Domains and the Bovine Pancreatic Trypsin Inhibitor/Kunitz Family of Serine Protease Inhibitors Domain-The metal ion-dependent adhesion site (D-X-S-X-S-Xn-T-Xn-D) is fully conserved only in the VWA2 domain (Fig. 2). As in human and mouse matrilin-3, in the VWA1 domain of collagen XXVIII the threonine is replaced by a serine. Sequence alignment of the mouse collagen XXVIII VWA domains with their counterparts in selected matrilins and collagens highlights the homology (Fig. 2). The sequence identity between the two mouse collagen XXVIII VWA domains is only 22.9%, and the similarity is 40.0%. Higher identity values were obtained for the VWA2 domain of mouse matrilin-2 and the mouse collagen XXVIII VWA2 domain (32.2%), and the VWA domain of mouse matrilin-3 and the mouse collagen XXVIII VWA2 domain (27.6%). Of the various VWA domains found in collagen, the VWA1 domain of α2(VI) collagen has the highest identity value (20.9%). In phylogenetic analyses using protein distance and protein parsimony the collagen XXVIII VWA domains cluster to the VWA domains of collagen α1(VI) and α2(VI) (Fig. 3).
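Identity values of the kind quoted above depend on how gap columns are treated. The following Python sketch shows one common convention, assumed here and not necessarily the one used for the values in this paper: identical residues are counted over alignment columns in which neither sequence has a gap. The function name `percent_identity` and the example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over gap-free columns of a pairwise alignment.

    Assumption: columns containing a gap in either sequence are ignored.
    Other conventions (e.g. dividing by the full alignment length) give
    different numbers for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Made-up aligned fragments, for illustration only.
print(percent_identity("DIVFLLDGSS", "DLVFVIDSSR"))
print(percent_identity("AC-DE", "ACGDF"))
```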
The Kunitz domain at the C-terminal end (Fig. 1) contains the full consensus pattern (F-X(3)-G-C-X(6)-[FY]-X(5)-C). A BLAST search with the collagen XXVIII Kunitz domain revealed that the Kunitz domains of papilin and α3(VI) collagen are most related and are 48 and 45% identical, respectively, whereas the Kunitz domain of α1(VII) collagen is only 33% identical.
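A consensus pattern written in this notation can be translated mechanically into a regular expression and used to scan a protein sequence. The Python sketch below handles only the elements that occur in the pattern above (single residues, X, bracketed alternatives, and repeat counts); the helper name `prosite_to_regex` and the toy sequence are assumptions for illustration and are not part of the original analysis.

```python
import re


def prosite_to_regex(pattern):
    """Convert a dash-separated consensus pattern into a regex string.

    Supported elements (a deliberately small subset): a residue letter,
    X for any residue, [ABC] for alternatives, each optionally followed
    by a repeat count in parentheses, e.g. X(3).
    """
    element_re = re.compile(r"(X|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?")
    parts = []
    for element in pattern.split("-"):
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        residue, count = m.groups()
        parts.append("." if residue == "X" else residue)
        if count:
            parts.append("{" + count + "}")
    return "".join(parts)


kunitz = prosite_to_regex("F-X(3)-G-C-X(6)-[FY]-X(5)-C")
print(kunitz)

# Synthetic toy sequence built to contain the pattern (not a real protein).
toy = "AA" + "F" + "AAA" + "GC" + "AAAAAA" + "Y" + "AAAAA" + "C" + "AA"
print(re.search(kunitz, toy).span())
```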
Recombinant Expression of Mouse Collagen XXVIII VWA Domains and Generation of Antisera-Recombinant expression of full-length collagen XXVIII yielded insufficient protein amounts to be used in immunization. Therefore the cDNA encoding the collagen XXVIII VWA2 domain was expressed in bacteria (not shown). It has been shown earlier that VWA domains expressed in bacteria are often correctly folded after purification (6) and therefore suited for the generation of specific antisera. The purified VWA2 domain was used to immunize a rabbit.
Recombinant Expression of Full-length Mouse Collagen XXVIII-cDNA encoding the sequence of full-length mouse collagen XXVIII was cloned into the pCEP-Pu vector utilizing the BM40 secretion signal sequence and an N-terminal His6 tag (7). The recombinant plasmid was introduced into human embryonic kidney 293-EBNA cells and maintained in an episomal form. The recombinant collagen XXVIII protein was secreted into the cell culture medium and appeared in reducing SDS-PAGE as a band with an apparent molecular mass of ~135,000, which is in the range expected for monomeric collagen XXVIII (Fig. 4, A and B). A band corresponding to the homotrimeric form of collagen XXVIII could be detected in non-reducing SDS-PAGE (Fig. 4A). After reduction this band disappeared (Fig. 4A). Only minor amounts of recombinant collagen XXVIII could be purified by affinity chromatography on a TALON column. However, these were sufficient to perform a collagenase digestion (Fig. 4B). In immunoblots the major band running at ~135,000 Da nearly completely disappeared and a band at ~60,000 Da appeared. As the immunoblot was developed using the antiserum directed against the VWA2 domain, this band represents the C-terminal half of the protein.
FIGURE 4. A, … His6-tagged collagen XXVIII was subjected to SDS-PAGE with (+SH) and without (−SH) prior reduction on 4-12 or 4-8% gradient gels, respectively. Fn, fibronectin. B, collagenase digestion of purified recombinant collagen XXVIII was performed for 4 h at 37°C; the undigested (−) and digested (+) protein was separated by SDS-PAGE on 4-12% gradient gels with prior reduction. The immunoblots were developed using affinity-purified antibodies raised against the collagen XXVIII VWA2 domain.
Structure of the Collagen COL28A1 Genes-The human and mouse collagen XXVIII genes map to syntenic regions on chromosomes 7 (7p21.3) and 6 (6A1), respectively. Upstream in both cases lie the genes coding for a hypothetical protein containing a WD40 domain, the replication protein A3, glucocorticoid-induced transcript 1, the islet cell autoantigen 1, and neurexophilin 1. Downstream of the collagen XXVIII genes lies the gene coding for core 1 UDP-galactose:N-acetylgalactosamine-α-(R)β1,3-galactosyltransferase, which represents the end of the syntenic region. Both collagen XXVIII genomic sequences (human, NT_007819, NT_079592, NT_086703; mouse, NT_039340) are completely contained in the public data bases, except for a short gap in the first intron of the mouse gene. We identified exons by flanking consensus splice signals and comparison with the respective cDNAs. The exon/intron organization of the two genes is very similar (Fig. 5; Table 2) regarding size, exon and intron length, and codon phase. The human gene is 178 kb and the mouse gene 195 kb long, and both collagen XXVIII genes consist of 34 exons that code for the translated part of the mRNA. In addition, in mouse and man an exon 0 exists that encodes the 5′ untranslated region. Exon 1 codes for the signal peptide sequence, whereas the first VWA domain is encoded by exons 2 and 3, the collagenous domain by exons 4-29, and the second VWA domain by exons 31 and 32. Exon 30 codes for a short spacer region. The unique domain is encoded by exon 33, the only exon that differs in size between man and mouse. The Kunitz domain and the 3′ UTR are encoded by exon 34. In both the human and mouse genes the splice donor site at the 5′ end of intron 28 contains the non-canonical GC dinucleotide instead of GT. Interestingly, a mouse EST clone (CR519347) exists where exon 28 is spliced out and exon 27 is directly spliced to exon 29. In addition, here exon 29 is spliced to an alternative exon within intron 29, which would lead to a protein that contains most of the collagenous domain but lacks the second VWA and the Kunitz domain, similar to the splice variants described above.
Collagen XXVIII Gene Expression-To determine the length of the collagen XXVIII mRNA we performed Northern hybridization with mRNA derived from the lower limb of newborn mice. The message was very weak and could be detected only by using 3 μg of mRNA in the Northern blot analysis (Fig. 6A). The blot clearly showed 5.2- and 4.0-kb bands. Weak Northern hybridization signals from mRNA were also detected in skin, intestine, sternum, brain, and kidney of newborn mice (not shown). RT-PCR was performed to further screen the tissue distribution of collagen XXVIII, and mRNA was detected in skin, intestine, heart, kidney, lung, brain, sciatic nerve, sternum, and calvaria of newborn mice and in intestine and brain of adult mice (Fig. 6, B and C). Among neuronal tissues from adult mice, collagen XXVIII mRNA could be amplified from brain stem, spinal cord, and most strongly from dorsal root ganglion, whereas sciatic nerve was negative (Fig. 6C).
Characterization of Collagen XXVIII Extracted from Mouse Sciatic Nerve-To study the structure of tissue-derived collagen XXVIII and identify similarities or differences to the recombinantly expressed protein, sequential extracts from murine sciatic nerve tissue were analyzed by SDS-PAGE and immunoblotting using the affinity-purified polyclonal antibody KR43 directed against the VWA2 domain (Fig. 7). Collagen XXVIII could be extracted by using buffers containing EDTA or EDTA together with guanidine HCl. A major band with an apparent molecular mass of ~150 kDa, representing the monomeric form of collagen XXVIII, was detected under reducing conditions, whereas under non-reducing conditions a band with a higher apparent mass, in the range of fibronectin, was detected, indicating the formation of disulfide-linked homotrimeric collagen XXVIII molecules. Under reducing conditions, an additional band with a lower apparent mass was detected that may represent either a degradation product or an alternatively spliced form of collagen XXVIII.
Collagen XXVIII Is Strongly Expressed in Peripheral Nerves-The tissue distribution of collagen XXVIII was studied by immunohistochemistry using the affinity-purified polyclonal antibody KR43 raised against the VWA2 domain on cryostat sections of embryonic (E18.5), newborn, and adult mice (Fig. 8). Collagen XXVIII was easily detected in the peripheral nerves of newborn and adult mice. The dorsal root ganglia are also positive for collagen XXVIII and the strongest expression was seen in the peripheral nerve fibers originating from there. The neurons are not stained, whereas the intermediate space between the large neuronal cell bodies is partially positive (Fig. 8H). A strong partial staining can also be seen in transversal sections through the sciatic nerve of adult mice (Fig. 8). Co-staining with the Schwann cell marker S100β (Fig. 8F) showed partial overlap of expression, indicating that the protein originates from a subset of Schwann cells. In contrast, there was no co-staining with an anti-neurofilament antibody (Fig. 8C) or with the myelin-specific dye FluoroMyelin Red (Fig. 8G), indicating that collagen XXVIII is present neither in the axons nor in the myelin sheaths. An antibody directed against galactocerebroside (Gal-C), the major galactosphingolipid of peripheral nerve Schwann cell membranes, also showed only partial co-staining (Fig. 8D). Some overlap was also seen with the myelin marker myelin basic protein (Fig. 8E). The co-distribution seems to be restricted to cell bodies of Schwann cells, whereas a partial basement membrane staining was also seen for collagen XXVIII.
Co-staining with the basement membrane marker nidogen-1 shows co-distribution in a limited subset of basement membranes and the collagen XXVIII signal does not always cover the complete basement membrane (Fig. 8, A and B). Furthermore, in electron microscopy using gold-labeled antibodies a strong staining of basement membranes surrounding a subset of Schwann cells that ensheathe the axons was seen, whereas basement membranes around other axons did not carry any gold particles (Fig. 9). In conclusion, it appears that a particular subset of Schwann cells synthesizes collagen XXVIII and that the protein is laid down in the surrounding basement membrane. In addition to dorsal root ganglia and peripheral nerves, collagen XXVIII is present in connective tissues like calvaria and skin (Fig. 8I). Other neuronal tissues were negative in immunofluorescence (brain in Fig. 8I and results not shown). An antibody raised against the collagen XXVIII VWA1 domain gave the same staining pattern as the antibody KR43 (results not shown), which recognizes the VWA2 domain.

DISCUSSION
We report on the initial characterization of collagen XXVIII, a new member of the collagen family that belongs to the VWA domain-containing branch. The human and mouse genes are clearly orthologous, as their amino acid sequences are more than 85% identical, they are located in syntenic areas of the genome, and they have identical exon-intron structures. Cloning of the cDNA by RT-PCR confirmed the existence of the collagen XXVIII gene in man and mouse. These genes have previously only been incompletely predicted by automated computer analysis.
The predicted mouse cDNA (XM_145161) was annotated by the genome project as "similar to procollagen, type VI, α2" by using the gene prediction program GNOMON, supported by EST evidence. The predicted sequence lacks the parts of the collagen domain encoded by exons 10, 18, and 29 and, in addition, the exon coding for the Kunitz domain is not complete. On the other hand, the human genome project predicted two partial independent cDNAs, lacking in addition several collagen domain-coding exons. The N-terminal sequences are also annotated as "similar to procollagen, type VI, α2" (XM_499262) or "similar to α2 type VI collagen isoform 2C2 precursor" (XM_295195), whereas the C-terminal sequences are annotated as "similar to matrilin 2 precursor" (XM_209824 and XM_374399). In addition, collagen XXVIII was earlier identified as a new collagen and named collagen E (2), but also here the collagenous domain was only poorly predicted. A partial human cDNA clone (BC063866) derived from dorsal root ganglion is present in the data bases. This clone contains a polyadenylation site as well as a poly(A) tail, which was not included in the human collagen XXVIII sequence presented here, leading to an estimate of approximately 5400 bp for the full-length human cDNA. Similarly, on the basis of the sequences of the 5′ and 3′ ends of the RIKEN full-length enriched, adult male testis mouse cDNA clone 4932419L17, which extends the mouse cDNA sequences presented here, the total length of the mouse cDNA could be calculated to be 5017 bp, which is in good agreement with the result of the Northern blot that revealed a length of 5.2 kb (Fig. 6A). The shorter 4.0-kb band probably represents an alternatively spliced mRNA, which might be assembled by exon skipping in the 5′ region.
The sequence analysis shows a relationship of collagen XXVIII to collagen VI and, indeed, a collagenous domain flanked by VWA domains and a C-terminal Kunitz domain only occurs in collagen VI. Phylogenetic analyses based on the VWA domains, using protein distance and protein parsimony methods, cluster the two collagen XXVIII VWA domains to VWA1 and VWA2 domains of the α1(VI) and α2(VI) chains. α1(VII) collagen also contains a Kunitz domain, but the Kunitz domain of collagen XXVIII is more similar to that of the α3(VI) chain, which again indicates a relationship of collagen XXVIII to the beaded filament-forming type VI collagen. Nevertheless, there is at present no evidence for a functional relationship of collagen XXVIII and collagen VI and, in addition, collagen VI contains fewer and mainly different imperfections in the collagenous domains. However, these imperfections are thought to be important for the formation of a twisted supercoil of two triple helical collagen VI molecules (15,16). Interruptions and imperfections in the collagen triplet repeats also occur in many other collagens. In collagen XXVIII only two types are present, mainly GXG but also GXXXXG. The restricted occurrence of only these two types is unique for a collagen chain. Also other collagens, like the α1(IV), α5(IV), or α1(VII) chains, contain repeated GXG imperfections. For example, the mouse α1(IV) chain has seven GXG and six GXXXXG motifs, but in addition it contains another 13 imperfections with lengths of three to nine amino acid residues, and the mouse α1(VII) chain contains eight GXG and seven GXXXXG motifs and in addition 13 imperfections with lengths of three to 17 amino acid residues. These are not as uniformly distributed as in the α1(XXVIII) collagen. In collagens IV and VII the interruptions are thought to be responsible for the flexibility of the molecules (17,18). FACIT collagens also contain interruptions in the collagenous domain, but those are longer and occur more sparsely compared with collagen XXVIII. Therefore the structure of collagen XXVIII seems to be unique and cannot clearly be assigned to any branch of the collagen family. Presumably, the imperfections in the collagenous domain are an important structural feature.
FIGURE 6. Analysis of collagen XXVIII mRNA species in various mouse tissues. A, Northern hybridization of 3 μg of mRNA from the lower limb tissue of newborn mice. B and C, RT-PCR analysis was performed using the primer pair c28m-1 and c28m-2. Template RNA was isolated from newborn (B, lower panel) and adult mice (B, upper panel). In C, the developmental stage is indicated: nb, newborn; ad, adult. The 1-kb ladder from Invitrogen was used.
KGE/D sequences are thought to increase the stability of the triple helix by the formation of electrostatic interactions (14). In collagen XXVIII 12.9% of the triplets are KGE/D triplets and, in addition, this motif is often present among the first triplets of each collagenous segment, probably stabilizing the triple-helical fold of these short segments. Interestingly, this stabilizing effect was also suggested to be the reason for the high content of the KGE/D motif in the basement membrane α1(IV) collagen, which contains 10.3% KGE/D compared with 3.6% in the fibrillar α1(I) and α2(I) collagens (19).
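How such a percentage is obtained depends on the counting convention, which is not spelled out here. The Python sketch below assumes one plausible convention: within an uninterrupted Gly-X-Y segment, a triplet is counted when its X residue is Asp or Glu and the Y residue of the preceding triplet is Lys, and the count is divided by the number of triplets. The function name `kg_de_fraction` and the toy segment are hypothetical.

```python
def kg_de_fraction(segment):
    """Fraction of triplets in a perfect Gly-X-Y segment that complete a
    K-G-(D/E) motif (Lys at Y of the previous triplet, Asp/Glu at X).

    This counting rule is an assumption made for illustration; it may
    differ from the rule behind the percentages quoted in the text.
    """
    if not segment or len(segment) % 3 != 0:
        raise ValueError("segment must be a non-empty multiple of three")
    triplets = [segment[i:i + 3] for i in range(0, len(segment), 3)]
    if any(t[0] != "G" for t in triplets):
        raise ValueError("every triplet must start with Gly")
    hits = sum(
        1
        for prev, cur in zip(triplets, triplets[1:])
        if prev[2] == "K" and cur[1] in "DE"
    )
    return hits / len(triplets)


# Made-up segment: GPK GEP GAK GDR GPP
print(kg_de_fraction("GPKGEPGAKGDRGPP"))
```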
Only a single cysteine residue is present in the collagenous domain. The other cysteines present in the sequence are most probably involved in the formation of disulfide bonds within the VWA and Kunitz domains. Flanking the collagenous domain, two pairs of cysteine residues, CEC and CGC, are present and these might form intermolecular disulfide bridges that enhance the stability of the trimeric molecules. If collagen XXVIII also forms higher polymers, the single cysteine residue within the collagenous domain could form disulfide bridges interconnecting different trimeric molecules. Such intermolecular bonds also exist in collagen VI, where they are involved in dimerization and trimerization of the triple-helical monomers (15).
VWA domains are often involved in protein-protein interactions and occur in a variety of collagens. Indeed, 57 of the 134 VWA domains present in the human proteome are found in collagens (for review see Ref. 2). In addition to the FACIT collagens, VWA domains are present in all three chains of collagen VI and in collagen VII. In the FACIT collagens, the C-terminal collagenous domains are associated with the surface of collagen fibrils. The N-terminal, VWA domain-containing part protrudes from the fibril and is likely to be involved in interactions with other extracellular matrix proteins. In collagen VI the VWA domains are thought to be involved in self-interactions (2), but they can also bind to fibrillar collagen I (20). The VWA domain-containing N-terminal part of collagen VII binds to a variety of extracellular matrix and basement membrane proteins, including collagens I and IV and laminins 5 and 6 (2). Only the second VWA domain of collagen XXVIII contains a fully conserved metal ion-dependent adhesion site motif. However, conservation of this motif does not prove the presence of a binding site for divalent cations, as crystallization of the von Willebrand factor A3 domain revealed no bound metal ion, although the motif is present (21). On the other hand, an incomplete metal ion-dependent adhesion site motif does not necessarily exclude the binding of divalent cations. As collagen XXVIII is extracted from tissue only when EDTA is present, at least one of the VWA domains is probably involved in cation-dependent protein binding, most likely by participation of the metal ion-dependent adhesion site motif.
Kunitz domains are not contained in mature collagens VI or VII. Proteinases of the bone morphogenetic protein-1 family convert procollagen VII to mature anchoring fibril collagen (22). In normal human skin, the removal of the Kunitz domain-containing NC-2 domain from procollagen VII precedes its deposition at the dermal-epidermal junction (23). At least in human adult articular cartilage, the Kunitz domain of the α3(VI) chain is initially incorporated into the newly formed collagen VI fibrils, but it is cleaved off immediately after secretion and is not present in the mature pericellular collagen VI matrix (24). Although the Kunitz domain of collagen VI is homologous to the trypsin protease inhibitor, it has no inhibitory function (25). By a mutagenesis approach, critical amino acid residues (TXXDFXXXW) were identified in human collagen VI that prevent binding to trypsin (26). This motif is very similar to that of the Kunitz domain of collagen XXVIII in man and mouse ((E/N)XXDYXXXW), suggesting that this domain is likewise unable to bind trypsin. Nevertheless, the Kunitz domain of collagen XXVIII could serve as an interaction module that binds to proteins other than trypsin, as was recently shown for the Kunitz domain of collagen VI, which binds to TEM8, an endothelial receptor that is highly expressed in tumor vessels (27).
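The two sequence motifs mentioned above can be written as regular expressions, with X standing for any residue. The sketch below shows one way to search a protein sequence for them. The helper name `find_motif` and both toy sequences are hypothetical and serve only to illustrate the pattern syntax; they are not collagen sequences.

```python
import re

# Motifs as quoted in the text, with X translated to "." (any residue).
COL6_MOTIF = re.compile(r"T..DF...W")      # TXXDFXXXW, human collagen VI
COL28_MOTIF = re.compile(r"[EN]..DY...W")  # (E/N)XXDYXXXW, collagen XXVIII


def find_motif(pattern: re.Pattern, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequences, not real Kunitz domains.
toy_col6_like = "AATGKDFAAAWCC"
toy_col28_like = "CEAADYGGGWK"

print(find_motif(COL6_MOTIF, toy_col6_like))
print(find_motif(COL28_MOTIF, toy_col28_like))
```

Both patterns share the spacing of the conserved Asp, the aromatic residue and the Trp, which is the similarity the text refers to. They differ in the first position and in Phe versus Tyr.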
The human and the mouse COL28A1 genes are very similar and have the typical structure of collagen-encoding genes, containing a multiplicity of exons. However, at the 5′ end of intron 28 a noncanonical splice site is present, where the canonical GT is replaced by a GC. This splice site is conserved between man and mouse, suggesting that the divergence is functionally significant. GC-AG introns account for 0.7% of U2-type introns (28), and in Caenorhabditis elegans it has been shown that such introns can be involved in alternative splicing (29). Indeed, this kind of alternative splicing was first described in let-2, a gene that encodes an α2(IV) collagen-like protein with developmentally regulated splicing (30). Although we have not found alternative transcripts that differ at this position, an alternatively spliced EST clone (CR519347), in which the GC splice donor site is not used, is present in the data base.
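The distinction between canonical GT-AG and noncanonical GC-AG introns comes down to the first two and last two nucleotides of the intron. A minimal sketch of such a classification is given below. The function name `classify_intron` and the toy intron sequences are hypothetical, and the sketch deliberately ignores U12-type introns and any other splice site features.

```python
def classify_intron(intron: str) -> str:
    """Classify an intron by its terminal dinucleotides.

    Returns "GT-AG", "GC-AG", or "other". The input is the intron
    sequence only (DNA, 5' to 3'), without flanking exon bases.
    """
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron sequence too short to hold both splice sites")

    donor, acceptor = seq[:2], seq[-2:]
    if acceptor != "AG":
        return "other"
    if donor == "GT":
        return "GT-AG"
    if donor == "GC":
        return "GC-AG"
    return "other"


# Hypothetical toy introns, not COL28A1 sequence.
print(classify_intron("GTAAGTTTTTCAG"))
print(classify_intron("GCAAGTTTTTCAG"))
```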
Immunohistochemistry revealed the strongest expression of collagen XXVIII in peripheral nerves originating from dorsal root ganglia. RT-PCR performed on mRNA prepared from sciatic nerve of newborn mice was positive, whereas collagen XXVIII mRNA could not be amplified from sciatic nerve of adult mice. Taken together with the clear demonstration by immunohistochemistry and immunoblots of the protein in the sciatic nerve of adult mice, this indicates that collagen XXVIII is laid down during development and persists in the extracellular matrix for an extended period after synthesis. Only some of the nerve bundles are strongly stained. The staining originates from Schwann cells, but could also be detected in parts of the basement membrane that are produced by the Schwann cells. Indeed, in electron microscopy, gold-labeled collagen XXVIII antibodies strongly label basement membranes surrounding Schwann cells that ensheath particular axons, whereas basement membranes around other axons carry fewer or no gold particles. It is at present unclear which subset of Schwann cells produces collagen XXVIII, and how it is integrated into the basement membrane. In addition to the strong expression in nerves, collagen XXVIII could also be detected by immunohistochemistry in calvaria and, weakly, in skin of newborn mice, indicating a role of collagen XXVIII during the development of connective tissues. Although mRNA could also be detected by RT-PCR in other tissues, immunohistochemistry was negative, indicating that protein levels were below the detection limit. This could be due to translational regulation or to a higher turnover than in sciatic nerve. Furthermore, RT-PCR revealed a broader expression of collagen XXVIII in newborn than in adult mice.
Collagen XXVIII is a new collagen that cannot clearly be assigned to any collagen subgroup, even though it contains VWA domains. Although the structure of collagen XXVIII has similarities to that of collagen VI, there is no further evidence that it belongs to the collagen VI subfamily. Because the collagenous stretch is longer in collagen XXVIII, it is very unlikely that heterotrimeric molecules can be formed together with collagen VI chains. The tissue distribution is clearly more restricted than that of collagen VI and the predominant expression in neuronal tissues is unusual for a collagen. The localization in basement membranes could indicate a relationship with collagen IV. Ongoing investigations will shed further light on the role of collagen XXVIII in the assembly of nerve fiber basement membranes. Although no human mutation has been associated with collagen XXVIII, it is tempting to speculate that mutations in this gene would lead to neurodegenerative disease.