Complete Exon-Intron Organization of the Human Gene for the α1 Chain of Type XV Collagen (COL15A1) and Comparison with the Homologous Col18a1 Gene*

The human gene for the α1 chain of type XV collagen (COL15A1) is about 145 kilobases in size and contains 42 exons. The promoter is characterized by the lack of a TATAA motif and the presence of several Sp1 binding sites, some of which appeared to be functional in transfected HeLa cells. Comparison with Col18a1, which encodes the α1(XVIII) collagen chain homologous with α1(XV), indicates marked structural homology spread throughout the two genes. The mouse Col18a1 contains one exon more than COL15A1, because COL15A1 lacks sequences corresponding to exon 3 of Col18a1, which encodes a cysteine-rich sequence motif. Twenty-five of the exons of the two genes are almost identical in size, six of them contain conserved split codons, and the locations of the respective exon-intron junctions are identical or almost identical in the two genes. The homologous exons include the closely adjacent first pair of exons and the exons encoding a thrombospondin-1 homology found in the N-terminal noncollagenous domain 1, which are followed by the most variable part of the two genes, covering the C-terminal half of their noncollagenous domain 1 and the beginning of the collagenous portion, after which most of the exons are homologous. The lengths of the introns are not similar in these genes, with two exceptions, namely the first intron, which is very short, less than 100 base pairs, and the second intron, which is very large, about 50 kilobases, in both genes. It can be concluded that COL15A1 and Col18a1 are derived from a common ancestor.

The family of collagens is large, and the number of known collagenous proteins is increasing. Nineteen genetically distinct vertebrate collagen types and more than 30 genes that encode their constitutive α chains have been identified to date (1-4). The criteria for classification as collagen are that such proteins have at least one triple-helical domain consisting of polypeptide chains with a repeated Gly-X-Y sequence and are structural components of the extracellular matrix. The collagen types have been named with Roman numerals in the order of their discovery. The fibril-forming collagens, types I, II, III, V, and XI, have a single, uninterrupted triple-helical domain that is available for fibril formation. The genes encoding these types are highly homologous (1, 5-7), and those for the three major types (I, II, and III), COL1A1, COL1A2, COL2A1, and COL3A1, are characterized by 51-52 exons. Their triple-helical domain is coded for by 41-42 exons, most of which are 54 bp in size or multiples thereof, and each exon begins with a complete codon for a glycine. The class of nonfibril-forming collagens includes types IV, VI-X, and XII-XIX, which all have one or more interruptions in the collagenous sequence. The genes coding for this heterogeneous group are more divergent in structure, and their numbers and sizes of exons can vary considerably (1, 5, 6).
The complete primary structure of the human α1(XV) chain consists of 1388 residues, with the following domains: a 25-residue putative signal peptide, a 530-residue N-terminal noncollagenous domain, a 577-residue collagenous sequence, and a 256-residue C-terminal noncollagenous domain (8). The collagenous sequence consists of nine collagenous domains, which are separated by eight noncollagenous domains. Collagen types XV and XVIII have been found to be homologous (8-13), and it has been suggested that they should be called multiplexins (multiple triple helix domains and interruptions) (10). The N-terminal noncollagenous domains of both collagen chains contain sequence homology to thrombospondin, and seven of their collagenous domains are homologous, as are the C-terminal noncollagenous domains.
The exon-intron organization of the mouse type XVIII collagen gene has recently been determined (14), and a partial structure corresponding to the seven extreme 3′ exons has been described for the gene encoding human type XV collagen (8). The genes encoding the homologous collagens are located on separate chromosomes, the human gene for the α1(XV) collagen chain having been mapped to chromosome 9 (15) and its mouse counterpart to chromosome 4 (16), whereas the α1(XVIII) collagen gene is located on human chromosome 21 and mouse chromosome 10 (11).
We report here on the isolation of genomic clones for the human type XV collagen and characterization of the exon-intron organization of the entire gene. Comparison of the type XV collagen gene with that encoding type XVIII collagen reveals marked conservation in exon-intron organization, thus indicating that the two genes derive from a common ancestor. Analyses of the 5′-flanking sequence of the COL15A1 gene using a computer search for promoter elements and deletion constructs transfected into HeLa cells suggested a "housekeeping promoter" characterized by the lack of a TATAA motif and the presence of apparently functional Sp1 binding sites.

EXPERIMENTAL PROCEDURES
Isolation and Characterization of Genomic Clones-Radioactively labeled human cDNA clones for type XV collagen were used as probes for screening human genomic libraries: a human lung fibroblast genomic library in the λFIX™ vector (944201; Stratagene), a human leukocyte genomic library in the vector EMBL-3 (HL1006d; CLONTECH), a human lymphocyte cosmid library in pWE15 (951203; Stratagene), and a human genomic library in the cosmid vector PJB8 (a gift from Dr. Leena Ala-Kokko, University of Oulu, Finland). The screenings were performed under stringent conditions (17): hybridizations were carried out at 41°C in 50% (v/v) formamide in 5× SSC (1× SSC = 0.15 M NaCl, 0.015 M sodium citrate, pH 6.8), 1% (w/v) bovine serum albumin, 1% (w/v) Ficoll, 1% (w/v) polyvinylpyrrolidone, 0.25 mg of denatured salmon sperm DNA/ml, and 0.1% (w/v) SDS. The final washes for the filters were carried out in 0.5× SSC, 0.1% SDS at 65°C. The positive clones picked from the libraries were analyzed by restriction enzyme mapping and Southern blotting, and suitable restriction fragments were subcloned into the plasmid pBluescript SK (Stratagene).
Gaps in the genomic sequences not covered by the subclones were filled by the PCR method designed for the isolation of end fragments from yeast artificial chromosome clones (18). 100-ng aliquots of the isolated λ or cosmid DNA were digested separately with the blunt end-producing enzymes AluI, EcoRV, HaeIII, HincII, PvuII, RsaI, SmaI, and StuI in a 20-µl reaction containing the appropriate buffer as suggested by the supplier of the enzymes (Amersham Pharmacia Biotech). Next, 2.5 µl of 10× ligation buffer (1× ligation buffer = 50 mM Tris, pH 7.8, 10 mM MgCl2, 10 mM dithiothreitol, 1 mM ATP, and 25 µg/ml bovine serum albumin), 0.5 µl of 5 µM linker solution (18), 1 µl of water, and 1 µl of T4 DNA ligase (New England Biolabs Inc.) were added to the individual reactions, which were then incubated overnight at 12°C. The incubation was stopped by adding 75 µl of water and heating for 10 min at 95°C. Two µl of the diluted ligation mixture was used as the template in a 10-µl PCR containing Taq polymerase buffer (Promega; 50 mM KCl, 1.7 mM MgCl2, 0.1% Triton X-100, and 10 mM Tris-Cl, pH 9.0), 0.2 mM of each deoxynucleotide, 10 pmol of gene-specific 23-mer primer, 1 pmol of 25-mer linker primer (18), and 1 unit of Taq polymerase (Promega). The amplification conditions were 1 min at 94°C, 45 s at 60°C, and 1 min 30 s at 72°C for 35 cycles. After the first round of PCR, the reaction mixture was diluted with 250 µl of water, and 1 µl of the dilution was used for the second round of PCR under the same conditions except that 5 pmol of nested gene-specific primer and 5 pmol of linker primer were used. The synthesized fragments were EcoRI-digested, subcloned, and sequenced as described below.
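The volumes in the ligation and nested PCR steps above can be checked with a little arithmetic. The following is only an illustrative sketch in Python, not part of the original protocol: the function and variable names are hypothetical, the volumes are the ones quoted in the text, and the buffer calculation ignores any contribution from the restriction digest buffer.

```python
# Volumes in microlitres, as quoted in the protocol text.
DIGEST_UL = 20.0
LIGATION_ADDITIONS_UL = {
    "10x ligation buffer": 2.5,
    "linker solution": 0.5,
    "water": 1.0,
    "T4 DNA ligase": 1.0,
}


def ligation_volume_ul():
    """Total volume of the ligation reaction (digest plus additions)."""
    return DIGEST_UL + sum(LIGATION_ADDITIONS_UL.values())


def buffer_strength(stock_x, stock_ul, total_ul):
    """Final strength (in 'x') of a concentrated buffer after dilution."""
    return stock_x * stock_ul / total_ul


def dilution_factor(sample_ul, diluent_ul):
    """Fold dilution obtained by adding diluent to a sample."""
    return (sample_ul + diluent_ul) / sample_ul


total = ligation_volume_ul()
print("ligation volume (ul):", total)
print("ligation buffer strength (x):", buffer_strength(10.0, 2.5, total))
print("dilution after adding 75 ul water:", dilution_factor(total, 75.0))
print("dilution of first-round PCR (10 ul + 250 ul):", dilution_factor(10.0, 250.0))
```

Under these assumptions the additions bring the ligation to 25 µl, so the 10× buffer ends up at 1×, the ligation is diluted 4-fold before the first PCR round, and the first-round product is diluted 26-fold before the nested round.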
The sizes of the introns were determined by sequencing, restriction mapping, or PCR. For PCR, 2 ng of the relevant lambda or cosmid DNA was used as a template in a 50-µl reaction. The PCR mixtures contained the same ingredients as above, and sense and antisense primers corresponding to the flanking exons were used.
Nuclease S1 Protection-Total RNA from cultured human skin fibroblasts was isolated by guanidinium isothiocyanate-chloroform-phenol extraction (19), and the S1 nuclease protection experiment was performed as described (17, 20). A 574-bp SacI-BanI fragment (nucleotides −469 to +112 in Fig. 3) was 5′-end-labeled with T4 polynucleotide kinase and [γ-32P]ATP (3000 Ci/mmol, Amersham Pharmacia Biotech). The double-stranded probe (3 × 10^5 cpm) was hybridized to 20 µg of total RNA from human skin fibroblasts in the presence of 80% formamide, 40 mM Pipes, pH 6.4, 400 mM NaCl, and 1 mM EDTA at 67°C for 15 h. After hybridization, 300 µl of buffer (280 mM NaCl, 50 mM sodium acetate, pH 4.5, 4.5 mM ZnSO4) was added, and the mixture was digested with 800 units of S1 nuclease (Boehringer Mannheim) at room temperature for 20 min. The protected fragments were analyzed on a 6% polyacrylamide sequencing gel. 20 µg of yeast tRNA was used as a negative control. The exact sizes of the protected fragments were determined by comparison with adjacent dideoxynucleotide sequencing reactions (21).
Nucleotide Sequencing and Sequence Analysis-The nucleotide sequences were determined by the Sanger dideoxynucleotide chain termination method (21) either manually or using an automated DNA sequencer (Applied Biosystems). Vector-specific or sequence-specific 17mer primers synthesized in an Applied Biosystems DNA synthesizer (Department of Biochemistry, University of Oulu) were used, and the nucleotide sequence data were analyzed by DNASIS (Amersham Pharmacia Biotech). Consensus sites for the binding of transcription factors were searched for in the Transcription Factor Data Base using the Sequence Analysis software package, Version 8.0 (Genetics Computer Group, Inc.).
Northern Blot Analysis-Human adult multitissue Northern blots (7760-1 and 7759-1; CLONTECH) were hybridized under stringent conditions with 32P-labeled probes covering 33 kb of intron 2 of the COL15A1 gene in the manner suggested in the manufacturer's protocol.
Deletion Constructs for Promoter Analysis-Five deletion constructs consisting of different lengths of 5′-flanking sequences of the human type XV collagen gene were made. All of the fragments were restriction enzyme-digested from a HindIII subclone derived from the cosmid clone HG-23 (Fig. 1) and subcloned into the pGL2-Basic Vector (Promega) upstream from the luciferase gene. An EspI restriction site at position +27 was utilized as a common 3′-end for all of the constructs. A linker primer containing the restriction sites EspI-SalI-HindIII was attached to the 3′-end of all the constructs, and a HindIII site from the pGL2-Basic Vector (Promega) was used in subcloning. As the 5′-ends of the constructs, different restriction sites were used (HindIII for del 1, HincII for del 2, XhoI for del 3, XbaI for del 4, and SacI for del 5). Accordingly, the 5′ subcloning position in the vector depended on the construct, so that del 1 was subcloned as a HindIII fragment, del 2 as a SmaI-HindIII fragment, del 3 as a XhoI-HindIII fragment, del 4 as a NheI-HindIII fragment, and del 5 as a SacI-HindIII fragment. Deletion constructs used in promoter analysis consisted of the following fragments: del 1, bp −3598 to +27; del 2, bp −2615 to +27; del 3, bp −1858 to +27; del 4, bp −1117 to +27; and del 5, bp −474 to +27.
Cell Culture and Transfection Assays-HeLa cells were routinely maintained at 37°C in Dulbecco's modified Eagle's medium (Imperial) supplemented with 10% fetal calf serum, 50 µg of ascorbate per ml, 2 mM glutamine, 100 units/ml of penicillin, and 50 µg/ml of streptomycin. HeLa cells were transiently transfected with a liposome-based method (DOTAP liposomal transfection reagent kit, Boehringer Mannheim), according to the manufacturer's protocol. Briefly, the various luciferase deletion constructs (5 µg) were transfected with 1 µg of pCMV-β-galactosidase plasmid (CLONTECH) to normalize for transfection efficiencies. For cotransfection experiments, 5 µg of luciferase plasmids were cotransfected with either 1 µg of the human Sp1 expression vector (pEVR2/Sp1 plasmid) or 1 µg of the control expression vector (pEVR2/0 plasmid). Cells were harvested 24 h after transfection, and luciferase activity was determined from cell extracts using the luciferase assay system (Promega). The β-galactosidase activity was measured using the β-galactosidase enzyme assay system (Promega). To normalize transfection efficiency for the cotransfection experiments, total DNA was extracted from each sample, and a dot blot was performed. The nitrocellulose membrane was hybridized with a probe corresponding to a fragment of the luciferase reporter gene. Densitometric scanning of the autoradiograms was performed with the GelWorks 1D program (UVP Gel Documentation and Analysis System, GDS8000). The pGL2-Basic vector and the pGL2-Control vector (Promega) were used as negative and positive controls, respectively. The human expression vector for Sp1 under control of the CMV promoter, pEVR2/Sp1, was a gift from Dr. Suske (Institut für Molekularbiologie und Tumorforschung, Marburg, Germany). The control pEVR2/0 was obtained from the plasmid pEVR2/Sp1 by removing the Sp1 cDNA fragment. All plasmids used for transfection were purified with the plasmid midi kit (Qiagen).

RESULTS AND DISCUSSION
Isolation and Characterization of Genomic Clones-The isolation and characterization of the genomic lambda clone HLF-15 (Fig. 1), which contains the seven extreme 3′ exons of the human α1(XV) chain gene, has been described previously (8). In order to isolate additional clones, the same lambda library that yielded clone HLF-15 was screened four times using different fragments of the human type XV collagen cDNA (8) as probes. These screenings resulted in the isolation of five new clones, HLF-3, HLF-5, HLF-13, HLF-17, and HLF-18 (Fig. 1).
To find genomic sequences covering the gap between clones HLF-13 and HLF-15 (Fig. 1), two additional human genomic libraries, a human leukocyte library and a cosmid library in the vector PJB8 (see "Experimental Procedures"), were screened with the cDNA clone PF-19 (8). This resulted in the isolation of one clone from each library, a 12-kb lambda clone HL-1-1 and a 30-kb cosmid clone C-1-8-1. These contained sequences that overlapped with each other and with the clones HLF-13 and HLF-15.
To characterize the extreme 5′-end of the gene, a human lymphocyte cosmid library was screened with the cDNA clone F-10 encoding the extreme 5′ sequences of the human type XV collagen cDNAs (8). This screening gave two positive clones, of which HG-23, with an insert of 38 kb, was characterized further and was found to code for the missing N-terminal sequences and also to contain several kb of 5′-flanking sequences.
Identification of the Transcription Initiation Site and Sequences of the 5′-Flanking Region of the Gene-The transcription initiation site of the gene was determined by S1 nuclease protection analysis (Fig. 2). A double-stranded SacI-BanI DNA fragment corresponding to the sequence −469 to +112 in Fig. 3 was isolated and 5′-end-labeled with γ-32P. This probe was then hybridized to total RNA isolated from cultured human skin fibroblasts. When the hybrids were subjected to nuclease S1 digestion, three major protected fragments of sizes 120, 164, and 165 nucleotides and nine minor bands were detected (Fig. 2). Comparison of the sizes of the protected fragments with adjacent dideoxynucleotide sequencing reactions indicated that a major transcription initiation site is located at an adenosine (A) nucleotide and another at two thymidines (T) 44-45 nucleotides upstream of this. Because the former showed the stronger band and was accompanied by several other initiation sites, it is marked +1 in Fig. 3. About 3.6 kb of the 5′-flanking region of the gene was sequenced and further studied by computer analysis using the Transcription Factor Data Base program. The program predicted a promoter area between −398 and −142, which corresponds well to the transcription initiation sites predicted by S1 nuclease protection assay. There is no TATAA box in the vicinity of the transcription start sites, but the 5′-flanking region of the gene contains a TATAA sequence located between −404 and −400 relative to the predicted major transcription start site (Fig. 3). This motif may not be functional, however, in view of results obtained by S1 nuclease protection assay (and the overall structure of the 5′-flanking region). If it were functional, transcription initiation would occur about 30 nucleotides downstream. Furthermore, in the presence of a functional TATAA box, transcription initiation should start from a very precise area, so that the lack of a functional TATAA motif agrees with the presence of multiple transcription initiation sites.
FIG. 2. Nuclease S1 mapping analysis to locate the transcription initiation site in the human COL15A1 gene. A 574-bp SacI-BanI fragment of the gene was 5′-end-labeled and used for nuclease S1 digestion, and an autoradiograph of the nuclease S1 digestion products fractionated by gel electrophoresis is shown. The lengths of the protected fragments ranged between 352 and 96 bp. Lane 1, probe with nuclease S1 in the absence of RNA; lane 2, probe without nuclease S1; lane 3, protected fragments of nuclease S1 digestion using the probe and total RNA from cultured human skin fibroblasts. The major protected fragments are indicated by long arrows and the minor ones by short arrows. The arrows in this figure correspond to the asterisks in Fig. 3. The lower part of the figure shows a schematic diagram of S1 nuclease mapping.
The sequence covering nucleotides from −474 to the ATG was found to be rich in G+C (68.4%) and to contain consensus motifs for several transcription factors, some of which are shown in Fig. 3. This region contains four potential Sp1 binding sites, for example, and there is also one Sp1 binding site in the first intron. Recently a new protein binding sequence known as multiple start site element downstream (MED-1) was detected in many TATAA-less promoters with multiple start sites (22). This consensus sequence, GCTCC(C/G), is found downstream of the mapped transcription initiation sites in the human COL15A1 gene (Fig. 3).
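The two sequence features discussed here, G+C content and the MED-1 consensus GCTCC(C/G), are easy to compute for any sequence. The following is a minimal Python sketch and not the software used in this study; the function names and the demo sequence are hypothetical, and only the forward strand is scanned.

```python
import re

# MED-1 consensus as given in the text: GCTCC followed by C or G.
MED1_PATTERN = re.compile(r"GCTCC[CG]")


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def find_med1(seq):
    """0-based start positions of MED-1 matches on the given strand."""
    return [m.start() for m in MED1_PATTERN.finditer(seq.upper())]


# Hypothetical demo sequence, not taken from COL15A1.
demo = "AAGCTCCCTTGCTCCGAAGCTCCA"
print(find_med1(demo))
print(round(gc_fraction(demo), 3))
```

A fuller analysis would also scan the reverse complement and report positions relative to the mapped +1 site.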
Deletion Analysis of the COL15A1 Promoter-To investigate the functional properties of the human COL15A1 promoter, we performed reporter gene analysis using various deletion constructs. A series of 5′ deletions from bp −3598 to +27 were constructed from the human promoter and linked to the luciferase reporter gene. Transient transfection experiments were carried out in HeLa cells, which express collagen type XV, in five independent experiments, each run in duplicate (Fig. 4, A and B). After normalization to β-galactosidase activity, all the promoter constructs exhibited similar luciferase activity. These data indicate that the shortest promoter fragment, from bp −474 to +27, is sufficient to give the entire promoter activity for the COL15A1 gene in HeLa cells.
Cotransfections with Sp1 Expression Vector-Because the sequence from bp −474 to the transcription start site was found to be rich in G+C and to contain four potential Sp1 binding sites, we investigated whether Sp1 has a potential role in the regulation of the COL15A1 gene. The different deletion constructs were cotransfected in HeLa cells with a human Sp1 expression vector or with the corresponding vector without the Sp1 cDNA (control). Results are expressed for each deletion construct as a ratio of the relative luciferase activity obtained with the Sp1 expression vector to that obtained with the control (Fig. 4C). Although basal luciferase activity obtained with the negative control pGL2-Basic vector was not changed, cotransfection with the Sp1 expression vector induced the promoter activity of all constructs, from 5.5-fold for the longest construct to 10.3-fold for the shortest one. These results suggest that Sp1 is involved in the regulation of the human type XV collagen gene.
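The ratio described above can be written out explicitly. The following Python sketch is illustrative only: the function names are hypothetical, the numbers in the example are invented rather than measured, and the reference signal stands in for whatever was used to correct for transfection efficiency (β-galactosidase activity or the dot-blot signal).

```python
def normalized_activity(luciferase, reference):
    """Luciferase signal corrected for transfection efficiency."""
    if reference <= 0:
        raise ValueError("reference signal must be positive")
    return luciferase / reference


def fold_induction(with_sp1, with_control):
    """Ratio of normalized activity with Sp1 vector to that with control vector."""
    if with_control <= 0:
        raise ValueError("control activity must be positive")
    return with_sp1 / with_control


# Invented readings for one construct (arbitrary units).
sp1 = normalized_activity(luciferase=1030.0, reference=10.0)
ctrl = normalized_activity(luciferase=100.0, reference=10.0)
print(fold_induction(sp1, ctrl))
```

Expressing each construct as a ratio to its own control keeps differences in basal activity between constructs from confounding the comparison.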
Exon-Intron Organization of the Human Gene for the α1(XV) Collagen Chain-DNA sequencing of the genomic clones indicated that the human type XV collagen gene consists of 42 exons and 41 introns (Fig. 1). Sequences were determined for all the exons, their intron junctions, most of the intronic sequences of reasonable size, and about 3.6 kb of the 5′-flanking region of the gene. Exons 1-41 vary in size from 36 to 548 bp, whereas the extreme 3′ exon is 1119 bp in length, containing 908 bp of 3′-untranslated sequences (Table I). The introns vary in length between 89 bp and about 55 kb (Table I). The various overlapping genomic clones covered the entire gene with the exception of introns 2 and 9, the sizes of which were obtained by Southern blotting of genomic DNA. The exon-intron boundaries are given in Table II.
TABLE I (table body not recovered; last surviving row: NC10, 1119 bp, footnote f). Footnotes: a, UTR, untranslated region. b, Size of the exon calculated from the major transcription start site as determined by S1 nuclease assay (see "Results"). c, Size of an intron determined by sequencing. d, Size of an intron determined by restriction mapping. e, Size of an intron determined by PCR. f, 211-nucleotide coding sequences followed by 3′-UTR sequences.
The data indicate that the human type XV collagen gene is quite large, its transcribed region being about 145 kb. The coding information is unevenly distributed in the gene, because about 90 kb of 5′ genomic sequences contains only the first 11 exons encoding the N-terminal noncollagenous domain, whereas the rest of the gene, exons 12-42, lies within a 55-kb genomic area. The longest intron, intron 2, is about 55 kb in size. To find out whether it contains coding sequences, we hybridized Northern blots containing poly(A)+ RNA isolated from human heart, brain, placenta, lung, liver, skeletal muscle, kidney, pancreas, spleen, thymus, prostate, testis, ovary, small intestine, colon, and peripheral blood leukocytes with probes covering those regions of the intron that are included in clones HG-23 and HLF-17, but no signal was detected.
TABLE II
Exon-intron boundaries of the gene coding for the human α1(XV) collagen chain. The 5′-end of exon 1 is not shown here because it represents the 5′-end of the corresponding mRNA, which was detected by S1 nuclease assay (Figs. 2 and 3).
Exons 12-36 cover collagenous sequences with multiple interruptions, exons 12 and 36 themselves being junction exons encoding both noncollagenous terminal domains and collagenous sequences (Fig. 1 and Table I). Because type XV collagen contains several interruptions in the collagenous sequence, many of the exons encoding this region contain both collagenous and noncollagenous sequences. There are eight exons encoding solely collagenous sequences (exons 16, 17, 20, 23, 24, 26, 27, and 32) (Table I), of which four are 36 bp in length, one is 54 bp, two are 63 bp, and one is 81 bp. With the exception of exon 32, all of them begin with a complete codon for glycine, which is characteristic of collagen genes. The 54-bp exon is typical of the fibril-forming collagen genes, whereas 36- and 63-bp exons are found in several genes encoding nonfibril-forming collagens (1, 5, 6).
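The sizes quoted for the purely collagenous exons can be related to the Gly-X-Y repeat, since one triplet of three residues corresponds to 9 bp. The following Python sketch is only illustrative: the names are hypothetical, the size list is the set of lengths given in the text (without assigning a length to any particular exon number), and a length that is a multiple of 9 shows only compatibility with whole triplets, not the reading phase (exon 32, for instance, does not begin with a complete glycine codon).

```python
TRIPLET_BP = 9           # one Gly-X-Y triplet = 3 codons x 3 bp
FIBRILLAR_UNIT_BP = 54   # typical exon unit of fibril-forming collagen genes

# Lengths of the eight purely collagenous exons, as listed in the text.
PURE_COLLAGENOUS_EXON_SIZES = [36, 36, 36, 36, 54, 63, 63, 81]


def triplets_encoded(exon_bp):
    """Number of whole Gly-X-Y triplets an exon length corresponds to, or None."""
    whole, remainder = divmod(exon_bp, TRIPLET_BP)
    return whole if remainder == 0 else None


def is_fibrillar_pattern(exon_bp):
    """True if the length is 54 bp or a multiple of it."""
    return exon_bp % FIBRILLAR_UNIT_BP == 0


for size in sorted(set(PURE_COLLAGENOUS_EXON_SIZES)):
    print(size, triplets_encoded(size), is_fibrillar_pattern(size))
```

All four sizes are multiples of 9 bp, but only the 54-bp exon matches the pattern of the fibril-forming collagen genes.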
Altogether, seven exons in the type XV collagen gene begin with a split codon (Table II); three of them are located in the genomic region encoding the N-terminal noncollagenous domain. The large 548-bp exon homologous to thrombospondin-1 begins with a split codon for a glutamic acid. The genomic sequences coding for the central collagenous domain contain three split codons, two of which are located in consecutive exons in the collagenous sequence and one of which is located in the junction exon between the last collagenous domain and the C terminus. Exons 37-42 encode the C-terminal noncollagenous domain, and exon 42 begins with a split codon for a tryptophan. The exons encoding the central collagenous sequences are, on average, shorter than those encoding the flanking N- and C-terminal noncollagenous sequences.
Comparison of the Human α1(XV) and Mouse α1(XVIII) Collagen Genes-The human type XV and mouse type XVIII collagen genes are of somewhat different sizes, the former being about 145 kb in size and the latter about 102 kb, and they have 42 and 43 exons, respectively. They are highly similar in their exon-intron organization, but the introns in the type XV gene are in most cases longer than those in the type XVIII gene.
The first intron is less than 100 bp in both genes, whereas the second is conspicuously longer, 55 and 50 kb in the type XV and XVIII genes, respectively. The 548-bp exon 3 in the type XV gene and the 551-bp exon 4 in the type XVIII gene (14) are both homologous to thrombospondin-1, and both begin with a split codon for a glutamic acid. This feature is also conserved in the human and mouse thrombospondin genes (28, 29), demonstrating marked genomic conservation of this sequence motif. Five additional split codons are conserved in the type XV and type XVIII collagen genes (Table III). The lengths of the exons coding for the collagenous sequences showing homology between the α1(XV) and α1(XVIII) chains (Table III) and the locations of the respective exon-intron junctions are identical or almost identical in the two genes. As described previously, the marked homology between the C-terminal noncollagenous domains of the α1(XV) and α1(XVIII) chains extends to the exon-intron organization of the corresponding regions of their genes (13). The last three exons encoding the most conserved portion of the polypeptides share 55-61% homology at the nucleotide level (Table III).
Conclusions-The genes encoding the fibril-forming collagens range in size from 18 to 53 kb and consist of over 50 exons (1,5,6), whereas those encoding the nonfibril-forming collagens show more extensive heterogeneity in their genomic organization: they vary in size from 5 kb for COL10A1 (30) to 750 kb for COL5A1 (31) and in number of exons from 3 for COL10A1 (30) to 118 for COL7A1 (26). The present characterization of the complete exon-intron structure of the COL15A1 gene, showing it to be about 145 kb in size and to contain 42 exons, makes it one of the largest collagen genes, with a typically high number of exons.
The exon pattern of the COL15A1 gene differs markedly from that of the fibril-forming collagen genes, in which the triple-helix is encoded predominantly by exons of 54 bp or multiples of this (1, 5, 6). Only one of the exons in the COL15A1 gene that code for purely collagenous sequences is 54 bp in size.

TABLE III
Comparison of exons homologous between the human α1(XV) and mouse α1(XVIII) collagen genes. Exons that have identically located 5′-end exon-intron junctions in the type XV and type XVIII collagen genes are indicated in boldface type. Exons that begin with split codons are marked with asterisks. Type XVIII collagen has one collagenous domain and one noncollagenous domain more than type XV collagen, and consequently, the homologous domains differ by one in their numbering.
In fact, none of the nonfibril-forming collagen genes characterized so far displays the 54-bp exon pattern observed in the fibril-forming collagen genes, whereas many of them, including COL15A1, typically contain 36- and 63-bp exons, in addition to exons of more variable length, encoding the interrupted collagenous sequences (1, 5, 6).
The 5′-flanking region of COL15A1 is characterized by the lack of a TATAA motif and the presence of several GC motifs. This renders the 5′-flanking region of COL15A1 similar to promoters of the "housekeeping genes," which are transcribed widely but at low RNA levels in many tissues. Several other collagen genes also contain multiple GC boxes as their main promoter elements, including the COL5A1, COL7A1, COL11A1, and COL11A2 genes (26, 32-34). In addition, the downstream promoter of COL6A2 (35) and the upstream promoter of Col18a1 (14), two collagen genes with alternate promoters, are also of this kind. Transient transfection experiments, which were performed on HeLa cells with 5′ deletion constructs ranging from bp −3598 to +27, indicated that the shortest promoter fragment, from bp −474 to +27, had the same promoter strength as all the longer constructs. Furthermore, cotransfections with the Sp1 expression vector indicated that this transcription factor could regulate the expression of the human type XV collagen gene in HeLa cells, presumably through binding to one or more of the four Sp1 sites within this fragment.
The COL15A1 and Col18a1 genes show marked structural similarity. The mouse Col18a1 gene contains one exon more than the human COL15A1 gene, but it is about 40 kb smaller in size, so that the sizes of the introns are not conserved between the two genes. This may in part be due to species differences. Their homology covers 25 exons that are nearly identical in size, six of which contain conserved split codons (Table III). The homologous exons are spread throughout the entire gene, including the closely adjacent first pair of exons and the exons manifesting the thrombospondin-1 homology, which are followed by the most variable region of the two genes, covering part of the noncollagenous domain 1 and the beginning of the collagenous portion, after which most of the exons are homologous. The homology is most pronounced in the region encoded by the last three large exons (Table III). The second intron is large in both genes, 50 kb or more. It is typical of collagen genes that they possess regulatory elements in intron 1, e.g. in the COL1A1 and COL2A1 genes (5). Intron 1 of the COL15A1 gene is only 89 bp in length, and that of Col18a1 is only 71 bp, but it is possible that the large intron 2 in both genes contains such elements. One notable difference between the two genes appears to be that the COL15A1 gene lacks sequences corresponding to exon 3 of the Col18a1 gene (14), which encodes a cysteine-rich region of the mouse α1(XVIII) collagen chain noncollagenous domain 1 homologous to rat and Drosophila frizzled proteins (36). This cysteine-rich sequence has not been found in any of the human type XV collagen cDNA clones characterized so far (8, 9), or in any of the mouse type XV collagen cDNAs (16), suggesting that this exon is indeed absent from COL15A1. Furthermore, our Northern blotting experiments with probes covering 33 kb of intron 2 did not reveal any mRNA signals. All in all, the comparison of the two genes clearly points to a common ancestor.
Collagen types XV and XVIII have both similarities and differences in tissue distribution. Both are prominently synthesized by endothelial cells in practically all tissues studied (37,38). The most obvious differences are the strong type XV collagen expression in muscle tissue, where type XVIII is present in much lower amounts, whereas the opposite situation prevails in liver tissue (12,38). There are no data available on the mechanisms that regulate the expression of these genes, but the tissue distribution data suggest that these mechanisms are not identical despite their otherwise extensive homology.