Identification of the key regions within the mouse pro-alpha 2(I) collagen gene far-upstream enhancer.

Studies using transgenic mice have shown that the mouse pro-alpha2(I) collagen gene contains a far-upstream enhancer, which directs expression in the majority of collagen I-producing cells during development and in response to tissue injury. In this study, we have investigated the minimal functional region required for the enhancer effect and studied the role of the three hypersensitive sites (HS3-HS5) that overlap this region. The results of deletion experiments indicate that the minimal functional unit of this enhancer is a 1.5-kb region between -17.0 and -15.45 kb from the transcription start site. This region includes the core sequences of HS3 and HS4 but not HS5. The HS4 sequences are essential for the functional integrity of the enhancer, whereas HS3 represents tissue-specific elements that direct expression in mesenchymal cells of internal tissues and body wall muscles. The HS3 region appears to bind a complex of transcription factors illustrated by large regions of protected sequences. A 400-bp sequence located between -17.0 and -16.6 is also essential for the enhancer because its deletion results in increased susceptibility to the chromatin environment.

the α1(I) and α2(I) collagen genes may involve a number of distinct elements, each directing expression in a separate tissue, cell type, or stage of development or disease. This hypothesis has been supported by studies of transgenic mice harboring various regions of the mouse, rat, and human pro-α1(I) collagen promoters that demonstrate a modular arrangement of distinct and separate tissue-specific elements within the first 3.2 kb of the proximal promoter (4). These elements were shown to be responsible for reporter gene expression in different subsets of type I collagen-producing cells, namely osteoblasts, odontoblasts, fascia, and tendon fibroblasts (4). More recently, an additional osteoblast element was characterized (5) as well as a uterine fibroblast element farther upstream between -7.0 and -8.0 kb from the transcription start site (6).
Although the mouse pro-α1(I) and pro-α2(I) collagen genes are coordinately expressed, the arrangement of the control elements appears to differ between the two genes. Unlike the mouse pro-α1(I) collagen gene, the pro-α2(I) collagen gene does not appear to contain discernible tissue-specific elements within the -3.2-kb proximal promoter, but the minimal promoter (-350 to +54 bp) alone is capable of promoting very limited levels of reporter gene expression in transgenic models (7), in contrast to the pro-α1(I) minimal promoter (8). A survey of the upstream region of the mouse pro-α2(I) gene promoter reported the presence of three DNase I hypersensitive sites (HS), HS5, HS4, and HS3, located approximately -17.1, -16.2, and -15.5 kb upstream of the transcription start site. These sites were present only in collagen type I-producing cells (9). Transgenic mice harboring a 6.0-kb region encompassing the three hypersensitive sites between -19.5 and -13.5 kb, linked to the minimal promoter (-350 to +54 bp), showed a high level of reporter gene expression in the majority of collagen-producing cells. However, unlike the mouse pro-α1(I) collagen gene upstream region, no reporter gene expression was detected in the osteoblasts of the endochondral bones or the odontoblasts, although some reporter gene expression could be detected in the osteoblasts of bones of intramembranous origin. We termed this region the far-upstream enhancer because of its ability to drive reporter gene expression independent of position and orientation and in the presence of nonspecific minimal promoters (9).
These data suggested that this far-upstream enhancer contains a number of distinct cis-regulatory elements involved in the control of expression in different type I collagen-producing cells. In this study, we sought to identify the functional unit(s) of the far-upstream enhancer and to investigate the role of hypersensitive sites within the upstream enhancer. The results show that the minimal functional enhancer is 1.5 kb in length, and the sequences overlapping HS4 are probably the most critical for the functional activity of this enhancer.

EXPERIMENTAL PROCEDURES
DNA Constructs-The 5′-flanking sequence of the mouse pro-α2(I) collagen gene was derived from a λ FIX II library (Stratagene). The clone 3-3 contained 5′-flanking sequences of the pro-α2(I) collagen gene from -23.3 to -11.0 kb relative to the transcriptional start site (9). Two reporter gene vectors were used to generate transgenic mice: pLacRM350 and pLacES350, the latter adapted from pLacRM350 with the 3′ multiple cloning site in reverse orientation. These vectors were modified from pLacF (10), a pUC18-based plasmid containing a lacZ gene with a eukaryotic translation initiation codon and mammalian polyadenylation sequences. Both vectors also contain the mouse pro-α2(I) collagen gene minimal promoter (-350 to +54 bp) 5′ of the lacZ gene.
Generation of Transgenic Mice-Transgenic mice were generated using the mouse pro-α2(I) constructs shown in Fig. 1. Vector sequences were removed with NruI and SacII. The digested DNA constructs were gel-purified, suspended in 10 mM Tris/HCl (pH 7.4), 0.1 mM EDTA (pH 8) at a concentration of 2-3 ng/μl, and injected into the pronuclei of fertilized C57Bl10 × CBA/C eggs as previously described (11). Undamaged eggs were transferred into pseudopregnant CD1 foster mothers. The foster mothers were sacrificed at 15.5 days of gestation, and the embryos were stained for β-galactosidase activity as previously described (11). In addition, DNA was extracted from the placentas of the founder animals and used to genotype the animals by PCR analysis.
Cell Culture-Transient transfection assays were performed using the FuGENE 6™ (Roche Diagnostics) transfection reagent according to the manufacturer's instructions. Briefly, NIH3T3 and L929 cells were grown in 6-well plates in Dulbecco's modified Eagle's medium supplemented with 10% fetal calf serum. At log phase growth, the cells were transfected with 2 μg of total plasmid DNA and 6 μl of FuGENE 6™ reagent (2 × 10^5 cells per transfection). pRSVLuc was co-transfected (0.2 μg) to check the efficiency of transfection. The cells were harvested 72 h post-transfection using lysis buffer, and activities of the reporter genes β-galactosidase and luciferase were measured using the Dual-Light® chemiluminescent reporter gene assay system (Applied Biosystems) according to the manufacturer's instructions. Protein content was also measured using BCA reagents (Pierce). The specific activity was calculated by dividing the data from the β-galactosidase assay by the luciferase data and then by the total protein content.
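The normalization described above can be written out as a short calculation. The following is a minimal sketch only: the function name, argument names, and the example readings are hypothetical and are not taken from the study; it simply divides the β-galactosidase signal by the luciferase signal and then by the total protein content.

```python
def specific_activity(beta_gal, luciferase, protein):
    """Normalize a beta-galactosidase reading to transfection efficiency
    (luciferase reading) and then to total protein content.

    All three arguments are hypothetical raw readings in arbitrary units.
    """
    if luciferase <= 0 or protein <= 0:
        raise ValueError("luciferase and protein readings must be positive")
    return beta_gal / luciferase / protein


# Hypothetical readings, for illustration only (not data from the study).
example = specific_activity(beta_gal=12000.0, luciferase=3000.0, protein=2.0)
print(example)  # 2.0
```

Dividing by the luciferase reading first corrects for well-to-well differences in transfection efficiency; dividing by protein then corrects for differences in the amount of lysate assayed.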
DNase I Footprinting and Hypersensitive Site Analysis-The preparation of nuclei and DNase I digestion were carried out as previously described (9). Briefly, NIH3T3 and L929 cells were washed with ice-cold phosphate-buffered saline, scraped, pelleted, and resuspended in buffer A (15 mM Tris/HCl (pH 7.6), 15 mM NaCl, 60 mM KCl, 1 mM EDTA, 0.5 mM EGTA, 0.3 mM sucrose, 0.1% Triton X-100, 0.15 mM spermine, 0.5 mM spermidine, 1 mM phenylmethylsulfonyl fluoride, 1 mM dithiothreitol). Cells were mechanically disrupted, the resulting homogenate was diluted with an equal volume of buffer B (buffer A without Triton X-100), and the nuclei were centrifuged. Nuclear pellets were resuspended in 5 volumes of buffer C (buffer A without Triton X-100, EDTA, or EGTA), and DNA concentrations were estimated by UV absorption at 260 nm. Fifteen absorbance units were used for each DNase I digestion. These reactions were performed in 40 mM Tris/HCl (pH 7.6) and 6 mM MgCl2 using 10 μl of DNase I (Amersham Biosciences) at concentrations between 0 and 10 units/μl. After a 15-min incubation at room temperature, the reaction was halted by adding 2 volumes of 50 mM Tris/HCl (pH 8.0), 100 mM NaCl, 100 mM EDTA, 1% SDS, and 40 μl of proteinase K (20 mg/ml). Purified DNA was then digested with appropriate restriction enzymes, and Southern analysis was performed according to the standard protocol (12). To locate the specific restriction enzyme sites that lay 5′ and 3′ of the hypersensitive site cores, mouse genomic DNA was also digested with the same enzyme, in addition to one of nine specific enzymes located within the area of interest. These samples were run alongside the DNase I-treated extract. For DNase I footprinting, plasmid DNA was digested with the appropriate restriction enzyme and end-labeled by filling in 3′-recessed ends with Klenow enzyme (12).
Gel purification of labeled DNA fragments, preparation of nuclear extracts from NIH3T3 cells, nuclear protein binding, and DNase I footprinting reactions were performed as previously described (13).

RESULTS
Identification of the 5′ Boundary of the Far-upstream Enhancer-Transgenic mice were generated containing the constructs pGB(-19.5/-13.5), pGB(-17.0/-13.5), and pGB(-16.6/-13.5). These constructs are depicted in Fig. 1A and consist of the far-upstream enhancer with progressive 5′ deletions fused to the -350 to +54 minimal promoter and lacZ gene. The numbers of embryos positive for the transgene are given in Table I.
DNase I hypersensitive site analysis using a genomic ladder digested with a series of known restriction enzymes refined the location of HS3, HS4, and HS5 reported by Bou-Gharios et al. (9) and showed the restriction enzyme sites between which the hypersensitive core sites were located. Construct pGB(-19.5/-13.5) contained DNase I hypersensitive sites HS3, HS4, and HS5, whereas constructs pGB(-17.0/-13.5) and pGB(-16.6/-13.5) contained only HS3 and HS4 because HS5 is located between the StyI site at -17.05 and the EcoRI site at -17.2 and, hence, beyond the 5′ boundary of the latter two constructs (Fig. 1).
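The relationship between construct boundaries and hypersensitive sites can be checked with simple interval arithmetic. The sketch below is illustrative only: it treats each hypersensitive site as a single point at the approximate position quoted in the text (-17.1, -16.2, and -15.5 kb), whereas the actual core sites span short intervals between restriction sites; the dictionary and function names are hypothetical.

```python
# Approximate HS positions (kb relative to the transcription start site),
# as quoted in the text. Treating each site as a point is a simplification.
HS_SITES = {"HS5": -17.1, "HS4": -16.2, "HS3": -15.5}

# 5' deletion series: (5' boundary, 3' boundary) in kb.
CONSTRUCTS = {
    "pGB(-19.5/-13.5)": (-19.5, -13.5),
    "pGB(-17.0/-13.5)": (-17.0, -13.5),
    "pGB(-16.6/-13.5)": (-16.6, -13.5),
}


def sites_in_construct(five_prime, three_prime, sites=HS_SITES):
    """Return the names of the sites whose position lies within the construct."""
    return sorted(
        name for name, pos in sites.items() if five_prime <= pos <= three_prime
    )


for name, (five_prime, three_prime) in CONSTRUCTS.items():
    print(name, sites_in_construct(five_prime, three_prime))
```

Under these assumptions only the full-length construct includes HS5, while both shorter constructs retain HS3 and HS4, in agreement with the mapping described above.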
Mice transgenic for constructs pGB(-19.5/-13.5) and pGB(-17.0/-13.5) showed a very similar staining pattern to that previously reported (9). No differences in expression pattern were noted with the smaller pGB(-17.0/-13.5) construct, and the percentage of transgenic mice expressing the construct was also similar to that of the larger construct (Table I). Because we have established that HS5 is beyond the 5′ boundary of pGB(-17.0/-13.5), these results show that the DNA sequence represented by HS5 is not required for the correct expression of the pro-α2(I) collagen gene during mouse development.
In contrast, the mice transgenic for and expressing the construct pGB(-16.6/-13.5) with a 400-bp deletion from the 5′ end showed a staining pattern different from that seen in mice expressing the pGB(-17.0/-13.5) construct. X-gal staining was only detected in 29% of the transgenic embryos and varied quite considerably in the pattern of expression between embryos (Fig. 2). Histological examination showed that the expression of the reporter gene could not be detected in the intramembranous bones of the calvarium, but the clavicle showed some limited staining. Expression of the reporter gene was also severely limited in the mesenchymal cells associated with skeletal muscle, with only two embryos showing detectable expression in the body wall muscles along the back of the embryos. Reporter gene expression in the connective tissue of the fascia was also limited to certain regions of the back. X-gal staining in the internal organs was absent except for the connective tissue of the pancreatic primordia, and low level staining was seen in the fibrous and smooth muscle layers of the stomach.
These results show that the 5′ boundary of the functional far-upstream enhancer lies between the BamHI site at -17.0 and the AvaI site at -16.6 kb upstream of the transcriptional start site. This 400-bp region does not contain any detectable DNase I hypersensitive sites but has been shown here to have a significant impact on the functional expression of the far-upstream enhancer. Transgenic mice were also generated with the construct pGB(-17.0/-16.6), which contains the essential 400-bp region. Only 17% of the transgenic embryos expressed the transgene (Table I). Furthermore, in these mice the X-gal staining pattern was aberrant, with staining seen only in non-collagen-producing tissues (data not shown).
Identification of the 3′ Boundary of the Far-upstream Enhancer-Transgenic mice harboring the constructs pGB(-17.0/-14.7) and pGB(-17.0/-15.45), with 3′ deletions of 1.2 and 1.95 kb, respectively (Fig. 1B), were very similar to those expressing pGB(-17.0/-13.5), with high levels of reporter gene expression detected in the intramembranous osteoblasts, mesenchymal cells of the muscle, tendon, and fascia, smooth muscle cells of the blood vessels and gut, and connective tissue of the majority of internal organs (Fig. 3). This indicated that the region located between -15.45 and -13.5 kb may not be essential to the expression of the pro-α2(I) collagen gene during development.
In contrast, transgenic mice expressing the construct pGB(-17.0/-15.7), in which a further 280 bp had been deleted from the 3′ end of the enhancer region, showed a significant reduction in the level and pattern of transgene expression. However, that pattern was consistent in all founder animals that expressed the transgene (Fig. 4). This deletion effectively removed the HS3 core site, located between the HindIII site at -15.45 and the StyI site at -15.7 (Fig. 1B). Reporter gene expression was detected in the intramembranous osteoblasts and in fibroblasts of the fascia, tendon, and muscle of certain defined regions, primarily around the nose and the limbs (Fig. 4). Apart from these regions, no expression was detected in skeletal or smooth muscle (with the exception of some staining in the stomach) or in the connective tissue of the internal organs. These results suggest that the sequence between the StyI site at -15.7 and HindIII at -15.45 is required for reporter gene expression in the mesenchymal cells of connective tissue of the internal organs and in most regions of the muscle, tendon, and fascia. These results also indicate that HS3 plays a role in directing tissue- or region-specific control of the mouse pro-α2(I) collagen gene.
Nuclear Protein Binding Sites around HS3-In an attempt to identify possible binding sites contributing to the tissue- and spatial-specific regulation of the mouse pro-α2(I) collagen gene, a DNase I footprinting assay was employed to map sites of nuclear protein interaction in the region of the mouse pro-α2(I) collagen gene upstream enhancer represented by HS3. The analysis identified two distinct areas of nuclear protein protection within the region between the StyI site at -15.7 and the HindIII site at -15.45 kb upstream of the transcriptional start site (Fig. 5). These footprints (FP1 and FP2) were detected with extracts from both L929 and NIH3T3 cells. Both footprints lie at the 5′ end of the fragment. This analysis strongly suggests that the region represented by HS3 is responsible for binding several nuclear proteins that may have a role in tissue- or stage-specific expression of the mouse pro-α2(I) collagen gene in mesenchymal cells. Indeed, computer-based sequence analysis searches through MatInspector V2 (14) revealed a number of transcription factors with 100% core recognition. These are shown in Fig. 5.
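To illustrate what an exact match to a binding-site core means in practice, the sketch below scans both strands of a sequence for a short core motif. It is not the MatInspector algorithm, which scores sequences against weight matrices; it is a simplified exact-match search. The E-box consensus CANNTG (a general motif for basic helix-loop-helix factors) is used purely as an example, and the input sequence is hypothetical, not the HS3 sequence shown in Fig. 5.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_core_matches(seq, core_regex):
    """Return (strand, start, match) for every exact core match.

    Start positions are 0-based and refer to the strand being scanned
    (the input for '+', its reverse complement for '-'). A lookahead is
    used so that overlapping matches are also reported.
    """
    seq = seq.upper()
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for m in re.finditer(f"(?=({core_regex}))", strand_seq):
            hits.append((strand, m.start(), m.group(1)))
    return hits


# E-box consensus CANNTG, written as a regular expression (illustrative).
EBOX = "CA[ACGT]{2}TG"

# Hypothetical sequence, not taken from the enhancer.
print(find_core_matches("GGACAGCTGTTAC", EBOX))
# [('+', 3, 'CAGCTG'), ('-', 4, 'CAGCTG')]
```

Because CAGCTG is palindromic, the same core is reported once on each strand; a matrix-based tool would additionally score the flanking positions.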
No mice transgenic for the construct pGB(-17.0/-14.7 del 0.5) showed any detectable reporter gene expression. In contrast, 44% of embryos transgenic for the construct pGB(-17.0/-15.45 del 0.25) showed detectable expression (Table I). In all but one embryo, reporter gene expression was restricted to the tail region, in tendon and the fascia (Fig. 6). This result demonstrates that the sequence representing the HS4 region is essential in maintaining the functional integrity of the enhancer.
To further examine the influence of the HS4 core region on gene transcription, transient transfections were carried out to establish the effect of HS4 deletion on the ability of the far-upstream enhancer to promote reporter gene expression in vitro (Fig. 7). Although both pGB(-17.0/-14.7) and pGB(-17.0/-15.45) promoted an increase in β-galactosidase-specific activity relative to the -350 minimal promoter alone, the two HS4-deletion constructs, pGB(-17.0/-14.7 del 0.5) and pGB(-17.0/-15.45 del 0.25), also showed similar transgene activity when compared with the intact HS4-containing enhancer constructs.

DISCUSSION
Several studies of transcriptional control of the genes encoding collagen type I have suggested the existence of a number of distinct tissue-specific elements directing expression of the pro-α1(I) and pro-α2(I) collagen genes. The work presented in this study using transgenic mice demonstrates that the upstream enhancer has a core functional unit of 1.5 kb between -17 and -15.45 kb from the transcriptional start site. The sequences within this region contain several cis-acting elements that work in concert to produce an enhancer that drives high levels of expression in a tissue-specific manner. One of these elements is represented by the sequences overlapping HS4, which we show to be essential for the functional integrity of this enhancer.
The 5′ boundary of the functional regions of the mouse far-upstream enhancer has been located between -17.0 and -16.6 kb upstream of the transcriptional start site. Deleting the 2.5-kb region between -19.5 and -17.0 kb of the 6-kb far-upstream enhancer had very little effect on the expression pattern of the reporter gene, even though this deletion effectively removed the DNase I hypersensitive site, HS5. This is demonstrated here by transgenic mice expressing the construct pGB(-17.0/-13.5), which showed X-gal staining in the same pattern and intensity as seen with the larger pGB(-19.5/-13.5) construct. It is worth noting that the human COL1A2 shares high homology with the mouse enhancer and also contains three DNase I hypersensitive sites. The human enhancer requires all three HS sites for correct functioning, and deletion of the most 5′ HS site, which shares high sequence similarity with mouse HS5, results in the loss of most of the expression (15). This suggests that subtle differences in sequence homology are important, and/or there is a rearrangement within the conserved human sequences that promote regulation via different combinatorial elements.
G, X-gal staining is seen in the smooth muscle cells of the stomach (s), in the connective tissue of the pancreatic primordia (pp), and the degenerating mesonephric tissue (ms). Staining is also detected in both the fibroblastic cells of the capsule and the connective tissue of the kidney (k). H, staining in the meninges around the brain (arrow). Black bar, 100 μm.
FIG. 4. Transgenic mice generated with HS3 sequence deletion. A, two representative whole-mount embryos at E15.5 expressing pGB(-17.0/-15.7). Sagittal sections of both embryos show limited X-gal staining around the ossification of the frontal bone (B) and the clavicle (C, arrows). Staining is detected in the connective tissue in tendon (D), fascia (f), and muscle (m) of the limbs (E). X-gal staining is also seen in the fascia around the nose (F). Black bar, 100 μm.
FIG. 5. Footprint of sequences overlapping HS3. The 280-bp DNA sequence between the StyI site at -15.7 and the HindIII site at -15.45 kb from the transcription start site in the pro-α2(I) promoter was incubated with nuclear extract from L929 and NIH3T3 fibroblasts. In each test, DNA was incubated with increasing amounts of DNase I in the presence (+) and absence (-) of cell nuclear extracts. G/A indicates the Maxam and Gilbert reaction as a marker. Two footprints were detected, (FP1) and (FP2), with sequence indicated vertically. The sequence of the positive strand is outlined with putative transcription factors generated with MatInspector computer analysis. Arrows denote the core sequence used by the program.
Further deletions of the -17.0-kb BamHI site resulted in the loss of most reporter gene expression, as seen in embryos expressing the construct pGB(-16.6/-13.5). It is therefore apparent that the deleted 400-bp region contained sequences essential for the full function of the far-upstream enhancer. Furthermore, the variegated expression seen in the founder animals, where no two embryos expressed the same pattern, unlike any other deletions, may indicate that this 5′-sequence protects the enhancer from position effect variegation. This protection has been shown in other genes, such as the hCD2 locus in which sequences in its HHS3 region protect against position effect (16).
The 3′ boundary of the functional regions of the far-upstream enhancer has been shown to be located near the HindIII restriction site -15.45 kb upstream of the transcriptional start site. Transgenic mice expressing both constructs pGB(-17.0/-14.7) and pGB(-17.0/-15.45) demonstrated high levels of reporter gene expression in a pattern very similar to that of the larger -19.5/-13.5 enhancer. However, removal of the 250-bp region between -15.7 and -15.45, which included the DNase I hypersensitive site HS3, had a deleterious but consistent effect on the function of the far-upstream enhancer. The generation of transgenic mice harboring an additional construct containing the missing 250 bp (HS3) did not result in any expression (data not shown). These deletion experiments have demonstrated that the minimal sequence necessary to confer the full effect of the far-upstream enhancer is located between -17.0 and -15.45 kb. This region alone is capable of promoting reporter gene expression in the fibroblastic cell lineages of most organs in which in situ hybridization has demonstrated expression of collagen type I (17).
The most significant result of this study is the demonstration that deletion of the sequences overlapping HS4 abolishes the function of the enhancer. Although there is some evidence that small deletions of the core of HS sites may result in artificially severe phenotypes (18,19), this is unlikely to be the explanation here because differently sized HS4 deletions resulted in similar phenotypes, showing that in the absence of HS4 the far-upstream enhancer is unable to promote expression in vivo. Furthermore, the results of the transient transfection assay demonstrate that when the enhancer is not integrated into the genome, the deletion of HS4 has no effect on the level of expression. Taken together, these results imply that the region represented by HS4 is essential for the function of the far-upstream enhancer, probably by modifying the chromatin environment.
Although these deletion experiments have demonstrated that the minimal sequence necessary to confer the full effect of the far-upstream enhancer is located between -17.0 and -15.45 kb, this enhancer does not constitute a locus control region (LCR), nor do these regions represent all the regulatory elements that control mouse pro-α2(I) collagen gene expression in all tissues. No reporter gene expression was ever detected in the osteoblasts derived from endochondral ossification or in the cells of the liver. It is therefore likely that as yet unidentified DNA sequences elsewhere in the pro-α2(I) collagen gene domain are necessary to achieve complete control of gene expression and to make up a functioning LCR.
The notion of separate DNA elements or modules directing tissue-specific expression has already been documented in a number of collagen-encoding genes, including the mouse pro-α1(I) collagen gene (4). It was thought that the mouse pro-α2(I) collagen gene far-upstream enhancer may actually consist of a number of distinct tissue-specific elements, similar to those found in other collagen genes. We investigated the presence of such tissue-specific cis-acting elements around sequences represented by DNase I hypersensitive sites because these regions represent areas of DNA in a distinct non-nucleosome conformation and are composed of clusters of individual protein binding sites (20,21). The deletion of sequences overlapping HS3 resulted in a significant loss of expression in several tissues, most notably in the skeletal muscle of the body wall, and in attenuated fascia staining within the skin layers of the embryos. This pattern was repeatable in all expressors and did not suffer from position effect variegation, as seen with the 5′-deletion. This suggested that HS3 sequences contain tissue-specific cis-acting elements responsible for expression in certain mesenchymal lineages. The footprinting results indicate that this region does indeed bind transcription factors. Moreover, the size of the footprinted region, at least in fibroblasts, suggests that a complex of factors may be involved. The computational search for transcription factor binding sites revealed a number of candidate sites, many of them for muscle-specific transcription factors, including the myogenic basic helix-loop-helix MyoD protein, the MEF2 protein, Mt-binding protein, and the muscle initiator sequence TATA box, as well as the transcriptional activators and regulators (AP1, CAAT boxes, and GATA3) known to be required for their function in vivo (22). Further experiments are underway to delineate the critical transcription factors in this region of the enhancer.
One barrier to identifying specific regulatory elements in fibroblasts is the nature of these cells. Unlike osteoblasts, fibroblasts do not form a homogeneous population nor are they easily divided into subpopulations (23). Considerable heterogeneity has also been reported within the fibroblast population of single tissues, although attempts to define the subpopulations have so far been less than successful (24,25). Interestingly, one of the phenotypes used to identify different subpopulations within a tissue has been the level of collagen type I synthesis. Thus, transcription of the pro-α2(I) collagen gene may not be a simple matter of distinct elements. If collagen expression in a single tissue varies between different subpopulations of fibroblasts, identification of specific control elements regulating collagen expression may be a more complicated matter than previously envisaged.
In conclusion, this study demonstrated that there are three key regions to the upstream enhancer, each of which appears to affect gene expression via different mechanisms. Together they generate the enhancer of the mouse pro-α2(I) collagen promoter.