Cooperativity between Far Upstream Enhancer and Proximal Promoter Elements of the Human α2(I) Collagen (COL1A2) Gene Instructs Tissue Specificity in Transgenic Mice

Interaction between the proximal (-378) promoter and the far upstream (-20 kb) enhancer is essential for tissue-specific expression of the human α2(I) collagen gene (COL1A2) in transgenic mice. Previous in vitro studies have shown that three Sp1 binding sites (around -300) are part of a cytokine-responsive element and that two TC-rich boxes (around -160 and -125) and a CBF/NFY consensus sequence (around -80) confer optimal promoter activity by interacting among themselves and with the upstream Sp1 sites. Here we report that mutations of the Sp1 binding sites, TC-rich boxes or CBF/NFY consensus sequence lead to reduced transgene activity, thus underscoring the functional interdependence of the proximal promoter elements. Loss of the Sp1 binding sites was associated with loss of transgene expression in osteoblasts, whereas elimination of the CBF/NFY binding site (alone or in combination with the TC-rich boxes) was correlated with a lack of activity in the ventral fascia and head dermis and musculature. Additionally, transgene expression in skin fascia fibroblasts depended on the integrity of the Sp1 binding sites and TC-rich boxes, and on their physical configuration. Evidence is also presented suggesting cooperativity between cis-acting elements of the far upstream enhancer and proximal promoter in assembling tissue-specific protein complexes. This study thus reiterates the complex and highly combinatorial nature of the regulatory network governing COL1A2 transcription in vivo.

Type I collagen, the major structural component of the connective tissue, is produced predominantly by cells of mesenchymal origin in a tightly controlled spatiotemporal manner (1). Coordinated biosynthesis of the type I collagen subunits (α1(I) and α2(I) chains) is critically important for morphogenesis and growth, as well as for tissue homeostasis and repair (1). Moreover, abnormal activation of the human type I collagen genes (COL1A1 and COL1A2) in response to cytokines and other inflammatory agents is associated with excessive matrix deposition in a variety of fibrotic diseases, including liver cirrhosis, glomerulosclerosis, and scleroderma (2-6). During the past 10 years, the COL1A2 gene has emerged as an informative model system in which to study the molecular mechanisms and cellular factors that control extracellular matrix assembly and function in normal and diseased conditions (4,6). These studies were broadly based on two experimental "read-outs" of transcriptional regulation, namely cell transfections and transgenic mice.
In vitro analyses have documented that a complex array of combinatorial interactions among transcription factors binding to the -378 proximal promoter sequence directs constitutive and cytokine-modulated expression of the COL1A2 gene (4,6). Specifically, the sequence extending from nucleotides -313 to -183 (also known as the cytokine-responsive element or CyRE) has been shown to mediate the antagonistic signals of transforming growth factor-β, tumor necrosis factor-α, and interferon-γ through the activation of the Smad, JNK, and JAK/STAT pathways, respectively (7-11). Additional CyRE binding sites involved in constitutive and/or cytokine-modulated COL1A2 expression include those of C/EBPs, Sp1, AP1, Fli-1/Ets-1, and NFκB (12-17). Furthermore, cooperativity between TC-rich boxes at -173/-155 and -133/-108 and a CBF/NFY recognition sequence at -84/-80, as well as between these elements and the Sp1 sites of the CyRE, has been shown to optimize COL1A2 promoter activity (18,19). Finally, interferon-γ-responsive elements have been mapped to the 5′ TC-rich box and to the core promoter region (nucleotide +7) in association with binding of YB-1 and the RFX5/CIITA complex, respectively (20,21).
Transgenic mouse work has demonstrated that high and tissue-specific transcription in vivo from the -378 promoter requires the participation of a relatively long (2.3 kb) enhancer element located some 20 kb upstream of the start site of transcription (22). This work has also shown that the proximal promoter is capable by itself of directing collagen I-specific transcription, albeit inefficiently and in a mosaic pattern. Together, these findings were interpreted to indicate that the predominant function of the COL1A2 far upstream enhancer is to broaden and augment tissue-specific transcription from the proximal promoter (22).
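The proximal promoter coordinates cited above can be collected into a small lookup table, which makes it easy to ask which elements fall inside a given stretch of the promoter. The following is only an illustrative sketch: the coordinates are those stated in the text, whereas the dictionary layout, the element labels, and the function name `elements_in_interval` are assumptions introduced here and are not part of the original work.

```python
# Coordinates (relative to the transcription start site) as stated in the text.
# The data layout, labels, and function name are illustrative assumptions.
PROMOTER_ELEMENTS = {
    "CyRE": (-313, -183),
    "TC-rich box (5')": (-173, -155),
    "TC-rich box (3')": (-133, -108),
    "CBF/NFY": (-84, -80),
}


def elements_in_interval(start, end):
    """Return the names of elements lying entirely within [start, end]."""
    lo, hi = min(start, end), max(start, end)
    return [
        name
        for name, (elem_start, elem_end) in PROMOTER_ELEMENTS.items()
        if lo <= elem_start and elem_end <= hi
    ]


if __name__ == "__main__":
    # The full -378 to +52 proximal promoter used in the transgenes.
    print(elements_in_interval(-378, 52))
    # A hypothetical query interval downstream of the CyRE.
    print(elements_in_interval(-180, -70))
```

The second query uses an arbitrary interval chosen for illustration; it does not correspond to the restriction sites used to build any construct in this study.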
The present study employed the transgenic approach to assess whether the far upstream enhancer could confer tissue specificity on its own, and whether individual elements of the proximal promoter contribute differently to COL1A2 transcription in vivo. The results suggest cooperativity between the far upstream enhancer and the proximal promoter in assembling tissue-specific protein complexes. They confirm in vitro observations indicating that interactions among proximal promoter elements are required for optimal transcription. They also indicate that transcription factors binding to individual promoter elements are responsible for distinct properties of COL1A2 expression in vivo. More generally, the study reiterates the highly combinatorial nature of COL1A2 transcription.

EXPERIMENTAL PROCEDURES
DNA Constructs-The control LacZ reporter constructs harboring the far upstream enhancer (from -21.1 to -18.8 kb) and/or the proximal promoter of COL1A2 (from -378 to +52 bp) have been described previously (22). Mutant constructs were engineered using PCR amplification to introduce single nucleotide substitutions into the various nuclear protein binding sites of the proximal promoter (18,23,24). Deletion of the 3′ half of the proximal promoter was generated by using BstX1 and BstN1 restriction sites (12). The hsp68LacZpA plasmid was a generous gift of Dr. J. Rossant (Mt. Sinai Hospital, Toronto, Canada) (25). Linearized plasmid DNA was prepared for microinjection according to the standard protocol (26).
Generation and Analysis of Transgenic Embryos-Transgenic embryos were produced by the standard pronuclear injection of DNA into fertilized C57Bl/10 × CBA/Ca or B6C3 F1 eggs (26). Plasmid DNA was digested with appropriate enzymes, purified from agarose gel, and microinjected at a concentration of 2-4 ng/ml in 10 mM Tris, pH 7.4, and 0.1 mM EDTA. Embryos were collected from the recipient females mainly at embryonic day 15.5 (E15.5) for whole-mount fixation and staining. This stage was chosen because it is characterized by high Col1a2 activity and because it avoids the decreased skin permeability that results from increased keratinization at later stages (27). Southern blot hybridization and/or PCR amplification of placental DNA were used to assess transgene integration as described previously (22). Positive embryos had comparable numbers of integrated transgenes. After cutting open the thorax and abdomen, embryos were placed in cold phosphate-buffered saline and fixed for 45-60 min in 0.2% glutaraldehyde, 2% formalin in 0.1 M phosphate buffer, pH 7.3, containing 2 mM MgCl2 and EGTA. After three washes of 1 h each in the same buffer supplemented with 0.1% sodium deoxycholate and 0.2% Nonidet P-40, embryos were stained overnight at room temperature in 1 mg/ml of 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal) solution containing 5 mM potassium ferrocyanide and 5 mM ferricyanide. For the histology, X-gal-positive embryos were dehydrated and wax-embedded, and 6-μm tissue sections were prepared, de-waxed, and counterstained with eosin.
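As a worked illustration of the arithmetic behind the staining solution, the sketch below applies the dilution relation C_stock × V_stock = C_final × V_final to the three staining components. The final concentrations are those given in the protocol above; the stock concentrations and the 10-ml batch volume are hypothetical values chosen only for illustration and are not taken from the original methods.

```python
# Final concentrations as given in the protocol text.
FINAL = {
    "X-gal (mg/ml)": 1.0,
    "potassium ferrocyanide (mM)": 5.0,
    "ferricyanide (mM)": 5.0,
}

# ASSUMED stock concentrations (hypothetical; same units as FINAL).
STOCK = {
    "X-gal (mg/ml)": 40.0,
    "potassium ferrocyanide (mM)": 500.0,
    "ferricyanide (mM)": 500.0,
}


def stock_volumes(final_volume_ml, final=FINAL, stock=STOCK):
    """Volume of each stock (ml) from C_stock * V_stock = C_final * V_final."""
    return {
        name: final[name] * final_volume_ml / stock[name]
        for name in final
    }


if __name__ == "__main__":
    # Hypothetical 10-ml batch of staining solution.
    for name, volume_ml in stock_volumes(10.0).items():
        print(f"{name}: {volume_ml * 1000:.0f} microliters of stock")
```

Changing the entries in `STOCK` to match the stocks actually on hand is the only adjustment needed; the remaining volume would be made up with the wash buffer described in the text.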

RESULTS
Although we originally argued that COL1A2 promoter specificity is enhanced by the far upstream sequence, the data did not exclude the presence of tissue-specific elements in the far upstream enhancer as well (22). To address this important point, we co-injected the linearized 2.3-kb far upstream enhancer into fertilized mouse eggs together with a construct in which the LacZ reporter was under the control of the minimal heat shock promoter (hsp68lacZpA) (28). Unlike transgenic embryos harboring only hsp68lacZpA (Fig. 1A1), those in which the heterologous promoter and COL1A2 far upstream enhancer had co-integrated into the genome (n = 7) consistently showed β-galactosidase activity that varied among different transgenes (Fig. 1, A2 and A3). Irrespective of the degree of transgene expression, LacZ was only transcribed in collagen I-producing tissues. For example, skin fascia, osteoblasts, and muscle were X-gal-positive in the highest transgene expressor but not to the same degree as in the transgene with the homologous promoter (Fig. 1, B-F). Together with our earlier results (22), these data suggested that tissue-specific determinants may reside within both the proximal promoter and far upstream enhancer of COL1A2.
Next, we examined the transcriptional contribution of individual cis-acting elements in the proximal promoter within the context of the interaction with the far upstream enhancer. Accordingly, we engineered transgenic LacZ constructs containing the 2.3-kb far upstream enhancer sequence linked to mutant versions of the proximal promoter (Fig. 2A). The mutations included nucleotide substitutions in the three Sp1 binding sites of the CyRE (SPm transgene), in the binding sites of the two TC-rich boxes (TCm transgene), and in the CBF/NFY recognition sequence (CBm transgene) (Fig. 2B) (18,23,24). Fifteen-and-a-half-day-old founder embryos (E15.5) were examined for transcriptional activity (as indicated by the overall intensity of whole-mount β-galactosidase staining) and for tissue distribution of LacZ activity (as monitored by histological analyses of β-galactosidase-positive embryos). All transgenic embryos were processed under the same experimental conditions to ensure reliable comparisons within and among transgenic constructs. Moreover, the activity of each mutant transgene was compared with that of the lines containing the wild-type enhancer and/or proximal promoter (22).

FIG. 1. Histological sections of A2 show staining in the skin fascia and body muscle (m) but not in the epithelial layer (arrow). Staining is also detected in the diaphragm (C, arrow) and intercostal muscles (ic), but not in the liver (L) and periosteum (D, arrows). In E and F are comparable sections from a transgenic mouse with the COL1A2 promoter and enhancer (22) showing high and uniform X-gal staining in skin fascia and muscle (E), as well as in clavicle osteoblasts (F, arrows).

FIG. 2. Transgene constructs.
A, sequence of the human proximal promoter from -378 to +52 bp with indicated Sp1 binding sites of the CyRE (green), the downstream TC-rich boxes (orange), and the CBF/NFY recognition sequence (blue). Nucleotide substitutions in these DNA elements are shown along with the location of the BstX1 and BstN1 restriction enzyme sites used to generate the TC;CBΔ transgene. B, diagrammatic representation of the wild-type (WT) and mutant transgenic constructs with reference to the sequence shown in A. HS, DNase I hypersensitive sites numbered as per Antoniv et al. (22).
SPm transgenics (n = 8) were found to express LacZ significantly less than the wild-type construct (Fig. 3, A1 and A2). Histological examination of β-galactosidase-positive embryos revealed great variability among different transgenic embryos, with distinct combinations of collagen I-producing cells displaying a mosaic pattern of transgene expression. Specifically, we found that β-galactosidase activity was relatively strong and uniform in the meninges, intercostal muscles, and kidney (Fig. 3, B-E). By contrast, no transgene expression was detected in osteoblasts or skin fascia fibroblasts (Fig. 3, B and D). These results suggested that the CyRE mutation impairs transgene expression in osteoblasts and skin fascia fibroblasts in addition to reducing the overall strength of the enhancer/promoter interaction.
On average, the expression of positive embryos harboring the TCm construct (n = 3) was similar to that of the SPm embryos (Fig. 4, A1 and A2). As in the SPm embryos, there was no detectable LacZ activity in fascia mesenchymal cells (Fig. 4C). Unlike the Sp1 site mutations, however, mutations in the TC-rich boxes had little or no effect on transgene expression in osteoblasts (Fig. 4, B and D). Finally, TCm transgene activity was relatively higher in the lower back of the embryos (Fig. 4, A1 and A2). We concluded from these results that the TC-rich boxes of the proximal promoter contribute to COL1A2 expression in skin fascia fibroblasts but not in osteoblasts.

FIG. 3. Sagittal sections show staining in the body muscle of the back but not in the fascia below the arrowheads pointing to the epidermis (B). In C, the meninges but not the brain (b) are stained. In D, the intercostal muscle (ic) but not the rib osteoblasts (arrow) are positive. In E, the kidney is strongly and uniformly stained.
The transcriptional strength of CBm transgenes (n = 4) was slightly higher than that of the SPm and TCm transgenes (Fig. 5, A1 and A2). Unlike either of them, the CBm transgene was transcribed in both osteoblasts and skin fascia fibroblasts (Fig. 5, B-E). Interestingly, half of the CBm expressors exhibited a distinctive X-gal staining pattern characterized by intense labeling of the posterior region of the body and tail, with an almost defined boundary in the ventral side of the animals (Fig. 5A2). Histological sections confirmed this observation by showing an abrupt transition in skin fascia from cells expressing the transgene on the dorsal side to those negative on the ventral side (Fig. 5D). The transition was not correlated with changes in cell morphology or tissue organization. The CBm expression pattern thus supported the previous conclusions from the SPm and TCm transgenics that the CyRE complex contributes to osteoblast specificity and that factors binding to the CyRE and TC-rich boxes may cooperate in conferring skin fascia specificity. Additionally, the results raised the intriguing possibility that CBF/NFY may be involved in determining the pattern of COL1A2 expression in mesenchymal tissues.
The last set of transgenic experiments was designed to provide independent validation for the last conclusion. Accordingly, two enhancer-containing transgenic constructs were engineered that harbored two distinct mutations in the region downstream of the CyRE. The first transgene (TC;CBm) contained nucleotide substitutions in both TC-rich boxes and the CBF/NFY consensus sequence, whereas the CyRE region of the second transgene (TC;CBΔ) was artificially juxtaposed to the TATA box by deleting the sequence encompassing the TC-rich boxes and the CBF/NFY consensus sequence (Fig. 2). Both mutations led to higher β-galactosidase activity than mutations in individual promoter elements (Figs. 6 and 7, A). Additionally, TC;CBm and TC;CBΔ were expressed with the same rostral-ventral exclusion pattern as seen in half of the CBm embryos (Figs. 6 and 7, A). Histological analysis confirmed this similarity by showing X-gal staining in some internal organs, muscles, and bones (Figs. 6, C and D and 7, B-D) but not in the ventral portion of skin fascia (Figs. 6B and 7, G and H). This marked boundary was more evident in older (E16.5) TC;CBΔ embryos in which the dermis is more mature (Fig. 7, A1 and E). Another example of differential expression of the TC;CBΔ transgene at comparable anatomical locations was the strong X-gal staining of skeletal muscles in the hind limbs and the lack of such staining in the forelimbs (Fig. 7D). There was no detectable transgenic activity in the head region rostral to shoulder level (Figs. 6 and 7, A). These observations thus suggested that CBF/NFY binding to a specific element of the proximal promoter contributes to patterned COL1A2 expression.

DISCUSSION
During the past 10 years, a large body of in vitro and in vivo studies has delineated the regulatory network of COL1A2 as consisting of multiple and overlapping nuclear protein binding sites, which are located in a distal enhancer, the proximal promoter, and the core promoter region (4,6).
The present study advances our understanding of COL1A2 transcriptional regulation by providing new insights into the contribution of the enhancer and promoter interaction to tissue specificity. The conclusions discussed below relied on the analysis of COL1A2 transgenes according to the relative strength and cell-type specificity of LacZ expression.
Previously we reported that the -378 COL1A2 promoter drives low transgene expression in a number of collagen I-producing cells, including osteoblasts (22). Here we showed that the far upstream enhancer confers some collagen I specificity to the otherwise inactive hsp68 promoter. However, the higher variability in transgene expression seen with the far upstream enhancer in combination with the heterologous promoter, as opposed to the homologous promoter, strongly suggests that hsp68 lacks the requirements to stabilize this interaction. We therefore propose that complementary elements in the far upstream enhancer and proximal promoter cooperate in promoting optimal and tissue-specific expression of COL1A2. This hypothesis is in line with the emerging "enhanceosome" model whereby synergism between proteins distantly bound to DNA leads to a stereospecific multiprotein assembly on the promoter, which induces and/or stabilizes DNA looping and regulates transcription (29). Our data also raise the possibility that the equivalent regulatory network of the mouse Col1a2 gene may not be organized exactly in the same way. First, we have found that the Col1a2 enhancer instructs hsp68 transcription in all collagen I-producing cells except osteoblasts (data not shown). Second, we did observe that osteoblasts very seldom express LacZ in transgenes containing only the -350 Col1a2 promoter sequence (27).
Another aim of the present study was to delineate the in vivo role of proximal promoter elements previously identified using the cell transfection approach (7-19). This is a particularly important goal if one is to translate the knowledge of COL1A2 regulation into new therapeutic modalities against fibrotic diseases (30,31). The loss of constitutive expression and cytokine response of transfected plasmids harboring mutations in the Sp1 binding sites of the CyRE has underscored the critical role of this transcription factor in COL1A2 regulation (23,32). Similar analyses have also suggested cooperative interaction(s) between Sp1 bound to the CyRE and Sp1/Sp3 and CBF/NFY bound to the downstream DNA elements (18,19). The finding that mutations introduced in each of these binding sites reduce transgene activity is strong evidence that these protein interactions operate in vivo as well, and in concert with the enhancer-bound complex. The results are also consistent with a role for CyRE-bound Sp1 in conferring bone specificity to the COL1A2 transgene. By contrast, the consequences of the mutations on skin fascia fibroblast expression are more difficult to interpret without invoking a contribution by the physical configuration of the promoter elements, and by the relative affinity of the binding proteins (33). On the one hand, loss of skin fascia expression in SPm and TCm transgenes would suggest that both DNA elements are required for fibroblast activity. On the other hand, skin fascia expression by the TC;CBm and TC;CBΔ transgenes would indicate a predominant role of the CyRE complex in driving fibroblast activity. We believe that this apparent discrepancy may reflect the fact that both the physical configuration of the promoter elements and the relative affinity of the binding proteins contribute to transgene activity (33). We rest our belief on two correlative lines of evidence.
First, the higher activity of the TC;CBm or TC;CBΔ constructs compared with those harboring mutations in only one of the two elements is consistent with earlier in vitro data suggesting multiple levels of both negative and positive interactions among the proximal promoter elements (18,19). We therefore concur with these earlier studies in proposing that the nucleotide substitutions in the downstream elements may relieve some CyRE repression, thus restoring fibroblast activity. Second, the relatively higher X-gal staining intensity of TC;CBΔ compared with TC;CBm points to the involvement of promoter configuration (as evidenced by the closer proximity of the CyRE-bound factors to the TATA box) in determining the strength of the promoter/enhancer interaction and, implicitly, cell type-specific levels of transgene expression.
Another intriguing finding of this study was the unique expression pattern of transgenic mice harboring mutations in the binding site of transcription factor CBF/NFY. Originally identified as a common regulator of the type I collagen genes, this heterotrimeric protein has been more recently shown to be absolutely required for embryonic development and also to be involved in chromatin remodeling (24,34,35). We have shown that nucleotide substitutions that abrogate CBF/NFY binding, alone or in combination with mutations in the TC-rich boxes, lead to loss of LacZ expression in the ventral and head regions of the dermis, as well as in the muscles of the forelimbs. The patterning of specific cell lineages implies that CBF/NFY may be essential for COL1A2 activity in those cells that do not express the transgene. It is worth noting that the dermal cell lineages in the ventral and fascial regions, which are LacZ negative, have a distinct embryonic origin from those of the dorsal region, which are LacZ positive. Whereas cells of the lateral and ventral dermis of the trunk originate from the somatopleural mesoderm and express dermo1, those of the dorsolateral dermis derive from the dermatome and express Msx1 (36). Our transgenic data therefore raise the intriguing possibility that CBF/NFY may be implicated in determining the pattern of COL1A2 expression in some mesenchymal cell lineages through a yet to be defined mechanism.
In conclusion, this study is the first to elucidate the transcriptional roles of the cis-acting elements of the COL1A2 proximal promoter within a chromatin context and in relationship to the far upstream enhancer interaction. It is also the first to provide strong evidence indicating that the interaction is likely to be mediated by complementary sets of tissue-specific determinants leading to the assembly of a functional enhanceosome. Other DNA elements have been identified that need to be re-evaluated within the same in vivo context before a full account of the COL1A2 regulatory network is attained. They include the methylation-dependent core promoter site, which modulates polII access to the transcription start site, the c-Myb binding sites at around -1.2 kb, which may be involved in scleroderma pathogenesis, and the sequences around evolutionarily conserved and unique DNase I hypersensitive sites (21,22,37). Work in progress is evaluating the transcriptional contribution of the last two COL1A2 elements in the transgenic mouse model.