IDENTIFICATION OF A REPRESSOR IN THE FIRST INTRON OF THE HUMAN α2(I) COLLAGEN GENE (COL1A2)*

The human and mouse genes that code for the α2 chain of collagen I (COL1A2 and Col1a2, respectively) share a common chromatin structure and nearly identical proximal promoter and far-upstream enhancer sequences. Despite these homologies, species-specific differences have been reported regarding the function of individual cis-acting elements, such as the first intron sequence. In the present study, we have investigated the transcriptional contribution of the unique open chromatin site in the first intron of COL1A2 using a transgenic mouse model. DNase I footprinting identified a cluster of three distinct areas of nuclease protection (FI1-3) that span from nucleotides +647 to +760, relative to the transcription start site, and which contain consensus sequences for GATA and interferon regulatory factor (IRF) transcription factors. Gel mobility shift and chromatin immunoprecipitation assays corroborated this last finding by documenting binding of GATA-4, IRF-1 and IRF-2 to the first intron sequence. Moreover, a short sequence encompassing the three footprints was found to inhibit expression of transgenic constructs containing the COL1A2 proximal promoter and far-upstream enhancer in a position-independent manner. Mutations inserted into each of the footprints restored transgenic expression to different extents. These results therefore indicated that the unique open chromatin site of COL1A2 corresponds to a repressor, the activity of which seems to be mediated by the concerted action of GATA and IRF proteins. More generally, the study reiterated the existence of species-specific differences in the regulatory networks of the mammalian α2(I) collagen coding genes.

2 mesenchymal cell types. That collagen I transcripts are found in the same 2:1 ratio as the corresponding chains has been interpreted to suggest that common regulatory programs coordinate expression of the two genes (7). However, multiple studies have failed to reveal a common organization of cis-acting elements and cognate trans-acting factors that would be consistent with the notion of shared regulatory programs between the collagen I genes (6). In point of fact, transgenic studies have revealed that the regulatory networks of the mammalian collagen I genes are organized very differently. On the one hand, tissue-specific production of α1(I) collagen chains is under the control of distinct and separate DNA elements scattered throughout the 3.2 kb immediately upstream of the start site of transcription (8-11). On the other hand, proper α2(I) collagen synthesis is the result of combinatorial interactions amongst nuclear factors that bind to overlapping DNA motifs clustered within the proximal promoter, as well as between them and those binding to a far-upstream enhancer (12-15).
Species-specific differences have also emerged with respect to the organization of the regulatory networks of the human COL1A2 and mouse Col1a2 genes (12-15). Chromatin analyses have shown that COL1A2 and Col1a2 share five DNase I hypersensitive sites (HS) within nearly identical sequences of the proximal promoter (HS1) and 2.3 kb (HS2) and 20 kb (HS3-5) upstream of the transcription start site (12,13,16). Furthermore, studies in transgenic mice have demonstrated that high and tissue-specific expression of both COL1A2 and Col1a2 proximal promoters requires interaction with the upstream sequence containing HSs 3-5, also known as the far-upstream enhancer (12,13).
Deletion experiments have, however, shown that the region around HS5 is dispensable in the mouse but absolutely required in the human transgene (13,14). Additional analyses have documented that the proximal promoter or the far-upstream enhancer of the human, but not of the mouse, gene can by itself drive transgenic expression in osteoblasts (15).
The transcriptional contribution of the first intron sequence is another potential difference between the two species. First, we have identified an open chromatin site, HS(In), that is unique to the first intron of the human gene (13). Second, earlier cell transfection experiments had assigned an enhancing activity to the first intron of Col1a2 and an inhibitory role to the COL1A2 counterpart (17,18). As part of our effort to delineate the full anatomy of the COL1A2 regulatory network, we have revisited these early studies using the transgenic mouse model in conjunction with DNA-binding assays and guided by the knowledge that the intronic sequence contains an open chromatin site (13). The results indicate that the sequence harboring HS(In) acts as a strong inhibitor of COL1A2 transcription, thus supporting the earlier contention of Sherwood et al. (17). Our investigations also mapped relevant cis-acting elements within the HS(In) sequence, identified the cognate trans-acting factors and demonstrated that full HS(In) repressing activity requires the concerted action of GATA and IRF transcription factors. This study therefore extends the characterization of the major functional elements of the COL1A2 regulatory network, in addition to identifying another species-specific difference between the human and mouse genes.

EXPERIMENTAL PROCEDURES
DNA Binding Assays-Nuclear extracts were purified from cultured WI-38 human lung fibroblasts, NIH-3T3 mouse fibroblasts or Jurkat T cells according to the previously published protocol (19). For the DNase I footprinting assay, a plasmid DNA containing the intron 1 sequence that spans from nucleotides +524 to +895 was cleaved internally with HindIII, end-labeled by filling in the 3'-recessed ends with the Klenow enzyme, excised from the plasmid backbone with EcoRI, and incubated with nuclear extracts with or without addition of DNase I as previously described (19,20). Likewise, the EMSA conditions for oligonucleotide end-labeling and incubation with nuclear extracts were essentially the same as previously described (19,20). In some experiments, nuclear extracts were pre-incubated with commercial antibodies against GATA or IRF proteins (Santa Cruz Biotechnology, Santa Cruz, CA), or molar excesses of unlabeled mutant or wild-type oligonucleotides were added to the nuclear extract incubation. Footprinted sequences or DNA-protein complexes were resolved by polyacrylamide gel electrophoresis and visualized by autoradiography.
The chromatin immunoprecipitation (ChIP) assay was performed on WI-38 cells using a commercial kit (Upstate, Lake Placid, NY) and according to the published protocol (21). Oligonucleotide primers corresponding to +680/+705 and +1074/+1047 (intron-specific) or corresponding to -2472/-2450 and -2358/-2339 (negative control) were employed to PCR-amplify sequences potentially bound by various nuclear proteins. The PCR reaction was performed for a total of 38 cycles after initial denaturation at 93 °C for 3 min; amplification conditions included denaturation at 93 °C for 45 sec, annealing at 55 °C (intron-specific) or 47 °C (negative control) for 1 min and elongation at 72 °C for 2 min. One seventh of the immunoprecipitated genomic fragment was used as a template for amplification, except for the input sample in which 0.01% of the total DNA was used. Results of the ChIP analysis were visualized by Southern blot hybridization of the amplification products, separated on a 2% agarose gel, to the intron or upstream sequences. Intron sequences from different vertebrate organisms were derived from the Ensembl database (www.ensembl.org) and aligned using the program GeneDoc (www.psc.edu/biomed/genedoc).
Transgenic Constructs-The control LacZ reporter constructs harboring the far-upstream enhancer and proximal promoter of COL1A2 have already been described (13). Mutant constructs were engineered using PCR amplification to insert single nucleotide substitutions into the various nuclear protein-binding sites of the intronic sequence. Preparation of linearized plasmid DNA for microinjection was according to the standard protocol (22).
Generation and Analysis of Transgenic Embryos-Transgenic embryos were produced by the standard pronuclear injection of DNA into fertilized C57Bl/10 x CBA/Ca F1 eggs (22). Plasmid DNA was digested with appropriate enzymes, purified from agarose gel and microinjected at a concentration of 2-4 ng/ml in 10 mM Tris (pH 7.4) and 0.1 mM EDTA.
Embryos were collected from the recipient females mainly at 15.5 days post coitum (E15.5) for whole-mount fixation and staining. This stage was chosen because it is characterized by high Col1a2 activity and to avoid decreased skin permeability due to increased keratinization (12). Southern blot hybridization and/or PCR amplification of placental DNA were used to assess transgene integration as previously described (12). After cutting open the thorax and abdomen, embryos were placed in cold phosphate-buffered saline and fixed for 45-60 min in 0.2% glutaraldehyde, 2% formalin in 0.1 M phosphate buffer pH 7.3 containing 2 mM MgCl2 and EGTA. After three washes of 1 hr each in the same buffer supplemented with 0.1% sodium deoxycholate and 0.2% Nonidet P-40, embryos were stained overnight at room temperature in 1 mg/ml of 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal) solution containing 5 mM potassium ferrocyanide and 5 mM ferricyanide. For histology, X-gal positive embryos were dehydrated and wax-embedded, and 6-µm tissue sections were prepared, de-waxed and counterstained with eosin. The data presented are from embryos with comparable numbers of transgenic inserts, 2 to 5.
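As a bookkeeping aid for the ChIP PCR described above, the following minimal Python sketch derives the expected product sizes from the stated primer coordinates and the nominal hold time of the cycling program. It is not part of the published protocol. The names PrimerPair, amplicon_length and program_seconds are hypothetical, and the calculation assumes that the quoted coordinates are inclusive primer ends numbered relative to the transcription start site and that ramp times are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerPair:
    """Primer pair with coordinates relative to the transcription start site."""
    name: str
    forward: tuple      # (5' end, 3' end) of the forward primer
    reverse: tuple      # (5' end, 3' end) of the reverse primer
    annealing_c: int    # annealing temperature in degrees Celsius

    def amplicon_length(self) -> int:
        # The outermost primer coordinates delimit the PCR product.
        coords = self.forward + self.reverse
        lo, hi = min(coords), max(coords)
        # Valid only when both ends lie on the same side of +1,
        # because this numbering scheme has no position 0.
        assert (lo > 0) == (hi > 0), "amplicon spans the start site"
        return hi - lo + 1


# Coordinates and annealing temperatures as given in the text.
INTRON = PrimerPair("intron-specific", (680, 705), (1074, 1047), 55)
CONTROL = PrimerPair("negative control", (-2472, -2450), (-2358, -2339), 47)

# Cycling program from the text, in seconds.
INITIAL_DENATURATION_S = 3 * 60
CYCLES = 38
CYCLE_S = 45 + 60 + 2 * 60   # denaturation + annealing + elongation


def program_seconds() -> int:
    """Nominal hold time of the whole program (assumption: no ramp times)."""
    return INITIAL_DENATURATION_S + CYCLES * CYCLE_S


for pair in (INTRON, CONTROL):
    print(f"{pair.name}: {pair.amplicon_length()} bp, "
          f"annealing at {pair.annealing_c} deg C")
print(f"nominal program time: {program_seconds() / 60:.1f} min")
```

Under these assumptions the intron-specific and negative control products would be 395 bp and 134 bp, respectively, and the program would hold for about 145.5 min.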

RESULTS AND DISCUSSION
Early cell transfection experiments have shown that the first intron of Col1a2 and COL1A2 stimulates and inhibits transcription, respectively (17,18). These studies were, however, performed using long intronic sequences (1.2-1.7 kb) and without prior knowledge of the precise location of relevant cis-acting element(s). Subsequent analyses of chromatin structure located a unique DNase-sensitive site, termed HS(In), at about +730 in the first intron of COL1A2 and within an evolutionarily divergent sequence (13,17). The present study was designed to characterize this putative regulatory element of COL1A2 using transgenic mice in combination with DNA-binding assays. Accordingly, the DNase I footprinting assays were first performed on a genomic fragment spanning from nucleotides +524 to +801 in order to map sites of nuclear protein interaction around HS(In). The analysis located a cluster of three clearly distinct footprinted areas within the region encompassing HS(In), which were designated FI1 (+647/+676), FI2 (+696/+734) and FI3 (+746/+760) (Fig. 1A). Radiolabeled oligonucleotides containing each of the footprinted areas were then used in the electrophoretic mobility shift assay (EMSA) to confirm binding of nuclear proteins. The EMSA revealed formation of specific retarded bands with each of the three probes (Fig. 2). A computer-aided inspection of the footprinted areas identified potential binding sites for transcription factors GATA (FI1 and FI2) and IRF (FI3) (Fig. 1B). Wild-type and mutant forms of these recognition sites were therefore tested in competition assays against the respective labeled probes. The resulting EMSAs documented the ability of the wild-type oligonucleotides and the inability of the mutant sequences to affect complex formation (Fig. 2).
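The arrangement of the three footprints can be summarized with simple coordinate arithmetic. The minimal Python sketch below uses only the coordinates quoted above, assumes that they are inclusive, and introduces hypothetical helper names (length, gaps, containing) for illustration. It reports the size of each footprint, the spacing between neighbors and which footprint overlaps the approximate position of HS(In).

```python
# Footprint coordinates relative to the transcription start site,
# as quoted in the text (assumed to be inclusive).
FOOTPRINTS = {"FI1": (647, 676), "FI2": (696, 734), "FI3": (746, 760)}
HS_IN_APPROX = 730  # approximate position of HS(In) given in the text


def length(span):
    """Number of nucleotides covered by an inclusive (start, end) span."""
    start, end = span
    return end - start + 1


def gaps(footprints):
    """Unprotected nucleotides between consecutive footprints."""
    ordered = sorted(footprints.items(), key=lambda item: item[1][0])
    return {
        f"{left}-{right}": right_span[0] - left_span[1] - 1
        for (left, left_span), (right, right_span) in zip(ordered, ordered[1:])
    }


def containing(position, footprints):
    """Names of the footprints that include the given position."""
    return [name for name, (start, end) in footprints.items()
            if start <= position <= end]


lengths = {name: length(span) for name, span in FOOTPRINTS.items()}
cluster = (min(s for s, _ in FOOTPRINTS.values()),
           max(e for _, e in FOOTPRINTS.values()))

print("footprint lengths (bp):", lengths)
print("gaps between footprints (bp):", gaps(FOOTPRINTS))
print("cluster span (bp):", length(cluster))
print("footprints overlapping HS(In):", containing(HS_IN_APPROX, FOOTPRINTS))
```

With the stated coordinates the footprints cover 30, 39 and 15 bp, are separated by 19 and 11 bp, and together span 114 bp; the approximate HS(In) position falls within FI2.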
Cross-species sequence alignment of the relevant intron elements revealed only a modest level of sequence homology and loss of most of the GATA and IRF binding sites identified in the human gene (Fig. 1C).
Additional EMSAs using specific antibodies confirmed that GATA and IRF proteins indeed bind to the FI1 and FI2 sequences and to the FI3 sequence, respectively. Specifically, the assays showed that the FI1 and FI2 probes bind GATA-4 and not GATA-1, whereas probe FI3 recognized both IRF-2 and IRF-1 (Fig. 2). That the IRF-1 antibodies reduced FI3 complex formation without yielding a supershift could be accounted for by nonspecific antibody interference. However, lack of IRF-1 antibody interference with Jurkat nuclear extracts excluded this possibility (Fig. 3A). Consistent with the differential distribution of the two GATA proteins (23), the specificity of the FI2 complex was indirectly corroborated by the finding that GATAs 2 and 3 or GATA-4 bind to FI2 in Jurkat T cells and fibroblasts, respectively (Fig. 3A). The same results were obtained with the FI1 probe (data not shown). These in vitro binding assays were confirmed in vivo by ChIP analysis of human lung fibroblasts. Sequence-specific PCR amplification of genomic DNA immunoprecipitated with antibodies against GATA-4, IRF-1 or IRF-2, but not with nonspecific antibodies, yielded reproducibly positive signals when hybridized to the HS(In) probe (Fig. 3B). Furthermore, specificity of in vivo binding was independently confirmed by lack of positive signals in a parallel control sample in which the ChIP assay was performed with an upstream sequence of COL1A2 (Fig. 3B). Preliminary evidence also suggests that a potential CREB/AP1 recognition sequence in FI2 binds c-Jun (data not shown).
Having determined the precise location of the HS(In) element and the identity of the interacting nuclear factors, the next experiments examined its functional contribution to COL1A2 transcription. Transient transfections using a 372 bp intronic sequence (+524 to +895) inclusive of the HS(In) element showed a slight downregulation of the -378 promoter activity, consistent with the earlier notion of a repressing activity of the COL1A2 intron (data not shown) (17). Based on this preliminary evidence, the transgenic model was then employed to examine the HS(In) element within the in vivo context and in relationship to the interaction between the proximal promoter and far-upstream enhancer. Transgenic constructs included the original 21.1/18.8pLAC plasmid, which contains the core sequence of the far-upstream enhancer and the -378 proximal promoter (13), and a modification of 21.1/18.8pLAC in which the wild-type 372 bp segment containing the HS(In) element had been inserted downstream of the reporter gene (21.1/18.8pLAC-In), or between the far-upstream enhancer and proximal promoter (21.1/18.8(In)pLAC) (Fig. 4). Several 21.1/18.8pLAC-In founders were generated and all showed lower β-galactosidase staining in most tissues compared with embryos harboring the intronless 21.1/18.8pLAC transgene (Fig. 5A and B). Histological sections demonstrated that β-galactosidase staining in different transgenics, albeit variable in intensity, was always confined to collagen I-producing cells (Fig. 5D-I). Overall, the intensity and distribution of X-gal staining in the 21.1/18.8pLAC-In transgene was reminiscent of the pattern previously observed with the proximal promoter transgene without the far-upstream enhancer (13). Limited staining was seen in some ossification centers of intramembranous bones (Fig. 5H and I), and in patches of skin fascia and tendon (Fig. 5G) as well as in a few internal organs, such as the forming kidney and spleen (Fig. 5D and F).
By contrast, no staining was detected in the lung, heart, gut and blood vessels (Fig. 5E). Similar results were obtained with 21.1/18.8(In)pLAC, the construct in which the intronic sequence had been inserted between the enhancer and the promoter (Fig. 5C). Together, these findings demonstrated the inhibitory effect of the HS(In) sequence on COL1A2 transcription.
Based on the above findings, we assessed the contribution of individual nuclear protein-binding sites to HS(In) inhibition of 21.1/18.8pLAC expression by examining β-galactosidase activity in transgenic mouse embryos harboring mutated versions of FI1, FI2 or FI3. The mutations included nucleotide substitutions in the binding sites of each footprint (21.1/18.8pLAC-In m1, 21.1/18.8pLAC-In m2 and 21.1/18.8pLAC-In m3) and in the GATA-binding sites of both FI1 and FI2 (21.1/18.8pLAC-In m1,2) (Fig. 4). X-gal staining of the mutant transgenes revealed β-galactosidase levels that were comparable to one another and similar to those of the intronless 21.1/18.8pLAC transgene (Fig. 6A-D). It also identified a few interesting differences amongst the four mutant transgenes. First, X-gal staining in the skin and other tissues of the 21.1/18.8pLAC-In m1 and 21.1/18.8pLAC-In m2 transgenes was lower than in 21.1/18.8pLAC-In m3 embryos (Fig. 6A, B and D). The sole exception was the unusually strong β-galactosidase activity in 21.1/18.8pLAC-In m1 and 21.1/18.8pLAC-In m2 bones (Fig. 6K and L). In point of fact, this was the strongest X-gal staining ever recorded in bone with any of the COL1A2 constructs examined in this and previous studies (13,15). Second, LacZ gene expression in internal organs of the 21.1/18.8pLAC-In m2 transgene was consistently higher than that of the 21.1/18.8pLAC-In m1 construct (data not shown). Third, the combination of both GATA mutations (21.1/18.8pLAC-In m1,2) yielded the same level of X-gal staining as the mutation of only the IRF-binding site (21.1/18.8pLAC-In m3) and was the closest to that of the intronless control transgene (Fig. 6C and D). Several founders harboring transgenes with the m1, m2 or m3 mutations were sectioned and examined histologically. This revealed intense β-galactosidase activity in all type I collagen-producing cells, such as skin fascia fibroblasts (Fig. 6E), skeletal muscle cells, tendon and blood vessels (Fig. 6H).
High-level staining was also noted in gut, lung, pancreas and splenic primordium (Fig. 6F-I). Collectively, these results indicated that full repressor activity requires the participation of all nuclear protein-binding sites, and that each of them may play a more prominent role within individual cellular contexts.
In summary, the present study has demonstrated for the first time that the HS(In) element is a strong repressor of COL1A2 transcription in vivo. This conclusion was based on the ability of a short HS(In)-containing sequence to inhibit the activity of an experimental model that closely replicates the expression pattern of COL1A2 in transgenic mice. Within the limitations of this in vivo model, our result supports the cell transfection data of Sherwood et al. (17) and reiterates the existence of functional differences in the organization of the regulatory networks of the Col1a2 and COL1A2 genes. As such, it underscores the peril of extrapolating functional conclusions from one mammalian species to another.
The EMSA and the ChIP assay have correlated the inhibitory activity of the HS(In) sequence with the specific binding of GATA and IRF proteins to three nearly juxtaposed elements. Moreover, transgenic experiments have demonstrated that inhibition requires the full complement of nuclear protein-binding sites. They have also raised the possibility that each of the cis-acting HS(In) elements imparts slightly different properties to the inhibitory protein complex. GATA proteins represent a small family of zinc finger transcriptional regulators that are expressed in hematopoietic stem cells (GATAs 1, 2 and 3) and in a variety of mesoderm- and endoderm-derived tissues (GATAs 4, 5 and 6) (23). Consistent with the tissue distribution of GATA family members, we observed binding of GATA-4 and GATAs 2 and 3 with nuclear extracts from NIH-3T3 and Jurkat cells, respectively. GATA proteins have been reported to modulate tissue-specific gene expression across various cell types by interacting with a large array of transcriptional activators and repressors (23). Along these lines, we recently found that the HS2 element of COL1A2 is another GATA-binding site that represses transcription from the -378 promoter (24). It is therefore conceivable that combinatorial interactions among GATA proteins and co-factors at different COL1A2 sites may orchestrate expression of this mesenchyme-specific gene within different cellular contexts.
Originally identified as transcriptional repressors or activators of interferon-β (IFN-β) and of IFN-γ-inducible genes, IRFs later emerged as broader regulators of other biological processes, such as cell growth, in conjunction with other nuclear proteins (25). A case in point is IRF-2, which has been shown to repress IFN-β activation by coactivator repulsion (26). In this novel regulatory mechanism, incorporation of IRF-2 into the enhanceosome prevents recruitment of the CBP-polII holoenzyme complex through a specific protein domain. Similar to the relatively higher activity of 21.1/18.8pLAC-In m3 compared with the other mutant transgenes in most collagen I-producing tissues, inactivation of IRF-2 in mice has been shown to expand the number of cells that respond to viral infection by inducing IFN-β gene transcription (26). Although our study did not address whether a similar mechanism may operate in COL1A2, it nonetheless suggested that other factors which normally counteract HS(In) repression in collagen I-producing cells are not present in the transgenic model used as our experimental read-out. Amongst others, probable modulating factors include additional sequences and cognate trans-acting factors and/or appropriate spacing of the interacting domains in the regulatory network (27). Work in progress is characterizing the precise mechanism of the transcriptional repression by the intronic elements and the identity of the trans-acting factors involved in this process. It is also exploring the possibility that HS(In) may contain another interferon-responsive element of the COL1A2 gene.