Structure and Expression of the Mouse AhR Nuclear Translocator (mArnt) Gene*

Aryl hydrocarbon receptor (AhR) nuclear translocator (Arnt) gene has been isolated and characterized from a mouse genomic DNA library. The gene is about 60 kilobases long and split into 22 exons. An unusual exon/intron junctional sequence was found in the 11th intron of the gene that begins with GC at its 5′-end. The exon/intron arrangement of mArnt gene differs greatly from those of the other members of the same basic-helix-loop-helix/PAS family. The gene is TATA-less and has several transcription start sites. The promoter region of the mArnt gene is GC-rich and contains a number of putative regulatory DNA sequences such as two GC-boxes, a cAMP-responsive element, E-box, AP-1 site, and CAAT-box. Deletion experiments revealed that all these DNA elements made substantial contributions to a high level of expression of the gene, except for the cAMP-responsive element. Of all, two GC-boxes displayed the most dominant enhancing effects. It was demonstrated that there exist specific factors binding to these DNA elements in the nuclear extracts of HeLa cells. Among them, Sp1 and Sp3, and CAAT-box binding factor-A were identified to bind the GC-boxes and CAAT-box, respectively. Expression of MyoD in HeLa cells stimulated the Arnt promoter activity by binding to the E-box.

Aryl hydrocarbon receptor (AhR) nuclear translocator (Arnt) is a member of the basic-helix-loop-helix/PAS (bHLH/PAS) family of heterodimeric transcription factors, which includes AhR, hypoxia-inducible factor 1α (HIF-1α), Drosophila Trachealess (Trh), and single-minded protein (Sim) (1-5). Recent molecular cloning and biochemical studies have demonstrated that, in the nucleus, Arnt forms a heterodimer with AhR that has been activated by binding a ligand such as 2,3,7,8-tetrachlorodibenzo-p-dioxin. The heterodimer activates expression of the genes for a group of drug-metabolizing enzymes by binding to the xenobiotic response element sequence upstream of these genes (6). In addition, the AhR-Arnt heterodimer is considered to mediate various biological effects of aromatic environmental pollutants, such as teratogenesis, tumor promotion, epithelial dysplasia, and immunosuppression. These pollutants are usually represented by 2,3,7,8-tetrachlorodibenzo-p-dioxin, the most toxic chemical, although the target genes for the heterodimer in these effects remain to be elucidated (6). Arnt also forms a homodimer that binds the E-box and potentially activates genes driven by promoters containing E-boxes (7). It has recently been reported that Arnt forms heterodimeric complexes with HIF-1α and HIF-1α-like factor to regulate genes involved in the response to hypoxic conditions (2), and disruption of the Arnt gene by homologous recombination resulted in embryonic death of mice because of abortive angiogenesis and defective responses to glucose and oxygen deprivation (8). Taken together, these results suggest that Arnt plays a central role in transcriptional regulation by the bHLH/PAS transcription factors as a key partner molecule in the formation of transcription/regulation-competent dimers.
It would be interesting to elucidate the structure of the Arnt gene with regard to an evolutionary aspect of a gene family of the bHLH-PAS transcription factors and the regulatory mechanism of its transcription.
In this study, we have cloned and characterized the mouse Arnt gene, whose structure consists of 22 exons and is more complex than that of AhR, and elucidated multiple regulatory DNA elements in the promoter region and their trans-acting regulatory factors.

EXPERIMENTAL PROCEDURES
Library Screening and Sequencing-A 129/SV mouse genomic library (Stratagene) was screened using a full-length mouse Arnt cDNA and its BamHI/EagI fragment (about 0.8 kb) as hybridization probes as described (9). Positive clones were isolated and their sequence analyses were performed as described (9). Introns 4 and 8, which were not covered by the cloned DNAs, were amplified by PCR using the LA Taq kit (Takara) and two pairs of 35-mer primers that anneal to the 3′-ends of exons 4 and 8 and the 5′-ends of exons 5 and 10. Genomic DNA from the liver of C57BL/6 mice was used as the PCR template, and the PCR products were subcloned into pBluescript II SK(+) for sequencing.
Determination of the 5′-End of the mArnt Gene-Total RNA was prepared from C57BL/6 mouse skeletal muscle by the guanidinium method (10), and poly(A)+ RNA was purified with oligo(dT) latex (Takara). For the RNase protection assay (11), a 296-base pair NcoI/AvaI fragment of the genomic clone A18, which was subcloned into the SmaI site of pBluescript II SK(+), was used for synthesis of a riboprobe by using [α-32P]CTP (800 Ci/mmol) and T7 RNA polymerase. The RNase protection experiments were carried out as described (12). The 5′-RACE was performed according to the protocol of Life Technologies, Inc. Poly(A)+ RNA was reverse-transcribed by using a gene-specific primer (Gsp1) complementary to the sequence of +243 to +267 of mArnt mRNA and SuperScript II reverse transcriptase at 42°C. The first strand mArnt cDNA was amplified by PCR using a poly-GI adaptor primer (AP) and a nested mArnt-specific primer (Gsp2) complementary to the sequence of +218 to +242 of mArnt mRNA. The amplified products were subjected to gel electrophoresis, and the elongated band was subcloned into the SmaI site of pBluescript II SK(+) for sequencing with the dye terminator method (Applied Biosystems). The oligonucleotide primers used are given as follows in the 5′ to 3′ direction: AP, GGCCACGCGTCGACTAGTACGGGIIGGGIIGGGIIG; Gsp1, TGACCGTCGCTTAATAGCCCTCTG; Gsp2, CAACAGCTCCTCCACCTTGAATCC.
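Primer sequences copied from a typeset page can pick up line-break hyphens, and a gene-specific primer that is complementary to the mRNA has to be reverse-complemented before it can be compared with the sense strand. The following is a minimal sketch of both steps for Gsp1 and Gsp2; the helper names are illustrative and not part of the original protocol, and the inosine-containing adaptor primer AP is left out because the helpers assume A/C/G/T only.

```python
# Sketch: tidy primer strings and derive the sense-strand sequence they anneal to.
# Function and variable names are hypothetical, chosen for illustration.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean_primer(raw: str) -> str:
    """Upper-case the string and keep only A/C/G/T (drops hyphens and spaces)."""
    return "".join(ch for ch in raw.upper() if ch in "ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T-only sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Primer strings as they may appear after text extraction.
PRIMERS = {
    "Gsp1": "TGACC-GTCGCTTAATAGCCCTCTG",
    "Gsp2": "CAACAGCTCCTCCACCTTG-AATCC",
}

for name, raw in PRIMERS.items():
    seq = clean_primer(raw)
    print(name, len(seq), "nt;", seq, "anneals to", reverse_complement(seq))
```

The reverse complement printed for each primer is the stretch of mRNA-sense sequence that the primer would be expected to match.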
Deletion Analysis of the mArnt Promoter-The 5′-flanking region (+12 to −1666) of the mArnt gene was inserted into the SmaI site of the luciferase reporter plasmid (pGL3 Basic Vector) (Promega). Introduction of various deletions in the 5′-flanking region and transfection experiments using HeLa cells were performed as described (13,14). The LacZ reporter gene pBos-LacZ (1 μg) was cotransfected with the luciferase reporter genes as an internal control, and the luciferase activities were normalized against the β-galactosidase activity.
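The normalization described above amounts to dividing each luciferase reading by the β-galactosidase reading from the same transfection, then expressing the result relative to a control construct. A minimal sketch follows; the readings are made-up placeholder numbers, not data from this study, and the function names are hypothetical.

```python
# Sketch: normalize luciferase readings to a cotransfected beta-galactosidase
# internal control and express the result as fold over a control construct.
# All numbers below are invented placeholders for illustration only.


def normalized_activity(luciferase: float, beta_gal: float) -> float:
    """Luciferase activity per unit of beta-galactosidase activity."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase activity must be positive")
    return luciferase / beta_gal


def fold_over_control(sample: tuple, control: tuple) -> float:
    """Ratio of normalized sample activity to normalized control activity."""
    return normalized_activity(*sample) / normalized_activity(*control)


# (luciferase reading, beta-galactosidase reading) for each transfection
promoter_construct = (1200.0, 4.0)
promoterless_vector = (30.0, 2.0)

print(fold_over_control(promoter_construct, promoterless_vector))  # 20.0
```

Dividing by the internal control first means that differences in transfection efficiency between dishes cancel out before constructs are compared.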
DNase I Footprinting Analysis-A 329-base pair SacII/NdeI fragment (+12 to −317 of the mArnt gene) of the genomic clone mA18 was subcloned into the SmaI site of pGL3-basic vector. The fragment was 32P-labeled at the XhoI and Asp-718 sites for the downstream and upstream end-labeling, respectively. Nuclear extracts from HeLa cells were prepared by the method of Schreiber et al. (15). A series of treatments for the DNase I footprinting analysis were performed as described (14).
Site-directed Mutagenesis-The E-box in the mArnt promoter was mutated in a site-directed manner by the method of splicing by overlap extension (20). The oligonucleotides mAEboxm1 and mAEboxm2 were used as primers for introducing the mutation.
Cotransfection Assay-The mArnt promoter (+12 to −1666)-luciferase construct pGL3mAPr (−1666) or its mutant pGL3mAPr (−1666)m with a mutated E-box was cotransfected into HeLa cells with an expression plasmid for mouse MyoD, pcDNA3mMyoD (a kind gift from Dr. Nabeshima, Osaka University), or with the pcDNA3 plasmid as a control. The subsequent treatments for this experiment were performed as described above.

RESULTS
Characterization of Mouse Arnt Genomic Clones-Sequence analysis of all the clones, mA3, mA8, mA9, mA18, and mA16, isolated from a 129/SV mouse genomic library showed that the Arnt gene is split into 22 exons. Because mA3, mA8, and mA9 do not overlap with each other, we tried to fill in these gaps by PCR. Unique PCR products of 7 and 4 kb were generated for the respective gaps, and sequence analysis of the bands revealed that the 7- and 4-kb DNA fragments overlapped with parts of mA3 and mA8 and with those of mA8 and mA9, respectively (data not shown). Consequently, the total length of the mArnt gene is about 60 kb (Fig. 1). Exon 5 is an alternative exon and is omitted in a short form of the mArnt mRNA (3). The two forms of mArnt mRNA encode two Arnt proteins, which are not known to differ in function or mode of expression. To determine the 5′-end of the Arnt gene, an RNase protection assay was performed, which yielded three major protected fragments as shown in Fig. 2. Because RNA has a lower mobility in the denaturing polyacrylamide/urea gel than DNA of the same size, we confirmed the result by 5′-RACE. Among 9 clones analyzed, 5, 3, and 1 started at positions +1, +7, and +9, which correspond to the 139-, 133-, and 131-nucleotide fragments of the RNase protection experiments, respectively, indicating that +1 is the major transcriptional start site of the Arnt gene. No TATA-like sequence was found in the 5′-flanking region of the mArnt gene. Instead, the sequence of the promoter region is GC-rich, as is often found in TATA-less genes (Fig. 4).
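The correspondence between the 5′-RACE start positions and the RNase protection fragment sizes can be checked with simple arithmetic: if a transcript starting at +1 protects 139 nucleotides of the probe, a transcript starting n nucleotides further downstream should protect n fewer. The sketch below assumes exactly that one-for-one relationship and uses the clone counts quoted in the text; the variable names are illustrative.

```python
# Sketch: relate 5'-RACE start positions to expected RNase protection fragment
# lengths, assuming the protected fragment shortens by 1 nt for every 1 nt
# downstream shift of the start site (an assumption made for this illustration).

FULL_LENGTH_FRAGMENT = 139  # nt protected by a transcript starting at +1

# start position -> number of 5'-RACE clones starting there (from the text)
race_clone_counts = {1: 5, 7: 3, 9: 1}


def protected_length(start: int, full: int = FULL_LENGTH_FRAGMENT) -> int:
    """Expected protected fragment length for a transcript starting at +start."""
    return full - (start - 1)


total = sum(race_clone_counts.values())
for start, n in sorted(race_clone_counts.items()):
    print(f"+{start}: {n}/{total} clones, expected fragment {protected_length(start)} nt")

# The start site supported by the most clones is taken as the major one.
major_start = max(race_clone_counts, key=race_clone_counts.get)
print(f"major start site: +{major_start}")
```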
All splice junctions have the GT/AG consensus except for intron 11 (data not shown), which begins with GC at the 5′-end and ends in AG. A few genes have been reported to have this unusual GC sequence at the 5′-end of the intron boundary (21-25). The physiological role of this unusual junction sequence has yet to be established.
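Checking introns against the GT/AG rule only requires looking at the first and last two nucleotides of each intron. The following sketch classifies an intron as canonical GT/AG, as the GC/AG variant described for intron 11, or as something else. The two example sequences are invented placeholders, not mArnt intron sequences.

```python
# Sketch: classify an intron by its terminal dinucleotides.
# The example introns are made-up placeholders used only to exercise the function.


def classify_intron(intron: str) -> str:
    """Return a label based on the 5' (donor) and 3' (acceptor) dinucleotides."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to have distinct donor and acceptor sites")
    donor, acceptor = seq[:2], seq[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical GT/AG"
    if (donor, acceptor) == ("GC", "AG"):
        return "noncanonical GC/AG"
    return f"other {donor}/{acceptor}"


examples = {
    "placeholder_intron_a": "GTAAGTCCTTTTCAG",
    "placeholder_intron_b": "GCAAGTCCTTTTCAG",
}

for name, seq in examples.items():
    print(name, "->", classify_intron(seq))
```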
Exon/Intron Arrangement of bHLH/PAS Gene Family-When the sites of introns of the mArnt gene relative to the amino acid sequence are compared with those of the other members of the bHLH/PAS protein family such as mAhR, dSim, mHIF-1α, and mArnt2, the Arnt gene shares few sites of splicing junctions with mAhR (Fig. 3), dSim, and mHIF-1α, but has much in common with mArnt2 (data not shown). Even in the gene structure coding for the most conserved bHLH/PAS region, the splicing junction sites of the mArnt gene barely resemble those of the mAhR gene and the others, except for the mArnt2 gene. From the similarity of the amino acid sequences, it has been proposed that the bHLH/PAS proteins are mainly divided into two groups represented by AhR and Arnt, respectively. Because the genes for AhR, HIF-1α, and dSim share many of the splicing junction sites with one another (26), the different gene organization of the AhR and Arnt genes supports the division of the bHLH/PAS transcription factors into two groups (27).
Transcription Activity of mArnt Promoter-Although the mArnt promoter contains no TATA-box, sequence analysis of the promoter region has revealed several potential cis-acting DNA elements as shown in Fig. 4. To determine how these DNA elements contribute to the mArnt promoter activity, we constructed a fusion gene, pGL3mAPr (−1666), by ligating the promoter sequence (−1666 to +12) of the Arnt gene with the 5′-end of the luciferase gene in the pGL3-Basic vector and transfected it into several cultured cell lines, HeLa, Hepa-1, and 293T cells. Because HeLa cells gave the highest luciferase activity (280-fold over the control level) and are known to express endogenous Arnt mRNA, we chose to use HeLa cells for the transient transfection experiment. Then we introduced various external deletions from the 5′-end of the promoter sequence into the fusion gene and determined the expressed luciferase activity. As shown in Fig. 5, the deletions progressing through the −207 position caused a gradual reduction in the promoter activity, and no enhancement of luciferase expression remained at the −29 position. The most pronounced reductions were noted when the sequences from −67 to −56 and from −57 to −29, which contain GC-box1 and GC-box2, respectively (Fig. 4), were deleted. In the deletion process, the promoter activity was reduced concomitantly with the deletion of several DNA fragments that contain the other putative enhancer elements: E-box (−207 to −139), AP-1 (−139 to −112), and CAAT-box (−112 to −86) sequences.
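The bookkeeping behind a 5′-deletion series can be sketched as a lookup: given the position at which a truncated promoter begins, list which element-containing intervals have been removed. The interval coordinates below are those quoted in the text; treating an interval as removed when its downstream boundary lies at or upstream of the new 5′ end is an assumption made for this illustration, and the function name is hypothetical.

```python
# Sketch: which element-containing intervals are lost in a 5'-deletion construct.
# Coordinates are relative to the +1 start site, as quoted in the text.

# (element, upstream boundary, downstream boundary)
ELEMENT_INTERVALS = [
    ("E-box", -207, -139),
    ("AP-1", -139, -112),
    ("CAAT-box", -112, -86),
    ("GC-box1", -67, -56),
    ("GC-box2", -57, -29),
]


def intervals_removed(new_five_prime_end: int) -> list[str]:
    """Elements whose whole interval lies upstream of the retained promoter.

    Assumption: an interval counts as removed once its downstream boundary is
    at or upstream of the new 5' end of the construct.
    """
    return [name for name, _up, down in ELEMENT_INTERVALS if down <= new_five_prime_end]


for endpoint in (-207, -112, -56, -29):
    print(endpoint, intervals_removed(endpoint))
```

Comparing the activity of two consecutive constructs then attributes the drop in activity to the interval that distinguishes them.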
DNase I Footprinting Analysis of mArnt Promoter-To investigate whether the nuclear extracts of HeLa cells contain specific DNA binding factors for these regulatory sequences, DNase I footprinting analysis was performed. When increasing amounts of the nuclear extract were added, five regions from −31 to −62, −88 to −113, −116 to −140, −189 to −210, and −250 to −270 were distinctly protected from DNase I digestion (Fig. 6, A, lanes 3-6 and B, lanes 3-6). As described in Fig. 4, these regions contain the putative GC-boxes, CAAT-
Gel Mobility Shift Assay for the Region Protected from DNase I Digestion-To investigate in detail the factors that interact with the protected sequences, gel mobility shift assays (GMSAs) were performed using synthesized oligonucleotides corresponding to the sequences defined by the DNase I footprinting analysis as probes. As shown in Fig. 7, all five probes formed complexes with factors in the nuclear extract from HeLa cells and gave specific retarded bands. Although GC-box1 and GC-box2 have sequences similar to the consensus sequence of the GC-box recognized by Sp1 and its related factors, their patterns of bound complexes in GMSA were subtly different from each other. One major and two minor bands were observed with both GC-box1 and GC-box2, whereas two extra faint bands were associated only with GC-box2. When the nuclear extracts were treated with anti-Sp1 antibody, the major retarded band of the two DNA probes was supershifted, whereas the minor bands were apparently unaffected. On the other hand, the anti-Sp3 antibody eliminated the two minor bands, whereas the major band and the two extra faint bands formed by GC-box2 remained unaffected. These results show that GC-box1 and GC-box2 were bound mainly by Sp1 and to a lesser extent by Sp3.
Concerning the proteins binding to the CAAT-box, the retarded band formed by the CAAT-box probe was supershifted by anti-CBF-A antibody but not by anti-CBF-B or anti-C/EBP antibodies, indicating that the CAAT-box was bound by CBF-A, but not by C/EBP. It has been reported that CBF-A forms a heterotrimer with CBF-B and CBF-C for binding to the CAAT-box (28). The failure of the anti-CBF-B antibody to give a supershift could be explained in one of two ways: the heterotrimer formed by CBF-A, CBF-B, and CBF-C could structurally prevent CBF-B from reacting with its antibody, or CBF-A could form a heterocomplex with partner molecules other than CBF-B.
We tried to identify specific trans-acting factors on the putative AP-1 site, E-box, and CRE by using anti-c-Jun, anti-MafK (29), anti-E-47, anti-c-Myc, anti-E-12, anti-human bHLH factor, and anti-CREB-1 antibodies. However, none of these antibodies affected the mobility of the retarded bands. In addition, an oligonucleotide of the AP-1 site in the promoter of the polyoma virus gene (18) failed to compete with the AP-1 probe (Fig. 7C). Similarly, no competition was observed between the putative E-box and the E-box sequence of the acetylcholine receptor ε-subunit gene (17). On the other hand, the retarded bands of the putative CRE of the Arnt gene disappeared on competition with the oligonucleotide of the CRE of the human collagenase gene (16), indicating that the CRE of the Arnt gene is able to work as a functional CRE but was not bound by CREB in HeLa cells.
The regions between −62 and −88 and between −210 and −250, which were also weakly protected in DNase I footprinting, were subjected to GMSAs by using nuclear extracts from HeLa cells but did not give any specific retarded bands.
Enhancement of mArnt Promoter Activity by MyoD-To examine whether MyoD could affect the mArnt promoter activity by binding to the E-box sequence, we cotransfected an expression plasmid for mouse MyoD with either pGL3mAPr (−1666) or its mutant pGL3mAPr (−1666)m into HeLa cells as described. As shown in Fig. 8, mouse MyoD stimulated the mArnt promoter substantially, whereas pGL3mAPr (−1666)m, with a mutation in the E-box, displayed essentially no enhancement of luciferase expression by the MyoD expression plasmid. These results indicate that expression of MyoD can enhance the mArnt promoter activity through the E-box in HeLa cells and suggest that expression of Arnt is activated in tissues such as muscle where MyoD is expressed.

DISCUSSION
Using 5′-RACE and the RNase protection assay, we mapped transcription start sites of the mArnt gene to three major sites (Fig. 2). Consistent with the multiple transcription start sites, we found no TATA sequence in their 5′-upstream proximity, as is also the case with the AhR gene. Multiple transcription start sites and GC-rich sequences in the promoter region are often reported for "housekeeping" genes that have a TATA-less promoter (30,31). Sequence analysis has shown that the Arnt gene is about 60 kb long and is split into 22 exons. The structure of the Arnt gene is very different from and more complex than that of the AhR gene. Even in the gene structure encoding the most conserved amino acid sequence of the bHLH/PAS domain, the exon/intron arrangement differs between the two genes. Such changes in the exon/intron arrangement have already been reported for several gene families and are suggested to be generated by insertion or deletion of introns during the evolutionary process (32). Thus far, several gene structures of the bHLH/PAS superfamily such as AhR, HIF-1α, HIF-1α-like factor, Drosophila Sim, and Trachealess have been reported, and their exon/intron arrangements are well conserved among them (4,12,26). From sequence similarity, the bHLH/PAS transcription factors are mainly divided into two groups represented by AhR and Arnt, respectively (27). The members described above resemble AhR rather than Arnt in amino acid sequence and are classified into the AhR group. The difference between the gene structure of Arnt and those of the bHLH/PAS factors of the AhR group indicates that the gene structures have diverged since the division of the two groups. Recently the gene structure of Arnt2, which belongs to the Arnt group, has been found to be highly similar to that of Arnt.
In the upstream region of this TATA-less gene, several interesting potential regulatory sequences such as GC-box, AP-1 site, CAAT-box, E-box, and CRE were found as shown in Fig. 4. Transient DNA transfection experiments using a fusion gene consisting of the upstream sequence of the Arnt gene and the luciferase structural gene demonstrated that the upstream sequence has transcriptional enhancer activity in HeLa cells. Various deletions in the upstream sequence located the regulatory DNA sequences whose deletion resulted in a marked reduction in the expressed luciferase activity. These DNA fragments contain the putative regulatory DNA elements described above, indicating that they contribute substantially to the expression of luciferase, except for the CRE, whose deletion was apparently not very influential in this expression system. Deletion of either of the two GC-boxes had the most profound effect on the expression driven by the Arnt promoter, suggesting that the two GC-boxes cooperatively enhance the expression of the gene. DNase I footprinting analysis demonstrated that the nuclear extracts specifically protected those presumed regulatory DNA elements from digestion by DNase I, indicating the presence of factors binding to those DNA elements. Although it showed little promoter activity in the transient DNA transfection assay, the CRE sequence was also protected. The putative CRE sequence is a complete 8-base palindromic sequence and could function as a cis-acting regulatory DNA element in other types of cells. Gel mobility shift assays demonstrated the presence of specific factors binding to those regulatory DNA elements in HeLa cells. Experiments using anti-Sp1, anti-Sp3, and anti-CBF-A antibodies showed that the two GC-boxes were bound mainly by Sp1 and to a lesser extent by Sp3 and that the CAAT-box was bound by CBF-A.
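A DNA site is a complete palindrome in the sense used above when its sequence is identical to its own reverse complement. The sketch below tests that property. Because the mArnt CRE sequence itself is given only in Fig. 4 and is not reproduced in this text, the widely used CRE consensus TGACGTCA serves as a stand-in; that substitution is an assumption for illustration.

```python
# Sketch: test whether a DNA site is palindromic (equal to its reverse complement).
# TGACGTCA is the canonical CRE consensus, used here as a stand-in example;
# the second sequence is an arbitrary non-palindromic 8-mer for contrast.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T-only sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True when the site reads the same on both strands in the 5'-to-3' direction."""
    return site.upper() == reverse_complement(site)


for site in ("TGACGTCA", "TGACGCAA"):
    print(site, is_palindromic(site))
```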
It remains to be determined which regulatory factors bind to the putative AP-1 site, CRE, and E-box sequences, because GMSAs using anti-c-Jun, MafK, CREB-1, E12, E47, c-Myc, and human bHLH factor antibodies did not give any supershifted bands. Furthermore, typical oligonucleotides of the AP-1 site and E-box sequences did not compete with probes of the corresponding regulatory sequences of Arnt in GMSA. It is likely that the sequences flanking the core recognition sequences of Arnt are critical for binding to a nuclear factor of HeLa cells. Further studies are necessary to determine which factors bind these DNA elements, including the CRE, to elucidate the mechanism of Arnt gene expression.
The members of MyoD family are cell-type-specific transacting factors and play an important role in the expression of muscle-specific genes through binding to the E-box in their promoter regions (33,34). It has been reported that the expression of mArnt mRNA is the highest in muscle among the tissues examined, although mArnt mRNA was expressed ubiquitously (35). Overexpression of MyoD by transfecting the MyoD expression plasmid into HeLa cells enhanced the expression of the luciferase activity driven by the mArnt promoter, suggesting that the Arnt expression is stimulated in the tissues such as muscle, which express MyoD.
Ubiquitous expression of Arnt in various tissues of embryos and adult animals can be generated by cooperative functions of multiple transcription factors bound on the respective regulatory DNA elements. These factors could be changed in different tissues. It would be interesting to identify these regulatory factors in various cultured cells and tissues and clarify how they interact with one another to enhance the expression of the gene.
FIG. 8. Activation of the mArnt promoter activity by transfection with mouse MyoD expression plasmid. A, schematic representations of the expression plasmid for mouse MyoD and the reporter plasmids. B, relative luciferase activities of the mArnt promoters. Values are the means of three independent experiments normalized to β-galactosidase activity used as an internal control. Total DNA used for transfection was adjusted to 10 μg by adding pGL3-basic vector plasmid DNA.