DNA binding specificities and pairing rules of the Ah receptor, ARNT, and SIM proteins.

The Ah receptor (AHR), the Ah receptor nuclear translocator protein (ARNT), and single-minded protein (SIM) are members of the basic helix-loop-helix-PAS (bHLH-PAS) family of regulatory proteins. In this study, we examine the DNA half-site recognition and pairing rules for these proteins using oligonucleotide selection-amplification and coprecipitation protocols. Oligonucleotide selection-amplification revealed that a variety of bHLH-PAS protein combinations could interact, with each generating a unique DNA binding specificity. To validate the selection-amplification protocol, we demonstrated the preference of the AHR•ARNT complex for the sequence commonly found in dioxin-responsive enhancers in vivo (TNGCGTG). We then demonstrated that the ARNT protein is capable of forming a homodimer with a binding preference for the palindromic E-box sequence, CACGTG. Further examination indicated that ARNT may have a relaxed partner specificity, since it was also capable of forming a heterodimer with SIM and recognizing the sequence GT(G/A)CGTG. Coprecipitation experiments using various PAS proteins and ARNT were consistent with the idea that the ARNT protein has a broad range of interactions among the bHLH-PAS proteins, while the other members appear more restricted in their interactions. Comparison of this in vitro data with sites known to be bound in vivo suggests that the high affinity half-site recognition sequences for the AHR, SIM, and ARNT are T(C/T)GC, GT(G/A)C (5′-half-sites), and GTG (3′-half-sites), respectively.

The AHR is a bHLH protein that mediates the metabolic, carcinogenic, and teratogenic effects of compounds such as TCDD (1). In response to agonists, the AHR interacts with a related protein known as ARNT to form a dimeric complex that is capable of binding genomic enhancer elements, known as DREs, and activating transcription at adjacent promoters (2-5). The AHR and ARNT have sequence similarities to two regulatory proteins found in Drosophila, SIM and PER (6-10). SIM is a developmentally regulated bHLH protein involved in controlling central nervous system midline gene expression (11). PER lacks a bHLH domain and thus may be an inhibitor of a related signaling pathway involved in the maintenance of circadian rhythms (12). The hallmark of this family of proteins is that they all possess homology in a sequence of 200-300 amino acids termed a PAS domain (13). In the AHR, the PAS domain has been shown to be involved in ligand binding and interaction with Hsp90 and may serve as a secondary surface to support ARNT dimerization (2, 14-16).
Basic/helix-loop-helix proteins are involved in a variety of tightly regulated biological processes, such as the regulation of myogenesis (MyoD/E47) (17), neurogenesis (Achaete-scute/Daughterless) (18), regulation of immunoglobulin genes (TFEC/TFE3) (19), cellular proliferation (Myc/Max) (20, 21), and xenobiotic metabolism (AHR/ARNT) (10). Biochemical and crystallographic data suggest that the HLH domains often act in concert with secondary dimerization surfaces (e.g. "leucine zippers" and possibly PAS domains) to position the two α-helical basic regions within opposing major grooves of B-DNA, generating a "scissor grip" structure with high affinity for the core DNA sequence, CANNTG (22-24). This DNA enhancer sequence is commonly referred to as an E-box and contains either CG or GC dinucleotides at the degenerate positions (i.e. CACGTG or CAGCTG) (25-28). Current models suggest that E-boxes can be viewed as containing two half-sites, with each partner's basic region determining half-site specificity (e.g. the 5′-CAN or the NTG-3′ half-sites within 5′-CANNTG-3′). The multiplicity of half-sites and potential dimerization partners may allow production of a large number of homo- or heterodimeric pairs, each with unique sequence binding specificities and consequences for cellular signaling. In contrast to the recognition sites for most bHLH dimers, the cognate response element of the AHR•ARNT complex, the DRE, usually contains TNGCGTG (5, 29-32). Unlike the E-box, the DRE is not palindromic, and thus the DNA half-site specificities of each protein are not readily apparent and are probably different.
In this study, we employed a DNA selection and amplification protocol to identify those bHLH-PAS protein combinations that could form productive DNA binding species and to characterize their individual DNA recognition sites. To validate the approach, we first demonstrated that the AHR•ARNT heterodimer would select the known DRE sequence from a pool of [...] GTCGA.
Protein Expression-In vitro expression of the AHR, AHRCΔ516, AHRGNΔ315, ARNT, and SIM proteins was carried out in rabbit reticulocyte lysates (Promega) as previously reported (2). For verification of protein expression, the translation was performed in the presence of [35S]methionine, and the product was analyzed by SDS-polyacrylamide gel electrophoresis. The expressed proteins were quantitated by excising the radiolabeled proteins from the gel and scintillation counting. Baculovirus expression and purification of histidine-tagged AHR and ARNT were carried out as reported previously (36).
Gel Shift Analysis-The DNA probes were radiolabeled either by end labeling with [γ-32P]ATP and T4 polynucleotide kinase (37) or by PCR of the appropriate template in the presence of [α-32P]dCTP, using OL186 and either OL185 or OL225 as primers (27). Unincorporated nucleotides were removed using a 1-ml G-25 Sephadex spin column. The protein combinations were incubated for 30 min at 30°C to facilitate protein dimerization. The clone AHRCΔ516, a constitutively active form of the AHR that interacts with ARNT and binds DNA in a ligand-independent manner, was used to circumvent the use of agonist in some experiments (2). When full-length AHR was used, the incubation period was extended to 2 h in the presence of 10 μM of the AHR agonist β-naphthoflavone. To minimize nonspecific interactions, 200 ng of poly(dI-dC) was added to the protein mixture along with KCl (final concentration, 100 mM). After 10 min of incubation at room temperature, the DNA probe was added (100,000 cpm), and the sample was allowed to incubate for an additional 10 min. Samples were then subjected to 4% acrylamide nondenaturing gel electrophoresis using 0.5× TBE (45 mM Tris base, 45 mM boric acid, 1 mM EDTA, pH 8.0) as the running buffer (38).
DNA Selection and Amplification-The DNA binding site selection and amplification was performed essentially as described (27). For example, 10 ng of OL187 containing 13 sequential 4-fold degenerate nucleotides (4^13 ≈ 7 × 10^7 possible sequences) was annealed to a 5-fold molar excess of primer OL186. The complementary strand was synthesized by incubation with the Klenow fragment of DNA polymerase (5 units) at 37°C for 1 h. The resultant double-stranded DNA was purified by agarose gel electrophoresis (NuSieve, FMC Bioproducts, Rockland, ME), electroelution, and precipitation. For the first round of selection, 10 ng of the double-stranded oligonucleotide pool and either 1 fmol of in vitro expressed protein or 20 fmol of baculovirus-expressed protein were subjected to gel shift analysis. The electrophoresis was terminated when the bromphenol blue dye marker had migrated 1.5 cm. In this manner, the protein-complexed oligomer could be efficiently recovered and the majority of unbound oligonucleotide eliminated. The protein-bound oligonucleotide was then isolated from the upper 1 cm of the gel and was eluted for 3 h at 37°C in buffer containing 10 mM Tris-HCl (pH 8.0), 1 mM EDTA, 50 mM NaCl, and 0.2% SDS. The eluant was extracted with phenol:chloroform:isoamyl alcohol (25:24:1), 10 μg of glycogen was added, and the DNA was precipitated. One-fifth of the recovered oligonucleotide pool was amplified by PCR. PCR conditions were 95°C (1 min), 55°C (1 min), 72°C (30 s) for 25 cycles. Reactions contained 10 mM Tris-HCl, 50 mM KCl, 1.5 mM MgCl2, 0.001% (w/v) gelatin, 200 μM of each deoxyribonucleotide triphosphate, 2 units of Taq polymerase, and primers OL185 and OL186 (when OL187 was the template) or OL186 and OL225 (when OL224 was the template) in a total volume of 100 μl. After ethidium bromide visualization, approximately 5 ng of amplified template was radiolabeled by PCR and subjected to gel shift analysis for the subsequent round of selection.
In these later rounds of selection, the bromphenol blue dye was allowed to migrate approximately 8 cm from the top of the gel to achieve higher resolution. The gels were then dried, the specific complexes were visualized following autoradiography, and the appropriate areas were excised. In most analyses, a double-stranded oligonucleotide corresponding to a commonly used synthetic DRE (annealed OL73/74) served as a migration marker. In initial rounds of selection and amplification, specific complex formation was determined by its migration similar to the OL73/74 complex and its dependence on the expressed protein (e.g. absent in lanes containing only one member of the heteromeric pair or unprogrammed reticulocyte lysate when analyzing for homomeric interactions). The presence of the AHR or ARNT in a complex was verified by the ability to "supershift" the complex upon polyacrylamide gel electrophoresis using either anti-AHR or anti-ARNT immunoglobulins (1 ng). Once a discrete protein-oligomer complex could be detected (typically after three or four rounds of selection and amplification), the amplified oligonucleotide was either digested with BamHI and XhoI and subcloned into pBluescript SK (Stratagene) or extracted with phenol:chloroform:isoamyl alcohol (25:24:1) and directly subcloned into pGEM-T (Promega). Individual clones were sequenced using the dideoxy chain termination method (39).
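The theoretical complexity of a degenerate oligonucleotide pool follows directly from the number of random positions. The short sketch below is illustrative only; the function name `pool_complexity` is hypothetical, and the counts of 13 and 14 random positions are taken from the descriptions of OL187 (13 sequential degenerate nucleotides) and OL224 (seven random nucleotides on each side of a fixed core) given in this paper.

```python
def pool_complexity(n_random_positions: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences encoded by fully degenerate positions."""
    return alphabet_size ** n_random_positions


# OL187: 13 consecutive 4-fold degenerate positions.
ol187 = pool_complexity(13)
# OL224: 7 random positions on each side of the fixed core (14 in total).
ol224 = pool_complexity(14)

print(f"OL187: {ol187:,} sequences ({ol187:.1e})")
print(f"OL224: {ol224:,} sequences ({ol224:.1e})")
```

Under these assumptions the two pools encode roughly 7 × 10^7 and 3 × 10^8 sequences, respectively.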
Dissociation Rate Analysis-The dissociation rates of each DNA binding complex (i.e. full-length AHR•ARNT, ARNT•ARNT, and SIM•ARNT) were determined by gel shift analysis using the indicated DNA sequences as probes. For each off-rate analysis, a master binding reaction equivalent to at least six reactions as described above was used with 1 ng of end-labeled probe. Following binding, a 100- to 200-fold molar excess of unlabeled, double-stranded oligonucleotide that was identical to the probe DNA was added. Aliquots (25 μl) were removed and analyzed at the indicated times. To determine the end point value, a 100-fold excess of unlabeled competitor was added prior to the introduction of labeled probe. Complex formation at each time point was determined using a Fuji PhosphorImager. The amount of protein-DNA complex from the end point value was subtracted from each intermediate time point value. To evaluate possible degradation of the protein-DNA complexes, a mixture containing the protein(s) and end-labeled probe was incubated for 20 min in the absence of competitor oligonucleotide (data not shown). Quantitation of this control indicated a degradation of less than 5% of the complex over the time period analyzed. Half-life (t1/2) was calculated from the slope of the linear regression curve, where t1/2 = 0.693/k and k = −(2.303)(slope).
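The half-life calculation can be sketched as follows. This is a minimal illustration, not the analysis code used in this study: it assumes that the regression is fitted to log10 of the background-subtracted complex signal versus time (which is what the factor 2.303 implies), and the function name and the synthetic data are hypothetical.

```python
import math


def half_life(times_min, complex_signal):
    """Fit log10(signal) against time by least squares.

    Returns (k, t_half) using k = -(2.303)(slope) and t_half = 0.693/k.
    The signal values are assumed to be background-subtracted and positive.
    """
    ys = [math.log10(s) for s in complex_signal]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys))
    slope = sxy / sxx
    k = -2.303 * slope
    return k, 0.693 / k


# Synthetic example: a complex decaying with a first-order rate of 0.2 per min.
times = [0, 1, 2, 4, 8]
signal = [math.exp(-0.2 * t) for t in times]
k, t_half = half_life(times, signal)
print(f"k = {k:.3f} per min, t1/2 = {t_half:.2f} min")
```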
Coprecipitation-Sf9 soluble extract containing approximately 120 μg of baculovirus-expressed ARNT and 35S-labeled reticulocyte lysate-expressed protein (ARNT, full-length AHR, AHRCΔ516, AHRGNΔ315, or SIM) was combined with the nickel-nitrilotriacetic acid resin in wash buffer (50 mM Tris, pH 7.4, 100 mM KCl, 10% glycerol, 10 mM β-mercaptoethanol, 0.4% Tween 20, and 5 mM imidazole) and mixed gently for 2 h at 4°C. In parallel reactions, uninfected Sf9 soluble extract containing similar amounts of total protein was substituted for ARNT soluble extract as a negative control. Samples containing oligonucleotides were incubated at room temperature for 10 min in the presence of poly(dI-dC) (10 μg) followed by the addition of the indicated oligonucleotides prior to the 2-h incubation. The resin was pelleted by centrifugation at 16,000 × g for 10 s, and the samples were washed five times using 1 ml of wash buffer. The pellets were resuspended and analyzed by SDS-polyacrylamide gel electrophoresis and autoradiography.
Statistical Analysis-The χ-square goodness of fit test was used to determine whether the frequencies of nucleotides at each position of the oligonucleotide differed from the expected random frequencies (40). Where DNA selection and amplification-derived AHR•ARNT sequences were compared with those present in bona fide DREs, two-by-two contingency tables were used to compare frequencies of nucleotides. Significance for all tests was set at p < 0.01.
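A goodness of fit test of this kind can be sketched as below. The sketch assumes that each of the four nucleotides is expected at a frequency of 0.25 at every random position and that the test has three degrees of freedom; the text does not state exactly how the test was applied, so these are assumptions. The function names are hypothetical, and the p value uses the closed-form survival function of the χ-square distribution with three degrees of freedom.

```python
import math


def chi_square_statistic(counts, expected_freq=0.25):
    """Chi-square statistic for nucleotide counts at one aligned position."""
    observed = [counts.get(base, 0) for base in "ACGT"]
    expected = sum(observed) * expected_freq
    return sum((o - expected) ** 2 / expected for o in observed)


def chi_square_p_value_df3(stat):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(stat / 2))
            + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))


# Hypothetical position at which all 24 selected clones carry a G.
stat = chi_square_statistic({"G": 24})
p = chi_square_p_value_df3(stat)
print(f"chi-square = {stat:.1f}, significant at p < 0.01: {p < 0.01}")
```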

Validation of the DNA Selection and Amplification Strategy-To validate the DNA selection and amplification technique, we first examined the nucleotide specificity of the AHR•ARNT heterodimer. We amplified a pool of oligonucleotides, derived from OL187, to generate double-stranded oligomers that contained 13 consecutive random nucleotides, theoretically encoding approximately 7 × 10^7 unique sequences. The oligonucleotides that specifically bound to the AHR•ARNT complex were subjected to three rounds of selection and amplification. Twenty-four selected oligonucleotides were cloned and sequenced (Fig. 1A). Statistical analysis by χ-square was performed to identify those nucleotides preferentially selected for by the AHR•ARNT complex (Fig. 1B). Nucleotides that occurred at greater than expected frequencies (p < 0.01) were used to derive a consensus recognition sequence, TNGCGTGC (Fig. 1C). Of these oligonucleotides, 22 contained the GCGTG core sequence that is commonly found in bona fide DREs. Two oligonucleotides, AHA23 and AHA24, contained similar core motifs, TCGTG and GTGTG, respectively. Subsequent gel shift analysis indicated that these two sequences were capable of binding AHR•ARNT complexes, albeit at lower affinities than those sites containing the complete core motif, GCGTG (results not shown). Analysis of the OL187 oligonucleotide pool that was cloned, amplified, and sequenced directly, without selection by the AHR•ARNT heterodimer, served as a control. χ-Square analysis of these sequences indicated that the AHR•ARNT selected sequence was not the result of biased oligonucleotide synthesis (Fig. 1D).
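The tabulation step behind a consensus of this kind can be sketched as follows. The sequences in the example are invented for illustration and are not clones from this study. For brevity the sketch calls a position by a simple frequency threshold, which is a simplification and not the χ-square criterion used here; the function names and the threshold value are hypothetical.

```python
from collections import Counter


def frequency_table(aligned_sequences):
    """Per-position nucleotide frequencies, multiplied by 100 and rounded."""
    n = len(aligned_sequences)
    table = []
    for i in range(len(aligned_sequences[0])):
        counts = Counter(seq[i] for seq in aligned_sequences)
        table.append({b: round(100 * counts.get(b, 0) / n) for b in "ACGT"})
    return table


def threshold_consensus(table, threshold=50):
    """Report the most frequent base at each position, or N below the threshold."""
    consensus = []
    for column in table:
        base, freq = max(column.items(), key=lambda item: item[1])
        consensus.append(base if freq >= threshold else "N")
    return "".join(consensus)


# Invented, pre-aligned example sequences (not data from this paper).
toy_alignment = ["TAGCGTG", "TCGCGTG", "TGGCGTG", "TTGCGTG"]
print(threshold_consensus(frequency_table(toy_alignment)))
```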
Analysis of Sequences Flanking the GCGTG Motif-As shown in Fig. 1, the position of the GCGTG was biased toward the 3′-end of the oligonucleotide. To determine if this bias was the result of flanking nucleotides, we constructed an additional nucleotide pool (OL224) that fixed the core motif, GCGTG, between seven random nucleotides on the 3′- and 5′-ends. This oligonucleotide pool, containing approximately 3 × 10^8 possible sequences, was subjected to three rounds of the selection and amplification protocol, and the selected oligonucleotides were sequenced (Fig. 2A). χ-Square analysis was performed (Fig. 2B) and indicated that nucleotide preference occurred at 11 of the 14 flanking positions, resulting in a consensus sequence of GGGNATYGCGTGACANNCC (the GCGTG core is fixed, Fig. 2C). Again, analysis of the control oligonucleotide pool indicated that nucleotide preference was not the result of biased oligonucleotide synthesis (Fig. 2D). To confirm that our consensus sequence was highly specific for the AHR•ARNT complex, we synthesized the corresponding oligonucleotide (OL318/319) and performed gel shift analysis. As demonstrated in Fig. 3, complex formation required both proteins: neither ARNT nor AHRCΔ516 recognized this motif alone (Fig. 3, lanes 1-3). Recognition of the consensus sequence by full-length AHR and ARNT was ligand responsive (Fig. 3, lanes 4 and 5), and the complex was recognized by anti-ARNT and anti-AHR immunoglobulins (Fig. 3, lanes 6 and 7). Addition of purified immunoglobulin did not affect the migration of the AHR•ARNT complex (Fig. 3, lane 8).
Comparison of the AHR•ARNT Selected Sequence With Bona Fide Enhancer Elements-To support the idea that our strategy would select for biologically relevant DNA binding motifs, we compared the consensus sequence selected by the AHR•ARNT complex in vitro to sequences known to correspond to functional enhancers in vivo. For this comparison, we first analyzed 10 bona fide DREs to determine the frequency of nucleotides at each position (Fig. 4). These frequencies were then compared to the corresponding frequencies observed in the selected and amplified oligonucleotides (Fig. 4D). The in vitro derived consensus was similar to the bona fide DREs at 14 out of 19 of the nucleotide positions. Statistically significant differences were detected at the outermost positions (−8, −9, 9, 10) and at the −5 position.
FIG. 1. AHR•ARNT recognition sites selected from random sequences after three rounds of selection. A, the double-stranded oligonucleotide pool generated from OL187 was incubated with 1 fmol of reticulocyte lysate-expressed AHRCΔ516 and ARNT. The mixture was subjected to the selection and amplification protocol, and the individual clones were sequenced. The most highly conserved sequence, GCGTG, is boxed. B, tabulation of nucleotide frequencies at each position (n = 24). All frequencies were multiplied by 100. The frequencies of individual nucleotides were analyzed by χ-square at the p < 0.01 level. Frequencies above the expected random level are underlined. C, an AHR•ARNT consensus sequence derived from statistically significant nucleotides. D, a sample from the double-stranded oligonucleotide pool (OL187) was cloned and sequenced to verify equal representation of each nucleotide, and the frequencies were calculated and analyzed by χ-square (n = 19).
Selection and Amplification of ARNT-Homodimer Recognition Sequences-Purified ARNT obtained from baculovirus-infected Sf9 cells (36) with the addition of unprogrammed reticulocyte lysate was subjected to the same DNA selection and amplification protocol described above, using double-stranded oligonucleotides generated from OL187. After four rounds of selection and amplification, 20 ARNT-specific sequences were aligned and analyzed by χ-square to yield a consensus sequence, CACGTG (Fig. 5). Unlike the oligonucleotides selected from OL187 by the AHR•ARNT complex, no bias was observed due to flanking nucleotides, and no statistically significant specificities were observed for nucleotides that flanked this core (Fig. 5B). Four sequences (AA17, AA18, AA19, AA20) that contained the AACGTG motif were also amplified. Gel shift analysis demonstrated that these sequences were recognized by the ARNT complex but at a lower affinity than sequences containing the CACGTG sequence (data not shown). To confirm that the derived consensus sequence, CACGTG, was specific for ARNT homodimers, we synthesized the corresponding consensus oligonucleotide and demonstrated that a specific ARNT•DNA complex was formed in gel shift analysis (Fig. 6A, lane 2). The presence of ARNT in the complex was confirmed by supershifting the complex in the presence of anti-ARNT immunoglobulin (Fig. 6A, lane 4) but not by purified immunoglobulin (Fig. 6A, lane 5). In agreement with our previous results (36), purified bHLH-PAS proteins require heat denaturable factor(s) found in reticulocyte lysate for function (Fig. 6A, lanes 1-3). The addition of bovine serum albumin also stabilizes the ARNT dimer formation to a lesser degree, demonstrating that the only bHLH-PAS protein in the complex is ARNT (Fig. 6A, lane 3). Finally, we confirmed that the ARNT•DNA binding complex could be formed at the lower concentrations of ARNT that are typically generated in the reticulocyte lysate expression system and that may also be found in cells (i.e. ~1 fmol/5 μl) (Fig. 6B, lane 1).
ARNT and SIM Interact Resulting in Unique DNA Binding Specificity-The selection and amplification protocol was performed to determine if ARNT could interact with SIM and recognize a specific DNA sequence. Using the oligonucleotide pool derived from OL187, we were unable to select and amplify a discrete SIM•ARNT•DNA complex that was dependent on the presence of both proteins. We repeated the procedure using OL224 as the oligonucleotide source (see "Discussion"). Following four rounds of selection and amplification, a pool of specific SIM•ARNT-selected DNA was cloned and sequenced. Given the apparently weak interaction of the complex and the comigration of nonspecific protein-oligonucleotide species, 80 of the selected sequences were radiolabeled, and each was individually reanalyzed by gel shift analysis to confirm its interaction with both SIM and ARNT. Of the 80 amplified oligonucleotides, 19 were specific for the SIM•ARNT complex as judged by the formation of specific gel shift bands that were detected only in the presence of both proteins and that were recognized by the ARNT-specific antibodies (Fig. 7A). Nucleotides that were associated with the SIM•ARNT•DNA complex formation were identified by χ-square analysis and were used to derive a consensus sequence, GNNNNGTGCGTGANNNTCC (Fig. 7, B and C). Gel shift analysis using an oligonucleotide corresponding to the derived consensus sequence (OL331/332) confirmed that it was specific for the SIM•ARNT complex (Fig. 8A). Again, complex formation required both proteins, since neither ARNT nor SIM could recognize the sequence alone (Fig. 8A, lanes 1-3), and the complex was recognized by ARNT-specific antibodies but not purified IgG (Fig. 8A, lanes 4 and 5).
FIG. 2. AHR•ARNT selection analysis of sequences flanking the GCGTG core. A, the double-stranded OL224 oligonucleotide pool containing the fixed sequence, GCGTG, flanked by seven random nucleotides on each side was incubated with 1 fmol of reticulocyte lysate-expressed AHRCΔ516 and ARNT. The mixture was subjected to three rounds of DNA selection and amplification, and the individual clones were sequenced. B, tabulation of the nucleotide frequency at each position (n = 25). All frequencies were multiplied by 100. The frequencies of individual nucleotides were analyzed by χ-square at the p < 0.01 level. Frequencies above the expected level are underlined. C, nucleotides with above expected frequencies were used to derive an AHR•ARNT consensus sequence. D, the double-stranded oligonucleotide pool (OL224) was cloned and sequenced to verify equal representation of each nucleotide, and the frequencies were calculated and analyzed by χ-square (n = 23).
While this work was in review, a report described a consensus sequence found upstream of SIM-regulated genes in Drosophila, GTACGTG (41). This core sequence differed by a single nucleotide from the sequence deduced by our in vitro approach (i.e. GTACGTG versus GTGCGTG). Since our selected SIM•ARNT sequence was biased for a G at this position due to the use of oligonucleotides with a fixed GCGTG core, we chose to examine the impact of this single nucleotide difference on binding by the SIM•ARNT complex. To control for effects of adjacent sequences, we engineered oligonucleotides that contained these two core sequences into flanking sequences derived from either the SIM•ARNT consensus that was deduced in Fig. 7C (i.e. GGGATGT(A/G)CGTGACATTC; OL464/465 and OL331/332, respectively) or the SIM-dependent enhancer found upstream of the Drosophila Tl gene (i.e. AATTTGT(A/G)CGTGCCACAGA; OL501/502 and OL503/504, respectively). Gel shift analysis indicated that all four sequences were bound by the SIM•ARNT complex with similar affinity. Thus, either an A or a G is well tolerated at this position, with no difference in binding observed when the core sequence is within the context of the flanking sequences derived from the Tl enhancer (Fig. 8, A and B).
Half-site Recognition of ARNT, AHR, and SIM-The experiments described above suggest that ARNT is capable of forming a homodimer that recognizes the previously described E-box sequence, CACGTG, a heterodimer with the AHR that recognizes TNGCGTG, and a heterodimer with SIM that recognizes GT(G/A)CGTG. Since all ARNT-containing complexes bind sequences with a GTG 3′-half-site and the ARNT alone complex binds a palindrome of this site (CACGTG), we conclude that this half-site corresponds to an ARNT binding half-site. The observation that unique heteromeric partners each yield different 5′-half-sites is consistent with T(C/T)GC being the 5′-half-site of the AHR and GT(G/A)C being the 5′-half-site of SIM.
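The three core recognition sequences summarized above can be expressed as simple patterns and used to scan an oligonucleotide. The sketch below is illustrative: the dictionary keys and the function name are hypothetical, only the strand supplied is scanned, and a pattern match says nothing about binding affinity.

```python
import re

# Core recognition sequences as reported in the text; N is any nucleotide.
CORE_SITES = {
    "AHR-ARNT": re.compile(r"T[ACGT]GCGTG"),   # TNGCGTG
    "ARNT-ARNT": re.compile(r"CACGTG"),        # palindromic E-box
    "SIM-ARNT": re.compile(r"GT[GA]CGTG"),     # GT(G/A)CGTG
}


def find_core_sites(sequence):
    """Return the 0-based start positions of each core site on the given strand."""
    sequence = sequence.upper()
    return {name: [m.start() for m in pattern.finditer(sequence)]
            for name, pattern in CORE_SITES.items()}


# The SIM-ARNT consensus oligonucleotide sequence given for OL331/332.
print(find_core_sites("GGGATGTGCGTGACATTC"))
```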
Examination of Other Possible PAS-Protein DNA Complexes-In an effort to determine if additional PAS proteins could interact and generate DNA binding specificity, we attempted our selection and amplification protocol with either OL187 or OL224 and SIM, AHR, or a combination of the AHR and SIM. After several rounds of selection, neither the AHR, SIM, nor a combination of the two proteins formed specific DNA binding complexes. To increase the sensitivity of these attempts, experiments were also performed using baculovirus-expressed AHR. All combinations were repeated three times without detection of a specific DNA binding complex. In addition, we synthesized oligonucleotides containing a palindrome of the predicted recognition half-sites of the AHR and SIM (core sequences of T(C/T)GCGC(A/G)A and GTGCGCAC, respectively). Gel shift analysis of either the AHR or SIM with these radiolabeled oligonucleotides failed to yield specific DNA binding complex formation (data not shown).
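The palindromic cores tested here follow from joining each predicted 5′-half-site to its reverse complement. A minimal sketch of that construction is given below; the function names are hypothetical, and the half-sites are those proposed in this paper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of an uppercase DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def palindrome_from_half_site(half_site):
    """Join a 5'-half-site to its reverse complement to form a palindrome."""
    return half_site + reverse_complement(half_site)


# Predicted AHR half-site T(C/T)GC, written out for both alternatives.
for half in ("TCGC", "TTGC"):
    print(half, "->", palindrome_from_half_site(half))
# Predicted SIM half-site, using the G alternative of GT(G/A)C.
print("GTGC", "->", palindrome_from_half_site("GTGC"))
```

Applying the same construction to CAC, the complement of the ARNT GTG half-site, gives the palindromic E-box CACGTG.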

DNA Binding Specificity of bHLH-PAS Dimers for Their Selected Consensus Sequences and Various E-boxes-As an additional demonstration of DNA binding specificity, we used competitive binding analysis to compare the affinities of bHLH-PAS dimers for oligonucleotides corresponding to their consensus DNA sequences and a variety of E-boxes. Competitive binding analysis with each productive bHLH-PAS pair (i.e. AHR•ARNT, ARNT•ARNT, or SIM•ARNT) demonstrated that each DNA binding complex had the greatest affinity for its derived consensus sequence over all of the E-box sequences tested (see Fig. 9, A-C). Presence of the ARNT homodimer consensus sequence (OL329/330) diminished the complex formation in all reactions that contained the ARNT protein (Fig. 9, A and C, lane 3). The ARNT homomeric species demonstrated the greatest affinity for the E-box CACGTG (Fig. 9B, lanes 3 and 5), with much lower affinity for the TNGCGTG sequence (Fig. 9B, lane 2) or the other E-boxes, CAGCTG or CATGTG (Fig. 9B, lanes 6 and 7).
FIG. 7. Determination of SIM•ARNT DNA recognition sites. A, double-stranded OL224 containing the fixed sequence, GCGTG, and flanked by seven random nucleotides was incubated with 1 fmol each of reticulocyte lysate-expressed SIM and ARNT, the mixture was subjected to four rounds of DNA selection and amplification, and the individual clones were sequenced. The most highly conserved sequence, GTGCGTGA, is boxed. B, tabulation of the nucleotide frequencies at each position (n = 19). All frequencies were multiplied by 100. The frequencies of individual nucleotides were analyzed by χ-square at the p < 0.01 level. Frequencies above the expected level are underlined. C, a SIM•ARNT consensus sequence derived from statistically significant nucleotides.
Relative DNA Binding Affinities of AHR-, ARNT-, and SIM-containing Complexes-To obtain estimates of the relative DNA binding affinities of the full-length AHR•ARNT, ARNT•ARNT, and SIM•ARNT complexes, we performed dissociation rate analysis using the gel shift assay as an end point. As shown in Fig. 10, the calculated half-life values of the full-length AHR•ARNT and ARNT•ARNT complexes were similar (3.2 versus 5.06 min), whereas that of the SIM•ARNT complex was considerably shorter (less than 0.2 min).
Demonstration of PAS Protein Interactions by Coprecipitation-To further establish the interaction of bHLH-PAS proteins, we utilized a coprecipitation assay (Fig. 11). Protein-protein interactions of ARNT-AHR, ARNT-AHRCΔ516, and ARNT-ARNT, but not ARNT-SIM, were observed. Specificity of the ARNT-containing interactions was demonstrated by the lack of coprecipitation using the 35S-labeled GNΔ315 AHR construct in which most of the dimerization domain has been replaced by the DNA binding and dimerization domain of Gal4 (2). Interestingly, ARNT-ARNT interactions were observed only when the incubations contained the CACGTG-containing oligonucleotides.

DISCUSSION
Strategy-Our hypothesis was that bHLH-PAS proteins could form a variety of heteromeric and homomeric combinations and that each complex would display unique oligonucleotide binding specificities. We predicted that the analysis of these different recognition sites would allow us to deduce the half-site specificity of each protein. To test these ideas, we utilized a DNA selection and amplification strategy to identify the preferred recognition sequences of various AHR, ARNT, and SIM combinations (27). The oligonucleotides bound by these protein complexes were isolated from pools of millions of independent, unbound sequences. Once selected by the protein complex, the oligonucleotides were isolated from nondenaturing polyacrylamide gels and amplified by PCR. To increase the specificity of the method, the oligonucleotide pools were typically subjected to multiple rounds of selection and amplification prior to cloning and sequence analysis. The power of this method arises from the fact that it is independent of any prior knowledge or preconceptions regarding DNA binding specificity and has the potential to yield information about protein-DNA interactions not readily attainable by more conventional methods such as DNA footprinting or site-directed mutagenesis of a single oligonucleotide sequence.
Specific versus Nonspecific Interactions-A number of approaches were used to ensure that amplified sequences were specific for the protein complex and not simply sequences that were nonspecifically comigrating in the gel. First, bands of amplified oligonucleotides were analyzed (considered specific) only if the band was dependent upon the presence of all of the bHLH-PAS proteins used in the assay. Second, specificity was confirmed by the capacity of ARNT- or AHR-specific antibodies to supershift the radiolabeled complex. Third, a consensus was deduced from each set of selected oligonucleotides, and this information was used to design consensus oligonucleotides that were used in gel shift assays to confirm specificity of interaction. Only in the case of SIM•ARNT sequences was the presence of a comigrating nonspecific oligonucleotide observed. In this case, we reanalyzed each of the 80 amplified oligonucleotides independently by gel shift analysis to eliminate any nonspecific sequences (see above).
Validation of the DNA Selection and Amplification Strategy-To validate our strategy, we first employed this technique using the AHR•ARNT complex that recognizes the DRE sequence, TNGCGTG (5, 29-32). We anticipated one of two outcomes. Either the AHR•ARNT complex would recognize sequences containing this known core and validate our experimental approach or the complex would recognize a unique DNA sequence, such as the E-box motif, that is commonly recognized by most other bHLH proteins. In our initial experiment, we performed AHR•ARNT selection on a pool of oligonucleotides that had mixed bases incorporated at 13 sequential positions (OL187). χ-Square analysis of the nucleotide frequencies at various positions revealed a consensus sequence of TNGCGTGC. This sequence was essentially identical to the previously described DRE, TNGCGTG (5, 29-32). No sequences conforming to E-boxes were found in any of the 24 clones that were sequenced.
[...] 2, 5, 8, and 11), ARNT alone (lanes 3, 6, 9, and 12), or both SIM and ARNT (lanes 1, 4, 7, 10) using 32P-labeled OL331/332 (GGGATGTGCGTGACATTC, lanes 1-3), OL464/465 (GGGATGTACGTGACATTC, lanes 4-6), OL501/502 (AATTTGTACGTGCCACAGA, lanes 7-9), or OL503/504 (AATTTGTGCGTGCCACAGA, lanes 10-12). Unprogrammed reticulocyte lysate was added, if necessary, to normalize the amount of lysate in each reaction.
The analysis presented in Fig. 1 indicated that the positioning of the TNGCGTG core sequence within the random 13-mer was biased by the flanking sequences required for annealing PCR primers (i.e. most core sequences were found closer to the 3′-end of the oligonucleotide, Fig. 1A). This observation led us to examine the impact of flanking sequences on AHR•ARNT DNA binding specificity. The analysis using OL224 as the oligonucleotide pool revealed a consensus binding sequence of GGGNAT(C/T)GCGTGACANNCC (Fig. 2).⁴ Nucleotides that were present at frequencies above expected random values were identified at 11 of the 14 flanking positions, including those in positions −4, −3, 4, 5, and 6. These results are consistent with those obtained using substitution mutagenesis of a DRE-containing oligonucleotide (31, 32). The selection of flanking nucleotides suggests that both the AHR and ARNT (or other proteins within this complex) are capable of DNA contacts at sites adjacent to the commonly recognized core sequence. In addition, our results suggest that positions not identified previously, the −9, −8, −7, −5, 9, and 10 positions, are selected for and thus could also play a role in AHR•ARNT-DNA recognition.
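The per-position test for nucleotides present at frequencies above expected random values can be illustrated with a short Python sketch. This is not the analysis code used in the study: the function names, the assumption of equal (25%) expected base frequencies, and the toy alignment are all hypothetical, and the toy counts are too small for a real chi-square test (expected counts below 5).

```python
from collections import Counter

# Critical value of the chi-square distribution for 3 degrees of freedom
# (four bases minus one) at p = 0.05.
CHI2_CRIT_DF3_P05 = 7.815


def position_chi_square(column):
    """Chi-square statistic for one alignment column, assuming equal base frequencies."""
    counts = Counter(column)
    expected = len(column) / 4.0  # assumption: each base expected at 25%
    return sum((counts.get(base, 0) - expected) ** 2 / expected for base in "ACGT")


def selected_positions(aligned_sequences):
    """Return (index, most common base, statistic) for columns that deviate from random."""
    results = []
    for index, column in enumerate(zip(*aligned_sequences)):
        stat = position_chi_square(column)
        if stat > CHI2_CRIT_DF3_P05:
            base = Counter(column).most_common(1)[0][0]
            results.append((index, base, stat))
    return results


# Hypothetical toy alignment (NOT data from this study): the first column is
# random, the remaining three columns are fully conserved.
toy_alignment = ["AGTG", "CGTG", "GGTG", "TGTG", "AGTG", "CGTG", "GGTG", "TGTG"]

for index, base, stat in selected_positions(toy_alignment):
    print(f"position {index}: {base} selected (chi-square = {stat:.1f})")
```

In practice the aligned sequences would be the cloned, selected oligonucleotides anchored on the core motif, and the expected frequencies might need to reflect the actual base composition of the starting pool.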
If binding affinity is the only determinant of a functional DRE in vivo, then our consensus sequence for the AHR•ARNT complex should be identical to bona fide DREs. In an attempt to address this question, we compared our selected sequences to 10 DREs known to function upstream of TCDD-regulated genes. Since similarity is a difficult assertion to prove statistically, we identified those nucleotides that were statistically different. The most interesting discrepancy between the in vitro and in vivo consensus is the preference for an A at position −5 for the in vitro derived sequence and the lack of an A at −5 in any reported DRE. The absence of A at −5 may be an indication that inappropriate contacts are occurring in vitro, that additional proteins are required for in vivo interactions, or that some attenuation of binding affinity is required for optimal control of gene expression in vivo.
DNA Recognition by ARNT Homodimers-In an effort to determine half-site recognition of ARNT and to determine if ARNT could recognize a specific DNA sequence as a homodimer or as a heterodimer with other bHLH-PAS partners, we performed a series of selection and amplification experiments with various combinations of the AHR, ARNT, and SIM. The observation that ARNT is not found in association with Hsp90 (42) and is present at high concentrations in the nuclear compartment of hepatoma cells (34) led us to first attempt to characterize oligonucleotide sequences that were specifically bound by ARNT alone (presumably as an ARNT homodimer). We attempted to increase the sensitivity of the selection by using ARNT that had been purified from a baculovirus expression system (36). Given our previous results suggesting that the purified AHR from this expression system required uncharacterized protein factors for DNA binding, we routinely added 10 g of unprogrammed reticulocyte lysate to the ARNT/oligonucleotide incubation mixture (36). Using these conditions, we found that ARNT recognized the sequence CACGTG. This complex migrated to a position similar to that of the AHR•ARNT and SIM•ARNT heterodimers, suggesting that ARNT recognized this sequence as an oligomer of a size similar to the other complexes, presumed to be dimeric. Further, the ability of bovine serum albumin to stabilize the ARNT complex (albeit to a lesser degree) indicates that the ARNT DNA binding complex is not the result of an interaction with an unknown protein present in the reticulocyte lysate. As shown in both Figs. 6B and 11, we could also detect this interaction using the concentrations of ARNT generated in our reticulocyte lysate system (~1 fmol). This indicates that the interaction can occur at the lower ARNT concentrations that may be found in cell nuclei (34). While this manuscript was in review, work by Sogawa et al. (43) also reported that ARNT homodimers recognize the CACGTG motif and used chimeric reporter constructs to suggest that this interaction may be capable of up-regulating endogenous promoters downstream of the corresponding E-box element in vivo. Our dissociation rate experiments indicate that the relative stabilities of the AHR•ARNT and ARNT•ARNT complexes for their respective recognition sites are similar (Fig. 10). However, analysis of these complexes by coprecipitation yielded lower amounts of complexed ARNT•ARNT than that of AHR•ARNT (especially in the absence of oligonucleotides). The reason for this discrepancy is unclear, but it is an indication that ARNT•ARNT interactions are weaker than AHR•ARNT interactions in the absence of DNA. Taken together, these studies suggest that the ARNT•ARNT homodimer may act as an important transcriptional regulator through its interaction with E-box elements.
DNA Recognition by SIM•ARNT Heterodimers-The observation that ARNT homodimeric complexes could specifically interact with DNA suggested that other bHLH-PAS combinations might recognize unique DNA sequences and shed light on the half-site recognition and pairing rules of this family of transcription factors. Although our initial attempts to demonstrate SIM-ARNT-DNA interactions using OL187 were unsuccessful, we also initiated the selection analysis with OL224. This strategy was initiated for two reasons. First, the AHR and SIM share the highest degree of sequence similarity in their bHLH, PAS, and C-terminal domains (Fig. 12) (6), thus we predicted these would have the most similar DNA recognition sequences. Second, our preliminary experiments led us to suspect that amplification of ARNT homodimer-specific sequences (CACGTG) was preferentially occurring in our attempts to select and amplify sequences specific for the SIM•ARNT complex. Our results presented in Fig. 5 indicated that use of OL224 would minimize DNA interactions resulting from ARNT homodimers, thus minimizing contamination by ARNT-specific sequences (i.e. CACGTG). Using this strategy, we were able to amplify oligonucleotides that bound SIM•ARNT complexes specifically, with the consensus sequence GNNNNGTGCGTGANNNTCC. Our failure to detect SIM-ARNT interactions using the coprecipitation assay (Fig. 11) combined with the rapid dissociation rate of the SIM•ARNT•DNA complex (Fig. 10) indicate that the SIM-ARNT interaction is relatively weak. The weak interaction of the SIM-ARNT complex found in this study is in contrast to that reported by Sogawa et al. (43).

⁴ Interestingly, this consensus sequence did not contain a C at position 4 as was observed in the analysis presented in Fig. 1C. Given the proximity of this C to the primer site in OL187, the observed bias reported using that oligonucleotide, and our inability to reproduce the conservation of C at position 4 using OL224, we consider the assignments derived in Fig. 2C as our final AHR•ARNT selected consensus sequence.
Recently, a number of SIM-responsive elements have been cloned from Drosophila using an enhancer trapping technique (41). Sequence alignment of these regulatory elements revealed a consensus motif, (G/A)(T/A)ACGTG. This sequence differs by a single nucleotide when compared to the SIM•ARNT consensus core sequence we describe in Fig. 7, GTGCGTG. The difference exists at the −2 position (underlined) within the putative SIM binding 5′-half-site (A versus G). To examine the importance of this nucleotide position, we performed a series of gel shift experiments to determine the impact that this nucleotide had on SIM•ARNT recognition. We found that both A and G at the −2 position are specifically bound by the SIM•ARNT complex (Fig. 8B). Our inability to predict an A nucleotide at this position arose from our use of OL224 that has a fixed GCGTG core (see above). Thus, we conclude that the in vitro SIM•ARNT consensus core sequence is more appropriately GT(A/G)CGTG, with GTACGTG possibly having greater relevance to SIM-responsive gene regulation in vivo.
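The relationship between the two consensus motifs can be checked mechanically. The following Python sketch converts the parenthesized notation used in this paper into regular expressions and scans the OL331/332 and OL464/465 probe sequences quoted in the gel shift figure legend. The helper name `consensus_to_regex` and the dictionary layout are hypothetical, and a pattern match says nothing about binding affinity; it only shows which sequences conform to each written consensus.

```python
import re


def consensus_to_regex(consensus):
    """Translate notation such as GT(A/G)CGTG into a compiled regular expression.

    "(A/G)" becomes the character class "[AG]" and "N" becomes "[ACGT]".
    """
    pattern = re.sub(
        r"\(([ACGT](?:/[ACGT])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )
    return re.compile(pattern.replace("N", "[ACGT]"))


# Consensus motifs as written in the text.
SIM_ARNT_CORE = consensus_to_regex("GT(A/G)CGTG")
DROSOPHILA_CONSENSUS = consensus_to_regex("(G/A)(T/A)ACGTG")

# Probe sequences as given in the figure legend (one strand only).
probes = {
    "OL331/332": "GGGATGTGCGTGACATTC",  # G at the -2 position
    "OL464/465": "GGGATGTACGTGACATTC",  # A at the -2 position
}

for name, sequence in probes.items():
    in_vitro = SIM_ARNT_CORE.search(sequence) is not None
    in_vivo = DROSOPHILA_CONSENSUS.search(sequence) is not None
    print(f"{name}: SIM-ARNT core={in_vitro}, Drosophila consensus={in_vivo}")
```

Both probes conform to GT(A/G)CGTG, but only the probe with A at the −2 position also conforms to the Drosophila-derived motif, which mirrors the single-nucleotide difference discussed above.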
Half-site Recognition of ARNT, AHR, and SIM-The identification of half-site recognition of ARNT, AHR, and SIM in combination with analysis of the amino acid sequences of their basic regions should provide insights into the relationships between the bHLH-PAS proteins and members of other bHLH families. Interestingly, the ARNT-specific sequence half-site is also recognized by other bHLH proteins such as Max (44), Myc (45), and USF (22, 46, 47). The bHLH proteins that bind the 3′-half-site GTG sequence (binding CACGTG as homodimers) have been denoted as class B proteins and are distinguished by the presence of an arginine (R) residue in their basic region immediately following the sequence ERRR (i.e. ERRRR) (48) (Fig. 12). The bHLH proteins that lack this C-terminal Arg residue commonly recognize the 3′-half-site CTG sequence (binding CAGCTG) and are denoted class A. Our results suggest that ARNT is a class B protein since its homomeric form recognizes the palindromic CACGTG sequence with greatest affinity, and its basic region has an Arg residue at the characteristic position. In addition, many bHLH proteins possess a critical glutamic acid residue (ERRR), which has been shown to contact the CA of the E-box sequence CANNTG (49). Although this residue is present in the basic region of ARNT, it does not occur at corresponding positions in either the AHR or SIM proteins. Thus, by predictions derived from these rules and from their primary amino acid sequences, neither the AHR nor SIM proteins would be expected to bind any known E-box half-sites. Our results support this prediction and suggest that when complexed with ARNT, the AHR has the greatest affinity for the 5′-half-site T(C/T)GC, and SIM has the greatest affinity for the half-site GT(A/G)C. We suggest that these proteins represent a unique class of bHLH proteins and designate this group as class C. While this paper was in review, another group determined the position of ARNT as the 3′-GTG half-site of the DRE (50).
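The half-site model can be made concrete by assembling full recognition sites from the 5′- and 3′-half-sites named above. The Python sketch below is illustrative only: the function names are hypothetical, and the treatment of the ARNT homodimer (each monomer reading GTG on its own strand, so that the 5′-half is the reverse complement of GTG) is an assumption that is consistent with, but not directly demonstrated by, the palindromic CACGTG site.

```python
import re
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def expand(half_site):
    """Expand notation such as T(C/T)GC into every concrete sequence it covers."""
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGT])", half_site)
    choices = [alt.split("/") if alt else [base] for alt, base in tokens]
    return ["".join(combo) for combo in product(*choices)]


def full_sites(five_prime_half, three_prime_half):
    """Join each expanded 5'-half-site to the 3'-half-site."""
    return [five + three_prime_half for five in expand(five_prime_half)]


ARNT_HALF_SITE = "GTG"  # 3'-half-site assigned to ARNT in the text
FIVE_PRIME_HALF_SITES = {"AHR": "T(C/T)GC", "SIM": "GT(A/G)C"}

for partner, half_site in FIVE_PRIME_HALF_SITES.items():
    print(partner, "with ARNT:", full_sites(half_site, ARNT_HALF_SITE))

# Assumed homodimer geometry: the second ARNT monomer reads GTG on the
# opposite strand, which appears as CAC on the strand written here.
homodimer_site = reverse_complement(ARNT_HALF_SITE) + ARNT_HALF_SITE
print("ARNT with ARNT:", homodimer_site)
```

Under these assumptions the sketch reproduces the sites discussed in the text: TCGCGTG and TTGCGTG for the AHR with ARNT, GTACGTG and GTGCGTG for SIM with ARNT, and the palindromic E-box CACGTG for the ARNT homodimer.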
Pairing Rules of bHLH-PAS DNA Binding Complexes-Our results indicate that certain rules dictate pairing and subsequent DNA binding of bHLH-PAS proteins. In contrast to the identification of DNA binding complexes formed with ARNT alone, AHR and ARNT, or SIM and ARNT, no oligonucleotide sequences could be selectively amplified when the AHR and SIM (each alone or mixed) were used as the binding species. These experiments were repeated multiple times, using either OL187 or OL224 and the higher concentrations of protein that were attainable with baculovirus-expressed AHR. The fact that heterodimeric binding of the bHLH-PAS proteins was detected only with ARNT suggests that ARNT may be a general dimerization partner for PAS proteins that respond to cellular signals. In addition, the multiplicity of productive bHLH-PAS protein combinations may have a significant impact on the spectrum of DNA binding sites, enhancer elements, and responsive genes affected by these proteins in the presence and absence of compounds such as TCDD. A second explanation for the limited number of bHLH-PAS protein pairs that were detected by this method cannot be ruled out. Our inability to detect AHR or SIM homodimeric or AHR-SIM heterodimeric interactions with DNA may be due to a failure of the method to detect weaker protein-protein or protein-DNA interactions in vitro.
Summary-These data support several important conclusions. First, ARNT is capable of forming distinct DNA binding complexes with another molecule of ARNT, the AHR, or SIM. This suggests that bHLH-PAS proteins may be involved in a combinatorial mechanism of gene regulation that involves the formation of multiple homo- or heterodimeric pairs, each with a role in controlling expression of distinct batteries of genes (51). For example, the observation that ARNT may interact with E-box elements suggests that in the absence of AHR agonists, ARNT homodimers play a role in the regulation of a second battery of genes, possibly through interactions at E-boxes that may be down-regulated in the presence of TCDD. Second, since ARNT is capable of recognizing DNA as a component of several distinct complexes, we were able to elucidate the DNA recognition half-sites of these PAS proteins. As predicted by amino acid sequence homology to other class B bHLH proteins, ARNT recognizes the 3′-half-site GTG. In contrast, the basic region amino acid sequences of both the AHR and SIM are unique and specify distinct 5′-half-sites, T(C/T)GC and GT(A/G)C, respectively. Finally, the AHR•ARNT complex displays a preference for nucleotides that flank the core T(C/T)GCGTG motif, suggesting that the protein-DNA interactions of this complex extend beyond the core motif. Other PAS protein complexes (i.e. ARNT•ARNT or SIM•ARNT) display fewer preferences for flanking nucleotides, suggesting that the sequence specificity of various PAS protein complexes may differ substantially or may be less restricted than that of the AHR•ARNT complex.