Binding site selection for the plant MADS domain protein AGL15: an in vitro and in vivo study.

AGL15 (for AGAMOUS-like 15) is currently the only reported member of the plant MADS domain family of transcriptional regulators that preferentially accumulates during embryo development. Additionally, AGL15 is one of the more divergent members of the MADS domain family, including within the DNA-binding domain. Previous studies have shown that MADS domain proteins bind to DNA sequences with an overall consensus of CC(A/T)6GG (called a CArG motif). Nonetheless, different MADS domain proteins exhibit similar yet distinct binding site preferences that may be critical for differential gene regulation. To determine the consensus sequence preferentially bound by AGL15 in vitro, PCR-assisted binding site selection assays were performed. AGL15 was observed to prefer a CArG motif with a longer A/T-rich core and is to date the only plant MADS domain protein having such a preference. Next, the Arabidopsis genome data base was searched for genes containing AGL15 binding sites as candidates for direct regulation by AGL15. One gene, DTA4 (for Downstream Target of AGL15-4), was identified by this method, and then confirmed as a direct target of AGL15 in vivo.

MADS domain proteins are a family of regulatory factors found in yeast, animals, and plants, containing a conserved ~60-amino acid domain that mediates DNA binding and dimerization (for a review of the MADS domain, see Ref. 1). MADS domain proteins generally bind to a consensus DNA sequence called a CArG motif (CC(A/T)6GG; canonical sequence); however, more detailed studies revealed that different MADS factors have overlapping but distinct binding site preferences (reviewed in Ref. 2). In plants, MADS box genes are expressed throughout the life cycle, and many members have been shown to be involved in homeotic regulation of programs in flower development (reviewed in Ref. 1). MADS box genes are more numerous in plants (e.g. over 80 in Arabidopsis thaliana) (3) than in organisms from other kingdoms. The reason for this greater diversity in plants is not clear, but it raises an important question concerning how these factors specifically regulate their downstream target genes. One possible mechanism to achieve such specificity is through differential binding to target genes.
AGL15 (for AGAMOUS-like 15) is especially intriguing because it is currently the only reported MADS domain protein that preferentially accumulates in developing plant embryos (4,5). AGL15 has been found in nuclei of tissue developing in an embryonic mode in a wide variety of situations (5,6). Although AGL15 appears to be expressed at its highest level during plant embryo development, it is also expressed at lower levels after germination (7). AGL15 is also one of the more divergent members of the MADS domain family in terms of primary structure, including within the MADS domain (8,9). Therefore, it would be particularly interesting to determine whether this relatively divergent member of the family preferentially binds DNA sequences distinct from the general CArG motif. In this study, the consensus binding sequence of AGL15 was determined using a PCR-assisted in vitro binding site selection assay. AGL15 was found to prefer a type of CArG motif with a longer central A/T-rich sequence, similar to the N-10 site of the MEF2A binding consensus (10). This preference is thus far unique among plant MADS domain proteins and may contribute to the functional specificity of AGL15.
To understand the regulatory roles AGL15 plays, it is important to have a comprehensive knowledge of the downstream target genes it regulates in vivo. The in vitro information obtained from the binding site experiments was applied to identify potential in vivo targets of AGL15. The Arabidopsis genome data base was searched for genes containing AGL15 binding sites. One such identified gene, DTA4 (for Downstream Target of AGL15-4), was further tested to verify that it represents a direct in vivo target of AGL15. Such an approach may be useful for identifying downstream target genes of other DNA-binding regulatory proteins.

EXPERIMENTAL PROCEDURES
PCR-assisted in Vitro Binding Site Selection-The approach for binding site selection was modified from Huang et al. (11). DNA sequences encoding full-length AGL15, AGL15 lacking the DNA-binding MADS domain (AGL15ΔM), and AG were modified by PCR to add a T7 tag sequence to the C-terminal ends of the recombinant proteins, cloned into the bacterial expression vector pET15b (Novagen, Madison, WI), and transformed into Escherichia coli BL21 (DE3) cells. Expression and isolation of inclusion body proteins were performed as described previously (4). The protein samples were further purified using anti-T7 antibody-conjugated agarose beads (Novagen) according to the manufacturer's instructions. The purity and integrity of the proteins were assessed by SDS-PAGE and by Western analysis using anti-T7 antibodies (Novagen). Purified proteins were dialyzed against 1× binding buffer (10 mM Tris-HCl, pH 7.5, 50 mM NaCl, 1 mM EDTA, 1 mM dithiothreitol).
To produce a pool of double-stranded degenerate oligonucleotides, a 76-mer template was synthesized containing a core of 26 randomized positions, hand-mixed during synthesis to ensure equal representation of each base (IDT Inc., Coralville, IA): ACTCGAGGAATTCGGTACCCCGGGT(N)26TGGATCCGGAGAGCTCCCAACGCGT. Two additional primers corresponding to the nondegenerate arms of the 76-mer were used to convert and amplify the 76-mer to double strand, using KlenTaq1 DNA polymerase (AB Peptide, St. Louis, MO). The double-stranded 76-mer population was labeled with [α-32P]dCTP (Amersham Biosciences) using PCR and purified on an 8% polyacrylamide gel (0.5× TBE), as described by Ausubel et al. (12).
For electrophoretic mobility shift assays (EMSAs), ~100 ng of full-length AGL15, AGL15ΔM, or AG was added to 1× binding buffer supplemented with 5% glycerol, 50 ng/μl poly(dI-dC), and 100 ng/μl bovine serum albumin. In the AG binding reaction, 1.2 M urea was included (11). After preincubation at room temperature for 15 min, 10^4 to 10^5 cpm of 32P-labeled 76-mer probe was added. After a further 15 min at room temperature, the entire 20-μl binding reaction was resolved on a 5% polyacrylamide gel (0.5× TBE, 5% glycerol) to separate free probe and protein-DNA complexes. The gel was dried and exposed to a PhosphorImager screen (Amersham Biosciences). The regions corresponding to shifted bands were excised, electroeluted in 0.1× TBE buffer, and precipitated. An aliquot was amplified and radiolabeled by PCR, gel-purified, and used as probe for the next round of binding and EMSA. This procedure was repeated for 3-5 rounds to enrich the oligonucleotides bound by AGL15 and AG.
Analysis of Selected Sequences-The selected oligonucleotides were cloned into pGEM-T vector (Promega, Madison, WI) and sequenced on an ABI Prism 310 Genetic Analyzer (Applied Biosystems, Foster City, CA). To reveal the consensus binding sequences, both strands of the random core and part of the flanking linker region were aligned using ClustalW and Jalview programs (available on the World Wide Web at www2.ebi.ac.uk/clustalw), followed by manual inspection and adjustment.
Competitive EMSA-Probes were labeled by PCR with [α-32P]dCTP as above. To ensure equal specific activity of the probes, the labeled to unlabeled dCTP ratio was adjusted for each PCR according to the base composition of the templates. Equal cpm (10^4 to 10^5) of probes were used in EMSA. Unlabeled competitor concentrations were verified on an 8% polyacrylamide gel. EMSAs were performed under the same conditions as for binding site selection, except that various amounts of unlabeled competitors were incubated with proteins for 15 min at room temperature prior to the addition of probes.
Transient Reporter Expression Assay-Eight tandem repeats including a high affinity AGL15 binding site (TTACTATATATAGTAA) or a mutated binding site (TTAGTATATATACTAA) were placed before the minimal 35S CaMV promoter (TATA-box region) (13) and an intron-containing GUS (β-glucuronidase) reporter gene (14). The reporter constructs were cloned into the binary vector pBI101 (Clontech). The effector encoded an activated form of AGL15 containing a viral VP16 transcription activation domain (15) at the C-terminal end (constructed by Dr. H. Wang, University of Kentucky, Lexington, KY). The AGL15-VP16 sequence was inserted downstream of the 35S CaMV promoter of the binary vector pBIMC (a gift from Dr. D. Falcone, University of Kentucky) for constitutive expression. All constructs were transformed into Agrobacterium tumefaciens strain GV3850.
The transient expression assay was performed using a petunia infiltration system (a formal detailed procedure will be reported by Drs. M. Schoenbeck and J. Chappell, University of Kentucky). Briefly, overnight agrobacteria cultures were washed once and resuspended to an A600 of 0.5 in 10% sucrose. Cells harboring reporter and effector constructs were mixed at various ratios. The mixture was infiltrated into the abaxial side of leaves of 4-6-week-old Petunia hybrida plants grown under standard greenhouse conditions. The leaves were then left in a moist chamber at room temperature for 3-5 days. GUS activity was measured by 4-MUG assay (16) using a Hoefer DynaQuant 200 fluorometer (Hoefer Scientific Instruments, San Francisco, CA), following the manufacturer's instructions.
Enrichment Tests-To determine whether AGL15 binds in vivo to DNA sequences identified in vitro, enrichment assays were performed on populations of DNA fragments isolated by chromatin immunoprecipitation (ChIP) from an embryonic culture tissue using AGL15-specific antiserum. The AGL15 antiserum has been previously described and evaluated for specificity (4-7). The detailed ChIP protocol is described elsewhere (17). Potential in vivo targets of AGL15 were identified by searching the data base (available on the World Wide Web at www.arabidopsis.org) with the preferred in vitro binding sites. Oligonucleotide primers were synthesized to amplify the target DNA fragments containing these sites, as well as the coding regions of β-2-tubulin (18) and elongation factor 1α (19) for controls. Both target and control primer pairs were used in multiplex PCR. A series of dilutions of total input chromatin DNA (before ChIP), AGL15-antiserum ChIP-selected DNA, and preimmune serum ChIP-selected DNA were used as templates. PCRs were resolved on 2% agarose gels.
Expression Pattern of Target Gene-A. thaliana silique RNA was isolated using the hot borate method (20). cDNAs were synthesized using oligo(dT) and Moloney murine leukemia virus reverse transcriptase (Promega), according to the manufacturer's instructions. Gene expression level was determined semiquantitatively using PCR. The coding region of elongation factor 1α was amplified as a normalization control.
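The data base search step described above can be illustrated with a minimal sketch. It assumes the genome is available as a plain string of A/C/G/T; the function names `reverse_complement` and `find_sites` are hypothetical and do not correspond to the search tool actually used at www.arabidopsis.org. Both strands are checked, and a palindromic site such as TTACTATATATAGTAA is counted only once per position because it equals its own reverse complement.

```python
# Hypothetical helper names; a sketch of an exact-match site search,
# not the data base tool used in the study.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(genome, site):
    """Return sorted 0-based start positions of exact matches on either strand."""
    genome = genome.upper()
    site = site.upper()
    # A set removes the duplicate query when the site is palindromic.
    queries = {site, reverse_complement(site)}
    hits = []
    for query in queries:
        start = genome.find(query)
        while start != -1:
            hits.append(start)
            start = genome.find(query, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    site = "TTACTATATATAGTAA"
    toy_genome = "GGG" + site + "CCC"  # toy sequence, not real genomic data
    print(find_sites(toy_genome, site))
```

An exact-match search is the strictest choice; relaxing it to allow mismatches would return more candidate sites but would also require a scoring scheme that the text does not specify.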

RESULTS
Selection of AGL15 Binding Sites in Vitro-PCR-assisted in vitro binding site selection was used to isolate DNA sequences preferentially bound by AGL15 and AG from a random population of oligonucleotides. Binding site selection has previously been performed for AG (11,21) and served as a positive control in our experiments. AGL15 and AG proteins were isolated as inclusion bodies from E. coli and further purified via a C-terminal T7 tag. The integrity and purity of recombinant proteins were confirmed by Coomassie staining and Western analysis (not shown). Oligonucleotides bound by AGL15 or AG were separated from free DNA by EMSA and amplified by PCR. After 3-5 rounds of selection, shifted bands were clearly visible, indicating enrichment of AGL15 and AG binding sites (not shown). Binding site selection was also performed using a form of AGL15 that lacked the DNA-binding MADS domain (AGL15ΔM). This form of AGL15 does not bind to a CArG motif from the yeast STE6 promoter, whereas full-length AGL15 is able to bind to this site in EMSA (5). Therefore, AGL15ΔM served to monitor the background binding site selection. No DNA was recovered when AGL15ΔM was used (not shown).
Selected binding site sequences were aligned and consensus sequences were calculated. Summaries of base preferences of AGL15 and AG at each position of the binding site are shown in Fig. 1, A and B, respectively. Selected sequences were categorized into three groups, where the number refers to the length of the A/T-rich core: CC-6-GG represents a canonical CArG motif with the sequence 5′-CC-(A/T-rich)6-GG-3′; C-8-G contains a longer A/T-rich core (i.e. 5′-C-(A/T-rich)8-G-3′); and C-7-GG/CC-7-G is an intermediate form. The calculated consensus sequence from alignment indicated that the preferred binding site of AGL15 contained a longer A/T-rich core. In the AGL15-selected population, the canonical CC-6-GG motifs constituted less than 25% of the selected sites (Fig. 1C). The majority of the binding sites were of the C-8-G and C-7-GG/CC-7-G forms, with the former predominating. The opposite trend was observed in the AG-selected population. The preferred AG binding sites had the canonical form, as previously reported (11,21). The preference of AGL15 for a C-8-G type of CArG motif is unique among plant MADS domain protein binding site sequences reported thus far (11,21-23). Even among MADS domain proteins from different kingdoms, only MEF2A has a similar binding site preference (10).
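The three-way grouping of selected sites can be expressed as a short sketch. It makes one simplifying assumption that is not in the text: the "A/T-rich" core is treated as strictly A or T at every position, whereas the selected sequences may tolerate occasional G or C. The names `PATTERNS` and `classify_carg` are hypothetical.

```python
import re

# Assumption of this sketch: the A/T-rich core contains only A or T.
# Category names follow the text.
PATTERNS = [
    ("CC-6-GG", re.compile(r"CC[AT]{6}GG")),
    ("C-7-GG/CC-7-G", re.compile(r"C[AT]{7}GG|CC[AT]{7}G")),
    ("C-8-G", re.compile(r"C[AT]{8}G")),
]


def classify_carg(seq):
    """Return the first matching CArG category for seq, or None if none matches."""
    seq = seq.upper()
    for name, pattern in PATTERNS:
        if pattern.search(seq):
            return name
    return None


if __name__ == "__main__":
    for example in ("TTACTATATATAGTAA", "CCATATATGG", "CCATATATAG"):
        print(example, classify_carg(example))
```

Because the flanking C and G positions are excluded from the core, a single 10-bp window can satisfy at most one of the three patterns, so the order of the checks matters only when a sequence contains more than one motif.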
AGL15 Has Higher Affinity for CArG Motifs with a Longer A/T-rich Core-To test whether AGL15 binds to the C-8-G CArG motif with higher affinity, several individual CArG sequences were selected from the populations for EMSAs. CArG1 and CArG2 were AG-selected CC-6-GG type motifs. CArG3 was a C-8-G type CArG repeatedly selected by AGL15. A representative EMSA is shown in Fig. 2A. For AG, all three individual CArGs had similar binding affinity. This was not an artifact of probe or protein saturation, because the same pattern was observed over a wide range of probe/protein ratios (not shown). AGL15, however, showed clear differential affinity among the three CArGs. CArG3, containing the longer A/T core, showed a strong interaction with AGL15 in EMSA, whereas binding to CArG1 and CArG2 (CC-6-GG type) was considerably weaker.
Reciprocal competition assays were used to confirm the preference of AGL15 for the C-8-G CArG motif. Unlabeled CArG sequences were used as competitors against themselves and each other. For AGL15, CArG3 was a much stronger competitor than CArG1 or CArG2 (Fig. 2B). When CArG3 was used as the radiolabeled probe, unlabeled CArG3 could effectively compete with the probe (Fig. 2B, lanes 2-4). In the presence of a 750-fold excess of CArG3 competitor, shifted bands were virtually undetectable (not shown). In contrast, unlabeled CArG1 or CArG2 competitors had little effect in the competition with radiolabeled CArG3 probe, even at a much higher competitor excess (Fig. 2B, compare lanes 5-7 and lanes 8-10 with lanes 2-4; note the difference in -fold level of unlabeled competitor). When CArG1 or CArG2 was used as radiolabeled probe, unlabeled CArG3 also competed better (Fig. 2B, compare lanes 15-17 with lanes 12-14, and compare lanes 22-24 with lanes 19-21). Taken together, these data suggest that for AGL15, a C-8-G type CArG3 with a longer A/T core was a much stronger competitor than CC-6-GG type CArGs, and, therefore, AGL15 had higher binding affinity to this form of CArG sequence. Competitive EMSAs revealed a lack of preference of AG for any particular form of CArG motif, although CC-6-GG type CArG motifs competed slightly better than the C-8-G motif (Fig. 2C).
Identifying in Vivo Targets of AGL15-PCR-assisted binding site selection assay identified an array of DNA sequences that AGL15 could bind under in vitro conditions and revealed a preference for binding to an unusual C-8-G type CArG motif. Although the situation in vivo is likely to be very different, correlations have been reported between in vitro binding affinity and functional in vivo binding (24). In the context of the chromatin environment, with the presence of interacting proteins, it is very likely that AGL15 binds DNA sequences that it is unable to bind in vitro (17), just as the animal MADS domain protein SRF binds sites in vivo that it is unable to bind in vitro (reviewed in Ref. 2). Conversely, due to chromatin structure, it is also likely that DNA sequences that correspond to preferred sites in vitro are not bound in vivo (17,25). Therefore, it is important to test whether the information obtained through a strictly in vitro study could be used to identify in vivo targets of AGL15.
Among all of the selected AGL15 binding sites, one particular sequence (TTACTATATATAGTAA) was especially intriguing. AGL15 showed high affinity for oligonucleotides containing this perfect 16-bp palindromic sequence (e.g. CArG3 in Fig.  2B). More interestingly, this CArG motif was highly represented in the AGL15-selected populations. About 30% (12 of 44) of the oligonucleotides after three rounds of selection contained this site. The percentage rose to 50% after the fourth round of selection and almost 100% after the fifth round (not shown). In addition, the consensus derived from alignment of nonredundant binding sites selected by AGL15 matched this particular motif (Fig. 1A). Does AGL15 bind to this type of CArG motif in vivo?
To address this question, this site was first assessed for function in vivo using a transient reporter assay. Eight tandem repeats containing this site were placed in front of a minimal 35S CaMV promoter (TATA-box), which in turn drives the expression of a GUS reporter gene. As a control, a similar construct containing tandem repeats of a mutated version of the binding site (TTAGTATATATACTAA) was also prepared. AGL15 cannot bind to this mutated sequence in vitro, as tested by EMSA (not shown). The effector, an activated form of AGL15 that contains a viral VP16 transactivation domain at the C-terminal end, was placed under the control of a strong CaMV 35S promoter for constitutive expression (Fig. 3A). Cultures of Agrobacterium tumefaciens harboring the reporter constructs and the effector construct were mixed and co-infiltrated into the abaxial side of P. hybrida leaves. GUS activity was measured 3-5 days later. As shown in Fig. 3B, when both the 8XCArG::GUS and AGL15-VP16 were present, the GUS reporter was highly expressed. GUS activity measured 1000-fold higher than controls transformed with only the reporter or effector constructs. However, when the mutated CArG was used, virtually no GUS activity was detected, similar to the background levels detected with the reporter or effector alone. Reporter and effector cultures were mixed at several different ratios (1:3, 1:1, and 3:1), and the number of tandem repeats in the reporter constructs was varied (4× and 8×). GUS activity was also confirmed qualitatively by histochemical staining with X-Gluc. In all cases, the same trend was observed (Fig. 3B and data not shown). Therefore, this type of CArG motif sequence can function as an in vivo binding site for AGL15.

FIG. 2. AGL15 preferentially binds a CArG motif with a longer A/T-rich core.
A, relative binding affinities of AGL15 and AG for different CArG motifs. CArG1 and CArG2, AG-selected CC-6-GG type canonical CArG; CArG3, AGL15-selected C-8-G type CArG. Shifted probes are indicated with an asterisk. B, C, reciprocal competitive EMSA for AGL15 and AG, respectively, using the indicated labeled probes and unlabeled competitors. Competitors were 750×, 1500×, and 3000× in excess, except in B (lanes 2-4), where 150×, 300×, and 600× were used. All of the shifted bands were equally affected by the competitor, but, for simplicity, only the top protein-DNA complexes are shown.

The A. thaliana genome sequence data base (available on the World Wide Web at www.arabidopsis.org) was searched to identify genes containing the sequence TTACTATATATAGTAA. Only one exact 16-bp match was found, and it lies within the single intron of a gene that encodes an unknown protein (GenBank accession number NM_106625). This gene is referred to as DTA4 (for Downstream Target of AGL15-4). Although the transient reporter expression assay showed that AGL15 could bind to this DNA sequence in vivo, the question remains whether this naturally occurring site in the chromatin context was also associated with AGL15. To test this, enrichment tests in conjunction with chromatin immunoprecipitation (ChIP) assays were performed. ChIP isolates in vivo AGL15-DNA complexes and has been shown to be an effective approach to select direct downstream targets of AGL15 (17). For the enrichment tests, primers were designed to amplify the genomic DNA region containing the potential AGL15 binding site within DTA4, and control primers were designed to amplify coding regions of various genes that are not expected to be bound by AGL15. Multiplex PCR was performed using both target and control primer pairs on ChIP-selected DNA populations (Fig. 4, ChIP) from embryonic culture tissue as well as DNA populations selected with preimmune serum (PI-ChIP) and nonselected input DNA (Input). Enrichment is defined as a greater abundance of target amplification product with ChIP DNA compared with that of input DNA (before ChIP) or PI-ChIP DNA, using the control amplification product as a reference. An in vivo AGL15-bound DNA fragment should be represented at higher abundance in the population after ChIP selection than in total DNA (i.e. it is enriched by ChIP). A representative gel image of an enrichment test is shown in Fig. 4. Input chromatin DNA was diluted so that the β-2-tubulin control was amplified to a comparable amount as in the ChIP and PI-ChIP populations.
The intensity of the DTA4 band in ChIP was significantly stronger than observed in input or PI-ChIP, indicating strong enrichment of the DTA4 fragment by ChIP. Reproducible results were obtained from enrichment assays performed on several independently isolated ChIP populations. The DTA4 fragment was also not enriched in control ChIP experiments using immune serum depleted for anti-AGL15 IgGs, or in ChIP populations from wild type flower buds that accumulate many members of the MADS domain family but lack detectable amounts of AGL15 (not shown). Therefore, the binding site identified within DTA4 is associated with AGL15 in vivo.
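The definition of enrichment used here (target abundance relative to a control amplicon, compared between ChIP DNA and a reference population) can be written as a simple ratio. The sketch below is one possible way to quantify it from band intensities; it is an assumption of this example that such intensities are available, since the gels in this study were compared by inspection. The function name `fold_enrichment` and the numbers in the usage example are hypothetical.

```python
def fold_enrichment(target_chip, control_chip, target_ref, control_ref):
    """Fold enrichment of a target fragment, normalized to a control amplicon.

    target_chip, control_chip: band intensities from ChIP-selected DNA.
    target_ref, control_ref: band intensities from the reference population
    (input DNA or preimmune-serum ChIP DNA).
    A value above 1 means the target is enriched relative to the reference.
    """
    values = (target_chip, control_chip, target_ref, control_ref)
    if min(values) <= 0:
        raise ValueError("band intensities must be positive")
    return (target_chip / control_chip) / (target_ref / control_ref)


if __name__ == "__main__":
    # Hypothetical intensities in arbitrary units, for illustration only.
    print(fold_enrichment(800, 100, 100, 100))
```

Normalizing to the control amplicon mirrors the empirical adjustment of template amounts described for Fig. 4, where the control band is amplified to a similar extent in every sample.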
Does in vivo binding by AGL15 result in a change in expression of DTA4? Semiquantitative reverse transcriptase-PCR was performed to determine whether DTA4 expression level was responsive to the level of AGL15 accumulation. Transgenic A. thaliana with a p35S::AGL15 transgene has a somewhat higher AGL15 level than wild type in developing siliques (7,17). RNA was isolated from wild type and transgenic plant siliques at the same developmental stages. As shown in Fig. 5, expression of DTA4 was increased in the p35S::AGL15 siliques, where there is an elevated level of AGL15 at all developmental stages examined (5-6, 9-10, and 11-12 days after flowering).

FIG. 4. In vivo binding of AGL15 to a C-8-G CArG motif within DTA4. Multiplex PCR was performed using two pairs of primers amplifying the CArG motif region (DTA4) and control (β-2-tubulin (TUB2)). For comparison, the template amount in each PCR was adjusted empirically so that β-2-tubulin was amplified to a similar extent in all samples. Input, total input chromatin DNA; ChIP, DNA selected using AGL15-specific antiserum; PI-ChIP, DNA selected using preimmune serum.

involved in extending the developmental roles of a transcription factor, discrimination is likely to be an important aspect to execute specific developmental programs. How do individual members achieve different developmental consequences? It is possible that there is discrimination in binding sites, leading to differential gene regulation. Or perhaps binding site recognition overlaps extensively, and different developmental programs are realized through other mechanisms such as interactions with other proteins or chromatin architecture. Most likely, all of these mechanisms contribute to functional specificity.
The problem of functional specificity is especially obvious for plant MADS domain proteins, which number over 80 in A. thaliana, and play diverse, critical roles in development (3). Several members of the family have been found to bind related DNA binding sites (CArG motifs), although with distinct preferences in some cases. Despite the fact that AG, AP1, AP3, and PI, members of the Arabidopsis MADS domain family involved in flower development, can bind the same or very similar CArG motifs (26), they have different developmental roles. In some cases, domain swapping experiments have localized biological specificity within the plant MADS and adjoining linker domains (27), but other experiments suggest that DNA-binding site recognition is not the determining factor for execution of specific developmental programs (27,28).
In animals, the MADS domain proteins SRF and MEF2A have more stringent in vitro binding site preferences. SRF binds to a serum response element of the form CC(A/T)6GG in vitro, but not to the N-10 site, which has a consensus sequence of CTA(A/T)4TAG. Conversely, MEF2A has been reported to bind to the N-10 site but not a serum response element in vitro (29). Also important is the degree of DNA bending induced by binding of these proteins (~73° for SRF; ~19° for MEF2A). Bending has been shown to be important for gene regulation for DNA-binding proteins in general as well as for members of the MADS family (30,31). The key determinant in DNA site recognition and bending within the MADS domain of SRF and MEF2A is a single amino acid residue that is not strongly conserved among MADS box genes (29). In contrast, the degree of bending by the plant MADS domain protein SQUAMOSA was found to be intrinsic to the DNA-binding site rather than determined by the protein (32). Also, for the yeast MADS domain protein MCM1, the DNA sequence greatly influences the extent of bending (31). A single base change within the A/T-rich sequences flanking the CArG motif does not affect binding significantly but has a dramatic effect on degree of bending and on activation and repression of downstream targets of MCM1 (31).
Recent work has indicated that the DNA-binding site is not a passive player in the DNA-protein interaction. Several studies have found that a given transcription factor can exhibit different activities depending on DNA-binding site (reviewed in Ref. 33). Binding to different DNA sequences may result in changes in protein conformation and/or interactions with other proteins, with consequences ranging from different degrees of transcriptional activation to activation at some sites but repression at others. Differences in binding affinity to different sites may also have developmental consequences. In a microarray study to identify targets of the C. elegans FOXA protein, PHA-4, it was found that the relative affinity, as assessed in vitro by EMSA, correlated with the onset of gene expression during development. Genes with higher affinity in vitro binding sites are expressed earlier in development, whereas genes with lower affinity sites are expressed later (34). Therefore, an understanding of preferred DNA sequences bound in vitro can be very relevant to understanding how a DNA-binding protein functions in vivo.
AGL15 Has a Unique Binding Site Preference-AGL15 offers a distinct opportunity to examine binding site preference because AGL15 is one of the most divergent members of the MADS domain family in higher plants, including within the DNA-binding MADS domain (8,9). PCR-assisted binding site selection revealed that AGL15 binds with highest affinity to a variant CArG of form C-8-G with strong preference for A/T-rich flanking sequences. Although other plant MADS domain proteins will bind to this type of sequence, to our knowledge, preference for this type of CArG is unique to AGL15 among plant MADS domain binding sites studied to date. MEF2A, a MADS domain protein involved in mammalian muscle-specific gene regulation, specifically recognizes DNA sites of form C-8-G (10). In particular, AGL15 recognized a palindromic sequence of TTACTATATATAGTAA that was preferentially selected in two entirely independent experiments. EMSAs with individual selected CArG sequences (Fig. 2A) and competition experiments (Fig. 2B) confirmed that AGL15 preferentially binds to CArG motifs of the C-8-G form in vitro. AGL15 was also able to bend DNA (~90°, not shown), as determined by circular permutation assay (35,36) using a truncated form of AGL15. Interestingly, AGL15 also bent a CArG motif of the form CC-6-GG to a similar extent (not shown), indicating that bending may be a function of the protein for AGL15, as found for the animal MADS domain proteins, rather than of the DNA binding site, as found for SQUAMOSA.
The Value of in Vitro Study: Identifying in Vivo Targets-We demonstrate in this report that in vitro binding site information for AGL15 is relevant for in vivo function and can be used directly to identify previously unknown or unsuspected target genes. Binding sites identified for AGL15 in vitro are able to function in vivo. A transient reporter system demonstrated that a high affinity in vitro site (TTACTATATATAGTAA) could lead to a high level of GUS reporter expression upon AGL15 binding in vivo. The mutated site was not active, indicating sequence-specific interaction. The Arabidopsis genome was searched with the sequence corresponding to the highest affinity binding site. Only one exact match was found, corresponding to sequence in the single intron of DTA4, a gene encoding a hypothetical protein. This cis-element was evaluated for association with AGL15 in the chromatin context by performing ChIP to isolate DNA fragments that bind AGL15 in vivo and testing whether a DNA fragment containing the DTA4 putative CArG motif was preferentially selected compared with DNA fragments not expected to be bound by AGL15. As shown in Fig. 4, the DNA fragment containing the palindromic CArG motif was enriched in the ChIP population relative to total input DNA or preimmune serum-selected DNA, indicating selection in ChIP via in vivo bound AGL15. In addition, expression of DTA4 responds to the level of AGL15. In the presence of increased levels of AGL15 as found in p35S::AGL15 transgenic plants, expression of DTA4 is increased. Therefore, AGL15 binds and regulates gene expression in vivo via a cis-element corresponding to a DNA sequence identified in vitro.

FIG. 5. Response of DTA4 expression to AGL15 levels. Semiquantitative reverse transcriptase-PCR was performed on RNA isolated from staged silique tissues. DTA4 expression levels in wild type Arabidopsis (WT) were compared with those in p35S::AGL15 transgenic plants (O/E) accumulating increased levels of AGL15. The coding region of elongation factor 1α was amplified as a control for normalization. The color of the image is inverted for clarity.

It has been suggested that for a gene to qualify as a direct downstream target of a transcription regulator, three criteria must be satisfied (37,38): 1) the regulatory protein should bind to the cis-element in vivo; 2) expression of the target gene should respond to changes in the level of the regulator; and 3) the binding site for the regulator in the target gene should confer regulation by the regulatory protein in vivo. DTA4, a gene identified based on in vitro binding site information, passes all three of these tests for demonstrating direct regulation by AGL15. Further characterization of this gene and its role in development is currently underway.
Although PCR-assisted binding site selection can lead to identification of in vivo targets, it is important to recognize that the information obtained is strictly in vitro. In the chromatin environment, consensus sites cannot be assumed to be occupied by a DNA-binding factor. Several other CArG motifs that contained CTATATATAG with A/T-rich flanking sequences were identified in the 5′ regulatory regions of genes by data base searches. Of three tested, two appeared to bind AGL15 in vivo (not shown). Selective binding to a subset of potential DNA-binding sites is a theme emerging from genome-wide studies mapping the comprehensive in vivo binding sites of transcription factors (25). Additionally, binding may occur to nonconsensus sites in vivo. Nevertheless, in vitro binding site selection provides valuable information. It is the most convenient method to select preferentially bound sequences from a pool containing virtually all possibilities, and binding affinity, as determined in vitro, has been correlated with in vivo activity (24,34).
Several DTA sequences identified by direct sequencing of a ChIP population were reported in a previous study (17). Whereas DTA2 contains a canonical CArG motif of form CC(A/T)6GG in the 5′ regulatory region (17), other DNA fragments isolated by ChIP do not contain canonical binding sites for MADS domain proteins. Information obtained from the binding site selection experiments reported here has assisted in identification of potential binding sites in DNA fragments isolated by ChIP. The regulatory regions of DTA1 and DTA3, for example, do not contain any canonical CArG motifs. The putative binding site for AGL15 in the regulatory region of DTA1 contains a C-8-G motif. Recent work indicates that this site is involved in the expression of DTA1 in response to AGL15 in vivo. The DNA fragment isolated by ChIP from DTA3 is strongly enriched in the ChIP populations, but the most likely binding site for AGL15 has a form of CC(A/T)6CG, which is not of a form isolated by binding site selection assays. It seems likely that co-factors are involved in binding of AGL15 to DTA3 regulatory regions in vivo. Other DNA fragments isolated by ChIP have a variety of potential binding sites, including CC-6-GG, C-7-GG/CC-7-G, and C-8-G forms. In addition, the information obtained from the experiments described in this report allowed identification of in vivo binding sites for AGL15 within its own regulatory regions. These sites of form C-8-G were previously ignored because they did not fit the canonical type CArG motif. Therefore, information obtained by binding site selection experiments has been useful for discovery of novel direct targets (DTA4), determination of binding sites for AGL15 in DNA fragments selected by ChIP (DTA1 and DTA2), and identification of binding motifs in suspected targets (AGL15).