A comparative analysis of Smad-responsive motifs identifies multiple regulatory inputs for TGF-β transcriptional activation

Smad proteins are transcriptional regulators activated by TGF-β. They are known to bind to two distinct Smad-responsive motifs, namely the Smad-binding element (SBE) (5′-GTCTAGAC-3′) and CAGA motifs (5′-AGCCAGACA-3′ or 5′-TGTCTGGCT-3′). However, the mechanisms by which these motifs promote Smad activity are not fully elucidated. In this study, we performed DNA CASTing, binding assays, ChIP sequencing, and quantitative RT-PCR to dissect the details of Smad binding and function of the SBE and CAGA motifs. We observed a preference for Smad3 to bind CAGA motifs and Smad4 to bind SBE, and that either one SBE or a triple-CAGA motif forms a cis-acting functional half-unit for Smad-dependent transcription activation; combining two half-units allows efficient activation. Unexpectedly, the extent of Smad binding did not directly correlate with the abilities of Smad-binding sequences to induce gene expression. We found that Smad proteins are more tolerant of single bp mutations in the context of the CAGA motifs, with any mutation in the SBE disrupting function. CAGA and CAGA-like motifs but not SBE are widely distributed among stimulus-dependent Smad2/3-binding sites in normal murine mammary gland epithelial cells, and the number of CAGA and CAGA-like motifs correlates with fold-induction of target gene expression by TGF-β. These data, demonstrating Smad responsiveness can be tuned by both sequence and number of repeats, provide a compelling explanation for why CAGA motifs are predominantly used for Smad-dependent transcription activation in vivo.

Transforming growth factor-β (TGF-β) is a pleiotropic cytokine that plays crucial roles in various biological processes through regulation of growth, motility, and differentiation of target cells. TGF-β signals are mediated through two types of transmembrane receptors, type I and type II. Upon ligand binding, they form a heterotetrameric complex, in which the constitutively active type II receptor phosphorylates and thereby activates the type I receptor. The activated type I receptor transmits signals through the Smad signaling pathway and non-Smad signaling pathways (1,2). In the Smad signaling pathway, the activated type I receptor phosphorylates cytoplasmic effector molecules called Smad2 and Smad3. Smad2 and Smad3 subsequently form a heteromeric complex with Smad4 that is translocated into the nucleus, where they positively or negatively regulate target gene expression through association with transcription regulatory genomic regions, together with other transcription factors as well as coactivators and corepressors (3).
The activated Smad complex usually requires DNA-binding transcription factors, also known as Smad cofactors, to successfully regulate target genes because the affinity of Smad proteins for the Smad-binding sequences is thought not to be sufficiently high. Alternatively, Smad proteins alone can activate transcription when multiple Smad-binding sequences are present in tandem (reviewed in Ref. 17). Jonk et al. (11) reported that four copies of the 5′-CAGACA-3′ sequence are sufficient to induce transactivation but that two copies are not. Johnson et al. (18) examined the ability of abutting SBE half-sites to enhance transcription. They found that a reporter construct with three copies of 5′-GTCT-3′ has a higher enhancer activity than one with two copies. Some of these studies, however, only utilized assay systems in which Smad3 and Smad4 are overexpressed. Therefore, they may not reflect the situation of endogenous Smad proteins. At present, the minimal requirements for endogenous Smad-dependent transcription activation remain to be determined.
Here, we compared the functional properties of the SBE and CAGA motifs and examined requirements for Smad-dependent transcription activation. We found that either one SBE or a triple-CAGA motif acts as a cis-acting functional half-unit for transcription activation; they are, by themselves, inactive but the combination of two half-units enables gene expression. Importantly, the abilities of Smad-binding sequences to induce transcription do not reflect their abilities to interact with Smad proteins. In an electrophoretic mobility shift assay, active and inactive DNA sequences yielded band shifts of differing mobilities. We also found that several CAGA-like motifs with single-base replacement are partially active in inducing gene expression, whereas only the complete SBE is active. The number of CAGA and CAGA-like motifs in Smad2/3-binding sites surrounding TGF-β target genes, detected using ChIP-seq analysis of normal murine mammary gland epithelial (NMuMG) cells, correlated well with the fold-induction of target genes after TGF-β stimulation. The functional properties of SBE and CAGA motifs revealed in this study help explain why CAGA motifs are predominantly used to fine-tune Smad-dependent induction of endogenous target genes.

DNA-binding properties of Smad proteins overexpressed in mammalian cells
The DNA-binding specificity of Smad3 and Smad4 was previously analyzed by a cyclic amplification of selected target (CASTing) method using bacterially expressed Smad proteins (5). The consensus Smad-binding sequence was found to be an 8-bp palindromic sequence, 5′-GTCTAGAC-3′ (SBE). We applied this strategy to Smad proteins expressed in mammalian cells with an input of TGF-β signaling. In addition, we used a pool with 32-bp degenerate sequences instead of the pool with 20-bp degenerate sequences that was used to select Smad-binding sequences in the previous report (5).
HEK293T cells were transfected with FLAG-tagged Smad3 and constitutively active TGF-β type I receptor (CA-TβRI) or FLAG-tagged Smad4 alone. Thirty hours later, cells were harvested. A nuclear extract was prepared from the harvested cells for CASTing analysis using anti-FLAG antibody. After five cycles of selection, selected DNA fragments were sequenced with a next-generation sequencer. The top 25 concentrated sequences are shown in Table S1. In addition to one or two copies of SBE, a part of the CAGA motifs (5′-CCAGACA-3′ and its reverse complement 5′-TGTCTGG-3′) was principally concentrated. Thus, we obtained results distinct from those of CASTing analysis using bacterially expressed Smad proteins (5). Notably, recently reported 5-bp GC motifs (16) were less frequently observed. Table 1 summarizes the occurrence of the Smad-binding motifs (SBE, CAGA, and 5-bp GC) in Smad3- as well as Smad4-binding sequences. Although both Smad3 and Smad4 interact with SBE and CAGA motifs, Smad3 prefers to bind CAGA motifs, whereas Smad4 prefers SBE. Occurrence ratios of SBE to CAGA motifs were around 1:1.5 for Smad3 and 5:1 for Smad4. We also noticed that Smad3 and Smad4 have different preferences for bases surrounding SBE (Table 2). Both Smad3 and Smad4 prefer 5′-T-GTCTAGAC-A-3′, but Smad3 exhibits a more pronounced preference than Smad4 (Table 2, blue shadowed). In addition, Smad4 prefers 5′-C-GTCTAGAC-A-3′ or its complementary 5′-T-GTCTAGAC-G-3′ (Table 2, orange shadowed). As for bases surrounding the CAGA complementary motif, Smad3 prefers 5′-G-TGTCTGG-T-3′, but Smad4 does not exhibit a clear preference (data not shown).
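The motif tallies summarized in Table 1 amount to counting motif occurrences across the selected reads and normalizing per 1,000 reads. The following Python sketch illustrates that bookkeeping under stated assumptions: the motif sequences are those given in the text, whereas the function names, the choice to count overlapping matches, and the toy reads are hypothetical and do not represent the analysis pipeline used in this study.

```python
# Motif sequences are taken from the text; all names and reads below are illustrative.
MOTIFS = {
    "SBE": ["GTCTAGAC"],              # palindromic, so one pattern covers both strands
    "CAGA": ["CCAGACA", "TGTCTGG"],   # motif and its reverse complement
    "5-bp GC": ["GGCGC", "GCGCC", "GGCCG", "CGGCC"],
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_occurrences(read: str, pattern: str) -> int:
    """Count occurrences of pattern in read, allowing overlaps (an assumption)."""
    count = 0
    start = read.find(pattern)
    while start != -1:
        count += 1
        start = read.find(pattern, start + 1)
    return count


def motif_frequencies(reads):
    """Return {motif name: (total occurrences, occurrences per 1,000 reads)}."""
    n_reads = len(reads)
    result = {}
    for name, patterns in MOTIFS.items():
        total = sum(count_occurrences(r.upper(), p) for r in reads for p in patterns)
        per_thousand = 1000.0 * total / n_reads if n_reads else 0.0
        result[name] = (total, per_thousand)
    return result


# Hypothetical reads, for illustration only (not data from this study).
toy_reads = ["AAGTCTAGACTT", "TTCCAGACAAA", "ACGTACGT", "TGTCTGGTGTCTGG"]
frequencies = motif_frequencies(toy_reads)
```

Comparing the SBE and CAGA entries of such a tally for Smad3- and Smad4-selected reads gives the kind of occurrence ratio reported above.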

Designing a new series of reporters containing SBE or CAGA motifs
We went on to compare the properties of each element in TGF-β-induced transcription activation, using a new series of
Table 1. Distinct preferences of Smad3 and Smad4 for SBE and CAGA motifs. Binding sequences to Smad3 or Smad4 overexpressed in 293T cells were concentrated from a double-stranded DNA oligonucleotide library containing 32-nucleotide random sequences. Occurrence of SBE (5′-GTCTAGAC-3′), CAGA motifs (5′-CCAGACA-3′ and its complementary 5′-TGTCTGG-3′), and 5-bp GC motifs (5′-GGCGC-3′ and its complementary 5′-GCGCC-3′, 5′-GGCCG-3′ and its complementary 5′-CGGCC-3′) among total reads are shown. In parentheses, occurrence of each motif per 1,000 reads is shown.
A DNA probe with three tandem repeats of the CAGA motif ("3×CAGA" probe) has been shown to interact efficiently with the Smad complex and has been widely used for DNA affinity precipitation (DNAP) assays (19). Because the spacing between the CAGA motifs in the 3×CAGA probe is different from that in CCC-Luc, we also constructed a reporter containing the 3×CAGA sequence, namely 3×CAGA-Luc (Fig. 1A). However, this construct failed to exhibit activity in response to TGF-β stimulation (Fig. 1B). Thus, 3×CAGA can bind to Smad proteins, but it is inactive as a cis-regulatory element.

The abilities of DNA sequences to cis-activate transcription do not reflect their binding affinities to Smad proteins
We synthesized a biotinylated SSS sequence and investigated its binding to Smad proteins. NMuMG cells were stimulated with TGF-β for 1 h before being harvested. Cell lysates were subjected to DNAP assay. SSS precipitated Smad3 as well as Smad4 from cells stimulated with TGF-β (Fig. 2A), and this binding was outcompeted by an excess of nonlabeled SSS but not by its mutant (SmSmSm) (Sm; 5′-CTGATCAG-3′) (Fig. 2B). The DNA probe precipitated only a small fraction of phosphorylated Smad2, but a significant portion of phosphorylated Smad3 (Fig. S1, "unbound and input"), suggesting that it interacts efficiently with the Smad complex composed of phospho-Smad3 and Smad4. Other probes, 3×CAGA and CCC, also precipitated Smad proteins (Fig. 2A), although they lack transcription induction activity.
Intriguingly, Smad proteins bound to biotinylated SSS were outcompeted by 3×CAGA, and those bound to biotinylated 3×CAGA were outcompeted by SSS (Fig. 2C). We also examined serial precipitation using SSS and 3×CAGA probes. After depleting Smad proteins that interact with either the SSS or the 3×CAGA probe, the other probe precipitated barely any more Smad proteins (Fig. 2D).

Properties of Smad-binding motifs
We found that both the active cis-regulatory element SSS and the inactive 3×CAGA or CCC interact in a similar fashion with the Smad complex in DNAP assay. When the Smad complex interacts with the SSS sequence, it activates gene expression, but when it interacts with the 3×CAGA, CCC, or CcCcCc sequence, it fails to activate gene expression. Smad-bound DNA sequences thus affect the ability of the Smad complex to transactivate gene expression.

Distinct properties of SBE and CAGA in relationship to transcription activation
Although three tandem repeats of the CAGA motif (3×CAGA, CCC, or CcCcCc, referred to henceforth as the triple-CAGA motif) were insufficient for TGF-β-induced transcription activation, 9 or 12 repeats of the CAGA motif are widely used as a highly sensitive reporter to monitor TGF-β signaling (9). Next, we asked how many repeats of the CAGA motif or SBE are required for efficient transcription activation. We found that at least five to six CAGA repeats are required for greater than 10-fold activation, whereas two SBEs are enough (Fig. 3A). One SBE is inactive probably because it only has weak affinity for the Smad complex as shown by DNAP assay (Fig. 3B), whereas the triple-CAGA motif is inactive although it interacts efficiently with Smad proteins. Furthermore, more than four tandem repeats of SBE appeared to suppress transcription activation (Fig. 3C and Fig. S2). Although the underlying mechanism remains to be elucidated, similar effects of binding site number on transcription activity have been reported for some yeast transcription factors (20). Similar results were obtained using HaCaT and a human lung adenocarcinoma cell line, A549 (data not shown). Therefore, the CAGA motif is additive in nature but SBE has an optimum number of repeats when acting as a cis-regulatory element.
Next, we designed a series of composite reporter constructs composed of two SBEs and single-CAGA motifs (C or Cc). Substituting SBE with C or Cc affected reporter activities depending on the relative positions and orientations (Fig. 3D). Substituting the 3′-SBE with Cc (SSCc-Luc) enhanced the reporter activity, whereas doing the same with the 5′-SBE (CcSS-Luc) severely compromised it. Substitution with C resulted in weak to moderate inhibition, with C being preferred in the 5′-position. Intriguingly, sequences surrounding the core motifs, 5′-GTCT-3′ or 5′-AGAC-3′, play a substantial role in transcription activation because mutation of either Cc (5′-TGTCTGGCT-3′) to Cc3m (5′-TGTCTTCAG-3′) or C (5′-AGCCAGACA-3′) to C5m (5′-CTGAAGACA-3′) halved the activity level (Fig. 3E). Thus, the presence of the core motifs, 5′-GTCT-3′ or 5′-AGAC-3′, alone is not enough. Instead, the core motifs are enhanced by adjacent sequences in C and Cc. Furthermore, we constructed reporters containing one SBE and two C or Cc motifs. We found that most of these reporters exhibited weak activities (Fig. S3). We also observed a preference for Cc in the 3′-position and for C in the 5′-position in these constructs.
These findings indicate that SBEs and CAGA motifs have distinct properties with respect to transcription activation. We noticed that two SBEs are comparable in their potency to two tandem triple-CAGA motifs in this type of reporter construct, whereas both one SBE and a triple-CAGA motif fail to induce transcription (Fig. 3A). This observation suggests that both a single SBE and a triple-CAGA motif can act as half-units for transcription activation.

A single SBE is interchangeable with a triple-CAGA motif in Smad-dependent transcription activation
Next, we determined whether a single SBE in SS-Luc can be replaced by a triple-CAGA motif. We substituted either the 3′-SBE with one to three Cc motifs or the 5′-SBE with one to three C motifs, while taking positional effects into account. As we anticipated, one SBE can be successfully replaced by three tandem Cc or C elements (Fig. 4A). Reporter activity was maximal when all three individual CAGA motifs within a triple-CAGA motif had the same orientation (Fig. 4B). At least two consecutive motifs must be oriented in the same direction for efficient transcription activation.
The artificial reporters described above lack binding sites for other transcription factors and are possibly driven by Smad transcription factors only. Endogenous Smad proteins, however, often have to cooperate with DNA-binding transcription factors known as Smad cofactors when they transactivate endogenous target genes (reviewed in Ref. 21). Nakano et al. (22) previously reported the construction of an artificial reporter composed of three tandem CAGA motifs and a TCF7L2-binding element (TTE). Because three tandem repeats of either the CAGA motif or TTE alone failed to induce transcription, the reporter's activity appears to depend on the cooperation between Smad proteins and TCF7L2. We modified the reporter construct for further analysis, which is henceforth denoted as CCCT-Luc. We truncated three nucleotides from the 5′ end of the original version, and verified that both elements are required for transcription activation (Fig. 4C). We then reduced the number of CAGA motifs in the CCCT-Luc construct. CCT-Luc exhibited moderate activity, whereas CT-Luc was almost inactive. Next, we replaced the CCC sequence with one SBE (ST-Luc) and determined its activity. As we predicted, ST-Luc exhibited strong transcription activation, similarly to CCCT-Luc (Fig. 4C). Thus, a single SBE appears to be as potent as a triple-CAGA motif in this reporter system. Similar results were obtained using HaCaT cells (data not shown).
Figure 2. Smad-binding properties of transcriptionally active and inactive cis-regulatory elements in DNAP assay. A, DNAP assay using active (SSS) as well as inactive (CCC and 3×CAGA) cis-regulatory elements. DNAP assay was performed with one of three biotinylated probes: SSS, CCC, or 3×CAGA. Total cell lysates (input) and proteins in cell lysates bound or unbound to the probes were analyzed by immunoblotting using the indicated antibodies. α-Tubulin was used as a loading control. The membranes were reprobed for Smad4 and α-tubulin. Smad2 bands are indicated by an asterisk. B, competitive binding assay. DNAP assay was performed in the presence of the indicated amounts of nonbiotinylated probes, SSS or SmSmSm. Phospho-Smad3 (pSmad3), Smad4, Smad2/3, and α-tubulin were detected by immunoblotting. C, cross-competitive binding assay. DNAP assay was performed in the presence of a nonbiotinylated probe, either SSS or 3×CAGA, at the indicated concentration. Phospho-Smad3 (pSmad3), Smad4, and α-tubulin were detected by immunoblotting. D, consecutive DNAP assay. Lysates prepared from NMuMG cells stimulated with TGF-β for 1 h were first subjected to precipitation with either biotinylated 3×CAGA or biotinylated SSS probe. Each resultant supernatant was further subjected to re-precipitation with the respective probe. The whole procedure is summarized (top). Phospho-Smad3 and Smad4 in bound fractions were detected by immunoblotting.
Our hypothesis that a single SBE and a triple-CAGA motif are equally adept at inducing transcription was verified by two independent reporter systems, with or without a Smad cofactor-binding site. These findings indicate that both a single SBE and a triple-CAGA motif form a cis-acting half-unit required for transcription activation. By themselves, they are inactive, but they can cooperate with either Smad-binding or Smad cofactor-binding sequences. Interestingly, a double-CAGA motif appears to be a partially active half-unit. However, the single SBE and the triple-CAGA motif have distinct Smad-binding properties; the former is unable to bind to Smad proteins by itself, whereas the latter can interact efficiently with Smads on its own.
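The half-unit rule described above can be restated as a small toy predicate. The following Python sketch is a deliberately coarse reading of the reporter results and not a model proposed or fitted in this study: the element labels S, C, Cc, and T follow the construct names in the text, whereas the function names and the scoring rule are hypothetical. The sketch ignores partial activity (for example, the double-CAGA motif in CCT-Luc), orientation and position effects, and the suppression seen with more than four tandem SBEs.

```python
def count_units(elements):
    """Count Smad half-units and cofactor sites in a 5'-to-3' list of elements.

    Assumed scoring (hypothetical): one SBE ("S") or three consecutive CAGA
    motifs ("C" or "Cc") count as one Smad half-unit; "T" is a cofactor site.
    """
    smad_half_units = 0
    cofactor_sites = 0
    caga_run = 0
    for element in elements:
        if element in ("C", "Cc"):
            caga_run += 1
            if caga_run == 3:          # a complete triple-CAGA motif
                smad_half_units += 1
                caga_run = 0
        elif element == "S":
            caga_run = 0
            smad_half_units += 1
        elif element == "T":
            caga_run = 0
            cofactor_sites += 1
        else:
            raise ValueError(f"unknown element: {element!r}")
    return smad_half_units, cofactor_sites


def predicted_active(elements):
    """Toy rule: two Smad half-units, or one half-unit plus a cofactor site."""
    smad_half_units, cofactor_sites = count_units(elements)
    return smad_half_units >= 2 or (smad_half_units >= 1 and cofactor_sites >= 1)


# Constructs named in the text, written as element lists.
ss_luc = ["S", "S"]
sccccc_luc = ["S", "Cc", "Cc", "Cc"]
ccct_luc = ["C", "C", "C", "T"]
st_luc = ["S", "T"]
```

Under this rule a single S or a lone triple-CAGA motif is scored as inactive, and cofactor sites alone are never sufficient, which matches the qualitative pattern reported for these reporters but none of the graded differences between them.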

Transcriptionally active and inactive Smad-binding sequences are bound by distinct Smad complexes
The next question we posed concerned the crucial difference between active and inactive Smad-binding sequences. To determine the nature of this difference, we measured the binding of Smad proteins to both active and inactive sequences using an electrophoretic mobility shift assay (EMSA) with fluorescent probes: inactive S, 1×CAGA, and 3×CAGA, and active SSS.
We prepared and used nuclear extracts from 293T cells overexpressing full-length Smad3 and Smad4 with CA-TβRI, and extracts from mock-transfected cells as a control (Fig. 5A). Inactive 1×CAGA and 3×CAGA as well as active SSS yielded shifted bands. By contrast, S did not yield shifted bands although it did shift weakly when either bacterially expressed Smad3 or Smad4 MH1 domain was used (Fig. S4). Importantly, compared with inactive 1×CAGA and 3×CAGA, active SSS yielded a slowly migrating shifted band, which appears to represent a DNA probe bound to transcriptionally active Smad complex. Furthermore, we verified the compositions of Smad proteins in the shifted bands by incubating the gel-shift mixture with either anti-Smad2/3 or anti-Smad4 antibodies (Fig. 5B). Incubation with normal IgG did not affect the SSS-shifted band. Incubation with either anti-Smad2/3 or anti-Smad4 resulted in a supershift of the SSS-shifted band, indicating that the SSS-shifted band is composed of Smad3 and Smad4. Similar results were obtained when we used 3×CAGA as a probe. Notably, the SSS-shifted band was outcompeted with nonlabeled 3×CAGA probe and vice versa (Fig. 5C). These results are consistent with those from DNAP assays and again suggest that both probes interact with mostly overlapping populations of Smad proteins. Thus, Smad complexes are likely to be converted to the transcriptionally active form upon interaction with SSS but not with 3×CAGA.

Usage of CAGA and CAGA-like motifs in TGF-β-induced transcription activation of endogenous target genes
Our final aim was to determine how the properties of SBE and CAGA are reflected in transcriptional regulation of endogenous TGF-β target genes. Therefore, we examined the occurrence of SBE as well as CAGA motifs in Smad2/3-binding sites surrounding TGF-β target genes. NMuMG cells were stimulated with TGF-β for 1.5 h, harvested, and subjected to determination of Smad2/3-binding sites by ChIP-seq (GSE121254). We detected 5,847 peaks throughout the genome.
Figure 4. A single SBE and a triple-CAGA motif are interchangeable in reporter constructs. A-C, luciferase reporter activities in response to TGF-β stimulation. Luciferase activity was measured and data are shown as fold-induction. Error bars represent S.D. from three experimental replicates. One representative result from three independent experiments is shown. A, one SBE in SS-Luc was replaced by anywhere from one to three tandem CAGA (C) or CAGA complementary (Cc) motifs. B, the CcCcCc sequence in SCcCcCc-Luc was in part replaced by a CAGA motif (C). Two or three consecutive motifs oriented in the same direction are underlined. C, the CCC sequence in a cofactor (TCF7L2)-dependent reporter construct, CCCT-Luc, was replaced by SBE, C, or CC.
The CAGA motif was reported as a 9-bp element, 5′-AG(C/A)CAGACA-3′ (or its complementary 5′-TGTCTG(G/T)CT-3′). However, the effects of base replacement have not been investigated thus far. We utilized SCcCcCc-Luc as a platform, replaced one of the bases with each of the others, and measured the corresponding reporter activities (Table 3). The last two positions affected the activity less, whereas the other positions significantly affected the activity. In addition to 5′-TGTCTGG-3′ (Cc) and 5′-TGTCTGT-3′ (Cc-G7T), which were originally reported (9), we selected 5′-TGTCTAG-3′ (Cc-G6A), 5′-TGACTGG-3′ (Cc-T3A), and 5′-CGTCTGG-3′ (Cc-T1C) as active CAGA-like motifs (with activities more than one-third of that of the original motif). Similarly, the base requirement in SBE was determined using SS-Luc as a platform. However, any base substitution resulted in a drastic drop in activity to less than 10% (Table S2), indicating that the base requirement in SBE is strict. Therefore, we only examined the occurrence of the complete SBE sequence.
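The variant names used above (for example, Cc-G7T) encode the original base, its 1-based position within the 7-mer 5′-TGTCTGG-3′, and the replacement base. As a minimal sketch of that naming scheme, the following Python snippet enumerates every single-base substitution of the 7-mer; the function name and the `prefix` parameter are hypothetical, and the snippet says nothing about reporter activity, which was measured experimentally (Table 3).

```python
def single_base_variants(motif: str, prefix: str = "Cc") -> dict:
    """Map names such as 'Cc-G7T' to the corresponding substituted sequence."""
    variants = {}
    for index, original in enumerate(motif):
        for replacement in "ACGT":
            if replacement == original:
                continue
            # Name format: prefix-<original base><1-based position><new base>
            name = f"{prefix}-{original}{index + 1}{replacement}"
            variants[name] = motif[:index] + replacement + motif[index + 1:]
    return variants


cc_variants = single_base_variants("TGTCTGG")

# The four CAGA-like motifs named in the text, looked up by name.
named_in_text = {name: cc_variants[name]
                 for name in ("Cc-G7T", "Cc-G6A", "Cc-T3A", "Cc-T1C")}
```

A 7-mer has 21 such variants (7 positions, 3 alternative bases each), so a lookup table of this kind is small enough to inspect by eye.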
Among 5,847 Smad2/3-binding sites, 3,818 had at least one SBE, CAGA, or CAGA-like motif in the 1,000-bp genomic sequence flanking the middle position of the Smad2/3-binding site. The incidence of each element is summarized in Table S3. SBE was not frequently observed in Smad2/3-binding sites, whereas the CAGA and CAGA-like motifs, especially Cc-G7T, as well as 5-bp GC motifs were commonly found.
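The window-based motif search described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used in this study: the 1,000-bp window is assumed to be centered on the peak midpoint (500 bp on each side), both strands are scanned, overlapping matches are counted, and all function names and the toy sequence are hypothetical. The motif sequences are those given in the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Motifs as written in the text (CAGA motifs in the Cc orientation).
MOTIFS = {
    "SBE": "GTCTAGAC",
    "Cc": "TGTCTGG",
    "Cc-G7T": "TGTCTGT",
    "Cc-G6A": "TGTCTAG",
    "Cc-T3A": "TGACTGG",
    "Cc-T1C": "CGTCTGG",
}


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_both_strands(seq: str, motif: str) -> int:
    """Count matches to the motif on either strand (overlaps allowed)."""
    # A set collapses palindromic motifs such as SBE into a single pattern.
    patterns = {motif, reverse_complement(motif)}
    total = 0
    for pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            total += 1
            start = seq.find(pattern, start + 1)
    return total


def motifs_near_midpoint(chrom_seq: str, peak_start: int, peak_end: int,
                         width: int = 1000) -> dict:
    """Count each motif in a window of `width` bp centered on the peak midpoint."""
    midpoint = (peak_start + peak_end) // 2
    lo = max(0, midpoint - width // 2)
    hi = min(len(chrom_seq), midpoint + width // 2)
    window = chrom_seq[lo:hi].upper()
    return {name: count_both_strands(window, motif) for name, motif in MOTIFS.items()}


# Hypothetical sequence for illustration only: two motifs embedded in filler.
toy_chrom = "C" * 600 + "TGTCTGT" + "C" * 100 + "GTCTAGAC" + "C" * 2000
near = motifs_near_midpoint(toy_chrom, 500, 900)
far = motifs_near_midpoint(toy_chrom, 2500, 2600)
```

A site would then be classified as containing a known Smad-binding element if any of the returned counts is non-zero.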
We next examined the roles of these elements in regulating target gene expression. We previously performed DNA microarray analysis of TGF-β target genes in NMuMG cells (23) (GSE46405). Seventy-nine genes were significantly expressed (log2 intensity > 7) and induced by TGF-β stimulation (1 h) more than 1.6-fold (Table S4). Among these, 67 genes had a total of 239 Smad2/3-binding sites located within 50 kb upstream of the annotated transcription start site or downstream of the transcription termination site. We selected 15 representative target genes with various numbers of Smad-binding motifs, and determined their expression after TGF-β stimulation for 1.5, 3, and 8 h (Fig. 6A and Fig. S5). We observed a positive correlation (R around 0.7) between the number of CAGA and CAGA-like motifs and fold-induction after TGF-β stimulation (3 or 8 h), which correlated less well with the number of CAGA motifs only, and not with that of 5-bp GC motifs (Fig. 6, B and C). Thus, CAGA and CAGA-like motifs appear to play a central role in Smad-dependent transcription activation in vivo, and SBE is only rarely utilized although Smad proteins have a high intrinsic affinity for this sequence motif.
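The selection thresholds and the correlation described above can be expressed compactly in Python. The sketch below assumes that R denotes the Pearson correlation coefficient, which the text does not state explicitly; the thresholds (log2 intensity > 7, induction > 1.6-fold) are taken from the text, whereas the function names are hypothetical and no data from this study are reproduced.

```python
import math


def is_induced_target(log2_intensity: float, fold_induction: float) -> bool:
    """Apply the expression and induction thresholds quoted in the text."""
    return log2_intensity > 7 and fold_induction > 1.6


def pearson_r(xs, ys) -> float:
    """Pearson correlation between paired values, e.g. motif counts and fold-induction."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```

In use, `xs` would hold the number of CAGA and CAGA-like motifs per gene and `ys` the measured fold-induction at a given time point, with one pair per target gene.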

Discussion
TGF-β signaling is principally transmitted by Smad2 and Smad3, which are phosphorylated by TGF-β type I receptor. Smad2 and Smad3 phosphorylation leads to formation of a trimeric complex with Smad4 (co-Smad) to cooperatively regulate target gene expression in the nucleus. The DNA-binding affinity of the activated Smad complex is not high enough to support gene expression by itself, and additional DNA-binding transcription factors (Smad cofactors) are usually required for stable association of the Smad complex with the regulatory regions of target genes. Alternatively, when multiple Smad-binding sequences are present, Smad proteins do not require Smad cofactors (17).
Thus far, several DNA motifs have been identified as Smad-binding sequences. Two prominent examples of such motifs are 5′-GTCTAGAC-3′ (SBE) and 5′-AG(C/A)CAGACA-3′ or its complementary 5′-TGTCTG(G/T)CT-3′ (CAGA motifs). These sequences share the 5′-GTCT-3′ motif or its complementary 5′-AGAC-3′ motif. However, functional properties of these two motifs have not been examined in depth. In addition, the reason why CAGA motifs are preferentially utilized in regulation of endogenous gene expression (Ref. 13 and this study) has remained to be clarified.

Smad-binding to a triple-CAGA motif is not sufficient to induce gene expression
Because Smad2 and Smad3 have the ability to transactivate gene expression when fused to the Gal4 DNA-binding domain (24), it appeared likely that binding to the target gene promoters would be sufficient to activate gene expression. Johnson et al. (18) also reported that binding of Smad proteins to cis-regulatory elements is well correlated with transactivation. By contrast, Yingling et al. (25) described a discrepancy between Smad-binding and Smad-dependent transcription activation and proposed that Smad-binding sites may function only in the appropriate promoter context. In addition, Labbé et al. (26) found that both Smad2 and Smad3 form a stable complex with Smad4 and FoxH1 on the goosecoid promoter; however, the same group also found that only the complex containing Smad2 is active.
In this study, we observed that a reporter construct containing 3×CAGA failed to respond to TGF-β stimulation although 3×CAGA interacts efficiently with Smad proteins (Figs. 1B and 2A). By contrast, a single SBE failed to interact with Smad and to induce gene expression. Just two tandem SBE repeats are sufficient for both Smad-binding and transcription activation (Fig. 3, A and B). Zawel et al. (5) previously reported that "2×SBE" fails to induce transcription in response to TGF-β stimulation. Consistently, we found that the Smad-binding affinity of 2×SBE is much weaker than that of SS (Fig. S6). A 2×SBE has four bases (5′-GGCA-3′) between two SBEs, whereas SS has three bases (5′-AAT-3′) (Table S6). Both the number and sequence of nucleotides between two SBEs appear to affect their activities. Therefore, there is a discrepancy between DNA-binding affinity and transcription activation in the case of CAGA-containing sequences.
We also investigated the minimum requirement for CAGA motif-dependent transcription activation and found that at least five or six repeats are required for efficient activation. Song et al. (10) examined the potency of tandem repeats of the sequence 5′-AGACAAGGTTGT-3′, which is derived from the human PAI-1 promoter region (−720/−708). Four repeats of the sequence induced reporter activity 4-fold, whereas six repeats induced reporter activity 40-fold following TGF-β stimulation. Although the sequence is different from the typical CAGA motif, the core 5′-AGAC-3′ motif is shared. The sequence therefore appears to behave similarly to the CAGA motif.

Molecular architecture of transcriptionally active Smad complex
From the results described above, we noticed that both a single SBE and the triple-CAGA motif behave similarly in transcription activation, although the latter can interact with Smad proteins, whereas the former fails to do so (Fig. 7A). Because neither is sufficient to mediate Smad-dependent transcription by itself, we propose that a single SBE and the triple-CAGA motif are transcriptionally inactive half-units that can induce Smad-dependent transcription activation when they are combined with another half-unit or possibly, a Smad cofactor site(s).
The next important question concerns the critical differences between Smad complexes formed on inactive and active cis-regulatory elements. Active SSS and inactive 3×CAGA sequences exhibited similar behavior in DNAP assays (Fig. 2, C and D), but they behaved differently in EMSA (Fig. 5A). Notably, both bands were outcompeted by either SSS or 3×CAGA, indicating that DNA sequences apparently do not distinguish between active and inactive complexes. There are two possibilities to explain the difference between active and inactive complexes.
One possibility is that Smad complexes with the same composition bind to both 3×CAGA and SSS, but only those that bind to the latter can adopt the specific conformation that allows them to recruit other molecular component(s) required for efficient transcription activation (Fig. 7B). It was previously reported that p300, a histone acetyltransferase, and SMARCA4/Brg1, a component of the SWI/SNF chromatin remodeling complex, must be recruited to the Smad complex to modify chromatin templates, a requirement for the Smad complex to assemble basic transcription machinery (27). Thus far, p300 is known to interact with the Smad complex bound to the inactive 3×CAGA sequence (19). We also confirmed this ourselves (data not shown). Other indispensable components may be excluded from the Smad complex formed on inactive sequences.
Figure 7. Models for Smad-dependent transcription activation. A, properties of DNA sequences containing SBE and/or CAGA motifs. S or CCC/CcCcCc does not independently induce transcription activation, but their composite sequence exhibits activity comparable with those of SS and C6/Cc6. Binding of the C6/Cc6 sequence to Smads was confirmed by EMSA (data not shown). B, a model in which SSS is able to recruit additional component(s) required for efficient transcription activation. C, another model in which active sequences (SSS and C6/Cc6) interact with 2 units of activated Smad complexes for efficient transcription activation. Note that p300 is activated by transcription factor dimerization (29). D, a model for Smad-cofactor-dependent activation of TGF-β target genes, modified from Ref. 22.

Another possible explanation is that binding of two Smad complexes is required for transcription activation, whereas binding of one Smad complex is not enough; the activated Smad complex may be a trans-acting functional half-unit (Fig. 7C). This model is consistent with our finding that both a single SBE and the triple-CAGA motif are cis-acting half-units and that at least two of them are required for transcription activation. The model is also reminiscent of Smad-cofactor-dependent transcription activation (Fig. 7D). In addition, transcription factors generally function in a combinatorial manner to regulate gene expression (28) and p300 is activated by transcription factor dimerization (29). All these arguments are in line with the second model. However, there is currently no direct experimental evidence to support it. Further investigation is required to resolve this important question.

Distinct functional properties of CAGA and SBE
Both Smad3 and Smad4 interact well with SBE (5′-GTCTAGAC-3′) although Smad3 prefers CAGA motifs. Nevertheless, CAGA motifs, but not SBE, are widely used for Smad-dependent transcription activation. The reason underlying the preference for the CAGA motif remains to be elucidated. However, the distinct functional properties of SBE and CAGA that we uncovered in this study may help explain this preference for the CAGA motif.
First, DNA-binding appears to be sufficient for SBE-dependent transcription activation, whereas additional steps after DNA-binding are required for CAGA-dependent transcription activation, suggesting that the latter system has more points for functional regulation. Second, the effect of SBE on transcription activation is additive for up to three SBE units but suppressive beyond that number. By contrast, CAGA motifs have an additive effect probably up to 9-12 units. Third, the sequence requirement for SBE to behave as a cis-regulatory element is very stringent. SBE thus has an "all-or-none" property. By contrast, CAGA motifs are more flexible and some base replacements are permitted without drastic loss of activity. This may be because SBE is composed only of the core sequences, 5′-GTCT-3′ and its complementary 5′-AGAC-3′. It is noteworthy that the requirement for the core 5′-GTCT-3′ was strict even in CAGA-like motifs (Table 3), except for the second position, which was shown to be less involved in the contact with Smad proteins (6-8). Because multiple CAGA motifs exhibit additive effects and because partially active CAGA-like motifs also exist, combining CAGA and CAGA-like motifs would enable the regulation of target gene expression to be fine-tuned.

CAGA and CAGA-like motifs in regulating endogenous gene expression
We noticed that some of the Smad2/3-binding sites lack known Smad-binding elements in 1,000-bp length genomic sequences flanking the middle position. It is possible that Smad2 and Smad3 associate with these sites either indirectly through other DNA-binding transcription factors, as previously reported for Mixer, GATA-3, and Sox9 (30–32), or directly through as yet unidentified Smad-binding sequences.
The motif that we observed most frequently in Smad2/3-binding sites surrounding these early target genes (239 sites) was Cc-G7T (134), followed by the CAGA motif (119) (Table S4). Although the Cc-G7T motif exhibits lower potency to activate transcription, it appears to be widely used as a regulatory element. These CAGA/CAGA-like motifs are clustered in some cases, but dispersed or scattered in others. In artificial reporter constructs, the location and orientation of the CAGA motif considerably affect transcriptional activities (Figs. 3D and 4B), probably because of local constraints in the context of the luciferase constructs. It remains unclear how the arrangement of Smad-binding elements in natural promoters affects gene expression. However, within the genomic context, the location and direction of CAGA-like motifs may affect transcriptional activities to a lesser degree because they are placed with sufficient spaces, which enables favorable configuration of each element through looping of DNA.
Notably, the numbers of CAGA/CAGA-like motifs are significantly correlated with fold-induction of target genes at 3 or 8 h, but not 1.5 h, after stimulation. Some of the Smad2/3-binding sites do not appear to regulate transcription during the early phase after stimulation. This can be explained by our finding that Smad binding to DNA via CAGA/CAGA-like motifs is not sufficient for transcription activation. At the later phase, Smad complexes regulate gene expression possibly in cooperation with Smad cofactor(s) that are induced by TGF-β, which is known as the "self-enabling" gene response (33). Alternatively, Smad proteins bound to genomic DNA induce chromatin remodeling in cooperation with SMARCA4/Brg1, which enables expression of some target genes (34). Requirement for such a sequence of events is consistent with our previous observation that Smad binding early after TGF-β stimulation is significantly correlated with later changes in the induction of target genes (13).

Conclusion
A number of recent studies have used ChIP-seq analysis to examine cellular responses induced by TGF-β. However, the molecular events leading to transcription activation, including chromatin modifications, still remain elusive. In this study, utilizing luciferase reporter model systems, we found distinct functional properties of two representative Smad-binding sequences, SBE and CAGA motifs, which explain why CAGA/CAGA-like motifs are widely used for Smad-dependent transcriptional regulation in vivo. In addition, we presented basic data on the minimal requirement for Smad-dependent transcription activation, namely that two half-units together enable successful activation. Our study has thus provided valuable information to facilitate a better understanding of how Smad proteins are activated to induce the expression of genes that play crucial roles in development and maintenance of the architecture of multicellular organisms.

Experimental procedures
Cell culture
HEK293T and NMuMG cells were obtained from the American Type Culture Collection and maintained in Dulbecco's modified Eagle's medium containing 10% fetal bovine serum (FBS), 50 units/ml of penicillin, and 50 μg/ml of streptomycin. For culturing NMuMG cells, the culture medium was supplemented with insulin (10 μg/ml).

Cyclic amplification of selected target (CASTing) method
Determination of Smad-binding DNA sequences was performed as described previously (37) with modifications. An oligonucleotide library containing 32 random nucleotides flanked by sequences for PCR amplification (5′-ACTGTACACTGGCGTCGTTC-(N32)-CTGCTCGAGCGTTCGGATCC-3′) was synthesized and used for screening. HEK293T cells were transiently transfected with FLAG-tagged Smad3 and CA-TβRI or Smad4 alone using PEI Max transfection reagent (Polysciences). Thirty hours later, cells were harvested and a nuclear extract was prepared using the NucBuster Protein Extraction Kit (Millipore), followed by CASTing analysis using anti-FLAG M2 magnetic beads. A hundred micrograms of protein was used in each round of selection. Taq polymerase (Takara Bio) was used for DNA amplification. After five rounds of selection, amplified DNA fragments were subjected to sequencing using Ion Torrent and Ion Plus Fragment Library Kit (Life Technologies). Collected sequences were clustered using Aptamer Clustering_2.6.1 (Life Technologies). Raw data are deposited in the DDBJ BioProject database (DRA007794).
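As an illustration only, recovering the random region from a sequenced read can be sketched in Python as follows. This is not the software used in this study (clustering was done with the vendor tool named above); the function name `extract_insert` is hypothetical, and the left flank is written on the assumption that the hyphen printed inside it is a line-break artifact rather than part of the oligonucleotide design.

```python
import re

# Flanking PCR sequences from the library design described above.
# Assumption: the left flank is one contiguous 20-nt sequence.
LEFT_FLANK = "ACTGTACACTGGCGTCGTTC"
RIGHT_FLANK = "CTGCTCGAGCGTTCGGATCC"
INSERT_LENGTH = 32  # length of the random region (N32)

# Capture exactly INSERT_LENGTH unambiguous bases between the two flanks.
_INSERT_PATTERN = re.compile(
    "%s([ACGT]{%d})%s" % (LEFT_FLANK, INSERT_LENGTH, RIGHT_FLANK)
)


def extract_insert(read):
    """Return the 32-nt random region of a read, or None if it is absent.

    Hypothetical helper: reads lacking either flank, or with an insert of
    the wrong length, are rejected rather than repaired.
    """
    match = _INSERT_PATTERN.search(read.upper())
    return match.group(1) if match else None
```

Requiring both flanks and an exact insert length is a deliberately strict choice; a real analysis might tolerate sequencing errors in the flanks or inserts shortened by PCR.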

Luciferase assay
Luciferase assay was performed as described previously (36) using TGF-β-responsive reporters, (CAGA)12-MLP-Luc (9), 4×SBE-Luc (5), and other constructs. In brief, NMuMG cells were transfected with the indicated luciferase reporter constructs and pRL-TK vector using PEI Max transfection reagent (Polysciences), stimulated with TGF-β (1 ng/ml) for 18 h, and harvested. Luciferase activities in the lysates were measured using the dual-luciferase reporter system (Promega) and a luminometer (Spectra Max L, Molecular Devices). Values were normalized to those measured for Renilla luciferase under the control of the thymidine kinase promoter. Activities are shown as fold-induction by TGF-β. All the reporter constructs used in this study have the pGL4-MLP backbone (pGL4 vector containing the TATA box and the initiator sequence of the adenovirus major late promoter). Oligonucleotides for artificial reporter constructs were synthesized with flanking KpnI and XhoI sites, and inserted into pGL4-MLP. Sequences for artificial reporter constructs are listed in Table S5.
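The normalization and fold-induction calculation described above can be sketched in Python as follows. This is a minimal illustration, not the analysis script used in this study: the function names are hypothetical, and it assumes that fold-induction is the ratio of Renilla-normalized activity with TGF-β to that without.

```python
def normalized_activity(firefly, renilla):
    """Firefly reading divided by the Renilla (pRL-TK) reading of the same lysate."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_induction(stimulated, unstimulated):
    """Ratio of normalized activity with TGF-beta to that without.

    Each argument is a (firefly, renilla) pair of raw luminometer readings.
    Assumption: fold-induction is stimulated / unstimulated after
    normalization to Renilla.
    """
    baseline = normalized_activity(*unstimulated)
    if baseline == 0:
        raise ValueError("unstimulated activity is zero; fold-induction undefined")
    return normalized_activity(*stimulated) / baseline
```

For example, with made-up readings of (1200, 100) after stimulation and (300, 150) without, the normalized activities are 12 and 2, giving a 6-fold induction.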

DNAP
NMuMG cells were stimulated with TGF-β (1 ng/ml) for 1 h and harvested. Cell lysates were prepared with a buffer containing 1% Nonidet P-40, 20 mM Tris-HCl (pH 7.4), 150 mM NaCl, 5 mM EDTA, protease inhibitor mixture (Nacalai Tesque), and phosphatase inhibitor mixture (Nacalai Tesque). DNAP assay was performed as described previously (19) with modifications. Dynabeads M280 streptavidin (10 μl, Life Technologies) were incubated with biotinylated DNAs (30 pmol) for 15 min, followed by washing. Lysates (350 μg protein) were added to DNA-bound Dynabeads in the presence of salmon sperm DNA (80 μg/ml) and incubated overnight at 4°C. For competition experiments, lysates were preincubated with competitor oligonucleotides for 15 min. Dynabeads were then washed, and bound proteins were separated by SDS-PAGE and detected by immunoblotting (36). Sequences for biotinylated probes are listed in Table S6.

EMSA
Alexa 633-labeled probes were purchased from Life Technologies. Probe sequences are listed in Table S6. Nuclear extracts were prepared from 293T cells transfected with expression plasmids for FLAG-Smad3, FLAG-Smad4, and CA-TβRI as described previously (38). Nuclear extracts (40 μg protein) were incubated with probes (4 pmol) for 30 min on ice. Complexes were then resolved on a 4.5% polyacrylamide gel and analyzed by ImageQuant LAS 4000 (GE Healthcare). For supershift assay, antibodies (0.8 μg) and nuclear extracts were preincubated for 1 h on ice.

ChIP-sequencing
Chromatin isolation, sonication, and immunoprecipitation using antibody against Smad2/3 were performed as described previously (13). NMuMG cells were stimulated with TGF-β (1 ng/ml) for 1.5 h in the presence of 10% FBS and harvested. High-throughput sequencing of the ChIP fragments was performed using Illumina Genome Analyzer IIx (Illumina). Unfiltered 36-bp sequence reads were aligned against the mouse reference genome (mm9) using Bowtie2. Peaks were called using CisGenome version 2 (two-sample analysis with an FDR cutoff of 0.05) (39). For SBE and CAGA/CAGA-like motif mapping, 1,000-bp length genomic sequences flanking the middle position of the Smad2/3-binding sites were obtained using CisGenome. Raw ChIP-seq and peak-called data are available at the Gene Expression Omnibus (GSE121254).
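The motif-mapping step can be illustrated with a short Python sketch that scans a sequence for the canonical SBE and CAGA motifs on both strands. It is a simplified stand-in, not the CisGenome-based procedure used in this study: the helper names are hypothetical, only exact matches to 5′-GTCTAGAC-3′ and 5′-AGCCAGACA-3′ are reported, and CAGA-like variants (Table 3) are not included.

```python
SBE = "GTCTAGAC"    # palindromic: identical to its reverse complement
CAGA = "AGCCAGACA"  # reverse complement is TGTCTGGCT

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(sequence, motif):
    """Return sorted (start, strand) pairs for exact motif matches.

    start is the 0-based position on the forward strand; strand is "+"
    for the motif as written and "-" for its reverse complement.
    Palindromic motifs are reported once, on the "+" strand.
    """
    sequence = sequence.upper()
    rc = reverse_complement(motif)
    patterns = [("+", motif)]
    if rc != motif:
        patterns.append(("-", rc))
    hits = []
    for strand, pattern in patterns:
        start = sequence.find(pattern)
        while start != -1:
            hits.append((start, strand))
            # Advance by one base so overlapping matches are also found.
            start = sequence.find(pattern, start + 1)
    return sorted(hits)
```

Applied to each 1,000-bp window, `len(find_motif(window, CAGA))` would give the per-site count of canonical CAGA motifs in this simplified setting.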

Quantitative RT-PCR
Total RNA was extracted using ISOGEN (NIPPON GENE). First-strand cDNA synthesis was performed using PrimeScript 1st Strand cDNA Synthesis Kit and random hexamers (Takara Bio). Quantitative RT-PCR was performed using the StepOne Plus Real-Time PCR System and Power SYBR Green Master Mix (Thermo Fisher Scientific). Mouse Gapdh was used for normalization. Primer sequences are shown in Table S7.
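One common way to normalize to a reference gene is the comparative Ct (2^-ΔΔCt) calculation, sketched below in Python. The text above states only that Gapdh was used for normalization, so the use of this particular formula is an assumption, as are the function name and the premise of roughly 100% amplification efficiency for both primer pairs.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene relative to a control sample.

    Assumption: comparative Ct method with the reference gene (here Gapdh)
    amplified at the same efficiency as the target, so that one cycle
    corresponds to a two-fold difference in template.
    """
    delta_sample = ct_target - ct_reference
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)
```

With illustrative values, a target Ct of 22 against a Gapdh Ct of 18 after stimulation, compared with 25 against 18 in the unstimulated control, corresponds to an 8-fold induction.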