Structural Analysis of the Cancer-specific Promoter in Mesothelin and in Other Genes Overexpressed in Cancers*

Mesothelin (MSLN) may be the most “dramatic” of the tumor markers, being strongly overexpressed in nearly one-third of human malignancies. The biochemical cause is unclear. We previously ascribed this cancer-specific overexpression to an element, Canscript, residing around 50 bp 5′ of the transcription start site in cancer (Hucl, T., Brody, J. R., Gallmeier, E., Iacobuzio-Donahue, C. A., Farrance, I. K., and Kern, S. E. (2007) Cancer Res. 67, 9055–9065). Herein, we found a Canscript promoter activity elevated over 100-fold in cancer cells. In addition to a highly conserved TEAD1 (TEA domain family member 1)-binding MCAT motif, nucleotide substitution revealed the consensus core sequence (WCYCCACCC) of an SP1-like motif in Canscript. The unknown transcription factor binding to the SP1-like motif may hold the key for the cancer specificity of Canscript. SP1, GLI1, and RUNX1, -2, and -3 appeared unlikely to be the direct transcription factors acting at the SP1-like motif, but KLF6 had some features of such a candidate. YAP1, a TEAD1-binding protein, appeared necessary, but not sufficient, for Canscript activity; knockdown of YAP1 by small interfering RNAs greatly reduced MSLN levels in MSLN-overexpressing cells, but overexpressing YAP1 in MSLN-negative cells did not induce MSLN expression. Canscript-like sequences were found in other genes up-regulated in pancreatic cancer; reporters driven by the sequences from FXYD3, MUC1, and TIMP1 had activities more than 2 times that of the control. This suggested that the cause of MSLN overexpression might also contribute mechanistically to the overexpression of other tumor markers.

Tumor-overexpressed and tumor-specific biomarkers could serve as sensitive diagnostic tools and specific therapeutic targets. Deciphering the differential transcriptional regulation of tumor markers in cancers may further reveal the pathways that drive the origin and malignancy of cancers. There have been a few advanced attempts to discover cancer-related transcription factors by studying promoters of tumor markers. For example, RREB-1 (Ras-responsive transcriptional element-binding protein) was proposed as an up-regulator of calcitonin in thyroid cancer cells (2). Transcriptional studies of some widely applied tumor markers, such as AFP (3) and CEA (4), stalled at the promoter-identification stage, whereas for most other markers, such as CA125 and CA19-9, exact promoters have not been identified.
As a cancer cell membrane marker having very limited expression in normal tissues, MSLN is an attractive therapeutic target. For example, an anti-MSLN monoclonal antibody was covalently linked to Pseudomonas exotoxin A to form the toxic antibody SS1P (13), which was recently explored in two phase I clinical trials (14,15).
Initial studies of the MSLN promoter were inconclusive (16). An exon/intron map of the MSLN gene was created by aligning a single clone of MSLN cDNA (HS335H7) with fragments of genomic sequence. An arbitrarily chosen 1850-bp genomic DNA fragment (MP1850) 5′ of the predicted exon 1 (Fig. 1A) was studied in the JU77 mesothelioma cell line. To further define MSLN-specific promoter and enhancer elements, we reported the differing control sequences active in (epithelial) carcinomas (1).
In carcinomas, exon 1 (1) was much longer than predicted (16), providing evidence that MSLN utilized different promoters or TSSs in cancers. We reported a 20-bp sequence (Canscript, TCTCCACCCACACATTCCTG) that appeared responsible for MSLN expression in certain cancer cells (1) (according to Fig. 4A of Hucl T. et al. (1), the text description of the Canscript sequence on page 9059 should be −65 to −46). The reporter with the full promoter region containing the Canscript sequence (pGL3-B67; Fig. 1B) was 20 times as active as the matched reporter lacking Canscript (pGL3-B-48; Fig. 1B) in the MSLN-overexpressing AsPc1 cell line (supplemental Figs. S1 and 2A) (1), but both reporters had similar activities in the MSLN-negative RKO cancer cell line and the 211H mesothelioma cell line (Fig. 2A). Canscript contains two motifs: an SP1-like element (TCTCCACCC) and an MCAT element (ACATTCCT) separated by a linker (AC) (Fig. 1A). The MCAT element of Canscript matches the conventional MCAT sequence (CATTCCT) found from humans to chickens (17). We prefer to specify an extra 5′ A in the MCAT element because this A is part of MCAT in promoters regulating many other genes (18,19). We reported that the Canscript MCAT element (Fig. 1A) specifically recruited the TEAD1 transcription factor but not TEAD2 and TEAD3 in cancer cells (1). Because TEAD1 is ubiquitously expressed in almost all of the tissues and cell lines (supplemental Fig. S1), the MCAT element is not suspected to provide the cancer specificity of Canscript.
Possibly, a TEAD1-binding cofactor could provide this specificity. Among known TEAD1-binding proteins, YAP1 (supplemental Fig. S1), an oncogenic transcription cofactor (20), was a candidate. YAP1 is phosphorylated and inactivated by the tumor-suppressive Hippo pathway. Overexpression of YAP1 or its Drosophila homolog Yki has been shown to result in tumor-like overgrowth of tissues in the mouse or fruit fly (21).
Alternatively, a transcription factor binding to the SP1-like element may provide the cancer specificity. SP1 itself was not a strong candidate because the central "G" of the SP1 consensus binding sequence (CCCCGCCC) (22) does not match the A at the corresponding position in the Canscript sequence. SP1 belongs to the Krüppel-like family, which contains 21 DNA-binding transcription factors (23). All factors from this family bind to GC-rich DNA elements by three C2H2 zinc finger domains. Among KLF family members, the KLF6/CPBP (core promoter-binding protein) binding sequence (CCCACCCA) (24), and that of hedgehog pathway effector GLI1, an oncoprotein (GACCACCCA) (25), were most similar to the SP1-like element of Canscript. The confined SP1-like element and linker region (CACCCACA) also shared similarities with the RUNX protein family binding consensus sequence (AACCACA) (26). RUNX2 is also a YAP1-binding protein (27).
There exists only one perfect copy of the Canscript sequence in the human genome, but similar sequences could potentially function to drive the expression of other tumor markers. We explored these possibilities with structure-function studies of such Canscript-like sequences.

EXPERIMENTAL PROCEDURES
Cell Lines and Cell Culture-AsPc1, HeLa, RKO, HEK293, and MSTO-211H cell lines were obtained from the American Type Culture Collection. Cells were cultured in RPMI (AsPc1 only) or DMEM medium, supplemented with 10% FBS and 1% penicillin/streptomycin.
5′-Rapid Amplification of cDNA Ends (5′-RACE)-Total RNA was extracted from AsPc1 and MSTO-211H cells using TRIzol (Invitrogen, 15596-018), followed by 5′-RACE according to the SMARTer RACE protocol (Clontech, 634923). Briefly, first-strand cDNAs were synthesized by a template-switching extension that attached an adapter sequence at the 5′-end of the RNA template. The cDNA was amplified by PCR with 3′ gene-specific and 5′ adapter-specific primers. The MSLN-specific primer (CCCCAACAGGGGTCGAGCCGTTGGCAAG) was located within exon 2. The PCR product was gel-purified and cloned into the TOPO TA vector (Invitrogen, K4500-01), followed by sequencing. The PCR primers for validating the transcript variants were as follows: reverse primer, CGAGGCTGAAGAGCAGGAA; forward primer 1, TGTTCCCTTTGACGGCCCG; forward primer 2, AGGGCTCAGTGGCTGGAGG.
Transfection-Transient transfections were performed with Lipofectamine (Invitrogen, 18324-012) according to the manufacturer's instructions. We seeded 2-4 × 10^5 cells in each well of a 6-well plate 1 day before transfection. pGL3-Basic luciferase vector (0.2 µg) and pRL-SV40 Renilla vector (20 ng) were transfected in each well. The transfection medium was replaced by growth medium after 6 h.
Luciferase Reporter Assay-We used the dual luciferase reporter assay system (Promega, E1910). Cells were harvested 24 h after transfection. The cells were lysed in passive lysis buffer (300 µl/well). Each sample was measured in duplicate wells using 20 µl of cell lysate mixed with luciferase reagent II (LARII; 100 µl/well). After recording the luciferase luminescent signals, Stop and Glo reagent was added (100 µl/well), followed by the Renilla luminescence reading. Luminescence signals were detected by a PerkinElmer Microbeta Trilux plate reader. All luciferase readings were normalized using the Renilla readings of the same well. The average relative luciferase activity (RLA) was obtained from the duplicate wells. The difference between duplicate wells usually was less than 5%. These averages were used to construct comprehensive surveys of serial structural modifications to the tested sequence. Each experiment was performed twice, on different days. Each luciferase survey presented was representative of the two independent experiments. Results from the two independent experiments were similar, but transfection efficiency and the scale of results differed between replicates such that averaging the replicates was precluded. Due to the multiple comparisons, differences in scale, and limitation of only two replicates, the presentation of error bars would not be a valid test of confidence and is omitted.
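The normalization described above can be illustrated with a short sketch. It assumes that each well yields one firefly and one Renilla reading, that RLA is the firefly/Renilla ratio averaged over the duplicate wells, and that fold activity is the RLA of a test reporter divided by that of a control vector. The function names and the readings are hypothetical and are not data from this study.

```python
from statistics import mean


def relative_luciferase_activity(wells):
    """Average firefly/Renilla ratio over duplicate wells.

    wells: list of (firefly, renilla) luminescence readings, one pair per well.
    """
    ratios = [firefly / renilla for firefly, renilla in wells]
    return mean(ratios)


def fold_over_control(sample_wells, control_wells):
    """RLA of a test reporter expressed relative to a control vector."""
    return (relative_luciferase_activity(sample_wells)
            / relative_luciferase_activity(control_wells))


# Hypothetical readings in arbitrary luminescence units (illustration only).
can3_wells = [(52000.0, 1000.0), (50000.0, 1000.0)]
basic_wells = [(500.0, 1000.0), (520.0, 1000.0)]

print(fold_over_control(can3_wells, basic_wells))
```

Dividing by the Renilla signal of the same well corrects for well-to-well differences in transfection efficiency before the duplicates are averaged.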
siRNA Sequences and Transfection-The siRNAs were synthesized by Ambion. An irrelevant siRNA (FANCD2, Irr) was systematically used as a negative control throughout this paper. Except for the previously studied TEAD1 and TEAD2 genes (1), three non-overlapping siRNAs were designed for each target gene. An immunoblot was used to verify siRNA efficiencies.
Apoptosis Analysis-The PE Annexin V apoptosis detection kit was used (BD Pharmingen, 559763). Briefly, cells were washed in cold PBS twice, followed by resuspension in 1× binding buffer at a density of 1 × 10^6/ml. Cells (in 100 µl) were mixed with Annexin V-PE (5 µl) and 7-aminoactinomycin D (5 µl) and incubated in the dark for 15 min at room temperature before dilution (1× binding buffer, 400 µl). Cells were analyzed by a FACScan™ (BD Biosciences) flow cytometer.
Cell Cycle Analysis-Cells (5 × 10^6) were washed in PBS once and resuspended in 0.5 ml of PBS. Cells were mixed with 4.5 ml of ice-cold 70% ethanol for 2 h at −20°C. Cells were washed in 5 ml of PBS again and resuspended in 1 ml of stain (0.1% Triton X-100, 0.2 mg/ml RNase A, and 20 µg/ml propidium iodide in PBS) for 30 min at room temperature. Cells were analyzed by a FACScan™ flow cytometer.
EMSAs-Single-stranded 5′-biotinylated Canscript oligonucleotides, CGGGGTCTCCACCCACACATTCCTGGGGCG, and the complementary sequence (from Integrated DNA Technologies) were annealed. AsPc1 cells were lysed in hypotonic buffer (10 mM Tris, pH 8.0, 10 mM KCl, 1.5 mM MgCl2, 0.5 mM DTT, 0.75% Nonidet P-40 with protease inhibitors), followed by brief vortexing and centrifugation at 1500 × g for 30 s. The pellet was resuspended in hypertonic buffer (20 mM Tris, pH 8.0, 400 mM NaCl, 1.5 mM MgCl2, 0.2 mM EDTA, 10% glycerol with protease inhibitors) followed by 15 min of vigorous vortexing. The lysate was clarified at 13,000 × g for 10 min, and the supernatant was collected as the nuclear extract. Annealed biotinylated DNA oligonucleotides (final concentration of 20 fM) were incubated with 10 µg of nuclear extract in 1× complete binding buffer (Pierce, 20148) for 20 min. The DNA-protein complex was resolved on a 5% polyacrylamide nondenaturing gel in 0.5× TBE buffer. The complex was transferred to nitrocellulose membrane (Pierce, 88018) in 0.5× TBE buffer at 380 V for 1 h. The membrane was incubated with HRP-streptavidin in blocking buffer followed by four washes according to the manufacturer's instructions (LightShift Chemiluminescent EMSA kit (Pierce), 20148). The membrane was soaked in ECL substrate for 3 min before film exposure.
ChIP Assay-Cells were pretreated with 1% formaldehyde for 10 min before harvesting for nuclear extraction as described for the EMSA. Properly sonicated aliquots (500 µl) were incubated with 1 µg of mouse normal IgG, anti-TEAD1, anti-YAP1, and anti-KLF6 antibodies (the antibodies used in immunoblots), respectively, for 2 h, followed by the addition of 50% salmon sperm DNA/protein A-agarose slurry (40 µl). The agarose beads were sedimented after a 1-h incubation followed by serial washes using low salt, high salt, LiCl, and TE buffers (28). Beads were stripped by elution buffer (1% SDS and 0.1 M NaHCO3; 250 µl). DNA-protein cross-links were reversed by 200 mM NaCl at 65°C for 4 h, followed by proteinase K treatment. Digestion products were purified (QIAquick PCR kit, Qiagen) and amplified by PCR. Primers (GCAGCTTTGCCTTCCTGG and TCCTCTGCCTCGGTTTCC) flanking the native Canscript sequence resulted in a 216-bp band as a major amplification product.

RESULTS
Detection of TSSs in Contrasting Cell Lines by 5′-RACE-Because the MSLN gene might utilize different promoters under tissue-specific and cancer-specific conditions, we adopted 5′-RACE to explore the TSSs of the MSLN gene in tissue-informative cells (211H mesothelial/mesothelioma cells) and cancer-informative cells (AsPc1 pancreatic cancer cells). Using reverse primers located in exon 2, we confirmed the transcription start region in AsPc1 cells described in our report (1) and identified an alternate minor TSS at +1770 (Fig. 1A) having an alternate exon 1. In 211H cells, only the TSS at +1770 could be detected, supporting exclusive use of the more 5′ promoter in mesothelium. Both splice variants were confirmed by RT-PCR (supplemental Fig. S8). Sequencing results showed adherence to classical splicing rules.
presumptive minimal promoter activities, were stripped away. The activity of this pGL3-Canscript was reduced (data not shown) but was restored by using triplicate Canscript tandem copies (pGL3-Can3; Fig. 1B). We then surveyed the relative strength and position dependence of the element. pGL3-Can3 was over 100 times as active as the pGL3-Basic vector in AsPc1 cells (Fig. 2, A and B). The activity of pGL3-Can3 was much higher than that of the pGL3-SV40 vector and comparable with that of the MSLN full promoter. Moreover, reversing the orientation of the Can3 sequence left its activity unchanged (Figs. 1B and 2C). Can3 was thus sufficient to drive luciferase expression in the promoterless pGL3-Basic vector (Figs. 1B and 2B). On the contrary, relocation of Can3 to 3′ of the luciferase gene in pGL3-Basic vector eliminated almost all of the relative luciferase activity (Figs. 1B and 2C), suggesting that nearly all of its activity was as a promoter rather than as an enhancer.
To rigorously define the required sequence in Canscript, a nucleotide transitional substitution survey was conducted of the MCAT motif in pGL3-Can3. The results showed that all eight nucleotides (nucleotides 12-19) were essential (Fig. 3A). Any transition mutation in this area reduced the Can3 activity by more than 80%. As a control, the transitional A substitution of the G after the MCAT motif did not affect Can3 activity.
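The set of sequences tested in such a transition survey can be enumerated programmatically. The sketch below assumes the 20-nt Canscript sequence given in the text, 1-based numbering in which the MCAT motif (ACATTCCT) occupies positions 12-19, and the standard definition of a transition (A↔G, C↔T). The function name is hypothetical, and the code only generates the single-copy mutant sequences; it does not model the triplicated Can3 construct or reporter activity.

```python
CANSCRIPT = "TCTCCACCCACACATTCCTG"  # 20-nt sequence as given in the text

# Transitions exchange purine for purine or pyrimidine for pyrimidine.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def transition_mutants(seq, start, end):
    """Yield (position, mutant) for each 1-based position in start..end."""
    for pos in range(start, end + 1):
        i = pos - 1  # convert to 0-based index
        yield pos, seq[:i] + TRANSITION[seq[i]] + seq[i + 1:]


# The eight MCAT positions surveyed, plus the control G at position 20.
mcat_mutants = dict(transition_mutants(CANSCRIPT, 12, 19))
control_mutant = dict(transition_mutants(CANSCRIPT, 20, 20))[20]

for pos, mutant in mcat_mutants.items():
    print(pos, mutant)
```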
Similarly, all three nucleotide substitutions were surveyed at each position in the SP1-like region. Some substitutions eliminated almost all of the Can3 activity (Fig. 3, B and C). The consensus sequence of the SP1-like element generated by the nucleotide substitution survey revealed a 9-nt minimal core WCYCCACCC (Fig. 3C). This sequence contains a binding core (CACCC) for some members in the KLF protein family (23).
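A degenerate consensus such as WCYCCACCC can be checked against candidate sites by expanding the IUPAC ambiguity codes (W = A or T; Y = C or T) into a regular expression. The following is a minimal sketch of that idea; the helper names are hypothetical, and a sequence match says nothing about how strongly a site would drive a reporter.

```python
import re

# IUPAC nucleotide codes needed for this consensus (W = A/T, Y = C/T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "Y": "[CT]"}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


SP1_LIKE = consensus_to_regex("WCYCCACCC")


def matches_sp1_like(site):
    """True if the whole site conforms to the SP1-like consensus."""
    return SP1_LIKE.fullmatch(site) is not None


print(matches_sp1_like("TCTCCACCC"))  # wild-type SP1-like element of Canscript
print(matches_sp1_like("TCCCCACCC"))  # T-to-C substitution at position 3
print(matches_sp1_like("TCTCCGCCC"))  # A-to-G substitution at position 6
```

Under this consensus, the wild-type element and the position-3 T-to-C variant conform, whereas the A-to-G variant, which creates the SP1-binding sequence TCTCCGCCC, does not.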
We next studied the two-nucleotide (AC) linker between the SP1-like and MCAT motifs (Fig. 1A). Upon surveying a series of nucleotide substitutions and expansions of this linker, Can3 activity was retained in five of six substitutions (Fig. 4A), whereas the AC to GC substitution removed most Can3 activity. On the other hand, the expansion of this linker (AC) to six nucleotides (AGTGTC) or deletion of these two nucleotides did not affect the Can3-luc activity (Fig. 4B). When we further expanded the linker by inserting a transcriptionally inactive sequence (from the hygromycin coding sequence) between the A and the C (Fig. 1B), Can3 retained some activity even using the 20-nt insertion (Fig. 4C). In contrast, when we interchanged the positions of the SP1-like and MCAT motifs around the linker, the reporter activity was reduced significantly (Fig. 4D).
[Figure 3 legend, partial] Other than the last G (position 20), all of the transition mutants reduced RLA by >80%. B, multisubstitution analysis in the SP1-like element (from the 1st to 9th nucleotide). Only the mutation from T to C at position 3 failed to reduce RLA. C, consensus sequence of the SP1-like element. The size of each nucleotide represents the RLA from its use as a substitution at that position. All of the mutants having an RLA less than 10% of the wild type were neglected.
Canscript-like Sequences in Other Genes Up-regulated in Pancreatic Cancer-"Stretching" of the linker length between SP1-like and MCAT elements retained certain Canscript activity, and stretched "Canscript-like" sequences might exist in the promoter regions of other genes. We searched for such Canscript-like sequences by screening upstream sequences (5 kb upstream of each ATG start codon) of the top 52 overexpressed pancreatic neoplasia-associated genes from an unbiased meta-analysis (29). The MCAT sequence "CATTCCT" was required in the search. Only a one-nucleotide mismatch or insertion/deletion was permitted in the SP1-like region in accordance with the Canscript consensus sequence (only the nucleotide substitutions shown in Fig. 3C were allowed). We discovered seven additional sequences (in six genes; Table 1) having linker lengths of less than 40 bp. We then replaced the Canscript sequence with these fragments in the context of the MSLN promoter region (pGL3-Canscript-like vector; Fig. 1B) and tested their reporter activities in AsPc1, HeLa, RKO, and HEK293 cells. The pGL3-Canscript-like vectors incorporating sequences from FXYD3, MUC1, and TIMP1 had activities at least twice that of the matched empty vector in all four cell lines (Fig. 5A). To further validate this observation, we tested the four strongest sequences in seven independent transfections in AsPc1 cells. Each test sequence was compared with the Canscript-like control plasmid. The significance level with a conservative Bonferroni correction for four comparisons was 0.0125 (0.05/4). The levels of the four sequences were greater than the control with a p value of 0.0078 (from sign and binomial test, statistically significant at the 0.0125 level). MUC1 was the most active, being more than 4-fold higher in AsPc1 cells (Fig. 5B).
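The reported p value can be reproduced with a short calculation. The sketch below assumes a one-sided sign test under the null hypothesis that a test sequence is equally likely to score above or below the control in each transfection; this assumption is made here because it yields the reported 0.0078 for seven of seven transfections, and the function name is hypothetical.

```python
from math import comb


def one_sided_sign_test(successes, trials):
    """P(at least `successes` of `trials` exceed the control) when p = 0.5."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials


n_comparisons = 4
alpha_corrected = 0.05 / n_comparisons       # Bonferroni threshold, 0.0125

p_single = one_sided_sign_test(7, 7)         # 1/128 = 0.0078125
p_bonferroni = min(1.0, p_single * n_comparisons)  # 0.03125

print(p_single, p_single < alpha_corrected, p_bonferroni)
```

Comparing the uncorrected p value with 0.05/4, or comparing the p value multiplied by four with 0.05, gives the same decision.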
As a control, we also screened the upstream sequence (5 kb upstream of the ATG start codon) of 55 random genes and discovered eight stretched Canscript-like sequences (in five genes; supplemental Table S1) having linker lengths less than 40 bp. Thus, Canscript-like sequences did not appear with higher prevalence in genes up-regulated in pancreatic cancer, but some of these Canscript-like sequences were active.
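A simplified version of the screen for stretched Canscript-like sequences can be sketched as a pattern search. The sketch below is an illustration only, not the search procedure used in this study: it requires an exact match to the WCYCCACCC consensus (the single mismatch or insertion/deletion permitted in the actual screen is not modeled), requires the MCAT sequence CATTCCT downstream, and measures the gap between the two as the number of intervening nucleotides. The function name and the `max_gap` parameter are hypothetical.

```python
import re

SP1_LIKE = "[AT]C[CT]CCACCC"  # WCYCCACCC with W = A/T and Y = C/T
MCAT = "CATTCCT"              # MCAT sequence required in the search


def find_canscript_like(seq, max_gap=39):
    """Return (start, sp1_like, gap, mcat) for each SP1-like site that is
    followed by CATTCCT within max_gap nucleotides.

    A lookahead is used so that overlapping candidate sites are all reported;
    the lazy quantifier selects the nearest downstream MCAT sequence.
    """
    pattern = re.compile(
        rf"(?=({SP1_LIKE})([ACGT]{{0,{max_gap}}}?)({MCAT}))"
    )
    return [
        (m.start(), m.group(1), m.group(2), m.group(3))
        for m in pattern.finditer(seq.upper())
    ]


# Canscript itself, flanked by two arbitrary nucleotides on each side.
print(find_canscript_like("GGTCTCCACCCACACATTCCTGG"))
```

With this way of measuring, the gap in Canscript itself is ACA: the AC linker plus the extra 5′ A that the text includes in the MCAT element.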
YAP1 Is Required but Not Sufficient for Canscript Activity-To test YAP1 involvement in regulating Canscript activity, we first confirmed that YAP1 protein was associated with this TEAD1-MCAT complex in the MSLN promoter region by a ChIP assay (supplemental Fig. S7). Next we knocked down YAP1 by siRNAs in HeLa cells. As with the TEAD1 knockdown control, YAP1 knockdown dramatically reduced the endogenous MSLN expression level (Fig. 6A). Knockdown of YAP1 also reduced Can3 reporter activity (Fig. 6B), whereas the activity of control pGL3-SV40 was not affected (supplemental Fig. S2A). On the other hand, overexpression of ectopic YAP1 in RKO and 293 cells neither increased Can3-Luc activity (Fig. 6, C and D) nor boosted MSLN expression in MSLN-negative cell lines (data not shown). It was reported that unphosphorylated "active" YAP1 in the nucleus was more important than overall YAP1 protein levels (30). To address this, we overexpressed the constitutively active YAP1 S127A mutant (30) in the same cell lines and again did not observe up-regulation of MSLN expression (data not shown). We also noticed similar ratios of phosphorylated to total YAP1 protein in the lysate of untransfected MSLN+ and MSLN− cells (data not shown), suggesting that YAP1 may be similarly activated in these cells.
[Figure 4 legend, partial] (5, 10, 15, 20) of nucleotides taken from the hygromycin coding sequence were inserted within the AC linker in pGL3-67-94 (Fig. 1B). The 20-nt spacer itself (pGL3-20; Fig. 1B) did not have independent transcriptional activity.
We observed that the cell numbers were reduced when HeLa cells were subjected to YAP1 knockdown (as compared with control siRNAs; data not shown). Cell cycle analysis failed to reveal a significant shift of the cell cycle profile in YAP1 siRNA knockdown cells (supplemental Fig. S2B). Instead, an apoptosis assay confirmed the antiapoptotic function of YAP1 (Fig. 6E) (21) because the apoptotic cell population greatly increased when YAP1 was knocked down. In contrast, cell number changes did not occur upon YAP1 overexpression in RKO or 293 cells.
Taz, a YAP1-homologous protein in mammals (31), is a candidate oncoprotein (32). We detected weak expression of Taz only in AsPc1 cells, but not in HeLa, RKO, and HEK293 cells (supplemental Fig. S1). Taz thus did not appear to be required for Canscript activity in HeLa cells. Its role in AsPc1 cells remains unexplored.
The SP1-like Motif Is Not an SP1 Response Element-The SP1-like element was thought to hold the key for the cancer specificity of Canscript (1). Attracted by its name, we tested and excluded SP1 as a candidate. There is one nucleotide difference between the conventional SP1 binding site (CCCCGCCC) (22) and our SP1-like consensus sequence (CYCCACCC), but the nucleotide substitution pattern did not support SP1 as the binding factor; the A → G switch almost eliminated the Can3-luc activity (Fig. 3B). SP1 was ubiquitously expressed in almost all of the cell lines (supplemental Fig. S1). MSLN protein expression was minimally affected by SP1 knockdown in HeLa cells (supplemental Fig. S3).
Interestingly, an SP1-binding sequence (TCTCCGCCC) in the SV40 promoter region of the pGL3-SV40 (33) was known as pivotal for SV40 promoter activity. This made it impossible to study the SP1 effects on Canscript activity in the context of the pGL3-SV40 backbone (1). For the same reason, the SV40 promoter-driven pRL-SV40 Renilla vector could not be used as an internal control for normalization of transfection during the SP1 knockdown assay.
[Figure 5 legend, partial] (Fig. 1B). The RLA of pGL3-Canscript-like empty control vector (−45 to +11 in the MSLN promoter region) was similar to pGL3-Basic vector. B, four up-regulated sequences from A were further tested by seven independent transfections in AsPc1 cells. The RLA of pGL3-Canscript-like empty vector was defined as 1. The four sequences showed higher activities than control vectors in all of seven experiments. The p value from the sign test is 0.0078 for each gene (uncorrected) and 0.031 overall (Bonferroni-corrected for multiple comparisons).

TABLE 1 The seven stretched "Canscript-like" sequences from six genes
The conserved nucleotides in Canscript-like sequences are underlined.
KLF6 May Be Required but Not Sufficient to Augment Canscript Activity-Among the available DNA binding reports concerning the Krüppel-like family, the KLF6 consensus binding sequence (CCCACCCA) optimally matched the SP1-like element. We found that the expression level of endogenous KLF6 protein was consistent with the MSLN expression pattern in different cell lines (supplemental Fig. S1). Knockdown of KLF6 (KLF6 si-2 and si-3) reduced the endogenous MSLN expression in HeLa cells (Fig. 7A). pGL3-Can3 reporter activity could also be reduced by knockdown of KLF6, and the combination of TEAD1 and KLF6 knockdown could not achieve further suppression (Fig. 7B). Overexpression of HA-tagged wild-type KLF6 in HEK293 cells (Fig. 7C) and in RKO cells, however, did not turn on endogenous MSLN expression (data not shown). Overexpression of HA-tagged KLF6 increased pGL3-Can3 activity at most slightly, as did the coexpression of both YAP1 and KLF6 (Fig. 7D). We tested whether KLF6 could directly bind to Canscript in vitro by EMSA. The biotin-labeled Canscript-containing oligonucleotide identified a DNA-protein complex (supplemental Fig. S4). A portion of this complex was supershifted by the positive control anti-TEAD1 antibody but not by the anti-KLF6 antibody (supplemental Fig. S4). ChIP assay also failed to detect a KLF6-Canscript complex in the MSLN promoter region when using the same anti-KLF6 antibody (supplemental Fig. S7).
GLI1 Is Not a Strong Candidate-We also investigated several oncogenic transcription factors. Among them, the GLI1-binding sequence (GACCACCCA) was similar to the SP1-like element, although the first G in the GLI1-binding sequence was not tolerated in our SP1-like consensus sequence (Fig. 3, B and C). Our immunoblot failed to detect the full-length GLI1 protein in AsPc1, HeLa, RKO, and HEK293 cells (data not shown).
RUNX Family Genes Were Not Sufficient for Canscript Activity-Chimeric oncoprotein RUNX1 belongs to the RUNX protein family, whose binding consensus sequence (AACCACA) overlaps with the SP1-like element and the AC linker, although the unmatched A was not tolerated in the SP1-like consensus sequence (Fig. 3, B and C). Moreover, the RUNX1 expression level was consistent with the MSLN expression pattern in different cell lines (supplemental Figs. S1 and S5A), but overexpression of each gene (RUNX1, -2, and -3) in MSLN-negative cell lines (Fig. S5B) did not turn on endogenous MSLN expression (data not shown). Overexpression of RUNX proteins only had a marginal enhancement on Can3 activity (Fig. S5, B and C). These marginal effects were possibly contributed by the backbone of the pGL3-Basic vector (Fig. S5D) because three consensus RUNX-binding sequences (AACCACA) exist in the pGL3-Basic vector.

DISCUSSION
We found that the TSS in 211H mesothelioma cells was located 1380 bp downstream of the TSS reported in AsPc1 cells (1). The shorter transcript may be driven by a tissue-specific promoter. The upstream TSS in AsPc1 cells may be cancer-specific.
Canscript resembled a promoter more than an enhancer. The Canscript sequence, being located −65 to −46 bp 5′ of the upstream TSS in AsPc1 cells, is consistent with a pattern of promoter regulation. Reversing the orientation did not change the transcription activity, whereas relocation of Canscript resulted in loss of its activity (Fig. 2C). Human promoters, especially highly active promoters, are often bidirectional (34,35), as is Canscript. This is consistent with a report in which a triplicated shorter 18-bp Canscript (from −62 to −45, CCACCCACACATTCCTGG) had a promoter-like activity 7 times higher than the pGL4-Basic vector (36). The 18-bp Canscript had much less activity than the 20-bp Canscript because the first three nucleotides, TCT, were missing (supplemental Fig. S6). The cancer specificity (Fig. 2, A and B) and the promoter-like activity of Canscript imply that this 20-nt sequence may regulate MSLN transcription in many cancer cell lines.
Our nucleotide transition screen within the MCAT element confirmed that all eight nucleotides were functional, as in the consensus MCAT sequence (Fig. 3A). Yet, an unidentified transcription factor binding to the SP1-like element may be responsible for the cancer specificity of Canscript. For example, the flanking sequence of the MCAT motif played a critical role for selective expression of cardiac troponin T in striated muscles (19), and modification of its 5′-flanking MyoD-like binding site eliminated its tissue specificity.
To identify this unknown transcription factor, we examined KLF6, whose consensus DNA-binding sequence was very similar to the SP1-like element. The expression pattern of endogenous KLF6 was consistent with the MSLN expression pattern in various cell lines. Knocking down the expression of KLF6 in HeLa cells reduced the MSLN expression level (Fig. 7A). However, overexpression of KLF6 protein did not turn on MSLN expression in 293 and RKO cells, and we failed to detect the direct binding of KLF6 to the Canscript sequence by EMSA (supplemental Fig. S4). Based on the mixed evidence, KLF6 remains a candidate, but its possible role in MSLN regulation is not clearly established. GLI1 and RUNX family members, however, are not likely to be involved in MSLN transcription regulation.
Because the MCAT-binding TEAD1 cofactor could in theory contribute to cancer specificity if it were activated by a cancer-specific signal, such as from the Hippo-YAP1 pathway, we investigated the role of YAP1 in regulating Canscript activity. Knocking down YAP1 expression in HeLa cells dramatically reduced endogenous MSLN expression and suppressed most of the Canscript reporter activity. Overexpression of wild-type YAP1 or its constitutively active mutant in RKO and HEK293 cells did not turn on MSLN expression, indicating that YAP1 may be necessary but not sufficient for MSLN overexpression in certain cancers. We did not find that the cancer specificity of Canscript was explained by YAP1.
The AC linker between SP1-like and MCAT elements is very flexible. Most nucleotide substitutions were tolerated (Fig. 4A). The linker could be deleted without affecting the pGL3-Can3 activity (Fig. 4B) and could also be elongated to 22 nucleotides while retaining significant (>20%) Can3 activity. Switching the order of the MCAT and the SP1-like elements around the AC linker, however, suppressed most of the Can3 activity. This indicates a directional alignment that may be required for SP1-like and MCAT elements to recruit and assemble respective transcription factors.
Reporters with stretched Canscript-like sequences from the 5′ region of FXYD3, MUC1, and TIMP1 genes displayed increased transcription activity. Canscript-like motifs may contribute to the overexpression of FXYD3, MUC1, and TIMP1 genes in pancreatic cancer and perhaps to the overexpressed "marker" genes in other cancer types as well.