A Distinct Sequence (ATAAA)n Separates Methylated and Unmethylated Domains at the 5′-End of the GSTP1 CpG Island

What defines the boundaries between methylated and unmethylated domains in the genome is unclear. In this study we used bisulfite genomic sequencing to map the boundaries of methylation that flank the 5′- and 3′-ends of the CpG island spanning the promoter region of the glutathione S-transferase (GSTP1) gene. We show that GSTP1 is expressed in a wide range of tissues including brain, lung, skeletal muscle, spleen, pancreas, bone marrow, prostate, heart, and blood and that this expression is associated with the CpG island being unmethylated. In these normal tissues a marked boundary was found to separate the methylated and unmethylated regions of the gene at the 5′-flank of the CpG island, and this boundary correlated with an (ATAAA)19–24 repeated sequence. In contrast, the 3′-end of the CpG island was not marked by a sharp transition in methylation but by a gradual change in methylation density over about 500 base pairs. In normal tissue the sequences on either side of the 5′-boundary appear to lie in separate domains in which CpG methylation is independently controlled. These separate methylation domains are lost in all prostate cancers where GSTP1 expression is silenced; in these cancers methylation extends throughout the island and across both the 5′- and 3′-boundary regions.

Almost all organisms with a genome of greater than 10⁹ base pairs methylate their DNA, generally at the 5′-position of cytosine in CpG dinucleotides. In vertebrate genomes CpG sites occur at about one-fifth of the frequency expected on the basis of base composition. This has been attributed to the inherent mutability of 5-methylcytosine to thymine (1). However, CpG-rich clusters, termed CpG islands, are found interspersed in large regions of CpG-depleted sequence (2). The maintenance of these CpG islands is indicative of a functional role, and it is now well established that they commonly span gene regulatory or gene promoter regions (3). It has been suggested that CpG islands facilitate the expression of housekeeping genes by influencing nucleosomal positioning and that conditions that alter the formation of this array, such as methylation, may indirectly affect CpG island-dependent gene expression (4). Unlike CpG sites in the remainder of the genome, CpG islands are nearly always maintained in an unmethylated state (5). Methylation of CpG islands can occur, however, for example on the inactive X chromosome, in promoters of imprinted genes, and in aberrant methylation associated with oncogenesis (6). In all of these cases methylation of CpG islands spanning the gene promoter regions is strongly associated with transcriptional silencing (7).
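The notion of CpG sites occurring less often than "expected on the basis of base composition" can be illustrated with a short calculation. The sketch below uses one common definition of the observed/expected CpG ratio (observed CG dinucleotides divided by the product of C and G counts over the sequence length); this particular formula and the function name `cpg_obs_exp` are assumptions made for illustration and are not taken from this paper.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio for a single DNA strand.

    Expected CpG count is approximated as (#C * #G) / length, i.e. what
    base composition alone would predict (an illustrative assumption).
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if n == 0 or c_count == 0 or g_count == 0:
        return 0.0
    observed = seq.count("CG")          # CG cannot overlap itself
    expected = c_count * g_count / n
    return observed / expected


if __name__ == "__main__":
    # Hypothetical toy sequences, not GSTP1 sequence.
    print(cpg_obs_exp("CGCGGCGCCGCGTACGCG"))   # CpG-rich, island-like
    print(cpg_obs_exp("CATGCCTGGACTCAGGCT"))   # CpG-depleted
```

A ratio near 1 means CpG occurs about as often as base composition predicts; bulk vertebrate DNA, as described above, would score far lower than a CpG island.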
As yet there is no clear understanding of what permits CpG islands to be preferentially unmethylated, what defines the boundary of the unmethylated region, and what leads to the aberrant methylation of CpG islands commonly seen in cancer. It has been suggested that maintenance of the unmethylated CpG islands is dependent on continued active transcription and/or on the binding of specific proteins that may protect them from methylation (8). Analysis of the CpG island of the adenine phosphoribosyl transferase gene in transgenic mice, using mutated promoters, has shown that Sp1 sites are important for maintaining this CpG island in its unmethylated state (9,10). Thus loss of specific proteins or interference with their binding or loss of active transcription may all contribute to CpG island promoter methylation in cancer.
The GSTP1 gene that encodes glutathione S-transferase-π (GST-π) contains a typical CpG island promoter region and is widely expressed in most tissues, with the exception of liver (11). It has recently been demonstrated that GSTP1 becomes methylated in a high proportion of prostate cancers and that this methylation is accompanied by gene silencing (12,13,14). Detailed bisulfite sequencing analysis of the CpG island spanning the core promoter region of the GSTP1 gene showed that methylation is extensive at essentially all CpG sites in prostate cancer DNA (14). In contrast, the CpG island is completely unmethylated in normal prostate tissue. However, methylation was found to occur outside the CpG island in the CpG-depleted 3′-region of the GSTP1 gene in both normal and cancer tissue (14).
In this paper we have extended the bisulfite sequence analysis of the methylation profile of the GSTP1 CpG island/promoter to include the flanking regions where the normal and prostate cancer methylation profiles diverge. This has allowed identification of a defined "boundary" region of methylation at the 5′-flank, which is marked by a repeating ATAAA sequence.

MATERIALS AND METHODS
DNA Samples-DNA from three primary prostate cancer samples and corresponding normal DNA from the diseased prostate were isolated as follows. Tissue samples were isolated from patients undergoing radical prostatectomy for prostate cancer. Tissue slices, about 4-mm thick, were snap frozen in liquid nitrogen, and histology was performed on adjacent slices to identify regions of tumor and normal tissue. Tissue samples were isolated from the frozen slice using a punch and ground into a powder under liquid nitrogen using a mortar and pestle. DNA was isolated using Trizol™ reagent (Life Technologies, Inc.) according to the manufacturer's protocols. DNA was further treated with RNase A and then proteinase K before phenol extraction and ethanol precipitation. DNA from the prostate tumor cell lines LNCaP and PC3 was prepared as described previously (14). DNA from the following normal tissues was obtained at autopsy from a 74-year-old male: brain (cerebellum), lung, skeletal muscle, spleen, liver, pancreas, prostate, heart, bone marrow, and blood. All tissues were examined by the pathologist and deemed disease-free.
Bisulfite Conversion-Sodium bisulfite converts cytosine residues to uracil residues in single-stranded DNA under conditions whereby 5-methylcytosine remains nonreactive. All cytosine residues remaining in the target sequence after PCR amplification therefore represent previously methylated cytosines. The bisulfite reaction was carried out on 1-2 μg of HindIII-digested patient DNA for 16 h at 55°C under conditions described by Clark et al. (15) and Clark and Frommer (16). The samples were purified using Wizard DNA Clean-Up System desalting columns (Promega), eluted in 50 μl of H₂O, and incubated with 5 μl of 3 M NaOH for 15 min at 37°C. The solutions were neutralized by the addition of NH₄OAc, pH 7, to 3 M, and the DNA was ethanol precipitated, dried, resuspended in 10 mM Tris-HCl (pH 8), 0.1 mM EDTA (10 μl for tissue samples and 50 μl for cell line DNA), and stored at −20°C. PCR amplification reactions were performed within 24 h of bisulfite conversion.
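The readout principle of bisulfite sequencing described above can be sketched in a few lines. This is a toy in silico model only, assuming a single strand and complete conversion: unmethylated C is read as T after PCR (via uracil), whereas 5-methylcytosine is read as C. The function name and the example sequence are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions) -> str:
    """Return the sequence as it would be read after bisulfite treatment and PCR.

    seq                  -- original single-strand sequence (5' to 3')
    methylated_positions -- 0-based indices of cytosines carrying a 5-methyl group
    Assumes complete conversion of every unmethylated cytosine.
    """
    methylated = set(methylated_positions)
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated:
            converted.append("T")   # C -> U by bisulfite, amplified as T
        else:
            converted.append(base)  # 5-methylcytosine and other bases unchanged
    return "".join(converted)


if __name__ == "__main__":
    original = "ACGTCCG"            # hypothetical fragment
    print(bisulfite_convert(original, {1}))     # only the C at index 1 is methylated
    print(bisulfite_convert(original, set()))   # fully unmethylated template
```

Any C that survives in the converted read marks a position that was methylated in the template, which is the inference used throughout the analysis.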
PCR Amplification and Primers-PCR amplifications were performed in 50-μl reaction mixtures containing 2 μl of bisulfite-treated genomic DNA, 200 μM of each of the four dNTPs, 6 ng/μl of each of the primers, 1-2 mM MgCl₂, 2 units of AmpliTaq DNA polymerase (Perkin-Elmer), and reaction buffer consisting of 67 mM Tris, 16.6 mM ammonium sulfate, 1.7 mg/ml bovine serum albumin, and 10 mM β-mercaptoethanol in buffer (10 mM Tris-HCl (pH 8.0), 0.1 mM EDTA). The strand-specific nested primers used for amplification of bisulfite-treated DNA are indicated in Table I.
Sequence Analysis-The PCR fragments (I-IV) were cloned using the pCR-Script™ Amp SK(+) cloning kit (Stratagene) according to the manufacturer's instructions. Individual clones were sequenced either manually using the Sequenase version 2.0 DNA sequencing kit (USB) or using the PRISM™ DyeDeoxy Terminator Cycle Sequencing Kit (PE/ABI) with AmpliTaq DNA polymerase and the automated 373A DNA Sequencer (ABI).
For automated direct PCR sequencing and quantitative Genescan analysis, PCR products were reamplified using a biotinylated/M13-tailed primer mixture containing GST11-M13 and GST12 primers (see Table I). The direct PCR sequencing reactions were performed using a PRISM Sequenase Dye Primer Sequencing Kit (PE/ABI) on an automated 373A DNA Sequencer (ABI). Details of the Genescan analysis are described by Millar et al. (14). The percent methylation at each position is calculated as the peak height of C divided by the sum of the peak heights of C and T.
Northern Blot Analysis-Multiple tissue Northern blots I and II were purchased from CLONTECH and probed with a PCR fragment generated (see Table I) from exon 7 of the GSTP1 transcript according to the manufacturer's instructions. The blots were then stripped and reprobed with the β-actin internal probe supplied to determine the loading levels of each mRNA species. Blots were exposed and quantified using a Molecular Dynamics PhosphorImager and ImageQuant software.
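The percent methylation calculation stated above (C peak height over the sum of the C and T peak heights at a given position) is simple enough to write out directly. The following is a minimal sketch; the function name and the peak heights in the example are hypothetical and are not data from this study.

```python
def percent_methylation(c_peak: float, t_peak: float) -> float:
    """Percent methylation at one CpG position from sequencing peak heights.

    c_peak -- peak height of C (cytosine that resisted conversion, i.e. methylated)
    t_peak -- peak height of T (converted, i.e. unmethylated cytosine)
    """
    if c_peak < 0 or t_peak < 0:
        raise ValueError("peak heights must be non-negative")
    total = c_peak + t_peak
    if total == 0:
        raise ValueError("no signal at this position")
    return 100.0 * c_peak / total


if __name__ == "__main__":
    # Hypothetical peak heights in arbitrary units.
    print(percent_methylation(300, 100))   # 75.0
    print(percent_methylation(0, 500))     # 0.0
```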

RESULTS
Like many "housekeeping" genes the GSTP1 gene contains a typical CpG island that extends from approximately 400 base pairs upstream to 800 base pairs downstream of the transcription initiation site (Fig. 1A). We have previously shown that the core promoter region is unmethylated in normal prostate cells but becomes methylated in prostate cancer. To understand the mechanism responsible for abnormal methylation of this CpG island in prostate cancer tissues we have now: 1) extended the methylation analysis of the GSTP1 gene across the entire CpG island in a range of normal tissues, particularly in relation to the extent of the unmethylated domain and the nature of its boundaries and 2) compared the methylation profile of the GSTP1 gene in these domains to the methylation profile in prostate cancer cells.
Expression and Methylation Profile of the GSTP1 Promoter in Normal Tissues-The GST-π protein is known to be widely expressed in most tissues (11), but it is not known if the methylation pattern is equivalent in these tissues. Therefore, we examined the expression and methylation profile of the GSTP1 gene in a number of normal tissues including heart, brain, placenta, lung, liver, skeletal muscle, kidney, pancreas, spleen, thymus, prostate, testis, ovary, small intestine, colon, and peripheral blood. Fig. 2 shows the tissue distribution of expression of GSTP1 by hybridization to multiple tissue mRNA Northern blots. Extensive expression was seen in almost all tissues, in agreement with data obtained from immunohistochemical analysis and studies of GST enzyme expression (11). However, lowered GSTP1 expression was observed in the liver; this is also consistent with previous data on protein levels (17,18).
We first examined the methylation profile of the CpG-rich promoter and upstream flanking region. This region spanned 66 CpG sites (−56 CpG to +10 CpG relative to the transcription start site) and was analyzed by PCR amplification of two separate regions (PCRI and -II, Fig. 1B). The methylation profiles were analyzed by either direct PCR sequencing or cloning and sequencing. The core promoter region PCR fragment (PCRII; spanning CpGs −28 to +10) was analyzed by direct sequencing of the PCR product from each tissue sample. Direct sequencing gives an average of the methylation level at any one CpG site in the mixture of molecules amplified, as described by Millar et al. (14). The 5′-upstream PCR fragment (PCRI; spanning CpGs −56 to −30) contained a number of polymorphisms and therefore could not be used reliably for direct sequence analysis; consequently levels of methylation for the 5′-upstream region were determined by cloning the PCR product and sequencing individual clones. The polymorphisms found in the 5′-upstream region include variation in the number and fidelity of (ATAAA)19–24 repeats as well as sequence polymorphisms that include CpG sites −48 and −33 (14). An example of the methylation profile obtained for the clones generated from the PCR fragment amplified from normal prostate DNA is shown in Fig. 3. As can be seen the methylation profile is heterogeneous; in fact a heterogeneous methylation profile was common to all the normal tissue samples tested. The methylation data generated from the clonal analysis were averaged for each CpG site and as such correspond to the direct PCR sequencing analysis. Analysis of the extent of methylation at individual CpG sites in the normal tissues (prostate, blood, brain, spleen, smooth muscle, lung, bone marrow, pancreas, and heart) showed no methylation at all in the core promoter region (CpGs −28 to +10) (Fig. 4B).
The exception was in normal liver DNA where significant methylation was seen in a cluster of sites, from CpG site −7 to CpG site +7 encompassing the transcription start site. A similar methylation profile was observed in two other samples of normal liver DNA that were sequenced (data not shown). Interestingly, liver was the only normal tissue examined that showed reduced GSTP1 expression, indicating a correlation between the methylation of these sites and expression.
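The averaging of clonal data per CpG site, which makes the clone-based results comparable with direct PCR sequencing, can be sketched as follows. The encoding of each clone as a string of `M` (methylated), `U` (unmethylated), and `?` (not determined) is an assumption chosen for illustration, as are the function name and the example clones.

```python
def average_methylation(clones):
    """Average percent methylation at each CpG site across sequenced clones.

    clones -- list of equal-length strings, one per clone, with one character
              per CpG site: 'M' methylated, 'U' unmethylated, '?' undetermined.
    Returns a list with one value per site (None if no clone was informative).
    """
    if not clones:
        return []
    n_sites = len(clones[0])
    if any(len(clone) != n_sites for clone in clones):
        raise ValueError("all clones must cover the same CpG sites")
    averages = []
    for site in range(n_sites):
        calls = [clone[site] for clone in clones if clone[site] in ("M", "U")]
        if calls:
            averages.append(100.0 * calls.count("M") / len(calls))
        else:
            averages.append(None)   # site not determined in any clone
    return averages


if __name__ == "__main__":
    # Four hypothetical clones covering two CpG sites.
    print(average_methylation(["MU", "MM", "UU", "MU"]))   # [75.0, 25.0]
```

Averaging hides the clone-to-clone heterogeneity noted above, so the per-clone patterns remain the more informative view when the population is mixed.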
In the 5′-upstream region (CpGs −56 to −30), a marked change was observed in the methylation pattern of all the normal tissues examined (Fig. 4A). There was essentially no methylation in all the normal tissues up to and including CpG site −43; in contrast further upstream, from CpG site −44 and beyond, there was extensive methylation at most CpG sites. However, considerable variability was noted between different tissues in the level and pattern of methylation. Some sites (CpG sites −56, −55, and −53) were heavily methylated (75-100%) in nearly all tissues, whereas CpG sites −52 and −48 were notable in that they were commonly unmethylated or undermethylated in some tissues. Also the six CpG sites, −44 to −49, were not methylated in smooth muscle cell DNA. Interestingly the abrupt transition from the DNA domain that is extensively methylated in all normal tissues to that which is unmethylated, between CpG sites −44 and −43, corresponds to the location of an (ATAAA)19–24 repeat sequence (Fig. 1C). Moreover, this pattern of methylation was found to occur in both GSTP1 alleles (data not shown). Immediately upstream of the (ATAAA)19–24 repeat are two members of the Alu interspersed repeat sequence family (Fig. 1). The CpG sites (spanning CpGs −56 to −44) in the Alu sequence immediately adjacent to the (ATAAA)19–24 repeat were hypermethylated in all normal tissues examined (Fig. 4A). The (ATAAA)19–24 repeat thus corresponds to a distinct boundary that separates the methylated Alu repeat DNA at the 5′-flank of the CpG island from the unmethylated DNA within the CpG island.
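Locating a tandem ATAAA repeat in a sequence can be sketched with a regular expression. This is a simplified illustration only: it matches perfect tandem copies, whereas the text above notes that the repeat varies in both number and fidelity between alleles, so imperfect units would split a run. The function name, the `min_units` parameter, and the example sequence are hypothetical.

```python
import re


def find_ataaa_repeats(seq: str, min_units: int = 2):
    """Find runs of perfect tandem ATAAA repeats.

    Returns a list of (start, end, n_units) tuples with 0-based, end-exclusive
    coordinates. Imperfect repeat units are not tolerated in this sketch.
    """
    pattern = re.compile(r"(?:ATAAA){%d,}" % min_units)
    hits = []
    for match in pattern.finditer(seq.upper()):
        n_units = (match.end() - match.start()) // 5
        hits.append((match.start(), match.end(), n_units))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: 3 flanking bases, 20 repeat units, CpG-rich tail.
    toy = "GGC" + "ATAAA" * 20 + "CGCG"
    print(find_ataaa_repeats(toy))   # [(3, 103, 20)]
```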
In contrast there is no clear sequence boundary at the 3′-end of the GSTP1 CpG island. Moreover the 3′-flank of the CpG island does not contain any Alu repeats or any other obvious repeat sequences. We previously noted that in normal prostate tissue the CpG island, up to CpG site 33 as well as sites 52 and 53, was unmethylated whereas the 3′-end of the GSTP1 gene from CpG site 69 to 103 was extensively methylated (14). To locate the exact methylation boundary at the 3′-end of the CpG island, we have now sequenced a further 1 kilobase, from CpG site 13 in intron 1 to CpG site 67 in intron 4 (PCRIII and -IV, Fig. 1B), from four normal prostate tissue samples. As shown in Fig. 5, there is no methylation in the CpG sites within the CpG island; however, there is considerable heterogeneity in the methylation profile at the 3′-flank of the CpG island, and more CpG sites become methylated further downstream from the CpG island region. Interestingly, the first site that is methylated is CpG 31, which is located at the junction of intron 1 and exon 2. The profile of each of the four normal prostate samples is different, but all lack a distinct methylation boundary; instead, methylation density increases gradually over a 500-base pair region (see Fig. 7).
Figure legend (opening text missing): [...] Fig. 3. The average methylation levels are indicated by: −, 0%; +, 1-25%; ++, 26-50%; +++, 51-75%; and ++++, 76-100%. B, direct sequence analysis of PCR products obtained from the core promoter region (PCRII) of the same tissue samples analyzed in A. DNA methylation levels were determined by quantitative Genescan analysis as described by Millar et al. (14). The average methylation levels are indicated as in A. o indicates a CpG site where the methylation was not determined.
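The symbol scale used in the figure legend (−, 0%; +, 1-25%; ++, 26-50%; +++, 51-75%; ++++, 76-100%; o, not determined) can be expressed as a small lookup. The sketch below is illustrative; how values falling between the stated integer bins are handled (for example 0.5% or 25.5%) is an assumption, here resolved by assigning them to the next bin up, and the ASCII hyphen stands in for the minus symbol.

```python
def methylation_symbol(percent):
    """Map an average percent methylation to the legend's symbol scale.

    percent -- value from 0 to 100, or None where methylation was not determined.
    Values between the legend's integer bins are placed in the next bin up
    (an assumption; the legend only lists integer ranges).
    """
    if percent is None:
        return "o"
    if not 0 <= percent <= 100:
        raise ValueError("percent methylation must lie between 0 and 100")
    if percent == 0:
        return "-"
    if percent <= 25:
        return "+"
    if percent <= 50:
        return "++"
    if percent <= 75:
        return "+++"
    return "++++"


if __name__ == "__main__":
    # Hypothetical per-site averages for one tissue.
    profile = [0, 10, 50, 51, 100, None]
    print(" ".join(methylation_symbol(value) for value in profile))
```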
Methylation Profile in Prostate Cancer DNA-We have previously shown that the core promoter region (CpGs −28 to +10) and all sites examined in the body and 3′-end of the gene are extensively methylated in prostate cancer DNA (14). We have now extended this analysis to examine the extent of methylation upstream of the core promoter, in the region (CpGs −56 to −30) where we have found a marked boundary of methylation in normal tissue. We examined the methylation state in two prostate cancer cell lines: PC-3, which expresses GSTP1, and LNCaP, a GSTP1 nonexpressing cell line. Using reverse transcriptase-PCR we have shown that the treatment of LNCaP cells with 5-azacytidine, a demethylating agent, reactivates GSTP1 mRNA expression (data not shown). In addition, we analyzed prostate tumor DNA from three patients (BC, CC, and DC) and DNA isolated from a histologically normal region of the prostate of one of these patients (BN).
For LNCaP cells that do not express GSTP1 the extensive high level of methylation seen in the core promoter region was found to continue through the upstream region to and beyond the ATAAA boundary (Fig. 6A). The methylation profile of DNA from PC3 cells that express GSTP1 was distinctly different from that of LNCaP cells. There was very little methylation in the upstream region except beyond the boundary region (from CpG −46 to −56). The low methylation observed is consistent with the generally lower methylation level seen in the core promoter (Fig. 6B). Immunohistochemical analysis of both cell lines (data not shown) demonstrated that in the fully methylated LNCaP cells the GST-π protein was not expressed. PC-3 cells on the other hand harbored two distinct classes of cells. The majority of the cells expressed abundant GST-π protein, whereas in a subset of cells the GST-π protein could not be detected. This is consistent with the methylation patterns observed; that is, the patterns represent an average of the methylation profile of the pooled DNA, which contains a mixture of methylated and unmethylated molecules.
In all three primary prostate cancer specimens (BC, CC, and DC), extensive methylation was seen downstream of the ATAAA boundary (−43 CpG to +10 CpG) (Fig. 6, A and B). The methylation pattern was heterogeneous at any one CpG site and varied between patient samples. The levels of methylation across the promoter region varied from 25 to 75% methylation for each CpG site, as determined by direct PCR sequencing (Fig. 6B). However, when we examined clones derived from the core promoter PCRII fragment we found that two distinct classes of molecules existed, those that were completely unmethylated and those that showed extensive methylation through this region (data not shown). Because the cancer samples are not homogeneous and contain a mixture of normal and cancerous prostate epithelial cells, as well as stromal elements, it is likely that the unmethylated clones derive from contaminating normal cells. Therefore in the analysis of the clonal data from PCRI, presented in Fig. 6A, we have only presented the frequency of methylation of individual CpG sites among the population of methylated molecules. This explains why the methylation pattern in the cancer samples appears to be more intense from CpG −43 to −30 than from CpG −28 to CpG +10. As we have reported for GSTP1 and other genes methylated in cancer (14,19,20), individual differences in methylation of specific CpG sites, e.g. CpG site −36, were evident between the different patients. Beyond the boundary region (CpG −43), the extent and pattern of methylation were essentially similar in the three cancer samples and in the two normal prostate DNA samples studied, normal prostate and BN (Figs. 4A and 6A).

DISCUSSION
It is well established that CpG islands remain unmethylated in normal cells and are generally associated with transcriptionally active genes (5). However, in cancer hypermethylation of CpG islands is a common aberration and this is often associated with gene silencing (6). What has not yet been established is what defines the unmethylated CpG island domain; that is, are there distinct sequence boundaries between methylated and unmethylated sequences or do the methylation profiles merge, and moreover how are these boundaries disrupted or bypassed in a cancer cell? To define the sequence boundaries between the methylated and unmethylated domains of a CpG island, we have finely mapped the methylation profile of the upstream and downstream ends of the GSTP1 CpG island in a number of normal tissues including prostate tissue. The GSTP1 gene was chosen because it has a typical CpG island spanning the promoter and exons 1-3 of the gene, it is known to be expressed in a range of normal tissues, and it is frequently hypermethylated in prostate cancer.
A schematic representation of the methylation profiles across the GSTP1 CpG island found in normal prostate tissues and corresponding cancer samples is shown in Fig. 7. In the normal prostate samples there is a marked transition from extensively methylated DNA upstream of the GSTP1 promoter region to the unmethylated domain of the CpG island and this coincides with the position of an (ATAAA)19–24 repeat. The boundary between the methylated and unmethylated domains in most other tissues examined also correlates well with the position of the ATAAA repeat sequence, even though in brain and smooth muscle the 5′-boundary position is less marked. The boundary separating the methylated and unmethylated DNA domains at the 3′-end of the GSTP1 CpG island is less clearly defined. There is considerable heterogeneity of methylation through a 500-base pair region spanning the end of the CpG island in intron 2 and extending into exons 3 and 4. In prostate cancer the distinct exclusion of methylation from the CpG island is lost, and the CpG island becomes methylated in both GSTP1 alleles. The nature of the sequences in the boundary region may give a clue as to the mechanism that normally protects the GSTP1 CpG island from methylation. The ATAAA repeat sequence that flanks the 5′-CpG island is present in about 20 copies and may itself act as a barrier to the methylation of the GSTP1 island in normal cells or could just be fortuitously located 5′ or 3′ to the "real" barrier sequence. The sequence immediately adjacent and extending 5′ from the ATAAA repeat is a member of the Alu family of interspersed repeated DNA sequences, most closely matching subfamily Sx. The 3′-end of the Alu sequence corresponds exactly with the start of the ATAAA repeat. Indeed the ATAAA repeat may be an expansion of the residual poly(A) tail of the Alu element. It has been noted previously that unmethylated CpG islands are often flanked by methylated Alu sequences (21). In particular Graff et al. (21) showed that Alu sequences upstream of the E-cadherin and VHL genes were methylated in normal tissues, whereas adjacent CpG island sequences were not; however, the precise junction of the methylated and unmethylated domains was not determined in this study.
CpG islands are not always flanked by Alu sequences, however, as noted for the 3′-end of the GSTP1 island. Indeed the 3′-boundary of the GSTP1 CpG island, which is diffuse in nature, does not correspond to any identifiable sequence motif. However, the start of methylation does correlate with the transition from the high CpG density of the island to more sparsely spaced CpG sites in the body of the gene. Moreover, the first CpG site that was found to be methylated (CpG site 31, base 1546) is located at the junction between intron 1 and exon 2. This location is similar to another boundary of methylation we have previously identified at the 3′-end of the HIC1 CpG island, which also occurs at an exon/intron boundary (22). As for the GSTP1 CpG island, the HIC1 intron sequences are unmethylated, whereas adjacent exon sequences show substantial, but not complete, methylation.
As discussed above, the boundaries of CpG islands do not appear to harbor common sequence elements at their flanking ends. Indeed the ATAAA repeat sequence boundary found directly 5′ to the GSTP1 CpG island appears to be unique as we have not detected such a structure in other CpG island genes analyzed. However, a similar pentanucleotide repeat element has been identified downstream of an Alu sequence in the 3′-untranslated region of a zinc finger cDNA sequence (23). The ATAAA repeat could be fortuitously located 5′ or 3′ to a real barrier sequence, which protects CpG islands from methylation. Alternatively the ATAAA repeat may itself act as a distinct barrier to CpG island methylation of the GSTP1 gene in normal cells but is inert in prostate cancer cells. The repeat sequence may provide binding sites for specific protein factors, in particular members of the high mobility group-I(Y) family of mammalian nonhistone proteins. These have been demonstrated to bind specifically to the minor groove of A-T-rich sequences and to function as gene transcriptional regulatory proteins (24). Moreover, elevated high mobility group-I(Y) gene expression has been associated with progressive neoplastic transformation (25). Tamimi et al. (26) have shown that high expression of high mobility group-I(Y) was observed in prostate tumors with higher Gleason grades. It will therefore be of interest to determine if the increased level of high mobility group-I(Y) expression plays a role in the initiation of hypermethylation of the GSTP1 CpG island in prostate cancer.
Whether hypermethylation in the tumor cell is initiated from the ends of the CpG island or at "hypermethylation centers" within the CpG island is unclear. From studies of transfected DNA, Graff et al. (21) have suggested that methylation may progressively encroach from methylated Alu sequence regions flanking CpG islands. Others have suggested that methylation is initiated at "centers" within the islands and progressively spreads (27). The GSTP1 gene from all the separate prostate cancer tissues was found to be extensively methylated throughout the entire CpG island. However, the DNA samples studied were isolated from cells many generations after the initiation of the methylation process. It is therefore difficult to infer how the process may have begun. In the case of the PC3 (Fig. 5) and DU145 (data not shown) prostate cancer cell lines, methylation is found in the core CpG-rich promoter region but does not extend through the 5′-flanking region. This pattern of methylation within the island is also seen in DNA from liver, where GST-π is expressed at low levels. This would suggest that in these cells at least methylation could have been initiated at sites within the CpG island or spread from the 3′-methylated flanking sequences.
The susceptibility to initial methylation events in cancer may be determined by transcriptional activity (or lack of) or by the loss of or interference to binding of specific transcription factors such as Sp1 (9,10). Interestingly it has been noted that strong expression of GST-π protein in normal prostate epithelium is limited to the basal cells and that many differentiated secretory epithelial cells do not stain with anti-GST-π antibodies (12). Thus methylation of the GSTP1 gene in prostate cancer cells may be preceded by a loss of expression of the gene in normal epithelium. These observations indicate that early GSTP1 gene inactivation in prostate cancer cells may be a mechanism that predisposes the CpG island promoter region to the de novo methylation pathway resulting in the spread of methylation throughout the island in both alleles.