New 5′-(CGG)n-3′ Repeats in the Human Genome

We identified new, potentially unstable loci in the human genome containing 5′-(CGG)n-3′ trinucleotide repeats by screening a human subgenomic library as well as a chromosome 16 library with a 5′-(CGG)17-3′ oligodeoxyribonucleotide probe. Five different clones were isolated, two from the chromosome 16 library and three from the subgenomic library. Determinations of the nucleotide sequences have revealed that the E7 clone displayed, in addition to the 5′-(CGG)n-3′ trinucleotide repeat, a 5′-(CAG)n-3′ and a 5′-(CCT)n-3′ trinucleotide repeat. Two clones, CL16-1 and P5-5, had homologies to known genes, the human casein kinase II α′ subunit (chromosome 16) and the human calcium-activated potassium channel (chromosome 10), respectively. Clones E7 and P4 were assigned to chromosome 6, whereas CL16-8 mapped to chromosome 16. Their potential coding capacities were assessed by RNA transfer (Northern blotting) experiments. Four different transcripts were identified by using the E7 clones as hybridization probes, three of them being brain-specific. The P4 clone was expressed in placenta and skeletal muscle. Minor polymorphisms within the repeats were observed in normal and in fragile X individuals. Lung and colon carcinoma cell lines in which some microsatellites were shown to be unstable were also investigated. Expansions of the 5′-(CGG)n-3′ repeats were not found.

Fragile sites can be classified into two major groups, rare and common ones. Among the rare fragile sites, at least 19 are folate-sensitive. So far, five folate-sensitive fragile sites have been characterized at the molecular level: FRAXA, FRAXE, FRAXF, FRA16A, and FRA11B. They are associated with an expansion of 5′-(CGG)n-3′ trinucleotide repeats, with three of them being linked to clinical phenotypes. Whereas expansion at the FRAXA locus results in the fragile X syndrome, expansion at the FRAXE locus leads to mild mental retardation, and the FRA11B locus can be involved in the Jacobsen syndrome. The amplified FRA16A and FRAXF loci are not linked to known phenotypes, and genes have not yet been found near these fragile sites.
Several severe human diseases have been associated with the expansion of naturally occurring triplet repeats in the human genome (for recent reviews, see Refs. 1-5). The cause of these instabilities in human DNA has remained unknown. In eight different loci, a 5′-(CAG)n-3′ repeat that is part of the coding region of a gene can be amplified and cause a lengthening of a normal polyglutamine sequence. The severe neurological disorders spinal and bulbar muscular atrophy, spinocerebellar ataxia types 1, 2, 6, and 7, Huntington disease, dentatorubral-pallidoluysian atrophy (Haw River syndrome), and Machado-Joseph disease (spinocerebellar ataxia type 3) are associated with the expansion of polyglutamine tracts. In Friedreich's ataxia, the only disease in this group with an autosomal recessive inheritance pattern, a 5′-(GAA)n-3′ repeat is expanded in the first intron of the X25 gene (6). Surprisingly, in myotonic dystrophy, the expanded 5′-(CTG)n-3′ repeat lies in the 3′-untranslated region, allegedly affecting the structure of the 3′-region of the messenger RNA (7, 8). But the same region also constitutes the 5′ segment of the myotonic dystrophy-associated homeobox gene (9, 10).
The fragile X syndrome was the first disease to be linked to a fragile site in the human genome. The FRAXA fragile site on the X chromosome in location A can be elicited by culturing the cells in a medium devoid of folic acid and thymidine (for reviews, see Refs. 11 and 12). By now, in excess of 100 fragile sites have been identified in the human genome, their recognition being dependent on the addition of specific chemicals to the culture medium, such as 5-bromo-deoxyuridine, thymidine, distamycin A, aphidicolin, 5-azacytidine, and others. Most of these fragile sites exhibit no genetic phenotype, e.g. those at chromosomes XF (FRAXF) or 16A (FRA16A). The fragile sites FRAXA at Xq27.3 or FRAXE at Xq28 are linked to varying degrees of mental retardation. At several of the fragile sites in the human genome, expansions of naturally occurring 5′-(CGG)n-3′ repeats have been documented. The pathologically expanded repeats and adjacent nucleotide sequences can become de novo 5′-(CG)-3′-methylated (13).
In the FRAXA syndrome, typically presenting with severe mental retardation, facial dysmorphologies, macroorchidism, and joint hyperextensibility, the 5′-(CGG)n-3′ repeat amplification is located in the 5′-untranslated and promoter regions of the FMR1 gene. The 5′-(CG)-3′ methylation is associated with the inactivation of the FMR1 promoter. In a few cases, this methylation has been absent despite a marked 5′-(CGG)n-3′ expansion, the FMR1 gene continued to be expressed, and the probands with the full expansion have failed to exhibit the FRAXA phenotype (14). Recent results of genomic sequencing experiments in the FMR1 repeat region indicate that methylation mosaics exist in healthy and some fragile X individuals.
We have initiated a search for regions in the human genome with 5′-(CGG)n-3′ repeats. Because most of the previously isolated 5′-(CGG)n-3′ repeat-containing fragments are located in noncoding regions, genomic libraries were screened with a ³²P-labeled 5′-(CGG)17-3′ oligodeoxyribonucleotide. We report here the isolation of new genomic sequences containing 5′-(CGG)n-3′ repeats, mostly from the 5′-upstream segments of gene-rich regions.

EXPERIMENTAL PROCEDURES
DNA and RNA Extraction, Southern (DNA), and Northern (RNA) Transfer Hybridizations-DNA was prepared from peripheral blood as described (16). RNA was isolated from cultured cells according to published protocols (17, 18). Restriction enzyme cleavage, gel electrophoresis, ³²P-labeling of hybridization probes, hybridization, and autoradiography were carried out as described (19, 20). For Southern transfer (21) and Northern transfer experiments, a downward blotting protocol was used (19). The filter with the human somatic cell hybrid panel was obtained from Oncor. The human RNA master blot and the human multiple tissue Northern blot were purchased from CLONTECH. All hybridization steps were performed using the prehybridization and hybridization solutions described elsewhere (19). In hybridization experiments with the RNA master blot, 40 µg of Cot-1 DNA and 50 µg of herring sperm DNA were added to the labeled probe before hybridization. All DNA-RNA hybridizations were carried out at 65°C.
Genomic Libraries and Screening-A 4-8-kilobase pair size-selected fraction of human genomic DNA cleaved with EcoRI was isolated by velocity sedimentation on a sucrose density gradient and ligated into the EcoRI-precleaved ZAP Express or ZAP II vector (Stratagene), packaged, and screened according to the manufacturer's protocol. The filters were hybridized as described (19). A total of 10⁴ colonies of a human chromosome 16-specific library (22) were screened as follows. Oriented filters were incubated for 5 min in 2× SSC (1× SSC is 0.15 M NaCl, 15 mM sodium citrate) and 5% SDS and dried for 2 min in a microwave oven at full power. The hybridization was carried out overnight at 68°C in 5× SSC, 1% sarkosyl. The posthybridization washes were twice for 10 min in 2× SSC, 0.5% sarkosyl followed by twice for 10 min in 0.5× SSC, 0.5% sarkosyl. The probe used in all screening experiments was a ³²P-labeled 5′-(CGG)17-3′ oligodeoxyribonucleotide.
In the P13/P14 reaction, a dGTP:7-deaza-dGTP ratio of 1:3 was used. Conditions for PCR were 1 cycle of 5 min at 96°C, followed by 25 cycles of 2 min at 95°C, 30 s at 65°C (in the P9/P10 reaction, annealing was at 61°C), and 2 to 2.3 min at 72°C. One-tenth of the PCR products was analyzed by electrophoresis on a 2% agarose gel. The specificities of PCR products were ascertained by Southern blot hybridization using a ³²P-end-labeled 5′-(CGG)8-3′ probe. A subset of PCR products from reactions P1/P2, P9/P10, and P11/P12 with DNA from three fragile X patients and from five normal individuals was used for cloning. PCR products from the primer set P1/P2 were cleaved with EcoRI and NotI and cloned into EcoRI/NotI-precleaved pBluescript KS+. The PCR products from the primer set P9/P10 were cut with AciI and cloned into AccI-precleaved pBluescript KS+. PCR products from the primer set P11/P12 were directly cloned into the PCRscript vector from Stratagene. Single clones were used to determine nucleotide sequences.
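The SSC strengths used for filter treatment, hybridization, and washing in the screening protocol above follow directly from the stated definition of 1× SSC (0.15 M NaCl, 15 mM sodium citrate). The following minimal Python sketch works through that arithmetic. The function names and the 20× stock strength are illustrative assumptions and are not taken from the protocol.

```python
# Composition of 1x SSC as defined in the text:
# 0.15 M NaCl and 15 mM (0.015 M) sodium citrate.
SSC_1X_MOLAR = {"NaCl": 0.150, "sodium citrate": 0.015}


def ssc_molarity(fold):
    """Return the molar concentration of each SSC component at `fold` strength."""
    return {name: conc * fold for name, conc in SSC_1X_MOLAR.items()}


def stock_volume_ml(target_fold, final_volume_ml, stock_fold=20.0):
    """Volume of concentrated stock needed, from C1 * V1 = C2 * V2.

    stock_fold=20.0 is an assumption (a commonly prepared stock strength);
    the text does not say which stock was used.
    """
    return target_fold * final_volume_ml / stock_fold


if __name__ == "__main__":
    # The three strengths named in the screening protocol.
    for fold in (5.0, 2.0, 0.5):
        conc = ssc_molarity(fold)
        print(
            f"{fold:g}x SSC: {conc['NaCl']:.3f} M NaCl, "
            f"{conc['sodium citrate'] * 1000:.1f} mM sodium citrate"
        )
```

Under these assumptions, the final low-salt wash in 0.5× SSC corresponds to 75 mM NaCl and 7.5 mM sodium citrate, a tenfold lower salt concentration than the 5× SSC hybridization solution.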
Nucleotide Sequencing and Computer Programs-Plasmid DNAs were sequenced with a Taq fluorescent dideoxy terminator cycle sequencing kit (Applied Biosystems). The University of Wisconsin Genetics Computer Group (UWGCG) package versions 9.0 and 9.1 were used for sequence analyses such as homology search and restriction mapping. Expressed sequence-tagged (EST) and bacterial artificial chromosome (BAC) clones were obtained from Genome Systems Inc.
Chromosome Mapping with the Fluorescent in Situ Hybridization Method-Human peripheral blood cells from the buffy coat layer were grown for 72 h in RPMI 1640 medium supplemented with 10% fetal calf serum enriched with 26 µg of phytohemagglutinin/ml of medium. The cells were then arrested in metaphase by incubation at 37°C for 2 h with 10 µg of ethidium bromide and 50 ng of Colcemid/ml. The chromosome harvesting, hybridization, and detection procedures were as described (23). The BAC clones were labeled by nick translation with biotin-16-dUTP, and the chromosome 6-specific probe 6q27 (Oncor) was digoxigenin-labeled. Undesirable cross-hybridizations with the labeled BAC probe were suppressed by adding 20 µg of human Cot-1 DNA to the hybridization mixture, which was incubated for 10 min at 37°C before the mixture was added to the chromosome preparation. The chromosomal DNA was counterstained with propidium iodide or, when two different probes had to be detected, with 4′,6-diamidino-2-phenylindole.

RESULTS
Genomic DNA Clones Containing 5′-(CGG)n-3′ Repeats-A subgenomic library was constructed by ligating EcoRI-cleaved genomic DNA from human white blood cells into the ZAP Express and ZAP II vectors as described under "Experimental Procedures." In addition, chromosome-specific libraries (22) were screened, and we began to isolate positive clones from the chromosome 16 library, since the DNA on this chromosome had already shown expansions in the 5′-(CGG)n-3′-containing region of the locus FRA16A at 16p13.11 in some individuals (24, 25). All the clones isolated so far are schematically presented in Fig. 1. The total lengths of the nucleotide sequences and the nature and locations of the 5′-(CGG)n-3′ repeats actually detected in each of the cloned DNA segments are also indicated. In addition to the 5′-(CGG)n-3′ repeats, repeats with different sequences were also present in two of these DNA segments. The actual nucleotide sequences are not reproduced here. They are available via accession numbers AJ001215, AJ001216, AJ001217, AJ001218, and AJ001219 from the EMBL data base for clones P4, E7, P5-5, CL16-1, and CL16-8, respectively. Table I summarizes the homologies found for the individual DNA sequences in GenBank using the BLAST program.
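As an illustration of how perfect 5′-(CGG)n-3′ tracts could be located in silico in a cloned sequence such as those deposited under the accession numbers above, a minimal Python sketch is given here. It is not the procedure used in this work, which relied on the UWGCG package; the function names and the default threshold of three repeat units are assumptions chosen for the example. Because a 5′-(CGG)n-3′ repeat on the complementary strand reads as 5′-(CCG)n-3′ on the strand being scanned, both motifs are searched.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_triplet_repeats(seq, unit="CGG", min_units=3):
    """Find perfect tandem runs of `unit` on either strand of `seq`.

    Returns a sorted list of (start, end, strand, n_units) tuples.
    Coordinates are 0-based and end-exclusive on the scanned strand;
    strand "-" means the unit lies on the complementary strand.
    min_units=3 is an arbitrary illustrative threshold.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", unit.upper()), ("-", reverse_complement(unit))):
        # Non-capturing group repeated at least `min_units` times.
        pattern = re.compile(r"(?:%s){%d,}" % (re.escape(motif), min_units))
        for match in pattern.finditer(seq):
            n_units = (match.end() - match.start()) // len(motif)
            hits.append((match.start(), match.end(), strand, n_units))
    return sorted(hits)


if __name__ == "__main__":
    # Synthetic test sequence, not taken from any of the clones.
    example = "AT" + "CGG" * 4 + "TTA" + "CCG" * 3 + "A"
    for start, end, strand, n_units in find_triplet_repeats(example):
        print(start, end, strand, n_units)
```

A search for perfect runs only would miss interrupted or "cryptic" repeats of the kind described for these loci, so a sketch like this gives a lower bound on the repeat content of a sequence.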
Chromosomal Localizations of Human DNA Segments Carrying 5′-(CGG)n-3′ Repeats-The isolated human DNA segments with 5′-(CGG)n-3′ repeats have been mapped to specific locations on the human chromosomes (i) by hybridization to a somatic cell hybrid panel (Oncor), (ii) by fluorescent in situ hybridization of the biotinylated BAC clones to human chromosome spreads (Fig. 2), and (iii) by hybridization to radiation hybrid panels (31). The CL16-1 and the CL16-8 clones were confirmed to be derived from human chromosome 16, and the P5-5 clone was located on chromosome 10 (data not shown). The mapping results of the CL16-1 and P5-5 clones were in line with the locations of their homologous sequences (Table I). The E7 and P4 clones mapped to chromosome 6q. The E7 genomic fragment mapped to 6q25-26 (Fig. 2a) and the P4 segment to 6q22-23 (Fig. 2b). The clone P4 has been more precisely assigned to a chromosomal location by radiation hybrid mapping that has been performed at the Sanger Center in Cambridge, UK. The P4 clone is closely linked to the marker stSG11083, which lies between markers D6S1004 and D6S1620 on the long arm of chromosome 6 at position 510.7 cR3000 of the Sanger Center's radiation hybrid map (data not shown).
Transcription of the Newly Identified DNA Segments Containing 5′-(CGG)n-3′ Repeats-The ³²P-labeled subfragments of the clones were hybridized to a multiple tissue Northern blot (CLONTECH), to a master blot (CLONTECH), or to a Northern blot on which poly(A)-selected RNAs from different tumor cell lines were immobilized. The human DNA segment in clone P4 was transcribed in skeletal muscle into RNAs of 5.1, 3.3, 2.8, and 1.8 kb and in placenta into an RNA of 2.8 kb (Fig. 3a). This apparently tissue-specific 2.8-kb RNA was also present in the human tumor cell lines A172, 293, A549, KB, HeLa, and Hep2 (Fig. 3b).
In all the tissues tested, the DNA segment in the E7 clone was transcribed into a 5-kb RNA molecule (Fig. 4, a and b). None of the tested tumor cell lines expressed this segment. The brain-specific transcripts, detected with the EcoRI-EcoRI fragment as hybridization probe, were 9.6, 4.8, and 3.8 kb in length (Fig. 4c). The 5-kb transcript present in all tested tissues (Fig. 4b) was not detected with the EcoRI-EcoRI fragment of E7 in Fig. 4c. To assess the extent of this brain-specific expression, we probed a master blot carrying dot-blotted poly(A) RNAs from 50 different human tissues (Fig. 4d). Transcripts were mainly found in different parts of the brain, with a stronger signal in cerebellum. Significant expression was also seen in some glandular tissues like the pituitary gland (D4), the thymus (E5), and to a lesser extent in the thyroid (D6) and the adrenal gland (D5), as well as in fetal brain, kidney, and thymus (Fig. 4c). Transcription analyses were not conducted for the CL16-1 and P5-5 clones. However, the CL16-8 clone did not allow the detection of RNA molecules by hybridization to the multiple tissue Northern or the master blot.

FIG. 1. Schematic maps of the clones E7 and P4 (a) and CL16-1, P5-5, and CL16-8 (b). Most designations are self-explanatory. The following details can be added. a, clone E7 (6170 nucleotide pairs). P refers to primer pairs and the lengths of the PCR-amplified DNA segments synthesized with the aid of a given primer pair. DAN15 and EST1 are previously identified DNA segments whose nucleotide sequences are available from the EMBL data bank. R1-R4 describe the repeat sequences, as indicated underneath the E7 map. Clone P4 (4010 nucleotide pairs): the 919-bp cDNA of the EST clone has been spliced as indicated. b, clone CL16-1 (2837 nucleotide pairs): CKII sequence elements from the human casein kinase II α′ subunit gene are shown. Clone P5-5 (5365 nucleotide pairs): homologies to the gene for the Ca²⁺-activated potassium channel have been indicated. 1-4 designate additionally known EST sequences. Clone CL16-8 (3628 nucleotide pairs). In all maps, vertical arrows designate the locations of the individual repeat sequences.
Lack of Expansions of 5′-(CGG)n-3′ Repeats in Fragile X Patients-We next searched for polymorphisms within the repeats in the DNA loci E7, P4, and CL16-8 from healthy individuals. The possibility existed that in fragile X patients 5′-(CGG)n-3′ repeats outside the bona fide location on Xq27.3 would also become subject to expansions. Therefore, DNAs from several full expansion fragile X patients were analyzed. Southern blotting experiments did not reveal abnormal mobility of the corresponding P4 (Fig. 5, a-c), E7 (Fig. 5, d-f), or CL16-8 bands (data not shown) in normal individuals (Fig. 5, a and d) or in fragile X individuals (Fig. 5, b and e). Minor variations in the repeat-containing regions were assessed by PCR using primers indicated in the scheme of each clone (see maps in Fig. 1, a and b). Changes in sizes could not be detected in the tested DNAs. Because the trinucleotide repeats observed were cryptic and complex, we cloned some of the PCR products from P4 and regions R2 plus R3 of the E7 DNA segment in a suitable vector to sequence individual clones, as described under "Experimental Procedures." Only minor polymorphisms were observed in region R2 of the E7 locus in healthy and fragile X individuals (Table II). The control PCR products from the E7 plasmid remained constant throughout the cloning procedure. It is concluded that the 5′-(CGG)n-3′ repeats in various segments of the human genome outside the Xq27.3 locus are not expanded in fragile X patients. The P4 and E7 loci are not markedly polymorphic.
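The compact allele notation used in Table II, for example (CCT)4 (CCG)8, can be expanded into a literal nucleotide sequence to compare allele lengths. The Python sketch below is an illustration only, not part of the analysis reported here; the function names are hypothetical, and the two alleles in the usage example are the ones listed for the fragile X individual (RF) in Table II.

```python
import re

# One repeat block: a unit in parentheses followed by its copy number,
# with optional spaces, e.g. "(CCT)4" or "(CCT) 4".
_BLOCK = re.compile(r"\(\s*([ACGT]+)\s*\)\s*(\d+)")


def expand_allele(notation):
    """Expand a notation such as '(CCT)4 (CCG)8' into the nucleotide sequence."""
    blocks = _BLOCK.findall(notation.upper())
    if not blocks:
        raise ValueError("no repeat blocks found in %r" % notation)
    return "".join(unit * int(count) for unit, count in blocks)


def allele_length_bp(notation):
    """Length in base pairs of the expanded allele."""
    return len(expand_allele(notation))


if __name__ == "__main__":
    for allele in ("(CCT)4 (CCG)7", "(CCT)4 (CCG)8"):
        print(allele, allele_length_bp(allele), "bp")
```

Expanded in this way, the two RF alleles differ by a single repeat unit (3 bp), which is consistent with the description of these polymorphisms as minor.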
No Rearrangements in the P4 and E7 Loci in Human Tumor Cell Lines-Because of the differences in expression of the E7 and P4 DNA segments between different human tissues and human tumor-derived cell lines, we decided to analyze repeats in the DNAs from these cell lines as well as from two colon carcinoma cell lines, RER(−) SW480 and RER(+) LoVo. The latter is known to exhibit (CA)n microsatellite instabilities. Analyses of the DNAs from these tumor cell lines by Southern blotting (Fig. 5, c and f) or by PCR experiments (data not shown) did not reveal any major rearrangements in either locus. Therefore, the observed differential expression (loss of tissue specificity) could be attributed to alterations of DNA methylation in the P4 region. However, some deletions occurring upstream and/or downstream of the 6170-bp E7 fragment and/or mutations in the E7 sequence could still be responsible for the complete absence of E7 expression in tumor cell lines.

FIG. 2. Chromosomal localizations of (a) the E7 (chromosome 6q25-26) and (b) the P4 (chromosome 6q22-23) clones. In a, the E7-matching BAC clone was biotinylated (Cy3, red) and hybridized to spread metaphase chromosomes together with the digoxigenin-labeled chromosome 6-specific probe (fluorescein isothiocyanate, green), the 6q27 TBP (Oncor). In b, the corresponding P4-biotinylated BAC clone was hybridized to spread metaphase chromosomes (fluorescein isothiocyanate signal). Chromosomal DNA was stained with 4′,6-diamidino-2-phenylindole (a) or propidium iodide (b).

DISCUSSION
In several human genomic DNA fragments we have identified 5′-(CGG)n-3′ repeats located 5′ to the previously identified cDNA sequences of the genes for the Ca²⁺-activated potassium channel and the casein kinase II α′ subunit. These repeats might play a role in the regulation or altered functions of the corresponding proteins. The human Ca²⁺-activated potassium channel (KCa) proteins are a large and diverse family of ion channels. All the different isoforms of the KCa are generated by alternative splicing of a single gene mapped on 10q22.3 (28, 29). Although the expression of the KCa is widely distributed, the highest levels of expression are found in brain, aorta, and skeletal muscle (28). So far, only the different cDNA sequences of the gene were reported. The genomic clone P5-5 isolated here contains the 5′ region of the different spliced mRNA molecules, and only a part of the complete 5′-(GCC)n-3′-rich region is included in the different transcripts reported.
Clone CL16-1 is a genomic DNA fragment comprising the 5′ region of the casein kinase II α′ subunit (CSNK2A2). This enzyme is known to have three possible isoforms, α₂β₂, αα′β₂, and α′₂β₂. The α and α′ polypeptides are the catalytic subunits. The α′ cDNA can detect several mRNAs in Northern blot experiments (26). Alterations of the repeat-rich region located 5′ to the CSNK2A2 gene might play a role in the regulation of different levels of mRNAs.
In the case of spinocerebellar ataxia 6, the gene had been known for a long time before it could be assigned to a disease (32-34). Interestingly, the human DNA segments investigated here exhibit a combination of several types of repeats: CGG, CAG, CCT, GGA, and GAG. Within a 1.8-kb fragment, the E7 DNA segment carries most of the identified repeat sequences.
FIG. 5. Assessment of the stability of the E7 and P4 loci. Variations in size at the P4 (a-c) and E7 loci (d-f) were investigated by cleaving 15 µg of genomic DNA with EcoRI plus KpnI in a-e, whereas in f, DNA was cut with EcoRI plus PstI. In the E7 locus, the 2162- and 2180-bp fragments both contain the repeats R1 and R2; repeats R3 and R4 are located in the 2976- and 2734-bp fragments. The 1354-bp fragment in the P4 locus contains the repeat. In a and d, cleavage patterns of DNA from normal individuals, in b and e from fragile X individuals, and in c and f from human tumor cell lines are shown. The lanes E7 and P4 are plasmid controls. In panels a to c, the ³²P-labeled DNA fragment 4-0 was used to probe the DNA on the membranes, and the 7-3 and 7-42 fragments were used in panels d to f. Letter combinations refer to DNAs from control (a, d) or fragile X individuals (b, e).

TABLE II
The various alleles of the E7 repeat R2 observed in a normal individual (N2) and in a full expansion fragile X individual (RF)
The relevant part of R2 is reproduced here; the sequences were analyzed after sequencing the single clones. The 10 isolated clones from the plasmid control E7 did not show any variations.

             R2
Consensus    (CTT)4 (CGG)8
N2           (CTT)3 (CCG)8
             (CCT)4 (CCG)7
             (CCT)4 (CCG)8
RF           (CCT)4 (CCG)7
             (CCT)4 (CCG)8

(35) on the instabilities within simple tandem repeats (STR) at 10 loci in 29 Huntington disease families that have been compared with instabilities at the same loci in 29 colon cancer patients with defects in mismatch repair enzymes. The hypothesis that secondary structures adopted by CAG, CGG, or GAA repeats in the trinucleotide repeat diseases might mediate instability is supported by the work of Gacy et al. (36). This group has suggested that the improper DNA structure at the repeat region is responsible for GAA instability in Friedreich's ataxia.
The tissue-specific transcription displayed by the P4 DNA segment is apparently erased in the tumor cell lines. The 2.8-kb transcript has been identified in only one of the two glioblastoma cell lines. The location of the P4 clone, 6q22-23, and its strong muscle expression render it a possible candidate gene for muscular dystrophy. Indeed, it maps close to some other genes also expressed in muscle, e.g. lamA2 and lamA4. The lamA2 gene is involved in merosin-deficient congenital muscular dystrophy (37).
Surprisingly, the 5-kb transcripts ubiquitous in many tissues were absent in all tumor cell lines tested. Could this region be involved in the process of tumorigenesis because of its location at 6q25-26? Several authors reported that the telomeric part of the long arm of chromosome 6 might contain tumor suppressor genes (15, 38, 39).
The region R3 with its 5′-(CAG)n-3′ triplet repeat and region R4 of the E7 locus were transcribed. The EST1 clone, which is homologous to the E7 region detecting the brain-specific transcripts, was only 1,283 nucleotide pairs long. The sequence available in the data base from the DAN15 clone is 575 bp long. By isolating the full-length cDNA of both genes, one might be able to identify the location of the 5′-(CGG)n-3′ and other repeats. The E7 DNA fragment maps to 6q25-26, is repeat-rich, and is expressed specifically in brain.