Purification of Nuclear Proteins from Human HeLa Cells That Bind Specifically to the Unstable Tandem Repeat (CGG)n in the Human FMR1 Gene

Autonomous expansions of trinucleotide repeats with the general structure 5′-d(CNG)n-3′ are associated with several human genetic diseases. We have characterized nuclear proteins binding to the unstable 5′-d(CGG)n-3′ repeat. Its expansion in the human FMR1 gene leads to the fragile X syndrome, one of the most frequent causes of mental retardation in human males. Electrophoretic mobility shift assays using nuclear extracts from several human and other mammalian cell lines and from primary human cells demonstrated specific binding to double-stranded DNA fragments containing only a 5′-d(CGG)17-3′ repeat or the repeat and flanking genomic sequences of the human FMR1 gene. Protein binding was inhibited by complete methylation of the trinucleotide repeat. The complex formed with crude nuclear extract apparently did not contain the human transcription factor Sp1 that binds to a characteristic GC-rich sequence. A 20-kDa protein involved in specific binding to the double-stranded 5′-d(CGG)17-3′ repeat was purified from HeLa nuclear extracts by DNA affinity chromatography.
The autonomous, mechanistically still unexplained expansion of naturally occurring trinucleotide tandem repeats in the human genome has been recognized to be related to a number of serious human diseases: the fragile X syndrome (FRAXA locus), myotonic dystrophy, spinal and bulbar muscular atrophy, Huntington disease, mental retardation associated with the fragile site FRAXE on the human X chromosome, spinocerebellar ataxia type I, and dentatorubral-pallidoluysian atrophy (for reviews, see Refs. 1-7). Fragile sites, also known as folate-sensitive sites, are chromosomal aberrations that condense poorly during metaphase and can break under specific experimental conditions (8). Several such sites have been identified on the X chromosome (FRAXA, FRAXE, FRAXF; Ref. 9) and on the autosomes 11 (FRA11B; Ref. 10) and 16 (FRA16A; Refs. 11 and 12). All fragile sites identified so far have been found to be associated with amplifications of the simple unstable tandem repeat 5′-d(CGG)n-3′.
In the fragile X syndrome, the expanded tandem repeat 5′-d(CGG)n-3′ is located in the 5′-untranslated region (UTR) of the FMR1 gene in the human chromosomal location Xq27.3 (13). The number of repeat units varies between 6 and 54 in normal individuals, whereas more than 200 and up to 2000 repeat units can be found in affected individuals. Expansion of the repeat is accompanied by extensive methylation of the 5′-dCG-3′ dinucleotides in the repeat (14-16) and is associated with transcriptional silencing of the FMR1 gene (17-19). The function of the FMR1 protein is not yet known. The de novo methylation of the expanded trinucleotide repeat can be interpreted as a cellular defense against the invasion of foreign DNA or against unusual DNA structures (20,21). The cellular mechanism of triplet repeat amplification is not understood. Interestingly, procaryotic DNA polymerases are capable of expanding short synthetic oligodeoxyribonucleotides containing simple tandem repeat sequences to DNA stretches of several thousand nucleotides in length, even in the absence of template DNA (21,22). This finding suggests a slippage mechanism (23,24) for the expansion of trinucleotide repeats, presumably involving specific DNA-binding proteins. In transgenic mice, for instance, a 5′-d(CAG)45-3′ repeat in the androgen receptor gene is stable upon transmission in the mouse, whereas it expands upon transmission in humans (25). The authors suggest the involvement of sequence-specific, probably species-specific, DNA-binding proteins in the amplification reaction. Experiments with crude nuclear extracts from human HeLa cells indeed have shown binding of proteins to tandem repeat sequences (26). In addition, an amplified 5′-d(CTG)-3′ repeat is a preferential target for nucleosome assembly (27,28).
We have initiated experiments to characterize and purify human nuclear proteins that bind specifically to the double-stranded 5′-d(CGG)n-3′ repeat. Such proteins are present in a variety of human and other mammalian cell lines, as well as in primary cells.

EXPERIMENTAL PROCEDURES
Cells and Cell Lines-Human HeLa cells were purchased from Gesellschaft für Biotechnologische Forschung, Braunschweig, Germany. Human KB and Jurkat cells, BHK21 hamster cells, and fat head minnow (FHM) fish cells (29) were propagated by standard methods. Primary human lymphocytes were prepared and grown as reported (30). Hamster cell line T637, an adenovirus type 12 (Ad12)-transformed BHK21 cell line, and the revertants of cell line T637, TR3 and TR12, with no detectable and about one genome equivalent of integrated Ad12 DNA, respectively (31), were all grown in Dulbecco's medium supplemented with 10% fetal calf serum. Cell lines 293 and HEK12, human embryonic kidney cells transformed with parts of adenovirus type 5 (Ad5) (32) and Ad12 (33), respectively, cell lines A549 (human lung cancer) and C4/I (human cervix carcinoma), and a permanent cell line isolated from a human amnion tumor, as well as monkey Vero cells and the Ad12-transformed rat embryo fibroblast line REF12, were gifts of the Institute of Cell Biology or of Molecular Biology (University of Essen, Medical School, Essen, Germany).
Oligodeoxyribonucleotides and DNA Fragments-Oligodeoxyribonucleotides were synthesized in an Applied Biosystems 381A DNA synthesizer. Hybridization to form double-stranded oligodeoxyribonucleotides was carried out in a polymerase chain reaction thermal cycler (Perkin Elmer Cetus) under the following conditions: 10 min at 95°C, cooling to 70°C for 60 min, 60 min at 70°C, cooling to 58°C for 60 min, 60 min at 58°C, cooling to 17°C for 90 min, 60 min at 17°C. Oligodeoxyribonucleotides were subsequently purified by electrophoresis on polyacrylamide gels according to standard procedures. The compositions of the synthetic oligodeoxyribonucleotides used in this study and the abbreviations to designate them are summarized in Table I. DNA fragments were isolated from the plasmid pE5.1, which was a gift from Stephen T. Warren, Emory University School of Medicine, Atlanta, GA. This plasmid contained a 5′-d(CGG)16-3′ repeat in exon 1 of the human FMR1 gene and flanking genomic DNA sequences (13). The plasmid was cut with NarI, and the excised 441-bp fragment was isolated. This fragment was subsequently treated with RsaI or BfaI to yield a 198-bp (198ds) or a 126-bp (126ds) fragment, respectively. To obtain the 248-bp (248ds) fragment, the plasmid was first cleaved with RsaI, and the resulting fragment was isolated and cut with DdeI. A restriction map illustrating the derivation of these fragments is presented in Fig. 1.
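The annealing program listed above can be written down as a simple step table to check its overall duration and cooling rates. The following is a minimal sketch only; the names `ANNEALING_STEPS`, `total_minutes`, and `ramp_rate` are hypothetical, and the assumption that each cooling phase is a linear ramp is not stated in the text.

```python
# Annealing program as given in the text: (description, duration in minutes).
ANNEALING_STEPS = [
    ("hold at 95 C", 10),
    ("cool 95 C -> 70 C", 60),
    ("hold at 70 C", 60),
    ("cool 70 C -> 58 C", 60),
    ("hold at 58 C", 60),
    ("cool 58 C -> 17 C", 90),
    ("hold at 17 C", 60),
]


def total_minutes(steps):
    """Sum the durations of all steps in the program."""
    return sum(minutes for _, minutes in steps)


def ramp_rate(start_c, end_c, minutes):
    """Average cooling rate in degrees C per minute.

    Assumption (not from the source): the cycler cools linearly.
    """
    return (start_c - end_c) / minutes


if __name__ == "__main__":
    print("total program time (min):", total_minutes(ANNEALING_STEPS))
    print("95 -> 70 C ramp (C/min):", round(ramp_rate(95, 70, 60), 3))
    print("70 -> 58 C ramp (C/min):", round(ramp_rate(70, 58, 60), 3))
    print("58 -> 17 C ramp (C/min):", round(ramp_rate(58, 17, 90), 3))
```

Under these assumptions the whole program runs for several hours with cooling rates well below 1°C per minute, which is consistent with the stated aim of annealing the repetitive strands slowly at high temperature.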
Oligodeoxyribonucleotides were 5′-end labeled with T4-polynucleotide kinase (New England Biolabs, Beverly, MA) and [γ-32P]ATP. DNA fragments were labeled at the 3′-end with the Klenow fragment of DNA polymerase I (Boehringer Mannheim) and [α-32P]dATP or [α-32P]dCTP according to standard procedures. The specific activity of the DNA probes was 10^7 cpm/pmol.

Preparation of Nuclear Extracts and Purification of Proteins Binding to the Double-stranded 5′-d(CGG)n-3′ Repeat-All procedures were carried out at 4°C, unless stated otherwise. Nuclei were isolated from cells according to Dignam et al. (34) and Barrett et al. (35) by lysing the cells either in hypotonic buffer A (20 mM HEPES, 10 mM NaCl, 1 mM MgCl2, 0.15 mM spermine, 0.1 mM EDTA, 0.1 mM EGTA, 0.5 mM dithiothreitol, 0.5 M sucrose, and protease inhibitors, pH 7.9) in 0.25-0.5% Triton X-100 or by disintegrating the cells in a tight-fitting glass Dounce homogenizer followed by centrifugation at 600 × g for 15 min. Nuclei were washed 3 times with Triton-free buffer B (same as buffer A, except 0.35 M sucrose) and extracted on ice for 30 min in buffer C (buffer A, without sucrose, containing 420 mM NaCl and 20% glycerol). The supernatant of the subsequent centrifugation at 100,000 × g for 60 min was dialyzed for 3 h against buffer W (buffer A without sucrose, containing 80 mM KCl and 20% glycerol). The dialysate was centrifuged at 100,000 × g for 10 min, frozen in liquid nitrogen, and stored at −80°C. Under these conditions, DNA binding activity of proteins was stable. Protein concentrations were measured by standard procedures (36). Nuclear extracts from KB and BHK21 cells infected with Ad12 were gifts from Sabine Huppertz, those from the insect cell line IPLBSF21 (SF21) were from Andreas Kremer, and those of FHM cells were from Mark Munnes, all at the Institute of Genetics in Cologne.
For the purification of HeLa cell proteins (designated CGGBP(s) = 5′-d(CGG)n-3′ds binding proteins) that bind to the double-stranded 5′-d(CGG)n-3′ repeat, crude nuclear extracts isolated from 2 × 10^9 cells (20 mg of protein) were equilibrated in buffer QA (10 mM Tris-HCl, 100 mM KCl, 1 mM MgCl2, 0.15 mM spermine, 0.1 mM EDTA, 0.5 mM dithiothreitol, 20% glycerol, 0.01% Tween-20, and protease inhibitors, pH 7.9) using NAP-10 columns (Pharmacia Biotech Inc.) or Econo DP10-columns (Bio-Rad) and subsequently loaded on a 1-ml Resource Q column (Pharmacia) equilibrated in buffer QA. Proteins binding to the oligodeoxyribonucleotide (CGG)17ds (see Table I) eluted in the flow-through (fraction I, see Fig. 4). DNA affinity Sepharose was prepared by coupling 400 µg of the 3′-amino modified oligodeoxyribonucleotides (CGG)17ds, CGG8Ads, or (CAG)17ds covalently to 1 ml of N-hydroxysuccinimide-activated Sepharose beads (HiTrap; Pharmacia) according to the manufacturer's protocol. The material was equilibrated in buffer QA immediately before use. Proteins were bound and eluted in a batch procedure; washing and elution were performed in spin columns (Biometra, Göttingen, Germany). Active fraction I was incubated with CGG8Ads-Sepharose (250 µl) in the presence of 200 µg of poly(dA·dT) for 1 h. Unbound proteins containing CGGBP(s) (fraction II) were then incubated with 100 µl of (CGG)17ds-Sepharose either at 4°C for 4 h or at room temperature for 1 h. The material was centrifuged at 600 × g for 10 min, washed twice with 1 ml of buffer W100 (20 mM HEPES, 100 mM NaCl, 1 mM MgCl2, 0.15 mM spermine, 0.1 mM EDTA, 0.5 mM dithiothreitol, 20% glycerol, 0.01% Tween-20, and protease inhibitors, pH 7.9), and subsequently washed twice with 1 ml of buffer W150 (same as W100 but with 150 mM NaCl and 100 pmol of an unrelated oligodeoxyribonucleotide).
CGGBP(s) were eluted as fraction III from the resin in 100 µl of buffer E750 and partly in 100 µl of buffer E1000 (same as W100 but with 750 mM and 1 M NaCl, respectively). After equilibration of fraction III in buffer W100 supplemented with 0.4% Tween-20, proteins were again bound to 20 µl of (CGG)17ds-Sepharose. Binding, washing, and elution were carried out as described above, but smaller volumes of the buffers W100 (1 ml) and W150 (100 µl) were used. CGGBP(s) eluted in 20 µl of buffer E750 to yield fraction IV. Only low activity remained after elution with buffer E1000 (fraction IV). Active fractions I to IV were analyzed by SDS-polyacrylamide gel electrophoresis (37) followed by silver staining.
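The wash and elution buffers described above differ from buffer W100 only in their NaCl concentration, so the series can be captured as one base recipe plus overrides. The sketch below is illustrative only; the dictionary keys and the helper `with_nacl` are hypothetical names, protease inhibitors are omitted because the text does not specify them, and the 100 pmol of unrelated oligodeoxyribonucleotide added to W150 is noted in a comment rather than modeled.

```python
# Base recipe for buffer W100 as listed in the text (protease inhibitors omitted,
# since their identity and concentration are not given).
W100 = {
    "HEPES_mM": 20,
    "NaCl_mM": 100,
    "MgCl2_mM": 1,
    "spermine_mM": 0.15,
    "EDTA_mM": 0.1,
    "dithiothreitol_mM": 0.5,
    "glycerol_percent": 20,
    "Tween20_percent": 0.01,
    "pH": 7.9,
}


def with_nacl(base, nacl_mM):
    """Return a copy of a buffer recipe with a different NaCl concentration."""
    recipe = dict(base)  # copy so the base recipe is not modified
    recipe["NaCl_mM"] = nacl_mM
    return recipe


BUFFERS = {
    "W100": W100,
    # W150 additionally contains 100 pmol of an unrelated oligodeoxyribonucleotide.
    "W150": with_nacl(W100, 150),
    "E750": with_nacl(W100, 750),
    "E1000": with_nacl(W100, 1000),
}

if __name__ == "__main__":
    for name, recipe in BUFFERS.items():
        print(name, "NaCl (mM):", recipe["NaCl_mM"])
```

Keeping the salt step as the only varying parameter makes it easy to see that elution of fraction III relies purely on ionic strength.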
Influence of sodium deoxycholate on complex formation was tested as described previously (38). Crude nuclear extracts or fraction I were incubated with labeled DNA fragments as described above for 10 min. Different amounts of sodium deoxycholate were then added in the absence or presence of 0.6% Nonidet P-40, the mixture was incubated for another 30 min and subsequently analyzed by gel electrophoresis.
The monoclonal antibody against the human transcription factor Sp1 was purchased from Santa Cruz Biotechnology Inc. (Santa Cruz, CA). Crude nuclear extracts or fraction I or III were incubated with the specific DNA fragment as described above in the presence of the anti-Sp1 antibody (0.3-1 µg) for 60 min at room temperature. Complexes were separated by electrophoresis on polyacrylamide gels.

RESULTS
Nuclear Proteins from Several Mammalian Cells Bind Specifically to the Double-stranded 5′-d(CGG)17-3′ Repeat-We have examined nuclear extracts from various human and other mammalian cells by electrophoretic mobility shift assays (EMSA) for the presence of proteins that bind to the double-stranded 5′-d(CGG)n-3′ repeat located in exon 1 of the human FMR1 gene (13). For this purpose, DNA fragments containing a 5′-d(CGG)16-3′ repeat and the flanking genomic sequences from the 5′-UTR of the human FMR1 gene (Fig. 1) or double-stranded, repetitive oligodeoxyribonucleotides (Table I) were used. In order to ensure formation of the double strand, the repetitive single-stranded oligodeoxyribonucleotides were hybridized under controlled conditions at high annealing temperatures. Double-stranded oligodeoxyribonucleotides migrated according to their sizes in native polyacrylamide or NuSieve agarose gels, whereas the single-stranded 5′-d(CNG)n-3′ oligodeoxyribonucleotides showed increased mobility (data not shown; Ref. 39).
Binding of nuclear proteins to the synthetic oligodeoxyribonucleotide (CGG)17ds and the FMR1 promoter-derived DNA fragment 198ds was demonstrated by EMSA (Figs. 2 and 3, a and c). Specificity of binding was ascertained by competition experiments using the unlabeled homologous oligodeoxyribonucleotide (CGG)17ds and additional synthetic products containing different tandem repeat sequences as competitors (Fig. 3, a and c). Nuclear proteins isolated from the established human cell lines HeLa, C4/I, KB, Jurkat, A549, 293, HEK12, an amnion tumor-derived cell line, as well as from primary human lymphocytes gave rise to the specific DNA-protein complex I (Fig. 2a, cI) after incubation with the oligodeoxyribonucleotide (CGG)17ds. Formation of complex I could be competed by the oligodeoxyribonucleotide (CGG)17ds in at least 75-fold excess, but not by several oligodeoxyribonucleotides with different sequences (Fig. 3a). Additional DNA-protein complexes apparent in EMSAs shown in Fig. 2a were not specific as shown by competition experiments (Fig. 3a). Extracts from non-human cells like hamster BHK21 cells and rat embryo fibroblasts REF12 produced the same patterns as those from human cells (Fig. 2b). However, proteins from monkey Vero cells, from nonmammalian FHM fish cells, and from the insect cell line SF21 generated specific DNA-protein complexes (Fig. 2b), which were different from those with proteins from human cell lines.
Infection of the permissive human cell lines HeLa and KB with Ad12 did not abolish CGG-binding activity (Fig. 2a). However, the abortive infection of hamster BHK21 cells with Ad12 gave rise to two additional bands showing slightly higher mobility in EMSA (Fig. 2b). In contrast, extracts from the Ad12-transformed BHK21 cell line derivative T637 or from its revertants TR3 or TR12 showed the same patterns as proteins from extracts of the parental BHK21 cells. Interestingly, CGGBP(s) were not detectable in extracts isolated from BHK21 cells grown in suspension cultures (Fig. 2b).
The biological significance of these data had to be ascertained by repeating the binding experiments with authentic DNA fragments from the 5′-UTR of the FMR1 gene. Fragment 198ds gave rise to the DNA-protein complexes 1-4 (Fig. 3c, c1-c4) when nuclear extracts from human HeLa cells were used. Similar or identical patterns were found when extracts from other human or non-human cell lines were investigated. Complex 3 was not always detectable. Complex 1 appeared to be specific for CGG binding, as its formation could be blocked by competition with the oligodeoxyribonucleotide (CGG)17ds, but not with other oligodeoxyribonucleotides. The strong complex 4 seemed also to be formed by CGGBP(s), because its formation was partly competed by the oligodeoxyribonucleotide (CGG)17ds (Fig. 3c) and also by 198ds. During the purification of CGGBP(s), complex 4 was the only detectable complex involving the 198ds fragment. Its formation could then be specifically competed by the oligodeoxyribonucleotide (CGG)17ds and FMR1 promoter fragments 126ds, 198ds, and 248ds, but not by other oligodeoxyribonucleotides. Thus, complex 1 might contain additional factors that were probably associated with factors binding to flanking 3′-sequences. These additional factors could have been lost during purification and were no longer present in the CGGBP(s) in complex 4. Interestingly, the binding of proteins from nuclear extracts to the 126ds fragment with the same 5′-sequence as 198ds but a shorter 3′-end (Fig. 1) gave rise to only one complex and a pattern similar to that formed with the oligodeoxyribonucleotide (CGG)17ds (data not shown). In contrast, binding of nuclear proteins to the 248ds fragment, which had the same 3′-sequence as 198ds but a longer 5′-sequence, produced the same pattern as the 198ds fragment.
It is concluded that several human and other mammalian cells express a (CGG)17ds binding activity that gives rise to the same, strong complex I with the oligodeoxyribonucleotide (CGG)17ds and to at least one specific complex with the authentic DNA fragments 198ds, 126ds, and 248ds from the 5′-UTR of the human FMR1 gene.

Specificity of Complex Formation as Assessed by Competition Experiments-The results of a series of competition experiments, which were performed to assess the specificity of complex I formation, are summarized in Table II. The formation of complex I was impaired only by competition with the double-stranded oligodeoxyribonucleotides (CGG)nds (8 < n ≤ 17) and with the authentic DNA fragments 126ds, 198ds, and 248ds from the 5′-UTR of the FMR1 gene (Fig. 3a). Single-stranded oligodeoxyribonucleotides (CCG)17ss or (CGG)17ss did not compete for binding.
Moreover, complex I was observed only with oligodeoxyribonucleotides (CGG)17ds and (CGG)12ds as binding probes, whereas (CGG)8ds gave rise to a very faint complex (data not shown). The oligodeoxyribonucleotide FraxF isolated from the human FRAXF locus (9) did not serve as a specific binding probe for CGGBP(s) and did not compete for binding to (CGG)17ds. The FraxF oligodeoxyribonucleotide contained eight 5′-d(CGG)-3′ repeats and alternating 5′-d(CAGCGG)-3′ds repeats (Table I). Hence, effective binding of CGGBP(s) to the recognition sequence required more than 8 repeat units.
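To make the probe lengths concrete, the repeat oligodeoxyribonucleotides of 8, 12, and 17 units can be generated in a few lines, together with their complementary strands and the number of 5′-CG-3′ dinucleotides (the methylatable sites) per strand. This is a minimal sketch, not the sequences of Table I; the function names are hypothetical, and flanking or modified nucleotides that the actual oligodeoxyribonucleotides may carry are ignored.

```python
# Watson-Crick complement table for unmodified DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat(unit, n):
    """Return n tandem copies of a repeat unit, written 5' to 3'."""
    return unit * n


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def count_cpg(seq):
    """Count 5'-CG-3' dinucleotides on one strand."""
    return seq.count("CG")


if __name__ == "__main__":
    for n in (8, 12, 17):
        top = repeat("CGG", n)
        bottom = reverse_complement(top)
        print(n, "units:", len(top), "nt per strand;",
              "CpG top/bottom:", count_cpg(top), count_cpg(bottom))
```

The complementary strand of (CGG)n is (CCG)n, which is why the single-stranded controls in the competition experiments are (CGG)17ss and (CCG)17ss.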
Formation of complex I was only partly competed by the synthetic oligodeoxyribonucleotide CGG8Tds (Fig. 3a), whereas no competition was observed with the oligodeoxyribonucleotide (TGG)17ds (nucleotide sequences, see Table I). However, complex I formation was not competed by the addition of oligodeoxyribonucleotides with other triplet repeat sequences (Fig. 3a). Moreover, binding of nuclear proteins to the 5′-d(CAG)17-3′ds repeat was unspecific (data not shown). When the authentic DNA fragments 198ds or 126ds were used as binding probes, the 5′-d(CGG)16-3′-specific complexes 1, 3, and 4 were competed by the oligodeoxyribonucleotide CGG8Tds (Fig. 3c) but not by other oligodeoxyribonucleotides.
Complex I and complexes 1-4 were destroyed after the addition of the anionic detergent sodium deoxycholate (≥0.03%), whereas the nonionic detergents Triton X-100 or Tween 20 (≤2%) did not have any effects on complex formation (data not shown). Complex disruption by sodium deoxycholate was reversed in the presence of 0.6% Nonidet P-40. Although it cannot be ruled out that sodium deoxycholate as an anionic detergent affects protein-DNA interaction, the sodium deoxycholate sensitivity of the binding of CGGBP(s) to the 5′-d(CGG)17-3′ repeat and the reversal by Nonidet P-40 suggest the involvement of protein-protein interactions in complex formation (38).

CGGBP(s) Do Not Bind to the Fully Methylated Trinucleotide Repeat-The results of experiments with crude nuclear extracts from HeLa cells suggested methylation sensitivity of proteins binding to the 5′-d(CGG)n-3′ repeat (26). In order to investigate this problem further, oligodeoxyribonucleotides that contained partly or fully methylated trinucleotide repeats were used as binding probes or in competition experiments (Fig. 3, a and b). The completely methylated oligodeoxyribonucleotide (MGG)17ds and the partly methylated oligodeoxyribonucleotides 8MCGGds and 4MCGGds (nucleotide sequences, see Table I) were synthesized by incorporating 5-methyldeoxycytidine instead of C during chemical synthesis. Only weak competition for the formation of complex I (cI) was observed when the completely methylated double-stranded oligodeoxyribonucleotide (MGG)17ds was added (Fig. 3a). Moreover, only proteins from crude nuclear extracts were capable of forming complexes with the methylated oligodeoxyribonucleotide (MGG)17ds (Fig. 3b, lanes 5-8). These complexes MI to MIII were not formed with proteins from fractions enriched for CGGBP(s) (see below and Fig. 3b, lanes 3 and 4). The formation of complex MIII was weakly competed by the unmethylated oligodeoxyribonucleotide (CGG)17ds; complexes MI and MIII were not formed in the presence of (MGG)17ds as competitor (Fig. 3b). In contrast, partly methylated oligodeoxyribonucleotides 8MCGGds and 4MCGGds formed the same complex I with crude nuclear extracts and purified CGGBP(s) as found with the unmethylated counterpart (CGG)17ds (data not shown). These findings indicated methylation sensitivity of CGGBP(s).
The binding of nuclear proteins to the fully methylated oligodeoxyribonucleotide (MGG)17ds might be due to proteins that interacted specifically with highly methylated DNA sequences (40,41).

FIG. 2. Binding of nuclear proteins isolated from various cell lines to the double-stranded trinucleotide repeat 5′-d(CGG)17-3′. Crude nuclear extract (0.5-2 µg) was incubated with the oligodeoxyribonucleotide (CGG)17ds in the presence of unspecific DNA. a, nuclear proteins isolated from a variety of human cell lines and human primary lymphocytes gave rise to the formation of one major complex I (cI). The same complex was observed with proteins isolated from various mammalian cell lines (b). Different complexes were detected with extracts from the fish cell line FHM and the insect cell line SF21, whereas no complex was detected with extracts from BHK21 cells grown in suspension. Experimental details are outlined in the text under "Experimental Procedures." cI indicates the position of the specific complex I.

FIG. 3. ... 5-8). c, DNA fragment 198ds contained the trinucleotide repeat 5′-d(CGG)16-3′ flanked by genomic sequences of the 5′-untranslated region from the human FMR1 gene. In binding experiments, it gave rise to the specific complexes 1, 3, and 4 (c1, c3, and c4). Their formation was competed only by oligodeoxyribonucleotides of the general structure (CGGNGG)8CGGds (with N = T or C). Complex 3 was not always detectable. Double-stranded competitor oligodeoxyribonucleotides were used at a 300-fold excess over the double-stranded binding fragment (2 fmol). Sequences of oligodeoxyribonucleotides and a summary of competition experiments are described in Tables I and II, respectively.
It is concluded that proteins in nuclear extracts from primary human cells, from established human cell lines, and from several mammalian as well as from some nonmammalian cells form a specific complex with the synthetic double-stranded oligodeoxyribonucleotides (CGG)nds, with 12 ≤ n ≤ 17. The oligodeoxyribonucleotide (CGG)8ds suffices for weak complex formation. The authentic DNA fragments 248ds, 198ds, or 126ds from the 5′-UTR of the human FMR1 gene can also form at least one 5′-d(CGG)16-3′ds-specific complex and additional, probably less specific complexes. Some of the more complicated EMSA patterns (Fig. 3c) might be accounted for by additional complex formation with nucleotide sequences that flank the 5′-d(CGG)16-3′ repeat. Modifications of the specific 5′-d(CGG)17-3′ds sequence can be tolerated for its efficiency in competition experiments when exchanges of the C are limited to 8 and to the pyrimidines T or 5-methyldeoxycytidine. CGGBP(s) do not bind to the fully methylated trinucleotide repeat sequence. The ubiquitous expression of CGGBP(s) points to an important function of these proteins. This binding activity seems to be highly conserved, since similar proteins have been found in extracts from nonmammalian fish or insect cells.
Binding of Nuclear Proteins from Human Cells to the Single-stranded Oligodeoxyribonucleotides (CGG)17ss and (CCG)17ss Is Unspecific-Several reports suggested that single-stranded oligodeoxyribonucleotides 5′-d(CGG)n-3′ and 5′-d(CCG)n-3′ (n ≥ 4) might adopt unusual structures in vitro (39,42). In fact, these oligodeoxyribonucleotides exhibited abnormally high electrophoretic mobility in polyacrylamide gels (data not shown). We therefore examined these oligodeoxyribonucleotides for their capacity to bind nuclear proteins from human cells. The oligodeoxyribonucleotide (CCG)17ss led to the formation of several complexes that could, however, be prevented by competition with single-stranded oligodeoxyribonucleotides of the general sequence 5′-(CSGCSK)-3′ (S could be G or C and K could be G or T), but not with double-stranded oligodeoxyribonucleotides. The oligodeoxyribonucleotide (CGG)17ss did not give rise to any specific complex at all. It is therefore likely that the generation of complexes between nuclear proteins and the single-stranded repeat sequences is rather unspecific and probably due to a single-strand binding protein.
The Human GC Box Binding Transcription Factor Sp1 Is Not Part of the CGGBP(s)·(CGG)17ds Complex-A possible candidate protein for complex formation with the double-stranded 5′-d(CGG)17-3′ repeat was the transcription factor Sp1, which recognized the consensus sequence 5′-dGGGCGG-3′ (43). Therefore, an oligodeoxyribonucleotide Sp1ds containing the Sp1 binding sequence (Table I) was tested for its capacity to compete for protein binding to the 5′-d(CGG)17-3′ repeat. It failed to function as a specific competitor (Fig. 3a).
In addition, we tried to assess the participation of Sp1 in the formation of the CGGBP(s)·(CGG)17ds complex by testing the effect of an anti-Sp1 monoclonal antibody on complex formation. This antibody did not affect complex formation (data not shown).
It is therefore concluded that the transcription factor Sp1 is not part of the CGGBP(s)·(CGG)17ds complex. In addition, putative Sp1 binding sites located in the 3′-flanking region of the genomic 5′-d(CGG)16-3′ repeat are not bound by Sp1, since the antibody against this factor did not affect the formation of any complex formed with the authentic 198ds fragment (data not shown).
Partial Purification of a Nuclear Protein (p20) Associated with the Binding to the Double-stranded 5′-d(CGG)17-3′ Repeat-CGGBP(s) participating in complex I formation were isolated from HeLa nuclear extracts by the purification scheme outlined in Fig. 4. Nuclear extracts were prepared from 2 × 10^9 HeLa cells, and the proteins were first fractionated by anion-exchange chromatography (Fig. 5a). Protein binding activity to the double-stranded oligodeoxyribonucleotide (CGG)17ds was recovered in the flow-through designated as fraction I (Figs. 4 and 5a). About 60% of unrelated proteins and nucleic acids from the nuclear extracts were eliminated in this purification step. Fraction I was then incubated in a batch procedure with the double-stranded oligodeoxyribonucleotides CGG8Ads or (CAG)17ds coupled to Sepharose beads to remove proteins that bound unspecifically to DNA of similar structure (Fig. 4). CGGBP(s) were recovered almost quantitatively in the supernatant. This material was designated as fraction II. Fraction II was subsequently adsorbed to a (CGG)17ds-Sepharose matrix, and active fractions (fraction III) were eluted with >300 mM NaCl (Fig. 5b). After a second passage of fraction III over the (CGG)17ds-matrix, a major band of 20 kDa was detected in the active fraction IV (Fig. 5, b and c) by SDS-polyacrylamide gel electrophoresis followed by silver staining. The 20-kDa band was accompanied by an additional faint band of 120 kDa. In order to determine which of the two bands was responsible for specific (CGG)17ds binding, proteins of fraction I were bound to the mutated CGG8Ads-matrix (Fig. 4, dashed line). The material was washed and eluted as described above. Fraction III′ eluting with buffer E750 did not show (CGG)17ds-binding activity (Fig. 5b). Analyses by SDS-polyacrylamide gel electrophoresis followed by silver staining revealed that this fraction contained several bands at 120, 70, and 55 kDa. However, a band around 20 kDa was not detected (Fig. 5c).

FIG. 4. Purification scheme for the isolation of CGGBP(s) from HeLa nuclear extracts. Details of the purification procedure are described in the text and under "Experimental Procedures."
It is concluded that the protein p20 is involved in the formation of complex I and also of complex 4 established with the repetitive oligodeoxyribonucleotide (CGG)17ds and the authentic DNA fragment 198ds, respectively. However, participation of additional proteins in complex I and complex 4 cannot be ruled out, since their amounts might be below the detection limit of silver staining.

FIG. 5. Isolation of a nuclear protein (p20) from HeLa cells involved in binding to the double-stranded trinucleotide repeat 5′-d(CGG)17-3′. a, nuclear proteins were separated by anion-exchange chromatography (Resource Q). CGGBP(s) were detected in the flow-through, whereas accompanying proteins and nucleic acids eluted at higher salt concentrations. The inserts show the results of EMSA experiments with the individual fractions. Only complex I is shown. b, fraction I was separated by DNA affinity chromatography as outlined in Fig. 4. (CGG)17ds-binding activity was detected in fractions III and IV, eluting from the specific DNA affinity matrix (CGG)17ds-Sepharose at high salt concentration after a first and a second loading, respectively (left panels). Almost all (CGG)17ds-binding activity was found in the flow-through (fraction II) when the unspecific DNA affinity matrix CGG8Ads-Sepharose was used (right panel). c, proteins in fractions I-IV were separated by SDS-polyacrylamide gel electrophoresis (left panel). After silver staining, fraction IV gave rise to one dominant band with an apparent molecular mass of 20 kDa (p20) and a band at 120 kDa (left panel, lane 6). The band at 20 kDa was not present in fraction III′ eluted with high salt from the unspecific DNA affinity matrix CGG8Ads-Sepharose (right panel, lane 6), whereas it was detectable in high salt eluates (fraction III) from the specific DNA affinity matrix (CGG)17ds-Sepharose (right panel, lanes 3 and 4). M, molecular mass (kDa) markers.

DISCUSSION
This research has been initiated on the premise that the size stability of trinucleotide repeats in the human genome and their controlled replication may be regulated by factors that are encoded at chromosomal sites far remote from the locus of the trinucleotide repeats, e.g. of the FRAXA location on Xq27.3 in the instance of the fragile X syndrome (13). Alterations in such regulatory proteins might be implicated in eliciting the repeat expansions that are causally related to a number of serious genetic diseases in humans. In addition, it needs to be investigated whether the trinucleotide repeat itself might influence the regulation of the expression of adjacent genes.
Whatever the ultimate mechanisms underlying these striking trinucleotide repeat amplifications or the function of the repeat itself may turn out to be, we have considered it interesting to study cellular proteins that can bind specifically to these sequences. The 5′-d(CGG)n-3′ repeat in the 5′-untranslated region of the human FMR1 gene has been chosen as a system of considerable theoretical and medical importance.
We have partly purified a protein that is involved in specific binding to the double-stranded form of the synthetic 5′-d(CGG)17-3′ repeat and its naturally occurring counterpart in the 5′-regulatory region of the human FMR1 gene. Further experiments will be focused on the isolation of a cDNA encoding this protein and on elucidating its function. Whether additional proteins are involved in complex I formation has to be investigated. However, the GC box binding protein Sp1 (43) does not participate in CGGBP(s)·(CGG)17ds complex formation. This specific complex is sensitive to sodium deoxycholate treatment, and this sensitivity can be abrogated by sufficient concentrations of the nonionic detergent Nonidet P-40. This finding is indicative of a complex in which more than one protein is involved and which might be based in part on protein-protein interactions.
The protein-DNA complex investigated responds to specific 5′-d(CG)-3′ methylation in the repeat sequences. This observation lends further credence to the biological significance of this complex formation, since it has been demonstrated that in patients with the fragile X syndrome, the repeat sequence is hypermethylated (14-16). The biochemical functions of the protein(s) actually contained in the complex require further detailed analyses.