Transcriptional Properties of RNA Polymerase II within Triplet Repeat-containing DNA from the Human Myotonic Dystrophy and Fragile X Loci

Expansion of a (CTG)n segment within the 3′-untranslated region of the myotonic dystrophy protein kinase gene alters mRNA production. The inherent ability of RNA polymerase II to transcribe (CTG)17–255 tracts corresponding to DNA from normal, unstable, and affected individuals, and the normal (CGG)54 fragile X repeat tract, was analyzed using a synchronized in vitro transcription system. Core RNA polymerase II transcribed all repeat units irrespective of repeat length or orientation. However, approximately 50% of polymerases transiently halted transcription (half-life, 10 ± 1 s) within the first and second CTG repeat units, and a more transient barrier to elongation was observed roughly centered within repeats 6–9. Transcription within the remainder of the CTG tracts and within the CCG, CGG, and CAG tracts appeared uniform, with average transcription rates of 170, 250, 300, and 410 nucleotides/min, respectively. These differences correlated with changes in the sequence-specific transient pausing pattern within the CNG repeat tracts; individual incorporation rates were slower after incorporation of pyrimidine residues. Unexpectedly, approximately 4% of the run-off transcripts were, depending on the repeat sequence, either 15 or 18 nucleotides longer than expected. These products, however, were not produced by transcriptional slippage within the repeat tract.

A variety of human diseases and several chromosomal fragile sites have been associated with the expansion of CTG, CAG, CGG, and CCG triplet repeat tracts within the respective genes (reviewed in Ref. 1). In most CAG triplet repeat-associated diseases, the repeat tracts are located within the translated region of the gene and encode a polyglutamine tract, which upon expansion presumably alters protein function (see Ref. 1 for review). In fragile X syndrome (2), both expansion and methylation of the CGG repeat within the 5′-untranslated region of the fragile X mental retardation 1 (FMR1) gene are coincident with transcriptional suppression (3–5). The CCG repeat is associated with fragile site XE mental retardation, but the location of the repeat tract with respect to the gene has not been determined (see Ref. 1 for review).
The CTG repeat of myotonic dystrophy (DM) is located in the 3′-untranslated region (UTR) of the DM protein kinase (DMPK) gene (6), and the level of DMPK mRNA within tissues or cell lines derived from affected individuals varies (7–9). Krahe et al. (10) demonstrated that equal levels of pre-mRNA encoding up to the 3′-UTR (CTG)n repeat region were produced from the wild type and affected alleles in a spectrum of skeletal muscle samples and cell lines derived from heterozygous DM patients. Moreover, processed mRNA levels from the affected alleles, relative to the wild type allele, were reduced in inverse proportion to the expansion size. It has also been reported that the expanded triplet repeat and the adjacent downstream region (which includes the polyadenylation site) were maintained in a more condensed chromatin configuration within affected DMPK alleles of three unrelated individuals (11). Thus, it has been proposed that CTG expansion may decrease the efficiency of transcription by RNA polymerase II and/or impair one or more downstream mRNA processing steps. In addition, the expanded repeats may also influence the nuclear retention of properly processed DMPK mRNA (12, 13).
By using a synchronized in vitro transcription system, we show that CTG expansion does not alter the inherent transcriptional properties of RNA polymerase II. However, RNA polymerase transiently adopts an elongation-incompetent configuration at the entrance to the repeat tract and transcribes (CTG)n sequences significantly more slowly than either mixed sequence or the other triplet repeat-bearing templates. These and other salient features of transcription through the triplet repeat templates will be discussed in terms of the biochemical properties of the elongating RNA polymerase II ternary complex. We also consider the biological relevance of an intrinsic pause site within the 3′-UTR of the DMPK locus as it might relate to triplet repeat expansion and the disease state.

EXPERIMENTAL PROCEDURES
Materials-All fast protein liquid chromatography-grade non-radioactive nucleotides were obtained from Amersham Pharmacia Biotech.
Construction of pML Templates Containing Triplet Repeat DNA-To test for the effects of triplet repeat sequences on promoter-initiated RNA polymerase II transcription, a slightly modified adenovirus 2 major late (ML) promoter was inserted either upstream or downstream of the two different triplet repeat tracts. For this, pML20 (14) was digested with EcoRI and BamHI to liberate a fragment containing the ML promoter from −170 to +35 (relative to the +1 transcriptional start site). This fragment was cloned directly into the EcoRI/BamHI-digested pUC19-based (CTG)n-containing plasmids (n = 17, 50, or 255 ± 5) (15), generating pML(CTG)17, pML(CTG)62, and pML(CTG)262±5, respectively. The repeat sequence in the template name designates the nontemplate strand. The differences in repeat tract length between the parental and promoter-containing templates are due to instability within the repeat tract during plasmid propagation in Escherichia coli (see Ref. 16; see also below). The triplet repeat-bearing templates also contain flanking non-repetitive human sequences, as indicated in Fig. 1.
Plasmids in which the major late promoter directs transcription using the CTG strand as template were generated by inserting the EcoRI/BamHI fragment from pML20 into HindIII-digested repeat-containing plasmids. For this, both the fragment and vector ends were made blunt by a fill-in reaction using the Klenow fragment of DNA polymerase I. Plasmids containing the ML promoter in the appropriate orientation were designated pML(CAG)17, pML(CAG)50, and pML(CAG)255±5.
Plasmid pFXA9.9.32 (17) contains the sequence 5′-(CGG)9AGG(CGG)CGA(CGG)7AGG(CGG)22CGA(CGG)11 (determined by dideoxy sequencing) derived from the FMR1 locus and will be referred to as (CGG)54 to indicate the total length of the repeat tract. Note that previous reports used only MnlI digestion to determine the interruption pattern (17). pML(CGG)54 and pML(CCG)54 were constructed by inserting the repeat-containing fragment into pML(CTG)62 or pML(CAG)50 following digestion with XbaI and PstI to remove the myotonic dystrophy gene sequence. Prior to insertion of the 270-bp HaeIII fragment containing the (CGG)54 repeats and flanking human DNA, the vector ends were made blunt with T4 DNA polymerase and dNTPs.
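The compact repeat notation above can be expanded and checked mechanically. The following Python sketch uses two hypothetical helpers, `expand` and `revcomp`, written only for illustration; it assumes that a parenthesized unit with no trailing count occurs once. It confirms that the interrupted CGG tract totals 54 triplets and that its reverse complement is the interrupted CCG tract listed in Results, as expected if pML(CGG)54 and pML(CCG)54 carry the same fragment in opposite orientations.

```python
import re

# Hypothetical parser for the compact notation used in the text, e.g.
# "(CGG)9AGG(CGG)CGA": a parenthesized unit followed by an optional count,
# or a plain run of bases. A unit with no count is taken to occur once.
TOKEN = re.compile(r"\(([ACGT]+)\)(\d*)|([ACGT]+)")
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def expand(notation: str) -> str:
    """Expand compact repeat notation into a plain DNA sequence."""
    parts = []
    for unit, count, plain in TOKEN.findall(notation):
        parts.append(unit * int(count or 1) if unit else plain)
    return "".join(parts)

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

cgg = expand("(CGG)9AGG(CGG)CGA(CGG)7AGG(CGG)22CGA(CGG)11")
ccg = expand("(CCG)11TCG(CCG)22CCT(CCG)7TCG(CCG)CCT(CCG)9")
print(len(cgg) // 3)        # 54 triplets
print(revcomp(cgg) == ccg)  # True
```

Interruptions (AGG, CGA) are counted as repeat units, which is how the total of 54 is reached.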
Assembly and Purification of Transcriptionally Stalled Ternary Complexes-In vitro transcription reactions were performed as described previously (18). Briefly, HeLa cell nuclear extracts were used to assemble RNA polymerase II preinitiation complexes, which were then partially purified by Bio-Gel A-1.5m (Bio-Rad) gel filtration chromatography. Transcription was initiated at 27°C by supplying 100 µl of the gel-purified preinitiation complex with 2 mM ApC, 10 µM dATP, GTP, and UTP, and 1 µM [α-32P]CTP. To ensure that the majority of ternary complexes had stalled after incorporating UMP at position +20 (U20 complexes), prior to multiple AMP incorporation sites, the reaction was chased with nonlabeled CTP and UTP (20 µM). The U20 complex was purified after transient exposure to 1% Sarkosyl by another round of gel filtration using 30 mM Tris-HCl (pH 7.9), 62.5 mM KCl, 0.5 mM EDTA, and 1 mM dithiothreitol as running buffer. This procedure, which we term "Sarkosyl rinsing," removes nonspecific DNA-binding proteins from the DNA template, presumably any elongation factors retained by the stalled ternary complex, and the divalent cations and NTPs/ApC required for initiation.
Transcript Elongation Reactions-Typically, elongation time course reactions were performed in initial volumes sufficient for 30-µl aliquots to be withdrawn at each time point. Prior to resuming transcription, the reaction mix was supplied with 1 mM NTPs and equilibrated at 37°C for 3 min before MgCl2 was added to 7.8 mM. The 30-µl reactions were stopped at specific time points by addition to 70 µl of Stop Mix (20 mM EDTA, 0.2% SDS, and 0.2 mg/ml Proteinase K). Transcription products were processed and fractionated on a sequencing gel containing 7.8 M urea as described previously (18). The RNA markers used for mapping studies (Fig. 2D) were generated under limiting single-nucleotide conditions (20 µM), whereas the other three nucleotides were maintained at 1 mM. For these markers, 30-µl aliquots were removed at 15-s intervals from an initial reaction mix of 150 µl and consolidated prior to nucleic acid purification.
RNase U2 Digestions-HindIII-linearized pML(CTG)17 run-off transcription products were uniformly labeled by adding 52.4 µl of Sarkosyl-rinsed U20 complex to a tube containing 90 µl of [α-32P]CTP (800 Ci/mmol) that had been lyophilized to dryness and resuspended with 1.9 µl of 30 mM (ATP, CTP, and UTP) and 1.7 µl of 250 mM MgCl2. After a 20-min incubation at 37°C, unincorporated NTPs were removed by gel filtration through a G-50 mini spin-column (Amersham Pharmacia Biotech). Purified and precipitated RNAs were fractionated on a 7% (acrylamide/bisacrylamide (19:1)) sequencing gel (27 × 36 cm) run until the xylene cyanol had migrated 30 cm. The appropriate transcripts were electroeluted from gel slices using autoclaved dialysis membranes and TBE buffer. To maximize RNA stability, the membranes were rinsed with TBE containing 16 units/ml RNasin (Promega) and 5 mM dithiothreitol, and electroelution was performed with buffer containing 0.2% SDS. The uniformly labeled RNAs were purified and ethanol-precipitated following the addition of 20 µl of non-radiolabeled U20 complex. Each pellet was resuspended in 15 µl of RNase U2 reaction buffer (16 mM sodium citrate (pH 3.5), 0.8 mM EDTA, and 3.5 M urea) and divided into 3- or 6-µl aliquots. RNase U2 digestions were performed for 6 min at 55°C following the addition of 0.5 µl of H2O to the 3-µl reaction or 1 µl of H2O containing either 1 or 2 units of RNase U2 to the 6-µl reactions. After chilling for 2 min on ice, 2 µl of formamide dye mix was added, and the RNAs were immediately fractionated on a 10% sequencing gel.
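The U2 readout used in Results rests on a counting argument: if cleavage after A residues dominates, the (CTG)17 tract is released within a single A-bounded fragment whose length reports the number of repeat units. The sketch below is a simplification. It assumes idealized, A-specific cleavage of only the local sequence 5′-AAUG(CUG)nGGGGGAU quoted in Results, ignoring the rest of the transcript and the slower cleavage at G. The function name `a_free_fragment_length` is hypothetical.

```python
def a_free_fragment_length(n_repeats: int) -> int:
    """Length (nt) of the A-bounded fragment from idealized A-specific cleavage.

    Models only the local sequence 5'-AAUG(CUG)nGGGGGAU; the rest of the
    transcript is ignored (an illustrative assumption).
    """
    rna = "AAUG" + "CUG" * n_repeats + "GGGGGAU"
    # Cleavage 3' of each A leaves each fragment ending in the A that bounds it.
    stretches = rna.split("A")
    # The longest A-free stretch plus its terminating 3' A.
    return max(len(s) for s in stretches) + 1

normal = a_free_fragment_length(17)
slipped = a_free_fragment_length(17 + 5)  # five extra CUG units = +15 nt
print(normal, slipped)  # 59 74
```

Under these assumptions, slippage that added the 15 extra nucleotides inside the repeat tract would shift the A-free fragment from 59 to 74 nt, whereas an extension outside this segment would leave it at 59 nt.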
Gel Analysis-Gels were analyzed using a Molecular Dynamics PhosphorImager (Sunnyvale, CA) and exposed to Kodak X-AR film overnight with an intensifying screen (Corning, Lightning Plus). Elongation rates within the triplet repeat tracts were determined by monitoring the distribution of polymerase within the tract over time. The average transcript length at each time point was measured by peak determination after area integration of each lane using ImageQuant software. Average transcript size was extrapolated from a semi-logarithmic plot of DNA marker length versus migration distance (SigmaPlot). The pausing efficiency at the strong pause sites at the entrance to the CTG repeat tract was determined at the 20-s elongation time point as the percentage of transcripts associated with paused complexes relative to the total transcripts of equal or greater length. The pause half-life of U-73 complexes was determined by monitoring the concentration of this RNA over time (100% being equal to the concentration at 20 s). The percentages of transcripts 15 and 18 nt longer than expected were calculated relative to the total run-off transcription products spanning −20 to +16 nt, relative to the template strand end.
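The semi-logarithmic sizing step can be illustrated with a short sketch: fit log10(length) of the DNA markers as a linear function of migration distance, then invert the fit for an unknown band. This is a minimal stand-in for the SigmaPlot analysis described above, assuming an ordinary least-squares line; the marker lengths and distances are made-up values, not data from this study.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def size_from_migration(distance, marker_lengths, marker_distances):
    """Estimate RNA length (nt) from migration distance using a semi-log marker curve."""
    # Semi-log relationship: log10(length) is treated as linear in migration distance.
    slope, intercept = linear_fit(marker_distances, [math.log10(n) for n in marker_lengths])
    return 10 ** (slope * distance + intercept)

# Illustrative (made-up) marker positions in mm, not data from this study.
markers_nt = [100, 200, 400, 800]
markers_mm = [300, 240, 180, 120]
print(round(size_from_migration(210, markers_nt, markers_mm), 1))
```

A band midway between the 200- and 400-nt markers is sized at their geometric mean (about 283 nt), not their arithmetic mean, which is the practical consequence of the log scale.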

RESULTS
A synchronized in vitro transcription system (18) was used to examine the effects of various triplet repeat tracts on transcript elongation by RNA polymerase II. Synchrony was established by purification of promoter-initiated ternary complexes stalled early in elongation, prior to the triplet repeat tract (see "Experimental Procedures"). The purification step removes the Mg2+ and NTPs required for initiation and presumably any elongation factors that may be bound to the ternary complex. The purified "core" ternary complexes transcribe mixed sequence DNA at approximately 330 nt/min at 37°C and achieve a physiological elongation rate (1200 nt/min) when supplied with elongation factor TFIIF (19). Accurate rate measurements are possible because the complexes resume elongation fairly synchronously after addition of Mg2+ and NTPs. Since the radiolabeled NTP is removed during the final purification of stalled complexes, and subsequent elongation utilizes non-radiolabeled NTPs, the distribution and intensity of the elongation products directly reflect the distribution of transcriptionally engaged polymerases along the template. The DNA templates are diagrammed in Fig. 1, with the beginning of the repeat tract indicated relative to the start of transcription. Templates pML(CTG)n and pML(CGG)54 contain triplet repeats and flanking human sequences from the DMPK or FMR1 loci, respectively, in the transcriptional orientation found in vivo.
Characteristics of Transcription on CTG-containing Templates Derived from the Human Myotonic Dystrophy Locus-The plasmids pML(CTG)17, pML(CTG)62, and pML(CTG)262±5 contain human DNA-derived repeat lengths found in normal individuals or in individuals with mild or severe myotonic dystrophy, respectively. The purified HindIII-linearized templates, bearing RNA polymerase stalled primarily after UMP incorporation at template position +20 (U20 complex), also contain a minor population of complexes bearing 19-, 21-, 22-, and 24-nt transcripts (Fig. 2, A–C, lane 1). Transcription was resumed after a 3-min equilibration at 37°C in the presence of 1 mM NTPs by the addition of Mg2+ ions, and the distribution of RNA polymerase was determined by analyzing the transcription products at 0.25, 0.5, 1, 2, and 5 min (Fig. 2, A–C, lanes 2–6). The only apparent expansion-dependent difference was the production of discrete, shorter than expected transcripts of approximately 210 and 266 nt in the pML(CTG)62 (Fig. 2B, lanes 5 and 6) and pML(CTG)262±5 (Fig. 2C, lane 6) reactions, respectively. Based on their approximate lengths and relative abundance, these products were probably run-off transcripts derived from templates that had undergone partial repeat tract deletion during plasmid propagation (data not shown; see "Experimental Procedures"). These data indicate that lengthening the CTG tract did not alter the transcriptional properties of RNA polymerase II. However, four distinctive features inherent to transcription of the pML(CTG) templates are worth noting. First, >50% of RNA polymerases encountered a transient barrier to elongation within the first two repeat units of the triplet repeat tract, irrespective of repeat length (Fig. 2, A–C, lanes 2–5). Escape from the pause site within the first triplet repeat occurred with pseudo-first-order kinetics, with a half-life of 10 ± 1 s.
Escape from the second site appears to occur with similar kinetics; however, the half-life was not determined because escape of polymerases from the first site complicates the analysis. Polymerase also encountered a series of more transient barriers roughly centered within repeat units 6–9 (Fig. 2, A–C, lanes 2–4). The pause sites were mapped by comparing the transcription products generated at 10-s intervals under saturating substrate concentrations (Fig. 2D, lanes 3–5) with pooled time course "sequencing" reactions performed under limiting GTP (Fig. 2D, lane 1) or limiting CTP (Fig. 2D, lane 2) concentrations. Under limiting CTP, the additional transcripts within the repeat tract result from ternary complexes that stall after GMP insertion, prior to CMP incorporation sites. The first two pause sites mapped to positions +U-73 and +U-76. Base hydrolysis ladders were used to confirm these assignments and to precisely map pML(CTG)17- and pML(CAG)17-derived transcripts through to the end of the triplet repeat tract (data not shown). DNA markers were included as approximate length references. The transient pausing patterns within the triplet repeat tracts are summarized in Fig. 7.
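The pause efficiency and escape half-life quoted above follow from two simple calculations. Because the label is incorporated only during initiation, before the U20 complexes are purified, each transcript carries the same label regardless of length, so band signal can be treated as proportional to the number of complexes. The sketch below assumes pseudo-first-order escape, so that the fraction of complexes remaining at the pause decays exponentially and ln(fraction) is linear in time; the band list and time course are illustrative numbers, not measurements from this study, and the function names are hypothetical.

```python
import math

def pause_efficiency(bands, pause_length):
    """Percent of transcripts at `pause_length` among all transcripts of equal or greater length.

    `bands` is a list of (length_nt, signal) pairs from one lane.
    """
    total = sum(signal for length, signal in bands if length >= pause_length)
    paused = sum(signal for length, signal in bands if length == pause_length)
    return 100.0 * paused / total

def escape_half_life(times_s, remaining):
    """Half-life from pseudo-first-order decay, remaining(t) = remaining(0) * exp(-k * t)."""
    # ln(remaining) is linear in time with slope -k; fit it by least squares.
    ys = [math.log(r) for r in remaining]
    n = len(times_s)
    mx, my = sum(times_s) / n, sum(ys) / n
    sxy = sum((t - mx) * (y - my) for t, y in zip(times_s, ys))
    sxx = sum((t - mx) ** 2 for t in times_s)
    k = -sxy / sxx
    return math.log(2) / k

# Illustrative numbers only (not measurements from this study).
lane_20s = [(60, 5.0), (73, 30.0), (76, 10.0), (110, 15.0), (150, 5.0)]
print(pause_efficiency(lane_20s, 73))  # 30 / 60 -> 50.0
# U-73 signal normalized to 100% at 20 s, sampled at later time points.
print(escape_half_life([20, 30, 40, 50], [100, 50, 25, 12.5]))
```

Shorter transcripts (here the 60-nt band) are excluded from the denominator because those complexes have not yet reached the pause site.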
The second salient feature of transcription by RNA polymerase was the simple, repetitive transient pausing pattern after UMP incorporation (Fig. 2D, compare lanes 2 and 3) throughout the triplet repeat tract (Fig. 2, A–C, lanes 3 and 4). Thus, within each repeat tract, GMP insertion was slow relative to CMP and UMP additions. Note that, in contrast to the transient barriers observed at repeat units 1, 2, and 6–9, much shorter half-lives were observed on the pML(CTG)17 template within repeats 3, 4, and 17 (Fig. 2A, lane 3). Pausing was more uniform within repeats 11–16 and within the body of the expanded tracts in pML(CTG)62 (Fig. 2B, lanes 3 and 4) and pML(CTG)262±5 (Fig. 2C, lanes 3 and 4, and additional data not shown). In contrast, mixed sequence templates produce complex sequence-specific transient pausing patterns (18, 19), evident in the distribution and intensities of RNAs generated during transcription prior to the entrance to the triplet repeat tract (Fig. 2, A–C, lane 2; see also Fig. 3, A–C, lane 2). A third feature of the transcription reaction was the rate of approximately 170 nt/min through the bulk of the (CTG)262±5 repeat tract (Fig. 2C and additional data not shown). This rate is substantially lower than the 330 nt/min observed on the mixed sequence pML20 template (19). The fourth, and particularly unusual, feature was that at the pML(CTG)17 template end, 3–4% of the total transcripts were approximately 15 nt longer than expected (see below).
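An average elongation rate such as the 170 nt/min quoted above can be obtained as the slope of average transcript length against time. The sketch below swaps in a signal-weighted mean length per lane as a stand-in for the peak determination used in this study, and fits a least-squares line. The lanes are made-up numbers chosen so that the mean length advances 170 nt per minute. For a rate within a repeat tract, only time points at which the distribution lies inside the tract would be used.

```python
def mean_length(bands):
    """Signal-weighted mean transcript length (nt) for one lane of (length_nt, signal) pairs."""
    total = sum(signal for _, signal in bands)
    return sum(length * signal for length, signal in bands) / total

def elongation_rate(times_min, mean_lengths):
    """Least-squares slope of mean transcript length versus time (nt/min)."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_len = sum(mean_lengths) / n
    sxy = sum((t - mean_t) * (m - mean_len) for t, m in zip(times_min, mean_lengths))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return sxy / sxx

# Illustrative lanes (made-up numbers), keyed by elongation time in minutes.
lanes = {
    0.5: [(175, 1.0), (195, 1.0)],
    1.0: [(260, 1.0), (280, 1.0)],
    2.0: [(430, 1.0), (450, 1.0)],
}
times = sorted(lanes)
rate = elongation_rate(times, [mean_length(lanes[t]) for t in times])
print(round(rate))  # 170
```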
Characteristics of Transcription on (CAG)n-containing Templates-RNA polymerase transcription on the triplet repeat-bearing templates pML(CAG)17, pML(CAG)50, and pML(CAG)255 is shown in Fig. 3, A–C, respectively. As with the pML(CTG)n templates, increasing the length of the CAG triplet repeat tract did not alter the ability of RNA polymerase to transcribe this trinucleotide sequence. The transient barrier to elongation at the entrance to the CAG repeat tract was much less pronounced than that observed at the entrance to the CTG tract. However, a transient barrier with an efficiency and half-life comparable to those of the barrier at the CTG junction was observed at position +51 within the pUC19 polylinker sequence 5′-GCAUGC. Transcription within the CAG repeat tract also generated a repetitive transient pausing pattern, with pausing occurring after CMP incorporation (high resolution mapping data not shown, but summarized in Fig. 7A). In addition, the rate of transcription through (CAG)255±5 was estimated at 400 nt/min (Fig. 3C and additional data not shown), slightly faster than the rate observed for mixed sequence DNA and roughly 2.5 times the rate observed through the CTG repeats. The approximately 288-nt product in Fig. 3C (lanes 5 and 6) was probably synthesized from the pML(CAG)44 deletion product within the pML(CAG)255 preparation (see "Experimental Procedures").
FIG. 1 (legend fragment). The template name designates the repeat tract sequence in the nontemplate strand. Arrows indicate the start site and direction of transcription for each template, and the length, in nucleotides, from the start site to the beginning of the repeat tract is shown. Restriction sites are as follows: E, EcoRI; B, BamHI; X, XbaI; P, PstI; S, SphI; H, HindIII; all but the repeat tract (shaded box) are to scale. C, for each template, the number of triplet repeats (n), tract length in bp, and expected run-off product lengths (nt) are indicated.
As observed with pML(CTG)17, a small percentage of longer than expected transcripts were generated on the pML(CAG)17 template (resolved on a higher resolution gel in Fig. 5).
Characteristics of Transcription through the (CGG)54 and (CCG)54 Repeats from the Human Fragile X Locus-RNA polymerase II also efficiently transcribed both the (CGG)54 (Fig. 4A) and (CCG)54 (Fig. 4B) repeat tracts, at average rates of 300 and 250 nt/min, respectively. The transient pausing patterns within these triplet repeats were more complex than those observed on the CTG or CAG repeat-containing templates (summarized in Fig. 7B). RNA polymerase II transiently paused after incorporation of CMP and the following GMP within (CGG)54, and after both CMPs on the (CCG)54 template. Based on a comparison of transcript intensities at these pause sites within the first 15–16 repeat units (high resolution mapping not shown), ternary complex dwell times on either template were longest after CMP incorporation, prior to GMP insertion. The CCG repeat tract ((CCG)11TCG(CCG)22CCT(CCG)7TCG(CCG)CCT(CCG)9) contains TCG and CCT interruptions.
Figure legend (fragment): …6) following Mg2+ addition. One-third of each reaction was resolved on a 10% (acrylamide/bisacrylamide (29:1)) sequencing gel stopped after the bromphenol blue dye marker had run 37 cm. DNA markers (C, lane M) have been previously described (19), and lengths are indicated at right. Regions corresponding to the adenovirus major late gene initial transcribed region…
Consistent with the notion that transient pause sites are influenced by the underlying DNA template sequence, the pausing pattern was also altered at the TCG and CCT interruptions within the CCG tract (see, for instance, the TCG interruption in Fig. 4B, lane 3, at 97 nt) and at the AGG interruption within the CGG tract (high resolution data not shown, but summarized in Fig. 7B). Transcription into the FMR1 repeat tract from either direction did not significantly alter the transcription properties of the polymerase.
However, a moderate transient barrier to elongation occurred within the pML(CGG)54 template after UMP incorporation at position +94, the U within the flanking human sequence 5′-CGCUGC (Fig. 4A, lanes 2–4). As expected, a transient barrier was also observed at position +51 within pML(CCG)54 (Fig. 4B, lanes 2–5), since the initial transcribed regions through position +54 are identical in the pML(CCG)54 and pML(CAG)n templates (see Fig. 3).
Longer Than Expected Products Were Produced on HindIII-linearized pML(CTG)17 and EcoRI-digested pML(CAG)17 Templates-Upon inspection of the run-off products generated from linearized pML(CTG)17 and pML(CAG)17 templates, we observed slightly shorter and longer than expected run-off transcripts. These run-off products are more clearly resolved in the autoradiograph shown in Fig. 5 (lanes 1 and 4, respectively). Based on previous studies, RNA polymerase was expected to transcribe efficiently to within 1–2 bp of the template strand end on linear templates bearing 5′ overhangs (20). This population of transcripts is designated "Run-off." Consistent with this assignment (and as previously reported (20)), addition of Sarkosyl to the elongation reaction caused the ternary complexes to halt transcription 3–7 nt before the template end (Fig. 5, lanes 2 and 5). Note that the low levels of run-off transcript in these reactions were due to ternary complex disruption during preincubation with the detergent at 37°C prior to resumption of elongation (i.e. the majority of stalled U20 complexes failed to resume transcription; data not shown). In addition, both templates generated slightly shorter than expected products (roughly −12 and −20 nt on pML(CTG)17 and −8 nt on pML(CAG)17, relative to full length) whose accumulation was suppressed in reactions supplemented with elongation factor SII (Fig. 5, lanes 3 and 6). Lengths were extrapolated from a DNA marker-derived standard curve, using the expected run-off length as a reference. These data are consistent with previous work demonstrating that polymerase may adopt a transcriptionally inactive (arrested) configuration at certain template sequences or as it approaches the ends of linear templates (see Ref. 20 and references therein) and that conversion back to a transcriptionally competent state is facilitated by elongation factor SII (14, 21–23).
An unexpected feature of the reaction was that 4% of the total run-off transcripts generated on either template were approximately 15 or 18 nt longer than expected (Fig. 5, lanes 1 and 4). These RNAs were absent in reactions containing elongation factor SII. SII had essentially no effect on the distribution of the much longer "nonspecific" transcripts, most likely produced by transcription from undigested templates and/or by non-promoter initiation events (Fig. 5, lanes 3 and 6). It is conceivable that the 15- or 18-nt longer than expected products resulted from transcriptional arrest of the "nonspecific" ternary complexes. However, we consider this explanation unlikely, since transcripts of these lengths were not generated during transcription on circular templates (data not shown) or on similarly digested parental (20) or longer repeat tract-containing templates (see Figs. 2 and 3, B and C). Moreover, run-off products generated on pML(CTG)62 also contained a 15-nt longer than expected product that was absent in reactions containing SII (data not shown).
Since no oversized products longer than 2–4 nt were detected in previous work on a variety of template ends (20), we considered the possibility that the 15- and 18-nt extensions occurred during transcription of the triplet repeat sequences. Several experiments were performed to determine whether the pML(CTG)17-derived longer than expected transcripts were produced by transcriptional slippage within the triplet repeats. First, we explored whether increasing RNA/DNA hybrid strength during transcription of a triplet repeat sequence would alleviate production of the longer than expected product. A time course transcription study is shown in which 5-Br-UTP (Fig. 6A, lanes 4–9) was substituted for UTP (Fig. 6A, lanes 1–3). In particular, 5-Br-UMP incorporation would increase the strength of the terminal RNA/DNA base pair within transiently paused complexes. Unexpectedly, incorporation of 5-Br-UMP greatly increased the inherent relative dwell times of RNA polymerase within the repeat tract sequence. Even after 10 min, most ternary complexes contained 5-Br-UMP-terminated RNAs and were located within triplet repeat units 1, 2, and 6–9 (Fig. 6A, lane 9). A similar time course performed on the mixed sequence pML20 template (14) demonstrated that 5-Br-UTP could be incorporated into long RNAs (Fig. 6A, lanes 10–15). Using dC-tail-initiated RNA polymerase II and an unrelated mixed sequence DNA template, it has previously been demonstrated that 5-Br-UTP substitution reduces the average elongation rate more than 2-fold (24). In agreement, a similar fold decrease was observed on the pML20 template: the average transcript length after 2 min (Fig. 6A, lane 13) was approximately 250 nt, whereas the average NTP incorporation rate with unmodified UTP was approximately 300 nt/min (data not shown). The transcription rate was also reduced on the pML(CTG)17 template (Fig. 6A, compare lanes 1 and 4).
Note also that the transient pausing pattern within the region of mixed sequence template upstream of the triplet repeat tract (compare Fig. 2C, lane 2, and Fig. 6A, lanes 4–6) and within the pML20 template (1 mM NTP time course not shown) was altered in 5-Br-UTP substitution reactions. Essentially superimposed on the expected transient pausing pattern upstream of the CTG tract were four additional long-lived complexes at template positions +35, +49, +58, and +70 (Fig. 6A, lanes 4–6). However, these complexes were not stalled prior to UMP incorporation sites, as might have been expected if 5-Br-UTP were inefficiently utilized as a substrate. Rather, enhanced pausing was observed at a subset of UMP incorporation sites (U-35, U-58, and U-70) and at position A-49, prior to a GMP incorporation site. Thus, there is no simple explanation for the observed decrease in average elongation rates in the 5-Br-UTP substitution experiment. Moreover, the greatly increased dwell times of complexes within the first…
FIG. 5. Elongation factor SII abrogated formation of the longer than expected run-off transcripts produced on the pML(CTG)17 and pML(CAG)17 templates. Three-minute elongation reactions were performed (as described in the legend to Fig. 2) on HindIII-linearized pML(CTG)17 (lanes 1–3) or EcoRI-linearized pML(CAG)17 (lanes 4–6) templates. Sarkosyl to 0.3% (lanes 2 and 5) or 4 µg/ml human recombinant elongation factor SII (lanes 3 and 6) was added prior to equilibrating the reaction mixture to 37°C. Transcripts were purified and then concentrated by ethanol precipitation. One-third of each reaction was resolved on an 8% (acrylamide/bisacrylamide (19:1))…
To test directly whether the longer than expected product was generated by slip-mispairing within the (CTG)17 repeat tract, purified uniformly labeled run-off (Fig. 6B, lane 1) and 15-nt longer than expected (Fig. 6B, lane 4) transcripts were subjected to partial RNase U2 digestion (Fig. 6B, lanes 2, 3, 5, and 6, respectively). The purine-specific ribonuclease U2 cleaves ApN linkages at a faster rate than GpN linkages (25). Thus, under appropriate digestion conditions, relatively high levels of a 59-nt A-free fragment (bounded by A residues within the sequence 5′-AAUG(CUG)17GGGGGAU) should be generated. Slippage within the triplet repeats would lengthen this A-free fragment. Essentially identical nuclease digestion patterns, with preferential release of an appropriately sized fragment (indicated as A-free), were observed after U2 partial digestion of the full-length and oversized products. These data support the notion that the oversized transcripts were produced by promoter-initiated transcription complexes and demonstrate that slippage within the (CTG)17 triplet repeat tract cannot account for the production of the 15-nt longer than expected RNA.
FIG. 6. Longer than expected transcription products on the pML(CTG)17 templates were not generated by repeat tract slippage events. A, transcription reactions were performed on HindIII-linearized pML(CTG)17 (lanes 1–9) and pML20 (lanes 10–15) templates. Transcription reaction time courses containing 1 mM NTPs (lanes 1–3) or 1 mM (ATP, CTP, and GTP) and 2 mM 5-Br-UTP (lanes 4–15) were stopped at the indicated times, and the RNAs were fractionated as in Fig. 2A. B, purified uniformly labeled pML(CTG)17 run-off (lanes 1–3) and longer than expected (lanes 4–6) transcripts were partially digested with 1 (lanes 3 and 6) or 2 (lanes 2 and 5) units of RNase U2 as described under "Experimental Procedures." RNAs were immediately fractionated as described in Fig. 2A. Lanes 7 and 8 are a longer exposure of lanes 5 and 6. The position of the A-free transcript segment is indicated at left.

DISCUSSION
We have examined the ability of RNA polymerase II to transcribe a DNA segment from the 3′-untranslated region of the myotonic dystrophy protein kinase (DMPK) locus corresponding to those of normal (CTG)17, unstable (CTG)50, and affected (CTG)255 individuals. We demonstrate that triplet repeat length had no effect on the intrinsic capacity of core RNA polymerase II to transcribe these DNA segments. However, polymerase becomes transiently blocked in elongation within the 1st, 2nd, and 6th to 9th triplet repeat units of the DMPK gene sequence. These DNA sequences must therefore contain intrinsic signals that transiently induce polymerase into an elongation-incompetent configuration. During the course of our studies, we also examined transcription on the complementary CAG templates, as well as on templates containing, in either orientation, the CGG repeat tract derived from the normal human fragile X locus. These repeat tracts, which are associated with other human diseases (see Introduction), appear to be efficiently transcribed at the respective loci in vivo (see Ref. 1 for review), and we show that RNA polymerase II also efficiently transcribes them in vitro. Moreover, the average rate of transcription through the bulk of the triplet repeat tracts varied by more than 2-fold (170–410 nt/min). Based on differences in the sequence-specific transient pausing pattern within the CNG tracts, slower average transcription rates correlated with slower nucleotide addition by complexes bearing a pyrimidine residue at the transcript 3′ terminus.
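The correlation between average rate and pyrimidine content can be tabulated from the rates reported in this study. The sketch below is a coarse illustration only: it counts pyrimidines per RNA repeat unit (U in place of T), whereas the argument above rests on position-specific pause sites. Four data points are not a statistical test. The CAG value uses the 410 nt/min figure quoted in the abstract.

```python
# Average rates through the bulk of each repeat tract, as reported in the text (nt/min).
# Keys are the repeat units as they appear in the RNA (U replaces T).
REPORTED_RATES = {"CUG": 170, "CCG": 250, "CGG": 300, "CAG": 410}

def pyrimidine_count(unit: str) -> int:
    """Number of pyrimidines (C or U) in one RNA repeat unit."""
    return sum(base in "CU" for base in unit)

for unit, rate in sorted(REPORTED_RATES.items(), key=lambda item: item[1]):
    print(f"{unit}: {pyrimidine_count(unit)} pyrimidines, {rate} nt/min")

two_pyr = [r for u, r in REPORTED_RATES.items() if pyrimidine_count(u) == 2]
one_pyr = [r for u, r in REPORTED_RATES.items() if pyrimidine_count(u) == 1]
# Every two-pyrimidine repeat is slower than every one-pyrimidine repeat.
print(max(two_pyr) < min(one_pyr))  # True
```

Pyrimidine count per unit is independent of reading frame, so the tally does not depend on where the tract is taken to begin.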
Transcriptional Pausing. Based primarily on the biochemical properties of prokaryotic RNA polymerase, RNA polymerase may use a specialized multistep process to enter and exit its normal, highly processive transcription mode (see Refs. 26 and 27 for recent reviews). During initiation, this "discontinuous" mode apparently represents a transitional pathway required to establish appropriate nucleic acid and nucleoprotein interactions. These may include formation of an RNA/DNA heteroduplex, binding of the nascent RNA to one or more product-binding sites within the enzyme, and bilateral template interactions upstream and downstream of the polymerization site. During normal elongation, alterations in these interactions influence individual NTP incorporation rates (discussed further below). Ultimately, they can act together to convert an efficiently elongating complex into one that is prone to pause, arrest, or terminate transcription.
Based on the evolutionary conservation among DNA-dependent RNA polymerases (28), it is reasonable to suppose that transcript elongation by RNA polymerase II involves similar complexities. Indeed, both in vitro (see for instance Refs. 18, 19, 21, 22, and 29-33) and in vivo (see Ref. 34 and references therein) systems have demonstrated that certain template sequences or structures affect the transcriptional competence of RNA polymerase II. The best-studied class of sites contains template-strand poly(A) sequences. These are found within the early transcribed portion of the human histone H3.3 (29, 33) and c-myc (30) loci and the adenovirus major late gene (see for instance Ref. 35 and references therein). In vitro, transcription through these sequences induces 15-50% (depending on the site) of the transcribing molecules to adopt an arrested configuration. The determinants that influence this conversion include template sequences upstream and downstream of the site of transcription cessation (29, 36), the poly(U) transcript terminus (37, 38), the capacity of the nascent transcript to form secondary structure (35, 38), and the time polymerase dwells at the site(s) of arrest (22, 37, 39). The biochemical properties of arrested RNA polymerase II complexes (31, 37, 40-43) are consistent with either the conformationally flexible "inchworm" model of transcript elongation by RNA polymerase, originally proposed by Chamberlin (44), or the more rigid sliding-clamp variant (35, 45-47). In either model, arrested ternary complexes are those whose catalytic center is inappropriately positioned over the nascent RNA, 7-18 nt upstream of its 3′-end (40). Arrested complexes are extremely stable and have a low but detectable probability of spontaneously resuming transcription (48).
Rapid resumption of transcription is facilitated by elongation factor SII (37, 42). There is compelling evidence that SII enables the repositioned catalytic site to hydrolyze the underlying transcript (40, 42, 49), thereby creating a new transcript 3′-end from which elongation resumes. In contrast, SII-facilitated cleavage by transcriptionally competent complexes occurs near the transcript 3′-end, principally liberating dinucleotide cleavage products (14, 37, 50, 51).
The most dramatic feature of transcription through the 3′ region of the DMPK gene was that, irrespective of repeat length, approximately 50% of the transcribing polymerases exhibited a greatly increased dwell time after UMP incorporation within the first two 5′-CTG repeat units. However, unlike arrested complexes, these complexes efficiently resumed transcription. The pause half-life within the first repeat unit was 10 ± 1 s. In contrast, individual polymerization steps within the body of the 5′-CTG repeat tract occurred, on average, every 0.35 s. More transient blockades occurred after UMP incorporation, roughly centered within repeat units 6-9. Moreover, the SII-facilitated cleavage increment of the transiently blocked complexes suggested that their transcript 3′-end remained associated with the catalytic center.² Thus, these complexes represent a different class from those arrested at intrinsic arrest sites.
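This example treats escape from the junction pause as a single first-order process. That is an assumption made here for illustration, not a mechanism established in this study. On that assumption, the reported 10-s half-life converts into a rate constant and a decay curve for the paused population. The sketch below also compares the half-life with the 0.35-s mean step time in the tract body. The constant and function names are hypothetical.

```python
import math

PAUSE_HALF_LIFE_S = 10.0  # pause half-life at the first CTG repeat (from the text)
MEAN_STEP_S = 0.35        # average time per polymerization step in the CTG body
PAUSED_FRACTION = 0.5     # ~50% of polymerases enter the pause (from the text)


def escape_rate(half_life_s: float) -> float:
    """First-order escape rate constant k = ln 2 / t_half, in s^-1 (assumed kinetics)."""
    return math.log(2) / half_life_s


def still_paused(t_s: float, half_life_s: float = PAUSE_HALF_LIFE_S) -> float:
    """Fraction of the paused population that remains paused after t seconds."""
    return math.exp(-escape_rate(half_life_s) * t_s)


k = escape_rate(PAUSE_HALF_LIFE_S)
print(f"k = {k:.3f} s^-1, mean pause lifetime = {1 / k:.1f} s")
print(f"half-life / mean step = {PAUSE_HALF_LIFE_S / MEAN_STEP_S:.0f}-fold")
for t in (10, 30, 60):
    # Fraction of *all* polymerases still held at the pause t seconds after arrival
    print(f"t = {t:>2} s: {PAUSED_FRACTION * still_paused(t):.3f} of total")
```

On this assumption, about a quarter of all polymerases (half of the paused half) would still be held at the junction 10 s after reaching it. That single half-life spans roughly 29 average polymerization steps.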
At this point we can say very little about the mechanism by which the triplet repeat junction induces the paused configuration. On several grounds, however, we speculate that the pause determinants may be akin to the multipartite pausing signals recognized by bacterial RNA polymerase. These signals lie within the leader regions of amino acid biosynthetic operons regulated by attenuation (see Ref. 52 and references therein). Extensive analysis of the histidine operon pause site indicates that these multipartite signals include template sequences at and surrounding the site of catalysis. They also involve the formation of a pause RNA stem-loop structure (see Refs. 52 and 53 and references therein).

First, the role of the downstream template sequence must be considered, since triplet repeat tracts adopt an unusual, flexible DNA helix (15, 54, 55). However, alterations in helical structure are apparently not sufficient to induce the paused configuration, since polymerase efficiently traversed the repeat junctions within the 5′-CAG, 5′-CCG, and 5′-CGG templates.

Second, secondary structure predictions for the transcripts within the paused pML(CTG)n-derived U-73 complexes indicated the potential for an extensive stem-loop structure properly positioned upstream of the pause site (data not shown). Since this structure is encoded, at least in part, by the flanking DMPK gene sequence, the potential for this structure also exists in vivo. In addition, the transcript encoded by the triplet repeat is also predicted to adopt a stem-loop structure, which might influence pausing within repeat units 6-9.

Third, the TG sequence at the pause site may also be an integral component in establishing the paused configuration. Within the leader sequences of various prokaryotic amino acid biosynthetic operons regulated by attenuation, pausing invariably occurs after pyrimidine incorporation, immediately before a GMP insertion site (see Ref. 52 and references therein).
Functionally, we also demonstrated that extending the time polymerase dwells at the 5′-CTG junction pause site increased both the pausing efficiency and the half-life. We extended the dwell time by limiting the GTP concentration in the reaction mixture (Fig. 2D, compare lanes 1 and 2, and additional data not shown). Similarly, studies defining the determinants of pausing at the his operon limited the concentration of the nucleotide inserted after the pause site. This increased the half-life of the otherwise unmeasurable paused complex (52). It is important to emphasize that, at saturating NTP concentrations, the pause frequency and half-life at the 5′-CTG junction were similar to those observed at the bacterial his pause site under limiting GTP conditions. This might reflect differences in the strength of the pause signal. It might also reflect the inherent difference in maximal in vitro elongation rates between prokaryotic RNA polymerase (35-45 nt/s (56)) and the core RNA polymerase II complexes used in this study (2.8-5.5 nt/s). Pausing was also enhanced in 5-Br-UTP-substituted reactions (Fig. 6A, lanes 4-9). How 5-Br-UMP incorporation influences the pause half-life of complexes at the entrance to the triplet repeat tract is unclear. Our data suggest that 5-Br-UMP incorporation could enhance the pause half-life directly by increasing the time polymerase dwells at the pause site. In support of this, 5-Br-UMP incorporation increased the dwell time of polymerase at a subset of UMP insertion sites within the mixed-sequence DNA template region upstream of the triplet repeat tract (compare Fig. 2, A-C, lane 2, with Fig. 6A, lanes 4-6). Alternatively, 5-Br-UMP incorporation might enhance the pause half-life by affecting nucleic acid and/or protein-nucleic acid interactions that occur upstream of the site of catalysis. For example, 5-Br-UMP incorporation might influence the rate of formation and/or the stability of an RNA secondary structure involved in stabilizing the paused configuration.
These possibilities are not mutually exclusive.
Given the inherent transcriptional properties at the 5′-CTG junction and the preferential assembly of 5′-CTG repeat tracts into nucleosomes (57-59), it remains possible that 5′-CTG expansion alters the transcriptional potential of the DMPK 3′-UTR in vivo. For example, pausing might be exacerbated by an expansion-dependent nucleosomal array. Subsequent transcription could then be compromised if the paused configuration influenced, directly or indirectly, the ability of elongation factors to augment the efficiency of transcription (see Ref. 60 for review). Alternatively, continued mRNA processing might become inefficient or uncoupled if the paused configuration made the ternary complex more prone to modifications, such as changes in the phosphorylation state of the C-terminal domain (see Refs. 61 and 62 and references therein).
Elongation Kinetics. The ternary complex represents a complex network of protein/nucleic acid interactions. Consistent with this notion, even during "normal" elongation the kinetic parameters for addition of the different nucleotides vary with the surrounding template sequence (56, 63-67). Because the surrounding sequence varies in complex ways on mixed-sequence templates, it is difficult to correlate the effect of nucleotide sequence directly with relative polymerization rates (63, 66).
Within triplet repeat tracts, the nucleoprotein contacts within the ternary complex might be identical at each reiterated template location. This may account for the observed uniformity in the relative "polymerization" rates within the bulk of each repeat tract. Moreover, when transcribed in either orientation, the (CTG)·(CAG) and (CCG)·(CGG) repeats constitute a heteropolymeric (5′-CNG) template set (where N = T, C, G, or A). Each of the 5′-CNG repeats generated a unique transient pausing pattern. For central bases T, C, G, and A, the average rates were 170, 250, 300, and 410 nt/min, respectively. We therefore asked whether the observed increase in average elongation rate was directly related to the polymerization rate of the central variable base.
Kinetic schemes devised to evaluate steady-state transcription rates generally make an assumption about the transient pausing pattern. They assume that the relative times spent at sequential positions within a short template region are directly proportional to the relative intensities of the corresponding transcripts. This holds as long as the region is centered within the bulk of transcriptionally engaged polymerases. Note that the term polymerization does not differentiate between NMP insertion and translocation rates; see Ref. 63 for a detailed discussion. Based on this assumption, the polymerization rate of the central variable nucleotide within the heteropolymeric tracts did not correlate with the observed increase in average transcription rates. For example, transient pausing occurred after UMP incorporation within the 5′-CTG repeat tract and shifted to C-terminal complexes within the 5′-CAG template (transcriptional pause sites are summarized in Fig. 7). Thus, GMP incorporation was the slowest polymerization step within the 5′-CTG template. AMP insertion, by contrast, was the slowest within the 5′-CAG template, even though this triplet repeat sequence was transcribed at the fastest average rate. Furthermore, the observed variation in average transcription rates showed no simple correlation with the energetics of template unwinding (the 5′-CTG and 5′-CAG templates were derived from the same duplex DNA). Nor did it correlate simply with the base-pairing strength between the template and the transcript 3′-end. There was, however, an inverse relationship between nontemplate strand pyrimidine content and average elongation rates. Moreover, the relative polymerization rates were consistently slower after pyrimidine incorporation.
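The steady-state assumption described above can be sketched as a conversion from band intensities to dwell times. In the Python example below, the band intensities are hypothetical values chosen only to mimic a pause at U-terminal complexes in a CUG repeat. They are not data from this study. The dwell times are scaled so that one repeat takes the time implied by the 170 nt/min average rate. As noted above, each "step" lumps NMP insertion and translocation together.

```python
# HYPOTHETICAL band intensities (arbitrary units) for complexes whose RNA
# 3' end is at each position of one CUG repeat; illustrative, not measured.
HYPOTHETICAL_INTENSITY = {"C": 1.0, "U": 2.5, "G": 1.2}

AVG_RATE_NT_PER_MIN = 170  # average CTG-tract rate reported in the text


def dwell_times(intensity: dict[str, float], avg_rate_nt_per_min: float) -> dict[str, float]:
    """Convert steady-state band intensities into per-position dwell times (s).

    Assumes intensity is proportional to dwell time at each position, and
    scales the values so that one repeat (len(intensity) nt) takes the time
    implied by the average elongation rate.
    """
    repeat_time_s = len(intensity) * 60.0 / avg_rate_nt_per_min
    total = sum(intensity.values())
    return {pos: repeat_time_s * value / total for pos, value in intensity.items()}


dwell = dwell_times(HYPOTHETICAL_INTENSITY, AVG_RATE_NT_PER_MIN)
# A long dwell at the U-terminated complex means the step leaving it
# (GMP addition) is the slowest one in the repeat.
slowest = max(dwell, key=dwell.get)
for pos, t in dwell.items():
    print(f"RNA 3' end {pos}: dwell {t:.3f} s, step rate {1 / t:.2f} s^-1")
print("slowest step leaves the", slowest, "terminated complex")
```

Under these made-up intensities, the U-terminated complex accumulates most. The step leaving it (GMP addition) therefore comes out slowest, which is the inference drawn above for the 5′-CTG template.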
Assume that the relative polymerization rates of complexes containing 3′-C- or G-terminal RNAs are similar on the 5′-CTG and 5′-CAG templates. The 2.4-fold increase in average transcription rate could then reflect a similar fold difference in the rate of GMP incorporation between complexes containing 3′-U- and 3′-A-terminal RNAs. This simple interpretation is consistent with two observations under limiting CTP conditions. The average rate of transcription on the 5′-CTG tract was reduced approximately 2-fold (data not shown), and the relative polymerization rates of complexes containing G- or U-terminal RNAs were approximately equal (Fig. 2D, lane 2).
Our results also suggest that the identity of the next base to be added influenced the polymerization rates. Within the 5′-CCG template, pausing was observed with both C-terminal complexes and was more prolonged before GMP insertion. On the 5′-CGG template, pausing occurred after incorporation of CMP and of the following GMP; GMP insertion was slower with C-terminal ternary complexes. We cannot determine how the surrounding template and/or RNA sequence or structure specifically affects relative polymerization rates within the repeat tracts. Nevertheless, two observations suggest that these relative rates were established primarily by the template sequence at or near the site of polymerization. First, in all resolvable cases, the triplet repeat pausing patterns were established within the first few repeat units and were maintained through to the last or penultimate unit. Second, transient pausing patterns were altered at sites within the repeat tracts bearing single-base changes. For instance, the pausing pattern was altered within the 11th repeat unit of the pML(CCG)54 template, the site of an in-phase C·G to T·A transition (summarized in Fig. 7B).
The effects of template sequence on polymerization rates are complex. Nevertheless, based on the average transcription rates within the 5′-CNG templates and the observed changes in transcriptional pausing patterns, we speculate that the individual polymerization rates within the 5′-CNG tracts increased as a function of the 3′-terminal base (U ≪ C < G ≪ A). One simple interpretation of these results is that correct positioning of the transcript 3′-end within the catalytic center of polymerase is energetically less favorable with pyrimidine residues. This interpretation is consistent with the observed effects of the fungal toxin α-amanitin on transcript elongation and truncation by RNA polymerase II (41, 48, 68). It is also consistent with the possibility that the toxin binds near a location normally occupied by the transcript 3′-end (see Ref. 48 and references therein). In particular, these studies showed that toxin binding appears to reduce the polymerization rate uniformly. Under these conditions, it becomes apparent that template strand polypyrimidine segments are transcribed faster than template regions containing purine residues. In this regard, it is important to note that the sequence immediately downstream of the last CTG repeat within the pML(CTG)17 template (5′-CTGGGGGGA) contains six consecutive nontemplate strand purine residues. In time course studies, transcripts associated with polymerase transcribing this region were under-represented (Fig. 2, lanes 3 and 4, and Fig. 5, lanes 1 and 2). This supports the notion that enhanced transcription of template strand polypyrimidine tracts is an inherent transcriptional property of RNA polymerase II. Our results also suggest that, with the exception of sites that induce pausing, the individual polymerization rates within the repeat tracts vary over a fairly narrow range (2- to 3-fold).
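The purine run cited above, and its template-strand counterpart, can be checked directly from the quoted sequence. In this short Python sketch, `longest_purine_run` and `template_strand` are illustrative helpers. The only input is the nontemplate-strand sequence 5′-CTGGGGGGA from the text, with the final CTG repeat unit sliced off before counting.

```python
PURINES = set("AG")
# Watson-Crick complement for DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_purine_run(seq: str) -> int:
    """Length of the longest run of consecutive purines (A or G)."""
    best = current = 0
    for base in seq.upper():
        current = current + 1 if base in PURINES else 0
        best = max(best, current)
    return best


def template_strand(nontemplate: str) -> str:
    """Template strand written 5'->3' (reverse complement of the nontemplate strand)."""
    return nontemplate.upper().translate(COMPLEMENT)[::-1]


junction = "CTGGGGGGA"     # 5'-CTGGGGGGA, nontemplate strand, quoted in the text
downstream = junction[3:]  # drop the final CTG repeat unit, leaving the flank

print(downstream, longest_purine_run(downstream))  # purine run in the flank
print(template_strand(downstream))                 # template-strand polypyrimidine
```

The flank GGGGGA gives a run of six, and its template strand (TCCCCC) is entirely pyrimidine. Counting from the start of the quoted sequence instead gives seven, because the G of the final CTG unit extends the run. The six-residue figure applies to the flank downstream of the repeat.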
Two- to three-fold variations in polymerization rates were also observed at most locations during transcription of the Saccharomyces cerevisiae SUP4 tRNATyr gene by yeast RNA polymerase III (63).
Run-off Transcription Products. Previous studies have used mixed-sequence DNA templates and a variety of template ends. They demonstrated that, as RNA polymerase II approaches the end of a linear template, it may either arrest 6-10 bases upstream of the template end or continue to transcribe to or near the end (20). Unexpectedly, approximately 4% of the run-off transcription products on the pML(CTG)17 and pML(CAG)17 templates were approximately 15 and 18 nt longer than expected, respectively. We demonstrated that the extension products generated on the pML(CTG)17 template were produced by promoter-initiated RNA polymerase and contained faithfully transcribed triplet repeat sequence (Fig. 6B). Thus, the observed transcript lengthening was not the result of transcriptional slippage within the repeat tract. In the accompanying report (69), we continue our analysis of this reaction. We demonstrate that these products are produced by template-terminal complexes and are a general feature of run-off transcription by RNA polymerase II. Moreover, we provide evidence that these products represent intermediates in a template-switching pathway, in which RNA polymerase may retain its nascent transcript and resume transcription after switching to another template end.
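The RNase U2 argument used to rule out slippage can be expressed as a small calculation. The sketch below models a complete digest that cuts only 3′ of A residues and applies it to the 5′-AAUG(CUG)nGGGGGAU region of the pML(CTG)17 transcript. In reality, U2 also cleaves GpN bonds, more slowly, and the experiment used partial digests. The function names are hypothetical.

```python
import re


def u2_fragments(rna: str) -> list[str]:
    """Simplified complete RNase U2 digest: cut 3' of every A (ApN bonds only).

    Simplification: real U2 also cleaves GpN bonds, more slowly; this sketch
    models only the preferred ApN cuts.
    """
    # Zero-width split after each A (needs Python 3.7+); drop any empty pieces
    return [frag for frag in re.split(r"(?<=A)", rna) if frag]


def probe(n_repeats: int) -> str:
    """5'-AAUG(CUG)n GGGGGAU, the probed region of the pML(CTG)n transcript."""
    return "AAUG" + "CUG" * n_repeats + "GGGGGAU"


def a_free_length(n_repeats: int) -> int:
    """Length of the repeat-containing fragment (no internal A residues)."""
    return max(len(f) for f in u2_fragments(probe(n_repeats)))


print(a_free_length(17))  # faithfully transcribed (CUG)17 region
print(a_free_length(22))  # 5 extra repeats (15 nt), as slippage would require
```

The faithful transcript yields the 59-nt A-free fragment. Explaining a 15-nt extension by slippage would need five extra repeats, which would lengthen that fragment to 74 nt. The identical U2 patterns of the full-length and oversized RNAs therefore argue against slippage as the source of the extra 15 nt.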