Quantitative Analysis of in Vivo Initiator Selection by Yeast RNA Polymerase II Supports a Scanning Model

Initiation of transcription by RNA polymerase II (RNAP II) on Saccharomyces cerevisiae messenger RNA (mRNA) genes typically occurs at multiple sites 40–120 bp downstream of the TATA box. The mechanism that accommodates this extended and variable promoter architecture is unknown, but one model suggests that RNAP II forms an open promoter complex near the TATA box and then scans the template DNA strand for start sites. Unlike most protein-coding genes, small nuclear RNA gene transcription starts predominantly at a single position. We identify a highly efficient initiator element as the primary start site determinant for the yeast U4 small nuclear RNA gene, SNR14. Consistent with the scanning model, transcription of an SNR14 allele with tandemly duplicated start sites initiates primarily from the upstream site, yet the downstream site is recognized with equivalent efficiency by the diminished population of RNAP II molecules that encounter it. A quantitative in vivo assay revealed that SNR14 initiator efficiency is nearly perfect (∼90%), which explains the precision of U4 RNA 5′ end formation. Initiator efficiency was reduced by cis-acting mutations at –8, –7, –1, and +1 and trans-acting substitutions in the TFIIB B-finger. These results expand our understanding of RNAP II initiation preferences and provide new support for the scanning model.


Eukaryotes rely on RNA polymerase II (RNAP II) to synthesize all messenger RNAs (mRNAs) and most of the small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs) encoded within their nuclear genomes. Efficient and accurate transcription initiation is vital to ensure the proper expression and function of these RNAs. The recruitment of RNAP II to gene promoters is mediated through the assembly of a pre-initiation complex (PIC). RNAP II accessory proteins provide promoter specificity and the structural core for assembly of the PIC. These accessory proteins include the general transcription factors TFIID, TFIIB, TFIIF, TFIIE, and TFIIH (1–3). Some transcription factors engage in sequence-specific contacts with core promoter elements (4); one of the most fundamental interactions for PIC assembly is between the TATA-binding protein subunit of TFIID and the TATA box (5, 6). In a stepwise model for PIC assembly, TATA-binding protein binding is followed by the addition of TFIIB, RNAP II-TFIIF, TFIIE, and TFIIH (7).
In metazoans, the assembly of a PIC at the TATA box results in start site selection 25-30 bp downstream (4). The architecture of the PIC is such that the transcription start site is placed precisely within the active center of RNAP II (8, 9). In the yeast Saccharomyces cerevisiae, RNAP II initiation typically occurs at multiple sites at variable distances from the TATA box, with most start sites ranging from 40 to 120 bp downstream of the TATA box (10). The initiation mechanism that accommodates this extended and variable promoter architecture is unknown, but it does not appear to be dependent on assembling the yeast PIC in a manner different from that of metazoans. Yeast promoter melting has been shown to begin at the same position as in metazoans, ∼20 bp downstream of the TATA box (11). In addition, the ∼30-bp distance between the TATA box and RNAP II active center has been confirmed through structural analysis of yeast PICs (8, 9). A scanning model for start site selection has been proposed for yeast (11). In this scanning model a PIC assembles at a TATA box, the DNA is melted, and RNAP II translocates downstream, searching the template strand for acceptable start sites.
The initial sequence comparisons and mutational analysis of a relatively small set of yeast mRNA genes helped define three related yeast start site consensus sequences, RRYRR, TCRA, and YA(A/T)R in the non-template strand, where the initiation site is underlined, Y is pyrimidine, and R is purine (12–14). Recently, an alignment of sequences flanking 4637 yeast transcription start sites has provided a more refined consensus sequence: A(A-rich)5NYA(A/T)NN(A-rich)6 (15). The DNA sequences encompassing yeast transcription start sites are sometimes termed initiator elements. In metazoans the initiator is defined as a core promoter element distinct from the TATA box that nucleates PIC assembly and is sufficient for accurate transcription (16). Although there is evidence to suggest that some yeast initiators may function in this way (17, 18), most appear to play a more limited role in transcriptional control and influence accuracy but not overall efficiency (2).
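The consensus sequences above are written in degenerate-base shorthand (Y, R, N, and bracketed alternatives). As a minimal sketch, assuming a plain Python translation into regular expressions, the snippet below shows how such a motif could be matched against a non-template strand sequence. The function names `consensus_to_regex` and `find_matches` and any sequences passed to them are hypothetical and are not taken from the study.

```python
import re
from typing import List

# Degenerate-base codes used in the text: Y = pyrimidine, R = purine, N = any base.
DEGENERATE = {"Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}

def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus such as 'YA(A/T)R' into a regular expression."""
    # Rewrite '(A/T)'-style alternatives as character classes, e.g. '[AT]'.
    pattern = re.sub(r"\(([ACGT](?:/[ACGT])+)\)",
                     lambda m: "[" + m.group(1).replace("/", "") + "]",
                     consensus)
    out = []
    in_class = False
    for ch in pattern:
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        # Expand degenerate codes only outside of a character class.
        out.append(ch if in_class else DEGENERATE.get(ch, ch))
    return "".join(out)

def find_matches(consensus: str, sequence: str) -> List[int]:
    """Return 0-based positions (overlaps allowed) where the consensus matches."""
    regex = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in regex.finditer(sequence.upper())]
```

The lookahead wrapper lets overlapping candidate sites be reported, which matters when several initiator-like sequences sit close together.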
It seems likely that the recognition and efficient utilization of yeast start sites involves a sequence-specific interaction between the yeast initiator element and either RNAP II, an accessory protein, or both. RNAP II and TFIIB have been shown to dictate the distance from TATA boxes to start sites in yeast (19). TFIIB substitutions that confer downstream shifts in yeast start site selection map to the "B-finger" domain, which encompasses residues 55-88 of the N-terminal region (9, 20–22). The promoter sequence immediately upstream of yeast start sites can influence the severity with which TFIIB B-finger substitutions alter start site selection (23). A yeast RNAP II-TFIIB crystal structure model shows the TFIIB B-finger inserted through the RNA exit pore into the polymerase active site, suggesting that start site selection may be mediated by a direct interaction between the B-finger and promoter DNA (9).
Here we report the characterization of cis- and trans-acting determinants of start site selection at the yeast U4 snRNA gene, SNR14. In contrast to the heterogeneous transcription start site selection exhibited at most mRNA genes, yeast snRNAs typically have one major start site, thus providing a model system for the study of accurate initiation. We identified a highly efficient initiator element within the SNR14 promoter, defined the positions most critical for its function in start site selection, and quantified its efficiency relative to other initiator sequences. Substitutions within the TFIIB B-finger genetically interact with SNR14 initiator mutations in a sequence-dependent manner. Quantitative analysis of the utilization of tandemly duplicated initiator elements strongly supports the proposed scanning model for yeast transcription start site selection and demonstrates that scanning by RNAP II is processive.

EXPERIMENTAL PROCEDURES
Plasmid Construction-SNR14 (positions −224 to +701, relative to the +1 transcription start site) was cloned by PCR amplification of a genomic DNA template isolated from yeast strain PJ43-2b and ligated into the BamHI site of pRS313 (CEN4, ARS1, HIS3). 5′ end truncation constructs were generated by the same method but using pRS313-SNR14 as the template. pRS313-SNR14-StDup was created by using QuikChange PCR mutagenesis (Stratagene) to insert 14 bp of DNA (−13 to +1 relative to the SNR14 start site) between positions +1 and +2 of pRS313-SNR14, creating an overlapping 20-bp duplication. pRS317 (LYS2)-SUA7 contains the entire SUA7 promoter and coding region (TFIIB gene) and was constructed by ligation of the ClaI/SacI fragment of pRS314 (TRP1)-yIIBN (kindly provided by A. Ponticelli, State University of New York at Buffalo) into pRS317 (LYS2). TFIIB expressed from these constructs contains an N-terminal hexahistidine tag. All mutations within pRS313-SNR14-StDup and pRS317-SUA7 were created using the QuikChange method (Stratagene). pRS316 (URA3)-SNR14, SUA7 was constructed by ligating a PCR-amplified region of SNR14 (−224 to +701) into the SalI/XhoI sites of pRS316 (URA3)-yIIBN (kindly provided by A. Ponticelli, State University of New York at Buffalo). Oligonucleotide sequences are available upon request.
DNase I Chromatin Footprinting-Chromatin footprinting was performed as previously described (27) using the yeast strain PJ43-2b. After digestion of lysed yeast cells or purified genomic DNA with DNase I (Invitrogen), cleavage sites on the non-template strand of the SNR14 promoter were mapped by primer extension using 32P-labeled oligo U4-14C, which is complementary to non-template strand residues +32 to +51. Sequencing ladders were generated by primer extension of genomic DNA using 32P-labeled oligo U4-14C and a dNTP mix containing dideoxy-ATP or -GTP.
RNA Analysis-Total cellular RNA was isolated using the guanidinium thiocyanate method including a 65°C phenol extraction (28). Reverse transcription for the determination of the Sec3 mRNA-processing sites was performed in a 50-μl reaction volume containing 5 μg of total RNA from strain PJ43-2b, 50 mM Tris-HCl (pH 8.3), 8 mM MgCl2, 50 mM NaCl, 11 mM dithiothreitol, 1 mM dNTPs, 40 units of RNasin (Promega), 250 pmol of T16-EcoR1 oligo, and 37.5 units of avian myeloblastosis virus reverse transcriptase (United States Biochemical). cDNA synthesis proceeded at 42°C for 1 h. Ten μl of the 50-μl RT reaction was used as a template for PCR in a 100-μl volume containing 20 mM (NH4)2SO4, 50 mM Tris-HCl (pH 9.0), 0.75 mM MgCl2, 50 pmol of T16-EcoR1 oligo, 100 pmol of SEC3-RT-PCR oligo, and 1 unit of MasterAmp Tfl DNA polymerase (Epicentre). Each PCR cycle consisted of denaturation at 94°C for 30 s, annealing at 42°C for 30 s, and elongation at 72°C for 1 min. A total of 30 cycles was performed with an additional extension at 72°C for 5 min. RT-PCR products were gel-purified and ligated into the BamHI/EcoRI site of pRS316. Recovered plasmids were sequenced using an M13F oligo.
Primer extension analysis of 5 μg of total cellular RNA was carried out using 32P-labeled oligonucleotide U4-14B (complementary to nucleotides 140–159 of yeast U4 RNA) or SCR1 (complementary to nucleotides 75-92 of yeast scR1 RNA) (31). Sequencing ladders were generated using the Sequitherm EXCEL II DNA sequencing kit (Epicentre). The cDNA products were electrophoresed on 6% polyacrylamide, 8.3 M urea gels. Gels were visualized with a Storm PhosphorImager (Amersham Biosciences), and data were quantitated with Amersham Biosciences ImageQuant software (Version 5.2).

RESULTS
Conserved Sequence Elements Upstream of the Yeast U4 snRNA Gene, SNR14-To begin characterizing SNR14 promoter architecture, we used comparative sequence analysis to identify conserved elements upstream of the transcription start site and downstream of the 5′-adjacent gene, SEC3 (Fig. 1A). An alignment of sequences upstream of SNR14 in four different species of Saccharomyces (32) helped identify several conserved elements (Fig. 1B). The most strikingly conserved regions include the sequence immediately upstream of the transcription start site, a TATA box located 100 base pairs upstream of the start site, a T-stretch just upstream of the TATA box, and a region located 31-44 base pairs upstream of the TATA box. The most upstream conserved region may be an upstream activating sequence and in S. cerevisiae exactly matches the consensus binding site of the transcriptional activator Abf1 (33, 34).
In addition to promoting SNR14 transcription, another likely function for conserved sequences in this intergenic region is to direct cleavage and polyadenylation of Sec3 mRNA. RT-PCR was used to identify the predominant Sec3 mRNA 3′-processing sites. The most efficiently recovered site, which appeared in 12 of 21 clones, mapped to the middle of the putative Abf1 binding site (Fig. 1B). Six other nearby sites were represented by 1-2 clones each. This result implies that the sites of SEC3 transcription termination and SNR14 PIC assembly overlap.
DNase I chromatin footprinting was used to complement comparative sequence analysis in the search for potential SNR14 promoter elements. This procedure probed in vivo assembled chromatin by lysing yeast cells directly into a solution of DNase I. For comparison, purified genomic DNA was digested with DNase I. Cleavage sites were detected by primer extension (27). No obvious DNase I footprint was observed between the SNR14 TATA box and start site (Fig. 1C) despite the fact that the gene is single-copy and highly transcribed. This finding suggests that there is no high occupancy protein binding site in the region of the promoter that separates the location of PIC recruitment from that of transcription initiation. Rather, the subtle changes in DNase I protection and enhancement suggest partial occupancy. Such subtle changes in DNase I sensitivity were also observed at the putative upstream activating sequence, TATA box, and initiator region. We could not draw any conclusions regarding the protein occupancy of the T-stretch given that it was not efficiently cleaved by DNase I. Because a scanning RNAP II complex is unlikely to provide sufficient promoter occupancy for detectable DNase I protection, the footprinting results obtained for the SNR14 promoter are consistent with the scanning model for transcription initiation.
The Conserved SNR14 TATA Box Is Not a Determinant of Start Site Position in Vivo-Functional upstream SNR14 promoter elements were roughly mapped using 5′-truncation analysis of a plasmid-borne allele (Fig. 2A). Inserts were tested in both orientations within the vector to control for effects of plasmid sequences. Primer extension of U4 RNA synthesized from its chromosomal locus showed the single major transcription start site designated as +1 (Fig. 2B, lane 1). Strains bearing SNR14 with 224 base pairs of upstream DNA on a plasmid yielded the same initiation pattern as the chromosomal locus (Fig. 2B, lanes 2 and 9). Deletion of the putative upstream activating sequence and T-stretch had no effect on transcription efficiency or accuracy in the context of the "forward" plasmid-borne allele but decreased efficiency 2-fold in the "reverse" orientation (Fig. 2B, lanes 3 and 10). The efficiency of SNR14 transcription was reduced an additional 4–11-fold upon deletion of the TATA box, but transcription start site selection was changed little (Fig. 2B, lanes 4 and 11). Upon further truncation to position −74 and beyond, transcription start site selection became increasingly aberrant, resulting in the appearance of additional U4 RNA 5′ ends (Fig. 2B, lanes 5-8 and 12-14). Except for the −3 reverse allele, all SNR14 truncation mutant strains are viable. Surprisingly, even strains bearing deletions of upstream DNA to position −3 expressed transcripts initiating from the +1 start site, albeit at a very low level (Fig. 2C). The effects of promoter truncations varied somewhat depending on the orientation of SNR14 alleles within the plasmid, with the reverse orientation yielding more severe effects on growth and U4 RNA synthesis. Because the differences appeared to be limited to transcription efficiency and not transcription start site selection, they are likely due to functional promoter elements in the vector sequences.
To confirm that the additional U4 RNA 5′ ends observed upon truncation of the SNR14 promoter were due to misinitiation and not degradation, we tested for the presence of the methylguanosine 5′ cap that is added to nascent RNAP II transcripts (35). Because the 7-methylguanosine cap attached co-transcriptionally to RNAP II transcripts is hypermethylated to trimethylguanosine on snRNAs, we used a monoclonal antibody that recognizes both cap structures. RNAs corresponding to all major U4-specific primer extension products were efficiently immunoprecipitated with an anti-cap antibody (Fig. 3A), confirming that they are primary transcripts. The scR1 RNA is synthesized by RNAP III and was not immunoprecipitated with anti-cap antibody, showing that immunoprecipitation is specific for capped transcripts. The major alternative start sites resulting from SNR14 promoter truncations were mapped to positions −35, −32, and −27, which are located within plasmid sequence, and +20, +32, +83, and +88 in the U4 coding region. Interestingly, all of the start sites mapped to a purine residue, and 5 of the 8 start sites (including the wild-type start site) immediately follow the dinucleotide CC (Fig. 3B).
Thus, transcription initiation of SNR14 from the normal +1 start site is remarkably resistant to deletion of conserved upstream promoter elements. Much like what has been observed previously at mRNA genes, the SNR14 TATA box primarily affects the frequency of RNAP II transcription and not the position of initiation (10). Alternative SNR14 initiation sites minimally include a purine preceded by a pyrimidine and most often by two cytosines. These results suggest that start site selection at SNR14 is directed by an initiator-like element, potentially in combination with a downstream (intragenic) element.
A Polar Effect on SNR14 Start Site Selection Supports a Unidirectional Scanning Model-A scanning model for yeast transcription start site selection posits that after melting DNA near the TATA box, RNAP II translocates along downstream DNA until a suitable initiation sequence is located (11). We directly tested the validity of this model by constructing an SNR14 allele with tandemly duplicated start sites, SNR14-StDup (Fig. 4A, construct 2). The SNR14-StDup construct has a 14-base pair insertion that creates two start sites identical in sequence from positions −13 to +7, which should be indistinguishable to factors binding directly to these sequences, including RNAP II. Therefore, if start site selection is driven by a random collision of RNAP II or another initiator-binding protein with the DNA, we should detect roughly equal utilization of the two sites. If the start site is defined by its distance from factors bound upstream or by a polymerase scanning from the TATA box, we should detect primarily upstream starts. In contrast, if the start site is measured from factors bound to intragenic promoter elements, we should detect primarily downstream starts (36).
Fig. 2 (legend fragment). ... RNAs in total cellular RNA isolated from an untransformed strain with an intact SNR14 chromosomal locus (CHR) or from strains bearing a disrupted SNR14 chromosomal locus transformed with plasmid-borne SNR14 promoter deletion alleles. Orientation of the SNR14 allele within the plasmid is arbitrarily specified as forward (F) or reverse (R). The positions of U4 and scR1 cDNAs are indicated on the left. Arrows on the right denote additional U4 RNA 5′ ends resulting from promoter truncations. C, summary of SNR14 promoter truncation effects on growth phenotype and U4 RNA level. Orientation of the SNR14 allele within the plasmid is as specified in panel B. RNA levels were quantitated using only U4 transcripts initiated from the +1 start site and are relative to the full-length construct (−224) normalized to scR1 level, which was defined as 100. Growth phenotypes were determined qualitatively from serial dilution spot tests on solid synthetic complete medium containing 5-fluoroorotic acid.
Fig. 3 (legend fragment). ... Primer extension of total RNA or pellet (P) and supernatant (S) fractions was done with 32P-labeled U4- and scR1-specific primers. The scR1 RNA served as a negative control since it was synthesized by RNAP III as an uncapped transcript. The positions of alternate start sites within upstream plasmid sequence and the U4 coding region were mapped to the nucleotide (data not shown) and are indicated on the right. No IP, no immunoprecipitation. B, sequence context of alternative SNR14 start sites, with non-template strand DNA aligned from −10 to +10 relative to the site of initiation.
In fact, the upstream start site (+1u) is preferentially utilized (Fig. 4A, lane 2), consistent with upstream recruitment and a directional scanning model. When the initiating non-template nucleotide of the upstream start is changed from A to T, the downstream start (+1d) is more heavily utilized, confirming that the start sites are in competition and that precise spacing from an upstream recruitment site is not required (Fig. 4A, lane 3). Changing the downstream start from A to T essentially abolishes its usage (Fig. 4A, lane 4). When both start sites are changed from A to T, weak alternative sites are used at −8d and +7d (Fig. 4A, lane 5). Weak initiation at −8 and +7 is also observed in the wild-type allele. Interestingly, initiation at −8u, the most upstream observed start site in the SNR14-StDup allele, does not increase when the +1u and +1d start sites are mutated. This finding is also consistent with the directional scanning model, in which read-through of a site should not affect initiation at sites upstream.
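The competition between duplicated start sites can be illustrated with a simple forward model of unidirectional scanning. The sketch below assumes that every polymerase scans the sites in order and initiates at each site with a fixed probability, so that only polymerases that read through the upstream site can reach the downstream one. The function name `scanning_yields` and the numerical efficiencies are hypothetical, chosen only for illustration.

```python
from typing import List

def scanning_yields(efficiencies: List[float]) -> List[float]:
    """Fraction of all polymerases initiating at each site, upstream to downstream."""
    remaining = 1.0  # fraction of scanning polymerases that have not yet initiated
    yields = []
    for eff in efficiencies:
        yields.append(remaining * eff)  # polymerases that initiate at this site
        remaining *= (1.0 - eff)        # polymerases that read through it
    return yields

# Two identical, strong sites (hypothetical efficiency 0.9): upstream dominates.
identical = scanning_yields([0.9, 0.9])

# Weakening only the upstream site (hypothetical efficiency 0.1) shifts
# initiation to the unchanged downstream site.
weakened = scanning_yields([0.1, 0.9])
```

Under these assumptions, two sites of equal efficiency show a roughly 10:1 upstream bias, whereas a random-collision model would predict approximately equal use of the two sites.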
Yeast Start Site Efficiency Is an Intrinsic, Quantifiable Property of an Initiator Sequence-To obtain a quantitative estimate of initiation sequence preference, we assumed that a homogeneous population of initiation-competent polymerases scan unidirectionally through the −8u to +7d interval of the SNR14-StDup allele in search of a good match to the ideal initiator consensus. Start site efficiencies were calculated by dividing the relative yield of a start site product by RNAP II flux at that site. The relative yield of a transcript from a given start site was determined by dividing its signal intensity by the total signal intensity of products from all detectable start sites (−8u, +1u, −8d, +1d, +7d). RNAP II flux was defined as the relative number of polymerases encountering a given start site and was arbitrarily assigned a value of 100 units at the −8u site. Because 2% of the U4 cDNA ends at the −8u position, 2 units of RNAP II must have initiated at this site, and 98 units continued to scan (assuming there is no loss of RNAP II except by detectable initiation in the −8u to +7d interval). When start site efficiency is determined without considering flux, the +1u and +1d start site efficiencies differ by about 10-fold (87% versus 8%). When RNAP II flux at the two positions is included in the calculation, the +1u and +1d start site efficiencies are found to be equal at 89% (Fig. 4B).
Fig. 4 (legend fragment). ... The relative yield of a transcript from a given start site was determined by dividing its signal intensity by the total signal intensity of products from all detectable start sites (−8u, +1u, −8d, +1d, +7d). For simplicity, the efficiency of the +7d start site was assumed to equal 100%. RNAP II flux is defined as the relative number of polymerases encountering a given start site and was arbitrarily assigned an initial value of 100 units (U). C, calculating start site efficiency for SNR14-StDup A+1uT (panel A, lane 3).
Even when the efficiency of the +1u site is reduced more than 10-fold by the A+1uT mutation, the efficiency of +1d and other downstream sites remains about the same when flux is considered (Fig. 4C). These results indicate that initiator efficiency is an intrinsic property that is largely independent of start site position. We can, therefore, use this value to classify initiator strength. For example, we can deduce that the +1 initiator sequence of the U4 gene is a nearly perfect initiator, with an efficiency of 89–92%.
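The flux-corrected calculation described above can be sketched in a few lines of Python. The relative yields for the −8u (2%), +1u (87%), and +1d (8%) sites are taken from the text; the yields assigned to −8d and +7d are assumed values chosen only so that the total is 100, and the function name `start_site_efficiencies` is hypothetical. As in the text, the sketch assumes that polymerases leave the scanning population only by detectable initiation.

```python
from typing import Dict, List, Tuple

def start_site_efficiencies(yields: List[Tuple[str, float]],
                            initial_flux: float = 100.0) -> Dict[str, float]:
    """Efficiency (%) of each site = relative yield / RNAP II flux reaching it.

    `yields` lists (site, relative yield in %) in upstream-to-downstream order.
    """
    flux = initial_flux
    efficiencies = {}
    for site, rel_yield in yields:
        efficiencies[site] = 100.0 * rel_yield / flux
        flux -= rel_yield  # polymerases that initiated here no longer scan
    return efficiencies

relative_yields = [
    ("-8u", 2.0),   # from the text
    ("+1u", 87.0),  # from the text
    ("-8d", 2.0),   # assumed for illustration
    ("+1d", 8.0),   # from the text
    ("+7d", 1.0),   # assumed for illustration
]
eff = start_site_efficiencies(relative_yields)
```

With these inputs the +1u and +1d efficiencies both come out near 89%, even though their uncorrected yields differ about 10-fold. The last site comes out at 100% only because its assumed yield equals the remaining flux, which mirrors the simplifying assumption made for the +7d site.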
Mutations in SNR14-StDup Define Preferred Sequences at −1 and −8 of the Initiator-Having developed a quantitative assay for in vivo start site selection, we next tested the sequence requirements for RNAP II initiation through site-directed mutagenesis of phylogenetically conserved base pairs in the SNR14 major initiator. In addition to the +1 position, the nucleotide identities of positions −8, −7, −5, −4, −2, and −1 of SNR14 are conserved across the Saccharomyces genus (Fig. 1). A double transversion mutation at positions −8u, −7u (A→T) or −2u, −1u (C→G) of the SNR14-StDup allele significantly shifted initiation toward downstream start sites, reducing efficiency of the +1u site by about 5-fold (Fig. 5A, lanes 2 and 5). Upon separation of the −8u/−7u double mutation into single point mutations, it became clear that the −8u mutation contributes more to the initiation defect than the −7u mutation (Fig. 5A, lanes 3 and 4). A similar dissection of the −2u/−1u double mutation showed that the change at position −1u accounted for all of the downstream shift (Fig. 5A, lanes 6 and 7). Transversions at −5u or −4u on their own had little if any effect on start site selection (Fig. 5B, lanes 8 and 9).
The base preference at positions −8u and −1u was explicitly tested by creating all possible base substitutions. The A-8uT mutation reduced usage of the +1u start site by about 2-fold, an effect that was slightly greater than that observed for A-8uG and A-1uC (Fig. 5B, lanes 2-4). The C-1uG mutation reduced usage of the +1u start site by at least 5-fold, an effect that was followed closely by C-1uA (Fig. 5B, lanes 5 and 6). The C-1uT mutation had a very minor effect on start site selection (Fig. 5B, lane 7). Overall, it appears that the strength of the SNR14 start site is dependent on a purine at +1, a pyrimidine at −1, and an adenine at −8.
There are other sequences within the RNAP II scanning window between the SNR14 TATA box and +1 start site that resemble start sites but at which initiation does not efficiently occur. We predicted that changing these sites to match the bases preferred at positions −8, −7, and −1 would contribute to more efficient start site usage. Weak initiation occurs at −8u in a wild-type SNR14-StDup allele (Fig. 5C, lane 1). The C-16u/15uA or T-9uC mutations on their own increase usage of the −8u site by about 5- or 2-fold, respectively (Fig. 5C, lanes 2 and 3). Combining these mutations has an additive effect, resulting in a 10-fold increase in −8u start site strength (Fig. 5C, lane 4). The optimization of the −8u start site toward a higher efficiency further demonstrates the importance of the −8, −7, and −1 positions in SNR14 initiator function.
The efficiencies of 26 different wild-type and mutant yeast initiators calculated from various SNR14-StDup alleles are shown in Table 1, with values ranging from about 89 to 4%. The wild-type SNR14 +1 site (A+1d, A+1u) is the most efficient, and divergence from this sequence reduces start site efficiency. Changes at the −8, −7, −1, and +1 positions reduce start site efficiency anywhere from about 2- to 15-fold. Although the efficiency of the wild-type A-8u start was improved 10-fold by changing the −8, −7, and −1 positions toward a more preferred initiator sequence, it is interesting to note that the efficiency of this site is still around 2-fold less than the wild-type A+1 start site. The −8u and −8d start site efficiencies differ by 6-fold (4 versus 24%) even when the flanking sequences are identical from −9 to +7, as is the case in the StDup-A+1uT allele (Table 1). Taken together, these results indicate that there are positions other than −8, −7, −1, and +1 at which nucleotide identity influences initiator efficiency.
Substitutions in the TFIIB B-finger Exacerbate the Effect of Initiator Mutations at −8 and −1-One possible cause of the start site selection defects exhibited by SNR14 initiator mutants is a disruption of direct amino acid/nucleotide contact(s) made between a protein component of the yeast PIC and the initiator. To analyze the role of TFIIB as the potential trans-acting component of the PIC that interacts with the yeast initiator element, we generated a double knock-out strain that has disrupted chromosomal copies of the SNR14 and SUA7 (TFIIB) genes and carries wild-type copies of these genes on a URA3-marked plasmid. Standard plasmid shuffle protocols were used to introduce mutant alleles of SNR14 and SUA7. The effect of TFIIB B-finger substitutions previously shown to alter initiation on protein-coding genes (22, 23) was tested in the context of the SNR14-StDup allele. For the most part, substitutions in residues 63, 64, 66, and 78 of TFIIB all caused a similarly modest shift in transcription initiation from upstream to downstream sites, reducing +1u start site efficiency by about 1.5-fold (Fig. 6A). The effect of the W63R substitution was less severe than W63P and the other TFIIB substitutions, consistent with what has been observed at the CYC1 and ADH1 genes (22). Overall, the effect of SUA7 mutations on SNR14 start site selection was not as dramatic as has been observed on some mRNA genes, but this is not surprising given that the sensitivity of genes to TFIIB substitutions is known to be dependent on the sequence immediately upstream of the start site (23).

TABLE 1 Initiation efficiency of yeast RNAP II as a function of start site sequence
a Asterisks indicate alleles analyzed in Fig. 5C. Reverse text indicates positions divergent from the A+1 WT start site.
b Efficiencies were calculated as described in Fig. 4 and are shown as an average in cases where n > 1.

The superimposition of the RNA-DNA hybrid from the RNAP II transcribing complex structure upon a recent crystal structure of yeast RNAP II-TFIIB suggested that template-strand DNA is adjacent to conserved residues of the TFIIB B-finger domain (9). The points of closest contact included nucleotides −6 to −8 (relative to the nucleotide addition site at +1) and B-finger residues 62-66. These structurally predicted contacts are consistent with results showing that the archaeal TFIIB homologue cross-links to template DNA near the transcription start site (37, 38). We tested whether mutations in SUA7 genetically interact with mutations in the initiator element at positions that are required for accurate start site selection. An analysis of +1u start site efficiency in the context of the TFIIB R64A substitution revealed a difference in the level of enhancement between −8 and −1 initiator mutants, suggesting that the effect of R64A on start site selection is influenced by the initiator sequence (Fig. 6B). The C-1uG/R64A mutant exhibited a 2-fold reduction in +1u efficiency relative to C-1uG alone, similar to the effect of the R64A substitution with the wild-type +1u site (1.5-fold). In contrast, the A-8uT/R64A mutant exhibited about a 4-fold reduction in +1u efficiency relative to A-8uT alone. The differential level of sensitivity of +1u variants to R64A was not observed at the +1d site, consistent with the fact that A-8uT and C-1uG specifically alter usage of the upstream site in SNR14-StDup. Of the remaining start sites tested, −8d was very sensitive to R64A (9-fold reduction), whereas −5d was relatively insensitive. Overall, these data support the notion that TFIIB B-finger substitutions affect RNAP II start site selection in a manner dependent on the sequence of the initiator, an element that extends upstream to at least the −8 position.

DISCUSSION
The synthesis of non-coding RNAs, especially snRNAs and snoRNAs, puts strong demands on the accuracy and efficiency of transcription initiation by RNAP II. The transcription start site of such RNAs usually corresponds to a unique mature 5′ end, and its precise placement may be required for optimal RNA function. Yeast snRNAs and snoRNAs typically have steady-state levels of hundreds of copies per cell, so their genes must be actively transcribed. Non-coding RNA gene promoters are, therefore, interesting subjects for study of the optimal sequences for directing initiation by RNAP II. Here we provide evidence that the S. cerevisiae U4 snRNA gene, SNR14, fulfills these stringent requirements by coupling a consensus TATA box with a nearly perfect initiator element. Furthermore, the DNA between the TATA box and initiator is devoid of initiator-like sequences that might divert RNAP II from the proper start site as it scans downstream from the TATA box. The differential utilization of duplicated initiator elements in artificial variants of the SNR14 promoter strongly supports the scanning model of start site selection by RNAP II in budding yeast and demonstrates that initiator element efficiency is an intrinsic property dependent primarily on the sequence at positions −8, −7, −1, and +1 relative to the start site.
Architecture of the SNR14 Promoter-In terms of their general promoter architecture, yeast snRNA genes bear a strong resemblance to mRNA genes both in the position and function of their core elements. In agreement with their observed roles in yeast mRNA genes, the conserved TATA box and initiator elements of the SNR14 promoter primarily influence RNAP II transcription efficiency and accuracy, respectively. A more distinctive feature of the SNR14 promoter is the presence of a conserved T-stretch and putative Abf1 binding site. Abf1 sites have previously been identified upstream of T-rich stretches in the promoters of ribosomal protein genes (39) and snoRNA genes (40). Both microarray-based readout of chromatin immunoprecipitation (ChIP-chip) and protein binding to microarrays (PBM) have identified SNR14 as a target gene for Abf1 (33, 41). The similarity in upstream sequences between genes encoding components of the spliceosome, ribosome, and RNA modification machinery raises the possibility of coordinate regulation of these activities at the level of transcription.
In one orientation of a plasmid-borne SNR14 allele, deletion of the putative Abf1 site and T-stretch led to a 2-fold reduction in U4 RNA level, suggesting a potential role for one or both of these elements in transcription efficiency. In the context of the ribosomal protein-coding gene RPS28A, a mutation that destroys Abf1 binding in vitro reduced transcription by 10-fold, whereas substitutions in the T-rich element reduced transcription by 2-fold (39). Other potential functions for Abf1 within the SNR14 promoter (e.g. genome partitioning, nucleosome organization) would likely require a more native chromosomal environment than was provided in our study (42). The potential role for Abf1 in genome partitioning of this region is made more interesting given the observation that the sites of SEC3 transcription termination and SNR14 PIC assembly overlap.

[Figure legend fragment] … Fig. 4, except that +7d and a site further downstream (indicated with asterisks) were also included in the total signal intensity.
Refinement of the Yeast RNAP II Initiator Consensus Sequence-The preference for particular nucleotides at positions –1 and +1 of the yeast initiator has been widely reported in the literature. Although a YA initiator consensus applies well to many documented yeast start sites, it is too minimal to have predictive value. A point mutation analysis of an RRYRR consensus initiator from the TRP4 gene revealed that the central pyrimidine and at least one of the 3′-flanking purine nucleotides were essential but alone insufficient to define a functional initiator element (43). Previous reports have identified regions immediately 5′ of the original yeast initiator consensus that influence start site selection. Maicas and Friesen (44) identified a region centered at –9 of the TCM1 gene and 95 other mRNA genes that they termed the "locator." The locator was defined as a region where the base composition of the nontemplate strand sharply switched from a preponderance of thymine residues to predominantly adenine residues. Rathjen and Mellor (45) identified a region from –10 to –4 (ACAGATC) of the major PGK1 start site as a "determinator" element. Deletion of the determinator resulted in a loss of initiation from the normal start site and increased use of more downstream sites. Healy and Zitomer (46) were able to show that insertion of CAAG upstream of the CYC7 gene could direct initiation at a site at which it did not normally occur, and it is interesting to note that their insertion also introduced an adenine at the –8 position on the non-template strand. Our genetic evidence supporting a preference for adenine at positions –8 and –7 of the SNR14 initiator expands the older yeast initiator consensus and can account for the earlier observations described above for the TRP4, TCM1, PGK1, and CYC7 genes.
The general importance of the –8, –7, –1, and +1 positions for initiator efficiency across all RNAP II-transcribed yeast genes is supported by a recent bioinformatics study that compared 4637 yeast transcription start sites. Sequence alignment produced the consensus A(A_rich)_5NYA(A/T)NN(A_rich)_6, where the A of the YA dinucleotide (underlined in the original) is the initiation site (15). This yeast initiator consensus is more expansive than that reported for higher organisms such as Drosophila, TCA(G/T)T(T/C), or mammals, YYAN(T/A)YY (4). Although they all share the minimal consensus YA, the latest evidence suggests that yeast initiator sequence preferences extend beyond –1/+1 to include at least 8 nucleotides upstream and downstream. Zhang and Dietrich (15) could not conclude whether the A-richness of the yeast initiator consensus sequence was important for transcription initiation or a consequence of some other aspect of genome structure. Here, we present direct evidence indicating that the adenine at position –8 and, to a lesser extent, the adenine at –7 contribute to the functionality of the yeast initiator as a start site determinant. The fact that substitution of any base besides adenine at –8 significantly decreases start site efficiency suggests that the functional impairment is not merely related to the melting potential of an A:T base pair. Rather, it suggests that the –8 position is recognized in a sequence-specific manner.
How the yeast initiator sequence determines start site usage is as yet unknown. It seems likely that the initiator is recognized by a protein component of a scanning pre-initiation complex. Mutations that alter yeast transcription start site selection have been identified in numerous protein components of the PIC, including RNAP II (Rpb1, Rpb2, Rpb9), TFIIB, and TFIIF (20, 21, 47–52). Of these proteins, Rpb1, Rpb2, TFIIB, and TFIIF (Tfg1, Tfg2) have also been cross-linked to DNA at or near a transcription start site (53).
Faitar et al. (23) determined that mutations in yeast initiator sequences genetically interact with substitutions in the TFIIB B-finger, making certain start sites more or less sensitive to downstream shifts in transcription start site selection. Specifically, they showed that among a set of mutations made from positions –6 to +5 of an ADH1 initiator, changes at –2, +1, and +2 significantly increased or decreased the sensitivity of the +1 transcription start site to TFIIB-V79L. Here, we present evidence supporting a genetic interaction between the –8 position of the SNR14 initiator and the TFIIB B-finger, expanding our view of what constitutes an initiator element and where potential protein-DNA interactions may occur. Furthermore, the fact that the –8d start site is highly sensitive to the TFIIB-R64A substitution in both the StDup-A-8uT and -C-1uG alleles, whereas the –5d start site is essentially insensitive in both contexts, indicates that sensitivity correlates with sequence and not simply initiator strength.
Implications for a Scanning Model of RNAP II Start Site Selection-A scanning model is currently the best supported explanation for how yeast start sites are selected, but experiments directly testing the basic implications of the model are scarce in the literature. We constructed an allele of SNR14 with tandemly duplicated start sites as a means to test the yeast scanning model in both a qualitative and quantitative fashion. We observed that although the upstream start site had a higher relative yield than the downstream start site, a reduced level of RNAP II flux fully accounted for the lower relative yield from the downstream site. RNAP II flux is an inherent property of a unidirectional scanning model for yeast transcription initiation. Thus, the fact that RNAP II flux can be used to resolve the observed differences in relative utilization of two identical start sites is in itself strong support for the model. The fact that our estimations of RNAP II flux so closely agree with shifts in utilization of start sites 14–22 bp apart in response to mutations suggests that RNAP II scanning is reasonably processive. A scanning mechanism of start site selection requires RNAP II to be processive to accommodate the large and variable distances between yeast TATA boxes and initiator elements. It will be interesting to test the limits of processive scanning by RNAP II. The identification of initiators with a range of efficiencies (Table 1) should aid such studies.
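The relationship between relative yield, RNAP II flux, and initiator efficiency described above can be illustrated with a minimal Python sketch. It assumes a strictly unidirectional scan in which every polymerase reaches the first site, none dissociates between sites, and efficiency is the yield at a site divided by the flux arriving at it. The function names and the example numbers are hypothetical and are not taken from the data in this study.

```python
def site_efficiencies(yields, total=1.0):
    """Convert relative yields into per-site efficiencies.

    yields: fraction of all transcripts that start at each site,
            ordered from upstream to downstream.
    total:  flux of scanning RNAP II arriving at the first site
            (assumed here to be all polymerases).
    """
    efficiencies = []
    flux = total
    for y in yields:
        if flux <= 0:
            raise ValueError("no polymerase flux left to reach this site")
        efficiencies.append(y / flux)
        flux -= y  # polymerases that initiated do not scan further
    return efficiencies


def predicted_yields(efficiencies, total=1.0):
    """Inverse calculation: yields expected from per-site efficiencies."""
    yields = []
    flux = total
    for e in efficiencies:
        y = e * flux
        yields.append(y)
        flux -= y
    return yields


if __name__ == "__main__":
    # Hypothetical tandem duplication of one initiator: the downstream copy
    # yields 10-fold fewer transcripts than the upstream copy.
    observed = [0.90, 0.09]
    for name, eff in zip(["upstream", "downstream"], site_efficiencies(observed)):
        print(f"{name}: efficiency = {eff:.2f}")
```

In this illustration the two copies differ 10-fold in yield but have the same computed efficiency (about 0.9), because only one tenth of the polymerases scan past the upstream copy.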
The yeast initiator sequence consensus is readily apparent among RNAP II-transcribed snRNA genes (SNR14, SNR19, SNR20, SNR7) and snoRNA genes. A Weblogo alignment (54) using a pool of 22 yeast snRNA and snoRNA transcription start sites results in an initiator consensus very similar to that reported for mRNA genes, A(A_rich)_3NNYYA(A/T)N(A_rich)_2. Given that yeast RNAP II transcribes all mRNA genes and most snRNA and snoRNA genes, it is expected that the cis-acting sequence requirements for their start site selection would be similar. However, the identification of a strong match to the initiator consensus from such a small sample size of snRNA/snoRNA genes suggests a basis for why yeast snRNA/snoRNA transcription initiation is much more precise than that of mRNA genes. Simply stated, genes with a highly efficient initiator have fewer start sites because a smaller population of polymerases is available to scan further downstream. Nagawa and Fink (55) touched on this idea when they suggested that one reason the yeast genes HIS1 and CYC1 have multiple weak start sites is that they lack a strong start site.
In addition to snRNA genes evolving highly efficient initiators, the regions between snRNA TATA boxes and coding regions may have also undergone negative selection against the initiator consensus to minimize the usage of non-optimal start sites. Consistent with this idea, an analysis of all 10 potential YR start sites within the RNAP II scanning window between the SNR14 TATA box and the +1 start site revealed that none contained the preferred adenine at the –8 or –7 positions. The driving force behind this proposed kind of snRNA promoter evolution would be to increase the functional capacity of the RNA gene products, which are essential for yeast viability. snRNA genes encode structural RNAs that require precise 5′ ends for their function. For example, the 5′ end of yeast U4 RNA engages in base-pairing interactions with the yeast U6 RNA during the splicing cycle. In contrast, mRNA transcripts contain 5′-untranslated regions that typically have no precise length requirement for proper expression. Consequently, the 5′ ends of mRNAs typically need not be formed in as precise a fashion as those of snRNAs, and their gene promoters would likely not have undergone the same type of evolutionary selection.
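The kind of survey described above, listing candidate YR start sites in a scanning window and checking positions –8 and –7 for adenine, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used in this study: the function name is hypothetical, the input is assumed to be the non-template strand written 5′ to 3′, the purine of each YR dinucleotide is taken as +1, and positions are counted without a zero (so –1 is the pyrimidine). The example sequence is invented and is not the SNR14 promoter.

```python
PYRIMIDINES = set("CT")  # Y
PURINES = set("AG")      # R


def candidate_start_sites(nontemplate):
    """List YR dinucleotides and the bases at -8 and -7 relative to each.

    For a purine at string index i (+1), the pyrimidine at i - 1 is -1,
    so position -8 is index i - 8 and position -7 is index i - 7.
    Sites too close to the 5' end to have a -8 position are skipped.
    """
    seq = nontemplate.upper()
    sites = []
    for i in range(8, len(seq)):
        if seq[i] in PURINES and seq[i - 1] in PYRIMIDINES:
            minus8, minus7 = seq[i - 8], seq[i - 7]
            sites.append({
                "plus1_index": i,
                "dinucleotide": seq[i - 1] + seq[i],
                "minus8": minus8,
                "minus7": minus7,
                "preferred_A": minus8 == "A" or minus7 == "A",
            })
    return sites


if __name__ == "__main__":
    # Invented sequence with one CA site preceded by adenines at -8 and -7.
    for site in candidate_start_sites("GGAAATTTTTCAGG"):
        print(site)
```

A window in which no candidate site reports `preferred_A` would match the pattern described for the region between the SNR14 TATA box and the +1 start site.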
The study of transcription initiation on yeast non-coding RNA genes has provided useful insight into the fundamental process by which RNAP II initiates RNA synthesis, particularly with regard to its accuracy. Additional genetic, biochemical, and structural studies are necessary to elucidate the underlying mechanism by which both initiator DNA and PIC proteins function in the process of RNAP II transcription start site selection.