Single-stranded RNA recognition by the bacteriophage T4 translational repressor, regA.

The T4 protein, RegA, is a translational repressor that blocks ribosome binding to multiple T4 messages by interacting with the mRNAs near their respective AUG start codons. Other than the AUG, there are no obvious similarities between the affected mRNAs. High affinity RNA ligands to RegA were isolated using SELEX (systematic evolution of ligands by exponential enrichment). The selected RNAs exhibited the consensus sequence 5'-AAAAUUGUUAUGUAA-3'. The AUG was invariant, suggesting that it is the primary effector of binding specificity. The UU immediately 5' to the AUG and the upstream poly(A) tract were highly conserved among the selected RNAs. Boundary and footprinting experiments are consistent with the consensus sequence defining the RegA-binding site. Interestingly, chemical modification and nuclease digestion data indicate that the RNA-binding site is single-stranded, as if RegA discriminates between targets based on their primary sequence, not their secondary structure. Minor variations from the consensus at positions other than the universally conserved AUG have little effect on RegA binding, but accumulation of mutations has a profound effect on the interaction. Comparison of the in vivo targets for RegA to the SELEX-generated consensus suggests a repression pattern whereby the translation of individual messages is sequentially halted until the least similarly affected message, the regA gene itself, is repressed.

Translational regulation has been shown to be an important means for controlling gene expression in a variety of organisms, both prokaryotic (1, 2) and eukaryotic (3, 4). One of the more interesting regulatory mechanisms involves the repression of translation caused by RNA-binding proteins interacting specifically with mRNAs. Many of these translational repressors function by directly competing for mRNA binding with ribosomes, thus decreasing the level of translational initiation (5). Among the well characterized repressors, the bacteriophage T4 translational repressor, RegA, is unusual in that it affects the translation of many independent messages.
The expression of at least nine T4 genes is reduced in the later stages of the phage life cycle by the autoregulated product of the regA gene (6). Transcription of these genes is not altered, and thus RegA-mediated repression occurs post-transcriptionally (7). Genetic analysis and in vitro footprinting indicate that RegA specifically interacts with several of the regulated mRNAs near their translational start sites (8-10). The presence of RegA alters the binding of the 30S subunit of Escherichia coli ribosomes to these mRNAs, thus preventing translational initiation (9). Taken together, these data are consistent with RegA altering gene expression in vivo by obstructing ribosome binding to specific mRNAs.
Although RegA repression is specific, the mRNA sequences that are bound by the repressor display few similarities. A consensus for the region surrounding the translational start sites of the affected mRNAs is indistinguishable from one generated for all of the known T4 messages (11). The lack of a distinctive consensus for the RegA-bound RNAs is not terribly surprising given that the putative repressor-binding site lies within the RNA domain used for translational initiation. Elements such as the AUG start codon and Shine-Dalgarno sequence are apparent in all of the messages of T4; thus repressed and unrepressed messages will necessarily share these characteristics within the RegA binding domain. However, it has been shown that single-site substitutions in this region of the affected messages can reduce RegA binding by more than 2 orders of magnitude (12). In addition, NMR studies of an RNA fragment harboring a G → U substitution in the same region indicate that both the native and mutated RNAs have similar single-stranded conformations (13). Results from mutational (14) and deletion analyses (15) suggest that the C terminus of T4 RegA protein provides the nonspecific nucleic acid binding component but the ability to discriminate between sequences resides elsewhere on the protein.
An understanding of the RNA elements required for binding by RegA would best be achieved by separating repressor binding from the in vivo requirements for translation. We used SELEX (16), an in vitro method for isolating, from a random sequence population, the RNAs that have the highest affinity for a target protein, to identify the RNA components required for RegA binding independent of any requirement for translation. Fifteen rounds of selection yielded a clear consensus. Surprisingly, the consensus was not a structural motif, as is generally the case for RNA-binding proteins; rather, the consensus was a specific sequence. Results reported here led us to propose that RegA recognizes its targets in a sequence-preferred, structure-independent manner.

EXPERIMENTAL PROCEDURES
Protein Purification-The RegA protein was purified as described (17).
Selection of RNA Ligands for RegA-A nucleic acid library possessing 5′ and 3′ fixed regions surrounding a 30-nucleotide randomized region was generated as described (18). 10^15 RNA molecules comprising approximately 10^14 unique sequences were incubated in 100 µl of RegA buffer (10 mM Hepes, pH 7.2, 100 mM NaCl, 5 mM MgCl2, and 0.01 mM dithiothreitol) with 10 µM RegA for 5 min at 25°C. The binding reactions were applied to nitrocellulose filters, which preferentially retain RNAs that are bound to protein, and the filters were washed with 10 ml of RegA buffer. The protein-bound RNA was extracted from the protein/filter as described (19). The RNA was reverse-transcribed using the 3′ primer 3G1 (GCC GGA TCC GGG CCT CAT GTC GAA), and PCR amplification was carried out using both 3G1 and the 5′ primer 5G1 (CCG AAG CTT AAT ACG ACT CAC TAT AGG GAG CTC AGA ATA AAC GCT CAA). The resulting PCR product was transcribed with T7 RNA polymerase (20). The RNA was gel-purified (19) and used in the subsequent round of RegA binding. This process was continued with decreasing amounts of RegA (to increase selection stringency) for 15 cycles. The round 15 PCR product was restricted with HindIII and BamHI and ligated into the corresponding sites of pUC18, and the resulting plasmids were used to transform DH5α as described (21). Clonal inserts were sequenced using standard methods.
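As a rough check on the library described above, the following sketch estimates how much of the 30-nucleotide sequence space is sampled by about 10^14 unique sequences, and how often one particular short motif would be expected to occur by chance. It assumes that each randomized position is A, C, G, or U with equal probability, which the text does not state; the function names are illustrative only.

```python
# Sketch only: assumes each randomized position is A, C, G, or U with equal probability.

def sequence_space(length: int) -> int:
    """Number of distinct RNA sequences of the given length."""
    return 4 ** length

def expected_motif_copies(unique_sequences: float, random_len: int, motif_len: int) -> float:
    """Expected number of library members carrying one specific motif somewhere
    in the randomized region (small-probability approximation)."""
    windows = random_len - motif_len + 1      # possible start positions of the motif
    return unique_sequences * windows / sequence_space(motif_len)

if __name__ == "__main__":
    unique = 1e14          # unique sequences, from the text
    random_len = 30        # length of the randomized region, from the text
    coverage = unique / sequence_space(random_len)
    print(f"possible 30-mers: {sequence_space(random_len):.3e}")
    print(f"fraction of 30-mer space sampled: {coverage:.2e}")
    # The 5' conserved domain draws 13 nucleotides from the randomized region.
    print(f"expected copies of a given 13-mer: "
          f"{expected_motif_copies(unique, random_len, 13):.2e}")
```

Under these assumptions the library samples only about one in 10^4 of all possible 30-mers, yet any particular 13-nucleotide motif would be expected in tens of millions of library members, so a short sequence motif is well within reach of the selection.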
RNA Ligand-RegA Binding Affinities-Dissociation constants for the interactions between RegA and various ligands were determined using an electrophoretic mobility shift assay (22). 50 pM RNA that had been 5′ end-labeled with 32P by T4 polynucleotide kinase was incubated with various concentrations of RegA (0.1 nM to 1 µM) in 10 µl of RegA buffer at 25°C for 5 min. The bound and unbound RNAs were separated by electrophoresis through a non-denaturing 8% polyacrylamide gel. For the quantitative Kd analysis shown in Table I, the relative amounts of bound and unbound RNA in each lane were quantified by scintillation counting of appropriate bands.
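The text does not describe how dissociation constants were extracted from the counted bands. One common approach, sketched here, fits the fraction of RNA bound to a single-site isotherm. The sketch assumes one binding site and an RNA concentration (50 pM) far below Kd, so that free protein is approximately total protein; the grid search is chosen for simplicity, and the titration data are synthetic rather than measured.

```python
def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Single-site isotherm, valid when [RNA] << Kd so free protein ~ total protein."""
    return protein_nM / (protein_nM + kd_nM)

def estimate_kd(protein_nM, observed, lo=1e-2, hi=1e4, steps=2001):
    """Least-squares Kd estimate (nM) by grid search over log-spaced candidates."""
    best_kd, best_sse = lo, float("inf")
    for i in range(steps):
        kd = lo * (hi / lo) ** (i / (steps - 1))
        sse = sum((fraction_bound(p, kd) - f) ** 2
                  for p, f in zip(protein_nM, observed))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd

if __name__ == "__main__":
    # Synthetic titration spanning 0.1 nM to 1 uM (the range used in the text),
    # generated from a hypothetical Kd of 5 nM; these are not measured data.
    concentrations = [0.1, 0.3, 1, 3, 10, 30, 100, 300, 1000]
    synthetic = [fraction_bound(p, 5.0) for p in concentrations]
    print(f"estimated Kd: {estimate_kd(concentrations, synthetic):.2f} nM")
```

With real data, the observed fractions would be bound counts divided by bound plus unbound counts for each lane.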
Boundary Analysis-The minimal 5′ sequence required for the binding of RegA was determined essentially as described (23). Five picomoles of partially hydrolyzed 5′ end-labeled RNA was incubated with 5 or 15 nM RegA in 500 µl of RegA buffer at 25°C for 5 min. The RegA-bound RNA fragments were separated from the unbound fragments by nitrocellulose filter binding. Bound RNAs were extracted as above and size-separated by polyacrylamide gel electrophoresis. Partial RNase T1 digests were performed as described (23).
Generation of RNAs with 3′ Disruptions-An oligonucleotide of sequence 5′-CCGGGCCTTTTGTCGAATT-3′ was used in PCR reactions along with the 5′ primer used in the selection and DNA preparations of the various clones to provide a template whose transcription product possessed a disrupted 3′-RegA-binding site. The RNAs were internally labeled by including [α-32P]UTP in the transcription reactions.

RESULTS
Translational repression of several early T4 genes results from the binding of the regA gene product to specific mRNAs near their respective AUG start codons, preventing the initiation of translation (9). Although the sequences and putative RegA-binding sites of several of these mRNAs are known, a shared primary or secondary structure is not obvious (11). SELEX was used to uncover the binding site specificity of RegA. Fifteen cycles of repressor binding, partitioning, and amplification of selected sequences reduced the dissociation constant of the RNA population for RegA from approximately 10 µM to 20 nM (data not shown). The round 15 population was cloned, and 24 of the clones were sequenced.
Within the 30-nucleotide variable region of the selected ligands are two highly conserved domains, one covering the 13 nucleotides at the 5′ end and the second covering the 4 nucleotides at the 3′ end (Fig. 1). The 5′ consensus sequence, AAUUGUUAUGUAA, possesses what we believe is the AUG start codon observed in all of the native mRNA targets of RegA. The 3′ consensus of AAAA is interesting when the 3′ fixed region of UUCGACAUG is taken into account. The resulting sequence of AAAAUUCGACAUG compares favorably with the 5′ consensus of AAAAUUGUUAUGUAA, where the first two As in the latter sequence are provided by the 5′ fixed region. The similarities between the two independent sites within the selected RNAs suggest the existence of two binding sites for RegA on each molecule.
Because the putative 5′-binding site is made up almost entirely of nucleotides that were selected from the variable region, the relative conservation of each of the positions is quite telling as to the nature of the RegA binding interaction. The AUG is absolutely conserved among the ligands, suggesting that it is the primary effector of binding affinity. The poly(A) tract is also highly conserved, with its apparent optimum position being five nucleotides upstream of the AUG. The UU immediately following the upstream poly(A) tract and the AA following the AUG are also present in >90% of the ligands. The remaining positions are less conserved, although the level of conservation of these sequences within the putative binding domain is still quite high (>75%).
Binding affinities between RegA and specific ligands were measured using a gel mobility shift assay. The two sequences that occurred in multiple clones were PCR-amplified from plasmid DNA, transcribed, and radiolabeled. These two ligands were incubated with a range of RegA concentrations, and the bound and unbound RNAs were separated by non-denaturing polyacrylamide gel electrophoresis. Protein binding, as witnessed by a mobility shift, was first observed in the low nM range for both ligands, with their dissociation constants occurring at approximately 5 nM (Fig. 2, A and C). At RegA concentrations of approximately 20 nM, a third, slower migrating band was apparent (Fig. 2, A and C). All of the label shifted to this region of the gel at the highest concentration of protein. A simple explanation for these results is that a single binding event occurs at lower RegA concentrations and that an additional protein binds to each ligand at higher concentrations yielding the second shift. This is consistent with the sequence analysis above that suggests that two RegA-binding sites exist on each of these RNAs.
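The two-shift interpretation can be illustrated with a simple model of two independent sites on one RNA. In this sketch the 5′ site is given the approximately 5 nM dissociation constant reported in the text, whereas the value for the 3′ site is a hypothetical placeholder, since no affinity is reported for it; independence (no cooperativity) between the sites is also an assumption.

```python
def site_occupancy(protein_nM: float, kd_nM: float) -> float:
    """Probability that a single site is occupied at the given free protein concentration."""
    return protein_nM / (protein_nM + kd_nM)

def species_fractions(protein_nM: float, kd_site1_nM: float, kd_site2_nM: float):
    """Fractions of RNA that are free, singly bound, or doubly bound,
    assuming two independent (non-cooperative) sites."""
    p1 = site_occupancy(protein_nM, kd_site1_nM)
    p2 = site_occupancy(protein_nM, kd_site2_nM)
    free = (1 - p1) * (1 - p2)
    double = p1 * p2
    single = 1 - free - double
    return free, single, double

if __name__ == "__main__":
    KD_5P_SITE = 5.0     # nM, approximate value from the text
    KD_3P_SITE = 50.0    # nM, hypothetical value chosen only for illustration
    for regA_nM in (1, 5, 20, 100, 1000):
        free, single, double = species_fractions(regA_nM, KD_5P_SITE, KD_3P_SITE)
        print(f"{regA_nM:>5} nM RegA: free={free:.2f} single={single:.2f} double={double:.2f}")
```

In such a model the singly bound species dominates at intermediate protein concentrations and the doubly bound species takes over at high concentrations, which is the qualitative pattern of one shift followed by a second, slower-migrating band.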
The AUG of the putative second binding site was altered to UUU, and the 10 3′-most nucleotides were removed to test whether a single binding event could be observed. The modified ligands produced only a single shift (Fig. 2, B and D) and displayed affinities for RegA that were the same as the full-length molecules from which they were derived. These findings are consistent with there being two independent RegA-binding sites associated with each of the two ligands. The 5′-binding sites of the two RNAs are apparently the higher affinity sites, with dissociation constants for RegA of approximately 5 nM.
Although the sequence data suggest a relative size of the RegA-binding site based on the region of greatest homology, a physical measure of the minimal domain required for high affinity binding to the protein was acquired using the boundary assay (23). Each of the four RNAs described in the binding experiments above were 5′ end-labeled and subjected to partial alkaline hydrolysis. The resulting RNA fragments were bound to RegA and passed through nitrocellulose filters. The fragments that still possessed the protein-binding site were preferentially retained on the filters. The recovered RNA fragments were characterized via polyacrylamide gel electrophoresis and autoradiography (Fig. 3).
FIG. 1. Consensus sequence from RegA SELEX. The two smaller As at the 5′ end represent the last two nucleotides at the 3′ end of the 5′ fixed region. Our proposed RegA-binding site is in bold. A total of 25 isolates were sequenced. Below each position is the frequency at which that position varies and the base substitution. The two As from the 5′ fixed region and the AUG are invariable.
The two full-length RNAs yielded two distinctive boundaries, the more efficiently retained occurring up to the AUG of the putative 3′-RegA-binding site and a second boundary occurring at the 3′ end of the UAA of the putative 5′-binding site. These data are consistent with there being two independent binding sites. The presence of two binding sites could either increase the relative affinity of the RNA for protein or increase the efficiency of filter retention by increasing the amount of protein bound per RNA, thus enhancing the percentage of fragments being retained during partitioning. There is binding after the 3′ site is disrupted so long as the 5′ site is maintained, but once the fragments lose nucleotides at the 3′ end of the proposed 5′-binding site, filter retention is lost altogether. The RNAs with the 3′ site disruptions displayed only a single boundary at the 3′ end of the predicted 5′-binding site (Fig. 3).
Partial nuclease digests of the full-length and 3′ site-disrupted versions of ligand A indicated that most of the nucleotides were sensitive to the single-strand-specific RNases (Fig. 4). The addition of two concentrations of RegA to the various RNA/RNase reactions resulted in the protection of specific bases from nuclease attack. Complete protection of the bases between positions A22 [...] between positions G44 and U59 in the full-length version of ligand A are completely protected from nuclease attack by both concentrations of repressor (Fig. 4A). This second RegA footprint covers the putative 3′-binding site of the RNA. Except for positions U48 and U58, which are partially protected by 500 nM RegA, the corresponding bases in the 3′ site-disrupted version of ligand A are accessible to the nucleases in the presence or absence of RegA (Fig. 4B). The loss of the footprint at the 3′ end of the second ligand is once again consistent with the assertion that two RegA binding events occur on the full-length ligands and that binding at the 3′ site is dependent on there being an AUG.
FIG. 3. Boundary analysis of two ligands. T1 indicates that the RNAs were subjected to partial digestion with RNase T1. Alk denotes that the RNAs were partially alkaline-hydrolyzed. The RNAs in the remaining lanes (Bndry) were alkaline-hydrolyzed and bound to RegA (5 or 15 nM) before nitrocellulose filtration.
The apparent lack of RNA secondary structure observed in the nuclease digestions was investigated further using several single-strand-specific base-modifying reagents. The nucleotides making up the putative RegA-binding sites for ligands A and B were completely sensitive to the reagents (Figs. 5A and 4B). In fact, very few of the bases in the entire RNAs escaped modification, indicating that the ligands were devoid of structures that rely on Watson-Crick base pairing. These data suggest that RegA interacts with its RNA-binding sites in a structure-independent manner, a property that is unlike the well characterized RNA-binding proteins.
Because none of the mRNAs affected by RegA in vivo possess the SELEX-generated consensus sequence, it was of interest to understand the relative effect that alterations from the consensus have on RegA binding affinity. Using RNAs whose 3′-binding site had been disrupted as above, the consensus sequence and several variants were tested for RegA binding. As seen in Table I, the UUGUU region 5′ to the AUG and the UAA 3′ to the AUG can undergo single base changes with little effect on binding. Thus the binding data are consistent with the sequence data that indicated that these two domains are important, but not essential, for RegA binding. Mutations in the AUG probably have a much greater effect as witnessed by the apparent loss of RegA binding caused by mutating the AUG of the 3′-binding site to UUU (Fig. 2). In contrast to the slight effect that single base changes have on RegA binding, multiple changes cause more significant decreases in affinity (Table I). This suggests that multiple mutations in this conserved region may have an additive effect on the interaction, which can have a profound consequence on the binding affinity of the site.
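One way to read "additive" here is in terms of binding free energy: if each substitution contributes an independent penalty, the penalties add and the fold changes in Kd multiply. The sketch below illustrates that arithmetic at 25°C, the binding temperature given under "Experimental Procedures". The individual fold changes are hypothetical and are not taken from Table I; strict additivity is an assumption, not a result reported in the text.

```python
import math

R_KCAL = 1.987e-3      # gas constant, kcal/(mol*K)
T_KELVIN = 298.15      # 25 C, the binding temperature given in the text

def ddg_from_fold(fold_change_in_kd: float) -> float:
    """Free-energy penalty (kcal/mol) corresponding to a fold increase in Kd."""
    return R_KCAL * T_KELVIN * math.log(fold_change_in_kd)

def combined_fold(fold_changes) -> float:
    """Fold increase in Kd if the individual penalties are strictly additive."""
    total_ddg = sum(ddg_from_fold(f) for f in fold_changes)
    return math.exp(total_ddg / (R_KCAL * T_KELVIN))

if __name__ == "__main__":
    hypothetical_singles = [2.0, 3.0, 2.0]   # illustrative values, not from Table I
    for fold in hypothetical_singles:
        print(f"{fold:.0f}-fold weaker binding = {ddg_from_fold(fold):.2f} kcal/mol")
    print(f"combined: {combined_fold(hypothetical_singles):.0f}-fold weaker binding")
```

In this picture, three substitutions that each weaken binding only two- to threefold would together weaken it about twelvefold, which is how individually minor changes could accumulate into a large effect.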

DISCUSSION
The T4-encoded RegA is one of the few known proteins that regulate the translation of multiple transcripts (26). The mechanism by which the protein discriminates between messages has remained a mystery, as the makeup of the start sites of the affected mRNAs is not statistically unlike that of those that are unaffected (11). Using the SELEX protocol, a set of RNAs was generated that possessed high affinity for RegA. Characterization of these ligands indicates that the RegA consensus binding site is AAAAUUGUUAUGUAA. Stable secondary structures that rely on canonical base pairs do not exist for the consensus, suggesting that RegA discriminates between RNAs based on primary sequence rather than secondary structure. The absence of Watson-Crick base pairing in the RNA binding domain of the SELEX ligands is supported by chemical modification and nuclease digestion results. A lack of secondary structure potential has likewise been observed in the in vivo binding sites for the protein (27). The consensus sequence suggests that the relative site of interaction between RegA and the affected mRNAs includes the first two codons of the message plus the nine nucleotides immediately upstream.
The sequences of the T4 mRNAs that are known to be regulated by RegA were compared with the SELEX-generated consensus using the AUG start codon for alignment. In addition to the AUG, most of the RNAs possess a UU 5′ to the translational start site and a poly(A) tract two to eight nucleotides upstream (Fig. 6). Although these features are not at fixed distances from the AUG, their presence is suggestive of similar interactions between substrate and repressor. Interestingly, the mRNAs display varying degrees of similarity to the consensus binding site, with the regA gene itself being the least similar. If the binding affinity of RegA for the mRNAs correlates with their similarity to the consensus, then the translation of the various genes would be halted successively until the regA gene was repressed. All mRNAs that were less similar to the consensus than the regA gene would not be repressed, as the concentration of RegA would be held below a threshold level that was a function of the binding affinity of the RegA-binding site of the regA gene.
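The proposed repression pattern can be expressed as a simple ordering rule. The sketch below ranks messages by dissociation constant and treats the regA message as the ceiling: anything that binds more weakly than regA is never repressed. The message names other than regA and all Kd values are hypothetical placeholders, since affinities for the in vivo targets are not given here; the hard cutoff at the regA affinity is a simplification of the threshold argument in the text.

```python
def repression_order(kd_by_message: dict, autoregulated: str = "regA"):
    """Return (repressed, escaped): messages in the order they would be repressed
    as RegA accumulates, and messages that bind too weakly to be repressed."""
    ceiling = kd_by_message[autoregulated]
    ranked = sorted(kd_by_message.items(), key=lambda item: item[1])
    repressed = [name for name, kd in ranked if kd <= ceiling]
    escaped = [name for name, kd in ranked if kd > ceiling]
    return repressed, escaped

if __name__ == "__main__":
    # Hypothetical dissociation constants in nM; only the ordering matters.
    hypothetical_kd = {"msg1": 10, "msg2": 40, "msg3": 150, "regA": 300, "msg4": 2000}
    repressed, escaped = repression_order(hypothetical_kd)
    print("repressed, in order:", repressed)
    print("never repressed:", escaped)
```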
The SELEX-generated consensus sequence has a stop codon following the AUG start in the RegA-binding site; thus, the highest affinity site for RegA could not exist within an mRNA that encodes a polypeptide. Why would a translational repressor be selected with such an RNA-binding site? A possible explanation could be that mRNA binding is actually a secondary function and that RegA is optimized to bind a cellular RNA. The T4 genome was searched for sequences that match the consensus. No exact matches of the consensus RegA-binding site were found, but several sites with properly aligned sequences (upstream poly(A), U/G tract, AUG, and UAA) were uncovered. One of the most similar sequences was AAAAAUAUUAUGUAA, which is located within the dam gene (28). This region of the T4 genome has been heavily studied because it supports T4-dependent replication. Current opinion holds that the region is transcribed by the host RNA polymerase, giving rise to an RNA that primes T4 replication. If a transcript is produced from this region of the T4 genome, then the high affinity site for RegA actually exists in vivo during a time that RegA is produced (29, 30). It is postulated that RNA-protein and protein-protein interactions involving RegA localize the various components of the T4 replisome to origins of replication. The presence of a potentially high affinity RegA-binding site within a T4 genomic domain associated with a replication origin is intriguing, as it provides a possible mechanism for localizing replisome assembly within the cell to a region where replication is initiated.
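The search procedure is not described in detail, so the following is only a minimal sketch of how such a scan might be done: slide a window the length of the consensus along a sequence, require the invariant AUG at its aligned position, and count matching positions. The match threshold and the flanking bases in the example are arbitrary assumptions; only the consensus and the dam-gene site are taken from the text.

```python
CONSENSUS = "AAAAUUGUUAUGUAA"          # SELEX consensus from the text
AUG_OFFSET = CONSENSUS.index("AUG")    # position of the invariant AUG in the consensus

def scan_for_sites(rna: str, min_matches: int = 12):
    """Yield (start, window, matches) for windows that keep the invariant AUG in
    register and match the consensus at min_matches or more positions."""
    rna = rna.upper().replace("T", "U")        # accept DNA or RNA input
    width = len(CONSENSUS)
    for start in range(len(rna) - width + 1):
        window = rna[start:start + width]
        if window[AUG_OFFSET:AUG_OFFSET + 3] != "AUG":
            continue
        matches = sum(a == b for a, b in zip(window, CONSENSUS))
        if matches >= min_matches:
            yield start, window, matches

if __name__ == "__main__":
    # The dam-gene site quoted in the text, with arbitrary flanking bases added.
    toy_sequence = "GGCC" + "AAAAAUAUUAUGUAA" + "CCGG"
    for start, window, matches in scan_for_sites(toy_sequence):
        print(f"position {start}: {window} ({matches}/{len(CONSENSUS)} matches)")
```

Applied to the dam-gene site, the scan reports 13 of 15 positions matching the consensus, with the two differences falling in the region between the poly(A) tract and the AUG.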