Highly Specific Recognition of Primer RNA Structures for 2′-OH Priming Reaction by Bacterial Reverse Transcriptases*

A minor population of Escherichia coli contains retro-elements called retrons, which encode reverse transcriptases (RTs) that synthesize peculiar satellite DNAs called multicopy single-stranded DNA (msDNA). These RTs recognize specific RNA structures in their individual primer-template RNAs and initiate cDNA synthesis from the 2′-OH group of a specific internal G residue (the branching G residue). The resulting products (msDNAs) consist of RNA linked to single-stranded DNA, and different msDNAs share hardly any sequence homology. Here, we investigated how RT-Ec86 recognizes the specific RNA structure in its primer-template RNA. On the basis of structural comparison with HIV-1 RT, domain exchanges were carried out between two E. coli RTs, RT-Ec86 and RT-Ec73. RT-Ec86 (320 residues) and RT-Ec73 (316 residues) share only 71 identical residues (22%). From the analysis of 10 such constructs, the C-terminal 91-residue sequence of RT-Ec86 was found to be essential for recognition of the unique stem-loop structure and the branching G residue in the primer-template RNA of retron-Ec86. Using the SELEX (systematic evolution of ligands by exponential enrichment) method with RT-Ec86 and primer RNAs containing random sequences, a stem-loop structure identical to that found in the retron-Ec86 primer-template RNA, including its 3-U loop, was enriched. In addition, the highly conserved 4-base sequence UAGC, which includes the branching G residue, was also enriched. These results indicate that the highly diverse C-terminal region recognizes specific stem-loop structures and the branching G residue located upstream of the stem-loop structure. The present results with these seemingly primitive RNA-dependent DNA polymerases provide insight into the mechanisms of specific protein-RNA recognition.

Reverse transcriptase (RT) encoded by a retron is essential for msDNA synthesis. A single RNA transcript (the primer-template RNA) from a retron serves as both primer and template for msDNA synthesis: the 5′-end and 3′-end regions of the transcript form a stable duplex that places a specific G residue at the end of the duplex. The 2′-OH group of this G residue is used by the retron RT to prime cDNA synthesis, with the same RNA strand serving as template. The template RNA used for cDNA synthesis is removed by ribonuclease H, and cDNA synthesis stops at a specific site on the template for each msDNA. As a result, the msDNAs identified so far share the following characteristics: the 5′-end of a single-stranded DNA of 48 to 163 bases is linked to the 2′-OH group of an internal G residue (termed the branching G residue) of a single-stranded RNA of 50 to 120 bases (8); both the DNA and RNA molecules contain stable secondary structures (10); and the 3′-ends of the DNA and RNA molecules are complementary to each other, forming a DNA-RNA heteroduplex (1) (see Refs. 13 and 14 for reviews).
Notably, msDNA synthesis is highly specific to the individual retron. RTs are not exchangeable between two retrons unless the msr region is also replaced with that of the retron encoding the RT (15). The msr region encodes the short single-stranded RNA portion of msDNA. This RNA forms one or two stable stem-loop structures; the RNA structures in msDNA-Ec73 and msDNA-Ec86 are shown in Fig. 1, A and B, respectively. As can be seen, there is virtually no primary sequence homology between the two RNA molecules. These unique RNA structures are therefore thought to determine the specificity of the priming reaction for each msDNA.
In the present paper, we have attempted to determine how RT specifically recognizes a primer-template RNA to prime cDNA synthesis from the 2′-OH group of the branching G residue. In particular, using two RTs from the E. coli retrons Ec73 (17) and Ec86 (4), we identified the regions that are essential for recognition of their cognate primer-template structures. These two RTs, RT-Ec73 and RT-Ec86, consist of 316 and 320 amino acid residues, respectively, and share only 71 identical residues (22% of RT-Ec73). Results from the domain analysis for RNA recognition indicate that these presumably primitive DNA polymerases from bacterial retrons have a highly diversified C-terminal region of approximately 90 residues, which plays an essential role in recognizing a unique stem-loop structure and its upstream sequence containing the branching G residue. It appears that the C-terminal domains of individual bacterial RTs have evolved to recognize highly specific RNA structures for the 2′-OH priming reaction, while the RTs retain the structural features of the DNA polymerase shared by all RTs from prokaryotes to eukaryotes. The present results also raise interesting questions about the molecular mechanism of the priming reaction of bacterial RTs and about RNA-protein interactions in general.

EXPERIMENTAL PROCEDURES
Materials-[α-32P]dCTP, [α-32P]CTP, and [α-35S]dATP were purchased from Amersham Pharmacia Biotech. Taq polymerase and T4 ligase were purchased from Roche Molecular Biochemicals. Restriction enzymes were from Roche Molecular Biochemicals and New England Biolabs. DNA sequencing was performed using a Sequenase kit from U. S. Biochemical Corp. The TA vector kit was purchased from Invitrogen.
Bacterial Strains-E. coli JM83 (19) was used for propagation of plasmids, and E. coli LE392 (DE3), a K-12 strain carrying the gene for T7 RNA polymerase (16), was used for expression and purification of the various RT constructs.
Construction of Various Hybrid RTs-To exchange C-terminal regions between RT-Ec73 and RT-Ec86, an MspI site was created in their VTGL-coding region by changing the ACA codon for Thr-244 to ACC by site-directed mutagenesis (17). After the DNA sequence was confirmed, the 240-base pair MspI-BamHI fragments were isolated from pET11RT73 and pET11RT86 and exchanged between them. Other hybrid RTs were constructed by two-step PCR with appropriate primers as described previously (16). All RT genes were cloned in pET11(km) under the control of a T7 promoter, and all except RT-Ec73 were His6-tagged at their C-terminal ends. pET73RT(His), in which His6 was added at the N-terminal end of RT-Ec73, was constructed previously (16).
Purification of Various Hybrid RTs-RTs were purified by the method described previously (16) with some modifications. Cells were grown at 37°C to 90 Klett units in M9 medium supplemented with 0.2% Casamino acids (Difco), 0.4% glucose, 20 µg/ml tryptophan, 2 µg/ml thiamine, 0.8 mM MgSO4, and 50 µg/ml kanamycin. Cells were then harvested by centrifugation at room temperature and suspended in the same volume of L broth (18). After a 15-min incubation at room temperature, isopropyl-β-D-thiogalactopyranoside was added to a final concentration of 1 mM, and the culture was incubated for 1 h at room temperature. Cells were harvested and disrupted with a French press, and the membrane and soluble fractions were separated by centrifugation (100,000 × g for 30 min). RTs were isolated from the soluble fractions only, by affinity chromatography on Ni-nitrilotriacetic acid (NTA) resin (Qiagen). After elution with 100 mM imidazole, RTs were dialyzed against dialysis buffer (16) and stored in storage buffer (50 mM Tris-HCl, pH 7.5, 5 mM β-mercaptoethanol, 10% glycerol, 0.1% Nonidet P-40, 0.2 M NaCl) at -80°C.
Preparation of RNA by T7 RNA Polymerase-DNA fragments corresponding to Ec73msr-msd30, Ec86msr-msd48, Ec86msr, Ec86msrΔa, and Ec86msrΔb were amplified by one-step or two-step PCR as described previously (16), using the primers listed in Table I, and cloned into pUC9 (19) or pCR2.1 (Invitrogen). After the DNA sequences were confirmed, the fragments were excised with EcoRI and isolated by polyacrylamide gel electrophoresis. Because the amplified fragments contained a T7 promoter sequence, the EcoRI fragments were used directly for in vitro transcription by T7 RNA polymerase, performed by the method described previously (16) using [α-32P]CTP. Fifty ng of the purified EcoRI fragment were mixed with transcription buffer (40 mM Tris-HCl, pH 7.5, 6 mM MgCl2, 2 mM spermidine, 10 mM NaCl, 10 mM dithiothreitol, 0.5 mM each of ATP, GTP, and UTP, and [α-32P]CTP), 40 units of RNase inhibitor, and 40 units of T7 RNA polymerase (both from Roche Molecular Biochemicals) in a total volume of 100 µl. The reaction mixture was incubated at 37°C for 15 min, and then 0.5 mM CTP was added. After a further 1-h incubation, the mixture was treated with 30 units of RNase-free DNase for 15 min at 37°C and extracted with phenol and chloroform. A 1/10 volume of 3 M sodium acetate (pH 5.1) and 3 volumes of ethanol were added, and the mixture was placed at -70°C for 30 min and then centrifuged (13,500 × g, 10 min) to remove unincorporated nucleotides. The precipitated RNA was redissolved in 100 µl of 0.3 M NaOAc and divided into 10 portions. Three volumes of ethanol were added to each portion, and the tubes were stored at -70°C until needed. Immediately before the binding assay, the RNA was collected by centrifugation.
Binding Assay-The prepared RNA was dissolved at 1 pmol/µl in annealing buffer (50 mM Tris-HCl, pH 8.0, 10 mM MgCl2). To allow formation of secondary structures, the solution was incubated at 95°C for 2 min, 37°C for 30 min, and 4°C for 30 min. RT was preincubated for 10 min at room temperature in binding buffer (final concentrations: 10% glycerol, 10 mM Tris-HCl, pH 7.8, 2 mM dithiothreitol, 25 mM MgCl2, 1 mg/ml tRNA). Then 1 µl of annealed RNA, 1 µl of 10 mM ATP, and H2O to a final volume of 20 µl were added, and the reaction mixture was incubated for 30 min at room temperature. A typical reaction mixture contained 1 pmol of RNA and 4 pmol of RT in 20 µl. The reaction was stopped by adding 5 µl of 5× dye mixture (25% glycerol, 0.1 M EDTA, 0.025% bromphenol blue, and 0.025% xylene cyanol), and the sample was analyzed by electrophoresis on an 8% polyacrylamide gel at 4°C.
SELEX Method-The two randomized oligomers (oligo 8081 for RNA I and oligo 7498 for RNA II), the 5′-end oligo 8082, and the 3′-end oligo 6817 used for the SELEX method are listed in Table II. First, a double-stranded DNA was synthesized with the Klenow enzyme using oligo 7498 as template and oligo 6817 as primer. Two hundred pmol of oligo 7498 and 600 pmol of oligo 6817 were annealed in 50 µl of a buffer containing 10 mM Tris-HCl (pH 7.5), 10 mM MgCl2, 6 mM NaCl, and 6 mM β-mercaptoethanol, and then 1 µl each of a 2 mM dNTP mixture and the Klenow enzyme were added. After incubation at room temperature for 30 min, the mixture was applied to an 8% polyacrylamide gel, and the 75-base pair band was excised. DNA fragments were eluted and precipitated. Fifty to 100 ng of this double-stranded DNA fragment (derived from oligo 7498) were transcribed by T7 RNA polymerase with ATP, UTP, CTP, and GTP, each at 2.5 mM.
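The enrichment principle behind SELEX, repeated rounds of binding-based partitioning followed by amplification, can be illustrated with a deliberately idealized model. The sketch below is not the protocol used in this work: it assumes a hypothetical two-species pool with hypothetical dissociation constants, protein in excess over each RNA species, and perfect, unbiased amplification, and it retains each species in proportion to its fractional occupancy P/(P + Kd). The function names are hypothetical.

```python
from typing import Dict


def selex_round(pool: Dict[str, float], kd_nm: Dict[str, float],
                protein_nm: float) -> Dict[str, float]:
    """One idealized selection-plus-amplification cycle.

    Each species is retained in proportion to its fractional occupancy
    P / (P + Kd), assuming protein is in excess (no depletion). The
    retained material is then 'amplified' back to a normalized pool.
    """
    retained = {
        seq: frac * protein_nm / (protein_nm + kd_nm[seq])
        for seq, frac in pool.items()
    }
    total = sum(retained.values())
    return {seq: amount / total for seq, amount in retained.items()}


def run_selex(pool: Dict[str, float], kd_nm: Dict[str, float],
              protein_nm: float, cycles: int) -> Dict[str, float]:
    """Apply selex_round repeatedly and return the final pool composition."""
    for _ in range(cycles):
        pool = selex_round(pool, kd_nm, protein_nm)
    return pool


# Hypothetical pool: a rare tight binder among weak binders (illustrative values only).
start = {"binder": 0.001, "nonbinder": 0.999}
kds = {"binder": 10.0, "nonbinder": 10_000.0}
after_one = run_selex(start, kds, protein_nm=200.0, cycles=1)
after_six = run_selex(start, kds, protein_nm=200.0, cycles=6)
print(round(after_one["binder"], 3), round(after_six["binder"], 6))
```

In this toy model the per-cycle enrichment factor is (P + Kd_weak)/(P + Kd_tight), roughly 49 here, so a binder starting at 0.1% of the pool dominates within a few cycles. Real selections also suffer from background binding, partitioning losses, and amplification bias, none of which is modeled.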

RESULTS
Bacterial RTs Recognize Only Their Cognate Primer-Template RNAs-A minor population of wild E. coli strains contains highly diverse retrons, each producing its own specific msDNA with little homology to the others. To date, seven E. coli retrons have been identified: retron-Ec67 (2), retron-Ec86 (4), retron-Ec73 (7), retron-Ec107 (1), retron-Ec83 (3), retron-Ec78 (5), and retron-Ec48 (6). The RTs encoded by these retrons are highly diverse; the identity between any two E. coli RTs is less than 30%, and there is virtually no primary sequence homology between any two primer-template RNA molecules. The only features common to all the primer-template RNAs are the branching G residue, located at the end of a stem formed in the primer-template RNA (see Fig. 1), and a highly conserved 4-base sequence, UAGC, in which the G residue corresponds to the branching G residue. Downstream of the G residue there are always stable secondary structures, which have been shown to be essential for recognition of the cognate primer-template RNA by its RT (15).
To investigate the mechanisms underlying the highly selective recognition of specific RNA structures by individual bacterial RTs, we first constructed T7 systems to produce two RNA transcripts, Ec73msr-msd30 (Fig. 1A) and Ec86msr-msd48 (Fig. 1B). These RNAs are truncated versions of the primer-template RNAs for RT-Ec73 and RT-Ec86, respectively. The former was produced from a mutated retron-Ec73 carrying a 43-base pair deletion in the msd region (the cDNA template region of msDNA-Ec73). This mutated retron-Ec73 still efficiently produced an msDNA in vivo, called msDNA-miniEc73, containing a single-stranded DNA of only 30 bases (21). Similarly, the latter RNA was produced from a mutated retron-Ec86 carrying a 38-base pair deletion in the msd region; this mutated retron produced msDNA-miniEc86, containing a 48-base single-stranded DNA (not shown). These mini-primer-template RNAs, designated Ec73msr-msd30 (114 bases) and Ec86msr-msd48 (133 bases), were used in the present experiments because they were more suitable than the full-length primer-template molecules for cell-free msDNA synthesis and gel mobility shift experiments.
First, purified RT-Ec73 and RT-Ec86 were tested for specific binding to the mini-primer-template RNAs by gel retardation analysis. As shown in Fig. 1C, only RT-Ec73 bound to mini-Ec73 RNA (lane 2); RT-Ec86 did not (lane 5). Similarly, when mini-Ec86 RNA was used, RT-Ec86 bound to the RNA (Fig. 1D, lane 5), but RT-Ec73 did not (lane 2).
Identification of Domains for Specific RNA Recognition-The results presented above demonstrate that bacterial RTs recognize only their cognate primer-template RNAs. To identify the domains of the bacterial RTs responsible for this specific RNA recognition, we next aligned RT-Ec73 and RT-Ec86 with HIV-1 RT.
Bacterial RTs and eukaryotic RTs have been shown to be evolutionarily related, and their phylogenetic interrelationships have also been proposed (22, 23). On the basis of the three-dimensional structure of HIV-1 RT and sequence similarities between bacterial RTs and HIV-1 RT (24, 25), domain assignments were proposed for RT-Ec73 and RT-Ec86 (26). Fig. 2A shows a modified version of the previous alignment, in which the 15 β-strands (β0 to β14) and 10 α-helices (αA to αJ) of HIV-1 RT are aligned with the bacterial RTs. In these alignments, 22 residues are highly conserved among the three RTs, including three Asp residues (Asp-110, Asp-185, and Asp-186 in HIV-1 RT; marked by solid circles above the sequences in Fig. 2A). These Asp residues, the only residues invariant among all known RTs, form the catalytic triad essential for DNA polymerase activity (25). RT-Ec73 and RT-Ec86 share 71 identical residues (22% identity) and 46 similar residues.
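The identity figure depends on its denominator: 71 identical residues correspond to about 22.5% of RT-Ec73 (316 residues) and about 22.2% of RT-Ec86 (320 residues), both of which round to 22%. The helper below is a hypothetical sketch, not a tool used in this work; it counts identical, ungapped positions in a pre-computed pairwise alignment and reports identity relative to a chosen sequence. The toy alignment is illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str, relative_to: str = "a") -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Identical positions are counted only where neither sequence has a gap;
    the denominator is the ungapped length of sequence a or b.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    ref = aligned_a if relative_to == "a" else aligned_b
    ungapped_length = sum(1 for c in ref if c != "-")
    return 100.0 * identical / ungapped_length


def percent_from_counts(identical: int, length: int) -> float:
    """Percent identity from a count of identical residues and a sequence length."""
    return 100.0 * identical / length


print(round(percent_from_counts(71, 316), 1))  # relative to RT-Ec73 -> 22.5
print(round(percent_from_counts(71, 320), 1))  # relative to RT-Ec86 -> 22.2
print(percent_identity("MKV-LT", "MRVALT"))    # toy alignment, relative to a -> 80.0
```

Reporting the denominator explicitly, as the text does with "22% of RT-Ec73", avoids ambiguity when the two proteins differ in length.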
In the structural assignments shown in Fig. 2A, the C-terminal regions, consisting of αG, αH, αI, and αJ, correspond to the thumb domain of HIV-1 RT (25). These regions are especially diverse among bacterial RTs, suggesting that they may be involved in recognizing the unique primer-template RNA structure of each RT. The alignments also reveal two regions, X and Y (boxed in Fig. 2A), that are characteristic of bacterial RTs. Note that the Y sequence contains the highly conserved sequence VTGL (marked by open circles in Fig. 2A), which is found in all bacterial RTs known today (26).
To identify the RT regions required for specific recognition of each primer-template RNA, a number of hybrid proteins between RT-Ec73 and RT-Ec86 were constructed as shown in Fig. 2B. First, we exchanged the C-terminal regions of RT-Ec73 and RT-Ec86 at the highly conserved VTGL sequence, using an MspI site created within the VTGL-coding region. In RT-Ec73/86, the C-terminal fragment of RT-Ec73 downstream of the VTGL sequence (residues 238-241) was replaced with the corresponding fragment from RT-Ec86 (Fig. 2B, construct a). Similarly, in RT-Ec86/73, the C-terminal fragment of RT-Ec86 downstream of the VTGL sequence (residues 243-246) was replaced with the corresponding fragment from RT-Ec73 (Fig. 2B, construct b). The resulting hybrid RTs, RT-Ec73/86 and RT-Ec86/73, were tested for binding to the mini-primer-template RNAs (Fig. 1, A and B); the gel retardation assays are shown in Fig. 1, C and D. With mini-Ec73 RNA, RT-Ec86/73 but not RT-Ec73/86 bound to the RNA (Fig. 1C, lanes 4 and 3, respectively). Conversely, with mini-Ec86 RNA, RT-Ec73/86 but not RT-Ec86/73 bound to the RNA (Fig. 1D, lanes 3 and 4, respectively). These results indicate that the Y sequence and/or the C-terminal fragment following it plays a crucial role in specific RNA recognition by the RTs. In addition to these sequences, however, the N-terminal fragment upstream of the Y sequence is required for msDNA synthesis in vivo and also seems to participate in specific recognition of the RNA structure, as discussed below.
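At the protein level, the C-terminal exchange at the conserved VTGL sequence amounts to joining the N-terminal part of one RT to the C-terminal part of the other. The sketch below models that splice on amino acid strings. It is illustrative only: the actual constructs were made at the DNA level through the engineered MspI site, the function name is hypothetical, and the toy sequences are placeholders rather than the real RT-Ec73 and RT-Ec86 sequences. Because VTGL is identical in both proteins, the exact position of the junction within the motif does not change the hybrid protein.

```python
def swap_at_motif(n_terminal_donor: str, c_terminal_donor: str, motif: str = "VTGL") -> str:
    """Return a hybrid: n_terminal_donor up to the motif, then c_terminal_donor
    from the motif onward. The first occurrence of the motif is used."""
    i = n_terminal_donor.find(motif)
    j = c_terminal_donor.find(motif)
    if i < 0 or j < 0:
        raise ValueError(f"motif {motif!r} not found in both sequences")
    return n_terminal_donor[:i] + c_terminal_donor[j:]


# Hypothetical toy sequences standing in for RT-Ec73 and RT-Ec86.
toy_ec73 = "MAAAAVTGLKKKK"
toy_ec86 = "MSSSSSVTGLRRRRR"
hybrid_73_86 = swap_at_motif(toy_ec73, toy_ec86)  # analogous to RT-Ec73/86
hybrid_86_73 = swap_at_motif(toy_ec86, toy_ec73)  # analogous to RT-Ec86/73
print(hybrid_73_86, hybrid_86_73)
```

The same splice, applied at different junctions (for example around the X or Y sequence), describes the other hybrids in Fig. 2B.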
Specificity of Recognition of Stem-Loop Structures-We next attempted to dissect its primer-template RNA using RT-Ec86 and its primer-template RNA (Ec86msr-msd48 RNA; see Fig. 3D, lane 2) as the original primer-template RNA (Ec86msr-msd48 RNA; see Fig. 1D, lane 5). Next, stem-loop structures a and b were separately deleted from Ec86msr RNA to yield Ec86msrΔa RNA (deletion of the entire 25-base stem-loop structure a) and Ec86msrΔb RNA (deletion of the entire 17-base stem-loop structure b) (see Fig. 3, B and C, respectively). These RNAs were used for gel retardation assays with RT-Ec86. As shown in Fig. 3E, Ec86msrΔa RNA (1 pmol) was completely retarded by RT-Ec86 (4 pmol) (lane 3) under the same conditions used for Ec86msr RNA in Fig. 3D. In contrast, RT-Ec86 hardly bound to Ec86msrΔb RNA (Fig. 3F); even when the RNA amount was increased to 8 pmol (lane 6), only a faint retarded band was observed. Bovine serum albumin (BSA) did not retard Ec86msrΔa RNA at all (Fig. 3E, lane 2). Retardation of Ec86msrΔa RNA by RT-Ec86 was completely inhibited by a 10- or 30-fold excess of nonradioactive Ec86msr-msd48 RNA (Fig. 3E, lanes 4 and 5, respectively) but not by Ec73msr-msd30 RNA (lanes 6 and 7). These results demonstrate that stem-loop structure b is specifically recognized by RT-Ec86, whereas stem-loop structure a contributes little to recognition. Preliminary experiments indicated that the Kd values of RT-Ec86 and RT-Ec73/86 for Ec86msr RNA were on the order of 10⁻⁸ M, as measured by gel retardation assay (not shown).
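To relate the assay conditions to the preliminary Kd, one can estimate the expected fraction of RNA bound. The sketch below assumes simple 1:1 binding at equilibrium, fully active RT, and no effect of the tRNA competitor in the binding buffer, none of which was established in this work; Kd = 10 nM is used only as the order of magnitude reported above, and the function names are hypothetical. A typical reaction (1 pmol of RNA and 4 pmol of RT in 20 µl) corresponds to 50 nM RNA and 200 nM RT, and the complex concentration is taken from the exact solution of the quadratic binding equation rather than the ligand-excess approximation.

```python
import math


def pmol_per_ul_to_nm(pmol: float, volume_ul: float) -> float:
    """Convert an amount (pmol) in a volume (µl) to a concentration in nM.

    1 pmol/µl equals 1 µM, i.e. 1000 nM.
    """
    return pmol * 1000.0 / volume_ul


def fraction_rna_bound(rna_nm: float, protein_nm: float, kd_nm: float) -> float:
    """Fraction of RNA in a 1:1 complex at equilibrium.

    Solves [C]^2 - (R + P + Kd)[C] + R*P = 0 and takes the physically
    meaningful (smaller) root, so protein depletion is accounted for.
    """
    s = rna_nm + protein_nm + kd_nm
    complex_nm = (s - math.sqrt(s * s - 4.0 * rna_nm * protein_nm)) / 2.0
    return complex_nm / rna_nm


rna = pmol_per_ul_to_nm(1, 20)  # 50.0 nM
rt = pmol_per_ul_to_nm(4, 20)   # 200.0 nM
print(round(fraction_rna_bound(rna, rt, kd_nm=10.0), 2))   # 0.94
print(round(fraction_rna_bound(rna, rt, kd_nm=100.0), 2))  # 0.63
```

Under these assumptions, a 4-fold molar excess of RT would shift most of the cognate RNA when Kd is near 10 nM, while a 10-fold weaker affinity would leave over a third of the RNA unbound.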
Roles of X and Y Sequences in RNA Recognition-As shown above, the Y sequence and/or the C-terminal fragment downstream of it is responsible for specific RNA recognition (Fig. 1). To further characterize the roles of the Y and X sequences, these sequences, or segments containing them, were exchanged between RT-Ec73 and RT-Ec86, yielding eight new constructs, c to j, as shown in Fig. 2B. These hybrid RTs were purified and tested for their ability to bind Ec73msr-msd30 RNA (Fig. 1A) and Ec86msr RNA (Fig. 3A). The results are shown in Fig. 4, A and B.
When Ec73msr-msd30 RNA was used, the following four constructs caused a mobility shift, as judged by the appearance of new upper bands accompanied by a decrease in free RNA: RT-Ec73 (Fig. 4A, lane 1); RT-Ec73X86 (Fig. 2B, construct f; following 52-residue C-terminal fragment, both of which are derived from RT-Ec73. These results indicate that the entire C-terminal region of RT-Ec73, including its Y sequence, is required for recognition of the primer-template RNA. Interestingly, the C-terminal 91-residue fragment of RT-Ec86 (construct j; RT-Ec86(230-320)) bound to the primer-template RNA for RT-Ec73. This loss of RNA-binding specificity by the isolated C-terminal fragment suggests that the N-terminal region upstream of the Y sequence may also be involved in specific RNA recognition.
When binding to the RT-Ec86 primer RNA was tested, the following hybrid proteins, in addition to RT-Ec86 itself (Fig. 4B, lane 12), were found to bind: RT-Ec73/86 (construct a in Fig. 2B; Fig. 4B, lane 5); RT-Ec73XYC86 (construct i, in which the X sequence as well as the Y sequence plus the following C-terminal region of RT-Ec73 were replaced with those from RT-Ec86; lane 9); RT-Ec86/73/86 (construct d, in which the sequence of RT-Ec86 between the X and Y sequences was replaced with that of RT-Ec73; lane 10); and RT-Ec86(230-320) (construct j; lane 11). Again, these results indicate that the hybrid proteins can bind the RT-Ec86 primer RNA only when they contain the C-terminal region of RT-Ec86 downstream of the Y sequence (see Fig. 4B, lanes 5, 9, and 10). This confirms the earlier conclusion that the C-terminal sequence downstream of the Y sequence is responsible for primer recognition. It should be noted that the Y sequence alone was not sufficient for specific recognition: replacing only the Y sequence of RT-Ec73 with that of RT-Ec86 (RT-Ec73Y86; construct g in Fig. 2B) did not cause a gel mobility shift with either the RT-Ec73 or the RT-Ec86 primer RNA (Fig. 4, A and B, lane 4 in each). Exchanging the X sequence in addition to the Y sequence (RT-Ec73XY86; construct h in Fig. 2B) also failed to confer binding to the RT-Ec86 primer RNA (Fig. 4B, lane 8). Furthermore, deletion of the Y sequence from RT-Ec73 (RT-Ec73ΔY; construct e in Fig. 2B) abolished its RNA binding (Fig. 4A, lane 2). These results indicate that both the Y sequence (37 and 35 residues for RT-Ec73 and RT-Ec86, respectively) and the C-terminal sequence following it (51 and 52 residues, respectively) play an essential role in primer RNA recognition, and that both sequences have to be derived from the same RT. Interestingly, however, the N-terminal domain upstream of the Y sequence, whether from the same RT or from a different one, also appears to contribute to the specificity of RNA recognition.

FIG. 2. Alignment of RT-Ec73 and RT-Ec86 with HIV-1 RT and construction of hybrid proteins. A, alignments were carried out by visual examination of the sequences and adapted with some modifications from the previous report (Ref. 26). Identical residues are shaded in black, and functionally similar residues are shaded in gray. Structural assignments for α and β structures are from the x-ray structure of HIV-1 RT (24, 25). X and Y sequences indicate the unique sequences found in bacterial RTs and other non-long terminal repeat RTs (26). The highly conserved VTGL sequence in the Y sequence, where the domain exchanges between RT-Ec73 and RT-Ec86 were carried out, is marked by open circles above the sequence. The residues marked with solid circles are three Asp residues, the only ones known to be invariant among all RTs, which form the catalytic triad essential for DNA polymerase activity (30). B, construction of hybrid proteins between RT-Ec73 and RT-Ec86. The C-terminal fragments of the two RTs were exchanged by creating an MspI site within the region encoding the highly conserved VTGL sequence in the Y sequence (see legend for A and "Experimental Procedures"), yielding RT-Ec73/86 and RT-Ec86/73. Additional exchanges of the N-terminal fragments, including the X sequence (RT-Ec73/86/73 and RT-Ec86/73/86), were carried out within the region of the highly conserved Asp residue immediately downstream of the X sequence (see legend for A and "Experimental Procedures"). In RT-Ec73ΔY, the Y sequence from Ile-228 to Ser-264 of RT-Ec73 was deleted. The X sequence exchanges were carried out between the X sequence from Leu-55 to Lys-83 of RT-Ec73 and the X sequence from Leu-87 to Asn-113 of RT-Ec86 (see RT-Ec73X86, RT-Ec73XC86, and RT-Ec73XY86). The Y sequence exchanges were carried out between the Y sequence from Ile-228 to Ser-264 of RT-Ec73 and the Y sequence from Ile-234 to His-268 of RT-Ec86 (see RT-Ec73Y86 and RT-Ec73XY86). The C-terminal 91-residue fragment of RT-Ec86 (RT-Ec86(230-320)) contained the sequence downstream of Lys-230. All of the constructs were His-tagged and expressed in a T7 vector as described under "Experimental Procedures."
FIG. 4. RNA binding specificity of various hybrid proteins between RT-Ec73 and RT-Ec86. A, gel retardation assay using Ec73msr-msd30 RNA (see Fig. 1A). The protein used is indicated above each lane; designations of hybrid proteins are from Fig. 2B. RNA binding was carried out in a 20-µl reaction mixture containing 1 pmol of RNA and 4 pmol of protein under the conditions described under "Experimental Procedures." B, gel retardation assay using Ec86msr RNA (see Fig. 3A). Experiments were carried out as described for A.

Recognition of Unique RNA Structures by RT-Ec86-To determine how RT-Ec86 recognizes specific structures in Ec86msrΔa RNA, the SELEX method (27) was applied using two differently randomized RNA molecules derived from Ec86msrΔa RNA. In the first RNA (RNA I), region I, a 10-base sequence extending from the C residue 5 bases upstream of the branching G residue to the U residue 4 bases downstream of it, was randomized (boxed in Fig. 5; Table II). In the second RNA (RNA II), the 11-base stem-loop b was randomized (boxed in Fig. 5; Table II). After six cycles of SELEX using RNA I with RT-Ec86, 9 sequences were determined (Fig. 5A). Of these 9 sequences, 6 (A-1 to A-6) contained the highly conserved 3-base sequence AGC (shown in bold), which includes the branching G residue. The AGC sequence at the branching G is found in 10 of the 11 msDNAs identified so far: Mx162 (8), Ec86 (4), Ec67 (2), Mx65 (9), Ec73 (7), Sa163 (10), Ec107 (1), Ec83 (3), Ml162 (12), Ec78 (5), and Ec48 (21); in msDNA-Ec67, this sequence is replaced with AGA (2). The branching G residue was positioned quite precisely at the 5th base upstream of stem b, together with a C residue at the 9th base, an A at the 6th, and a C at the 4th. In two clones, the AGC sequence (A-7) or the AGA sequence (A-8) was located one base closer to stem b.
When RNA II was used with RT-Ec86, a highly restricted set of sequences was enriched (Fig. 5B). Of the 14 sequences determined, 10 (71%) were identical to the wild-type sequence (B-1 to B-10). Of the remaining 4 sequences, all except B-12, which has two mismatches in the stem, retain complementary 4-base segments despite base substitutions relative to the wild-type sequence, so that they can form a 4-base pair stem. The wild-type sequence likewise forms a 4-base pair stem with a 3-base loop. Importantly, all but two sequences (B-13 and B-14) have an identical loop consisting of 3 U residues. These results demonstrate that RT-Ec86 recognizes the stem-loop structure found in the wild-type primer (structure b in Fig. 3A) with high specificity. It is interesting to note that the preferred sequences include the triple U sequence in the loop as well as two GU pairs, rather than GC pairs, in the stem.
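The stem and loop criteria used to classify the SELEX clones (a 4-base pair stem closed by a 3-base loop, allowing G·U wobble pairs) can be checked mechanically. The function below is a hypothetical helper, not part of the original analysis, and the two example sequences are invented for illustration rather than taken from Fig. 5. It pairs position i with position -(i+1) of an 11-base window, so index 0 is the bottom (outermost) pair and index 1 is the second pair from the bottom. It does not handle insertions or deletions such as those in some clones.

```python
from typing import Dict, List, Tuple

WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def stem_pairs(seq: str, stem_len: int = 4, loop_len: int = 3) -> List[Tuple[str, str]]:
    """Return stem base pairs from the bottom (outermost) pair inward."""
    if len(seq) != 2 * stem_len + loop_len:
        raise ValueError("sequence length must equal 2 * stem_len + loop_len")
    return [(seq[i], seq[-1 - i]) for i in range(stem_len)]


def describe_hairpin(seq: str, stem_len: int = 4, loop_len: int = 3) -> Dict[str, object]:
    """Count stem mismatches and G-U wobble pairs, and report the loop sequence."""
    pairs = stem_pairs(seq, stem_len, loop_len)
    return {
        "pairs": ["".join(p) for p in pairs],
        "mismatches": sum(1 for p in pairs if p not in WATSON_CRICK | WOBBLE),
        "gu_pairs": sum(1 for p in pairs if p in WOBBLE),
        "loop": seq[stem_len:stem_len + loop_len],
    }


# Invented 11-base examples (not data from this study).
print(describe_hairpin("GUGCUUUGCAU"))  # no mismatches, one GU pair, UUU loop
print(describe_hairpin("GAGCAUUGCAU"))  # one A-A mismatch at the second pair
```

Tallying these fields over a set of clones gives the kind of summary described in the text: how many retain a 4-bp stem, how many GU pairs they carry, and which loop sequences recur.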
SELEX screening with RNA II was also performed with a hybrid RT, RT-Ec73/86 (construct a in Fig. 2B). Of the 14 sequences determined, all but one (C-14 in Fig. 5C) again retained the ability to form a 4-base pair stem, as with RT-Ec86 (Fig. 5B). However, in most of these sequences, one or two base pairs were replaced with different base pairs. Such replacements occurred most frequently at the second base pair from the bottom, where the UG base pair was replaced with GU (C-1, C-5, C-6, C-7, and C-10), AU (C-8, C-9, and C-13), or CG (C-11 and C-12). Interestingly, regardless of the replacement, two GU pairs were always maintained in the stem, except in C-4 and C-9, which have one GU plus one AU pair, and C-10, which has one GU plus two AU pairs. As for the loop, the UUU sequence preferred by RT-Ec86 was found in only 1 sequence (C-1). In 8 of 13 sequences, the first U residue was replaced with A, and only 3 sequences retained U at this position. Interestingly, the primer RNA for RT-Ec73 has an A residue at this position (see Fig. 5C). This A residue may be preferred because the N-terminal domain of RT-Ec73/86 is derived from RT-Ec73. Intriguingly, the second position was occupied by U without exception; U is used by RT-Ec86, whereas G is used by RT-Ec73. The third position was mostly U, which is preferred by both RTs. These results support the notion that the C-terminal domain, including the Y region, is responsible for recognition of the specific secondary RNA structure.
cDNA Synthesis with Ec86msr-msd48 RNA and Its Inhibition by Ec86msrΔa RNA-In a cell-free system for msDNA synthesis with RT-Ec86, Ec86msr-msd48 RNA (see Fig. 1B) can serve as both primer and template (Fig. 6A). With [α-32P]dCTP as the label, cDNA synthesis was highly dependent upon the addition of dGTP and TTP (lane 3). In the presence of only dATP and dCTP (lane 1), or of dATP, TTP, and dCTP (lane 2), no msDNA synthesis was observed. The dependence of dCTP incorporation upon dGTP and TTP, but not upon dATP, follows from the template sequence of Ec86msr-msd48 RNA, 3′ CAGUCUXXXXX 5′ (beginning 1 base after the a1 sequence in Fig. 1B): dG and dT must be added before the first dC can be incorporated. Thus, the results in Fig. 6A are consistent with cDNA synthesis being initiated from the 2′-OH group of the branching G residue and elongated as 5′ GTCAGA 3′.
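The dNTP-dropout logic can be made explicit with a small model: read the RNA template 3′ to 5′, write the complementary cDNA 5′ to 3′, and extend only until the first nucleotide whose dNTP is missing. This is an idealized sketch (no misincorporation, no pausing) with hypothetical function names; only the known template bases 3′ CAGUCU 5′ are modeled, and the unknown X positions are omitted.

```python
from typing import Set

# DNA base incorporated opposite each RNA template base.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def cdna_from_template(template_3_to_5: str) -> str:
    """cDNA (written 5'->3') templated by an RNA segment written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def extend(cdna: str, available: Set[str]) -> str:
    """Portion of the cDNA made before the first base whose dNTP is absent."""
    made = []
    for base in cdna:
        if base not in available:
            break
        made.append(base)
    return "".join(made)


def labeled_c(cdna: str, available: Set[str]) -> int:
    """Number of dC residues (i.e. labeled dCMP) incorporated."""
    return extend(cdna, available).count("C")


msd48 = cdna_from_template("CAGUCU")  # Ec86msr-msd48 template segment
print(msd48)                          # GTCAGA
for dntps in ({"A", "C"}, {"A", "T", "C"}, {"G", "T", "C"}):
    print(sorted(dntps), labeled_c(msd48, dntps))
```

Applied to the template implied by the proposed 5′ ATTC 3′ product of Ec86msrΔa RNA (3′ UAAG 5′, inferred here from the cDNA rather than stated in the text), the same model gives label only with dATP, TTP, and dCTP among the three incomplete mixtures, matching the pattern reported for Fig. 6D.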
FIG. 5. Determination of the RNA binding specificity of RT-Ec86 and RT-Ec73/86 by the SELEX method. The structure of Ec86msrΔa RNA used for the SELEX method is shown at the top. Two different RNA molecules were used: in RNA I, the 10-base sequence from base 9 to 18 was randomized, and in RNA II, the 11-base sequence from base 23 to 33 was randomized. These sequences are boxed and marked I and II, respectively. The branching G residue is circled. A, sequence enrichment in RNA I with RT-Ec86 by the SELEX method. After six cycles, 9 clones were picked at random and their DNA sequences were determined. The highly conserved AGC or AG sequences and the highly conserved C residue at the second position are shown in bold. The sequence of Ec86msrΔa corresponding to the randomized region is shown in bold and boxed at the bottom. B, sequence enrichment in RNA II with RT-Ec86 after six cycles of the SELEX method. Fourteen random clones were picked, and their DNA sequences were determined as shown. Bases identical to the wild-type sequence shown at the bottom are in bold. C, sequence enrichment in RNA II with RT-Ec73/86 after six cycles of the SELEX method. Fourteen random clones were sequenced as described for B. Note that sequences 7 and 14 have a 1-base insertion and a 1-base deletion, respectively. The corresponding sequences of Ec86 RNA (bold) and Ec73 RNA are boxed at the bottom.

Next, we examined the inhibitory effect of Ec86msrΔa RNA on msDNA synthesis from Ec86msr-msd48 RNA. As shown in Fig. 6B, mini-msDNA synthesis was inhibited increasingly as more Ec86msrΔa RNA was added to the reaction mixture used in Fig. 6A; when 5 pmol (a 5-fold excess) of Ec86msrΔa RNA was added, msDNA synthesis was inhibited by more than 90% (lane 4). At 10 pmol of Ec86msrΔa RNA, msDNA synthesis was almost completely blocked, indicating that Ec86msrΔa RNA indeed binds to RT-Ec86, competitively inhibiting msDNA synthesis from Ec86msr-msd48 RNA.
During the course of these experiments, we noticed that there is another band migrating at a position lower than mini-msDNA, as indicated by arrowhead a in Fig. 6B. The synthesis of this band is reciprocal to the synthesis of mini-msDNA-Ec86, as its highest production was observed when 10 pmol of Ec86msr⌬a RNA was used. Because this band production depends on the incorporation of [␣-32 P]dCTP, it was speculated that Ec86msr⌬a RNA may serve as primer and template to produce the band a product. Indeed, the sequence 3Ј GAGU 5Ј (from the 6th G residues to the 9th U residue from the 3Ј-end of the RNA) can form a duplex (see Fig. 6C), which allows the 2Ј-OH priming reaction at the branching G residue (circled) from the 11th U residue as template. Therefore, we tested the possible cDNA synthesis on the presence of Ec86msr⌬a RNA only. Fig. 6D demonstrates that the cDNA synthesis hardly occurs with only dATP and [␣-32 P]dCTP (lane 1) or with dGTP, TTP, and [␣-32 P]dCTP (lane 3). However, with dATP, TTP, and [␣-32 P]dCTP, a reasonable amount of cDNA is produced (lane 2). This cDNA is likely to be a tetranucelotide consisting of 5Ј ATTC 3Ј (see Fig. 6C). When all 4 bases are added, [␣-32 P]dCTP was extremely well incorporated, the oligonucleotide was further extended by several bases (lane 4). Again this is consistent with the secondary structure for E86msr⌬a RNA proposed in Fig. 6C. These results provide additional strong support for the specific biochemical function of Ec86msr⌬a RNA as essential for RT-Ec86 recognition. DISCUSSION Retrons are unevenly distributed in bacterial genomes. Retron-Mx162 from M. xanthus is the first retro-element ever found in the prokaryotes (1, 28), and it has been shown that all natural isolates of M. xanthus contain retron-Mx162, which produces msDNA-Mx162 (8). In contrast, in E. coli only a minor population of natural isolates contains retrons, which are highly diverse. Among seven retrons so far identified in E. 
coli (retron-Ec67 (2), retron-Ec86 (4), retron-Ec73 (7), retron-Ec107 (1), retron-Ec83 (3), retron-Ec78 (5), and retron-Ec48 (21)), there is virtually no sequence homology among their msDNAs except for a few bases upstream of the branching G residue. The RTs encoded by these retrons are also highly diverse; RT-Ec86 and RT-Ec73, which are studied in the present paper, share only 22% amino acid sequence identity. Retrons are also found in only minor populations of other enterobacteria (11), indicating that the retro-elements in enterobacteria were integrated into their genomes only recently, after these species had been established, whereas the retro-elements in M. xanthus were integrated into its genome during its evolution, before the species was established.
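The dNTP dependence reported for Fig. 6D in the Results can be made explicit with a small model. The Python sketch below assumes, hypothetically, that residues 11 to 14 of Ec86msrΔa RNA (numbered from the 3′-end) read 3′ UAAG 5′, which is the template implied by the tetranucleotide 5′ ATTC 3′; that the polymerase stalls as soon as the complementary dNTP is missing; and that a product is detected only if it has incorporated the radiolabeled dCTP. The names `extend`, `is_labeled`, and `TEMPLATE` are illustrative only. Residues beyond position 14 are not modeled, so the sketch cannot reproduce the longer lane 4 products.

```python
# Toy model of template-directed cDNA synthesis from the branching G with a
# restricted dNTP mix. ASSUMPTION: the template (residues 11-14 from the 3'-end,
# read 3'->5') is U-A-A-G, inferred from the 5' ATTC 3' product; G.U wobble
# pairs and misincorporation are ignored.

# Watson-Crick partner (dNTP base) for each RNA template base.
PAIR = {"U": "A", "A": "T", "G": "C", "C": "G"}

# Hypothetical template, read 3'->5' starting at residue 11.
TEMPLATE = "UAAG"


def extend(template_3to5: str, dntps: set[str]) -> str:
    """Return the cDNA (5'->3') made before the first missing dNTP stalls synthesis."""
    product = []
    for base in template_3to5:
        needed = PAIR[base]
        if needed not in dntps:
            break  # the complementary dNTP is absent, so the polymerase stalls here
        product.append(needed)
    return "".join(product)


def is_labeled(cdna: str, label: str = "C") -> bool:
    """A product is visible on the gel only if it incorporated [alpha-32P]dCTP."""
    return label in cdna


if __name__ == "__main__":
    # dNTP mixes used in Fig. 6D (T stands for TTP).
    lanes = {
        1: {"A", "C"},
        2: {"A", "T", "C"},
        3: {"G", "T", "C"},
        4: {"A", "C", "G", "T"},
    }
    for lane, mix in lanes.items():
        cdna = extend(TEMPLATE, mix)
        status = "labeled" if is_labeled(cdna) else "not detected"
        print(f"lane {lane}: {cdna or '-'} ({status})")
```

Under these assumptions the model agrees with lanes 1 to 3: in lane 1 an unlabeled A is added before synthesis stalls for lack of TTP, in lane 3 synthesis cannot start without dATP, and only lane 2 yields a labeled ATTC product.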
The msr region, which encodes the stem-loop structure(s) immediately downstream of the branching G residue in msDNA, has been shown to be essential for the priming reaction by the RT encoded by the same retron (15). If the msr region is exchanged with the msr region of another retron, msDNA cannot be synthesized. Consistent with this finding, the present results clearly demonstrate that bacterial RTs bind tightly to their cognate msr-encoded RNA but not to RNA from another retron. Interestingly, of the two stem-loop structures formed immediately downstream of the branching G residue in Ec86msr RNA (see Fig. 3A), only the second is recognized by RT-Ec86. At present, the role of the first stem-loop structure in msDNA synthesis is unclear.
In the present paper, we have demonstrated that the C-terminal region of RT-Ec86 plays the major role in the specific recognition of the second stem-loop structure. On the basis of sequence alignment with HIV-1 RT, this region is assigned to the thumb domain of the RT molecule, which consists of β12, β13, β14, αG, αH, αI, and αJ (24,25). Notably, this region is the most diverse among bacterial RTs, which is consistent with the notion that it is responsible for recognizing the specific primer structure in the individual primer-template RNAs.

FIG. 6. msDNA synthesis with RT-Ec86 and Ec86msr-msd48 RNA and its inhibition by Ec86msrΔa RNA. A, cell-free msDNA synthesis was carried out with 4 pmol of RT-Ec86 and 1 pmol of Ec86msr-msd48 RNA as primer and template (see Fig. 1B). [α-32P]dCTP was used. The reaction products were analyzed on a 6% polyacrylamide-8 M urea gel. The positions of standard DNA fragments are shown by base numbers. B, inhibition of msDNA-Ec86 (mini) by Ec86msrΔa RNA. The reactions were carried out as described for A except that the amount of Ec86msrΔa RNA indicated at the top of each lane was added. Arrowhead b indicates msDNA-miniEc86, and arrowhead a indicates the cDNA synthesized using Ec86msrΔa RNA as primer and template (see the legend for C and "Results"). C, a possible secondary structure of Ec86msrΔa RNA. The primary sequence is identical to that shown in Fig. 3B; however, the 4-base sequence near the 3′-end forms a duplex with the 4-base sequence immediately upstream of the branching G residue (circled). D, cDNA synthesis by RT-Ec86 with Ec86msrΔa RNA. The reactions were carried out as described for A using 1 pmol of Ec86msrΔa RNA.
It is also interesting to note that the Y sequence contains two β-strands, β12 and β13, which form an element designated the "primer grip" because of its proximity to the phosphate joining the nucleotides at the primer end (24). Between these two β-strands lies the VTGL sequence, which is highly conserved in bacterial RTs. It is tempting to speculate that the VTGL sequence indeed forms the loop between strands β12 and β13 and plays an essential role in orienting the branching G residue at the priming site. The enrichment of the branching G residue in the SELEX experiment (Fig. 5) also suggests that this structure specifically recognizes a G residue at this site.
Importantly, however, this G residue is not the 3′-end of the primer RNA for msDNA synthesis. Its 3′-OH group is already occupied, linked to a long downstream RNA strand by a 3′-5′ phosphodiester bond. Therefore, cDNA synthesis has to be initiated from the 2′-OH group of the G residue. Appropriate positioning of the G residue is essential for the priming reaction: it must be placed so that its 2′-OH group is close to, and exposed to, the catalytic triad of three invariant Asp residues. The stem-loop structure downstream of the G residue is likely to play the essential role in establishing this configuration. On the basis of the three-dimensional structure of HIV-1 RT (24,25), the priming complex of RT-Ec86 with Ec86msr-msd48 RNA (Fig. 1B) may be formed as follows. The primer-template a2-a1 duplex lies at the foot of the thumb so that the branching G residue (circled in Fig. 1B) is positioned at the primer grip site. The RNA strand downstream of the G residue forms a stable complex with the thumb, so that the thumb is sandwiched between the RNA primer-template duplex on the inside and the stable stem-loop structure formed in the RNA strand downstream of the branching G residue on the outside. This would bend the primer RNA strand at the branching G residue, making its 2′-OH accessible for the priming reaction. We propose that the interaction between the stem-loop structure and the thumb determines the high specificity of the msDNA priming reaction. Of the four helices in the thumb (αG, αH, αI, and αJ), helix αJ is located on the external part of the thumb and may be primarily responsible for the interaction with the stem-loop structure of the primer RNA. The αJ sequences of bacterial RTs, as aligned in Fig.
2A, are highly diverse and contain many basic residues, which may be important for the specific recognition of a particular RNA secondary structure (as shown, for example, for the complex between the bacteriophage N peptide and box B RNA (29)). It remains to be determined why and how such custom-made thumbs evolved to regulate the cDNA priming reaction in this highly specific manner.
It is also important to note that the X region seems to play an essential role in msDNA synthesis, as exchanging this region between RT-Ec73 and RT-Ec86 abolished msDNA synthesis (see Fig. 2B). This region is located between strands β5b and β6 and is presumed to be close to the joint of the thumb in bacterial RTs. It also seems to be specific to individual RTs and to be involved in the formation of the priming complex, probably by coordinating the initial interaction of the primer-template RNA with the RT. Determination of the three-dimensional structures of bacterial RTs will provide important insights into these questions.