Sequence, Distance, and Accessibility Are Determinants of 5′-End-directed Cleavages by Retroviral RNases H*

The RNase H activity of reverse transcriptase is essential for retroviral replication. RNA 5′-end-directed cleavages represent a form of RNase H activity that is carried out on RNA/DNA hybrids that contain a recessed RNA 5′-end. Previously, the distance from the RNA 5′-end has been considered the primary determinant for the location of these cleavages. Employing model hybrid substrates and the HIV-1 and Moloney murine leukemia virus reverse transcriptases, we demonstrate that cleavage sites correlate with specific sequences and that the distance from the RNA 5′-end determines the extent of cleavage. An alignment of sequences flanking multiple RNA 5′-end-directed cleavage sites reveals that both enzymes strongly prefer A or U at the +1 position and C or G at the −2 position, and additionally for HIV-1, A is disfavored at the −4 position. For both enzymes, 5′-end-directed cleavages occurred when sites were positioned between the 13th and 20th nucleotides from the RNA 5′-end, a distance termed the cleavage window. In examining the importance of accessibility to the RNA 5′-end, it was found that the extent of 5′-end-directed cleavages observed in substrates containing a free recessed RNA 5′-end was most comparable to substrates with a gap of two or three bases between the upstream and downstream RNAs. Together these findings demonstrate that the selection of 5′-end-directed cleavage sites by retroviral RNases H results from a combination of nucleotide sequence, permissible distance, and accessibility to the RNA 5′-end.

the RNA genome both during and after minus-strand DNA synthesis to facilitate plus-strand DNA synthesis and strand transfers. Second, RNase H creates the polypurine tract (PPT) primer from the viral genome. And third, RNase H removes the PPT and tRNA primers used to prime plus-strand and minus-strand DNA synthesis, respectively. Given the vital roles of RNase H in retroviral replication, it is important to understand how RNase H recognizes the RNA/DNA hybrids created during reverse transcription as substrates for cleavage. Extensive studies have shown that both generation of the PPT primer and removal of the PPT and tRNA primers require specific sequences (9-19). By contrast, fewer studies have examined general degradation by RNase H (20-23), and less is known about the determinants that might influence RNase H specificity for this type of degradation during retroviral replication.
RNase H has both polymerization-dependent and polymerization-independent modes of cleavage. The polymerization-dependent mode accompanies RNA-dependent DNA synthesis, and the nascent DNA 3′ primer terminus positions RNase H cleavages. Such cleavages occur 17-20 nucleotides away on the RNA template strand but are not strictly coupled to DNA synthesis (20, 24-26). This mode of cleavage likely functions at pause sites during minus-strand synthesis, when the 3′ terminus of the nascent DNA is recessed on the genomic template. However, the polymerization rate of reverse transcriptase is faster than the rate of RNase H cleavage (27), and the polymerization-dependent mode of RNase H cleavage does not completely degrade the template RNA (20, 28). Thus significant amounts of RNA can remain annealed to the minus-strand DNA (22), and removal of these fragments by the polymerization-independent RNase H activity is likely necessary for efficient plus-strand DNA synthesis.
Polymerization-independent RNase H cleavages have been attributed to at least three distinct mechanisms, which differ in how reverse transcriptase associates with the hybrid substrate (reviewed in Refs. 1, 7, and 8). First, reverse transcriptase can bind the RNA strand of a hybrid without positioning by either a DNA 3′-end or an RNA 5′-end, resulting in internal cleavages by the RNase H (23). Second, the polymerase domain can bind a recessed DNA 3′ primer terminus to facilitate DNA 3′-end-directed cleavages that occur ~17-20 nucleotides back on the RNA strand of a hybrid, similar to what happens at pause sites during polymerization. Third, the polymerase domain can associate with a recessed RNA 5′-end, and RNase H can carry out 5′-end-directed cleavages that occur ~15-20 nucleotides downstream on the RNA strand. While the distance from the recessed RNA 5′-end has been characterized as the primary determinant for this mechanism (29), these cleavages have been observed as close as 12-15 nucleotides (30, 31) and as far as 21 nucleotides (29, 32) from the RNA 5′-end. Notably, this range of distances is much broader than the more fixed distance of 17-19 nucleotides that separates the polymerase and RNase H active sites as determined by crystallography studies or footprinting of reverse transcriptase with substrates (33-36).
A limited number of studies have suggested that sequence might influence 5′-end-directed cleavages (24, 29, 30), but this hypothesis has not been directly tested before. We recently reported that sequence specificity is an important determinant of internal RNase H cleavages and that an RNA 5′-end at a nick does not promote 5′-end-directed RNase H cleavages (23). These results prompted us to ask three fundamental questions concerning RNA 5′-end-directed cleavages by retroviral RNases H. First, does sequence influence 5′-end-directed cleavages? Second, what distances from the RNA 5′-end are acceptable for 5′-end-directed cleavage sites? Third, how large a gap is required between the 3′ terminus of an upstream RNA and the 5′-end of a downstream RNA to allow efficient 5′-end-directed cleavages in the downstream RNA? Our findings offer new insights into the mechanisms of RNA 5′-end-directed cleavages for M-MuLV and HIV-1 reverse transcriptases, and the role of RNase H in retroviral replication.

EXPERIMENTAL PROCEDURES
Enzymes and Reagents-Recombinant HIV-1 reverse transcriptase was obtained from Worthington Biochemicals. Recombinant wild-type M-MuLV reverse transcriptase, T7 DNA polymerase, and calf intestinal alkaline phosphatase were purchased from Amersham Biosciences. T4 polynucleotide kinase, T4 DNA ligase, T4 DNA polymerase, and all restriction enzymes were obtained from New England Biolabs. DNA oligonucleotides were obtained from Qiagen and Invitrogen. Oligonucleotides were gel-purified in denaturing polyacrylamide gels as described (23,37).
To generate additional RNAs with different sequences, some pGEM plasmids were digested with single-cutting restriction enzymes downstream of the T7 promoter (as indicated below), the overhangs were filled in or removed by T4 DNA polymerase treatment, the blunt ends were ligated using T4 DNA ligase, and the resulting plasmids were transformed into XL1-Blue cells (Stratagene). The sequences of these plasmids were confirmed by DNA sequencing. For T4 DNA polymerase treatment, a digested plasmid was incubated in a 20-μl reaction containing 1 mM dNTPs and 3 units of T4 DNA polymerase for 5 min at 37°C and then for 15 min at 12°C. To make 25-mer R3Z2, pGEM3Zf(−) was digested by EcoRI and BamHI, and the resulting plasmid was linearized with HindIII prior to in vitro transcription. To make 27-mer R7Z2, pGEM7Zf(+) was digested by ApaI, treated with T4 DNA polymerase, ligated, digested with AatII, treated with T4 DNA polymerase, and ligated, and then the resulting plasmid was linearized with XbaI. To make 32-mer R7Z3, pGEM7Zf(+) was digested by ApaI and XbaI, and the resulting plasmid was linearized with Acc65I. To make 30-mer R7Z4, pGEM7Zf(+) was digested by ApaI and EcoRI, and the resulting plasmid was linearized with BstBI. To make 30-mer R7Z5, pGEM7Zf(+) was digested by ApaI and SmaI, and the resulting plasmid was linearized with HindIII.
5′-End Labeling, 3′-End Labeling, and 5′-End Phosphorylation-5′-End labeling or 3′-end labeling of RNAs was carried out as described previously (23, 37). Because cleavage by RNase H produces a 3′-hydroxyl group and a 5′-phosphate group, an unlabeled phosphate group was added to the 5′-ends of 3′-end-labeled RNAs as previously described (23).
Preparation of Hybrid Substrates-To prepare the substrate without upstream RNA (No Upstream substrate), RNAs were annealed to the appropriate template DNAs at an RNA:DNA molar ratio of 1:2, and to prepare the Nick or Gap substrates, upstream PPT20 RNA was additionally included at a 3-fold molar excess over the first RNA. Annealings were carried out in 10 mM Tris-HCl, pH 8.0, and 200 mM KCl at 90°C for 3 min followed by cooling to room temperature.
For the Nick substrate containing upstream PPT20 and downstream Md1, the template strand was D49 (5′-CCAAACCTACAGGTGGGGTCTTTCATTT[ ]CCCCCCTTTTTCTGGAGACTAA-3′). The brackets in the sequence of D49 indicate the position that nucleotides were added to create the template strands for the corresponding Gap substrates, which are D49(1b) (5′ . . .
Statistical Analysis-For each position of the aligned sequences, the chi-square one-dimensional test was used to determine the deviation from the random distribution of bases (44). The expected frequency for each base was determined by summing the nucleotide frequencies of the first 25 nucleotides from the 5′-end of each RNA used for HIV-1 (Fig. 5) and for M-MuLV (Fig. 6). For HIV-1, the expected frequencies were A = 0.233, C = 0.243, G = 0.319, and U = 0.205. For M-MuLV, the expected frequencies were A = 0.220, C = 0.251, G = 0.326, and U = 0.203.
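The one-dimensional chi-square test described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function name and the example counts are hypothetical, the expected frequencies are the HIV-1 values quoted in the text, and 11.34 is the chi-square value that corresponds to p = 0.01 with three degrees of freedom (four bases).

```python
# Expected base frequencies quoted in the text for the HIV-1 data set.
HIV1_EXPECTED = {"A": 0.233, "C": 0.243, "G": 0.319, "U": 0.205}

# Chi-square value corresponding to p = 0.01 with 3 degrees of freedom.
CHI2_CRITICAL_P01 = 11.34


def chi_square_position(observed_counts, expected_freqs):
    """Return the chi-square statistic for the base counts at one aligned position.

    observed_counts maps each base to the number of cleavage sites that carry
    that base at the position; expected_freqs maps each base to its expected
    frequency under a random distribution.
    """
    total = sum(observed_counts.get(base, 0) for base in expected_freqs)
    if total == 0:
        raise ValueError("no observations at this position")
    statistic = 0.0
    for base, freq in expected_freqs.items():
        expected = total * freq
        observed = observed_counts.get(base, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts for illustration only (not data from the paper).
example_counts = {"A": 20, "C": 8, "G": 2, "U": 20}
chi2 = chi_square_position(example_counts, HIV1_EXPECTED)
significant = chi2 > CHI2_CRITICAL_P01  # True for these made-up counts
```

A position is flagged as a strong deviation from random when its statistic exceeds the critical value; the statistic alone does not say which base is favored or disfavored, which has to be read from the counts themselves.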

Quantitative Analysis of RNase H Cleavage Products-Band intensities of cleavage products were determined using ImageQuant (Amersham Biosciences). To compare the amount of product resulting from cleavage at the same sequence in different RNAs (for example, counting from the RNA 5′-end, site F is cleaved in Md1 between the 16th and 17th nucleotides, and in Md2 between the 15th and 16th nucleotides), the area integration function was used to quantitate the relative abundance of individual bands present in each sample. This analysis normalized the amount of cleavage product for a specific site as a percentage of the total cleavage products for each hybrid substrate. Quantitations using data from the 15-s and 1-min time points were found to be essentially identical, and the results from 1-min time points are presented graphically in Fig. 4.
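The normalization described above amounts to dividing each integrated band area by the summed area of all cleavage products in the same lane. The following minimal Python sketch shows that arithmetic; the function name and the band areas are hypothetical, and it assumes the uncleaved substrate band has already been excluded, since the text normalizes to total cleavage products.

```python
def normalize_cleavage(band_areas):
    """Express each cleavage product as a percentage of all cleavage products.

    band_areas maps a site label to its integrated band area (arbitrary units)
    for a single lane. The uncleaved substrate band is assumed to be excluded.
    """
    total = sum(band_areas.values())
    if total <= 0:
        raise ValueError("total cleavage product signal must be positive")
    return {site: 100.0 * area / total for site, area in band_areas.items()}


# Hypothetical integrated areas for one lane (illustration only).
md1_areas = {"E": 300.0, "F": 500.0, "G": 200.0}
md1_percent = normalize_cleavage(md1_areas)  # F accounts for 50% of products
```

Because each lane is normalized to its own total, the percentages for the same site can be compared across substrates even when loading or overall extent of cleavage differs between lanes.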

Sequence and Distance Are Determinants of RNA 5′-End-directed Cleavages-The 5′-end-directed cleavages by RNase H have been shown to occur 15-20 nucleotides from the RNA 5′-end. To address how RNase H cleaves at specific sites within this range, we have evaluated the roles of RNA sequence, distance, and accessibility to the RNA 5′-end as possible determinants of cleavage. As a model system, hybrid substrates with RNAs containing sequence from the M-MuLV PPT region were used. In this region of the M-MuLV genome, the cleavage that generates the 3′-end of the PPT primer is defined to occur between nucleotides −1 and +1 and is called the −1/+1 cleavage site. This numbering is also used to designate the positions of nucleotides in the model hybrid substrates, which contained 7 different 29-mer RNAs, termed Md1 through Md10 (Fig. 1). These RNAs have 5′-ends beginning downstream of the PPT at positions +1, +2, +4, +6, +7, +9, or +10 and contain a total of 29 consecutive nucleotides of sequence, such that blocks of the same sequence are located at different distances from the RNA 5′-ends (shaded box in Fig. 1). These RNAs were used to ask if 5′-end-directed cleavages occur at identical distances from the RNA 5′-ends and if sequence influences the positioning of 5′-end-directed cleavages.
For RNase H cleavage assays, each RNA was 5′- or 3′-end-labeled and annealed to a template DNA to generate a hybrid with a recessed RNA 5′-end. Hybrid substrates were incubated with HIV-1 or M-MuLV reverse transcriptase in time-course assays; the products representing the initial 5′-end-directed cleavage events were found in the 15-s and 1-min time points when most of the initial substrate remained uncleaved. These initial cleavage sites were mapped by comparing the mobilities of cleavage products to size ladders generated with nuclease P1 (data not shown). Depending upon whether an RNA was 5′- or 3′-end-labeled, the longest products represented the cleavages that occur the furthest from or the closest to the RNA 5′-end, respectively. Importantly, the location and extent of cleavage for each site were determined by using the data obtained for both 5′- and 3′-end-labeled RNAs. To facilitate comparison of the same sites in different substrates, the cleavage sites generating the observed products have been named A through I, as indicated in the relevant figures and as described previously (23).
When substrates containing 5′-end-labeled RNAs Md1 through Md10 were treated with HIV-1 reverse transcriptase, it was immediately apparent that the cleavage products were not of uniform size (Fig. 2A). Instead, the products had varied lengths, indicating that cleavages occurred at different distances from the RNA 5′-ends. The expected products resulting from cleavage at the same sites in the different substrates (labeled "E-I") are connected by lines in Fig. 2. Substrate Md1 was predominantly cleaved at sites E, F, and G (Fig. 2A, lanes 1-5).
Counting from the RNA 5′-end, site E is between nucleotides 13 and 14, site F is between nucleotides 16 and 17, and site G, which represented the most distal 5′-end-directed cleavage site in Md1, falls between nucleotides 19 and 20. Cleavage of substrate Md2 was very similar to that of substrate Md1 except that all of the cleavage products were one nucleotide smaller (lanes 6-10). Importantly, site G remained the most distal 5′-end-directed cleavage, even though this site was now located between the 18th and 19th nucleotides from the Md2 RNA 5′-end, and, unlike Md1, no cleavage occurred between the 19th and 20th nucleotides. Although site H was not cleaved in Md1 or Md2, this site was detectably cleaved between the 19th and 20th nucleotides in substrate Md4 (lanes 11-15) and significantly cleaved between the 17th and 18th nucleotides in substrate Md6 (lanes 16-20). No cleavage was observed at site I in substrates Md1 through Md7 (lanes 1-25), but this site was cleaved when located between the 19th and 20th nucleotides in substrate Md9 and between the 18th and 19th nucleotides in substrate Md10 (lanes 25-35). Assays with 3′-end-labeled RNAs and HIV-1 reverse transcriptase were used to confirm that the 5′-end-directed cleavage sites closest to the RNA 5′-ends in the above analysis represented the initial cleavage products and were not the result of secondary cleavages (Fig. 2B). In general, the locations were the same as for the 5′-end-labeled substrates, but the relative abundance of products sometimes varied because strong cleavage sites close to the labeled end dimin-
FIGURE 1. RNA oligonucleotides used in RNase H cleavage assays. All RNAs contain M-MuLV sequences from the PPT region, which contains the −1/+1 site where cleavage occurs to generate the plus-strand primer and which corresponds to nucleotides 7815 and 7816 in the M-MuLV sequence (55). PPT20 is upstream of this site and spans nucleotides −20 to −1. The seven downstream RNAs are named as Moloney downstream (Md) followed by the nucleotide position of the 5′ terminus relative to the −1/+1 site. The sequence shared by all of these RNAs is indicated by a shaded box.
Substrates were treated with HIV-1 reverse transcriptase, aliquots were removed at the indicated times, and samples were analyzed in denaturing 20% polyacrylamide gels. Results were visualized using a PhosphorImager. Products resulting from cleavage at specific sites in Md1 are labeled as sites E-I, and solid lines track cleavage products corresponding to products from cleavage of the same sites in the other substrates. In Md1 (see Fig. 1), the cleavage sites are found between the following nucleotides from the RNA 5′-end: site E is +13/+14, site F is +16/+17, site G is +19/+20, site H is +22/+23, and site I is +27/+28. The cleavage sites identified in Md1 that give rise to the cleavage products shown for Md10 are indicated at the right.
These same substrates were also used to analyze 5′-end-directed cleavage by the RNase H of M-MuLV reverse transcriptase (Fig. 3).
Most of the cleavage sites recognized in Md1 through Md10 by M-MuLV reverse transcriptase were identical to those recognized by the HIV-1 enzyme, but the extent of cleavage often varied. M-MuLV RNase H cleaved 5′-end-labeled Md1 at site D, between the 8th and 9th nucleotides, and at sites E and F, but notably, no 5′-end-directed cleavages occurred beyond site F (Fig. 3A, lanes 1-5). Faint cleavage of site G was apparent between the 18th and 19th nucleotides in substrate Md2 but was more distinct in substrate Md4, when this site was located between the 16th and 17th nucleotides (lanes 6-15). Adjacent cleavages beginning at site H were the most distal 5′-end-directed cleavages in substrates Md6 (between the 17th and 18th nucleotides) and Md7 (between the 16th and 17th nucleotides) (lanes 16-25). Similarly in substrates Md9 and Md10, a pair of cleavages between the 19th and 20th nucleotides or the 18th and 19th nucleotides, respectively, were the furthest 5′-end-directed cleavage sites and included site I (lanes 26-30).
As described above, we used 3′-end-labeled RNAs in the hybrid substrates to map the 5′-end-directed cleavages close to the RNA 5′-ends for M-MuLV. In general, the strongest cleavages were observed at positions corresponding to site F in substrates Md1, Md2, and Md4 (Fig. 3B, lanes 1-15). In substrates Md6 through Md10, adjacent cleavages, including site H, were the strongest cleavages near the RNA 5′-end (lanes 16-35). The locations of these cleavages ranged from between the 17th and 18th nucleotides in Md6 to between the 13th and 14th nucleotides in Md10.
The short fragment resulting from cleavage between the 8th and 9th nucleotides at site D observed with the 5′-end-labeled substrate (Fig. 3A, lane 2) could have been a secondary cleavage product. However, at least some of this product was generated independent of other cleavages, because the corresponding cleavage product was also observed using the 3′-end-labeled substrate Md1 (Fig. 3B, lanes 1-5). Limited but detectable independent cleavages also occurred between the 8th and 9th nucleotides of two other substrates, at site E in substrate Md6 and at site F in substrate Md9 (lanes 16-20 and 26-30, respectively). The relevance of these 5′-proximal cleavages to 5′-end-directed cleavage is considered under "Discussion."
Quantitative Analysis of Distance Effects on Extent of Cleavage-To better understand how the extent of cleavage at a specific site is affected by the distance of the site relative to the RNA 5′-end, we carried out quantitative analyses of the cleavage products generated with the 5′-end-labeled substrates Md1 through Md10 shown in Figs. 2 and 3. Sites F, G, and H were chosen as representative cleavages, because these sites were recognized by both HIV-1 and M-MuLV reverse transcriptases and because they are present over the relevant range of distances in the various substrates.
First, the extent of cleavage at each site was evaluated in all substrates. For HIV-1 RNase H, cleavage at site F appeared to have a bell-shaped pattern, with the highest cleavage occurring with substrate Md4 (Fig. 4A). The pattern was similar for site G, but cleavage was most abundant in substrates Md6 and Md7. The pattern for cleavage of site H was also similar, with the highest cleavage in substrates Md9 and Md10. For M-MuLV RNase H, the amount of cleavage at each site also appeared as non-overlapping, bell-shaped patterns distributed in substrates Md1 through Md10 (Fig. 4B). Cleavage of site F was maximal in substrate Md2, cleavage of site G was highest in substrates Md4 and Md6, and cleavage of site H was greatest in substrates Md6 and Md7. Interestingly, the greatest amount of cleavage for a particular site occurred in different substrates for M-MuLV and HIV-1 RNases H. For example, site F was cleaved to the greatest extent in substrate Md2 for M-MuLV RNase H and in substrate Md4 for HIV-1 RNase H (Fig. 4, A and B).
Next, we determined how the extent of cleavage at sites F, G, and H was affected by the distance from the RNA 5′-end. For HIV-1 RNase H (Fig. 4C), the plots of distance versus cleavage at each site overlapped, indicating that the overall effects of distance on the extent of cleavage were very similar. A plot of the equivalent data for M-MuLV RNase H also showed that distance from the RNA 5′-end similarly influenced the extent of cleavage at sites F, G, and H (Fig. 4D).
Preferred Nucleotides Flank RNA 5′-End-directed Cleavage Sites-We next asked if preferred nucleotides could be identified near 5′-end-directed cleavage sites, because we recently showed that preferred nucleotides are found near internal cleavage sites recognized by the M-MuLV and HIV-1 RNases H (23). This analysis required the mapping of 5′-end-directed cleavage sites on a variety of different hybrid substrates containing RNAs with recessed 5′-ends. Three RNAs derived from viral sequences were used: PPT62, containing sequence from the M-MuLV genome (23), and RNAs Md1 and Md10. To generate additional sequence diversity, several RNAs with unique sequences were made using in vitro transcription plasmids (see "Experimental Procedures"). Finally, for HIV-1 reverse transcriptase, we have included 5′-end-directed cleavage sites reported previously by other investigators in studies where the exact positions of these sites were mapped (21, 25, 29, 38-43, 45).
To generate hybrid substrates with recessed 5′-RNA ends, RNAs were 5′- or 3′-end-labeled and annealed to DNA templates. These substrates were used in RNase H cleavage assays as shown above with HIV-1 or M-MuLV reverse transcriptase, and cleavage sites were mapped to the nucleotide level (data not shown). The relative extent of cleavage at the various sites in each substrate was classified as strong, medium, or weak. As an example using substrate containing RNA Md1 and HIV-1 reverse transcriptase (Fig. 2, A and B, lanes 1-5), sites E and F were classified as strong, and site G was classified as medium. Using substrate Md1 with M-MuLV reverse transcriptase (Fig. 3, A and B, lanes 1-5), site E was classified as medium and site F was classified as strong. For both M-MuLV and HIV-1, cleavages between sites E and F represent weak sites. In our analysis, only strong and medium sites were considered. For the mapped cleavage sites observed in prior studies with HIV-1 (21, 25, 29, 38-43, 45), all of the identified RNase H cleavage sites were used.
To compare the sequences surrounding RNA 5′-end-directed cleavage sites, the cleavage was defined to occur between nucleotides −1 and +1, and the flanking sequences were aligned. These cleavage sites are presented for HIV-1 in Fig. 5 and for M-MuLV in Fig. 6. To statistically determine whether any base preferences correlated with the cleavage sites, the nucleotides from positions −10 to +4 were tabulated. The frequency of bases in the first 25 nucleotides of sequence beginning from each RNA 5′-end was calculated to determine the expected distribution of nucleotides. For a given cleavage site, the significance of any deviations from random nucleotide frequencies was determined by comparing the base distribution at each position with the expected distribution using the chi-square method ("Experimental Procedures"). The resulting chi-square values were plotted against the nucleotide positions (Fig. 7).
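The alignment and tabulation steps described above can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the procedure actually used: the function names and test sequences are hypothetical, a cleavage site is represented by the number of nucleotides 5′ of the cut, and the expected distribution is computed by pooling base counts over the first 25 nucleotides of each RNA, which is one reading of the description in the text.

```python
from collections import Counter


def flanking_bases(rna, cut_after, upstream=10, downstream=4):
    """Map positions -upstream..-1 and +1..+downstream to bases around a cut.

    cut_after is the number of nucleotides 5' of the cleaved bond, so the cut
    lies between rna[cut_after - 1] (position -1) and rna[cut_after] (position +1).
    Positions that fall outside the RNA are omitted.
    """
    flanks = {}
    for offset in range(-upstream, downstream + 1):
        if offset == 0:
            continue  # this numbering has no position 0
        index = cut_after + offset if offset < 0 else cut_after + offset - 1
        if 0 <= index < len(rna):
            flanks[offset] = rna[index]
    return flanks


def tabulate_positions(sites):
    """Count bases at each aligned position over (rna, cut_after) pairs."""
    table = {}
    for rna, cut_after in sites:
        for position, base in flanking_bases(rna, cut_after).items():
            table.setdefault(position, Counter())[base] += 1
    return table


def expected_frequencies(rnas, window=25):
    """Pooled base frequencies over the first `window` nucleotides of each RNA."""
    counts = Counter()
    for rna in rnas:
        counts.update(rna[:window])
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequence supplied")
    return {base: counts[base] / total for base in "ACGU"}


# Hypothetical sequences for illustration only (not RNAs from the paper).
example_sites = [("AAAAACGUUUUU", 6), ("CCCCCCGAAAAA", 6)]
position_table = tabulate_positions(example_sites)
```

The per-position counters produced by `tabulate_positions` are the observed distributions, and the output of `expected_frequencies` supplies the expected distribution for the chi-square comparison at each position.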
For both HIV-1 and M-MuLV, two nucleotide positions had p values < 0.01 (χ² values > 11.34) and were considered strong deviations from random. Position +1 showed a strong preference for A or U, tolerated C, and strongly disfavored G. Position −2 had a strong preference for G or C and especially disfavored U. In addition, the −4 position for HIV-1 also had a p value < 0.01 and disfavored A. Unlike the nucleotide preferences seen for internal cleavage sites (23), strong preferences for sequences further upstream were not observed (see "Discussion").
Accessibility to a 5′-End Affects RNA 5′-End-directed Cleavages-We recently showed that the 5′-end of an RNA at a nick does not allow 5′-end-directed cleavages (23). To determine the gap size required for 5′-end recognition, the RNase H cleavage pattern was compared using substrates in which the same RNA was placed at different distances downstream from PPT20, an RNase H-resistant RNA containing nucleotides −20 to −1 of the M-MuLV PPT sequence (Fig. 1) (37). The distances between PPT20 and the downstream RNAs were either a nick or a gap of 1-5 bases. For M-MuLV reverse transcriptase, 5′-end-labeled Md1 was used as the downstream RNA. In the No Upstream substrate, cleavages occurred predominantly at sites D, E, and F in Md1 (Fig. 8, lanes 1-5 and 36-40). Addition of upstream PPT20 to create a nick reduced 5′-end-directed cleavages at these sites and promoted internal cleavages at sites closer to the nick, such as sites A and B (lanes 6-10). This cleavage pattern was identical to that observed using a continuous RNA containing the same sequence but lacking the nick (23); together these data first suggested that a nick was insufficient to direct 5′-end-directed cleavages. Introduction of a 1- or 2-base gap between the upstream and downstream RNAs decreased cleavage at the A and B sites and increased cleavage at the E and F sites (Fig. 8, compare lanes 11-20 with lanes 6-10). The cleavage pattern with a 3-base gap most closely matched the No Upstream substrate (compare lanes 21-25 with 1-5 and 36-40). Interestingly, the highest extent of 5′-end-directed cleavages was observed in substrates with a gap size of 4 or 5 bases (lanes 26-35). Experiments performed using 3′-end-labeled Md1 confirmed these results (data not shown).
To test how a gap influences 5′-end-directed cleavages by HIV-1 reverse transcriptase, 5′-end-labeled Md10 was used as a downstream RNA (Fig. 9). In the No Upstream substrate, the initial cleavages in Md10 occurred at sites H and I (lanes 1-5 and 36-40). Addition of upstream PPT20 decreased cleavage at sites H and I and increased internal cleavages at site E among others (lanes 6-10). For substrate with a 1-base gap, cleavages closer to the RNA 5′-end such as site E decreased while cleavages at sites H and I slightly increased (lanes 11-15). In the 2-base and 3-base gap substrates, RNase H cleavages were most similar to those in substrate lacking upstream RNA (compare lanes 16-25 with lanes 1-5 and 36-40). Experiments performed with 3′-end-labeled Md10 and HIV-1 reverse transcriptase generated comparable results (data not shown).
RNase H assays were also carried out with M-MuLV or HIV-1 reverse transcriptase and substrates containing downstream Md10 or Md1, respectively (data not shown). In both cases and similar to the data shown in Figs. 8 and 9, the pattern of RNase H cleavages in substrates with a 2- or 3-base gap most clearly matched the No Upstream substrate.

DISCUSSION
The ability to cleave the RNA strand of hybrids formed during reverse transcription is essential for retroviral replication. RNase H cleavages positioned by an RNA 5′-end represent a very robust form of RNase H activity that likely plays an important role in general degradation of the viral genome. The data in this study demonstrate that multiple determinants affect the specificity of RNA 5′-end-directed cleavage by retroviral RNases H.
Previous work has shown that reverse transcriptase does not efficiently utilize the RNA 5′-end at a nick to position 5′-end-directed cleavages (23). In this study, we addressed how much space between the 3′- and 5′-ends of RNA is required to render the RNA 5′-end accessible for directing cleavage in the downstream RNA. By increasing the distance between an upstream RNA 3′-end and a downstream RNA 5′-end in single nucleotide increments, we found that gaps of 2 or 3 bases permit 5′-end-directed cleavages at a level comparable to that observed for a recessed, free RNA 5′-end. It seems likely that the binding of reverse transcriptase to a recessed RNA 5′-end in a hybrid involves recognition of the discontinuity in the substrate structure at the junction between the single strand and the duplex. A similar discontinuity is offered by a typical primer-template, where the recessed end is a DNA 3′ terminus rather than an RNA 5′-end. Thus, perhaps the primer-template binding cleft and in particular the primer grip region (34) of the polymerase domain positions the recessed RNA 5′-end of a hybrid for 5′-end-directed cleavage by recognizing some of the same structural features presented by a primer terminus. In substrates containing a nick or a 1-base gap, the RNA 5′-end is obscured and RNase H cleavages are limited to the internal mode of cleavage. A substrate with a gap of 2 or more bases offers a sufficient distance between the upstream and downstream RNAs to allow the access and recognition required for the 5′-end-directed mode of cleavage and, additionally, may offer contact sites in the upstream RNA that facilitate enzyme binding. During general degradation of the genome, it seems likely that gaps are important to promote 5′-end-directed cleavages.
… was used as a blunt-end hybrid, but a later study has shown that while the extent of cleavage is diminished, the locations of 5′-end-directed cleavage sites are not affected by a blunt end (40). RNA D is from previous studies (29, 32, 42). The 20-mer RNA is from Ref. 45. In the center column, the sequence surrounding each cleavage site is given, with the location of the cleavage site represented as a gap. The right column gives the position of each cleavage site counting from the 5′-end of the RNA.
To initially test how sequence might influence RNA 5′-end-directed cleavages, RNase H assays were carried out with hybrid substrates that contained portions of the same sequence located at different positions relative to an RNA 5′-end. A comparison of the cleavages in these substrates (Fig. 10) reveals several interesting features for the 5′-end-directed mode of cleavage that are shared between the reverse transcriptases of HIV-1 (arrows above sequences) and M-MuLV (arrows below sequences). First, cleavages are not restricted to a discrete distance from the RNA 5′-end. Counting from the RNA 5′-end for HIV-1, strong or medium cleavage sites were as close as between positions +13 and +14 (Md1, Md4, Md7, and Md10) and as far as between positions +20 and +21 (Md9). For M-MuLV, strong and medium cleavage sites ranged from between positions +13 and +14 (Md1, Md4, Md7, and Md10) to between positions +19 and +20 (Md9 and Md10). Second, the same substrate can be cleaved at multiple sites, and independent cleavages can be distributed over a span of up to 8 nucleotides (for example, Md9 and Md10). Third, cleavages occur at the same sites in the sequence even though these sites are located at different distances from the RNA 5′-ends. As an example for HIV-1 RNase H, site G is cleaved in substrate Md1 between the 19th and 20th nucleotides, but this site is also cleaved when located 6 nucleotides closer to the RNA 5′-end in substrate Md7 (indicated in Fig. 10). Fourth, some positions are consistently resistant to cleavage irrespective of distance from the RNA 5′-end, even when located between other strong 5′-end-directed cleavage sites. Finally, the strong and medium 5′-end-directed cleavage sites fall between the 13th and 20th nucleotides from the RNA 5′-end (Fig. 10, indicated with a gray box). These data indicate that 5′-end-directed cleavage sites are not randomly chosen and that both sequence and distance determine the cleavage positions of retroviral RNases H.
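The relationship between a fixed sequence site and its distance from the 5′-end of each substrate is simple arithmetic, sketched below in Python. The site positions in Md1 and the 5′-end positions of the Md RNAs are taken from the text; the function names are hypothetical, and the window is interpreted here as 13 to 20 nucleotides lying 5′ of the cut, which is an assumption about how the boundaries are counted.

```python
# Cleavage window reported in the text, read here as the number of
# nucleotides 5' of the cut (an interpretation, not a definition from the paper).
WINDOW_FIRST, WINDOW_LAST = 13, 20

# Nucleotides 5' of each cut in Md1 (site E is +13/+14, F is +16/+17, and so on).
SITES_IN_MD1 = {"E": 13, "F": 16, "G": 19, "H": 22, "I": 27}

# Position of the 5' terminus of each RNA relative to the -1/+1 site.
RNA_STARTS = {"Md1": 1, "Md2": 2, "Md4": 4, "Md6": 6, "Md7": 7, "Md9": 9, "Md10": 10}


def distance_from_5prime_end(site_in_md1, rna_start):
    """Number of nucleotides 5' of the cut in an RNA that begins at +rna_start."""
    return site_in_md1 - (rna_start - 1)


def in_cleavage_window(distance):
    """True when the cut lies within the 13-20 nucleotide window."""
    return WINDOW_FIRST <= distance <= WINDOW_LAST


# For each RNA, list the sites whose distance falls inside the window.
sites_in_window = {
    rna: {
        site: distance_from_5prime_end(pos, start)
        for site, pos in SITES_IN_MD1.items()
        if in_cleavage_window(distance_from_5prime_end(pos, start))
    }
    for rna, start in RNA_STARTS.items()
}
```

Falling inside the window is a necessary condition in this picture but not a sufficient one, since the extent of cleavage at a site also depends on the flanking sequence and on where within the window the site lies.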
Because some early studies suggested that sequence might influence RNA 5′-end-directed cleavages (24, 29, 30), it is somewhat surprising that this possibility has not been investigated more thoroughly until now. Most likely this is because prior studies utilized a limited number of RNAs and thus had very little variation in sequence (21, 29, 32, 38-43). Also for several studies, the precise mapping of RNA 5′-end-directed cleavage sites was often difficult or not required by the experimental design (21, 29, 32, 46). Most recently, RNA 5′-end-directed cleavages have been examined with regard to the order of cleavage that occurs on an RNA/DNA hybrid with a recessed RNA 5′-end (38-40). From these studies, it was proposed that 5′-end-directed cleavages constitute an initial or primary cleavage ~18 nucleotides from the RNA 5′-end and are followed by independent, secondary cleavages that are 8 to 9 nucleotides from the RNA 5′-end and occur at a slower rate.
Although our findings demonstrate that sequence is an important determinant of 5′-end-directed cleavages, it remains to be investigated how sequence influences secondary cleavages.
By mapping multiple 5′-end-directed cleavage sites in hybrids containing RNAs with different sequences, we observed a statistically significant bias for specific nucleotides at positions flanking these sites. For both HIV-1 and M-MuLV RNases H, A or U, but not G, was preferred at position +1, and C or G, but not U, was preferred at position −2. In addition, HIV-1 RNase H disfavored A at position −4. Precisely how the preferred nucleotides facilitate RNase H cleavage is unknown, but it has been proposed that the structures associated with some hybrid sequences might affect cleavage specificity (35). Several features influenced by sequence are the base composition, the trajectory of the helical axis, and the width of the major and minor grooves (35, 47, 48). Although it is possible that the preferred nucleotides flanking a cleavage site reflect interactions in the DNA strand instead of or in addition to the RNA strand, the co-crystal structure of HIV-1 reverse transcriptase and a PPT-containing RNA/DNA hybrid reveals potential hydrogen-bonding contacts between the +1 RNA base and Arg-448, and between the −2 RNA base and Gln-475 (35) that might contribute to the observed sequence preferences. Our observation that an A is disfavored at the −4 position does not correspond to any base contacts in the co-crystal structure, but there are phosphate contacts from −4 to −9 in the DNA strand and the RNase H primer grip region of the HIV-1 enzyme (35). Notably, the −4 position in the co-crystal structure falls within an unusual structural deformation consisting of a region of unpaired bases that may be PPT-specific and consequently may not reflect the binding situation for hybrids containing other sequences. Importantly, our data suggest that hybrid substrates containing the preferred nucleotides at the most critical positions of −2 and +1 may facilitate the generation of co-crystals of reverse transcriptase with bound substrate in future crystallographic analyses.
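The position-by-position tally that underlies an alignment of sequences flanking cleavage sites can be illustrated with a short Python sketch. It is not the analysis code used in this work. The function name `flank_counts`, the example sequences, and the indexing convention are assumptions made for illustration: a site is given as n, the 1-based number of the nucleotide immediately 5′ of the cleaved bond, so that position −1 is nucleotide n and position +1 is nucleotide n + 1. The sketch only counts nucleotides and performs no statistical test.

```python
from collections import Counter


def flank_counts(sites, positions=(-4, -2, 1)):
    """Count nucleotides at positions flanking mapped cleavage sites.

    sites: iterable of (rna, n) pairs, where rna is the RNA sequence written
    5' to 3' and n is the 1-based number of the nucleotide on the 5' side of
    the cleaved bond (assumed convention: that nucleotide is position -1).
    positions: offsets relative to the cleavage site; there is no position 0.
    """
    counts = {p: Counter() for p in positions}
    for rna, n in sites:
        rna = rna.upper()
        for p in positions:
            # 0-based index: position -1 maps to n - 1, position +1 maps to n
            idx = n + p if p < 0 else n + p - 1
            if 0 <= idx < len(rna):  # skip positions that run off the RNA
                counts[p][rna[idx]] += 1
    return counts


# Hypothetical sequences and sites, not data from this study.
example_sites = [("CCCCGCAUCC", 6), ("GGGUCGUAAA", 6)]
counts = flank_counts(example_sites)
for p in sorted(counts):
    print(f"{p:+d}", dict(counts[p]))
```

With a larger set of mapped sites, the per-position counts could then be compared against the overall base composition of the substrates to look for over- or under-represented nucleotides.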
Very recently, the first co-crystal structure of a prokaryotic RNase H with the hybrid substrate positioned in the enzyme active site has been reported (49). The authors note that four residues (Asn-77, Asn-106, Gln-134, and Asn-105) donate hydrogen bonds to bases close to the scissile phosphate, and one residue (Arg-195) has a sequence-specific contact with the DNA strand at the +5 G residue. This latter residue is homologous to Arg-557 in HIV-1 reverse transcriptase, but thus far our sequence analyses of internal and 5′-end-directed cleavage sites have not suggested a preference for C in the RNA strand at the +5 position (Ref. 23 and data not shown).
Both the 5′-end-directed and internal modes of RNase H cleavage prefer similar nucleotides at positions +1 and −2 for M-MuLV and positions +1, −2, and −4 for HIV-1 (this work and Ref. 23). Internal cleavages also exhibit nucleotide preferences further upstream from the cleavage site (at positions −6 and −11 for M-MuLV and positions −7, −12, and −14 for HIV-1) (23), but no equivalent positions were identified from the statistical analyses of 5′-end-directed cleavage sites. The absence of obvious nucleotide preferences further upstream of 5′-end-directed cleavage sites may derive from the nature of the upstream sequences found in many of our hybrid substrates, where the first 8–10 nucleotides of RNA sequence are dictated by the in vitro transcription vectors. Although these sequences may not offer the optimal nucleotides that would otherwise be preferred for 5′-end-directed cleavage, the specificity determinant contributed by binding the RNA 5′-end may substitute or compensate for additional upstream preferences. This consideration suggests that preferred nucleotide positions located most proximal to the cleavage site are the most important in determining the positioning for both modes of RNase H cleavage. As would be predicted from the overlap between the preferred nucleotide specificity of 5′-end-directed and internal cleavage sites, it appears that the 5′-end-directed sites are a subset of the possible internal sites that are only cleaved when they meet certain distance constraints. In support of this prediction, all of the 5′-end-directed cleavage sites we have observed thus far are also observed as internal cleavage sites on longer RNA/DNA hybrids that do not require a nucleic acid end for positioning (Ref. 23 and data not shown). For example, the predominant 5′-end-directed sites in substrates Md1 through Md10 are also recognized as internal cleavage sites in longer RNAs containing the sequence +1 through +29 beyond the M-MuLV PPT origin (Figs. 1 and 2) (23). As indicated in Fig. 11A, only some of the possible internal cleavage sites in the +1 to +29 sequence (indicated by arrows above the upper sequence) are recognized as 5′-end-directed sites in substrates Md1 and Md10 (indicated by vertical dashed lines). Importantly, the internal sites located close to the 5′- or 3′-ends of substrates Md1 or Md10 are not cleaved efficiently by the RNA 5′-end-directed mechanism because of the restrictions imposed by the distance from the RNA 5′-end.
FIGURE 8. Influence of gap size on 5′-end-directed cleavage of Md1 by M-MuLV reverse transcriptase. 5′-End-labeled Md1 was used as a substrate without upstream RNA (No Upstream), as a substrate with PPT20 annealed immediately upstream (Nick), or as substrates with gaps of 1–5 bases between the 3′ terminus of upstream PPT20 and the 5′-end of Md1 (1- to 5-base gap). Substrates were incubated with M-MuLV reverse transcriptase and analyzed as described in Fig. 2. In Md1, cleavage site A is +2/+3, site B is +5/+6, and the other sites are as described in Fig. 3.
FIGURE 9. Influence of gap size on 5′-end-directed cleavage of Md10 by HIV-1 reverse transcriptase. 5′-End-labeled Md10 was used as a substrate without upstream RNA (No Upstream), as a substrate with PPT20 annealed immediately upstream (Nick), or as substrates with gaps of 1–5 bases between the 3′ terminus of upstream PPT20 and the 5′-end of Md10 (1- to 5-base gap). Substrates were incubated with HIV-1 reverse transcriptase and analyzed as described in Fig. 2. Products resulting from cleavage at specific positions in Md10 are indicated at the left, as described in Figs. 2 and 3. The smallest cleavage product (most distinct in lanes 7–10 below site E) corresponds to cleavage between the first and second nucleotides of Md10.
The internal mode of RNase H cleavage primarily recognizes sites according to sequence (23). Our findings that an accessible RNA 5′-end, proper sequence context, and an appropriate distance from an RNA 5′-end are all determinants of RNA 5′-end-directed cleavages suggest a revision in the model for how retroviral RNases H carry out 5′-end-directed cleavages. In this model, 5′-end-directed cleavage sites are not chosen randomly at a measured distance from the RNA 5′-end. Instead, to be eligible for cleavage, a site must conform to the preferred nucleotide sequence and must fall between a minimum and maximum distance from an accessible RNA 5′-end. We term this acceptable range the "cleavage window." A bar graph depicting how frequently each position relative to the RNA 5′-end is utilized in the strong and medium cleavage sites in Figs. 5 and 6 reveals the breadth of the cleavage window (Fig. 11B). This graph indicates that 5′-end-directed sites can occur at all positions within a cleavage window between the 13th and 20th nucleotides from the RNA 5′-end. As illustrated in Fig. 11A, the observed pattern of cleavages in substrates Md1 and Md10 results from recognition of only those cleavage sites positioned within this window.
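The eligibility rule of the cleavage window model can be restated as a short Python sketch. This is an illustration under stated assumptions rather than a predictive tool. The function name `eligible_sites`, the treatment of each nucleotide preference as an all-or-none filter, the reduction of accessibility to a single flag, and the indexing convention (a site is n, the 1-based number of the nucleotide immediately 5′ of the cleaved bond, taken here as position −1) are all assumptions. The default window of 13 to 20 follows the strong and medium sites described above. Observed cleavage extents are graded, so the sketch only flags candidate sites.

```python
PREFERRED_PLUS1 = {"A", "U"}    # preferred at position +1 (both enzymes)
PREFERRED_MINUS2 = {"C", "G"}   # preferred at position -2 (both enzymes)


def eligible_sites(rna, enzyme="HIV-1", window=(13, 20), end_accessible=True):
    """List candidate 5'-end-directed cleavage sites in an RNA (5' to 3').

    A returned value n means cleavage between nucleotides n and n + 1,
    counted from the RNA 5'-end (assumed convention, see lead-in).
    """
    if not end_accessible:      # simplification: an inaccessible end gives no sites
        return []
    rna = rna.upper()
    first, last = window
    sites = []
    for n in range(first, last + 1):
        if n < 4 or n + 1 > len(rna):
            continue            # positions -4 or +1 would fall outside the RNA
        plus1, minus2, minus4 = rna[n], rna[n - 2], rna[n - 4]
        if plus1 not in PREFERRED_PLUS1 or minus2 not in PREFERRED_MINUS2:
            continue            # sequence context does not match the preferences
        if enzyme == "HIV-1" and minus4 == "A":
            continue            # A at -4 is disfavored by HIV-1 RNase H only
        sites.append(n)
    return sites


# Hypothetical test sequences, not substrates from this study.
motif_in_window = "C" * 13 + "A" + "C" * 11
motif_too_close = "C" * 4 + "A" + "C" * 20
a_at_minus4 = "C" * 9 + "A" + "C" * 3 + "A" + "C" * 11

print(eligible_sites(motif_in_window))
print(eligible_sites(motif_too_close))
print(eligible_sites(a_at_minus4, enzyme="HIV-1"))
print(eligible_sites(a_at_minus4, enzyme="M-MuLV"))
```

The same sequence motif is flagged when it lies inside the window and ignored when it lies too close to the RNA 5′-end, which is the behavior the model describes for internal sites near the ends of substrates Md1 and Md10.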
Md10 are aligned by the RNA 5′-ends to compare the positions of 5′-end-directed cleavage sites. In each sequence, the extent of cleavage at a site is indicated as strong (large arrows) or medium (small arrows) for HIV-1 reverse transcriptase (above) or M-MuLV reverse transcriptase (below). As described under "Discussion," the range of the closest and furthest independent 5′-end-directed cleavage sites is indicated by the positions of the bordering nucleotides from the RNA 5′-end, the position of site G in substrates Md1 and Md7 is indicated, and the gray box highlights nucleotide positions +13 and +20 that include the range of distances where the 5′-end-directed cleavages occur.
Our quantitative data with substrates containing 5′-end-labeled RNAs (Fig. 4) also support the view that recognition of 5′-end-directed cleavage sites is optimal within a defined cleavage window. When the extent of cleavage was defined as significant if the cleavage product represented ≥5% of all bands, cleavages were optimal when the sites were located between the 13th and 18th nucleotides for HIV-1, and between the 13th and 17th nucleotides for M-MuLV in these substrates. Because these experiments used substrates with labeled RNA 5′-ends, these data were biased somewhat against the more distal cleavages in the cleavage window. This cleavage window model accounts for the range of distances observed in RNA 5′-end-directed cleavage assays. Although cleavage sites are often 15–19 nucleotides from the RNA 5′-end (for example, see Refs. 25, 27, 29, and 50–53), cleavages from 10 to 21 nucleotides from the RNA 5′-end have been reported in this study and others (29–32). These cleavage distances are broader than the 17 or 18 nucleotides that separate the polymerase and RNase H active sites in HIV-1 reverse transcriptase, based upon co-crystal structures of the enzyme with duplex DNA or RNA/DNA hybrids in which a DNA 3′-end fits in the polymerase active site (33–35).
It is possible that contacts between the polymerase domain and a DNA 3′-primer terminus more tightly anchor the RNase H domain on the substrate, whereas an RNA 5′-end allows more flexibility in positioning. This would predict that the cleavage window for DNA 3′-end-directed cleavages is smaller than that for RNA 5′-end-directed cleavages, and experiments are underway to test this possibility. It is also possible that the co-crystal structures reflect the distance between the active sites when the enzyme initially binds the substrate, and that, in the case of RNA 5′-end-directed cleavages, the enzyme can slide on the substrate after the initial contacts with the RNA 5′-end are released or adjusted.
Secondary RNase H cleavages have been proposed to occur independently of primary, 5′-end-directed cleavages and result from an initial binding and more extensive sliding of reverse transcriptase on its substrate (38–40). For M-MuLV, independent cleavages in this report were observed as close as between the 8th and 9th nucleotides from the RNA 5′-end (for example, site D in Md1). At this time, we cannot conclude whether these cleavages result from a secondary cleavage or a 5′-end-directed mechanism. It may be that 5′-end-directed and secondary cleavages are difficult to distinguish at sites closer to the RNA 5′-end, and more experiments are required to determine the nature of these cleavages.
After minus-strand synthesis, extensive degradation of the RNA genome is required to facilitate plus-strand synthesis and strand transfers, and much of this general degradation appears to proceed by the polymerization-independent mode of RNase H activity (20, 22, 25, 28). The remaining RNA template that requires further degradation will likely contain nicks generated during polymerization of the minus-strand. Because the RNA 5′-end at a nick does not efficiently promote 5′-end-directed RNase H cleavages, this form of RNase H activity would be restricted to initially act at gaps, but whether gaps of sufficient size are generated during minus-strand synthesis remains to be determined. Internal cleavages do not require positioning by an RNA or DNA end, so at least initially the polymerization-independent activity of RNase H can carry out internal cleavages. If recognition sites are located closely together and cleaved by the internal mode, this would create gaps that are sufficient for 5′-end-directed cleavages. Because 5′-end-directed cleavages are kinetically favored over internal cleavages (54), the multitude of possible internal cleavage sites combined with robust 5′-end-directed cleavages would together facilitate rapid and thorough degradation of the retroviral genome.