A Small 2′-OH- and Base-dependent Recognition Element Downstream of the Initiation Site in the RNA Encapsidation Signal Is Essential for Hepatitis B Virus Replication Initiation*

Hepatitis B viruses replicate through reverse transcription of an RNA intermediate. In contrast to retroviral reverse transcriptases, their replication enzyme, P protein, does not use a nucleic acid primer but initiates DNA synthesis de novo from within an RNA stem-loop structure called ε. A short DNA oligonucleotide is copied from ε and covalently attached to P protein, and then synthesis is arrested. The information for initiation site selection and synthesis arrest must be contained in the structure of the P protein/ε complex. Because P protein activity depends on cellular chaperones, this complex can as yet only be generated by in vitro translation of duck hepatitis B virus P protein in rabbit reticulocyte lysate; functional interaction with its cognate RNA element Dε can be monitored by the covalent labeling of P protein during primer synthesis. Combining this in vitro priming reaction and a set of chimeric RNA-DNA Dε analogues, we found that only five ribose residues in the 57-nucleotide stem-loop were sufficient to provide a functional template; these are a single residue in the template region and the two base pairs at the tip of the lower stem. The base identities in the very same region are essential as well. The presence of this 2′-OH- and base-dependent determinant shortly downstream of the initiation site suggests a mechanism that can account for both initiation site selection and programmed primer synthesis arrest.

Hepatitis B virus (HBV), the causative agent of B-type hepatitis in humans, is the type member of the hepadnaviruses. These small DNA-containing viruses replicate by reverse transcription of an RNA intermediate, the pregenomic RNA (pgRNA). Their replication mechanism is unique among the retroid elements (1-4). It involves selective packaging of the pgRNA by the reverse transcriptase, called P protein, into nucleocapsids (5), and initiation of reverse transcription without a nucleic acid primer (6). Specific recognition of pgRNA over other viral or cellular RNAs is effected by the interaction between P protein and a 5′-proximal stem-loop structure on pgRNA that is unique to this RNA species (7, 8). This RNA element of about 60 nucleotides in length therefore serves both as encapsidation signal (ε) and as replication origin for synthesis of the first DNA strand (Fig. 1A). The enzyme copies part of a bulged region in the RNA stem-loop into a short 3- or 4-nucleotide oligodeoxynucleotide primer (9-11). In this priming reaction, the first nucleotide becomes covalently linked to a Tyr residue in the protein (12-14). The covalent complex then translocates to a 3′-proximal RNA element called DR1*, and the primer is extended into a complete (−)-DNA strand from which an incomplete (+)-DNA copy is then made. These subsequent reactions, occurring inside the capsid, eventually yield the characteristic partially double-stranded DNA genome found in extracellular hepadnavirions.
Thus far, it has not been possible to reconstitute a functional RNA-protein initiation complex from purified components. The major reason is that unlike other reverse transcriptases, with the possible exception of telomerase (15), the hepadnaviral P protein depends on several cellular chaperones for activity, probably in a fashion similar to steroid hormone receptors (16, 17). Most data on the general replication mechanism have therefore been worked out genetically using transfection of mutant HBV genomes into suitable cell lines. This approach, initially exploiting the ability of ε to mediate RNA packaging into viral capsids, revealed some of the principal aspects such as the importance of the overall stem-loop structure of ε (Fig. 1B) for its function (11, 18-20). The only system that allows for more detailed biochemical studies of replication initiation is in vitro translation of the related duck hepatitis B virus (DHBV) P protein in rabbit reticulocyte lysate (6), which provides the necessary chaperones (21, 22). For unknown reasons, the enzyme of the human virus is inactive even in this system. DHBV P protein, by contrast, displays authentic activity as shown by its ability to specifically initiate DNA synthesis from the corresponding RNA element Dε (6, 23) upon provision of dNTPs. Using radiolabeled dNTPs, the covalent attachment of the DNA primer to the protein provides a specific test for productive interaction between P protein and Dε RNA. The high sensitivity of this priming assay compensates for the minute amounts of protein (in the range of 10 ng per 25-μl translation reaction). Advantageously, in vitro priming does not depend on the Dε element being present on the same RNA that serves as template for P protein translation, as is the case in the authentic situation. Rather, short transcripts containing just the stem-loop structure are also accepted by the enzyme (23, 24).
Using this trans-priming reaction as a functional assay (Fig. 2) combined with secondary structure analyses of in vitro transcribed Dε RNAs, we have previously shown that free Dε RNA, despite its significantly different primary sequence from HBV ε, adopts a similar bipartite stem-loop structure (Fig. 1B) (25) and that this overall structure is important for P protein recognition (26, 27). Mutations grossly altering the structure, in particular of the bulge region, prevent binding of P protein and consequently priming. Alternatively, point mutations at some strategic positions, e.g. U-2590 in the loop, inhibit binding despite conservation of the authentic structure. Finally, formation of a functional P protein/Dε initiation complex is accompanied by a major structural alteration in the apical stem, induced by P protein binding (Fig. 1B) (23), and RNA variants unable to undergo this conformational change are not accepted as templates for initiating reverse transcription (23). Hence, overall structure, base identities at specific positions in the upper stem, and RNA flexibility contribute in a complex fashion to a productive interaction between Dε and P protein.
Given the size of the RNA element and the probably multiple protein components, direct structural analyses of the hepadnaviral initiation complex are not in sight. We therefore exploited the in vitro translation system to further characterize the Dε/P protein interaction. As a reverse transcriptase, P protein can use both RNA and DNA as templates. Hence, we first asked whether (and if so to what extent) the RNA nature of Dε is important for its interaction with the P protein. To this end, we used the trans-priming assay to functionally analyze synthetic chimeric Dε molecules in which increasing portions of the ribose phosphate backbone were substituted by their deoxyribose counterparts. Surprisingly little of the entire structure, i.e. only the template region and the two base pairs at the tip of the lower stem, was required to consist of RNA. Based on these results, we then tested whether the nature of the bases at these positions was essential. For this, we used in vitro transcribed all-ribo molecules in which increasing parts of the lower stem of Dε were replaced by either the corresponding regions from HBV ε, or simply by G-C pairs. Convergent with the first data set, we found that conservation of two to three authentic base pairs at the tip of the lower stem was necessary and sufficient for a productive interaction with P protein. Hence, whereas most of the lower stem simply needs to be base paired, the presence of 2′-hydroxyls, and of specific bases at the tip of the lower stem underlying the bulge, is of utmost importance. This indicates that this small region contains a principal determinant for the interaction with the hepadnaviral reverse transcriptase, most likely because it forms, together with the bulge, a distinct three-dimensional structure and/or is involved in direct protein contacts. The structural and functional implications of these findings are discussed in the light of recently solved structures of other RNA-protein complexes.

EXPERIMENTAL PROCEDURES
In Vitro Synthesis of Dε RNA-All-ribo Dε molecules were obtained by in vitro transcription (T7 MEGAshortscript kit; Ambion) according to the manufacturer's recommendations. After in vitro transcription, template DNA was digested with DNase I, and the RNA was extracted with phenol, precipitated with isopropanol, and resuspended in TE buffer (10 mM Tris-Cl, 1 mM EDTA (pH 7.5)). Plasmid pDε1 (25) served as template for the synthesis of wild-type Dε RNA used as reference in experiments with RNA/DNA chimeras. The resulting transcript is 76 nucleotides in length and contains DHBV sequence from position 2557 to 2624. Wild-type Dε RNA used as reference in experiments with sequence variants of Dε was transcribed from plasmid pBDεwt containing DHBV16 sequence from position 2520 to 2652 cloned into the SalI and EcoRV sites of pBluescript II SK(−) (Stratagene). Plasmids coding for Dε variants H/D0, H/D1, H/D2, H/D2.5, H/D3, and synD were obtained by polymerase chain reaction-based mutagenesis using pBDεwt as template, and their identities were confirmed by DNA sequencing. Plasmids were linearized with EcoRV and transcribed with T7 RNA polymerase yielding RNA transcripts of 170 nucleotides in length.

FIG. 1.
The wavy line depicts the pgRNA, a capped and polyadenylated transcript comprising a unit length genome plus a terminal redundancy, because of which a second copy of the sequence containing the encapsidation signal (Dε) and the DR1 is present at the 3′-end. pgRNA serves first as bicistronic mRNA for core (small spherical objects) and P protein (sphere marked P). P protein binding to the 5′-copy of Dε initiates nucleocapsid assembly (left) and initiation of (−)-DNA synthesis (right). Part of a bulged region within Dε is copied into a short DNA oligonucleotide that becomes covalently linked to the enzyme. After translocation of the complex to the 3′-proximal DR1*, the oligonucleotide is used as primer for a complete (−)-DNA strand. The associated chaperones that are essential for these reactions and (+)-DNA synthesis are not shown. The general process is essentially the same in human HBV replication. B, secondary structures of HBV and DHBV encapsidation signals. The overall structures of HBV ε and Dε are similar, with a characteristic central bulge separating the lower and upper stem. Encircled nucleotides indicate positions that were deleted or changed as indicated in the synthetic RNA-DNA Dε analogues. The template region is denoted by larger lettering; the DNA primer is shown on the right (Dε bound), covalently bound via its 5′-end to a Tyr residue in P protein. For Dε, it is known that primer synthesis requires a major structural alteration by P protein binding that involves opening of the upper stem. Gray ovals show RNA regions that are protected from Pb2+ cleavage in the protein complex, whereas the black oval indicates a region that becomes hypersensitive to single-strand specific reagents.

FIG. 2.
In vitro reconstitution of (−)-DNA initiation. DHBV P protein is in vitro translated in reticulocyte lysate programmed with an artificial mRNA that only contains the P open reading frame but neither the 5′- nor the 3′-copy of Dε. Instead, Dε RNA or appropriate analogues are supplied as separate molecules. In the presence of dNTPs, functional Dε analogues are used as templates for DNA primer synthesis. As a consequence of a productive RNA-protein interaction, this trans-priming reaction will result in covalent labeling of the P protein if one of the dNTPs is labeled in the α-position. Labeling of the protein can subsequently be monitored by SDS-polyacrylamide gel electrophoresis and autoradiography.

Chemical Synthesis and Purification of RNA/DNA Chimeras-Oligonucleotides were synthesized on an ABI synthesizer using standard phosphoramidite chemistry; for ribonucleotides, 2′-O-t-butyldimethylsilyl ether protection was used (28). Thymidines in DNA moieties within chimeric Dε molecules were substituted by 2′-deoxyuridines to avoid negative effects of the additional 5-methyl group. After removal of the protecting groups, the chimeric oligonucleotides were precipitated using n-butanol and subsequently purified by electrophoresis through denaturing 12% polyacrylamide gels. Full-length products were detected by UV-shadowing, cut out, eluted, precipitated with isopropanol, and resuspended in TE buffer. Preparations were analyzed for purity and concentration by electrophoresis on 2% agarose gels, and photometrically assuming that 1 A260 nm equals 40 μg/ml RNA. All chimeric Dε analogues, 53 nucleotides in length, contain the authentic Dε sequence with a shortened lower stem from DHBV16 position 2561 to 2615 (Fig. 1B). Typical yields of a 200-nmol synthesis were in the range of 100-150 μg of oligonucleotide.
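The photometric conversion and the yield figures quoted above can be put into molar terms with a little arithmetic. The following is a minimal sketch, not the authors' calculation: the function names are hypothetical, and the average mass of about 330 g/mol per nucleotide is a rule-of-thumb assumption that is not stated in the text.

```python
# Rough bookkeeping for oligonucleotide concentration and synthesis yield.
# ASSUMPTION: ~330 g/mol average mass per nucleotide (rule of thumb, not from the text).
AVG_NT_MASS_G_PER_MOL = 330.0
# Conversion factor used in the text: 1 A260 unit corresponds to 40 ug/ml RNA.
UG_PER_ML_PER_A260 = 40.0


def a260_to_ug_per_ml(a260, dilution_factor=1.0):
    """Convert an A260 reading into a mass concentration in ug/ml."""
    return a260 * UG_PER_ML_PER_A260 * dilution_factor


def ug_to_nmol(mass_ug, length_nt):
    """Convert a mass of oligonucleotide (ug) into an amount in nmol."""
    molar_mass = length_nt * AVG_NT_MASS_G_PER_MOL  # g/mol
    # ug divided by g/mol gives umol; multiply by 1000 to obtain nmol.
    return mass_ug / molar_mass * 1000.0


def percent_yield(mass_ug, length_nt, scale_nmol):
    """Recovered amount as a percentage of the nominal synthesis scale."""
    return 100.0 * ug_to_nmol(mass_ug, length_nt) / scale_nmol


if __name__ == "__main__":
    # 53-nucleotide chimeras, 100-150 ug recovered from a 200-nmol synthesis.
    for mass in (100.0, 150.0):
        print(mass, "ug:", round(ug_to_nmol(mass, 53), 2), "nmol,",
              round(percent_yield(mass, 53, 200.0), 1), "% of scale")
```

Under these assumptions, 100-150 μg of a 53-nucleotide chimera corresponds to a few nanomoles, i.e. a recovery of roughly 3-4% of the nominal 200-nmol scale.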
In Vitro Trans Priming Assay-DHBV P protein was in vitro synthesized in a coupled in vitro transcription/translation system (TNT T7 quick coupled transcription/translation system; Promega) from pT7AMVpolΔ3′ε. This plasmid contains the complete P protein open reading frame and was obtained by deletion of the 3′-proximal copy of Dε in pT7AMVpol (12) in order to exclude cis priming events. Transcription/translation reactions, typically in a total volume of 50-100 μl, were performed according to the manufacturer's protocol. After 60 min of incubation at 30°C, the samples were split into 10-μl aliquots, Dε RNA or synthetic analogues were added, and the samples were incubated for a further 60 min at 30°C to form Dε RNA/P protein complexes. For the detection of protein priming activity, 2.5 μCi of [α-32P]dATP (3000 Ci/mmol) in 1 volume of 2× priming buffer (12) was added, and the samples were incubated for 60 min at 37°C. Aliquots of each sample were analyzed for radioactively labeled P protein by reducing SDS-polyacrylamide gel electrophoresis on 7.5% polyacrylamide gels. Signal intensities were quantified using a phosphorimager system (BAS1500; Fuji).
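Two small calculations are implicit in this assay: the molar amount of labeled dATP added per reaction, and the expression of quantified band intensities relative to a reference lane set at 100, as done in the figure legends. The sketch below illustrates both; the function names and the raw intensity values are hypothetical and serve only as an example, not as the authors' procedure or data.

```python
def label_amount_pmol(activity_uci, specific_activity_ci_per_mmol):
    """Amount of labeled nucleotide (pmol) from its activity and specific activity."""
    # uCi divided by Ci/mmol gives 1e-6 mmol, i.e. nmol; multiply by 1000 for pmol.
    return activity_uci / specific_activity_ci_per_mmol * 1000.0


def normalize_signals(raw, reference, scale=100.0):
    """Express raw band intensities relative to a reference lane set to `scale`."""
    ref = raw[reference]
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return {lane: scale * value / ref for lane, value in raw.items()}


if __name__ == "__main__":
    # 2.5 uCi of [alpha-32P]dATP at 3000 Ci/mmol, as stated in the text.
    print(round(label_amount_pmol(2.5, 3000.0), 3), "pmol dATP")

    # HYPOTHETICAL phosphorimager readings (arbitrary units), for illustration only.
    raw = {"RNA wt, high": 8000.0, "RNA wt, low": 2000.0, "variant, high": 4000.0}
    print(normalize_signals(raw, reference="RNA wt, high"))
```

With the values given in the text, the labeled dATP amounts to somewhat less than 1 pmol per reaction, which is consistent with the need for a sensitive readout.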

RESULTS
Design of Artificial Dε RNA/DNA Chimeras-The authentic Dε stem-loop is 57 nucleotides in length (Fig. 1B), a size beyond the range currently accessible with most conventional RNA synthesis equipment. Although a test RNA of 35 nucleotides could be isolated using phosphoramidite chemistry and t-butyldimethylsilyl protection for the 2′-OH groups, attempts to obtain a completely synthetic Dε RNA did not yield detectable amounts of full-length material (data not shown). Therefore, to minimize length for more efficient synthesis, some nonessential nucleotides were deleted in the further Dε constructs: in the upper stem and lower stem, U-2596 and the single unpaired U-2610 were removed (indicated in Fig. 1B by encircled nucleotides). Both deletions have been reported to be compatible with the packaging function of Dε in transfected cells (20). To directly monitor their effect on priming, a corresponding T7 transcript (Dε RNA ΔU; Fig. 3B) was analyzed, at 50 and 500 nM concentrations, side-by-side with wild-type Dε RNA in the in vitro priming reaction. At these concentrations, wild-type Dε RNA leads to a dose-dependent labeling of P protein, whereas the signal saturates above 1 μM Dε RNA (24, 25). At both concentrations, the Dε RNA ΔU displayed only a 2-fold reduction in priming efficiency (Fig. 3A). To further shorten the sequence, the bottom base pair (positions 2560 and 2616) was deleted, and the penultimate base pair was changed from G-U to G-C (see Fig. 1B). These modifications are known not to affect the Dε/P protein interaction in vitro (26). The first chimeric molecule, variant DRbl1, was designed to contain ribonucleotides in the loop, plus in the apical bulge region (Fig. 3B). Synthesis of this molecule, with 22 ribonucleotides in a total of 53 nucleotide positions, yielded clearly detectable amounts of full-length material that was functionally active in the priming assay (see below). Accordingly, all additional synthetic derivatives were based on this sequence.

FIG. 3.
Reaction products were separated by SDS-polyacrylamide gel electrophoresis and visualized using a phosphorimager. The position of 32P-labeled P protein (32P-Pol) is indicated on the left. The exact nature of the slowly migrating material extending to the top of the gel, sometimes observed at high concentrations of Dε RNA or an analogue, is not known. Its amount appears overrepresented in this exposure, which was chosen to make weak bands also visible. Numbers below the lanes indicate the relative signal intensities of the 32P-P protein bands determined using the phosphorimager, with the value obtained for wild-type Dε RNA at the high concentration set at 100. RNA wt refers to an in vitro transcript corresponding to Dε as shown in Fig. 1B, DNA dU to a synthetic all-DNA analogue with the deoxythymidine residues replaced by deoxyuridine. The other variants are shown in B. B, schematic representation of the shortened Dε variant RNA ΔU and its chimeric derivatives DRb1 and DRbl1. Ribonucleotides are shaded; the remaining positions consist of DNA.

All-DNA Analogues of Dε Are Functionally Inactive but Ribonucleotides in the Bulge and Loop Regions Restore Activity-As a first step toward analyzing the functional importance of the ribose moieties within Dε, we analyzed the performance in trans-priming assays of an unmodified DNA analogue of Dε, Dε DNA dT. It corresponded to DHBV nucleotides 2557 to 2617 and hence contained the complete, authentic stem-loop sequence. Neither at 100 nM nor at 1000 nM could any priming signal be detected (data not shown). The same negative result was obtained with the homologous oligonucleotide Dε DNA dU, which contained deoxyuridine instead of deoxythymidine (Fig. 3A). Hence, complete substitution of all ribose moieties is incompatible with a productive P protein interaction, and the 5-methyl groups of deoxythymidine are not responsible for this inactivity.
Next we asked whether partial restoration of the RNA backbone would reestablish functional activity. Because RNA-specific non-Watson-Crick interactions are frequently found in single-stranded regions, we first focused on the bulge and its vicinity, and on the apical loop region. The corresponding chimeras, DRbl1 and DRb1 (Fig. 3B), when tested at 50 and 500 nM concentrations, both stimulated covalent labeling of P protein to essentially the same levels as the reference RNA Dε RNA ΔU. This indicated that the principal elements for a productive interaction with P protein reside mainly in the RNA bulge region and its vicinity, whereas in the apical region, the 2′-hydroxyls are of minor, if any, importance.
The Tip of the Lower Stem and the Bulge Harbor the Minimal Ribose-dependent Element for Productive Interaction with P Protein-In DRb1, the bulge region, the opposite unpaired U and the two adjacent base pairs in the lower and the upper stem consisted of RNA (Fig. 3B). To further narrow down the essential ribonucleotide positions, we used the next set of variants, DRb2 through DRb5 (Fig. 4B), all of which contain a complete ribonucleotide bulge. In DRb2, only the flanking residues on the left side are composed of ribonucleotides. In DRb3 and DRb4, two and one bulge-proximal ribo base pair(s), respectively, in the lower stem are maintained. In DRb5, the two bulge-proximal base pairs in the upper stem consist of ribonucleotides.
Priming assays with these variants at 50 and 500 nM concentration (Fig. 4A) revealed that three of the four chimeras (DRb2, -4, and -5) were severely impaired in their activity. However, DRb3 produced signals with an intensity comparable to that of DRb1. This indicated the presence of an important ribose-dependent recognition element at the tip of the lower but not the base of the upper stem. Because variant DRb4 with a single ribo base pair at the tip of the lower stem was only marginally active, this element involves the two tip base pairs of the lower stem but not the unpaired U (compare DRb4 with -3).
Combined with Two Ribo Base Pairs at the Tip of the Lower Stem, a Single Ribonucleotide at the Initiation Site Is Necessary and Sufficient for Productive Interaction with P Protein-In all variants analyzed thus far, the bulge consisted completely of RNA. To test whether all of these ribonucleotides are essential, we used the next series of variants in which the two ribo base pairs at the tip of the lower stem were maintained as RNA, and various parts of the bulge itself were replaced by DNA (Fig. 5B).
Variant DRb35 with a complete DNA bulge produced only a very marginal priming signal (about 5% of DRb3), emphasizing the need for the presence of one or more ribonucleotides in the bulge. Restoration of ribonucleotides in the 5′-proximal half of the bulge did not rescue priming activity (DRb32). By contrast, reintroduction of ribonucleotides in the 3′ part of the bulge, i.e. the actual template region, restored activity. Both DRb31 and DRb33 led to a similar labeling of P protein as the all-ribo bulge chimera DRb3, and even the chimera DRb34 with a single ribonucleotide at the 3′-terminal position of the bulge, where DNA synthesis most likely initiates, produced a signal of about one-third the intensity seen with Dε RNA ΔU. This positive effect on priming efficiency of the single additional 2′-hydroxyl clearly establishes the essential role of 2′-hydroxyls in the template region. Notably, because the priming assays were performed using [α-32P]dATP, incorporation of the label into the protein must have occurred in a U-templated fashion. This suggests that once initiated at a ribonucleotide, P protein is able to proceed on a DNA template during primer synthesis, although the efficiency may be somewhat reduced (compare the signal intensities for variants DRb33 and DRb34). Together, these data demonstrate that as few as 5 ribonucleotides in the entire Dε RNA element are sufficient for productive interaction with P protein.

FIG. 4.
In the presence of an RNA bulge, two ribonucleotide base pairs at the tip of the lower Dε stem are required for efficient priming. A, priming assays with the chimeric Dε analogues DRb2 to DRb5. Based on the priming competence of variant DRb1, the ribonucleotide content in the regions adjacent to the bulge was further reduced in this set of variants. Priming assays were semiquantitatively evaluated as described in the legend to Fig. 3, with the value obtained for DRb1 at the high concentration set at 100. Note that, in contrast to the other variants, DRb3 gave a priming signal comparable to that of DRb1. B, schematic representation of the different ribonucleotide contents of variants DRb2 to DRb5. Only the bulge region and its vicinity is shown; the remaining positions consisted of DNA as in DRb1.

Functional Dε/P Interaction Also Depends on the Base Identities in the Ribonucleotide Pairs at the Tip of the Lower Stem-Next, we asked whether the nature of the bases at the above defined ribose-dependent positions was also important. The experimental strategy involved the generation of a series of variant Dε in vitro transcripts containing mutated lower stems. To maintain base pairing while at the same time changing the sequence, we first replaced the entire lower Dε stem with the corresponding part from HBV ε (variant H/D0; see Fig. 6B). In accord with a previous report (24), this complete domain swap abolished the template function of Dε (Fig. 6A). We then progressively reintroduced Dε-specific base pairs and monitored the priming activities of the corresponding variants. Based on the above described results and the previous finding that the lower stem can be shortened from its base to between 5 and 7 base pairs without losing P protein binding competence (26), we concentrated on the base pairs at the tip of the lower stem. The corresponding variants H/D1, H/D2, and H/D3 contain one, two, and three authentic Dε base pairs, respectively, underneath the bulge; in variant H/D2.5 the left-hand C residue in the third pair from the top was changed back from C-G to U-G (see Fig. 6B). As a final control, we used the variant synD in which the 8 bottom base pairs of Dε are replaced by 5 G-C pairs (Fig. 6B).
The corresponding in vitro transcripts were used in trans-priming assays at a 500 nM concentration as described above (Fig. 6A). Variant H/D1, with a single Dε-specific base pair, was as inactive in priming as H/D0 RNA. However, a slight but reproducible increase in signal intensity (to about 5% of the wild-type signal) was observed with variant H/D2 containing 2 authentic Dε pairs. The activity was significantly increased (to about 20% of wild-type level) when the left-hand C residue in the third pair was changed back from C to U (H/D2.5), and to about 50% of wild-type activity by replacement with the original U-A pair from Dε (H/D3). Hence, two Dε base pairs are essential for at least a basal priming activity, and three Dε base pairs are essential for a substantial priming activity. The crucial role of the base identities in these three pairs but not in the base of the lower stem was further corroborated by variant synD, which displayed about 80% of the wild-type priming activity. That it performed even better than H/D3 might be due to the facts that the two G-C pairs underneath the three tip pairs correspond to authentic Dε sequence and that the fourth and/or fifth pair may contribute to priming efficiency. Hence, we conclude that, overlapping with the 2′-hydroxyls in the two top base pairs, the base identities of the three base pairs at the tip of the lower stem are crucial for productive interaction with P protein.

FIG. 5. Five ribonucleotides in the entire Dε sequence are necessary and sufficient for priming. A, priming assays with the bulge variants DRb31 to DRb35. Starting from variant DRb3, the ribose content within the bulge region was further reduced in this set of variants. Priming assays were performed as described in Fig. 3. The signal intensity obtained with DRb3 at high concentration was set to 100. B, schematic representation of variants DRb31 to DRb35. Note that an all-DNA bulge is incompatible with significant priming (DRb35), whereas a substantial signal intensity is restored by the presence of a single ribonucleotide in the template region (DRb34).

FIG. 6. The recognition element at the tip of the lower stem is not only 2′-hydroxyl-dependent but also base-dependent. A, priming assays with all-ribo Dε variants with mutated lower stems. All priming assays were performed at 500 nM concentration of in vitro transcribed RNAs, and the signal intensities were normalized to that of Dε RNA wt. B, schematic representation of the lower stem variants. All RNAs are based on wild-type Dε, except that various parts of the lower stem were replaced by the corresponding lower stem segments from HBV ε (darker shading, variants of the H/D series) or an artificial GC-rich sequence (lighter shading, variant synD). Note that the fourth and fifth base pairs under the bulge in synD fortuitously correspond to authentic Dε sequence.

An Essential Role for the G Residue in the U-G Base Pair at the Tip of the Lower Dε Stem-One prediction from the above described results was that, in a reverse experiment, replacement of the authentic base pairs at the tip of the lower Dε stem should interfere with the priming function. Indeed, a corresponding variant, −1A/U (Fig. 7B), containing just the tip A-U base pair of HBV ε in an otherwise completely Dε-derived context, showed a very low priming activity (about 5% of wild-type level), even at a 500 nM concentration (Fig. 7A). The same was true for variant −1C/U. By contrast, approximately 50% of the wild-type activity was seen with variant −1C/G having the right-hand G residue from Dε. To confirm the special role of this G residue suspected from this result, we finally combined it with an A residue on the left. The corresponding variant −1A/G exhibited the same activity as the wild-type RNA. Hence, the presence of either U (wild-type Dε), C, or A at the left-hand position of the tip base pair is compatible with significant priming activity, regardless of Watson-Crick pairing, whereas replacement of the authentic right-hand G by U strongly decreases priming. This establishes a dominant discriminating role of this single G residue in a productive interaction with DHBV P protein.

DISCUSSION
The unique mechanism of replication initiation in hepatitis B viruses requires that the information for exact start site selection by P protein be defined by the specific interaction between the ε RNA template and the reverse transcriptase. Because of the size and multicomponent nature of the initiation complex, direct biophysical analyses of its structure are currently not possible. Using biochemical methods instead, we have here defined a small 2′-OH- and base-dependent determinant at the tip of the lower Dε stem that, in concert with a single ribose residue in the bulge, is necessary and sufficient for efficient hepatitis B virus replication initiation (Fig. 8). Unusually, this specific element is located downstream of the initiation site.
Structural Implications for the P Protein/Dε Interaction-A comparison with the limited number of other protein-RNA complexes that have been investigated by similar biochemical or, in a few cases, crystallographic methods indicates that the interaction of P protein with Dε bears some striking similarities to several proteins that recognize RNA hairpins. One example is the coat protein of bacteriophage MS2, which specifically binds to a 19-nucleotide stem-loop, the operator RNA (29). Deoxy substitution experiments defined a ribose requirement in the upper two base pairs of the stem, and at a single position in the four-member loop (30). As proposed and later directly confirmed by X-ray crystallography, the 2′-hydroxyl of this loop residue is involved in hydrogen bonding contacts to the protein (31, 32). The single ribose residue in the template region of the Dε bulge required for efficient priming may also be in direct contact with P protein, but a more subtle effect on its template quality rather than its physical binding ability cannot be excluded at present. It should also be noted that, in contrast to the MS2 operator and the spliceosomal RNAs discussed below, there are probably few base-specific contacts in the single-stranded Dε bulge. In human HBV, for instance, a substantial variety of bulge sequences is compatible with the template and the encapsidation functions of ε (11, 33). The ribose moieties at the tip of the operator stem have been suggested to maintain the helix in the A-form conformation observed in the crystal structure (30). For some model compounds, it has been shown that a single ribo-base pair at the end of a double helix can indeed induce an A-like conformation in the entire molecule (34, 35). Hence, the essential ribose residues at the tip of the lower Dε stem could serve a similar, general function. The simultaneous importance of base identities is, however, compatible with a more specific role. First, the ends of RNA helices provide a rich source for specific recognition because the distortions caused by the adjacent single-stranded regions allow for contacts that are not possible in an ideal A-form RNA with its deep and narrow major groove (36). Second, there are several instances in which exactly the ends of a helix harbor important specificity determinants enabling highly similar RNAs to be accurately discriminated by very similar proteins. The cognate RNA elements of the spliceosomal proteins U1A and the U2B″-U2A complex (U1 snRNA hairpin II and U2 snRNA hairpin IV) share not only a common secondary structure but also almost identical sequences in the single-stranded loops. A key role for selective recognition is played by the loop-closing base pairs, C-G in the U1 and the noncanonical U-U pair in the U2 snRNA (37, 38). The underlying stems contribute to the specific structure adopted by the closing base pairs; in addition, the rigidly structured 3′-part of the loop in the U2 snRNA allows for direct interactions between the RNA stem and the second protein component in the complex. Hence, we propose that the tip of the lower Dε stem, and probably at least part of the bulge, harbor a similarly important specificity determinant. This view is supported by the crucial importance of the right-hand G residue at the tip Dε base pair, the counterpart of which in U1 snRNA makes specific contacts to the protein. Such a discriminating role of G-2605 in Dε could plausibly explain why exchange of this single nucleotide for the HBV-specific U residue abolishes functional interaction with DHBV P protein. The base dependence at the next base pairs further down in the lower stem implies that this part of the structure is also involved in protein contacts.
Functional Implications for Replication Initiation in Hepatitis B Viruses-In addition to the specific ribose- and base-dependent recognition element at the tip of the lower Dε stem defined in this study, the upper stem and the loop are also important for a productive interaction, as established by previous mutational analyses (27). We have not found any ribose requirement in this apical region, which, overall, is also relatively tolerant toward base alterations, provided they do not grossly alter the conformation of the free RNA and are compatible with formation of the largely unpaired structure induced by P protein binding. There are, however, some positions at which a single base exchange can completely abrogate the interaction with P protein, for instance a U to C mutation at position 2590 in the loop (27). This demonstrates that the apical Dε region also contains specific recognition elements. Hence, the actual replication initiation site is bracketed between specific interaction sites for the protein.
Two distinct features of hepadnaviral replication are de novo initiation and a programmed synthesis arrest. Only after the covalent complex of P protein with the self-made primer is translocated to DR1* does DNA synthesis resume. All other reverse transcriptases, like DNA polymerases in general, require a nucleic acid primer. Therefore, the known structures from retroviral systems, for instance HIV-1 RT primer/template complexes (39,40), are of limited comparative value. The only other known exception is the unusual Mauriceville retroplasmid reverse transcriptase, which can initiate DNA synthesis from the penultimate residue at the 3′-end of the RNA template. Its positioning over the initiation site is mediated by a tRNA-like structure (41), a mechanism closely resembling that employed by viral RNA-dependent RNA polymerases. Unfortunately, little structural information is available about the corresponding initiation complexes. The other large class of polymerases capable of de novo initiation comprises the DNA-dependent RNA polymerases. The recent high resolution structure of an initiation complex formed by their simplest representative, the single subunit enzyme from bacteriophage T7 (42), has revealed how this protein achieves specific initiation site selection without a primer. Three features essential for proper positioning are (i) specific tight interactions with distinct upstream sequences within the double-stranded T7 promoter, (ii) melting of the template by site-specific contacts with the template strand immediately before the initiation site to create a transcription bubble, and (iii) generation of a distinct structure in the template strand such that the first template nucleotide, dC, can optimally base pair when the NTP binding pocket is occupied with GTP (Fig. 9).
FIG. 9. Comparison of primer-less synthesis initiation by hepadnaviral P proteins and T7 RNA polymerase. The model drawn for hepadnaviral replication initiation incorporates several previous data sets: initiation at the C residue marked +1 (9), covalent linkage to a Tyr residue in the terminal protein domain of P protein (12,13), melting of the upper stem upon protein binding and the detectability of Pb2+ footprints on the right former half-stem (23), and the importance for priming of individual residues in the former loop region (e.g. U-2590 (27)); the data in this paper demonstrate, in addition, the presence of a specific ribose- and base-dependent determinant at the tip of the lower stem. Further details of the protein/RNA interaction are not yet known. The model shown for T7 RNA polymerase is a vastly simplified representation of x-ray crystallographic data of a complex of the polymerase with a synthetic T7 promoter duplex (42). Shaded ovals symbolize protein regions, and the dashed boxes symbolize the active sites of the enzymes. Important features for site-specific initiation in the T7 system are extensive interactions of the protein upstream of the initiation site and with the template strand to form a transcription bubble, and a distinct orientation of the +1 template nucleotide and the first NTP. The lack of specific interactions further downstream allows the polymerase to continuously copy any downstream template sequence unless abortive cycling is artificially induced. Hepadnaviral P proteins also have upstream interaction sites, but in addition they have a specific recognition element a few nucleotides downstream of the +1 position. The model proposes that both elements are important for positioning the enzyme but that, furthermore, the downstream element is involved in the programmed synthesis arrest.
Although the hepadnaviral template is RNA rather than DNA, there are several parallels and one major difference. Like the phage enzyme, P protein is directed selectively to the template region by specific recognition elements, in this case RNA- instead of DNA-specific ones. The bulged structure of the template region resembles a transcription bubble already in free Dε RNA; however, most likely, the constraints imposed on the initiation nucleotide immediately 5′ to the double-stranded upper stem need to be released before the template can be accommodated into the active site to allow pairing with a dGTP in the dNTP binding pocket. This would be fully accounted for by the observed opening of the upper stem upon P protein binding (23). The difference is the presence in Dε of a separate, specific recognition element downstream of the initiation site. There is no evidence for specific downstream interactions of the phage enzyme except for the first few nucleotides immediately following the initiation site, and once specifically initiated, it processively copies virtually any template. By contrast, the arrest of primer synthesis after a few nucleotides is an integral part of hepadnaviral replication. Hence, we propose that the recognition element at the tip of the lower Dε stem is important for both initiation site selection and programmed primer synthesis arrest. The drastic increase in priming efficiency observed upon reintroduction of at least one ribonucleotide into an all-DNA bulge Dε variant suggests that the template region itself also contributes to the high specificity of the initiation event. In addition, it corroborates the distinct natures of the initiation and elongation processes because the hepadnaviral P proteins are perfectly able to use a DNA template during (+)-strand DNA synthesis.
Clearly, much further information will be required to firmly establish these proposals. An attractive option made possible by our results is the site-specific introduction of reactive groups into synthetic, chimeric Dε oligonucleotides, which should allow the tracing, by cross-linking, of the path of the RNA molecule through the P protein.