A Novel Long Distance Base-pairing Interaction in Human Immunodeficiency Virus Type 1 RNA Occludes the Gag Start Codon

The 5′-untranslated region (5′-UTR) is the most conserved part of the HIV-1 RNA genome, and it contains regulatory motifs that mediate various steps in the viral life cycle. Previous work showed that the 5′-terminal 290 nucleotides of HIV-1 RNA adopt two mutually exclusive secondary structures, long distance interaction (LDI) and branched multiple hairpin (BMH). BMH has multiple hairpins, including the dimer initiation signal (DIS) hairpin that mediates RNA dimerization. LDI contains a long distance base-pairing interaction that occludes the DIS region. Consequently, the two conformations differ in their ability to form RNA dimers. In this study, we have presented evidence that the full-length 5′-UTR also adopts the LDI and BMH conformations. The downstream 290–352 region, including the Gag start codon, folds differently in the context of the LDI and BMH structures. These nucleotides form an extended hairpin structure in the LDI conformation, but the same sequences create a novel long distance interaction with upstream U5 sequences in the BMH conformation. The presence of this U5-AUG duplex was confirmed by computer-assisted RNA structure prediction, biochemical analyses, and a phylogenetic survey of different virus isolates. The U5-AUG duplex may influence translation of the Gag protein because it occludes the start codon of the Gag open reading frame.

Human immunodeficiency virus type 1 (HIV-1) virions contain two full-length positive-stranded RNA molecules as their genome. The full-length RNA not only serves as the viral genome but also functions as an mRNA that encodes the Gag and Gag-Pol polyproteins. The highly structured 5′-UTR is the most conserved part of the HIV-1 genome and is involved in several steps of the viral replication cycle (1). Distinct functions have been assigned to individual sequence and/or structure motifs (presented in different colors in Fig. 1A). The 5′-UTR consists of an upstream repeat (R) region that recurs at the 3′-terminus of the HIV-1 genome and that comprises TAR and the polyadenylation (poly(A)) signal. The well characterized TAR hairpin mediates transcription activation by binding the viral Tat protein and the cellular protein cyclin T (2-10). The poly(A) hairpin inhibits premature polyadenylation of the nascent RNA by masking the AAUAAA polyadenylation signal (11,12). The U5 region is located downstream of the R region and contains two important signals for reverse transcription, the primer activation signal (PAS) and the primer binding site (PBS) (13,14). Additional essential motifs are located further downstream in the 5′-UTR. These include the RNA dimer initiation signal (DIS), the major splice donor site (SD) that is required for the generation of subgenomic mRNAs, the packaging signal (Ψ) that is required for the assembly of infectious virus particles, and a hairpin motif that includes the Gag start codon (15-23).
The secondary structure of the HIV-1 5′-UTR has been studied extensively, and a variety of structure models have been proposed (1,18,19). Recently, the 5′-UTR was shown to fold into alternative secondary structures (Fig. 1A) (24). The ground state conformation is formed by a long distance interaction of the poly(A) and DIS regions and is termed LDI. The alternative, metastable conformation is a branched structure with multiple hairpins and is termed BMH. The two conformations differ in their ability to form RNA dimers. The DIS sequence is masked in the LDI conformation by long distance base pairing with upstream sequences, thus preventing dimer formation. In contrast, the DIS hairpin with the palindromic loop sequence is folded in the BMH structure. Thus, BMH RNA is able to engage in a kissing-loop interaction with the DIS palindrome of a second RNA molecule, thereby forming loose dimers (15,25-30). Heat treatment or incubation with the HIV-1 nucleocapsid (NC) protein triggers the formation of a tight dimer with extended inter-strand base pairing (15,25,28,31). Interestingly, the NC protein also mediates the switch from LDI to BMH (24). This RNA switch mechanism may allow regulation and appropriate timing of the different 5′-UTR functions. For instance, the HIV-1 genomic RNA should be translated into the Gag and Gag-Pol proteins prior to RNA dimerization and packaging into assembling virions.
The LDI and BMH structures have been studied in transcripts that comprise the 5′-terminal 290 nucleotides (nts) of the HIV-1 leader RNA. Because the SD site is located at nucleotide position 289, these results suggest that both genomic and subgenomic HIV-1 mRNAs can fold the LDI conformation. The 5′-UTR of the genomic HIV-1 RNA consists of 335 nucleotides up to the AUG start codon of the Gag open reading frame (ORF). In this work, we studied the folding of the downstream leader region 290-368 that contains the SD and Ψ signals and part of the Gag ORF. Computer-assisted folding and a phylogenetic survey of the leader RNA of different primate lentiviruses revealed a novel long distance interaction between U5 sequences and the Gag initiation codon: the U5-AUG duplex. The proposed U5-AUG long distance interaction was analyzed by mutational analysis, polyacrylamide gel electrophoresis, and RNA structure probing. The U5-AUG long distance interaction is formed exclusively in the BMH structure and not in the alternative LDI fold. The duplex is of particular interest because it occludes the AUG start codon of the Gag ORF, and it therefore has the potential to be involved in regulation of mRNA translation.

EXPERIMENTAL PROCEDURES
RNA Secondary Structure Prediction-Computer-assisted RNA secondary structure predictions were performed using the Mfold version 3.0 algorithm (32,33) offered by the MBCMR Mfold server (mfold.burnet.edu.au/). Standard settings were used for all folding jobs (37°C and 1.0 M NaCl, with a 5% suboptimality range). Folding was performed with sequences comprising nucleotides 1-368 of the genomic RNA sequence of the wild-type (wt) and mutant HIV-1 LAI RNA. Phylogenetic studies were based on MFold data obtained with 500-nucleotide leader fragments of the primate lentiviral genomes.
Constructs-For mutation of the HIV-1 leader RNA, we used the plasmid Blue-5′LTR (34). This pBluescript-derived construct contains the XbaI-ClaI fragment of the infectious pLAI clone, including the 5′-LTR, the complete 5′-UTR, and part of the Gag ORF (-454/+376). Mutations were created by a standard PCR mutagenesis protocol. For construction of the s1 mutation, oligonucleotide primers TA007 (5′-CCC 76 AAGCTTGCCTTGAGTGCTTCAAGTAGTGTGCACCCATCTGTTGTGTGACTCTGG 130 -3′) and AD-GAG (complementary to position 442-462 of the HIV-1 genome) were used in a standard PCR reaction. For the w1 and w2 mutations, we used the forward primers TA009 (5′-CCC 76 AAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGTTTGTCTGTTGTGTGACTCTGG 130 -3′) and TA008 (5′-CCC 76 AAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGGAAAGTCTGTTGTGTGACTCTGG 130 -3′). The mutated nucleotides are underlined, and the nucleotide positions of the HIV-1 sequence are indicated in superscript. The sequence of the PCR products was confirmed by sequencing. The mutant PCR products were digested by HindIII and ClaI and cloned into the Blue-5′LTR vector. The XbaI-ClaI fragments were subsequently cloned into pLAI-R37, a derivative of the full-length infectious clone pLAI (35). The mutant proviral constructs were designated pLAI-s1, -w1, and -w2. Transfection of the SupT1 cell line was performed by electroporation, and CA-p24 levels were determined as described previously (13,36).
FIG. 1. Overview of the HIV-1 5′-UTR organization and structure of the wild-type and mutant U5-AUG duplex. A, top, organization of the genomic 5′-UTR with the regulatory motifs is indicated by colored boxes. The two segments that form the long distance base-pairing interaction (U5-AUG duplex) are indicated in red. The Gag initiation codon is marked by an asterisk. Middle, traditional secondary structure model of the genomic 5′-UTR that highlights the hairpin structures and the regulatory motifs (1). Bottom, the alternative LDI and BMH structures of the genomic 5′-UTR. The U5 and AUG segments are single-stranded in the BMH fold and are now proposed to form the U5-AUG duplex. B-D, base pairing of the wild-type and mutant U5-AUG duplexes. Nucleotide positions are indicated. Mutated nucleotides are indicated in bold. The thermodynamic stability is indicated at the right (ΔG in kcal/mol).

In Vitro Transcription and RNA Dimerization-pLAI, pLAI-s1, -w1, and -w2 plasmids were used as template in a PCR reaction with primers T7-2 (corresponding to position 1-18 of the HIV-1 genome with an upstream T7 RNA promoter sequence) and R:A368-A347 (complementary to 368-347 of the HIV-1 genome). The PCR products were ethanol-precipitated and used for in vitro transcription by T7 RNA polymerase with [α-32P]dCTP according to the manufacturer's protocol (MEGAshortscript T7 transcription kit, Ambion, Inc.). Transcription reactions were stopped by addition of formamide-containing loading buffer and applied to 5% denaturing polyacrylamide gels. Gel slices containing the radiolabeled transcript were excised and soaked in TBE buffer (90 mM Tris borate, 2 mM EDTA) overnight at room temperature to elute the RNA. The RNA was ethanol-precipitated and dissolved in water. Equal amounts of RNA were heat-denatured and slowly renatured in the presence of dimerization buffer L (40 mM NaCl, 0.1 mM MgCl2, 10 mM Tris-HCl, pH 7.5). Aliquots were analyzed on polyacrylamide gels in 0.25× TBE (22.5 mM Tris borate, 0.5 mM EDTA) and 0.25× TBM (22.5 mM Tris borate, 0.1 mM MgCl2), either with a formamide-containing buffer or non-denaturing loading buffer. Gels were dried and applied to a Storm PhosphorImager. We used the computer program ImageQuant 5.0 (Amersham Biosciences) to quantify the RNA signals. The dimerization yield was determined by dividing the amount of dimer by the total amount of RNA (dimer plus monomer).
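The dimerization yield defined above is a simple ratio of quantified band signals. A minimal sketch in Python, assuming the dimer and monomer signals have already been exported from the imaging software; the function name and the example intensities are hypothetical and are not values from this study:

```python
def dimerization_yield(dimer_signal: float, monomer_signal: float) -> float:
    """Return the fraction of RNA in dimer form: dimer / (dimer + monomer).

    Both arguments are band intensities in the same arbitrary units.
    """
    if dimer_signal < 0 or monomer_signal < 0:
        raise ValueError("band signals must be non-negative")
    total = dimer_signal + monomer_signal
    if total == 0:
        raise ValueError("total RNA signal must be positive")
    return dimer_signal / total


# Hypothetical band intensities (arbitrary units), for illustration only.
example = dimerization_yield(dimer_signal=300.0, monomer_signal=700.0)
print(f"{example:.0%}")  # prints 30%
```

Because the yield is normalized to the total signal in each lane, it can be compared between lanes that were loaded with somewhat different amounts of RNA.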
RNA Structure Probing-pLAI and pLAI-s1 plasmids were used as templates in a PCR reaction with primers T7-2 and TA015 (complementary to 442-462 of the HIV-1 genome). The PCR products were ethanol-precipitated and used for in vitro transcription with the Ambion MEGAshortscript T7 transcription kit. Transcripts were DNase I-treated, phenol-extracted, ethanol-precipitated, and dissolved in water. The RNA samples were heat-denatured, followed by addition of sodium cacodylate (pH 7.0) and MgCl2 to final concentrations of 100 and 1 mM, respectively. The RNA (10 μg) was treated at room temperature with 2 μl of kethoxal for 10 min, with 1 μl of dimethylsulfate (DMS) for 5 min, or mock-treated. The reactions were stopped by addition of 50 μg of Escherichia coli tRNA. The RNA was ethanol-precipitated and dissolved in 22 μl of water, and 4 μl was used in a primer extension assay with 5′-end-labeled oligonucleotide primers.

RESULTS
Identification of the U5-AUG Long Distance Interaction-Previous work by Huthoff and Berkhout (24) showed that the HIV leader RNA is able to form two mutually exclusive secondary structures. In the ground state structure, the leader RNA adopts the LDI conformation that is based on an interaction between the poly(A) and DIS regions. In the presence of the viral NC protein, the LDI conformation switches to the BMH conformation that presents the poly(A) and DIS hairpins. Studies thus far have focused on transcripts that comprise the 5′-terminal 290 nucleotides of the HIV-1 leader RNA. In this study, we have analyzed the RNA folding of the complete 5′-UTR. Using the MFold computer program, we identified a novel long distance interaction that includes the start codon of the Gag ORF. This base-pairing possibility occurs between nucleotides 105-115 in the U5 region and 334-344 surrounding the AUG initiation codon and is termed the U5-AUG duplex. These two sequence elements are marked in the linear presentation of the 5′-UTR and the LDI and BMH structures (Fig. 1A). The duplex consists of 11 consecutive base pairs, including four G-U base pairs (Fig. 1B). The MFold results indicate that formation of the U5-AUG duplex occurs exclusively in BMH-like structures and not in the LDI conformation (results not shown).
To test for the presence of the U5-AUG duplex, we designed mutants that either strengthen or weaken the base-pairing interaction (Fig. 1C). The duplex is stabilized in the s1 mutant by substitution of three G-U base pairs by one G-C and two A-U base pairs. The duplex is destabilized in the w1 and w2 mutants by replacing the central C-G base pairs either by U-G base pairs or by A-G mismatches, respectively. We first set out to determine the dimerization properties of the wt and mutant transcripts on a non-denaturing gel. Radiolabeled transcripts of the genomic RNA (nts 1-368) were synthesized in vitro, incubated under RNA dimerization conditions, and analyzed on gels (Fig. 2A). RNA monomers and dimers were detected for all transcripts in TBE and TBM gels. The most noticeable observation is that the s1 transcript with the stabilized U5-AUG duplex migrates faster in TBM gels than the wt transcript. Formamide-denatured samples were included as control (indicated above the lanes), and the remarkable migration of the s1 transcript is lost upon denaturation. The s1 dimer also migrates faster than the wt dimer in the TBE gels, but the fast migrating s1 monomer is observed as a diffuse band. Apparently, Mg2+ in the gel is required to stabilize the U5-AUG duplex in s1 monomers. The results of two experiments were quantified to calculate the level of RNA dimerization for the wt and mutant transcripts (Fig. 2B). The TBM gel shows both dimer types (loose and tight dimers), whereas only tight dimers are detected on TBE gels. The small difference in dimerization efficiencies in TBE versus TBM gels is therefore likely because of the presence of loose dimers on the latter gel type. The mutant transcripts s1, w1, and w2 dimerize more efficiently than the wt transcript, independent of the presence of Mg2+. Apparently, all mutations in the U5 motif result in elevated levels of RNA dimerization, which may be because of their destabilizing effect on the LDI conformation.

TABLE I. Secondary structure probing of the wt and s1 mutant leader RNA. Reactivity of wt and s1 RNA to kethoxal (G-specific) and DMS (A- and C-specific) was estimated and classified into five categories: +++ = highly reactive, ++ = reactive, + = moderately reactive, +/- = marginally reactive, - = not reactive. The sequences that constitute the U5-AUG duplex are indicated by outlined boxes, and the Gag initiation codon is indicated in bold. The sequence substitutions in the s1 RNA are indicated in italics. s indicates reverse transcription stops.
To test whether the fast migration of the s1 transcript is caused by stabilization of the U5-AUG interaction, we created a set of double mutants (Fig. 1D). The downstream segment 334-343 of the U5-AUG duplex was substituted by sequences that disrupt or weaken base pairing. The three central C-G base pairs were opened in the AUG3 mutant, and nearly all base pairs were disrupted in the AUG10 mutant. The destabilizing mutations were introduced both in the wt and s1 mutant transcripts. The wt and mutant transcripts were subjected to non-denaturing gel electrophoresis (Fig. 3A). Most importantly, opening of the U5-AUG duplex in the s1-AUG3 and s1-AUG10 mutants corrects the unusual migration of the s1 transcript. The AUG3 and AUG10 mutations have no effect on the migration of the wt transcript. These results confirm that formation of the U5-AUG duplex in transcript s1 induces a conformation in the HIV-1 leader RNA that migrates relatively fast during gel electrophoresis. We also quantified the dimerization efficiencies of this set of mutants (Fig. 3B). The wt transcript shows a moderate increase in dimerization efficiency upon introduction of the AUG3 or AUG10 mutations (from 30 to 34% dimers). The s1 transcript shows increased dimerization (60% dimers), and this effect is countered by the AUG3 or AUG10 mutations (47% dimers). Thus, the increased dimerization efficiency of the s1 transcript is caused, at least partially, by stabilization of the U5-AUG duplex in the BMH context.
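The duplex mutants described above are defined by which positions are kept as Watson-Crick pairs, converted to G-U wobble pairs, or opened into mismatches. The following Python sketch shows one way such a tally could be made for any two antiparallel segments. The sequences used are short toy examples chosen for illustration and are not the HIV-1 U5 or AUG segments; the function and variable names are likewise hypothetical.

```python
from collections import Counter

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_duplex(upstream_5to3: str, downstream_5to3: str) -> list[str]:
    """Label each position of an ungapped antiparallel RNA duplex.

    Both strands are given 5'->3'; the downstream strand is reversed so
    that position i of the upstream strand faces its pairing partner.
    """
    up = upstream_5to3.upper()
    down = downstream_5to3.upper()[::-1]
    if len(up) != len(down):
        raise ValueError("segments must have equal length (no bulges)")
    labels = []
    for pair in zip(up, down):
        if pair in WATSON_CRICK:
            labels.append("WC")
        elif pair in WOBBLE:
            labels.append("GU")
        else:
            labels.append("mismatch")
    return labels


# Toy duplex (assumed sequences, for illustration only).
toy_wt = classify_duplex("GGUCCAGU", "AUUGGGCC")
# Toy variant with the two central C residues replaced by A (opens two pairs).
toy_open = classify_duplex("GGUAAAGU", "AUUGGGCC")
print(Counter(toy_wt))
print(Counter(toy_open))
```

A tally of this kind only describes base-pair composition; it does not replace a thermodynamic calculation such as the ΔG values reported for the duplexes in Fig. 1.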
New HIV-1 RNA Structure Models for the Full-length 5′-UTR-We next set out to determine the secondary structure of the wt and s1 mutant transcript (1-462) by RNA structure probing. Because the fast migrating s1 structure was only visible in the presence of Mg2+ (Fig. 2), the transcripts were heat-denatured and refolded in the presence of Mg2+. The transcripts were treated with limiting amounts of kethoxal or DMS and subsequently used as template for reverse transcription with several antisense DNA primers. The cDNA products were analyzed by denaturing gel electrophoresis. The complete set of probing data is listed in Table I. To facilitate the discussion of this complex data set, we will first present the new secondary structure models in Fig. 4. The wt RNA is folded in the ground state LDI conformation, in which the poly(A) and DIS regions (marked orange and pink) are base paired in a long distance interaction that extends the stem of the PBS domain. The downstream region 282-352 folds an extended stem-loop structure with three internal loops and a GGAG loop (marked yellow). The top of this extended hairpin is, in fact, the previously described Ψ or SL3 hairpin that is required for viral RNA packaging (37). We termed the extended hairpin ΨE. The SD site (marked gray) is located within an internal loop of ΨE. The Gag initiation codon (marked by an asterisk) is located in the central internal loop and the adjacent stem segment of ΨE. The downstream Gag sequences (nts 358-367) are possibly engaged in long distance base pairing with nucleotides 60-67 in the R region directly downstream of TAR. This interaction is termed the R-Gag duplex. In contrast, the s1 mutant RNA folds the BMH structure that exposes both the poly(A) and DIS hairpins. The downstream sequences in s1 RNA fold the SD hairpin and the short version of the Ψ hairpin, and the leader domain is closed by the U5-AUG duplex (105-115 pairs with 334-344).
The structures shown in Fig. 4 are consistent with the MFold analyses. The LDI conformation with the extended PBS stem and the extended ΨE hairpin is the most stable structure adopted by the wt RNA. The BMH folding with the multiple hairpins (poly(A), DIS, SD, and short Ψ) and the novel U5-AUG duplex is the most stable structure adopted by s1 RNA. Apparently, the metastable BMH folding is facilitated by stabilization of the U5-AUG interaction. We previously demonstrated that the BMH fold can also be triggered by stabilization of the poly(A) or the DIS hairpin (24). Only a few leader RNA motifs keep the same structure during the LDI to BMH switch: the TAR hairpin (nts 1-57, marked green), the upper primer activation signal/primer binding site domain (nts 116-239, marked lilac and blue), and the short Ψ hairpin (nts 305-331, marked yellow). The constitutive folding of the TAR and PBS domains in the LDI and BMH structures was described previously (24). Apparently, these structures fold autonomously, suggesting that their biological function is independent of the LDI/BMH switch.
Structure Probing-The structure probing data of the wt and s1 RNA are presented to highlight the differences between the LDI and BMH structures. There are three regions that differ significantly in accessibility to the single strand-specific reagents kethoxal and DMS in the two transcripts. The first region is segment 105-115 of the U5-AUG duplex in which the s1 mutations were introduced (Fig. 5A). G106 and G108 are accessible to kethoxal in the wt transcript, whereas G106 and A108 are not sensitive to kethoxal and DMS in the s1 transcript. Apparently, these nucleotides are base-paired in the s1 transcript. Interestingly, the control primer extension reaction yields two major stop products on the s1 RNA template at positions U118 and U120 (marked s in Table I). Because the wt transcript has an identical sequence, it is likely that the RT enzyme is stopped by a structure that is specific for the s1 template. Apparently, the RT enzyme stopped three and five nucleotides before reaching the U5-AUG duplex. The second region that shows differential s1-wt reactivity concerns the sequences flanking the Gag initiation codon (Fig. 5B). Purines 332-336 are exclusively accessible to kethoxal and DMS in the wt transcript, indicating that these nucleotides are single-stranded. The third region that exhibits major probing differences is domain 235-242 (Fig. 5C). This sequence is completely sensitive to kethoxal and DMS in the s1 transcript, whereas it is only partially sensitive in the wt transcript. Together, these results support the folding of the U5-AUG interaction in the s1 mutant transcript. As a result, nucleotides 240-242 become single-stranded exclusively in the BMH fold of the s1 transcript (Figs. 4 and 5C). These nucleotides are paired to nucleotides 113-115 in the LDI conformation of the wt transcript.
The poly(A) and DIS regions also react differently in the wt and s1 transcripts (Fig. 5D). All five A residues of the poly(A) signal AAUAAA (nts 73-78) are equally accessible to DMS in the wt transcript, confirming that the poly(A) signal is single-stranded as in the LDI structure. In contrast, A73 and A74 are less exposed to DMS than A76-A78 in the s1 transcript, indicating that the poly(A) hairpin of the BMH conformation is formed. We previously used this differential reactivity within the poly(A) signal to differentiate between the LDI and BMH structures (24). Several nucleotides in the DIS region (264 and 274-276) are more exposed in wt RNA compared with s1 RNA, confirming the LDI fold of wt RNA (Fig. 5E). In contrast, A263 is exclusively DMS-sensitive in the s1 transcript, consistent with the folding of the DIS hairpin in the BMH structure. These combined results confirm that the wt transcript adopts the LDI conformation as the ground state structure and the s1 mutations force the RNA into the alternative BMH fold.
Slight differences in reactivity between the two transcripts are also observed for positions 66-68, 274-305, and 356 (Table I). For instance, G290 and G292 are more reactive and G298 is less reactive in s1 RNA. These differences led to the proposed folding of the SD hairpin in the BMH structure and the ΨE hairpin and R-Gag duplex in the LDI conformation (Fig. 4). The nucleotides in the bottom stem segment of ΨE and in the R-Gag duplex are moderately accessible to kethoxal/DMS (Table I), suggesting that these RNA structures are metastable.
Phylogenetic Analysis of U5-AUG Duplex-We have shown that the s1 mutant folds the U5-AUG duplex as part of the BMH fold. The U5-AUG interaction is not present in the LDI fold, which is the most stable structure of the wt leader RNA. It proved difficult to formally demonstrate that the U5-AUG duplex will be formed in the wt leader once the RNA switches into the BMH structure because the LDI conformation is strongly favored. We therefore performed an extensive phylogenetic analysis of leader sequences of other lentiviruses to provide further evidence for the U5-AUG interaction in the form of base pair co-variations (Fig. 6). This survey presents convincing evidence for the proposed long distance base pairing. For instance, the closing base pair U-A is replaced by C-G in the HIV-1 isolate from the N (new) group. The U5-AUG duplex is also conserved in the more distantly related simian immunodeficiency viruses (SIV) and HIV-2 lentiviruses. It is not surprising that the AUG start codon is absolutely conserved, but we identified numerous sequence changes in the nucleotides that flank the start codon. These changes are compensated by complementary changes in the upstream U5 sequences. For instance, the U5-AUG duplex in SIV l'Hoest shows five co-variations, two of which affect the Gag ORF. Despite all sequence variations, it is remarkable that the stability of the U5-AUG interaction is kept within certain limits, ranging from 10 to 13 base pairs. Because the U5-AUG duplex is present exclusively in the BMH structure, major changes in its stability will have a direct impact on the LDI-BMH equilibrium. This requirement may explain the conservation of the U5-AUG duplex stability, exactly as was described for other leader RNA structures such as the poly(A) hairpin (38,39). In summary, the phylogenetic data indicate that U5-AUG base pairing, but not the actual nucleotide sequence, is conserved among primate lentiviruses.
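A base pair co-variation, as used in the survey above, is a position at which both pairing partners differ between two isolates while pairing is retained (for example, U-A replaced by C-G). The sketch below illustrates this definition in Python for ungapped, equal-length segments. The four sequences are toy examples and not lentiviral sequences, and the function name is hypothetical; a real survey would also need an alignment step, which is not shown.

```python
# Watson-Crick and G-U wobble pairs, written as upstream + downstream base.
PAIRING = {"AU", "UA", "GC", "CG", "GU", "UG"}


def find_covariations(ref_up, ref_down, var_up, var_down):
    """List positions where both partners changed but pairing is retained.

    All segments are given 5'->3' and must have equal length. Returns
    tuples of (0-based position along the upstream strand, reference
    pair, variant pair).
    """
    segments = [s.upper() for s in (ref_up, ref_down, var_up, var_down)]
    if len({len(s) for s in segments}) != 1:
        raise ValueError("all four segments must have the same length")
    r_up, r_down, v_up, v_down = segments
    hits = []
    # Reverse the downstream strands so that index i faces its partner.
    for i, (r5, r3, v5, v3) in enumerate(
        zip(r_up, r_down[::-1], v_up, v_down[::-1])
    ):
        both_changed = r5 != v5 and r3 != v3
        if both_changed and r5 + r3 in PAIRING and v5 + v3 in PAIRING:
            hits.append((i, f"{r5}-{r3}", f"{v5}-{v3}"))
    return hits


# Toy example: the closing U-A pair of the reference becomes C-G in the variant.
print(find_covariations("UGGC", "GCCA", "CGGC", "GCCG"))
```

Single-sided changes that preserve pairing (for example, G-C to G-U) are deliberately not counted here, because only double changes provide the compensatory evidence described in the text.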
The combined results support a function for this long distance interaction in the viral replication cycle. To directly test this, we performed replication experiments with the wt and w2-mutated viruses. One representative replication curve in the SupT1 cell line is shown in Fig. 7. This result indicates that preventing the formation of the U5-AUG duplex leads to a significant replication defect. Studies are ongoing to further analyze these mutant viruses and select for phenotypic revertants.

... Table I and Fig. 5. The regulatory motifs are marked in colors as in Fig. 1A. The two segments of the U5-AUG duplex are presented in a red-outlined box, and the Gag initiation codon is marked by an asterisk. Overall, the percentage of single-stranded nucleotides in the BMH conformation is similar to that of the LDI conformation (39%).

DISCUSSION
We analyzed the secondary structure of the complete 5′-UTR of the HIV-1 genomic RNA (nts 1-368). Previous studies on the 5′-terminal 290-nucleotide fragment indicate that the 5′-UTR is able to adopt two mutually exclusive structures, LDI and BMH. This study reveals that the complete 5′-UTR also folds these alternative conformations. The 3′-terminal 290-368 segment contributes differently to the LDI and BMH structures. In the context of the ground state LDI conformation, these sequences fold the well known Ψ hairpin, but in a significantly extended form. This ΨE hairpin includes the SD signal and the Gag start codon. The internal loops of this ΨE hairpin are remarkably symmetrical and purine-rich; only a single U and no C residues are present among the 29 single-stranded nucleotides. Different extended forms of the Ψ hairpin have been described (19,37,40-42). These alternative conformations are not confirmed by our RNA structure probing results with the full-length 5′-UTR. It is clear from our study that the lower half of the ΨE hairpin cannot be too rigid because it has to melt to allow the formation of the SD hairpin and the U5-AUG duplex in the BMH context. The LDI conformation of the complete 5′-UTR also contains a long distance base-pairing interaction between sequences in the Gag ORF and the R region immediately downstream of TAR (R-Gag duplex). Studies are in progress to verify the presence of this duplex and its role in HIV-1 biology.
In the metastable BMH conformation, the lower part of ΨE is opened to allow the formation of the SD hairpin that flanks the short version of the Ψ hairpin. Furthermore, the downstream Gag sequences are also free to engage in alternative base pairing to form the long distance interaction with upstream U5 sequences: the U5-AUG duplex. In fact, formation of the U5-AUG duplex creates a base-pairing partner for the single-stranded U5 region between the poly(A) hairpin and the PBS domain that thus far could not be paired with other sequences in the HIV-1 RNA genome. Former studies suggested that the Gag initiation codon is located in the bottom stem of a bulged hairpin (1,43). This folding was based on Mfold and probing studies of incomplete leader sequences that lack nucleotides 105-115 that are necessary to form the U5-AUG interaction. The U5-AUG duplex closes a domain with multiple hairpins (PBS, DIS, SD, and Ψ) that are separated by a purine-rich ring structure with A9, G7, C1, and U0. The bulges and internal loops of several of the hairpin motifs (DIS, SD, and Ψ) are also purine-rich with A5, G4, C0, and U0. The probing results of the complete 5′-UTR clearly indicate that the multiple purines of the ring are single-stranded. An extended format of the DIS hairpin has recently been proposed based on nuclear magnetic resonance analysis of a small RNA fragment (44). Such a DIS extension will, at least partially, close the ring structure that we propose. However, the DIS extension is not confirmed by our probing data, and the proposed base pairing is not supported by base pair co-variations in different viral isolates. In fact, several isolates including our LAI strain contain sequence variations that do not allow this DIS extension.
The U5-AUG duplex was analyzed by experimental and theoretical approaches. The duplex was strengthened in the s1 variant by mutations in the U5 domain. Probing experiments showed that s1 RNA folds the BMH conformation with the U5-AUG duplex. The wt transcript folds the LDI conformation that was originally discovered because it migrates fast on non-denaturing gels compared with the BMH conformation (45). Strikingly, the s1 transcript migrates even faster, suggesting that the BMH structure is compacted by closing of the U5-AUG duplex. It will be of interest to test whether this potentially compact RNA fold is suitable for x-ray studies. Strengthening of the U5-AUG duplex shifts the equilibrium from LDI to the BMH conformation. Previous work showed that such a shift usually coincides with an increased RNA dimerization capacity (24,46,47). Indeed, the s1 transcript dimerizes more efficiently than wt RNA, and this effect was neutralized by mutations in the Gag region that weaken the U5-AUG duplex.
Phylogenetic analyses of the 5′-UTR sequences of all known types of HIV and SIV revealed that a similar U5-AUG duplex can be formed despite considerable divergence in the sequence of the U5 and AUG segments. Many base pair co-variations were observed, providing evidence for the existence and biological importance of this structural motif. Because the U5-AUG duplex occludes the Gag initiation codon, it is possible that this interaction influences the translation of the Gag protein. It has been shown that the HIV-1 5′-UTR can function as an internal ribosomal entry site (IRES) (Footnote 2). Nevertheless, the 5′-UTR of different viral isolates excludes upstream AUG triplets (1), which is consistent with a regular scanning mechanism of translation. It is therefore possible that translation of the genomic HIV-1 RNA proceeds both by scanning and by internal initiation. The translational mode may differ with the stage of infection or between spliced and unspliced RNA. We previously speculated that NC may shift the 5′-UTR conformation from LDI to BMH late in the infection process. This could coincide with a switch in the mechanism of translation, from scanning to internal initiation or vice versa. In the former scenario, the BMH conformation and the U5-AUG duplex may be important structures for the proposed IRES function. Certain features of the BMH structure do in fact resemble the IRES element of the pestiviral RNA genome, which was recently shown to be critically dependent on clustered single-stranded adenosines within this structured RNA motif (48). More strikingly, polypurine A-rich sequences were also shown to exhibit IRES activity (49), and we observed the abundance of single-stranded purines and especially adenosines in the ring structure that is formed by closure of the U5-AUG duplex.
There is also accumulating evidence that unpaired adenosine residues can dock into the minor groove of a receptor helix, and this A-minor motif appears to be a very important element for the acquisition of global RNA architecture (49). Thus, the destiny of the HIV-1 genomic RNA, the ribosome or the virion (50), may be regulated by structural changes in the leader RNA, similar to mechanisms that have been described for the cauliflower mosaic virus (51,52). Translation studies are in progress to test these intriguing possibilities.
Another role for the U5-AUG duplex may reside in RNA packaging, because this long distance interaction has a major impact on the structural presentation of the leader RNA sequences that are involved in RNA packaging. The U5-AUG duplex can form exclusively in the genomic HIV-1 RNA that includes the Gag region and not in the multiple subgenomic forms of HIV-1 RNA. Thus, this structure may contribute to specific packaging of the full-length genomic RNA into new virus particles. Interestingly, previous work showed that deletion of the upstream sequences of the U5-AUG duplex (nts 98-126) induces an RNA packaging defect (53). Gag- and NC-binding sites on the HIV-1 RNA have been mapped to a segment of ~120 nucleotides that includes the DIS, SD, and Ψ hairpins and sequences of the Gag ORF (37,43,54-56). Most of these studies used relatively small fragments of the 5′-UTR that cannot fold the LDI conformation, and these RNAs will constitutively fold the multiple hairpins of the alternative BMH structure. However, binding of Gag/NC to the full-length 5′-UTR may depend on formation of the U5-AUG duplex and the concomitant LDI to BMH switch. RNA packaging studies with mutant viruses are severely hampered by the fact that the 5′-UTR encodes many overlapping regulatory signals that cannot be studied independently (57). For instance, alterations in the DIS region affect RNA dimerization but also result in an RNA packaging defect (58-60). Likewise, mutation of the Gag initiation codon in an HIV-1-based vector was shown to result in very low levels of intracellular genomic RNA, which consequently results in reduced RNA packaging (61). In general, many indirect effects can be expected from mutations that influence the overall folding of the HIV-1 5′-UTR, and such side effects do severely complicate the description of discrete RNA motifs like the packaging signal.
In vitro studies with short RNA fragments will certainly miss some of the important features of the HIV-1 5′-UTR because the proper secondary and tertiary RNA structure is not formed.