In vitro evidence for a long range pseudoknot in the 5'-untranslated and matrix coding regions of HIV-1 genomic RNA.

The 5'-untranslated leader region of human immunodeficiency virus type 1 (HIV-1) RNA contains multiple signals that control distinct steps of the viral replication cycle such as transcription, reverse transcription, genomic RNA dimerization, splicing, and packaging. It is likely that fine tuned coordinated regulation of these functions is achieved through specific RNA-protein and RNA-RNA interactions. In a search for cis-acting elements important for the tertiary structure of the 5'-untranslated region of HIV-1 genomic RNA, we identified, by ladder selection experiments, a short stretch of nucleotides directly downstream of the poly(A) signal that interacts with a nucleotide sequence located in the matrix region. Confirmation of the sequence of the interacting sites was obtained by partial or complete inhibition of this interaction by antisense oligonucleotides and by nucleotide substitutions. In the wild type RNA, this long range interaction was intramolecular, since no intermolecular RNA association was detected by gel electrophoresis with an RNA mutated in the dimerization initiation site and containing both sequences involved in the tertiary interaction. Moreover, the functional importance of this interaction is supported by its conservation in all HIV-1 isolates as well as in HIV-2 and simian immunodeficiency virus. Our results raise the possibility that this long range RNA-RNA interaction might be involved in the full-length genomic RNA selection during packaging, repression of the 5' polyadenylation signal, and/or splicing regulation.

The genomes of RNA viruses are multifunctional molecules. In retroviruses, including human immunodeficiency virus type 1 (HIV-1), the primary RNA transcript functions as pre-mRNA (splicing), mRNA (synthesis of Gag and Gag-Pol proteins), and genomic RNA for packaging into infectious particles. The 5′-untranslated leader region of the HIV-1 RNA genome contains cis-acting signals of recognition for proteins and RNAs responsible for regulating several crucial steps of the viral life cycle. This region includes a long terminal repeat consisting of the R (repeat) and U5 (unique at 5′) regions and the primer binding site (PBS), as well as exon 1 leader sequences downstream of U5 (Fig. 1A) (1).
Despite numerous studies aimed at probing the structure of the 5′-untranslated region of HIV-1 genomic RNA, very little is known about its tertiary structure, either in vitro or in the virion. It is likely that specific RNA-protein and RNA-RNA interactions allow fine tuned coordinated regulation of the different functional sites in this region and permit the compaction of the genomic RNA in a 120-nm particle. Recent articles reported that the leader region of HIV-1 RNA can adopt a compactly folded structure (32) and that a conformational RNA switch could regulate various functions in the viral life cycle (5). In a search for cis-acting elements important for the tertiary structure of the 5′-untranslated region of HIV-1 genomic RNA, we used ladder selection experiments to identify a short stretch of nucleotides directly downstream of the poly(A) signal that interacts with a nucleotide sequence located in the matrix coding region. We report that site-directed mutagenesis disrupting either of these sequences inhibits the long distance interaction. Similarly, antisense oligonucleotides efficiently inhibit the interaction. The functional significance of this long range pseudoknot is further supported by phylogenetic sequence analysis, which revealed conservation of this interaction in the genome of all HIV-1, HIV-2, and SIV isolates.
RNAs were either 5′-end-labeled for 30 min at 37°C with [γ-32P]ATP (Amersham Biosciences) and T4 polynucleotide kinase, 3′-end-labeled overnight at 4°C with [32P]pCp and T4 RNA ligase, or randomly labeled during transcription for 2 h at 37°C with [α-32P]ATP (38). For 5′-end labeling, RNAs with free 5′-OH groups were prepared by in vitro transcription using T7 RNA polymerase in the presence of 4 mM ApG (Sigma) and 1 mM NTPs as previously described (38).
Mobility Shift Assay-In a typical experiment, wild-type or mutant unlabeled RNAs and their labeled counterparts (3-5 nCi, 0.01-0.04 μg) were diluted in Milli-Q water (Millipore Corp.) alone or with their corresponding unlabeled partners at 400 nM final concentration, heated for 2 min at 90°C, and chilled on ice for 2 min. After the addition of 2 μl of 5-fold concentrated binding buffer (final concentration: 50 mM sodium cacodylate (pH 7.5), 300 mM KCl, 5 mM MgCl2), the samples were incubated for 30 min at 37°C and analyzed at 4°C on 1% (w/v) agarose gels in 45 mM Tris-borate (pH 8.3), 0.1 mM MgCl2. Gels were fixed for 10 min in 10% (v/v) trichloroacetic acid and dried for 40 min under vacuum at room temperature. RNA monomers and shifted RNA species were visualized after autoradiography or by using a BAS 2000 BIO-Imager (Fuji).
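The buffer arithmetic in the mobility shift protocol can be checked with a short sketch. It assumes only what the text states (a 5-fold concentrated stock, 2 μl added, and the quoted final concentrations); the implied 10-μl reaction volume and the function names are illustrative and are not taken from the original protocol.

```python
# Final binding-buffer concentrations quoted in the text, in mM.
FINAL_MM = {"sodium cacodylate": 50.0, "KCl": 300.0, "MgCl2": 5.0}


def stock_concentrations(final_mm, fold):
    """Concentration (mM) of each component in an n-fold concentrated stock."""
    return {name: conc * fold for name, conc in final_mm.items()}


def final_volume_ul(stock_volume_ul, fold):
    """Total volume (in microliters) at which the added stock is diluted to 1x."""
    return stock_volume_ul * fold


stock = stock_concentrations(FINAL_MM, fold=5)
total = final_volume_ul(2.0, fold=5)

print(stock)        # concentrations in the 5x stock (mM)
print(total)        # implied total reaction volume (microliters)
print(total - 2.0)  # implied RNA sample volume before buffer addition
```

Under these assumptions the 5-fold stock would contain 250 mM sodium cacodylate, 1.5 M KCl, and 25 mM MgCl2, and the RNA sample would occupy 8 μl before buffer addition.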
Thermal Stability of the Long Range Interaction-To determine the thermal stability of the long range interaction, samples were incubated for 30 min at 30°C, and then the temperature was gradually increased in 7°C steps. After a 5-min incubation at the appropriate temperature, an aliquot was loaded on a 1% (w/v) agarose gel after the addition of glycerol (20% final concentration) and run as previously described. Monomers and shifted RNA species were visualized after ethidium bromide staining and quantified with the MacBas (Fuji) software. The melting temperature of the shifted RNA, Tm, was defined as the temperature at which the fraction of shifted RNA was reduced by 2-fold, as compared with its value at 37°C.
Enzymatic RNA Probing with RNase T2-In a standard experiment, RNA 1-156 or 1-615 (400 nM) was dissolved in 8 μl of water, heated for 2 min at 90°C, chilled on ice, and renatured for 30 min at 37°C in 50 mM sodium cacodylate (pH 7.5), 5 mM MgCl2, 300 mM KCl. After renaturation, the samples were cooled at room temperature for 10 min before treatment with RNase T2 (15 min at 37°C; 0.002 units/μl). The positions of RNase hydrolysis were detected by primer extension with avian myeloblastosis virus reverse transcriptase as previously described (2).
Ladder Selection Experiments-5′- and 3′-end-labeled RNA molecules (400 nM) were subjected to limited alkaline hydrolysis in 50 mM NaCO3 (pH 8.9) for 4 min at 90°C. Alkali ladders were neutralized with 300 mM sodium acetate (pH 5.6), ethanol-precipitated, and used in the mobility shift assay with the appropriate nonhydrolyzed RNA partner (400 nM). Monomers and shifted RNA molecules were cut out from the 1% low melting agarose gel and extracted with phenol (v/v) for 15 min at 50°C. After ethanol precipitation, RNA fragments were resuspended in 4 μl of formamide-containing loading buffer and separated by denaturing gel electrophoresis on an 8% acrylamide gel. RNase T1 (G-specific) and RNase U2 (A-specific) sequencing reactions were run in parallel to detect the borders of the selected population of RNA fragments. After autoradiography, films were scanned, and densitograms of the lanes corresponding to the initial alkali ladder and the selected RNA fragments were obtained with the program QuantityOne (Bio-Rad).
Inhibition of the RNA-RNA Interaction by Oligodeoxyribonucleotides-Synthetic DNA oligodeoxyribonucleotides complementary to positions 313-334, 335-366, 367-394, 395-420, and 441-460 of the HIV-1 Mal sequence were used in the mobility shift assay. Briefly, the antisense oligodeoxyribonucleotide (400 nM or 1.6 μM) was first incubated with 400 nM 3′-end-labeled RNA-(305-615) in the binding buffer for 15 min at 37°C. The RNA-oligonucleotide complexes were then incubated for 30 min at 37°C with an equimolar amount of RNA-(1-311) DIS(−) and analyzed by agarose gel electrophoresis. The fractions of monomer and shifted RNA molecules were quantified using a BAS 2000 BIO-Imager (Fuji) as previously described (24,39).
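The quantification step can be illustrated with a minimal sketch. The text does not give the formulas used, so the definitions below (shifted fraction as shifted signal over total signal, and inhibition relative to a no-oligonucleotide control) are assumptions, and the band intensities are invented for illustration.

```python
def shifted_fraction(monomer_signal, shifted_signal):
    """Fraction of labeled RNA found in the shifted band (assumed definition)."""
    total = monomer_signal + shifted_signal
    if total <= 0:
        raise ValueError("total band signal must be positive")
    return shifted_signal / total


def percent_inhibition(fraction_with_oligo, fraction_without_oligo):
    """Loss of shifted RNA relative to the no-oligonucleotide control, in percent."""
    if fraction_without_oligo <= 0:
        raise ValueError("control fraction must be positive")
    return 100.0 * (1.0 - fraction_with_oligo / fraction_without_oligo)


# Hypothetical band intensities in arbitrary imager units (not data from the study).
control = shifted_fraction(monomer_signal=500.0, shifted_signal=500.0)
with_oligo = shifted_fraction(monomer_signal=900.0, shifted_signal=100.0)

print(control, with_oligo)
print(percent_inhibition(with_oligo, control))
```

Normalizing to the control lane makes lanes with different total loading comparable, which is why the sketch works with fractions rather than raw band intensities.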

RNA Sequences Upstream and Downstream of the SD Site Interact Together-It has previously been reported by us (25,26,39) and by others (22,23) that the 5′-untranslated region of HIV-1 genomic RNA contains the major dimerization signal (DIS) but that other regions around the DIS could influence the scaffold of the RNA tertiary structure (24,40). To better understand the HIV-1 RNA folding and to identify potential long distance contacts, we used in vitro transcription with T7 polymerase to synthesize a set of HIV-1 RNAs of different lengths (Fig. 2A). We analyzed the shift induced by labeled RNA-(305-615) on this set of RNAs by native agarose gel electrophoresis (Fig. 2B).
When labeled RNA-(305-615) was incubated with unlabeled RNA-(1-615) or RNA-(1-615) DIS(−), only very weak shifted bands were detected (Fig. 2B, lanes 3 and 4). Our interpretation of this result is that the interaction between the sequences located upstream and downstream of the SD site was intramolecular in RNA-(1-615) (and RNA-(1-615) DIS(−)). Thus, the downstream site of the long RNA efficiently competed with the homologous site of the truncated RNA-(305-615) for binding to the upstream site, reducing the level of the intermolecular interaction with RNA-(305-615). Taken together, these results suggest that the long distance interaction is independent of the RNA dimerization process and that this interaction requires two elements apart from the SD site that interact with each other through intramolecular base pairing in RNA-(1-615).
Characterization of the Sequences Involved in the Long Distance Interaction-To further characterize the long distance interaction, we next analyzed RNA mobility shift by using a truncated version of the 1-615 RNA. To avoid dealing with multimeric complexes on agarose gels (Fig. 2B, lane 5), we used dimerization-deficient RNAs (DIS(−); see "Experimental Procedures"). Both RNAs starting at positions 100 and 123 were unable to shift RNA-(305-615) (Fig. 2B, lanes 10 and 11). A similar result was obtained with RNA-(1-62) (Fig. 2B, lane 7). On the contrary, RNA-(1-102) and RNA-(1-152) gave a shift with a yield comparable with the one obtained with RNA-(1-311) DIS(−) (Fig. 2B, lanes 8 and 9). These results indicated that one of the sequences required for the interaction was located between nucleotides 62 and 102, corresponding to the poly(A) hairpin loop (Fig. 1B). Similar experiments were conducted using unlabeled 3′-truncated RNAs starting at position 123 or 305 with labeled RNA-(1-311) DIS(−). They allowed us to delimit the 3′ interacting domain downstream of nucleotide 415 (data not shown).
Fig. 3 legend. A, schematic diagram of the strategy used to define the 5′ border of the 5′ interaction domain using 5′-end-labeled RNA-(1-311). Similar strategies were used to define the 3′ border of the 5′ interaction domain and the 5′ and 3′ borders of the 3′ interaction domain. After statistical alkali hydrolysis of 32P-labeled RNA-(1-311), the ladder was incubated with RNA-(305-615). Shifted molecules were visualized on a native agarose gel and purified.
To estimate the number of base pairs involved in the long range interaction, we performed thermal denaturation experiments of the complex formed by RNA-(1-311) DIS(−) and RNA-(305-615) (Fig. 2C). The Tm of this complex is 43-45°C. This value is fully comparable with the Tm obtained with the DIS hairpin of HIV-1, where the two RNA monomers are able to interact through six Watson-Crick base pairs (24).
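The Tm definition given under "Thermal Stability of the Long Range Interaction" (the temperature at which the shifted fraction falls to half its value at 37°C) can be sketched as follows. Linear interpolation between sampled temperatures is an assumption of this sketch, not a stated step of the original analysis, and the fractions are invented for illustration rather than taken from Fig. 2C.

```python
def melting_temperature(temps_c, fractions, reference_temp_c=37.0):
    """Estimate Tm as the temperature where the shifted fraction is half its
    value at the reference temperature, by linear interpolation.

    The reference temperature must be one of the sampled temperatures.
    Returns None if the half value is never crossed above the reference.
    """
    points = sorted(zip(temps_c, fractions))
    half = dict(points)[reference_temp_c] / 2.0
    above_ref = [(t, f) for t, f in points if t >= reference_temp_c]
    for (t0, f0), (t1, f1) in zip(above_ref, above_ref[1:]):
        if f0 >= half >= f1 and f0 != f1:
            # Interpolate linearly between the two bracketing measurements.
            return t0 + (f0 - half) * (t1 - t0) / (f0 - f1)
    return None


# Temperatures follow the 7-degree steps from 30 degrees C described in the text;
# the shifted fractions are hypothetical.
temps = [30, 37, 44, 51, 58]
fractions = [0.62, 0.60, 0.32, 0.04, 0.00]

tm = melting_temperature(temps, fractions)
print(tm)
```

With 7°C steps the estimate depends strongly on the two measurements that bracket the half value, so narrower temperature steps around the transition would tighten it.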
Ladder Selection Experiments-In an attempt to map more precisely the 5′ and 3′ borders of the RNA domains that form the long distance interaction, we performed ladder selection experiments. In these experiments, the RNA subfragments, obtained from a pool of RNAs generated by mild alkaline hydrolysis of either 5′- or 3′-end-labeled RNA-(1-311) or RNA-(305-615), were selected for their capability to retain binding to their unlabeled RNA partner (Fig. 3A). After extraction of the bound RNA fragments from the agarose gel (see "Experimental Procedures"), selected molecules were analyzed by electrophoresis on a denaturing polyacrylamide gel (Fig. 3A), and the selection boundaries were determined from the densitograms of the initial and selected RNA species. The boundaries were identified as the starting points of a strong continuous selection. Peaks corresponding to isolated selected fragments, such as those observed at position 134 in Fig. 3B and positions 434, 436, and 447 in Fig. 3C, were not taken into account, since they most likely reflect artifactual selections due to aberrant folding of these RNA fragments.
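The boundary rule described above (take the start of a strong continuous selection and ignore isolated peaks) can be expressed as a small sketch. The enrichment threshold and the minimum run length are hypothetical parameters chosen for illustration; the text does not state numerical criteria, and the densitogram values below are invented.

```python
def selection_boundary(positions, initial, selected, threshold=0.5, min_run=5):
    """Return the first position of the first run of at least `min_run`
    consecutive positions whose selected/initial signal ratio reaches
    `threshold`. Shorter runs (isolated peaks) are ignored.
    """
    run_start = None
    run_length = 0
    for pos, init, sel in zip(positions, initial, selected):
        enriched = init > 0 and sel / init >= threshold
        if enriched:
            if run_length == 0:
                run_start = pos
            run_length += 1
            if run_length >= min_run:
                return run_start
        else:
            run_length = 0
    return None


# Toy densitograms: an isolated peak at position 5 and a continuous
# selection starting at position 12 (hypothetical values).
positions = list(range(1, 21))
initial = [1.0] * 20
selected = [0.1] * 20
selected[4] = 0.9
for i in range(11, 20):
    selected[i] = 0.9

print(selection_boundary(positions, initial, selected))
```

Dividing the selected lane by the initial ladder lane corrects for uneven hydrolysis along the RNA, so a strong band in the selected lane counts only if it is enriched relative to the input.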
When the 1-311 ladder was used in the mobility shift assay with unlabeled RNA-(305-615), the size of the retained RNA subfragments varied from full-length down to position U84 for 5′-end-labeled fragments and position C70 for 3′-end-labeled RNAs (Fig. 3B). These positions (positions 70-84) defined the boundaries of a region that should be sufficient for binding RNA-(305-615). Interestingly, this region corresponds to the poly(A) hairpin loop and seemed even more efficiently selected when the whole poly(A) hairpin was present (Fig. 3B, higher selection from position 100), indicating that the selection probably involved structural motif recognition.
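The logic by which the two labeled ladders bracket a binding region can be made explicit with a simplified model. It assumes that a fragment binds if and only if it contains the whole interaction site, which ignores the structural context the selection data point to; the site coordinates 77-83 are those proposed in the text, and the function name is illustrative.

```python
def ladder_boundaries(rna_start, rna_end, site_start, site_end):
    """Predict ladder selection boundaries under an all-or-none binding model.

    5'-end-labeled fragments share the 5' end and vary at the 3' end;
    3'-end-labeled fragments share the 3' end and vary at the 5' end.
    Returns (latest 5' start, earliest 3' end) among binding fragments.
    """
    five_labeled = [(rna_start, k) for k in range(rna_start, rna_end + 1)]
    three_labeled = [(j, rna_end) for j in range(rna_start, rna_end + 1)]

    def contains_site(fragment):
        return fragment[0] <= site_start and fragment[1] >= site_end

    shortest_five = min((f for f in five_labeled if contains_site(f)),
                        key=lambda f: f[1])
    shortest_three = max((f for f in three_labeled if contains_site(f)),
                         key=lambda f: f[0])
    return shortest_three[0], shortest_five[1]


predicted = ladder_boundaries(1, 311, site_start=77, site_end=83)
observed = (70, 84)  # boundaries reported in the text (C70 and U84)

print(predicted)
# The observed region should contain the predicted minimal site.
print(observed[0] <= predicted[0] and predicted[1] <= observed[1])
```

In this idealized model the boundaries coincide with the site itself; the wider observed window (70-84) is consistent with the suggestion that flanking structure contributes to selection.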
When the 305-615 ladder was used in our selection experiments (Fig. 3C), we found that the region potentially sufficient to interact with RNA-(1-311) was delimited by nucleotides A408 to G462 (Fig. 3C). Surprisingly, this region is located in the matrix coding sequence of the gag gene. The secondary structure of this region shown in Fig. 1 is only tentative, and with the exception of stem 413-421/490-498, the rest of this RNA domain may form loops and metastable helices particularly rich in AU pairs (2). Thus, this region might either possess a low level of structural organization or fold into alternative structures in equilibrium with each other.
Enzymatic Probing of the Poly(A) Hairpin Loop-The results described above indicate that a sequence located in the poly(A) hairpin loop and the 5′-end of the matrix coding region trigger the formation of the RNA shift. Thus, we focused our analysis on these two domains and looked for complementary sequences. Indeed, we identified two putative seven-nucleotide sequences that could potentially interact through Watson-Crick base pairing. The first is located immediately downstream of the poly(A) signal (⁷⁷GCUUGCC⁸³), whereas the second corresponds to nucleotides ⁴⁵⁷GGCAAGC⁴⁶³.
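The complementarity of the two heptanucleotides can be verified directly. The sketch below uses only the two sequences quoted in the text; the helper name is illustrative.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


upstream = "GCUUGCC"    # nucleotides 77-83, downstream of the poly(A) signal
downstream = "GGCAAGC"  # nucleotides 457-463, matrix coding region

# Two strands pair antiparallel, so full Watson-Crick complementarity means
# one sequence equals the reverse complement of the other.
print(reverse_complement(upstream))
print(reverse_complement(upstream) == downstream)
```

Because the comparison is made on the reverse complement, a match corresponds to seven contiguous antiparallel Watson-Crick pairs between the two sites.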
To test the first sequence, we performed enzymatic RNA structure probing using truncated transcripts with or without the TAR and poly(A) hairpin structures (RNAs 1-156 and 123-615) (Fig. 4). These RNAs have been tested for their potential interaction, and a clear shift was detected by mobility shift assay (data not shown). The susceptibility of RNA-(1-156) toward RNase T2 was tested when incubated alone or in the presence of increasing concentrations of RNA-(123-615) and compared with RNA-(1-615). In RNA-(1-156), the whole poly(A) hairpin loop was accessible to digestion by RNase T2 (Fig. 4, lane 1). However, the sequence in the loop immediately downstream of the poly(A) signal (⁷⁷GCUUGCC⁸³) clearly became protected upon incubation with increasing amounts of RNA-(123-615) (Fig. 4, lanes 3 and 4). As one would expect, RNA-(1-615), which contains the two putative sequences involved in the long distance intramolecular interaction, showed no reactivity of these nucleotides toward RNase T2. These data reinforce the hypothesis that nucleotides 77-83 constitute the 5′ interaction site. It is worthy of note that the RNase T2 accessibility of the poly(A) hairpin loop was independent of the integrity of the DIS loop (data not shown).
Fig. 3 legend (continued). The selected molecules were separated on an 8% denaturing polyacrylamide gel. B and C, ladder selection experiments with 5′-end and 3′-end 32P-labeled RNA fragments 1-311 (B) and 305-615 (C). Lanes 1, RNAs submitted to RNase T1 digestion; lanes 2, RNAs submitted to RNase U2 digestion; lanes 3, RNAs statistically hydrolyzed with alkali; lanes 4, RNAs statistically hydrolyzed with alkali and that have been selected by the corresponding RNA partner. The borders corresponding to the start of strong selection are indicated in boldface type on the left of the gels. The densitograms of lanes 3 (gray) and 4 (black) are shown beside the autoradiographs.
Assays to test the downstream sequence were unsuccessful, probably due to the structural versatility of this domain (results not shown).
Inhibition of the Long Distance Interaction by Antisense Oligonucleotides-In a second step, to validate the implication of region 457-463 in the long distance interaction, we asked whether the interaction could be inhibited by antisense DNA oligonucleotides directed against this region (Fig. 5). We were particularly interested in understanding the behavior of the matrix region, since no convincing results were obtained from probing experiments. RNA-(305-615) and the different antisense DNA oligonucleotides were heat-annealed as described under "Experimental Procedures" and were further used in the gel mobility shift assay. As shown in Fig. 5, some antisense DNA molecules had no effect on the shift of RNA-(305-615) (Fig. 5, AS-(313-334) and AS-(367-394)). An additional band was observed with AS-(313-334) that could be explained by the induction of a conformational switch of the RNA by DNA annealing, as previously observed (32). Annealing of AS-(335-366) and AS-(395-420) partially inhibited the RNA shift, and AS-(441-460) almost completely prevented RNA-(305-615) from shifting (Fig. 5). Taken together, these results confirm our ladder selection data (Fig. 3) but raise the possibility that multiple domains in the matrix coding sequence might directly or indirectly affect the long range interaction. This observation, which correlates with the absence of clear probing information, suggests that the tertiary interaction depends on particular features of the global versatile structure of the matrix coding region.
Inhibition of the Long Distance Interaction by Site-directed Mutagenesis-To test the putative base pairing interaction (Fig. 6A) between the poly(A) and the matrix coding regions, mutants were constructed in which the predicted interaction was disrupted. Fig. 6B shows the three types of mutations that were introduced in the different size RNA fragments (1-311 DIS(−), 305-615, or 1-615 DIS(−)). Note that as previously explained, all mutant RNAs tested in this study were mutated in the DIS hairpin loop, so that only one shifted species can be formed. Mutant RNA CG77 contains a four-nucleotide substi-
We next analyzed the capacity of the RNA mutants to give a shift on agarose gel with different RNA partners (Fig. 6, C-E).
In the first set of experiments (Fig. 6C), we showed that the substitution of the poly(A) hairpin loop in the 1-311 context almost completely suppressed the interaction with RNA-(305-615) (Fig. 6C, compare lanes 2 and 4). Similar results were obtained with RNA-(123-615) (data not shown). On the other hand, the mutation in the matrix coding region only partially inhibited the interaction with DIS(−) RNA-(1-311) (Fig. 6C, lanes 4 and 5). Similarly, the base pairing capacity was not completely restored when using a pair of RNA mutants containing complementary sequences (Fig. 6C, lanes 2 and 3). The residual interaction between RNA-(1-311) and RNA-(305-615) UU460 might be due to the complementarity between the ⁷⁵AAGC⁷⁸ sequence in the poly(A) hairpin loop and ⁴⁵⁹GCUU⁴⁶² in the mutated region. The inefficient trans-complementation between the two mutated sequences suggests the existence of an important structural element that would be disrupted in the UU460 mutant.
Finally, we tested the effects of the CG77 and UU460 mutations using various combinations of RNAs 615 nucleotides in length (Fig. 6E). With the exception of the control RNA used in lane 1, all RNAs were mutated in the DIS. Thus, RNA-(1-615) CG77 (lane 2) and RNA-(1-615) UU460 (lane 3) migrated as the expected monomeric species. Furthermore, we already showed that in RNA-(1-615), the long range interaction is intramolecular (Fig. 2). Therefore, an RNA mutated in the DIS and bearing wild-type sequences in the poly(A) hairpin loop and matrix coding region also migrated as a monomer (lane 2). Similarly, the migration as a monomer of the RNA bearing the compensatory mutations CG77 and UU460 reflects either an intramolecular long range interaction or the absence of such an interaction (lane 5). To address this question, we used a variety of RNA-(1-615) combinations (Fig. 6E, lanes 6-10). Remarkably, only the combination of RNA-(1-615) UU460 and RNA-(1-615) CG77 was able to form a stable complex (lane 10). This trans-complementation strongly suggests that regions CG77 and UU460 are indeed those involved in the long range interaction.
The fact that trans-complementation occurs only between large fragments and not between short fragments suggests that the negative effect of the UU460 mutation is more pronounced in the truncated RNA than in the intact RNA. This can be correlated with the versatility of the matrix domain, which is more sensitive to its context than stable regions are. Taken together, these results strongly suggest that the sequences we mutated in the poly(A) hairpin loop and in the matrix coding region constitute the 5′ and 3′ interaction sites of the long range pseudoknot.
Conservation of a Long Distance Pseudoknot in HIV-1, HIV-2, and SIV-To test the biological significance of the long range interaction we identified, we performed an extensive sequence comparison of the 5′-polyadenylation signal region and the matrix coding sequence around amino acids 35-37 in human and simian lentiviruses (available on the World Wide Web at hiv-web.lanl.gov). We took all sequences of the nucleotide alignments of HIV-1/HIV-2/SIV complete genomes (42) into account, and Fig. 7A includes all sequence variations present in these alignments.
Alignment of the sequences surrounding the poly(A) signal and the matrix region was not very informative as long as we considered only the HIV-1 isolates. Indeed, these two sequences were absolutely conserved in all HIV-1 clades. This conservation suggests that the two aligned sequences are functionally important, but it did not give us any information about their interaction.
Interestingly, more variation was found among the poly(A) hairpin loop and the matrix coding sequences of HIV-2 and SIV isolates, but the alignment revealed the maintenance of the potential base pairing between these regions (Fig. 7A). Indeed, despite the lack of strong sequence homology within these two groups, we observed a high degree of conservative changes (GC changed to AU or vice versa). Regarding the HIV-2 group, the strongest co-variations were observed for the HIV-2_B_D205 and HIV-2_B_EHO isolates, where base deletions and substitutions in the poly(A) hairpin were compensated by base substitutions in the matrix sequence. All other HIV-2 and SIV isolates also demonstrated the same base pairing conservation, ranging from 6 (SIVMM251) to 10 (SIVVER9063) base pairs for the putative interaction (Fig. 7A). For those having an increased number of base pairs (8-10), it was interesting to note the appearance of one (HIV-2_B_EHO, SIVVER9063, SIVJHOEST, and SIVAGM677) or two (SIVGRI677) GU base pairs, suggesting that the overall stability of the long-range pseudoknot must be maintained within narrow limits.
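Counting Watson-Crick and GU pairs in a putative duplex, as done for the isolates compared above, can be sketched as follows. Only the HIV-1 heptanucleotides come from the text; the second partner sequence is a hypothetical variant made up to show a GU pair and does not correspond to any isolate in Fig. 7A.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def count_pairs(strand, partner):
    """Count (Watson-Crick, GU wobble, mismatch) positions for two RNA
    strands of equal length, both written 5' to 3' and paired antiparallel.
    """
    if len(strand) != len(partner):
        raise ValueError("strands must have the same length")
    wc = gu = mismatches = 0
    for a, b in zip(strand, reversed(partner)):
        if (a, b) in WATSON_CRICK:
            wc += 1
        elif (a, b) in WOBBLE:
            gu += 1
        else:
            mismatches += 1
    return wc, gu, mismatches


# HIV-1 sites quoted in the text.
print(count_pairs("GCUUGCC", "GGCAAGC"))
# Hypothetical variant partner (not a real isolate) introducing one GU pair.
print(count_pairs("GCUUGCC", "GGUAAGC"))
```

Tracking GU pairs separately matters for the stability argument, since a GU wobble pair contributes less to duplex stability than a GC pair.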
The SIVSYK173 isolate is also very remarkable. Indeed, it was also possible to draw a putative long distance interaction with the matrix domain, but in this case, the complementary sequence was located upstream of the poly(A) signal (Fig. 7). Interestingly, it has been shown by phylogenetic analysis that the overall architecture of the poly(A) hairpin is very well conserved among lentiviruses but that the poly(A) signal in the SIVSYK173 isolate is shifted toward the 3′ side of the loop due to a nine-nucleotide duplication (12) (Fig. 7B). Thus, the sequence complementary to the matrix coding region is part of the poly(A) hairpin loop, and therefore, it is available to interact with its putative partner.
The high pressure of selection to conserve the long distance interaction between the poly(A) hairpin loop and a region located in the matrix coding region in HIV-1, HIV-2, and SIV supports the functional importance of this novel tertiary interaction in HIV-1 replication.

DISCUSSION
The genome of HIV-1 is composed of two homologous RNA molecules about 9200 nucleotides long, which are packaged in a 120-nm diameter particle. The virus has developed specific mechanisms to package its genome, involving either RNA-RNA (DIS) or RNA-protein (NC-DIS, NC-Psi) interactions. However, other RNA-RNA interactions must be present all along the genome to allow its compaction and its correct folding.
In this study, we provide strong evidence that a long distance pseudoknot exists in the 5′-end of HIV-1 genomic RNA. By using RNA fragments of different lengths and ladder selection on the first 600 nucleotides of HIV-1 genomic RNA, we identified a heptanucleotide sequence located immediately downstream of the polyadenylation signal (⁷⁷GCUUGCC⁸³) that interacts with a complementary sequence located in the matrix coding sequence, about 400 nucleotides downstream (⁴⁵⁷GGCAAGC⁴⁶³) (Figs. 2 and 3). The melting temperature of the complex formed between RNAs 1-311 and 305-615 is consistent with the proposed interaction. Mutations that disrupt the putative base pairing severely impaired the long distance interaction (Fig. 6), as did oligodeoxyribonucleotides directed against one of those regions (Fig. 5). In the wild type RNA, this long range interaction is intramolecular, since 1) these regions are not accessible to RNase T2 digestion in the large fragment (Fig. 4 and data not shown), and 2) interaction between a long and a short fragment containing the two interacting sequences or only one, respectively, can only occur when one of the two interacting sequences is mutated in the large fragment (Fig. 6D). Furthermore, sequence comparisons of human and simian lentiviruses provided compelling evidence that the long range interaction is preserved despite significant variation among the sequences (Fig. 7).
The identification of the present long distance RNA-RNA interaction raises a number of questions about its biological significance. The fact that this tertiary interaction involves two regions located in the noncoding and coding sequences of the HIV-1 genome suggests that it could regulate key steps of the replication cycle.
Dimeric RNA Encapsidation-Encapsidation and dimerization of genomic RNA have been suggested to be related processes in the course of HIV-1 replication. Indeed, the cis-acting sequences required for encapsidation partially overlap those required for in vitro dimerization (43). Moreover, specific encapsidation of unspliced genomic RNA implies that at least part of the encapsidation signals should be located downstream of the SD site to prevent encapsidation of spliced RNAs. Indeed, the packaging signal of HIV-1 RNA is multipartite (35,44), and regions outside Psi (TAR, poly(A), DIS, Gag) contribute to optimal packaging (7,11,34,36,45-47). The long distance interaction between the poly(A) hairpin and the matrix coding region could facilitate the discrimination between unspliced and spliced RNAs by the retroviral Gag proteins. This hypothesis is consistent with a recent study showing that substitution of the sequence directly downstream of the poly(A) signal reduced the ratio of genomic/spliced RNAs in virions (7) but does not affect the dimerization efficiency of the RNA genome (48).
Polyadenylation-Due to the duplication of the poly(A) signal at both ends of the HIV-1 genome, a fine tuned mechanism must exist to restrict use of the proximal poly(A) signal. Two main inhibition mechanisms have been proposed: proximity of the transcription initiation site (49) and occlusion by Tat (50) or by the SD site via its interaction with U1 snRNP (51,52). Moreover, sequences of the leader that are uniquely present downstream of the poly(A) site decreased the binding of polyadenylation factors (17). Considering the fact that the poly(A) hairpin loop interacts with a sequence further downstream in the matrix coding region, one can easily imagine that the access of polyadenylation factors would be impeded by such a mechanism. It should be noted that an RNA encompassing the interaction site in the matrix coding domain was not able to interact in vitro with a 600-nucleotide-long RNA corresponding to the 3′-end of the HIV-1 genome (data not shown).
Translation-Likewise, it is conceivable that the long range pseudoknot might regulate translation of the gag gene. It has recently been suggested that the HIV-1 leader region could form two alternative structures in vitro and that a conformational RNA "switch," from a rodlike structure to a branched structure, would regulate key steps in the replication cycle, such as translation to dimerization, packaging, and reverse transcription (5). Interestingly, the long range pseudoknot can only take place in the branched structure, since the poly(A) hairpin does not exist in the rodlike structure. Although additional experiments are required to test this possible NC-mediated RNA "switch," it has been previously shown ex vivo that HIV-1 unspliced RNA constitutes a single pool that can function interchangeably as mRNA and as genomic RNA (53-56). This mechanism differs from the one described for HIV-2, which packages RNA predominantly in cis (i.e. the newly synthesized Gag preferentially encapsidates the RNA from which it was produced) (54,55).
Long distance interactions have been implicated in the regulation of RNA synthesis and/or gene expression in a variety of (+)-strand RNA viruses (57,58) and eukaryotic mRNAs (59). Our structural data demonstrate the existence of a long range pseudoknot in the 5′-end of the HIV-1 genome, although additional experiments are necessary to understand its exact role in the replication cycle. However, the demonstration of the functional role of the proposed interaction may prove rather difficult. First, the involved poly(A) region is duplicated at both 5′- and 3′-ends of the genome. Thus, if one mutates the 5′ site only, one can only study a single replication cycle because the 3′ site will be mutated during reverse transcription. If both sites are mutated, it will be difficult to distinguish between effects due to mutation of the functional 3′ polyadenylation site and disruption of the long range interaction. Preliminary results of single replication cycle experiments did not allow us to detect significant differences between wild-type and mutant viruses regarding Gag expression or viral particle release, thus suggesting that the long distance interaction is not involved in Gag translation, assembly, and protein maturation. This negative result does not rule out the existence of this long range pseudoknot in vivo and its functional role. Indeed, we are presently developing chemical probing of HIV RNA in infected cells and inside the viral particles, and our preliminary data are consistent with the proposed interaction. In addition, HIV and SIV have had thousands of years of evolution to select for features that only slightly increase the viral fitness. Of course, such features cannot be tested in a single replication cycle. For instance, mutations of the dimerization initiation site of the HIV-1 RNA produce a packaging defect that can only be observed in multiple cycle infections, although the genomic RNA of all retroviruses is dimeric (36,46,60).
Furthermore, our phylogenetic analysis strongly supports the functional importance of this tertiary interaction.