Dissection of a Circumscribed Recombination Hot Spot in HIV-1 after a Single Infectious Cycle

Recombination is a major source of genetic heterogeneity in the human immunodeficiency virus type 1 (HIV-1) population. The main mechanism responsible for the generation of recombinant viruses is a process of copy choice between the two copies of genomic RNA during reverse transcription. We previously identified, after a single cycle of infection of cells in culture, a recombination hot spot within the gp120 gene, corresponding to the top portion of an RNA hairpin. Here, we determine that the hot region is circumscribed to 18 nucleotides located in the descending strand of the stem, following the sense of reverse transcription. Three factors appeared to be important, albeit to different extents, for the high rate of recombination observed in this region. The position of the hot sequence in the context of the RNA structure appears crucial, because changing its location within this structure triggered differences in recombination of up to 20-fold. Another pivotal factor is the presence of a perfectly identical sequence between donor and acceptor RNA in the region of transfer, because single or double nucleotide differences in the hot spot were sufficient to almost completely abolish recombination in the region. Last, the primary structure of the hot region also influenced recombination, although with effects only in the 2-3-fold range. Altogether, these results provide the first molecular dissection of a hot spot in infected cells and indicate that several factors contribute to the generation of a site of preferential copy choice.

The genome of retroviruses consists of a single-stranded RNA molecule of positive polarity, present in two copies within retroviral particles in the form of a dimer (1). After infection of a target cell, reverse transcription generates a double-stranded DNA molecule that is integrated into the genome of the host. A process that has been shown to be extremely frequent in the human immunodeficiency virus type 1 (HIV-1) is template switching between the two copies of genomic RNA during the reverse transcription step (2). According to recent estimates, from 3 to 30 switching events can occur per genome for each infectious cycle, depending on the type of infected cell (3, 4). When the two genomic RNAs are not identical (heterozygous viruses), this process can generate chimeric DNA molecules that will result in the production of recombinant viruses in the subsequent generation. In HIV-1, the high genetic heterogeneity and the frequent occurrence of coinfection of a cell by genetically divergent viruses (5-8) favor the generation of viral particles carrying two non-identical copies of genomic RNA. As a consequence, recombination is extremely frequent in this virus and is now considered the main source of HIV-1 genetic variability worldwide (9).
The mechanisms underlying recombination in HIV-1 have been intensively studied during the last decade. Most recombination events have been shown to occur by template switching during synthesis of the first DNA strand, when the reverse transcriptase (RT) uses the genomic RNA as a template (3, 10-12). In retroviruses, reverse transcription is coupled to the degradation of the RNA template by the RT-encoded RNase H activity, a process that is mandatory for template switching (13-15). The degradation of the template RNA (donor RNA) is required to leave the trailing nascent DNA in a single-stranded form, available for annealing onto the other copy of genomic RNA present within the viral particle (acceptor RNA). This interaction would then drive the ultimate transfer of the growing 3′ end of the nascent DNA onto the second RNA moiety, leading to the generation of a chimeric DNA carrying genetic information from both copies of genomic RNA. Different causes have been proposed to be responsible for template switching (16), such as breaks on the genomic RNA (17, 18) or strong pause sites during reverse transcription (13, 19, 20). The role of stalling of reverse transcription in response to these obstacles to DNA synthesis would be to allow a more extensive degradation of the RNA template by the RT-encoded RNase H activity (13, 19-21).
The idea that template switching could occur preferentially at strong pause sites of reverse transcription has been supported by the correlation between a high rate of template switching and the nearby presence of a strong pause site observed in cell-free reconstituted systems with several templates (13, 19, 22, 23). Stalling of reverse transcription is indeed supposed to enhance the efficiency of template switching by allotting more time to the RNase H activity to degrade the donor RNA (24-26) and by increasing the time of residence of the reverse transcriptase in the given sequence interval (23, 27). However, mounting evidence indicates that preferential sites for recombination do not necessarily correlate with stalling of reverse transcription. A particularly well documented case is the occurrence of preferential transfer in structured regions of the RNA template (20, 23, 27-33). In some of these cases the hairpin structures were suggested to induce stalling of reverse transcription at their base, favoring the annealing of the acceptor RNA on the nascent DNA (20).
Using an original system to study recombination after a single cycle of infection of cells in culture, we recently identified a hot spot within the coding portion of the gp120 gene from the LAI strain of HIV-1, where the recombination rate per nucleotide was up to 10 times higher than those observed in the surrounding regions (34). In that work, which provided the first demonstration of the existence of a hot spot for recombination in infected cells, copy choice was studied on a 400-nt-long region that was subdivided into five subregions named R1 to R5 following the sense of (−)-DNA synthesis. The hot region corresponded to region R2, which spanned the 58 nt that make up the upper part of a hairpin (C2 hairpin), whose structure was deduced by enzymatic probing in vitro (27). In addition, by introducing mutations in the lower part of the stem in such a way as to alter the stability of the hairpin, we were able to affect the efficiency of recombination in R2. In this study, we have addressed the question of which determinants, within R2 itself, are responsible for the high degree of transfer observed.

EXPERIMENTAL PROCEDURES
DNA Constructs-The construction of the genomic plasmids and their description are fully detailed in Ref. 34. Mutagenesis of the hairpin structure was done using standard molecular biology and cloning techniques. All constructions were verified by sequencing. The transcomplementation plasmids used were pCMVΔR8.2 (35), coding for HIV-1 gag, pol, and accessory proteins; and pHCMV-G (36), which carries the gene for the G protein of the vesicular stomatitis virus envelope. For assays with vector particles lacking accessory proteins the pCMVΔR8.74 transcomplementation plasmid was used (37), together with the pHCMV-G plasmid.
Cells-293T cells were grown in Dulbecco's modified Eagle's medium supplemented with 10% fetal calf serum, penicillin, and streptomycin (from Invitrogen), and maintained at 37°C with 10% CO2. MT4 cells were maintained in RPMI 1640 medium supplemented with 10% fetal calf serum and antibiotics at 37°C with 5% CO2.
Production of Vector Particles-HIV-1-based vectors were produced by transient transfection of 293T cells using the calcium phosphate method as described in Ref. 34. Briefly, cells were transfected with an HIV-1 encapsidation plasmid (pCMVΔR8.2) (35) and a vesicular stomatitis virus envelope expression plasmid (pHCMV-G) (36), together with two plasmids that generate the defective genomic RNAs shown in Fig. 1A. For the generation of viral vectors devoid of accessory proteins, pCMVΔR8.2 was replaced by pCMVΔR8.74 (37). To eliminate non-internalized DNA from viral preparations, the supernatants from transfected cells were DNase I treated prior to concentration using Vivaspin Ultrafiltration Concentrators (molecular weight cut-off 50,000). The amount of p24 present in vector preparations was determined by using an HIV-1 p24 enzyme-linked immunosorbent assay kit (PerkinElmer Life Sciences).
Single Cycle Infection Assays-MT4 cells were transduced with 200 ng of p24 antigen per 10⁶ cells (corresponding to an approximate multiplicity of infection of 20) in a 500-µl volume in 35-mm dishes. Two hours post-transduction the cells were diluted up to a 4-ml volume with supplemented RPMI medium and maintained at 37°C in a 5% CO2 incubator for 40 h. The reverse transcription products (RTP) were purified from the cytoplasmic fraction of transduced cells using the method described by Hirt (38), because most of the RTP remain in an unintegrated form and, additionally, the genomic vectors lack the FLAP sequence shown to enhance nuclear import of RTP (39). Briefly, cells were lysed, high molecular weight DNA was removed by precipitation, and the lysates were cleared by ultracentrifugation. After phenol chloroform extraction from the supernatant, low molecular weight DNA was ethanol-precipitated and purified using the NucleoSpin Extract clean-up kit (Macherey-Nagel). The purified double-stranded DNA was digested with DpnI to eliminate possible contaminating DNA of bacterial origin prior to PCR amplification (20 cycles) with primers BH and SH, and cloning in Escherichia coli. Plating on isopropyl 1-thio-β-D-galactopyranoside/5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (IPTG/X-gal) containing dishes allowed blue/white screening of recombinant and parental colonies, respectively (34). To determine the regions where strand transfer occurred, 48 recombinant clones were analyzed for each triplicate assay. A full panel of controls run in parallel to ascertain that the recombinant molecules were generated during reverse transcription has been previously described (34).
Estimation of the Recombination Rates-The transfection of equal amounts of pLac+ and pLac− plasmids leads to the production of similar quantities of each of the genomic RNAs as previously reported (34), because the same promoter sequence is present in both genomic plasmids. Given that the RNAs also share the same sequences for dimerization and encapsidation, these processes are expected to yield 50% of heterozygous and 50% of homozygous particles, with an equal amount of lac+/+ and lac−/− vectors, as usually assumed (40). After transduction of MT4 cells, the RTP are amplified by PCR and cloned in E. coli after digestion with BamHI and SacII. This procedure allows cloning of BamHI+/lac− and BamHI+/lac+ RTP, which can be generated by reverse transcription in homozygous lac−/− and in heterozygous particles (34). As a result, assuming that only one molecule of double-stranded DNA is generated from each viral particle, one-third of the total amount of colonies will correspond to RTP issued from lac−/− vectors. The total number of colonies is therefore multiplied by 2/3 to consider only the RTP issued from heterozygous particles. The possibility of cloning products of cellular origin was also ruled out as described in Ref. 34. The number of white colonies (N) is corrected by a factor given by n/48, where n is the number of colonies that resulted from cloning of RTP after analysis of 48 white colonies. The global frequency of recombination (F) is given by: F = b/[(2/3)(b + N(n/48))], where b is the number of blue colonies. The recombination rate per nucleotide (f) within a given interval (i) is given by: f = F(x_i/X)/z_i, where F is as above, x_i is the number of colonies analyzed where recombination was identified to have occurred within the interval considered, X is the total number of colonies on which mapping was performed, and z_i is the size in nucleotides of the interval.
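The bookkeeping described in this section can be sketched in a few lines of Python. This is a minimal illustration and not the analysis code used in the study: the function names and the example colony counts are hypothetical, and the expression used for F is one reading of the definitions above (blue colonies divided by two-thirds of the corrected colony total), taken here as an assumption.

```python
from fractions import Fraction


def particle_fractions(p_plus=Fraction(1, 2)):
    """Expected particle classes under random co-packaging of two RNAs.

    p_plus is the fraction of lac+ genomic RNA (1/2 for equal expression).
    """
    p_minus = 1 - p_plus
    return {
        "lac+/+": p_plus ** 2,
        "lac-/-": p_minus ** 2,
        "heterozygous": 2 * p_plus * p_minus,
    }


def global_frequency(blue, white, n_rtp, n_screened=48):
    """Assumed form: F = b / ((2/3) * (b + N * n/48)).

    blue: number of blue colonies (b); white: number of white colonies (N);
    n_rtp: white colonies, out of n_screened, that truly carried an RTP (n).
    """
    corrected_white = white * n_rtp / n_screened
    total = blue + corrected_white
    return blue / (total * 2 / 3)


def rate_per_nt(F, x_i, X, z_i):
    """f = F * (x_i / X) / z_i for an interval of z_i nucleotides."""
    return F * (x_i / X) / z_i


if __name__ == "__main__":
    fr = particle_fractions()
    clonable = fr["lac-/-"] + fr["heterozygous"]
    # Share of clonable RTP expected to come from lac-/- particles.
    print(fr["lac-/-"] / clonable)
    # Hypothetical counts, for illustration only.
    F = global_frequency(blue=60, white=940, n_rtp=48)
    print(F, rate_per_nt(F, x_i=12, X=48, z_i=18))
```

With equal amounts of the two RNAs, the first function returns one-quarter of each homozygous class and one-half heterozygous particles, so that lac−/− particles account for one-third of the clonable products, in line with the 2/3 correction applied to the colony total.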
Primer Extension Assays-Reverse transcription was primed using a 5′-terminal-labeled deoxyoligonucleotide and, for each RNA analyzed, 3 pmol were annealed with the primer at a molar ratio of primer to RNA of 10:1. Annealing was performed at 65°C for 5 min in a buffer containing 50 mM Tris-HCl (pH 7.8) and 75 mM KCl, followed by slow cooling to 40°C. After incubation on ice for 2 min, dithiothreitol was added at a final concentration of 1 mM, together with 100 units of RNasin (Promega). The nucleocapsid protein (NC) was then added at a ratio of 1 molecule of NC for 8 nt of total RNA and incubated at 37°C for 5 min. HIV-1 RT was added at a final concentration of 400 nM and incubated for 5 min at 37°C to allow the formation of the reverse transcription complex. The reaction was started by addition of the four dNTPs (1 mM each, final concentration) and MgCl2 (at a final concentration of 7 mM), and stopped at various time intervals by addition of EDTA and SDS at a final concentration of 25 mM and 0.4%, respectively. The samples were incubated for 1 h at 56°C and ethanol precipitated after phenol and chloroform extraction. The products were analyzed by autoradiography using a PhosphorImager (Amersham Biosciences) after electrophoresis on 8% (w/v) polyacrylamide gels containing 8 M urea, using a loading buffer with formamide at a final concentration of 22.5%.
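The amounts of primer and NC implied by the ratios given above follow from simple arithmetic, sketched here in Python. The function name and the 800-nt template length are hypothetical placeholders (the length of the RNA templates is not stated in this section), so the NC figure is illustrative only.

```python
def annealing_amounts(rna_pmol, rna_length_nt, primer_ratio=10, nt_per_nc=8):
    """Return (primer pmol, NC pmol) for a given amount of template RNA.

    primer_ratio: primer-to-RNA molar ratio (10:1 in the protocol).
    nt_per_nc: one NC molecule per this many nucleotides of RNA (8 in the protocol).
    rna_length_nt: template length in nucleotides (hypothetical value below).
    """
    primer_pmol = rna_pmol * primer_ratio
    nc_pmol = rna_pmol * rna_length_nt / nt_per_nc
    return primer_pmol, nc_pmol


# 3 pmol of RNA as in the protocol; 800 nt is an assumed length.
primer, nc = annealing_amounts(rna_pmol=3, rna_length_nt=800)
print(primer, nc)
```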

Outline of the System for the Study of Recombination-Recombination was studied after a single cycle of infection using the procedure previously described (34). Briefly, the system is based on the production of vesicular stomatitis virus envelope-pseudotyped HIV-1 particles by transient transfection of 293T cells with two transcomplementation and two genomic plasmids. The first two plasmids allow the expression of all HIV-1 proteins except those encoded by the env gene in one case, and of the G protein from the vesicular stomatitis virus envelope in the other. The two genomic plasmids, instead, direct the synthesis of two defective genomic RNAs that are encapsidated in the viral particles and that share a region of homology on which recombination is studied. The viral particles are collected and used to transduce MT4 cells in culture. The rationale is to study copy choice occurring in the region of homology in heterozygous virions by following the generation of BamHI+/lac+ RTP, as indicated in Fig. 1B. This is achieved by screening for the presence of a functional lacZ gene in E. coli after cloning these RTP, as detailed in the legend of Fig. 1. This provides a snapshot of the products generated after a single infectious cycle in MT4 cells because, as they carry non-functional LTR sequences (Fig. 1B), they will not support transcription of new genomic RNAs. The proportion of lac+ bacterial clones over the total amount of colonies leads to an estimate of the frequency of recombination as detailed under "Experimental Procedures." The presence of specific mutations on the donor or on the acceptor RNA along the region of homology allows it to be divided into subregions (Fig. 1C) and, after sequencing recombinant RTP, allows a recombination rate per nucleotide to be assessed for each subregion as detailed under "Experimental Procedures."
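The assignment of each sequenced recombinant to a subregion can be illustrated with a small Python sketch. It rests on simplifying assumptions that are not spelled out in the text: each clone is summarized as a string of donor (D) or acceptor (A) alleles read at six consecutive marker positions ordered in the sense of (−)-DNA synthesis, synthesis starts on the donor, and only clones showing a single donor-to-acceptor switch are assigned. The marker strings and function names are hypothetical.

```python
from collections import Counter

# Intervals delimited by six consecutive markers (five intervals).
REGIONS = ["R1", "R2", "R3", "R4", "R5"]


def map_switch(alleles, regions=REGIONS):
    """Return the interval in which a single D-to-A switch occurred.

    alleles: string of 'D'/'A', one character per marker, in the order
    in which the markers are reverse transcribed.
    Returns None if the clone does not show exactly one D-to-A switch.
    """
    if len(regions) != len(alleles) - 1:
        raise ValueError("need one interval per pair of adjacent markers")
    switches = [i for i in range(len(alleles) - 1)
                if alleles[i] != alleles[i + 1]]
    if len(switches) != 1 or alleles[0] != "D":
        return None
    return regions[switches[0]]


def tally(clones, regions=REGIONS):
    """Count assigned switches per interval over a set of clones."""
    counts = Counter()
    for clone in clones:
        region = map_switch(clone, regions)
        if region is not None:
            counts[region] += 1
    return counts


# Hypothetical clones, for illustration only.
print(tally(["DDAAAA", "DDAAAA", "DDDAAA", "DAAAAA"]))
```

The per-interval counts obtained this way play the role of x_i in the rate per nucleotide defined under "Experimental Procedures."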
Mapping the Hot Region in the Top Portion of the C2 Hairpin-Understanding whether the whole upper half of the hairpin, or only a portion of this region, is a preferential site for recombination is crucial for dissecting the mechanism of copy choice at work in this sequence. To this end, we first introduced additional mutations in R2 that allow distinguishing between recombination occurring in the upper part of the mounting stem (R2a), in the apical loop (R2b), or in the upper part of the descending stem (R2c) (Fig. 2A). In addition, a mutation introduced in R1 split it into two regions, one including the lower part of the mounting portion of the stem and the 20 nt that precede it (R1b), and the other constituting the very beginning of the region of homology (R1a, Fig. 1C). The reason for subdividing R1 in two parts was to explore the possibility that the presence of the hairpin enhanced template switching by inducing stalling of reverse transcription at its base. If this were the case, R1b should display a high degree of transfer.

Fig. 1 legend (partial): The dotted lines indicate the path followed by the reverse transcriptase to generate the parental and recombinant reverse transcription products studied in the present work. Panel B, reverse transcription products studied. Cloning of these products in E. coli is performed after PCR amplification (20 cycles) of reverse transcription products from the transduced cells with primers SH (carrying the SacII site in the non-annealing tail) and BH, and restriction with BamHI and SacII. Panel C, top part, schematic representation of the gp120 coding sequence. SP, signal peptide; C1-C5 and V1-V5, constant and hypervariable regions, respectively. The position of the sequence studied in this work is given at the bottom of the panel. The various regions that compose this sequence are shown, according to the nomenclature used in the text. The position of the components of the hairpin is also given, with S1 and S2 corresponding to the mounting and descending portions of the stem, respectively, and L to the apical loop. The gray arrow indicates the direction of (−)-DNA strand synthesis.

FEBRUARY 3, 2006 • VOLUME 281 • NUMBER 5

In Vivo HIV-1 Recombination Hot Spot
As shown in Fig. 3A (left panel), the frequency of recombination was quite homogeneous, with an average rate of 2.1 × 10⁻⁴ per nt (S.D. 0.6 × 10⁻⁴), except for region R2c, which stood out with a rate five times higher than the average value ((10.5 ± 2.3) × 10⁻⁴ per nt). R2c is an 18-nt-long segment that constitutes the first part of the descending stem, reverse transcribed after the apical loop (Figs. 1C and 2A). The R1b region did not display a rate higher than the average, indicating that potential stalling at the base of the hairpin during reverse transcription does not result in increased strand transfer. This analysis indicates that the hot region is circumscribed (spanning 18 nt out of 400) and maps to a double-stranded portion of the RNA constituted by the descending limb of the stem.
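How far R2c departs from the other regions can be expressed with the summary values quoted above. The Python sketch below is only a reading aid: it assumes that the quoted mean and S.D. describe the regions other than R2c, and the function names are hypothetical.

```python
def fold_over_mean(rate, mean):
    """Ratio of a region's rate to the average rate."""
    return rate / mean


def deviations_from_mean(rate, mean, sd):
    """Distance from the average rate, in units of the standard deviation."""
    return (rate - mean) / sd


# Values quoted in the text, in units of 1e-4 template switches per nt.
mean_rate, sd_rate, r2c_rate = 2.1, 0.6, 10.5

print(f"{fold_over_mean(r2c_rate, mean_rate):.1f}-fold over the mean")
print(f"{deviations_from_mean(r2c_rate, mean_rate, sd_rate):.1f} S.D. above the mean")
```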
Effect of HIV-1 Accessory Proteins on the Distribution of Strand Transfer Events-Various viral accessory proteins have been shown to be present in HIV-1 cores (41), possibly participating in the formation of the reverse transcription complex and affecting the mutation rate during reverse transcription (42). To determine whether these viral proteins have any role in the generation of the identified recombination hot spot, we carried out a recombination assay on C2 RNA after generating vector particles deprived of the Vif, Vpr, Vpu, and Nef proteins, as described under "Experimental Procedures." Template switching occurred at recombination rates similar to those found in the presence of the accessory proteins, as previously observed (43), and with preference for the same genomic region (Fig. 3A, right panel). Altogether, these observations suggest that the major determinants for the generation of the hot spot in R2c are not related to the presence of these cofactors and might therefore be determined by other parameters, such as the primary sequence of this region or its position in the C2 hairpin.
Position Effect on Recombination-The observation that only the descending part of the hairpin constitutes a hot sequence for recombination raises the question of which determinants are responsible for this behavior. We first checked whether the location of R2c in the descending portion was essential for the high efficiency of transfer observed. To this end, we constructed a mutant hairpin where the portions of R2a and R2c engaged in the formation of the upper part of the hairpin were exchanged, as shown in Fig. 2B (SW RNA). The exchange was made maintaining the 3′ to 5′ polarity of the swapped sequences as in the C2 RNA, in such a way that the sequences the RT comes across within R2a and R2c are the same as in C2 RNA (Fig. 2B). Based on folding prediction analyses of the 118 nt that constitute the hairpin using the m-fold program (44), these rearrangements should result in the formation of a hairpin similar to the one of C2 RNA, and no alternative structures were predicted to be formed (supplementary materials). As shown in Fig. 3B (SW RNA), the rate of recombination in R2c decreased by a factor of 15 (from 10.5 to 0.7 × 10⁻⁴ template switching events per nt), and the region no longer constituted a hot spot. In parallel, positioning the sequence R2a in the descending portion of the SW structure turned it into a recombination hot spot displaying a recombination rate per nucleotide 20 times higher than that observed on the same sequence in C2 RNA (23.9 versus 1.2 × 10⁻⁴ per nt). These observations suggest that the descending part of the stem constitutes a crucial position for the high recombination rate observed in the hot spot. Notably, the frequency found in R2a with SW RNA is approximately twice (Fig. 3) that found in the same region of the hairpin on C2 RNA, where that position was occupied by region R2c (Fig. 2). This suggests that the primary structure of the region also influences the efficiency of recombination, although to a lesser extent than its position in the secondary structure.
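The fold changes quoted for the C2 and SW RNAs can be recomputed from the four rates given in the text. The Python sketch below does only that; the dictionary layout and comparison labels are hypothetical conveniences, while the numbers are those reported above (in units of 10⁻⁴ template switching events per nt).

```python
# Rates quoted in the text, keyed by (RNA, sequence), in 1e-4 switches per nt.
RATES = {
    ("C2", "R2a"): 1.2,
    ("C2", "R2c"): 10.5,
    ("SW", "R2a"): 23.9,
    ("SW", "R2c"): 0.7,
}


def fold_change(numerator, denominator):
    """Ratio between two of the quoted rates."""
    return RATES[numerator] / RATES[denominator]


comparisons = {
    "R2c sequence, C2 versus SW": (("C2", "R2c"), ("SW", "R2c")),
    "R2a sequence, SW versus C2": (("SW", "R2a"), ("C2", "R2a")),
    "descending position, SW (R2a) versus C2 (R2c)": (("SW", "R2a"), ("C2", "R2c")),
}

for label, (num, den) in comparisons.items():
    print(f"{label}: {fold_change(num, den):.1f}-fold")
```

The first two ratios (about 15 and 20) reflect the position effect, whereas the third (about 2.3) compares two different sequences placed at the same position and so reflects the smaller contribution of the primary structure.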
To further investigate the contribution of the position effect and the RNA sequence on the efficiency of recombination suggested by these results, we generated a new mutant where swapping of R2a and R2c was performed using the antiparallel sequences. In this way, while the base content of this portion of the hairpin remains unaltered, the primary structure within these regions is different from that in SW RNA (SWap, Fig. 2B). This mutant RNA maintained, as judged by prediction analyses, the ability to fold in a structure comparable with the one of the C2 hairpin, although two other potential folding configurations (with a comparable free energy) were predicted in this case, indicating a possible equilibrium among the three forms (see supplementary materials). In this RNA, no preferential spots for recombination could be highlighted among the 8 regions comprising the homologous sequence (Fig. 3B, right panel).

Fig. 2 legend (partial): The inset provides a schematic representation of the hairpin, with regions R2a and R2c shown as black and red solid arrows, respectively. Panels B and C, sequence of the SW and SWap constructions in regions R2a and R2c. These sequence rearrangements were done both on donor and acceptor RNAs. As for panel A, the bases that differ between these two RNA populations, used for mapping purposes, are given in a yellow background. The insets give, as in panel A, a schematic representation of the hairpins where the black and red arrows indicate the orientation of R2a and R2c sequences, respectively, within the hairpin.
The difference between these results and those obtained with SW RNA, where the region inserted in the descending part of the hairpin was turned into a hot spot, could have several causes, and suggests that the location of a sequence in the upper portion of the descending part of the stem is not, per se, sufficient to generate the hot spot. Moreover, the primary structure of the RNA in that region seems to play a more important role than suggested by the experiment with SW RNA. In fact, it cannot be excluded that the R2a region in that construct accidentally fulfilled the same requirements as the R2c region did in C2 RNA, and that this is not the case for the R2a-ap region. Another possible explanation could be that the opportunity to generate alternative RNA secondary structures in this region (see supplementary materials) affects strand transfer.

Inside R2c in C2 RNA-To investigate in more depth to what extent the primary structure of the RNA contributes to the high degree of transfer observed in a given region of the RNA, we focused on the wild type RNA (C2), where R2c constituted the hot region. We first attempted to define more precisely the hot region within R2c. To this end another donor RNA was made, carrying a mutation in a median location of R2c (m10 RNA), to split this region into approximately two halves. In this mutant the U residue at position 10 of the hot spot was replaced by a C (Fig. 2A), which preserves the possibility of forming the hairpin by replacing the U-G pairing of C2 RNA with a C-G pairing in the mutant. Strikingly, the use of this RNA in combination with C2 as acceptor template led to a remarkable drop in the occurrence of recombination in the whole R2c region with respect to the C2 RNA (from 10.5 to 0.7 × 10⁻⁴ per nt). Recombination rates for each of the other regions were not significantly changed (not shown). We reasoned that two possible explanations could be provided for this result. One could be that sequence determinants required for efficient switching on the donor RNA were abolished in m10 RNA. In particular, the interruption of a poly(U) stretch could have been detrimental to the transfer process, in accordance with what has been previously suggested (45). An alternative explanation was that the presence of a base substitution between donor and acceptor RNA in the region of transfer severely affected the efficiency of the process. On this basis, we examined the contribution of the primary structure of the RNA and of sequence divergence between donor and acceptor RNA to the transfer process.
Role of the Primary Structure of the RNA in R2c-We first addressed the issue of the role of the primary structure on the efficiency of transfer. This was achieved by constructing four mutant RNAs, each carrying two consecutive base substitutions in different regions of R2c. The mutations were introduced on the donor and acceptor RNAs in such a way as to maintain a perfect identity between these two RNA moieties inside the R2c region.
Four mutants were constructed, targeting the part near the apical loop (m2-3, see Fig. 1A), the small lateral loop in the descending stem (m6-7), the stretch of four uridine residues in the central part of the hot spot (m10-11), and the sequence close to the border with region R3 (m13-14). In the three cases where the mutated residues were located in a double-stranded portion, the mutations were introduced in such a way as to maximize the likelihood of preserving the hairpin structure by introducing compensatory mutations in R2a. In all cases these mutations were, accordingly, predicted to preserve the possibility of forming a hairpin like the one in C2. However, in two cases (m6-7 and m10-11) alternative conformations were also predicted to be likely (see supplementary materials). The four mutants yielded different results, with the pattern of recombination essentially unaltered with m2-3 and m13-14, and a 2-3-fold decrease in the efficiency of transfer in R2c for m6-7 and m10-11. As a result, no significant preferential sites for recombination were present on these two RNAs. It is worth noting that, in contrast, the compensatory mutations introduced in R2a did not modify the recombination rate in this region in any of the mutants (Fig. 3C).
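The logic of a compensatory mutation, changing the partner residue in R2a so that a substituted position in R2c can still pair, can be illustrated with a minimal Python check that accepts Watson-Crick pairs and the G-U wobble pair. It is a toy model only: the four-nucleotide strands are hypothetical, the function names are ours, and such a check says nothing about thermodynamic stability or alternative folds, which in the study were assessed with a folding prediction program.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(x, y):
    """True if the two RNA bases can form a Watson-Crick or G-U pair."""
    return (x.upper(), y.upper()) in PAIRS


def stem_preserved(mounting, descending):
    """Check that two strands, both written 5' to 3', pair antiparallel.

    Position i of the mounting strand faces position -(i + 1) of the
    descending strand.
    """
    if len(mounting) != len(descending):
        raise ValueError("strands must have the same length")
    return all(can_pair(m, d) for m, d in zip(mounting, reversed(descending)))


# Hypothetical 4-nt strands, for illustration only.
print(stem_preserved("GGAC", "GUCC"))  # original stem
print(stem_preserved("GGAC", "GCCC"))  # U-to-C change in the descending strand
print(stem_preserved("GGGC", "GCCC"))  # compensatory A-to-G change opposite it
```

The same check shows why a single U-to-C change opposite a G, as in a U-G to C-G replacement, needs no compensation: both `can_pair("U", "G")` and `can_pair("C", "G")` hold.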
These results suggest that sequence specificity determinants influence the efficiency of recombination, confirming the observations made with SWap RNA. The fact that the recombination rates within R2c were not significantly affected when residues 2-3 and 13-14 were mutated (Fig. 3C) could be interpreted as an indication that these residues are located outside the actual hot region. However, this conclusion would rely on the assumption that all mutations within the region important for transfer have a negative effect on template switching. No rationale supports such an assumption, since mutations could in principle also be neutral or beneficial for strand transfer. We therefore reasoned that another likely explanation could be that the substitutions introduced in m6-7 and m10-11 modified parameters crucial for the efficiency of transfer that were not affected in m2-3 and m13-14. These hypotheses are tested below.
Primary Structure and Kinetics of Reverse Transcription-As mentioned above, a likely parameter that can affect copy choice is the pattern of pausing during reverse transcription. Based on in vitro primer extension assays, pausing of DNA synthesis has been related in several cases to the efficiency of strand transfer in vitro (13, 19, 20, 22, 23, 28). The kinetics of reverse transcription were therefore analyzed on the two RNAs carrying double point mutations for which recombination in R2c was decreased with respect to C2 RNA, together with wild type RNA and, for comparison, a mutant RNA where recombination was not affected (m13-14). Reverse transcription was performed in the presence of HIV-1 NC, a nucleic acid chaperone protein tightly bound to the genomic RNA during reverse transcription in vivo.
An overview of the pausing pattern along the region of homology used in the recombination assays is presented in Fig. 4. The kinetics of reverse transcription observed here were similar along the four RNAs, with no particular signature of the pausing pattern in R2c compared with other regions. Only minor differences between the four RNAs were found, almost exclusively at the level of the regions harboring the mutations (region R2c for m6-7 RNA, and regions R2a and R2c for m10-11 and m13-14 RNAs, Fig. 3).
Repeating the same analysis at a higher resolution allowed us to estimate the degree of pausing for each of the 18 nt constituting R2c (Fig. 5). Two significant pause sites are observed in the wild type R2c region and in the three versions generated by point mutations (Fig. 5B). One site, indicated by a diamond in Fig. 5, is located near the border between R2c and R3 (nt 17-G in Fig. 5A). Its intensity is comparable in C2, m6-7, and m13-14 RNAs, whereas it only constitutes a minor site for stalling of reverse transcription in m10-11 RNA. The other pause site corresponds to the first of two guanine residues transcribed after the 4-U tract present in C2 RNA (position 13 G, indicated by an asterisk in Fig. 5). In this case, pausing is similar for C2, m10-11, and m6-7 RNAs, whereas an attenuation of stalling was observed in the m13-14 RNA.
In conclusion, this analysis does not highlight any significant correlation between the presence of pause sites and the recombination rates in these four RNAs. This is further supported by the analysis of pausing on SW and SWap RNAs (Fig. 5C), where a strong pause site was detected at the border of the region of interest in SWap RNA, where no hot spot for recombination was detected, and no significant pausing was perceived in SW, where the presence of a hot spot was evident (Fig. 3B).
Sequence Divergence within R2c-As mentioned above, the dramatic inhibition of copy choice in R2c observed with virions carrying the m10/C2 pair of RNAs (donor/acceptor, respectively) likely indicated that local sequence divergence near the region of transfer is detrimental to recombination. We therefore reasoned that slightly divergent donor and acceptor RNAs could be used to define the region important for template switching. The rationale was that sequence heterogeneity between donor and acceptor RNAs in the area of transfer would hamper template switching, whereas mutations introduced ahead of the position of transfer (following the sense of reverse transcription), or too far behind it, should have no significant effect.
We therefore made four types of virions carrying as donor RNA one of the four double mutants (m2-3, m6-7, m10-11, and m13-14) with C2 as acceptor RNA. In each of these cases, two residues are discordant between the donor and acceptor RNAs in R2c and, with the exception of m6-7, in R2a (Fig. 2A). In all four cases the efficiency of transfer was dramatically inhibited with respect to the use of donor and acceptor RNAs with identical sequences in R2c (Fig. 6). These results highlight that the level of identity between donor and acceptor RNAs in the region of transfer is also crucial for the efficiency of the process, and indicate that the whole region can be considered as part of the hot spot. According to this, the 3′ OH of the nascent DNA would be transferred onto the acceptor RNA near the 5′ end of R2c.
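The notion of discordant residues between donor and acceptor RNAs can be made concrete with a short sketch that lists the positions at which two aligned sequences differ. The 18-nt sequence below is a placeholder and not the actual R2c sequence; the helper names `substitute` and `discordant_positions` and the substituted bases are likewise hypothetical.

```python
def substitute(seq, changes):
    """Return `seq` with the bases at the given 1-based positions replaced."""
    bases = list(seq)
    for pos, base in changes.items():
        bases[pos - 1] = base
    return "".join(bases)


def discordant_positions(donor, acceptor):
    """Return the 1-based positions at which donor and acceptor differ."""
    if len(donor) != len(acceptor):
        raise ValueError("donor and acceptor must have the same length")
    return [i + 1 for i, (d, a) in enumerate(zip(donor, acceptor)) if d != a]


# Placeholder 18-nt acceptor sequence (NOT the real R2c sequence).
ACCEPTOR = "ACGUACGUACGUACGUAC"
# Hypothetical double mutant donor with substitutions at positions 10 and 11.
DONOR = substitute(ACCEPTOR, {10: "G", 11: "A"})

mismatches = discordant_positions(DONOR, ACCEPTOR)
```

In this toy case the donor and acceptor differ at two adjacent positions only, which mirrors the design of the double mutants without reproducing their sequences.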

DISCUSSION
In the present work we characterize a recombination hot spot that we previously identified in the top portion of an RNA hairpin, in a region of the HIV-1 env gene coding for the C2 portion of the gp120 protein. The hot spot has been subdivided here into three regions named R2a, R2b, and R2c, following the sense of reverse transcription, corresponding to the upper part of the mounting stem, the apical loop, and the upper portion of the descending stem, respectively (Fig. 2A). We determine that the high rate of recombination in the previously identified hot spot is attributable to a short portion of this structure, the R2c region; the other two regions yielded values in the same range as those observed outside the hot spot.
A series of mutant RNAs was designed to address the question of which features of this region are responsible for the high recombination rate observed. First, the entire hot region was swapped in position with the sequence located in the mounting part of the hairpin (R2a). This was done either maintaining the original 5′ to 3′ orientation of the sequences or using the antiparallel sequences. These experiments highlighted both a strong position effect within the secondary structure of the RNA and the involvement of the sequence of the hot spot itself. Indeed, the location in the descending portion of the stem appeared crucial because, in both mutants, R2c underwent a 15-fold drop in the recombination rate in the ectopic position. In addition, for one of the mutants (SW RNA, Fig. 3B) a striking increase in recombination was observed for R2a. Altogether, these observations downplayed the role of the primary structure of the RNA in the hot spot. However, the finding that, in the other mutant, the insertion of R2a in the antiparallel orientation in the descending limb of the stem failed to turn it into a hot spot (R2a-ap in SWap RNA) underscored that the RNA sequence also contributes to the efficiency of copy choice. The participation of the sequence in determining the efficiency of recombination was confirmed by the results obtained with four constructs that each carried two base substitutions, at different positions, in the R2c region. For two mutants the ability of R2c to support template switching remained unaffected, whereas for the two others it decreased approximately 3-fold. The decrease in recombination could not be related to any obvious change in the kinetics of reverse transcription in vitro, either within the hot region itself or in the region of RNA reverse transcribed before the hot spot.
Interpreting data from in vitro primer extension assays in light of results obtained for recombination in infected cells in culture must clearly be done with caution, because the extension assays might not accurately reflect the pattern of reverse transcription in an infected cell. However, even in the presence of the HIV-1 NC protein, probably the most crucial cofactor for reverse transcription in infected cells (46), no significant correlation between pausing during DNA synthesis and recombination was revealed here. Moreover, the use of virions lacking the viral accessory proteins vif, vpr, nef, and vpu affected neither the frequency nor the positions where recombination occurred along the RNA, suggesting that these proteins do not play a major role in the transfer process. Consequently, even if these accessory proteins are not present during the primer extension assays performed here, it is likely that they do not modify the kinetics of reverse transcription in vivo in such a way as to affect the frequency and position of template switching events.
Fig. 2). In this case, the size of these regions varies slightly for each RNA (see Fig. 2). The gray arrows in panels A-C indicate the direction of the (−)-DNA strand synthesis. The gel is representative of three independent primer extension assays.
FIGURE 6. Sequence similarity within the hot spot and recombination in R2c. The recombination rates per nucleotide in R2c using m2-3, m6-7, m10-11, and m13-14 as donor RNAs and C2 as acceptor RNA are given by black bars. For comparison, the values found when the corresponding mutant acceptor RNA was used with each mutant donor RNA are given by gray bars, whereas the value found for the wild type sequence C2 (both as donor and acceptor RNAs) is given by the dotted line (these values are the same as those given in Fig. 3 for region R2c).
Various mechanisms have been proposed to be responsible for template switching within RNA hairpins, essentially based on observations made in reconstituted in vitro systems (20,27,28,30,31). Here we report the identification of a preferential region for copy choice in a circumscribed double-stranded region of the acceptor RNA. This adds, to the problem of melting the nascent heteroduplex that exists for any transfer event, the problem of opening the acceptor RNA to accept an incoming DNA strand. We proposed (27) that the formation of compensatory base pairings, as in the case of branch migration occurring during DNA-DNA recombination (47), could circumvent this obstacle. In the model proposed (outlined in Fig. 7), the hairpin on the acceptor RNA would open as a zipper, a transition that would be favored by the formation of two other double-stranded structures (27). As highlighted in Fig. 7, this would only be possible once DNA polymerization has proceeded through the loop region, consequently restricting template switching to the descending portion of the hairpin. This prediction is consistent with the observations made in this study, where the hot region was always located in the descending limb of the stem.
Although this model remains speculative in its details, the occurrence of strand exchange between a DNA/RNA heteroduplex and a double-stranded acceptor RNA necessarily implies a delicate equilibrium between the conformations of both these reactants. How can the substitution of only two residues in the hot region, as in m10-11 and m6-7 RNAs, influence this process? One possibility is that the disparity in the recombination rates observed with the different RNAs reflects different efficiencies of the migration process. Indeed, branch migration has recently been shown to be highly non-uniform, with some sequences resulting in stable conformations in which migration is almost blocked, alternating with sequences where migration is fast (48). Such differences in migration efficiency could account for the differences in recombination observed with the m6-7 and m10-11 mutants.
However, the mutations present in one of the aforementioned mutants (m6-7) involved two bulged nucleotides and are therefore expected to affect branch migration only marginally. In the context of the infected cell, viral and, possibly, cellular proteins assist the reverse transcription process (42). In particular, the viral NC protein is expected to modulate the equilibrium between the various structures of the nucleic acids in the transfer process (46). In this regard, it has been shown that the identity of bulged residues as well as subtle sequence changes in the double-stranded region of hairpins can trigger important differences in the stability of these structures, both at the RNA and cDNA levels, and in the modulation of template switching by NC (49-52). Here, it is possible that the identity of the residues present in the bulge could influence template switching by such a mechanism.
Another possible explanation, not exclusive of the previous ones, is related to the folding of the RNA in the hot region. The different mutants used in this study were designed in such a way as to preserve the possibility for the RNA to fold into a hairpin similar to the one found in C2 RNA, whose structure has been previously determined (27). However, exclusively in those cases where the hot spot was erased (m6-7, m10-11, and also for the transposition mutant SWap), this hairpin was predicted to be in equilibrium with alternative structures of comparable stabilities (see supplementary materials), suggesting that these RNAs could frequently shift from one to another of these structural conformations, decreasing the proportion of molecules in the appropriate conformation to promote efficient recombination when reverse transcription reaches this region.
If migration follows the yellow arrow, the result would be the structure shown in panel E, where no transfer has occurred. Migration following the blue arrow, instead, will lead to template switching, because DNA synthesis will continue on the acceptor RNA (panel F). As described in the text, the primary structure of regions R2c and R2a and the presence of discordant residues between donor and acceptor RNAs in these regions are expected to affect the migration process.
FEBRUARY 3, 2006 • VOLUME 281 • NUMBER 5

A major effect on recombination, stronger than that of the primary structure of the hot spot, is exerted by the degree of sequence identity in the region of transfer. The most logical explanation for the dramatic inhibition of template switching, observed even with a single nucleotide of sequence discordance between donor and acceptor RNA, is that mismatched residues destabilize the heteroduplex between the nascent DNA and the acceptor RNA. This implies that whenever a mutation interferes with transfer, the mutated residues probably participate in the formation of this heteroduplex and, consequently, transfer occurs downstream from the mutation, in the sense of reverse transcription. Because the presence of divergent residues between donor and acceptor RNAs affected recombination for all the positions tested, it can be deduced that transfer probably occurs in a restricted area near the 5′ end of R2c (close to the border with R3). The inhibition observed for all the mutations tested suggests that essentially the whole R2c region is important for the transfer. For a transfer event occurring close to the R2c/R3 border, the size of R2c correlates well with that of the region that, on the nascent DNA, is certainly in a double-stranded form, i.e. the 18 nt that separate the polymerase and the RNase H active sites of the RT (53). It should be noted that for branch migration between DNA molecules, the presence of a single mismatch is sufficient to block the process (54).
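The deduction above can be illustrated with a toy model, sketched here under explicit assumptions: positions within R2c are numbered 1 to 18 in the order in which they are reverse transcribed, the mutant names (m2-3, m6-7, m10-11, m13-14) are taken to denote such positions, and a discordant residue is assumed to interfere with transfer only if it lies within the 18-nt stretch of nascent DNA behind the polymerase active site at the moment of transfer. The function names are hypothetical and the model is deliberately simplistic.

```python
def heteroduplex_window(transfer_pos, window=18):
    """Positions of R2c covered by the nascent DNA duplex when the polymerase
    has just copied `transfer_pos` (1-based, order of reverse transcription).
    Positions before the start of R2c are clipped for simplicity."""
    start = max(1, transfer_pos - window + 1)
    return range(start, transfer_pos + 1)


def mismatch_interferes(mismatch_pos, transfer_pos, window=18):
    """True if a discordant residue falls inside the heteroduplex window."""
    return mismatch_pos in heteroduplex_window(transfer_pos, window)


# Positions assumed to be mutated in the four double mutants.
tested_positions = [2, 3, 6, 7, 10, 11, 13, 14]

# Transfer at the R2c/R3 border (position 18): every tested position interferes.
at_border = [p for p in tested_positions if mismatch_interferes(p, 18)]

# Transfer in the middle of R2c (position 9): only the first copied positions would.
mid_region = [p for p in tested_positions if mismatch_interferes(p, 9)]
```

Under these assumptions, only a transfer point near the end of R2c that is copied last accounts for inhibition at every tested position, which is the reasoning used in the text to place the transfer close to the border with R3.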
Overall, the present work indicates that at least three factors contribute to the high rate of copy choice in R2c. The secondary structure of the RNA and perfect identity between donor and acceptor RNAs appear pivotal: the former triggers changes in the rate of recombination in the 15-20-fold range, and the latter is required to the point that as few as one or two discordant residues between donor and acceptor RNAs almost completely abolish recombination. The third factor, the primary structure of the genomic RNA, contributes to a lesser extent to the efficiency of transfer (effect in the 2-3-fold range). In view of these observations, successful copy choice in infected cells appears to be the outcome of a delicate thermodynamic equilibrium.