HIV-1 Nucleocapsid Protein Increases Strand Transfer Recombination by Promoting Dimeric G-quartet Formation

A preferred site for HIV-1 recombination was identified in vivo and in vitro surrounding the beginning of the HIV-1 gag gene. This G-rich gag hotspot for recombination contains three evenly spaced G-runs that stalled reverse transcriptase. Disruption of the G-runs suppressed both the associated pausing and strand transfer in vitro. Significantly, this same gag sequence was able to fold into a G-quartet monomer, dimer, and tetramer, depending on the cations employed. The pause band at the G-run (nucleotide (nt) 405–409), which was predicted to be involved in forming a G-quartet monomer, diminished with increased HIV-1 nucleocapsid (NC) protein. More NC induced stronger pauses at other G-runs (nt 363–367 and nt 382–384), a region that forms a G-quartet dimer, adhering the two RNA templates. We hypothesized that NC induces the unfolding of the monomeric G-quartet but stabilizes the dimeric interaction. We tested this by inserting a known G-quartet formation sequence, 5′-(UGGGGU)4-3′, into a relatively structure-free template from the HIV-1 pol gene. Strand transfer assays were performed with cations that either encourage (K+) or discourage (Li+) G-quartet formation with or without NC. Strikingly, a G-quartet monomer was observed without NC, whereas a G-quartet dimer was observed with NC, both only in the presence of K+. Moreover, the transfer efficiency of the dimerized template (with K+ and NC) reached about 90%, approximately 2.5-fold of that of the non-dimerized template. Evidently, template dimerization induced by NC creates a proximity effect, leading to the unique high peak of transfer at the gag recombination hotspot.

Human immunodeficiency virus type 1 (HIV-1) packages two copies of a single-stranded RNA genome within each virion (1). Viral reverse transcriptase (RT) converts these RNAs to a double-stranded DNA as a part of viral replication. In addition to the two requisite DNA strand template switching events associated with replication at both ends of the viral genome, called minus and plus strand transfer, RT also mediates DNA strand switches between RNA templates within the internal regions of the viral genome during reverse transcription (2). Presumably, when the copackaged viral genomes are genetically distinct, these template switches generate recombinant viruses through shuffling of the existing mutations along the viral genomes (3). Indeed, HIV-1 recombines 3 to 30 times per viral replication cycle, depending on the host cell type (4-6). Significantly, viral recombination not only generates HIV-1 diversity worldwide but also facilitates viral escape from host immunity and development of resistance to antiviral treatments (7).
Although recombination events can occur throughout the viral genome, there is evidence for the presence of preferred sites (8-10). Features such as genomic structures of viral RNA, genome similarity, viral and host protein factors, and selection pressures on maintaining RNA elements and viral proteins all contribute to the potential of recombination in HIV-1 (11-14). A prominent hotspot for recombination was mapped previously by our group using a cell culture-based system to a 112-nt region surrounding the HIV-1 gag start codon. Briefly, the proviral DNA from COS-1 cells that were dually infected with HIV-1 NL43 and JRCSF strains was sequenced to screen for recombinant proviruses. NL43 and JRCSF differ by an average of one base substitution approximately every 25 nt, allowing the fine mapping of recombination sites (10). To understand the mechanistic basis, we recapitulated this gag recombination hotspot in vitro using a strand transfer system (15). Interestingly, we observed three evenly spaced runs of G residues (nt 363-367, 382-384, and 405-409) in the gag hotspot region, each corresponding to an RT synthesis pause during reverse transcription (15). When the G-runs were disrupted by alternative base substitutions, not only were these pauses diminished, but the strand transfer frequency of the gag hotspot was also reduced by approximately 70% (15). Some G-rich sequences have been shown to form G-quartet structures (16,17). A G-quartet consists of four G residues in a square planar array stabilized by Hoogsteen hydrogen bonds, usually coordinated by a monovalent cation (17). Possible structures can be further divided into intramolecular, bimolecular, and tetramolecular G-quartets. Because of the cation coordination and stacking interactions, G-quartets can be remarkably stable (18).
G-quartet structures were initially identified in the tandem repeats of G-rich telomeric DNA sequences and were later shown to be involved in various cellular processes that include, but are not limited to, DNA replication, RNA transcription, and genome recombination (19-23). As an example, both Bloom's and Werner's syndrome proteins, which belong to the RecQ family of DNA helicases, are involved in DNA replication. They bind and act on G-quartet DNA as a preferred substrate. Not surprisingly, the loss of either protein results in genomic instability, a partial consequence of G-quartet structures not being properly unwound (22). In addition, 43% of human protein-coding genes contain at least one G-quartet motif in their promoter regions, suggesting a role in mediating transcription activation or inhibition (21). Moreover, 37% of recombination hotspots within the human genome also contain at least one G-quartet motif, compared with 13.8% of the coldspots (23). Aside from the RecQ helicases, many other DNA and RNA binding proteins, such as replication protein A and Fragile X mental retardation protein, have also been shown to recognize the G-quartet motifs specifically (24-26).
We previously reported the formation of an intramolecular G-quartet monomer structure within the G-rich gag hotspot sequence (15). Either the disruption of the G-runs in the gag hotspot or the depletion of preferential cations completely eliminated the folding of the structure. Only the sequence including and surrounding one of the G-runs (nt 405-409) within the gag hotspot region fulfills the sequence requirements for folding into a G-quartet monomer, whereas the other two runs of G residues (nt 363-367 and 382-384) have been identified in several other studies as able to fold into an intermolecular G-quartet dimer involving two strands (27-29). To further elucidate the contributions of these G-runs to the efficient recombination at the gag hotspot, we mapped the distribution of the strand transfer events in vitro with and without intact G-runs. Although we did observe a substantial decrease in the number of transfer events within the 112-nt gag hotspot region upon the disruption of the G-runs, an interesting yet complex phenomenon occurred that we could not explain at that time: the transfer distribution in vitro of the WT gag sequence did not exhibit the high frequency in the hotspot region that was seen in vivo. More than 60% of regional (459 bp) recombination events were mapped to the gag hotspot (112 bp) in vivo, whereas only 45.21% of regional (320 bp) transfer events were located in the gag hotspot in vitro. Almost equal proportions of transfer events occurred in the hotspot region (45.21% in a 112-nt segment) and in a highly structured region downstream of the gag hotspot (50.91% in a 123-nt segment) during DNA synthesis on the WT gag template in vitro (15). We suspected that certain factors, such as the folding of the RNA templates and the involvement of viral or host protein factors, were not accurately represented in our assays in vitro.
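The in vivo and in vitro percentages above refer to regions of different lengths, so one way to compare them is to divide each segment's share of events by its share of the region length. The following is a minimal sketch of that normalization; the function name `per_nt_enrichment` is illustrative and not from the original study, and the in vivo value of 0.60 is a lower bound because the text says "more than 60%".

```python
def per_nt_enrichment(event_fraction, segment_len, region_len):
    """Observed share of events divided by the share expected from length alone."""
    expected_fraction = segment_len / region_len
    return event_fraction / expected_fraction


# Values quoted in the text (lengths in nt/bp, fractions of regional events).
in_vivo_hotspot = per_nt_enrichment(0.60, 112, 459)        # lower bound
in_vitro_hotspot = per_nt_enrichment(0.4521, 112, 320)
in_vitro_downstream = per_nt_enrichment(0.5091, 123, 320)

print(f"in vivo hotspot:      {in_vivo_hotspot:.2f}x")
print(f"in vitro hotspot:     {in_vitro_hotspot:.2f}x")
print(f"in vitro downstream:  {in_vitro_downstream:.2f}x")
```

On this measure the hotspot is enriched roughly 2.5-fold over a uniform distribution in vivo but only about 1.3-fold in vitro, similar to the downstream structured segment.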
HIV-1 NC is a short, basic, nucleic acid binding protein with two zinc finger domains (30). It plays essential roles in nearly every step of the viral replication cycle, including genomic RNA packaging, reverse transcription initiation, minus strand transfer, and DNA integration (2,31,32). Two distinct and opposing features of NC-nucleic acid interactions have been described previously in detail: a duplex-stabilizing effect through its nonspecific interaction with phosphates and a duplex-destabilizing effect derived from the preferential binding of its zinc fingers to unpaired bases (33). Efficient viral strand transfer relies on both abilities of NC. The duplex-destabilizing activity of NC disrupts RNA secondary structures that may interfere with annealing of the complementary viral genomic RNA and newly synthesized DNA, and the duplex-stabilizing activity of NC then facilitates the intermolecular interactions between the complementary viral RNA and DNA (33). NC promotes strand transfer at all stages of reverse transcription: minus strand transfer, template switches during minus strand synthesis, and plus strand transfer (34-36). Interestingly, although NC binds throughout strands, sequence-specific binding has been observed, such as to UG- or TG-rich sequences (37,38).
We anticipated that the presence of an appropriate concentration of NC, in combination with a favorable ionic environment, would promote G-quartet formation that would have potent effects on recombination. Here we present our studies in vitro examining the role of HIV-1 NC and cationic conditions on G-quartet folding and G-quartet-related strand transfers in the hotspot region of HIV-1 gag RNA.
[γ-32P]ATP was obtained from PerkinElmer Life Sciences. Platinum Pfx DNA polymerase, Platinum Taq DNA polymerase, and a TOPO TA cloning kit were purchased from Invitrogen. The MEGAshortscript high-yield transcription kit was purchased from Ambion, Inc. (Austin, TX). P-20 Micro BioSpin 30 columns in RNase-free Tris were purchased from Bio-Rad. The Templiphi amplification kits were purchased from GE Healthcare. AccuGel 19:1 and 29:1 were purchased from National Diagnostics (Atlanta, GA). TMPyP4 was purchased from Fisher Scientific (Pittsburgh, PA).
Construction of Substrates-Genomic sequences from the NL43 and JRCSF strains of HIV-1 were amplified and cloned into pBluescript II KS(+) to generate donor and acceptor constructs for RNA synthesis, respectively. In this study, gag donor pNL182-520, gag acceptor pJRC150-502, pol donor pNL-RT3612-3773, and pol acceptor pJR-RT3541-3753 were generated as described previously (11,15). The gag sGGG-mut donor and acceptor constructs were generated using site-directed mutagenesis based on the gag WT donor pNL182-520 and gag WT acceptor pJRC150-502, as described previously (15). The pol-G-quartet donor and acceptor constructs as well as the pol-G-quartet monomer donor construct were generated using the overlap extension PCR approach. For the pol-G-quartet donor, primers designated T7 promoter (5′-TAA TAC GAC TCA CTA TAG GG-3′) and NL43 pol-G-QUARTET (-) (5′-TAC CCC AAC CCC AAC CCC AAC CCC ATG TGG CTA TTT TTT GTA CTG CC-3′) were used to generate the 5′ end 130-bp fragment, whereas primers designated NL43 pol-G-QUARTET (+) (5′-ATG GGG TTG GGG TTG GGG TTG GGG TAA GAA AGC ATA GTA ATA TGG GG-3′) and ws19 (5′-GCT TGC CAA TAC TCT GTC C-3′) were used to generate the 3′ end 119-bp fragment, with the underlined sequences representing regions of overlap between the two fragments.
Amplified fragments were gel-purified and served as templates in the second round PCR with primers T7 promoter and ws19. A similar approach was used for the rest of the constructs. For the pol-G-quartet acceptor, the 5′ end fragment required primers T7 promoter and JRCSF pol-G-QUARTET (-) (5′-AAC CCC AAC CCC AAC CCC AAC CCC AAT TGG CTA TTT TTT GCA CTG CC-3′), the 3′ end fragment required primers JRCSF pol-G-QUARTET (+) (5′-TTG GGG TTG GGG TTG GGG TTG GGG TTT GAA AGC ATA GTA ATA TGG GG-3′) and ws22 (5′-CCA TGT TTC CCA TGT TTC T-3′), and the second round PCR required primers T7 promoter and ws22. For the pol-G-quartet monomer donor, the 5′ end fragment required primers T7 promoter and NL43 pol-G-QUARTET MONOMER (-) (5′-TAC CTT CCT TCC AAC CTA GGG TAA TTT AAA TTT AGG-3′), and the second round PCR required primers T7 promoter and 3′-NL43 pol-G-QUARTET MONOMER (5′-GCT TGC CAA TAC TCT GTC CAC CAT GCT TCC CAT GTT TCC TTT TGT ATT ACC TTC CTT CCA ACC TA-3′). After the second round of amplification, the final PCR fragments were cloned into pBluescript II KS(+) and transformed into Escherichia coli, and the sequences were confirmed. RNA templates were synthesized by runoff transcription in vitro using a MEGAshortscript kit. Full-length transcripts were purified on denaturing PAGE and analyzed for integrity.
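Overlap extension PCR depends on the two inner primers sharing a complementary region, so that the first-round fragments can anneal to each other in the second round. The sketch below checks this for the NL43 pol-G-QUARTET (-) and (+) primers listed above and confirms that the shared region carries the DNA form of the (UGGGGU)4 insert. The helper names (`clean`, `reverse_complement`, `longest_overlap`) are illustrative and are not part of the original protocol; spaces and line-break hyphens in the printed sequences are stripped before comparison.

```python
def clean(seq):
    """Remove the spacing and line-break hyphens used when printing primers."""
    return seq.replace(" ", "").replace("-", "").upper()


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def longest_overlap(left, right):
    """Longest suffix of `left` that is also a prefix of `right`."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:n]):
            return right[:n]
    return ""


# Primer sequences as given in the text (5' to 3').
minus_primer = clean("TAC CCC AAC CCC AAC CCC AAC CCC ATG TGG CTA TTT TTT GTA CTG CC")
plus_primer = clean("ATG GGG TTG GGG TTG GGG TTG GGG TAA GAA AGC ATA GTA ATA TGG GG")

# The (-) primer is read on the opposite strand, so compare its reverse complement.
overlap = longest_overlap(reverse_complement(minus_primer), plus_primer)
insert_dna = "TGGGGT" * 4  # DNA form of the 5'-(UGGGGU)4-3' insert

print(overlap, len(overlap))
print("insert within overlap:", insert_dna in overlap)
```

The same check can be applied to the JRCSF primer pair by substituting the corresponding sequences.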
Strand Transfer Assay-DNA primers "MB24" (5′-CCC AGT ATT TGT CTA CAG CC-3′) and ws19, complementary to the 3′ terminus of the gag and pol donor RNA templates, respectively, were used to initiate reverse transcription. Primers were 5′ end labeled with [γ-32P]ATP using T4 polynucleotide kinase. Labeled primer (4 nM) was mixed with donor template (2 nM) and acceptor template (8 nM) in 50 mM Tris-HCl (pH 8.0), 50 mM KCl (or LiCl, as specified in Fig. 4C), 1 mM DTT, and 1 mM EDTA (pH 8.0). The solution was heated at 95°C for 5 min and then slow-cooled to room temperature to facilitate donor-primer annealing and any possible donor-acceptor interactions. For the strand transfer reactions with TMPyP4, 40 nM TMPyP4 was added to and incubated with the heat-annealed mixture at 37°C for 15 min. For the strand transfer reactions with NC, NCp9 was added at 200% coating of the nucleic acid (100% NC coating is defined as one NC molecule per 7 nt) unless otherwise specified (33). HIV-1 RT was then added to a concentration of 35 nM. The solution was incubated at 37°C for 5 min, and the reaction was initiated by the simultaneous addition of 6 mM MgCl2 and 50 μM dNTPs. Strand transfer reactions were incubated at 37°C for the times specified and were terminated by adding 2× termination buffer containing 90% formamide, 10 mM EDTA (pH 8.0), 1% xylene cyanole, and 1% bromphenol blue. Strand transfer reaction products were then resolved on denaturing PAGE, visualized by Storm PhosphorImager, and analyzed by ImageQuant v1.2.
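The coating definition above (100% coating is one NC molecule per 7 nt) translates into an NC concentration once the total nucleotide concentration is known. The following is a minimal sketch of that arithmetic. It assumes that primer, donor, and acceptor all count toward the nucleic acid being coated, and the 350-nt template lengths are hypothetical placeholders, not values taken from the text; only the strand concentrations and the 20-nt length of primer MB24 come from the protocol.

```python
NT_PER_NC_AT_FULL_COATING = 7  # from the text: 100% coating = one NC per 7 nt


def nc_concentration_nM(strands, coating_percent):
    """NC concentration (nM) needed for a given coating level.

    `strands` is a list of (concentration_nM, length_nt) tuples.
    """
    total_nt_nM = sum(conc_nM * length_nt for conc_nM, length_nt in strands)
    return total_nt_nM * (coating_percent / 100) / NT_PER_NC_AT_FULL_COATING


# Concentrations from the protocol; template lengths are hypothetical.
strands = [
    (4, 20),    # labeled primer MB24: 4 nM, 20 nt
    (2, 350),   # donor RNA: 2 nM, assumed 350 nt
    (8, 350),   # acceptor RNA: 8 nM, assumed 350 nt
]

for coating in (200, 400, 800):
    print(f"{coating}% coating -> {nc_concentration_nM(strands, coating):.0f} nM NC")
```

With the actual template lengths substituted, the same function gives the NC concentration for any coating level used in the assays.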
Transfer Distribution Assay-This assay was used to locate the positions of transfer events throughout the template homology. Briefly, transfer reactions were performed and loaded onto a denaturing PAGE, which was analyzed by autoradiography while still wet. Bands corresponding to the transfer products were excised, recovered by elution, and amplified by PCR using high-fidelity Platinum Pfx DNA polymerase. For the gag templates, primers MB24 and lg62 (5′-ACT GGG ATC CTG CCC AGT GTT TGT CTA C-3′) were used for amplification, whereas primers ws19 and ws20 (5′-TAA TAC GAC TCA CTA TAG GAC ATA TCA AAT TTT TCA AGA GCC-3′) were used for the pol templates. The PCR products were then cloned into pCR2.1-TOPO vectors. Individual clones representing individual transfer products were amplified using the Templiphi amplification kit and then sequenced using the M13(-20) forward primer (5′-GTA AAA CGA CGG CCA GT-3′) to locate the sites for crossovers. The transfer frequency was obtained by dividing the number of transfer events between markers by the total number of transfer events throughout the template homology. Multiplying the transfer efficiency by the transfer frequency within each segment produces the transfer efficiency-corrected transfer distribution. Because the distances between markers are not the same, the transfer frequency within each segment was divided by the length of each segment to correct for distance.
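The three calculations described above (frequency per segment, efficiency correction, and length correction) can be sketched as follows. The function name, the dictionary keys, and the example counts are illustrative only and do not reproduce data from this study.

```python
def transfer_distribution(event_counts, segment_lengths, transfer_efficiency):
    """Per-segment transfer statistics.

    event_counts        -- crossovers mapped between each pair of adjacent markers
    segment_lengths     -- length (nt) of each inter-marker segment
    transfer_efficiency -- overall transfer efficiency as a fraction (0 to 1)
    """
    if len(event_counts) != len(segment_lengths):
        raise ValueError("one length is required per segment")
    total_events = sum(event_counts)
    results = []
    for count, length in zip(event_counts, segment_lengths):
        frequency = count / total_events          # share of all transfer events
        results.append({
            "frequency": frequency,
            "efficiency_corrected": frequency * transfer_efficiency,
            "frequency_per_nt": frequency / length,  # corrects for unequal spacing
        })
    return results


# Hypothetical example: three segments, 10 sequenced clones, 50% efficiency.
example = transfer_distribution([2, 6, 2], [20, 30, 50], 0.5)
for segment in example:
    print(segment)
```

Dividing by segment length matters because a long inter-marker segment would otherwise appear to be a preferred site simply by collecting more events.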
G-quartet Formation Assay-The assay was performed in a 10-μl reaction volume. Both sWT and sGGG-mut gag donor RNA templates (100 μM) were 5′ end labeled with [γ-32P]ATP. Each was incubated in 50 mM Tris-HCl (pH 8.0), 1 mM EDTA, and 100 mM salt (KCl, NaCl, or LiCl, as specified in Fig. 1) at 95°C for 5 min and then slow-cooled to room temperature. The reactions were further incubated at 37°C for at least 12 h before being loaded onto a non-denaturing PAGE. The gels were dried, visualized by Storm PhosphorImager, and analyzed by ImageQuant v1.2.

RESULTS
The HIV-1 gag Hotspot Forms Both Intra- and Intermolecular G-quartets-We previously found that an intramolecular G-quartet monomer can form from the G-runs in the RNA region that makes up the HIV-1 gag hotspot for recombination (15). In addition to this intramolecular folding, we now report the formation of both intermolecular G-quartet dimer and tetramer structures within the same G-rich gag sequence (Fig. 1A). A G-quartet dimer is formed by the association of two RNA strands, each containing at least two runs of G residues, whereas a G-quartet tetramer is formed by the association of four RNA strands, each containing at least one G-run (Fig. 1B). Using an RNA template from the gag hotspot region, indicated as sWT (Fig. 1A), we observed formation of the G-quartet dimer by non-denaturing PAGE under all three tested cation conditions (K+, Na+, or Li+) but formation of the tetramer only with Na+ (Fig. 1C). This is consistent with a previous report showing faster folding kinetics of tetrameric G-quartet formation in the presence of Na+, although the structures, once formed, are more stable in the presence of K+ (42). Significantly, when all three G-runs were disrupted by mutagenesis (Fig. 1A), the mutant RNA template sGGG-mut formed neither dimer nor tetramer G-quartets, confirming that the intermolecular association we observed was dependent on the presence of the G-runs (Fig. 1C). It is worth mentioning that a non-denaturing gel assay cannot distinguish whether an intermolecular G-quartet is in a parallel or an anti-parallel configuration.

Effects of NC on G-quartet-related Pauses during Strand Transfer-The G-quartets formed by the gag hotspot were shown previously to promote efficient strand transfer in vitro (15). Because the virally encoded nucleic acid chaperone NC had been found to affect strand transfer in vitro, we assessed the influence of NC on the formation of G-quartet structure and strand transfer in a gag hotspot substrate (33,41). A previously described gag RNA substrate containing the sequence from the HIV-1 primer binding site to a portion of the gag coding region was used in the strand transfer assay (Fig. 2A) (15). The substrate system included two RNA templates, donor and acceptor, from two different HIV-1 strains, NL43 and JRCSF, respectively. NL43 and JRCSF have naturally occurring sequence variations that allowed us to define the position of every single transfer event at a resolution averaging about 25 nt. In this strand transfer system, the radiolabeled DNA primer is initially annealed to the donor RNA. The primer can extend and complete synthesis on the donor RNA without switching template, resulting in a 353-nt donor extension (DE) product, or it can transfer to and complete synthesis on the acceptor RNA, resulting in a 385-nt transfer product (TP).
The strand transfer reactions were performed either without NC or with increasing amounts of NC (200%, 400%, and 800% coating) (Fig. 2B). NC coating of 100% is defined as one NC molecule per 7 nt along an RNA or DNA template (33). We quantitated the intensities of the pause bands corresponding to either the major hairpin-type RNA secondary structures, including the primer binding site, dimerization initiation sequence (DIS), and splicing donor, or G-quartet structures, indicated as G1 (nt 405-409), A/G (nt 382-384), and G2 (nt 363-367) (Fig. 2C). Secondary structure-related pauses were alleviated with NC, presumably because of the duplex-destabilizing activity of NC (33). Interestingly, we observed distinct and changing patterns of G-quartet-related pauses with increasing amounts of NC (Fig. 2B). The pause at G1, the site involved in G-quartet monomer formation according to our computational prediction, diminished at 800% NC coating, whereas pauses at G2 and A/G, the sites known to be involved in G-quartet dimer formation, were enhanced at 800% NC coating (Fig. 2C) (15,27-29). It appears that NC, as reported previously in several other studies, has distinct effects on different forms of G-quartets (43,44). Specifically, the results suggested that NC stabilized the G-quartet dimer but destabilized the G-quartet monomer.
NC Contributes Substantially to Highly Efficient Transfer in the gag Hotspot-Our group previously reported a transfer profile with the WT gag template at 200% NC coating (15). However, the transfer profile in vitro did not display the unique high peak of transfers in the G-run region that had been seen in the recombination profile in vivo. The profile differences suggested that the reaction in vitro lacked some components needed to magnify the G-run region peak. After noticing the G-quartet dimer stabilization effect of NC, we realized that the NC concentration in our previous transfer distribution assays in vitro may not have been high enough to support the template dimerization, which has been known for some time as a key factor in promoting strand transfer (33,45). To confirm this idea, we performed transfer distribution assays with a WT gag template without NC or with 800% coating of NC. Briefly, transfer products were sequenced and aligned against either donor (NL43) or acceptor (JRCSF) templates. Fifteen naturally occurring sequence variations were used as markers to locate the transfer events. Without NC, only 17.12% of transfer events were located in a 112-nt-long region between markers four and nine, the gag hotspot for recombination (Fig. 3A). Significantly, the percentage of transfer events in the same region rose to 57.43% with 800% NC, approaching the recombination frequency in the gag hotspot in the previous study in vivo (Fig. 3B) (10). Although the profiles in vitro and in vivo are still not matched perfectly, this observation suggests that the gag hotspot can be effectively recreated when sufficient NC is present to promote intermolecular G-quartet formation. Because the pauses along the donor template are known to cause transfer, it is not surprising that 74.15% of transfer events without NC were located before marker four, where multiple pauses were evident, corresponding to several major hairpin structures (Fig. 3A). 
However, addition of 800% NC alleviated the hairpin-related pauses, which very likely accounted for a 2.24-fold reduction in transfer in the hairpin region (Fig. 3B).
NC Protein Mediates the Switch between Monomeric G-quartet and Dimeric G-quartet-To further address the role of NC protein in G-quartet formation, we inserted a strong potential G-quartet-forming sequence (5′-(UGGGGU)4-3′) into a relatively structure-free template from the pol-coding region of the HIV-1 genome (46). The pol-G-quartet donor (NL43) and acceptor (JRCSF) templates share a 167-nt region of homology, with nucleotide base differences distributed at nine positions throughout the region (Fig. 4A). The G-rich sequence was inserted between markers five and six. We chose this particular sequence for several reasons. 1) A quadruplex of this sequence was confirmed to form readily in solution (46). 2) The sequence contains only guanosines and uracils, which are the preferred binding sites for NC (33). 3) The sequence has the potential to form monomeric, dimeric, or tetrameric G-quartet structures. We performed strand transfer assays using this template either in the presence of K+, which promotes G-quartet formation, or in the presence of Li+, which disrupts G-quartet structures but not hairpin-type secondary structures (Fig. 4C). Without NC, a very strong pause band corresponding to the G-rich sequence was observed in the presence of K+ but not Li+, indicating the formation of a G-quartet structure that paused the RT. However, very little RT passed this strong G-quartet structure, generating few full-length products (both DE and TP). Notably, the transfer efficiency still increased from 23.27% with Li+ to 36.03% with K+, indicating that the pause of RT did promote transfer (Fig. 4D). NC protein was then added to the transfer assays with either K+ or Li+ present (Fig. 4C). Strikingly, the transfer efficiency of the pol-G-quartet substrate jumped to 90.35% with NC in the presence of K+.
Moreover, we observed two pauses corresponding to the G-rich sequence with NC only in the presence of K+ but not Li+, again indicating that both pauses were G-quartet-related.
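The transfer efficiencies quoted above can be compared as simple ratios. The sketch below assumes the common definition of transfer efficiency as the transfer product signal divided by the sum of transfer product and donor extension signals; this formula is an assumption made here for illustration, since the exact definition is not spelled out in this section. The band intensities passed to `transfer_efficiency` are hypothetical, whereas the three percentages are the reported values.

```python
def transfer_efficiency(tp_signal, de_signal):
    """Percent transfer, assuming efficiency = TP / (TP + DE) x 100."""
    return 100.0 * tp_signal / (tp_signal + de_signal)


def fold_change(value, reference):
    """Ratio of one efficiency to a reference efficiency."""
    return value / reference


# Hypothetical band intensities (arbitrary units), for illustration only.
example_efficiency = transfer_efficiency(tp_signal=30.0, de_signal=70.0)

# Reported efficiencies for the pol-G-quartet substrate (percent).
li_no_nc = 23.27
k_no_nc = 36.03
k_with_nc = 90.35

k_vs_li = fold_change(k_no_nc, li_no_nc)      # effect of G-quartet pausing alone
nc_vs_no_nc = fold_change(k_with_nc, k_no_nc)  # added effect of NC in K+

print(f"example efficiency: {example_efficiency:.1f}%")
print(f"K+ vs Li+ (no NC):  {k_vs_li:.2f}-fold")
print(f"NC vs no NC (K+):   {nc_vs_no_nc:.2f}-fold")
```

The second ratio corresponds to the approximately 2.5-fold increase attributed to NC-induced dimerization.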
The Pause Pattern Reveals That G-quartet Dimer Formation Promotes Efficient Transfer-To explain this dramatic increase in transfer efficiency and the different pattern of pausing with NC, we propose the following explanation (Fig. 4B). The inserted G-rich sequence contains four runs of G residues, four G residues per run. Because of this sequence feature, the G-rich sequence can fold into G-quartet monomer, dimer, or tetramer configurations. A G-quartet monomer is formed by repeated folding of a single polynucleotide containing four evenly spaced G-runs. As soon as the RT passes the first G-run, the G-quartet monomer structure will be disrupted, so there is only a single pause band. A G-quartet dimer can be formed by the association of two parallel or anti-parallel strands as long as each contains at least two G-runs. Such a structure formed between donor and acceptor can hold the two strands together. The G-runs are named from the 3′ end to the 5′ end of the RNA as G-runs 1, 2, 3, and 4. For example, G-run 1 and G-run 4 on one strand can associate with G-run 1 and G-run 4 on the second strand. A similar structure can form with G-run 2 and G-run 3 on each strand. During synthesis by RT, the G-quartet formed by G-run 1 and G-run 4 will be disrupted first, followed by that formed by G-run 2 and G-run 3, resulting in two pause bands. A G-quartet tetramer is formed by association of four G-rich strands. In our case, it will result in four pauses during the strand transfer assay because the G-rich sequence contains four runs of G residues. All of these G-quartet-related pauses can be observed in the presence of K+ but less well or not at all in the presence of Li+. As another alternative, runs of G residues were proposed to stall RT on the basis of their natural template characteristics before folding into any type of structure (47,48). If this were true, four pauses should be observed regardless of which cations are present.
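The reasoning above amounts to a small decision table: the number of G-quartet-related pause bands with K+, together with whether those pauses persist with Li+, points to one of the candidate explanations. The sketch below first locates the G-runs in the inserted sequence and then encodes that table. It is a simplification made for illustration: Li+ is treated as abolishing G-quartet pauses entirely, although the text allows for weakened pauses, and the function names are not from the original study.

```python
import re


def find_g_runs(seq, min_len=2):
    """Return 1-based (start, end) coordinates of runs of at least min_len G residues."""
    pattern = "G{%d,}" % min_len
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, seq.upper())]


# Pause bands expected for a four-G-run insert, as described in the text.
EXPECTED_PAUSES = {"monomer": 1, "dimer": 2, "tetramer": 4}


def interpret_pauses(pauses_with_k, pauses_with_li):
    """Map observed pause-band counts onto the explanations considered in the text."""
    if pauses_with_li > 0 and pauses_with_li == pauses_with_k:
        # Cation-independent pausing argues against a folded G-quartet.
        return "intrinsic G-run stalling"
    if pauses_with_li == 0:
        for structure, expected in EXPECTED_PAUSES.items():
            if expected == pauses_with_k:
                return structure
    return "undetermined"


insert_rna = "UGGGGU" * 4
g_runs = find_g_runs(insert_rna, min_len=4)
print(g_runs)                      # four runs of four G residues
print(interpret_pauses(1, 0))      # one K+-dependent pause
print(interpret_pauses(2, 0))      # two K+-dependent pauses
print(interpret_pauses(4, 4))      # four pauses with either cation
```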
In the presence of K+, without NC, only one pause band was observed corresponding to the G-rich sequence, but with NC, two pause bands of similar densities were observed (Fig. 4C). No G-related pausing occurred with Li+. This is evidence of pausing in response to G-quartet formation and proves that NC mediated a shift from monomeric G-quartet to dimeric G-quartet. Importantly, this shift led to a jump in transfer efficiency to approximately 90%, demonstrating that dimerization between donor and acceptor is one of the key factors that promote transfer during HIV-1 reverse transcription. In addition, dimerization of the HIV-1 genomes may be mediated not only through DIS at the 5′ end of the genome but also through runs of Gs throughout the HIV-1 genome.
Transfer Events Were Clustered around the G-rich Sequence in the Presence of K+ with NC-To better understand the correlation between the formation of a G-quartet dimer and recombination, we sequenced the transfer products of the pol-G-quartet substrate and analyzed transfer distributions in the presence of either K+ or Li+ with NC. The sequenced products were aligned against either NL43 or JRCSF sequences, which contain nine naturally occurring sequence differences along the 167-nt homology. Strikingly, in the presence of K+, 63 of 69 total sequenced transfer products contained breakpoints between markers five and seven, correlating with the G-rich sequence that was inserted between markers five and six (Fig. 5A). This result indicated that most transfer events happened either when the RT was paused by first encountering the G-quartet structure (between markers six and seven) or within the dimerization region held together by the G-quartet dimer (between markers five and six). This contrasts with the distribution in the presence of Li+, where transfer events were spread evenly throughout the homology region with slightly more transfers between markers six and seven (Fig. 5B). The transfer distribution results are in agreement with the strand transfer efficiency results, demonstrating the significance of the G-quartet dimer in promoting strand transfer.
NC Disrupts the G-quartet Monomer Stabilized by a Tetramethylpyridinium Porphyrin, TMPyP4-To confirm the destabilization effect of NC on the G-quartet monomer, we designed a transfer substrate with the insertion of a potential G-quartet monomer sequence into the same pol substrate as described above (Fig. 6A). The pol G-quartet monomer substrate contains a 5′-UAG GUU GGA AGG AAG GUA-3′ sequence, which was expected to fold into a G-quartet monomer in the presence of K+ and to pause the RT during the strand transfer assay. A donor extension assay was performed to detect folding of the G-quartet monomer through the appearance of a pause band at the corresponding site in the presence of K+. However, we were not able to observe G-quartet monomer formation, possibly because of the slow folding kinetics of the structure. TMPyP4, a known G-quartet stabilization reagent, was preincubated with the annealed primer-donor to promote G-quartet monomer formation, followed by the strand transfer assay (49). The G-quartet-related RT pause was observed with 40 nM TMPyP4. Significantly, this pause was mostly alleviated with addition of 200% coating of NC, highlighting the ability of NC to destabilize the G-quartet monomer (Fig. 6B).

DISCUSSION
RNA template secondary structures have been shown to favor template switching during viral reverse transcription (2). Not only do these structures support primer DNA transfer by several mechanisms, but the NC protein is also known to influence viral recombination through modulating these RNA structures (33). In a recent study on a previously identified gag recombination hotspot in vivo and in vitro, we examined three G-runs along the gag hotspot RNA template, corresponding to RT synthesis pauses during reverse transcription (15). These G-runs readily folded into G-quartet structures in our assays in vitro. Interestingly, the disruption of the G-quartet structures resulted in a significant drop in transfer frequency in the gag hotspot in vitro, suggesting their importance in promoting recombination (15). However, the distribution of the strand transfer events with intact G-runs mapped in vitro was not a close match with the distribution profile that we had obtained in vivo. Influential factors operative in vivo were evidently not represented correctly in our assay in vitro. These factors might have included template folding, viral proteins, or host components.
Although the effects of NC on RNA hairpins had been characterized extensively, little was known about its interaction with G-quartets. Interestingly, an apparent divergence of the influence of NC on distinct forms of G-quartets was observed recently. A circular dichroism study demonstrated that NC destabilized DNA monomer quadruplexes formed by the sequence d(GGT TGG TGT GGT TGG) in the presence of K+ or Sr2+, presumably through the unstacking of the G-quartets upon protein binding (43). Significantly, this G-quartet monomer destabilization activity of NC relies on its zinc finger domain but not its basic domain because EDTA-treated NC failed to influence G-quartet folding, whereas truncated NC lacking the N-terminal basic domain still maintained the G-quartet destabilization activity (43). NC has also been shown to exhibit a high binding affinity for but no destabilization effect on the sequence d(TTG GGG GGT ACA GTG CA) derived from the HIV-1 central DNA flap, which folds in vitro into a tetramolecular parallel DNA quadruplex (44). These observations suggest that NC affects the folding of different forms of G-quartets in distinct ways.
To assess the differential effects of NC, we designed the strand transfer substrate pol-G-quartet, which harbors a potential G-quartet formation sequence, 5′-(UGGGGU)4-3′, within a relatively structure-free sequence derived from the HIV-1 pol gene. We observed an RNA structure transformation from G-quartet monomer to dimer upon incubation with HIV-1 NC. Strikingly, the efficiency of strand transfer reached approximately 90% after intermolecular G-quartet-dependent dimerization, a 2.5-fold increase compared with a control reaction that contained no NC and showed no dimerization. Two lines of evidence indicate that the substantial increase in transfer efficiency results from a collaboration of RNA structure and NC, not just from the presence of NC, which was previously known to enhance strand transfer (33). First, NC had only a moderate enhancing effect on the transfer efficiency of the pol template without an inserted G-rich sequence. Second, even with the inserted G-rich sequence, when LiCl was present so that G-quartet structures could not form, NC promoted the transfer of the pol-G-quartet template by only about 30%. Clearly, NC-promoted G-quartet dimer formation raised the transfer frequency, presumably by bringing donor and acceptor RNA templates into close proximity.
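As an illustration of why the inserted sequence is a candidate for G-quartet folding, the following minimal Python sketch locates the G-runs in 5′-(UGGGGU)4-3′. The function name `find_g_runs` and the minimum run length of three guanines are assumptions made for this example, not part of our experimental analysis, and the presence of G-runs indicates only folding potential, not an actual folded structure.

```python
import re

def find_g_runs(seq, min_len=3):
    """Return (start, end, length) for every run of at least min_len
    consecutive G residues, using 1-based inclusive coordinates.
    min_len=3 is an assumed threshold chosen for illustration."""
    pattern = re.compile(r"G{%d,}" % min_len)
    return [
        (m.start() + 1, m.end(), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]

# The G-quartet formation sequence inserted into the pol template.
insert_seq = "UGGGGU" * 4
runs = find_g_runs(insert_seq)

for start, end, length in runs:
    print(f"G-run at nt {start}-{end} (length {length})")
```

The scan reports four runs of four guanines each, enough G-runs to supply the four strands of an intramolecular quartet stack or, shared between two templates, an intermolecular one.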
Examples of the enhancement of intermolecular G-quartet folding by nucleic acid chaperone proteins are not rare. The β-subunit of the Oxytricha telomere-binding protein enhances the rate of folding of Oxytricha telomeric repeats (T4G4) into G-quartet dimers and tetramers by 10^5- to 10^6-fold (50). A prerequisite for intermolecular G-quartet formation is that either two or four strands of nucleic acids have to be brought into close proximity. The Oxytricha subunit is one of many basic proteins, including histone H1, cytochrome c, and poly-L-lysine, that promote intermolecular G-quartet folding by facilitating the annealing of two complementary strands into a duplex (50). Because the presence of numerous basic amino acids is a feature even more conserved than the presence of zinc finger domains in retroviral NC proteins, it is understandable that NC plays a positive role in G-quartet dimer and tetramer formation.
To understand the influence of NC on G-quartet folding, we have to consider how this protein affects nucleic acid duplex stability. NC-nucleic acid interactions produce two distinct and opposite effects on duplex stability: a weak duplex-destabilizing effect resulting from the preferential binding of the zinc fingers of NC to single-stranded nucleic acids and a duplex-stabilizing effect depending on the nonspecific polyelectrolyte interactions between phosphates and the NC N-terminal basic domain (also called the cationic domain) (33). On one hand, as discussed above, the destabilization activity of NC on the G-quartet monomer depends on the NC zinc fingers and not on the basic domain, perhaps because of the preferential binding of NC to unpaired bases, which unstacks G-quartet structures. On the other hand, the basic domain of NC was reasoned to account for G-quartet dimer and tetramer stabilization, partially because its basic character can neutralize negative charges on the nucleic acid backbone, thereby facilitating intermolecular association. However, we note that the intermolecular G-quartet stabilization effect of NC is not caused by its basicity alone because spermidine, which also effectively neutralizes negative charges on the DNA backbone, was reported to have no effect on intermolecular G-quartet folding (50).
Because we have evidence that NC selectively stabilizes G-quartet dimers but not monomers and the dimerization of the viral RNAs appears to greatly enhance strand transfer, we wanted to know how all these factors interact to form the gag hotspot for recombination. We performed the strand transfer assay using the WT gag sequence with increasing amounts of NC and observed an RT pausing pattern indicating enhancement of G-quartet dimer formation but unfolding of the G-quartet monomer. More significantly, we were able to recapitulate the gag hotspot for recombination in vitro when the RNA templates in the strand transfer assay were coated with NC at an 8-fold excess relative to the amount needed for template coating. Approximately 57% of transfer events were mapped to the gag hotspot region in vitro, compared with slightly more than 60% observed in vivo. In contrast, only 17.12% of the transfer events were located in the gag hotspot region in the absence of NC. It should also be noted that, with many substrates, NC was not observed to substantially redistribute the general location of transfers or alter the general mechanisms of strand transfer (51,52). In addition, we have reported previously a 3.8-fold decrease in the number of transfer events in the gag hotspot site when the G-runs involved in G-quartet formation were disrupted. All these data are consistent with the conclusions we drew from analysis of the pol-G-quartet substrate. Namely, both a template sequence that has the potential to fold into a G-quartet dimer and the presence of NC are critical for the formation of the gag hotspot.
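The size of the NC effect on hotspot usage can be made explicit with a short calculation. The sketch below, in which the helper name `fold_change` is hypothetical, simply divides the fraction of transfers mapped to the gag hotspot with NC (approximately 57%) by the fraction mapped without NC (17.12%). It uses only the percentages stated above and is not a statistical analysis.

```python
def fold_change(treated, control):
    """Ratio of a treated value to a control value (hypothetical helper)."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return treated / control

# Percentages of transfer events mapped to the gag hotspot in vitro.
hotspot_with_nc = 57.0      # approximate value, with NC
hotspot_without_nc = 17.12  # without NC

nc_enrichment = fold_change(hotspot_with_nc, hotspot_without_nc)
print(f"Hotspot usage with NC vs. without NC: {nc_enrichment:.1f}-fold")
```

By this measure, NC raises the share of transfers falling in the hotspot roughly 3.3-fold, a value of the same order as the 3.8-fold decrease reported when the G-runs were disrupted.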
On the basis of our investigation, we propose an NC-promoted G-quartet-derived mechanism that produces highly efficient strand transfer. During HIV-1 reverse transcription, both G-quartet monomers and dimers are able to stall RT synthesis. This induces RT-RNase H cleavage on the original donor RNA template, allowing strand invasion of a second acceptor RNA template to interact with the newly synthesized single-stranded DNA. The interaction propagates until transfer of the DNA strand to the acceptor RNA is complete. A key addition to this invasion mechanism is a proximity effect accomplished through a G-quartet dimer formed between donor and acceptor RNAs. Promoted by NC, the G-quartet dimer is stabilized, whereas the monomer is destabilized. This creates a physical connection between the two RNA templates. This proximity greatly enhances the efficiency of strand invasion at the point of joining and some distance into the surrounding sequence. Both mechanisms combine to create the major peak of strand transfer in the gag hotspot for recombination.
The "invasion-proximity" mechanism that we describe is not likely to be unique to the gag hotspot, and so we propose it as a general mechanism for sites of enhanced recombination in retroviruses. It is clearly not the only mechanism of recombination because hairpins and other structures also promote strand invasion. Even so, this NC-G-quartet-mediated mechanism appears to have evolved into a potent promoter of recombination that can contribute to the evolution of retroviruses.