Structural Insights into the HIV-1 Minus-strand Strong-stop DNA

An essential step of human immunodeficiency virus type 1 (HIV-1) reverse transcription is the first strand transfer that requires base pairing of the R region at the 3′-end of the genomic RNA with the complementary r region at the 3′-end of minus-strand strong-stop DNA (ssDNA). HIV-1 nucleocapsid protein (NC) facilitates this annealing process. Determination of the ssDNA structure is needed to understand the molecular basis of NC-mediated genomic RNA-ssDNA annealing. For this purpose, we investigated ssDNA using structural probes (nucleases and potassium permanganate). This study is the first to determine the secondary structure of the full-length HIV-1 ssDNA in the absence or presence of NC. The probing data and phylogenetic analysis support the folding of ssDNA into three stem-loop structures and the presence of four high-affinity binding sites for NC. Our results support a model for the NC-mediated annealing process in which the preferential binding of NC to four sites triggers unfolding of the three-dimensional structure of ssDNA, thus facilitating interaction of the r sequence of ssDNA with the R sequence of the genomic RNA. In addition, using gel retardation assays and ssDNA mutants, we show that the NC-mediated annealing process does not rely on a single pathway (zipper intermediate or kissing complex).

(1). These strand transfers are enhanced by the HIV-1 nucleocapsid protein (NC) (2-6). During the first strand transfer, which requires the RNase H activity of reverse transcriptase (RT) to degrade the 5′-copied genomic RNA (donor RNA), the 3′-end of the minus-strand strong-stop DNA (ssDNA) is transferred to the 3′-end of the genomic RNA (acceptor RNA). This transfer can occur in an intramolecular (i.e. transfer of ssDNA to the original RNA template) or intermolecular (i.e. transfer of ssDNA to the other genomic RNA copy in the virion) manner (1). The first requirement for synthesis of ssDNA is the annealing of the human primer tRNA3Lys onto the primer binding site of the genomic RNA (Fig. 1A). The model of reverse transcription presumes that the first strand transfer occurs after synthesis of the full-length ssDNA. More precisely, the transfer is mediated by base pairing of the R region at the 3′-end of the genomic RNA with the complementary r region at the 3′-end of ssDNA. The effect of shortening the length of complementarity has been studied using murine leukemia virus-based vectors capable of only a single round of replication (7). This study showed that removing more than 80% of the length of the 3′ R, from 69 to 12 nucleotides, did not significantly affect the efficiency of the first strand transfer. These results cannot be extrapolated directly to HIV-1, because the R regions of murine leukemia virus and HIV-1 are different. The effect of shortening the length of the 3′ R has also been studied in HIV-1 (8). This study showed that production of viral particles was barely affected by a 3′ R sequence of only 37 nucleotides. However, this result did not directly demonstrate that the first strand transfer was unaltered, because multiple rounds of viral replication were allowed.
Studies on Moloney murine leukemia virus (9), spleen necrosis virus (10), and HIV-1 (11) with mutations in R suggested that partial ssDNAs ("weak-stop" DNAs) can be transferred prior to complete reverse transcription of the R region. However, the transfer of weak-stop DNAs occurred at low frequency (10,11). In addition, the results of a study performed with a series of mutations in the 5′, 3′, or both R sequences suggested that the great majority of first strand transfers in HIV-1 occur after the synthesis of the full-length ssDNA (12). Therefore, an anti-HIV-1 therapy based on inhibition of the first strand transfer requires knowledge of the structures of the full-length ssDNA and the complete 3′ R. Consistent with this notion, actinomycin D inhibits the first strand transfer through direct interaction with the ssDNA (13,14).
The R sequence contains the transactivator response element (TAR) and a portion of the polyadenylation signal (poly(A)) (15,16). The 3′-TAR sequence folds into a hairpin in the complete HIV-1 RNA genome extracted from virions (17). The 3′-poly(A) sequence can form a hairpin (16,18), but there are no data showing that this secondary structure is present in the complete HIV-1 RNA genome. The r sequence of ssDNA is predicted to fold into hairpins that are complementary to the TAR and poly(A) RNA sequences and are therefore named cTAR and cpoly(A), respectively (19,20). To date, the secondary structure of the full-length ssDNA has not been determined. Thus, there are no data showing that the cTAR and cpoly(A) sequences form the predicted hairpins in the context of the full-length ssDNA.
NC promotes annealing of ssDNA to the 3′-end of the genomic RNA because it possesses nucleic acid chaperone activity that facilitates the rearrangement of nucleic acids into the most thermodynamically stable structures containing the maximum number of base pairs (21,22). This chaperone function depends on three properties (2): 1) ability to aggregate nucleic acids; 2) weak duplex destabilization activity; and 3) rapid on-off binding kinetics. The ability of NC to increase the rate of the first strand transfer is due at least in part to its stimulatory effect on the rate of annealing of the complementary hairpins (23,24). Unfolding of the complementary hairpins is thought to be rate-limiting in the annealing process leading to the first strand transfer (23). As mentioned above, mutational analysis of the HIV-1 R sequence suggests that the great majority of first strand transfers occur after completion of ssDNA synthesis, i.e. the complete cTAR sequence would be required for efficient strand transfer in vivo (12). Several studies performed with the cTAR sequence (55 to 59 nucleotides) support a dynamic structure of the cTAR hairpin, involving an equilibrium between the closed conformation and the partially open "Y" conformation (20,25-28). Recently, we showed that NC slightly destabilizes the lower stem that is adjacent to the internal loop and shifts the equilibrium toward the Y conformation exhibiting at least 12 unpaired nucleotides in its lower part (29). We also showed that the apical and internal loops of cTAR are weak and strong binding sites for NC, respectively (29). To date, the interactions between NC and the full-length ssDNA have not been studied.
Fig. 1. … Lys primer are annealed to the PBS, a complementary sequence in the viral genome (gray line); numbering is relative to the genomic RNA cap site (+1). Once annealed, RT catalyzes extension of the primer to form ssDNA (black line). The 5′-end of the genomic RNA is degraded (broken line) by the action of the RNase H activity of RT. The first strand transfer is mediated by base pairing of the complementary R and r regions. B, ssDNA and RNA used in this study. ssDNA (178 nucleotides) is the extension product of the PR primer; numbering is relative to the first nucleotide of ssDNA, which is complementary to nucleotide 178 of the genomic RNA. RNA Δ3′ (200 nucleotides) contains the 3′-end of the U3 sequence and the full-length R sequence. C, predicted secondary structures for the cTAR sequence in the wild-type and mutant ssDNAs. The DNA folding program of Zuker (43) was used to predict the most stable secondary structure for each cTAR sequence. The T·G base pair is supported by NMR analysis of the top half of cTAR (65). Mutations are shown as lowercase letters in boxes.
Because mutations in the TAR apical loop decrease the first strand transfer in vitro, Berkhout et al. (19) suggested that this process involves a "kissing" complex formed by the apical loops of the TAR and cTAR hairpins. Consistent with this hypothesis, we found that efficient annealing of cTAR DNA to the 3′-end of the genomic RNA relies on sequence complementarity between the TAR and cTAR apical loops (18). Studies using TAR RNA and cTAR DNA hairpins suggest that both the apical loops and the 3′/5′ termini of complementary hairpins are the initiation sites for the annealing reaction under subsaturating concentrations of NC (30,31). In contrast, cTAR DNA-TAR RNA annealing in the presence of saturating concentrations of NC depends only on nucleation through the 3′/5′ termini, resulting in the formation of a "zipper" intermediate (30,32). Note that the two annealing pathways have not been demonstrated for the annealing reaction between the full-length ssDNA and the 3′-end of the genomic RNA.
In this study, we investigated the annealing process using ssDNA and RNA Δ3′ representing the 3′-end of the genomic RNA (Fig. 1B). To our knowledge, this study is the first to use chemical and enzymatic probes to determine the secondary structure of the ssDNA and the NC binding sites within this DNA.

Experimental Procedures
NC and Oligonucleotides-Full-length NC (NC(1-55)) was synthesized by the Fmoc (N-(9-fluorenyl)methoxycarbonyl)/pentafluorophenyl ester chemical method and purified to homogeneity by HPLC (33). NC was dissolved at concentrations of 1 to 2 mg/ml in a buffer containing 25 mM HEPES (pH 6.5), 50 mM NaCl, and 2.2 mol of ZnCl₂/mol of peptide. DNA oligonucleotides were purchased from Eurogentec. In the following oligonucleotide sequences, the uppercase let-
Construction of Plasmids-Standard procedures were used for restriction enzyme digestion and plasmid construction (34). Restriction endonucleases and T4 DNA ligase were purchased from New England Biolabs. The Expand High Fidelity PCR System was from Roche Applied Science. Cloned sequences and mutations were verified by DNA sequencing. Plasmids pHIVCG-4 and pHIVCG8.6 contain DNA fragments of the HIV-1 genome derived from the MAL isolate (35,36). Plasmid pYC5′ was generated by PCR amplification of pHIVCG-4 linearized with EcoRI, using oligonucleotides O1 and O2. The resulting PCR product was digested with EcoRI and SalI and ligated into pHIVCG-4 digested with the same enzymes. Plasmid pBCSL1 was generated by PCR amplification of pYC5′ linearized with EcoRI, using oligonucleotides O3 and O4. The resulting PCR product was digested with BsaI and ligated into pYC5′ digested with the same enzyme. Plasmid pCCinv was generated by PCR amplification of pYC5′ linearized with AvaI, using oligonucleotides O5 and O6. The resulting PCR product was then digested with AvaI and ligated intramolecularly. Plasmid pFC3′-2 was generated by PCR amplification of pHIVCG8.6 linearized with EcoRI, using oligonucleotides O7 and O8. The resulting PCR product was digested with EcoRI and XhoI and ligated into pCG44 (37) digested with the same enzymes. Plasmid pFC3′UTR was generated by PCR amplification of pHIVCG8.6 linearized with EcoRI, using oligonucleotides O8 and O9.
The resulting PCR product was digested with EcoRI and XhoI and ligated into pCG44 digested with the same enzymes.
In Vitro RNA Synthesis and Purification-Plasmids pYC5′, pBCSL1, and pCCinv were digested with HaeIII to generate templates for in vitro synthesis of RNAs 1-415wt, 1-415SL1, and 1-415inv, respectively. These RNA transcripts start and end with authentic HIV sequences, i.e. they do not contain additional sequences resulting from DNA plasmid construction. Plasmid pFC3′-2 was digested with XhoI to generate the template for in vitro synthesis of RNA Δ3′ (200 nucleotides) that contains at the 3′-end a poly(A) tail (22 adenine residues) and five nucleotides corresponding to the XhoI site. Plasmid pFC3′UTR was digested with XhoI to generate the template for in vitro synthesis of RNA 3′ UTR (620 nucleotides). Five µg of the cleaved plasmids was transcribed with bacteriophage T7 RNA polymerase under the conditions stipulated by the RiboMAX™ large scale RNA production system (Promega). RNAs were purified by electrophoresis on a denaturing 5% polyacrylamide gel as described (34).
Synthesis, Labeling, and Purification of the ssDNAs and DNA Size Markers-The DNA oligonucleotides PR and PU3b1M were used as primers. For synthesis of 5′-end labeled ssDNAs and DNA size marker (M, 356 nucleotides), the primers were 5′-end labeled using T4 polynucleotide kinase (New England Biolabs) and [γ-³²P]ATP (PerkinElmer Life Sciences). Ninety pmol of 5′-end labeled primer (3 × 10⁵ cpm/pmol) and 60 pmol of RNAs 1-415 (templates for the wild-type and mutant ssDNAs), or 3′ UTR (template for the DNA size marker, M) in 150 µl of water were heated at 90°C for 3 min and frozen for 5 min in a dry ice/ethanol bath. Then, 138 µl of reaction buffer (final concentrations: 78 mM KCl, 1 mM DTT, and 52 mM Tris-HCl, pH 7.8) containing 30 units of HIV-1 reverse transcriptase (Worthington) was added and the sample was preincubated at 37°C for 10 min before the primer extension reaction was initiated by adding 6 µl of MgCl₂ and 6 µl of dNTPs. The final reaction contained 50 mM Tris-HCl (pH 7.8), 75 mM KCl, 2 mM MgCl₂, 1 mM DTT, and 300 nM dNTPs. The reaction was incubated at 37°C for 1 h and terminated by addition of 200 µl of 0.3 M NaOH, 0.05 M EDTA and heating at 90°C for 15 min. Then, the sample was extracted with phenol/chloroform followed by ethanol precipitation and the dried pellet was resuspended in 40 µl of loading buffer A (7 M urea, 0.03% (w/v) bromphenol blue, and 0.03% (w/v) xylene cyanol). The 5′-end labeled ssDNAs and M were purified by electrophoresis on 6% denaturing polyacrylamide gels and isolated by elution followed by ethanol precipitation. The purified ssDNAs and M were dissolved in water and checked for purity and integrity on a 6% denaturing polyacrylamide gel.
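The two sets of concentrations quoted for the primer extension reaction can be reconciled by a simple dilution calculation. The sketch below is only an arithmetic check, and it assumes that the concentrations given in parentheses for the reaction buffer apply to the 288 µl preincubation mixture (150 µl of primer/template plus 138 µl of buffer) and that the reaction volume is 300 µl once the MgCl₂ and dNTPs have been added. The function and variable names are illustrative.

```python
def dilute(conc_mM, vol_before_ul, vol_after_ul):
    """Concentration after dilution, from C1 * V1 = C2 * V2."""
    return conc_mM * vol_before_ul / vol_after_ul

# Volumes stated in the protocol (µl).
primer_template_ul = 150.0
reaction_buffer_ul = 138.0
mgcl2_ul = 6.0
dntps_ul = 6.0

# Assumption: the parenthetical concentrations describe this 288 µl mixture.
preincubation_ul = primer_template_ul + reaction_buffer_ul
final_ul = preincubation_ul + mgcl2_ul + dntps_ul

preincubation_mM = {"KCl": 78.0, "DTT": 1.0, "Tris-HCl": 52.0}
final_mM = {
    name: dilute(conc, preincubation_ul, final_ul)
    for name, conc in preincubation_mM.items()
}

for name, conc in final_mM.items():
    print(f"{name}: {conc:.2f} mM in {final_ul:.0f} µl")
```

Under these assumptions, KCl and Tris-HCl come out within 0.2 mM of the stated final values of 75 and 50 mM, and DTT stays close to 1 mM.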
Analysis of ssDNA Dimerization-The assays were carried out in a final volume of 10 µl. The 5′-end labeled ssDNA (0.15 pmol at 4 × 10⁵ cpm/pmol) was dissolved in 7.2 µl of water, heated at 90°C for 2 min, and chilled for 2 min on ice. Then 0.8 µl of the renaturation buffer was added (final concentrations: 75 mM KCl, 0.2 mM MgCl₂, and 50 mM Tris-HCl, pH 7.8) and the sample was incubated at 37°C for 45 min. The reaction mixtures were then incubated at 37°C for 15 min in the absence or presence of NC at various concentrations. The incubations were stopped by extraction with phenol/chloroform and each aqueous phase was mixed with 4 µl of loading buffer B (50% (w/v) glycerol, 0.05% (w/v) bromphenol blue, 0.05% (w/v) xylene cyanol). The heat-denatured control of 5′-end labeled ssDNA (0.15 pmol at 4 × 10⁵ cpm/pmol in 10 µl of water) was prepared by heating at 90°C for 2 min, chilling for 2 min on ice, and mixing with 4 µl of loading buffer B. Formation of homoduplexes was analyzed by electrophoresis on a 2% agarose (QA-Agarose™, Qbiogene) gel at 4°C in 0.5× TBM (45 mM Tris borate (pH 8.3), 0.1 mM MgCl₂). After electrophoresis, the gel was fixed, dried, and autoradiographed as described (37).
Gel-shift Annealing Assay-The annealing assay was carried out in a final volume of 12 µl. The 5′-end labeled ssDNA (0.15 pmol at 4 × 10⁵ cpm/pmol) in 4 µl of water was heated at 90°C for 2 min and chilled for 2 min on ice. Then, 1.0 µl of renaturation buffer (final concentrations: 75 mM KCl, 0.2 mM MgCl₂, and 50 mM Tris-HCl, pH 7.8) and 1 µl of NC (1.48 or 3.81 pmol) were added and the sample was incubated at 37°C for 15 min. Unlabeled RNA Δ3′ (0.45 pmol) underwent the same renaturation treatment with 1 µl of NC (5 or 12.85 pmol) and was then added to the refolded ssDNA. The protein to nucleotide molar ratios were 1:18 or 1:7. These ratios refer to total nucleotide concentration. The reaction mixture was then incubated at 37°C for 1, 3, 9, 15, 30, 60, or 120 min. At the end of the incubations, the assays were phenol/chloroform extracted and each aqueous phase was mixed with 4 µl of loading buffer B. The samples were analyzed by electrophoresis on a 2% agarose (QA-Agarose™, Qbiogene) gel at 25°C in 0.5× TBE (45 mM Tris borate (pH 8.3), 1 mM EDTA). After electrophoresis, the gel was fixed, dried, and autoradiographed as described (37). The monomeric (m) and heteroduplex (hd) forms of ssDNA, and the high molecular weight DNA-RNA complexes (hmw) were quantified using a Typhoon™ TRIO (GE Healthcare) and ImageQuant software. The percent of heteroduplex was determined as 100 × (hd/(hd + m + hmw)).
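The protein to nucleotide ratios and the heteroduplex percentage both follow from simple arithmetic, which the minimal sketch below reproduces. The nucleic acid lengths (178 nucleotides for ssDNA, 200 nucleotides for RNA Δ3′) and the pmol amounts are taken from the text. The function names and the band intensities in the last call are hypothetical and serve only to illustrate the formula.

```python
def nucleotides_per_nc(nc_pmol, nucleic_acid_pmol, length_nt):
    """Number of nucleotides per NC molecule (the N in a 1:N ratio)."""
    return nucleic_acid_pmol * length_nt / nc_pmol

def percent_heteroduplex(hd, m, hmw):
    """100 * hd / (hd + m + hmw), from quantified band intensities."""
    total = hd + m + hmw
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * hd / total

# ssDNA: 0.15 pmol of a 178-nucleotide DNA with 1.48 or 3.81 pmol of NC.
ssdna_low = nucleotides_per_nc(1.48, 0.15, 178)
ssdna_high = nucleotides_per_nc(3.81, 0.15, 178)

# RNA Δ3′: 0.45 pmol of a 200-nucleotide RNA with 5 or 12.85 pmol of NC.
rna_low = nucleotides_per_nc(5.0, 0.45, 200)
rna_high = nucleotides_per_nc(12.85, 0.45, 200)

print(f"ssDNA: 1:{ssdna_low:.1f} and 1:{ssdna_high:.1f}")
print(f"RNA:   1:{rna_low:.1f} and 1:{rna_high:.1f}")

# Hypothetical band intensities (arbitrary units), for illustration only.
print(percent_heteroduplex(hd=43.0, m=50.0, hmw=7.0))
```

Both nucleic acids give values that round to 18 and 7 nucleotides per NC, so the protein to nucleotide ratio is unchanged when the two renatured samples are mixed.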
Structural Probing of ssDNAs-Potassium permanganate and piperidine were purchased from Sigma. Mung bean nuclease and DNase I were purchased from New England Biolabs and Promega, respectively. Structural probing of ssDNA was carried out in a final volume of 10 µl. The 5′-end labeled ssDNA (0.15 pmol at 4 × 10⁵ cpm/pmol) in 7.2 µl of water was heated at 90°C for 2 min and chilled for 2 min on ice. Then, 0.8 µl of renaturation buffer (final concentrations: 75 mM KCl, 0.2 mM MgCl₂, and 50 mM Tris-HCl (pH 7.8) for probing with KMnO₄ or DNase I; 75 mM KCl, 0.2 mM MgCl₂, and 50 mM sodium cacodylate (pH 6.5) for probing with mung bean nuclease) was added and the sample was incubated at 37°C for 45 min. The reaction mixtures were then incubated at 37°C for 15 min in the absence or presence of NC at various concentrations. The samples were then incubated with 0.125, 0.25, or 0.5 units of mung bean nuclease for 15 min at 37°C or with 0.1 or 0.15 units of DNase I for 7 min at 37°C. These cleavage reactions were stopped by phenol/chloroform extraction followed by ethanol precipitation. The dried pellets were resuspended in 7 µl of loading buffer A. For potassium permanganate probing, ssDNA was treated with 0.25, 0.5, 0.75, or 1 mM KMnO₄ for 1 min at 37°C. The treatment was stopped by adding 40 µl of the termination buffer (0.7 M β-mercaptoethanol, 0.4 M NaOAc (pH 7.0), 10 mM EDTA, 25 µg/ml of tRNA). DNA was then extracted with phenol/chloroform, ethanol precipitated, and dried. DNA was subjected to piperidine cleavage by resuspension of the dried pellet in 100 µl of freshly diluted 1 M piperidine and heating at 90°C for 30 min. The samples were lyophilized, resuspended in 20 µl of water, and lyophilized again. After a second lyophilization from 15 µl of water, the samples were resuspended in 7 µl of loading buffer A. Sequence markers of the labeled ssDNA were produced by the Maxam-Gilbert method (38).
To identify all cleavage sites at the nucleotide level, the samples were analyzed by long and short migration times on denaturing 6-8% and 12-14% polyacrylamide gels, respectively.
Alignment of the ssDNA Sequences-The ssDNA sequence alignments were performed using the HIV sequence database of the Los Alamos National Laboratory. A position-weighted matrix was then computed using the open-source statistical software R from the Comprehensive R Archive Network (59). The sequence logo (39) was plotted using the seqLogo package.
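For readers unfamiliar with position-weighted matrices, the following minimal Python sketch shows how per-position base frequencies and the information content plotted in a sequence logo can be derived from an alignment. It is not the R/seqLogo workflow used in this study. The four aligned sequences are invented toy data, the function names are illustrative, and the information content is the standard 2 bits minus Shannon entropy, with no small-sample correction.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def position_weight_matrix(aligned):
    """Per-column base frequencies for equal-length aligned DNA sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    pwm = []
    for i in range(length):
        # Gap characters and ambiguous bases are ignored in each column.
        counts = Counter(seq[i] for seq in aligned if seq[i] in ALPHABET)
        total = sum(counts.values())
        pwm.append({b: (counts[b] / total if total else 0.0) for b in ALPHABET})
    return pwm

def information_content(column):
    """Information content in bits: 2 minus the Shannon entropy of the column."""
    if sum(column.values()) == 0:
        return 0.0  # column contained no bases
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy

# Toy alignment, for illustration only (not HIV-1 sequences).
toy_alignment = ["ACGT", "ACGA", "ACCT", "ACGT"]
pwm = position_weight_matrix(toy_alignment)
for position, column in enumerate(pwm, start=1):
    print(position, column, round(information_content(column), 3))
```

A fully conserved column reaches the maximum of 2 bits, whereas a column with a 3:1 split between two bases carries about 1.19 bits.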

Results
Synthesis, Purification, and Analysis of ssDNA-Chemical synthesis of the full-length ssDNA was not commercially attainable because of its length (178 nucleotides). Therefore, we synthesized the full-length ssDNA derived from the MAL isolate using HIV-1 reverse transcriptase. The 5′-end labeled ssDNA was synthesized and purified as described under "Experimental Procedures." Previous studies showed that both loose and tight duplexes can be characterized by native agarose gel electrophoresis at 4°C in the TBM buffer (40-42). When analyzed under these electrophoretic conditions, the labeled ssDNA was monomeric after incubation with or without NC (Fig. 2). Therefore, the annealing assays and the structural analyses were not complicated by the presence of dimeric forms of ssDNA.
Annealing of ssDNA to the 3′-R Region of the Genome-As mentioned in the Introduction, the annealing reaction between the full-length ssDNA and the 3′-end of the genomic RNA has not yet been studied. Here, we investigated the annealing process using NC, the full-length ssDNA, and RNA Δ3′ representing the 3′-end of the genomic RNA (Fig. 1B). The annealing assays were performed as described under "Experimental Procedures." Annealing of RNA Δ3′ to the wild-type ssDNA was barely detectable after 120 min incubation in the absence of NC (Fig. 3A, lane C1). Because the genomic RNA and ssDNA are probably folded and associated with NC during the first strand transfer in vivo, RNA Δ3′ and ssDNA were renatured in the presence of NC before being mixed together. Annealing time courses were performed at low (Fig. 3, A-C) and high (Fig. 3, D-F) NC concentrations. Annealing of RNA Δ3′ to the wild-type ssDNA was significantly increased in the presence of NC at a protein to nucleotide molar ratio of 1:18 (Fig. 3A). The average yield of heteroduplex product was 43% after 120 min incubation (Fig. 3G, black circles). NC at a protein to nucleotide molar ratio of 1:7 greatly facilitated the annealing of RNA Δ3′ to the wild-type ssDNA (Fig. 3D). The average yield of the heteroduplex product was 75% after 120 min incubation (Fig. 3H, black circles). These results show that the wild-type ssDNA synthesized and renatured in vitro is competent for the annealing reaction and that the chaperone activity of NC is required for efficient annealing of ssDNA to the 3′-end of the genomic RNA. To determine the role of the apical loop and the lower stem of the cTAR hairpin in the NC-mediated annealing process, two mutants were investigated (Fig. 1C). The SL1 mutant was designed such that the cTAR domain in ssDNA can form the stem-loop structure, but its apical loop cannot base pair with the apical loop of the TAR RNA hairpin.
The INV mutant was designed such that the cTAR domain in ssDNA can form only the Y conformation and its lower part cannot base pair with the lower part of the TAR RNA hairpin. A band of very weak intensity migrating between the monomer and heterodimer forms was observed with the two mutants (Fig. 3, B and C). The position of this band is consistent with a heteroduplex that is shorter than the full-length heteroduplex. A likely explanation is that spontaneous cleavage occurred in a very small fraction of RNA Δ3′ molecules at the level of mismatched sites during the annealing process, so that mutant ssDNAs were annealed to a truncated form of RNA Δ3′. Surprisingly, the rate of annealing in the presence of NC at a protein to nucleotide molar ratio of 1:18 was significantly increased with the mutants, compared with the wild-type (Fig. 3, A-C and G). The average yield of heteroduplex products was at least 65% after 120 min incubation with these mutants (Fig. 3G). The mfold program predicts that the cTAR hairpin is more stable in the INV mutant than in the wild-type (Fig. 1C). However, the results suggest that mutations in the apical loop and the lower part of the cTAR hairpin facilitated the annealing process by decreasing the stability of mutant ssDNAs. In other words, the ssDNA structure of the mutants was more destabilized in the presence of a low concentration of NC than that of the wild-type. Consistent with this notion, the wild-type and mutants exhibited similar kinetic profiles in the presence of a high concentration of NC (Fig. 3H). High molecular weight complexes were clearly observed with the mutants (Fig. 3, E and F), suggesting formation of DNA-RNA multimers composed of at least three molecules. Taken together, the results show that NC-mediated annealing of ssDNA to RNA Δ3′ does not rely on a single pathway (kissing or zipper).
The results are also consistent with the notion that the NC-mediated annealing process could be initiated through complementary sequences that do not involve the apical loop and the 3′/5′ termini of the cTAR and TAR hairpins.
Strategy for Secondary Structure Determination of ssDNA-To determine the secondary structure of the full-length wild-type ssDNA, we used structural probes and the mfold program (43). Thus, the full-length ssDNA was probed with DNase I, mung bean nuclease (MBN) and potassium permanganate (KMnO₄). DNase I is a double-strand specific endonuclease that produces single-strand nicks (44). MBN is highly selective for single-stranded nucleic acids and single-stranded regions in double-stranded nucleic acids (45). Note that nucleotide mismatches in double-stranded DNA are poor substrates for MBN cleavage at 37°C (45,46). Due to the bulky size of DNase I and MBN, the absence of cuts in the folded DNA structure by these enzymes may also be a result of steric hindrance. In contrast, KMnO₄ is a small probe that can be used to detect all regions in the folded DNA structure that are unpaired or distorted (47,48): it is an oxidizing agent that preferentially attacks the 5,6-double bond of thymine. In B-DNA, this bond is shielded by base stacking interactions and, thus, the T residues in such DNA duplexes are relatively resistant to oxidation. After treatment of ssDNA with piperidine, the DNA backbone was cleaved at the site of the modified thymines. The cleavage fragments generated by the nucleases and the KMnO₄/piperidine treatment were analyzed by electrophoresis on denaturing polyacrylamide gels as described under "Experimental Procedures." Running Maxam-Gilbert sequence markers of ssDNA on the same gels in parallel allowed the identification of cleavage sites. Note that the nucleases cleave the phosphodiester bond and generate a 3′-hydroxyl terminus on 5′-end labeled DNA. In contrast, the KMnO₄/piperidine treatment and Maxam-Gilbert reactions generate a 3′-phosphorylated terminus on 5′-end labeled DNA (38,49). The electrophoretic mobility of Maxam-Gilbert sequence markers is therefore slightly greater than that of fragments produced by nucleases.
This slight difference is observable only with short DNA fragments.
Analysis of the ssDNA Secondary Structure in the Absence of NC-Representative examples of probing experiments without NC are shown in Fig. 4. The results of a series of independent experiments are summarized in the secondary structure of ssDNA that is the most consistent with the probing data (Fig. 5A). In the absence of NC, the DNase I cleavage pattern (Fig. 4, A and B) is consistent with the secondary structure that we propose for ssDNA (Fig. 5A). Indeed, except for moderate DNase I cleavage between G97 and C98 occurring in an apical loop (Fig. 4B), all moderate and strong DNase I cleavages occurred within stems or at the ends of stems (Fig. 5A). Moderate DNase I cleavage between G97 and C98 suggests the existence of transient base pairs in the apical loop. The ssDNA secondary structure is also supported by the MBN cleavage pattern (Fig. 4, C and D) because all moderate and strong MBN cleavages occurred within single-stranded regions and loops (Fig. 5A). Finally, the KMnO₄ probing results are also consistent with our secondary structure model. Thus, except for the weak sensitivity of T115 to KMnO₄, all paired thymine bases within the stems were barely reactive (T5, T41, T45, T92, T108, T114, T130, and T131) or not reactive (T43, T80, T83, T84, T122, T154, T160, and T165) to KMnO₄ (Fig. 4, E and F). The weak sensitivity of T115 to KMnO₄ suggests that the G85·T115 mismatched base pair is not involved in strong stacking interactions with the surrounding base pairs. Consistent with this notion, Gogos et al. (50) showed that KMnO₄ interacts with the thymine residue of T·G mismatches. Residues T11, T32, T134, and T146, which are at the ends of stems or adjacent to a mismatch, exhibited moderate sensitivity to KMnO₄ because at least one side of the plane of the heterocyclic ring of these thymine residues is exposed. As expected, the reactivity level of all unpaired thymine bases toward KMnO₄ was high or moderate.
Analysis of the ssDNA Secondary Structure in the Presence of NC-To identify destabilized regions and protections induced by NC in ssDNA, we compared the enzymatic and KMnO₄ probing patterns of ssDNA in the absence or presence of increasing concentrations of NC (Fig. 6). The results of a series of independent experiments are summarized in the secondary structure of ssDNA that is the most consistent with the probing data (Fig. 5B). The DNase I cleavage patterns in the absence and presence of NC were similar (Fig. 6, A and B), indicating that NC did not profoundly change the secondary structure of ssDNA.
However, DNase I cleavage at the level of almost all sites increased in the presence of NC at a protein to nucleotide molar ratio of 1:18. A likely explanation is that binding of NC to single-stranded regions changed the three-dimensional folding of ssDNA and therefore the accessibility of double-stranded regions. In the presence of NC at a protein to nucleotide molar ratio of 1:3.5, DNase I cleavage at the level of all sites decreased strongly (Fig. 6, A and B, lanes 5). These results are consistent with the notion that NC at high concentrations binds the double-stranded regions of nucleic acids nonspecifically through electrostatic interactions of the basic residues with the phosphodiester backbone (18,36,51). Interestingly, NC at a protein to nucleotide molar ratio of 1:18 induced specific protection against DNase I at the level of C129 (Fig. 6B, lane 2). This suggests that the nucleotides encompassing residue C129 constitute a strong binding site for NC or undergo a local rearrangement upon NC binding.
MBN cleavage at the level of 14 unpaired nucleotides (T15, A16, C17, T22, T24, A25, A27, A50, C52, A53, A59, A61, A63, and A64) did not change in the presence of NC at a protein to nucleotide molar ratio of 1:18 (Fig. 6, C and D, lanes 2), indicating that these unpaired nucleotides did not constitute strong binding sites for NC. In contrast, MBN cleavage at the level of 17 unpaired nucleotides (T34, A36, A66, A68, T69, C73, A74, A76, C89, A90, A99, A100, T105, A106, C128, A157, and C158) decreased in the presence of NC at a protein to nucleotide molar ratio of 1:18, showing that these unpaired nucleotides were less accessible to MBN. The strongest protections occurred at the level of T34, A68, T69, C73, A74, A76, A99, and C128, suggesting that these unpaired nucleotides constitute strong binding sites for NC. In the presence of NC at a protein to nucleotide molar ratio of 1:3.5, MBN cleavage at the level of all sites decreased strongly (Fig. 6, C and D, lanes 5), suggesting that the single-stranded regions were almost completely covered by NC. MBN cleavage at the level of G38, A94, A95, A141, C148, and C149 increased in the presence of NC at a protein to nucleotide molar ratio of 1:18. As mentioned above, it is likely that binding of NC to single-stranded regions changes the three-dimensional folding of ssDNA and therefore the accessibility of these nucleotides. MBN cleavage at the level of A40 was slightly increased with NC at a protein to nucleotide molar ratio of 1:18, suggesting that NC destabilized the T11-A40 base pair that is adjacent to the internal loop. Surprisingly, NC induced moderate MBN cleavages at the level of residues A116, C138, and A139 (Fig. 6D, lane 2) that are within stems (Fig. 5B) … pairs. Note that MBN can cleave double-stranded DNA within AT-rich regions that exhibit "structural breathing" (45,52). Interestingly, NC at a protein to nucleotide molar ratio of 1:18 induced a strong increase in MBN sensitivity for residues A166, A167, A169, and A170, indicating that these nucleotides were unpaired (Fig. 6D, lane 2).
Fig. 4. … and 0.15 units, lanes 1 and 2). C and D, the 5′-end labeled ssDNA was incubated with mung bean nuclease (MBN) (0.125, 0.25, and 0.5 units, lanes 1-3). E and F, the 5′-end labeled ssDNA was incubated with KMnO₄ (0.25, 0.5, 0.75, and 1 mM, lanes 1-4). Lanes Ct are controls without any treatment. Lane Cpip is the control without KMnO₄ treatment but with piperidine treatment. T+C and G+A refer to Maxam-Gilbert sequence markers. Arrows indicate the cleavage sites and the reactive thymine residues. Schematic of the ssDNA is shown alongside the gel, indicating the u5, cpoly(A), and cTAR domains.
As previously reported (29), KMnO4 is expected to interact with the nucleobases of unpaired thymine residues that interact with the N-terminal zinc finger of NC. Indeed, a close examination of the structures of the NC·DNA complexes (53,54) shows that the 5,6-double bond of the nucleobase thymine in these complexes is accessible to KMnO4. Therefore, in the presence of NC at a protein to nucleotide molar ratio of 1:18, the sensitivity of thymine residues to KMnO4 did not decrease (Fig. 6, E and F, lanes 2). Unpaired residues T12, T13, T14, T15, T34, T48, T49, T69, T126, and T127 became more reactive to KMnO4 in the presence of NC (Fig. 5B), suggesting that these thymine residues were involved in transient and weak base stacking interactions that were destabilized by NC. Consistent with the formation of stable stems, T43, T45, T80, T83, T84, T92, T107, T108, T114, T122, T160, and T165 did not become more reactive to KMnO4 in the presence of NC (Fig. 5B). NC induced a slight increase in KMnO4 sensitivity for residues T5, T41, and T154, indicating that at least one side of the plane of the heterocyclic ring of each thymine residue was exposed. This suggests that NC increased the fraying of the adjacent base pairs that are located at the ends of stems (Fig. 5B). Consistent with this notion, T11, T29, and T32 became more reactive to KMnO4 in the presence of NC. Residue T115 became markedly reactive to KMnO4 in the presence of NC (Fig. 6F), indicating that NC increased the accessibility of the G85·T115 mismatched base pair. Interestingly, NC induced a strong increase in KMnO4 sensitivity for residues T130, T131, and T134 that are within the lower stem of the cTAR stem-loop (Figs. 5A and 6F). These results, together with the MBN probing data, strongly suggest that the lower stem of the cTAR hairpin is open in the presence of NC (Fig. 5B).
[Figure legend fragment] The color code used for the reactivities is indicated in the insets. ΔG values were predicted by mfold (43). The asterisks indicate protections induced by NC at the protein to nucleotide molar ratio of 1:18. The double asterisks indicate strong protections induced by NC at the protein to nucleotide molar ratio of 1:18. The stars indicate the cleavage sites and the thymine residues where the reactivity is increased by NC at the protein to nucleotide molar ratio of 1:18.
Conservation of the ssDNA Sequence and Structure-Our study was performed with the ssDNA sequence of the MAL isolate. The HIV-1 NL4-3 isolate has been extensively used in studies dealing with the reverse transcription process (see Ref. 2 and references therein). Interestingly, ssDNA of the HIV-1 NL4-3 isolate can adopt a secondary structure that is close to that of the HIV-1 MAL isolate (Fig. 7). To provide phylogenetic support for the secondary structure of ssDNA in the presence of NC, we searched for its conservation among different HIV-1 groups. First, we determined a consensus sequence for each HIV-1 group, using the LANL HIV subtype reference database. The sequences retrieved from the QuickAlign procedure were used to calculate a position weight matrix from which a sequence logo was derived for each HIV-1 group (supplemental Fig. S1). A sequence logo is drawn such that the overall height of the stack indicates the information content at each position (derived from the Shannon entropy), whereas the height of the base symbols within the stack indicates their relative frequency (39,55). The information content, which can reach a maximal theoretical value of 2 bits for nucleic acids, is usually interpreted as an indicator of evolutionary conservation. The ssDNA sequence is relatively well conserved among different isolates and groups of HIV-1 (supplemental Fig. S1). Nucleotides 15-30 (numbering is relative to HIV-1 group M) located in the u5 domain correspond to the least conserved part of ssDNA. However, this domain displays highly conserved sequences at positions 1-14, 33-48, and 73-80. There are also highly conserved sequences at positions 97-112 and 115-122 in the cpoly(A) domain. In the cTAR domain, the most highly conserved sequences correspond to nucleotides 144-151, 153-157, 163-166, and 177-181. Conservation of residues G36, G71, and G102 supports the hypothesis that these nucleotides are within the high-affinity binding sites for NC.
The sequence logos were used to predict the ssDNA secondary structures of HIV-1 groups M, N, and O (Fig. 7). Only one full-length sequence was available to predict the ssDNA secondary structure of HIV-1 group P. Interestingly, the ssDNA sequences of all four groups can fold into secondary structures that are close to that of the HIV-1 MAL isolate. In all secondary structures, there are three putative strong NC binding sites that are located in single-stranded regions containing at least one guanine residue. In HIV-1 groups M, N, and O, the putative site involving residue G36 is not in a loop. In HIV-1 group M, the base pairing interaction between nucleotides C18-A19 and T35-G36 is probably not stable because it is between an internal loop and an A-A mismatch. Therefore, the TG motif could be recognized by NC. In HIV-1 groups N and O, the strong NC binding site could be in the u5 apical loop containing a TG motif.
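The relationship between Shannon entropy and sequence logo heights described above can be illustrated with a minimal sketch. It is not the procedure used to generate supplemental Fig. S1: the function names `information_content` and `logo_heights` are hypothetical, the input is assumed to be a set of pre-aligned DNA sequences of equal length, gap characters are simply ignored, and no small-sample correction is applied (logo-drawing tools often apply one).

```python
import math
from collections import Counter


def information_content(column, alphabet="ACGT"):
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet size) minus the Shannon entropy of the
    observed base frequencies; characters outside the alphabet
    (e.g. gaps) are ignored. No small-sample correction is applied.
    """
    symbols = [s for s in column.upper() if s in alphabet]
    if not symbols:
        return 0.0
    n = len(symbols)
    counts = Counter(symbols)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(len(alphabet)) - entropy


def logo_heights(sequences, alphabet="ACGT"):
    """Per-position base heights for a sequence logo.

    Returns one dict per alignment position mapping each observed base
    to (relative frequency x information content), so that the heights
    in a dict sum to the stack height at that position.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    heights = []
    for i in range(length):
        column = "".join(s[i] for s in sequences)
        ic = information_content(column, alphabet)
        symbols = [s for s in column.upper() if s in alphabet]
        counts = Counter(symbols)
        n = len(symbols)
        heights.append({b: (c / n) * ic for b, c in counts.items()} if n else {})
    return heights


if __name__ == "__main__":
    # Toy alignment (illustrative only, not HIV-1 data).
    toy = ["ACGT", "ACGA", "ACTA", "ACTT"]
    for pos, stack in enumerate(logo_heights(toy), start=1):
        print(pos, {b: round(h, 3) for b, h in stack.items()})
```

A fully conserved position reaches the 2-bit maximum mentioned in the text, whereas a position at which the four bases are equally frequent carries no information.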

Discussion
The first strand transfer is essential for HIV-1 replication (1). This transfer requires a base pairing interaction between the R sequence at the 3′-end of the genomic RNA and the r sequence at the 3′-end of the ssDNA (Fig. 1A). Therefore, the HIV-1 full-length ssDNA (176-181 nucleotides in length) is a key player in the annealing process leading to the first strand transfer. The mechanism of annealing has been investigated using model nucleic acid substrates containing at least portions of the complementary R/r sequences. The first model systems were based on annealing of an RNA molecule (81 or 148 nucleotides representing a portion of the 3′-end of the genomic RNA) to a DNA molecule (81 or 131 nucleotides) corresponding to a portion of the 3′-end of the ssDNA (23,56). Short DNAs representing the cTAR sequence (55 or 59 nucleotides) or the mini-cTAR sequence (26 or 27 nucleotides) were used in many studies investigating the annealing process (18, 20, 28-32, 57). Only one study examined the secondary structure of ssDNA using truncated ssDNAs (50, 63, and 128 nucleotides) (58). Thus, all in vitro studies dealing with the ssDNA structure and the annealing process were performed with single-stranded DNAs representing truncated forms of ssDNA. Therefore, there are no data showing that the cTAR and cpoly(A) sequences form the predicted long stem-loops (19) in the context of the full-length ssDNA. Moreover, the mfold program (43) predicts that some of the most stable secondary structures of ssDNA (MAL and NL4-3 isolates) do not form the TAR stem-loop structure (data not shown). Our study is the first to investigate the secondary structure of the full-length ssDNA and its interaction with the 3′-end of the genomic RNA.
We used structural probes (DNase I, mung bean nuclease, and potassium permanganate) to directly determine the secondary structure of ssDNA in the absence or presence of NC. Our probing data support the folding of ssDNA into three stem-loop structures in the absence of NC (Fig. 5A). Note that a large part of the u5 sequence (29 consecutive nucleotides) is single-stranded. It is important to emphasize that the r region (sequences cpoly(A) and cTAR) does not fold into an independent domain, because six base pairs are formed between the u5 […] (20,27,29). Interestingly, our probing data indicate that the cTAR sequence in ssDNA adopts the partially open Y form but not the closed form (conformer 1 in Fig. 8). Consistent with this finding, the mfold program predicts that the ssDNA containing the Y form is more stable than the ssDNA containing the closed form (ΔG_Y = -11.43 versus ΔG_C = -7.54 kcal/mol). Furthermore, the three-dimensional folding of ssDNA could prevent the formation of the closed form. Taken together, the data indicate that the cTAR DNA model adopts a secondary structure (closed form) that is not formed in ssDNA.
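To give a sense of scale for the predicted free energy difference between the Y form and the closed form, the following sketch converts the two mfold values quoted above into relative populations. This is an illustration only and not part of the original analysis: it assumes a simple two-state equilibrium between the two folds, a temperature of 37 °C, and that the predicted ΔG values can be compared directly. The function name `two_state_populations` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def two_state_populations(dg_a, dg_b, temp_c=37.0):
    """Boltzmann fractions of two conformers.

    dg_a and dg_b are folding free energies in kcal/mol; the function
    assumes that only these two states are populated (an assumption made
    for illustration, not a statement from the study).
    """
    rt = R_KCAL * (temp_c + 273.15)
    g_min = min(dg_a, dg_b)  # shift energies for numerical stability
    w_a = math.exp(-(dg_a - g_min) / rt)
    w_b = math.exp(-(dg_b - g_min) / rt)
    total = w_a + w_b
    return w_a / total, w_b / total


# Values quoted in the text: Y form versus closed form.
frac_y, frac_closed = two_state_populations(-11.43, -7.54)

if __name__ == "__main__":
    print(f"Y form: {frac_y:.4f}, closed form: {frac_closed:.4f}")
    print(f"population ratio Y/closed: {frac_y / frac_closed:.0f}")
```

Under these assumptions, a difference of 3.89 kcal/mol corresponds to a population ratio of several hundred to one in favor of the Y form, which is in line with the absence of detectable closed form in the probing data. Predicted free energies carry substantial uncertainty, so the ratio should be read as an order of magnitude at most.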
Interestingly, a low concentration of NC (protein to nucleotide molar ratio of 1:18) increases the accessibility of the cpoly(A) and cTAR domains to DNase I (Fig. 5B), suggesting that the three-dimensional folding of ssDNA is destabilized upon binding of NC to its strong sites. Destabilization of the three-dimensional structure of ssDNA could be a first step in the NC-mediated annealing process. In other words, NC could facilitate interaction of the r sequence of ssDNA with the R sequence of the genomic RNA by increasing the accessibility of these complementary sequences. Our probing data show that NC does not profoundly alter the secondary structure of ssDNA. Thus, the global architecture of the u5 and cpoly(A) stem-loops is not altered in the presence of NC. The KMnO4 and MBN reactivities indicate that NC increases the fraying of base pairs that are located at the ends of stems in the u5 stem-loop. There is no evidence that a population of ssDNA molecules is totally melted in the presence of NC at a high concentration (protein to nucleotide molar ratio of 1:3.5). Indeed, T43, T80, T83, T84, T122, T160, and T165 were unreactive to KMnO4 (Fig. 6, E and F). However, the lower stem of the cpoly(A) hairpin is probably destabilized during the NC-mediated annealing process, i.e. in the presence of the poly(A) RNA hairpin. Furthermore, the long stem (11 base pairs) of the cpoly(A) hairpin should be destabilized to allow synthesis of full-length plus-strand DNA. The G85·T115 mismatched base pair is probably involved in DNA helix destabilization by NC, because the protein increases the reactivity of residue T115 to KMnO4 (Fig. 5B). Godet […] showed that mismatched base pairs play an important role in DNA helix destabilization by NC.
Interestingly, the cTAR region undergoes a significant conformational change upon NC binding. Indeed, the lower part of the cTAR hairpin is more open in the presence of NC than in its absence (Fig. 8). This open form is supported by strong MBN cleavage sites between positions 166 and 171 and the reactivities of T126, T127, T130, T131, and T134 toward KMnO4 (Fig. 5B). These latter MBN data indicate that the 3′-end of cTAR is accessible and could therefore interact with TAR. Consistent with the loss of four base pairs in the most open form, C129 is a weak DNase I cleavage site in the presence of NC (Fig. 5B), whereas it is a moderate DNase I cleavage site in its absence (Fig. 5A). Our results are in agreement with the hypothesis that NC induces the O2 conformation of the cTAR hairpin exhibiting 22 unpaired nucleotides in its lower part (20). This conformation is consistent with the hypothesis that the 3′/5′ termini of the cTAR and TAR hairpins form a zipper intermediate that initiates the annealing reaction (30,32). However, we showed that mutations preventing the formation of the zipper intermediate do not affect the annealing of ssDNA to the 3′-end of the genomic RNA (Fig. 3). Mutations within the first 10 nucleotides of the 5′ R sequence, i.e. the 10 last nucleotides of the cTAR sequence, produce virions that are markedly defective for reverse transcription (12). Our results suggest that this reverse transcription defect is not due to inhibition of the annealing reaction. A kissing complex formed by the apical loops of TAR and cTAR could initiate the annealing reaction (18,19,29,30). The accessibility of the cTAR apical loop to MBN increases in the presence of NC (Fig. 5B), indicating that this loop should be more accessible to the TAR apical loop. However, we show that mutations preventing the loop-loop interaction do not affect the annealing process (Fig. 3).
Finally, our study suggests that the NC-mediated annealing process does not rely on a single pathway (zipper intermediate or kissing complex). It is likely that the virus uses the two pathways and perhaps a third involving the poly(A) and cpoly(A) hairpins. Indeed, our results are consistent with the "acceptor invasion" model proposed by Bambara and co-workers (60,61). In this model, RT pausing at the base of the 5′ TAR hairpin initiates RNase H cleavages in the 5′ poly(A) hairpin, creating a gap for the invasion of the 3′ poly(A) hairpin (acceptor RNA) and interaction with the cpoly(A) hairpin. This initiates the strand exchange by a branch migration process until the 3′ terminus of ssDNA is fully transferred.
Identification of NC binding sites in the full-length ssDNA contributes to an understanding of the annealing mechanism. To identify the ssDNA nucleotides coated by NC, we investigated the NC-induced protection of ssDNA against enzymatic digestion. To this end, we performed a footprinting analysis, using MBN and DNase I (Fig. 6, A-D). The DNase I and MBN probing data indicate that ssDNA is almost completely covered by NC at a high concentration (protein to nucleotide molar ratio of 1:3.5). NC at a low concentration (protein to nucleotide molar ratio of 1:18) induced the strongest protections against MBN at the level of five sites (T34, A68-T69, 73C-A-C-A76, A99, and C128). The internal loop (nucleotides 156-158) of the cTAR domain was also protected by NC, but the protection was lower than the five protections mentioned above. In a previous report, we also showed that this internal loop is a binding site for NC in a short DNA (55 nucleotides) that folds into the cTAR hairpin (29). Protection at the level of the 73C-A-C-A76 sequence is probably not due to a direct interaction with NC, because this sequence does not display the features of an NC binding site. In contrast, four of the five protected sites are within single-stranded regions containing at least one guanine residue or the TG motif that are preferential binding sites for NC (62)(63)(64)(65). Note that the binding site size of NC is 5 to 8 nucleotides (24). Taken together, the data support the model presented in Fig. 8, in which NC interacts strongly with four sites at a protein to nucleotide molar ratio of 1:18. Our study is the first to identify the high-affinity binding sites for NC in the full-length ssDNA. Interestingly, a strong binding site for NC containing residue G132 is located between the cpoly(A) and cTAR hairpins. A recent study performed with a modified version of the cTAR sequence (57 nucleotides) also suggests that residue G132 is preferentially recognized by NC (59).
In contrast to this study, we did not find that residue G171 is a strong binding site for NC. Therefore, our results show that the properties of the cTAR DNA model are not identical to those of the cTAR domain in the full-length ssDNA. Our probing data are consistent with the notion that the cTAR region is more open in the presence of NC than in its absence (Fig. 5, compare A and B). We propose that the great majority of ssDNA molecules in the absence of NC fold into conformer 1 and the minority into conformer 2 (Fig. 8). An attractive hypothesis is that NC binding to the 129C-T-T-G-C133 sequence in conformer 2 shifts the equilibrium toward this conformer (Fig. 8).
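The two protein to nucleotide molar ratios used in the footprinting analysis (1:18 and 1:3.5) can be related to the reported binding site size of NC (5 to 8 nucleotides) by simple arithmetic, sketched below. The calculation is an upper bound under stated assumptions that are not taken from the study: every NC molecule is bound, binding sites do not overlap, and the ssDNA length of 180 nucleotides is an illustrative value within the 176-181 nucleotide range given above. The function names are hypothetical.

```python
def nc_per_ssdna(ssdna_length_nt, nt_per_protein):
    """Average number of NC molecules per ssDNA at a given molar ratio."""
    if ssdna_length_nt <= 0 or nt_per_protein <= 0:
        raise ValueError("length and ratio must be positive")
    return ssdna_length_nt / nt_per_protein


def max_coverage(nt_per_protein, site_size_nt):
    """Upper bound on the fraction of nucleotides covered by NC.

    Assumes that all NC molecules are bound and that binding sites do not
    overlap; the result is capped at 1.0 (full coverage) and does not
    depend on the length of the DNA.
    """
    if nt_per_protein <= 0 or site_size_nt <= 0:
        raise ValueError("ratio and site size must be positive")
    return min(1.0, site_size_nt / nt_per_protein)


if __name__ == "__main__":
    ssdna_length = 180  # illustrative value, assumed for this sketch
    for nt_per_protein in (18.0, 3.5):
        n_nc = nc_per_ssdna(ssdna_length, nt_per_protein)
        low = max_coverage(nt_per_protein, 5)
        high = max_coverage(nt_per_protein, 8)
        print(f"1:{nt_per_protein:g} -> about {n_nc:.0f} NC per ssDNA, "
              f"coverage up to {low:.0%}-{high:.0%}")
```

Under these assumptions, the 1:18 ratio corresponds to about 10 NC molecules per ssDNA and leaves most nucleotides uncovered, which is compatible with preferential occupancy of a small number of strong sites, whereas the 1:3.5 ratio provides enough protein to coat the entire molecule.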
Phylogenetic analysis supports the notion that ssDNAs of different HIV-1 groups can adopt similar secondary structures containing three stem-loops and three single-stranded regions (Fig. 7). Because the base pairing interaction between nucleotides 4-11 and 41-58 (numbering is relative to HIV-1 group M) is completely conserved (supplemental Fig. S1), it may play a key role during synthesis of minus- and plus-strand DNAs. Note that the counterpart RNA of this base pairing interaction, the primer binding site 3 helix, is also conserved and may be required for efficient initiation of reverse transcription from the tRNA(Lys3) primer (66). Consistent with the hypothesis that the first strand transfer is facilitated by two loop-loop interactions (19), the upper parts of the cpoly(A) (nucleotides 92-112) and cTAR (139-165) hairpins are highly conserved (Fig. 7). The TAR and poly(A) RNA hairpins play important roles in HIV-1 transcription and polyadenylation at the 3′-end of HIV-1 mRNAs, respectively (15,16). Therefore, we cannot exclude the possibility that conservation of the cTAR and cpoly(A) DNA hairpins reflects conservation at the level of RNA structure, or at both the DNA and RNA levels. Indeed, Berkhout (67) showed that the upper part of the TAR RNA hairpin is well conserved, and his group also showed conservation of the poly(A) RNA hairpin (16). Consistent with these findings, there is a strong evolutionary pressure to restore proper folding of the TAR and poly(A) hairpins upon mutational disruption (68,69). An attractive hypothesis, therefore, is that the TAR and poly(A) hairpins have been conserved at both levels, DNA and RNA, for optimal HIV-1 replication. Although NC is a weak duplex destabilizer that cannot destabilize long, fully base-paired nucleic acid helices, it can destabilize short base-paired regions (4-8 base pairs) flanked by duplex ends, loops, bulges, or mismatches (24,70).
In agreement with the weak duplex destabilizing activity of NC, the stems of ssDNAs do not exceed eight consecutive Watson-Crick base pairs (Fig. 7). This suggests a coevolutionary relationship between the ssDNA sequence and the NC activity to promote the first strand transfer and synthesis of full-length plus-strand DNA. Interestingly, previous studies support the notion of a coevolutionary relationship between the cTAR structure and the NC activity (59,71). Finally, our phylogenetic analysis suggests conservation of the four high-affinity binding sites for NC that we identified in ssDNA of the MAL isolate.
Our study is the first to determine the secondary structure of the full-length HIV-1 ssDNA in the absence or presence of NC. To our knowledge, the secondary structures of ssDNAs of other retroviruses have not been determined. Our results support a model for the NC-mediated annealing process in which the first step is the preferential binding of NC to four sites. These binding events trigger unfolding of the three-dimensional structure of ssDNA, thus facilitating interaction of the r sequence of ssDNA with the R sequence of the genomic RNA. The second step is simultaneous molecular crowding of nucleic acids and weak destabilization of secondary structures leading to the DNA-RNA heteroduplex. Finally, our results suggest that initiation of the annealing process does not rely on a single pathway (zipper intermediate or kissing complex).