Mechanism of polypurine tract primer generation by HIV-1 reverse transcriptase

HIV-1 reverse transcriptase (RT) possesses both DNA polymerase activity and RNase H activity that act in concert to convert single-stranded RNA of the viral genome to double-stranded DNA that is then integrated into the DNA of the infected cell. Reverse transcriptase–catalyzed reverse transcription critically relies on the proper generation of a polypurine tract (PPT) primer. However, the mechanism of PPT primer generation and the features of the PPT sequence that are critical for its recognition by HIV-1 RT remain unclear. Here, we used a chemical cross-linking method together with molecular dynamics simulations and single-molecule assays to study the mechanism of PPT primer generation. We found that the PPT was specifically and properly recognized within covalently tethered HIV-1 RT–nucleic acid complexes. These findings indicated that recognition of the PPT occurs within a stable catalytic complex after its formation. We found that this unique recognition is based on two complementary elements that rely on the PPT sequence: RNase H sequence preference and incompatibility of the poly(rA/dT) tract of the PPT with the nucleic acid conformation that is required for RNase H cleavage. The latter results from rigidity of the poly(rA/dT) tract and leads to base-pair slippage of this sequence upon deformation into a catalytically relevant geometry. In summary, our results reveal an unexpected mechanism of PPT primer generation based on specific dynamic properties of the poly(rA/dT) segment and help advance our understanding of the mechanisms in viral RNA reverse transcription.

The simulations strongly suggested that the catalytic interaction between the substrate and the RNase H domain introduces structural stress into the substrate. The severity of this stress is sequence dependent; however, the largest disruptions of base pairing were universally observed at the -2 base pair. Further, the structural distortions of the upstream regions of the helix often started at this base pair. This is likely due to its proximity to the phosphate-binding pocket, which requires the helix to unwind here. Therefore, the experimental preference for rG/dC or rC/dG base pairs at the -2 position could be explained by the higher thermodynamic stability of rG/dC or rC/dG base pairs over rA/dT or rU/dA. Earlier studies suggested that a preference at position -2 may rely on a base-specific minor groove interaction with the Gln475 side chain (36). While we indeed observed this interaction in the simulations, it did not seem to have a clear base pair preference.
The next sequence preference site is at the -4 base pair, where adenine is disfavored in the RNA strand. This is equivalent to thymine being disfavored in the DNA strand. There is no direct interaction between the protein and the -4 bases. However, there is a hydrogen bond between the Tyr501 side chain hydroxyl and the phosphate between nucleotides -3 and -4 of the DNA strand. There is also a van der Waals contact between the tyrosine aromatic ring and the sugar ring of the -3 nucleotide of the DNA strand. In the simulations, we noticed that the stability of the hydrogen bond strongly correlates with the sugar pucker of the -3 nucleotide of the DNA strand. Specifically, a pucker in the C2'-endo region is required for ideal contact (Fig. S9A). It has been suggested earlier that the sugar pucker of a DNA nucleotide is lowered if the base of the following nucleotide is a thymine (42). This is because of a steric repulsion between the thymine methyl group and the atoms of the sequentially preceding sugar ring. This effect is well reproduced in the simulations, where substrates with thymine as the -4 nucleotide produce vastly different sugar pucker populations in the -3 nucleotide than those with a non-thymine nucleotide (Fig. S9B). Therefore, the simulations suggest that the experimental preference for the -4 base pair not to be rA/dT can be explained as follows: the presence of thymine as the -4 base likely stabilizes a lower sugar pucker of the -3 deoxyribose, which in turn may destabilize the important interactions with the Tyr501 side chain.
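To make the notion of a "lowered" sugar pucker concrete, the following minimal sketch computes the pseudorotation phase angle P from the five endocyclic sugar torsions using the standard Altona-Sundaralingam formula. It is an illustration only: the source does not state which pucker definition or analysis tool was used, and the function names, the region boundaries (144-180° for C2'-endo, 0-36° for C3'-endo) and the 38° amplitude are assumptions made for this example.

```python
import math

# sin(36 deg) + sin(72 deg), the constant in the Altona-Sundaralingam formula.
_SIN_SUM = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def pseudorotation_phase(nu0, nu1, nu2, nu3, nu4):
    """Return the pseudorotation phase P in degrees, in [0, 360).

    Inputs are the endocyclic torsions in degrees:
    nu0 = C4'-O4'-C1'-C2', nu1 = O4'-C1'-C2'-C3', nu2 = C1'-C2'-C3'-C4',
    nu3 = C2'-C3'-C4'-O4', nu4 = C3'-C4'-O4'-C1'.
    """
    numerator = (nu4 + nu1) - (nu3 + nu0)
    denominator = 2.0 * nu2 * _SIN_SUM
    # atan2 places P in the correct quadrant (adds 180 deg when nu2 < 0).
    return math.degrees(math.atan2(numerator, denominator)) % 360.0


def pucker_region(phase_deg):
    """Classify P with assumed (conventional) region boundaries."""
    if 144.0 <= phase_deg <= 180.0:
        return "C2'-endo"
    if 0.0 <= phase_deg <= 36.0:
        return "C3'-endo"
    return "other"


def ideal_torsions(phase_deg, amplitude_deg=38.0):
    """Hypothetical helper: idealized torsions for a given P and amplitude."""
    return tuple(
        amplitude_deg * math.cos(math.radians(phase_deg + 144.0 * (j - 2)))
        for j in range(5)
    )


if __name__ == "__main__":
    for target in (162.0, 120.0, 18.0):
        phase = pseudorotation_phase(*ideal_torsions(target))
        print(round(phase, 1), pucker_region(phase))
```

Applied per trajectory frame to the -3 deoxyribose, a calculation of this kind would yield pucker populations comparable to those discussed for Fig. S9B; a shift of P below the assumed C2'-endo window corresponds to the "lower" pucker described above.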

Disulfide cross-linking as a method for capturing native complex structures
The thiol-based cross-linking approach that we employed traps mechanistically relevant states of nucleic acid enzymes. This is supported by several lines of evidence. In a related manuscript (8), we characterized chemically cross-linked HIV-1 RT-substrate complexes in more detail. We tested the specificity of this approach and showed that the covalent tether forms specifically for pairs of thiol-modified protein/nucleic acid residues that are located sufficiently close to each other in the structure of the complex. Thus, this approach stabilizes native conformations. Moreover, crystal structures were solved for HIV-1 RT in complex with analogous dsDNA substrates without cross-linking (PDB ID 2HMI) (43) and with the Q258C cross-link (PDB ID 1RTD) (4). These two structures are essentially identical, confirming that the cross-link captures a native structure. For RNA/DNA substrates, a crystal structure of a complex that is cross-linked using exactly the same approach as the one applied in this work (i.e., Q258C substitution combined with modification of the sixth base from the 3′ end of the primer) is available (PDB ID 4PQU) (6). In fact, we used it as a starting point in our MD simulations, which was possible because this structural model was very stable in these calculations. The 4PQU structure is in a fully productive polymerase mode, including the incoming nucleotide and divalent metal ions that are bound at the active site. In a related manuscript (8), we used two different methods and showed that the cross-linked complexes indeed bind the incoming cognate nucleotide specifically and with an affinity similar to that of the free enzyme. This confirms the proper organization of the polymerase active site, including the orientation of the template and primer end. RNase H activity is also preserved in the cross-linked complexes.
Finally, the cross-linking methodology has been used to determine crystal structures of multiple protein-nucleic acid complexes of important enzymes, including glycosylases (44-47).
SUPPLEMENTARY FIGURES

Figure S1. Schematic of the cross-linking chemistry. The two-carbon linker with a thiol group (blue) is tethered to the N2 atom of the guanine base. The disulfide bond that forms between the thiol group connected to the base and a cysteine residue (green) is shown in red.

... Fig. S5 were quantified by densitometry and the extent of cleavage was expressed as the ratio of the fluorescence intensity of the product band divided by the total fluorescence in the lane. Global fitting of the data was performed to a pseudo-zero-order model of the reaction, analogous to radioactive decay. Data for each substrate were fitted globally to three independent measurements of each timepoint. The red line is the fit to the data and blue lines represent the 95% confidence limits for prediction band. Outlier measurements from prediction band that were removed from analysis are shown in magenta. The resulting half-life values calculated from the fit are given in Table S1 and Fig. 2.

Table S1. Parameters of fitting of the time-course data shown in Figure S6 and S7. Data were fit with exponential and bi-exponential model (see Experimental Procedures). Amplitude (A, α) and half-life (T1/2, τ1/2) values are given with errors in parentheses. The significance of the minor reaction was assessed according to the F-test and p-value. Bi-exponential model was selected only for those fits that resulted in p-value below 10%. Parameters of selected models are shown in bold.
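The fitting and model selection described in these captions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the exact model equations are given only in the Experimental Procedures, so the single-exponential form f(t) = A(1 - exp(-kt)), the extra-sum-of-squares form of the F statistic, and all function names here are assumptions. Replicate measurements are fitted globally simply by pooling all (time, fraction) points into one list.

```python
import math


def _amplitude_and_rss(rate, times, fractions):
    """For a fixed rate k, solve for the best amplitude A in closed form."""
    shape = [1.0 - math.exp(-rate * t) for t in times]
    norm = sum(s * s for s in shape)
    amplitude = sum(y * s for y, s in zip(fractions, shape)) / norm
    rss = sum((y - amplitude * s) ** 2 for y, s in zip(fractions, shape))
    return amplitude, rss


def fit_single_exponential(times, fractions):
    """Least-squares fit of the assumed model f(t) = A * (1 - exp(-k t)).

    Returns (amplitude, half_life, residual_sum_of_squares).
    At least one time point must be greater than zero.
    """
    def rss_at(log_rate):
        return _amplitude_and_rss(10.0 ** log_rate, times, fractions)[1]

    # Coarse scan over log10(k), then golden-section refinement.
    step = 0.05
    grid = [-4.0 + step * i for i in range(161)]
    centre = min(grid, key=rss_at)
    lo, hi = centre - step, centre + step
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if rss_at(left) < rss_at(right):
            hi = right
        else:
            lo = left
    rate = 10.0 ** ((lo + hi) / 2.0)
    amplitude, rss = _amplitude_and_rss(rate, times, fractions)
    return amplitude, math.log(2.0) / rate, rss


def extra_sum_of_squares_f(rss_simple, rss_complex, p_simple, p_complex, n_points):
    """F statistic for comparing nested models (assumed form of the F-test)."""
    gained = (rss_simple - rss_complex) / (p_complex - p_simple)
    residual = rss_complex / (n_points - p_complex)
    return gained / residual


if __name__ == "__main__":
    times = [0, 2, 5, 10, 20, 40, 80]
    data = [0.8 * (1.0 - 0.5 ** (t / 10.0)) for t in times]  # synthetic, noise-free
    print(fit_single_exponential(times, data))
```

The half-life follows from the fitted rate as ln 2 / k. To apply the 10% criterion, the F statistic would be converted to a p-value with the F distribution, for example `scipy.stats.f.sf(F, p_complex - p_simple, n_points - p_complex)` if SciPy is available.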

Exponential
Bi
Cl4_5U 200 +
a To verify reproducibility of the results, two independent simulations were performed for all the systems (marked as "name_a" and "name_b" in the table). Both simulations usually provided mutually very consistent results.
b Simplified assessment of the DNA/RNA substrate simulation stability: "++" are substrates where we observed no base pair slippage or distortions; "+" are substrates with moderate base pair distortions but no base-pair slippage; "-" are substrates with large base pair distortions and/or slippage.