HIV Integrase, a Brief Overview from Chemistry to Therapeutics

Retroviruses are a large and diverse family of RNA viruses that synthesize a DNA copy of their RNA genome after infection of the host cell. Integration of this viral DNA into host DNA is an essential step in the replication cycle of HIV and other retroviruses (reviewed in Refs. 1–3). The integrated viral DNA is transcribed to make the RNA genome of progeny virions and the template for translation of viral proteins. Following assembly, virions bud from the cell surface and subsequently infect previously uninfected cells, thus completing the replication cycle. An infecting retrovirus introduces a large nucleoprotein complex into the cytoplasm of the host cell. This complex, which is derived from the core of the infecting virion, contains two copies of the viral RNA together with a number of viral proteins, including reverse transcriptase and integrase. Reverse transcription of the viral RNA occurs within the complex to make a double-stranded DNA copy of the viral genome, the viral DNA substrate for integration. The viral DNA remains associated with both viral and cellular proteins in a nucleoprotein complex termed the preintegration complex. One constituent of the preintegration complex is the viral integrase protein, the key player in the integration of the viral DNA into the host genome. The other components of the preintegration complex that are transported to the nucleus along with the viral DNA and integrase, and their possible functions, have not been firmly established and are not discussed here. The critical DNA cutting and joining events that integrate the viral DNA are carried out by the integrase protein itself. Here we review our current knowledge of the molecular mechanism of this reaction and discuss some of the key issues that are yet to be understood.

The Mechanism of DNA Integration
Biochemical studies have elucidated the basic chemical mechanism of integration, even though the organization of the active complex of integrase with its DNA substrates remains to be determined. We will focus on HIV integrase, but the key properties of this enzyme appear to be shared among the entire retroviral integrase family. In the first step of the integration process, two nucleotides are removed from each 3′-end of the viral DNA, a reaction termed 3′-end processing. Cleavage occurs to the 3′-side of a CA dinucleotide that is conserved among retroviruses, retrotransposons, and many DNA transposons, both in prokaryotes and eukaryotes. This reaction exposes the terminal 3′-hydroxyl group that is to be joined to target DNA (Fig. 1B). In the second step, DNA strand transfer, a pair of processed viral DNA ends is inserted into the target DNA (Fig. 1C). In the case of HIV, the sites of integration on the two target DNA strands are separated by 5 base pairs. Repair of this integration intermediate (Fig. 1D) results in a direct duplication of 5 base pairs flanking the integrated viral DNA (not shown). The repair step requires removal of the two unpaired nucleotides at the 5′-ends of the viral DNA, filling in the single-strand gaps, and finally ligation. Integrase is responsible for 3′-processing and DNA strand transfer, but the latter repair steps are likely to be carried out by cellular enzymes (4,5). There is little specificity for the sites of integration in host DNA, and insertion can occur at essentially any location. The DNA cutting and joining steps of 3′-end processing and DNA strand transfer closely parallel the reactions used by many transposons to move to new sites in the genome.
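The sequence-level consequences of these steps can be illustrated with a minimal sketch. It is a toy bookkeeping model only, not a description of the enzymology: the function names, the short viral sequence, and the target sequence are all hypothetical, and the code assumes a blunt-ended viral DNA whose two strands each end in CA followed by two nucleotides. It shows how removal of two nucleotides from each 3′-end, a 5-base pair stagger between the sites of strand transfer, and subsequent repair together yield a direct 5-base pair duplication of target sequence flanking the integrated DNA.

```python
# Toy model of the sequence bookkeeping in retroviral integration.
# All sequences and function names are hypothetical illustrations.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return strand.translate(COMPLEMENT)[::-1]


def process_3prime_end(strand: str) -> str:
    """Model 3'-end processing of one strand (written 5'->3').

    The two nucleotides 3' of the conserved CA dinucleotide are removed,
    leaving the strand ending in ...CA with a free 3'-hydroxyl.
    """
    if len(strand) < 4 or strand[-4:-2] != "CA":
        raise ValueError("strand must end in CA followed by two nucleotides")
    return strand[:-2]


def integrate(target_top: str, viral_top: str, site: int, stagger: int = 5) -> str:
    """Return the top strand of the repaired integration product.

    `site` is the index of the first base of the `stagger`-bp window that
    lies between the two sites of strand transfer. After gap filling, that
    window is present on both sides of the integrated viral DNA.
    """
    if not 0 <= site <= len(target_top) - stagger:
        raise ValueError("integration site out of range")

    # Both 3'-ends must carry the conserved CA; this also validates the input.
    process_3prime_end(viral_top)
    process_3prime_end(reverse_complement(viral_top))

    # The top strand loses two nucleotides at its 3'-end during processing
    # and its two unpaired 5' nucleotides during repair.
    integrated = viral_top[2:-2]

    duplicated = target_top[site:site + stagger]
    return (
        target_top[:site]
        + duplicated
        + integrated
        + duplicated
        + target_top[site + stagger:]
    )


if __name__ == "__main__":
    viral = "ACTGGGATCCTTCAGT"          # hypothetical blunt-ended viral DNA
    target = "AAAAACCCCCGGGGGTTTTT"     # hypothetical target DNA
    print(integrate(target, viral, site=5))
```

In this sketch the window `CCCCC` appears on both sides of the inserted sequence in the output, which corresponds to the direct duplication produced by repair of the integration intermediate.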
The 3′-processing and DNA strand transfer reactions can be carried out in vitro with purified integrase, a duplex oligonucleotide that mimics one end of the viral DNA, and a divalent metal ion. This simplified in vitro system has proved to be invaluable for dissecting the biochemical mechanism of DNA integration. Stereochemical analysis of the reaction pathway has demonstrated that both 3′-processing and DNA strand transfer occur by a one-step transesterification mechanism (6), the same result that was previously obtained for the corresponding reactions mediated by the related Mu transposase protein (7). In the 3′-processing reaction, water serves as the nucleophile for cleavage at the ends of the viral DNA. We may envisage that DNA strand transfer occurs by a chemically similar mechanism, except that integrase positions the 3′-hydroxyl groups at the ends of the viral DNA to simultaneously cleave the target DNA and make covalent connections between the viral and target DNA. Although 3′-processing and DNA strand transfer are very similar reactions at the chemical level, the way the active site region of integrase engages DNA substrate must differ between processing and DNA strand transfer; for the latter reaction the active sites must accommodate target DNA in addition to the viral DNA ends.

Integrase
HIV-1 integrase is composed of three domains (Fig. 2), defined on the basis of the susceptibility of the linker regions to proteolysis (8), functional studies (8–10), and the structures of the domains (Fig. 3), which have been individually determined by x-ray crystallography or NMR.
The catalytic core domain contains the invariant triad of acidic residues, the D,D-35-E motif (8, 11–13), comprising residues Asp64, Asp116, and Glu152 in the case of HIV-1 integrase. Mutagenesis of these residues and their counterparts in related retroviral integrases abolishes or severely diminishes all catalytic activities in parallel (8,11,14,15). By analogy with models of catalysis by DNA polymerases (16–18), it has been proposed that coordination of divalent metal ion to these residues plays a key role in catalysis (11). The structures of the catalytic domain of HIV-1 integrase (19,20) and the corresponding domain of ASV integrase (21,22) have been determined by x-ray crystallography. The catalytic residues Asp64, Asp116, and Glu152 of HIV-1 integrase and their counterparts in the ASV structures are in close proximity, coordinate divalent metal ion, and define the active site. However, the residues comprising the active site region exhibit considerable flexibility, suggesting that binding of DNA substrate is required to impose the precise configuration of residues that is required for catalysis. The structures of the HIV-1 and ASV integrase core domains are very similar to each other and to the catalytic domain of Mu transposase (23), reinforcing the parallelism of retroviral DNA integration and transposition. These structures revealed that retroviral integrases and their transposase cousins belong to a superfamily of polynucleotidyltransferases that share the same overall fold as Escherichia coli RNase H and have similar active sites (24,25).
The HIV-1 catalytic domain is dimeric in solution (26) and in the crystal structures (Fig. 3A). The extensive surface area of the dimer interface suggests that it is biologically relevant. Yet, the spacing between the active sites in the nearly spherical dimer is not compatible with the spacing between the sites of insertion on the two strands of target DNA. The sites of insertion on each strand of target DNA are separated by 5 base pairs, corresponding to about 15 Å for helical B-form DNA. The functional unit of integrase should therefore have a pair of active sites separated by a similar spacing. However, in the crystal structures (Fig. 3A), the active sites in the dimer are separated by more than 30 Å measured as a straight line through the protein and by an even greater distance measured around the circumference of the dimer. Assuming that the dimer interface is maintained in the functional integrase multimer, at least a tetramer of integrase must be required for the complete integration reaction.
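The spacing argument can be checked with a short calculation. The sketch below assumes the textbook axial rise of about 3.4 Å per base pair for B-form DNA, a value not stated in the text, and uses a hypothetical tolerance to decide whether two distances count as compatible. It measures only the distance along the helix axis, so it gives a rough estimate of the same order as the approximately 15 Å quoted above rather than an exact distance between scissile phosphates.

```python
# Rough spacing estimate; the rise per base pair and the tolerance are
# assumptions made for illustration, not values taken from the text.

RISE_PER_BP_ANGSTROM = 3.4  # assumed axial rise of B-form DNA, in Å


def axial_spacing(n_bp: int, rise: float = RISE_PER_BP_ANGSTROM) -> float:
    """Distance along the helix axis spanned by n_bp base-pair steps, in Å."""
    return n_bp * rise


def spacing_compatible(
    active_site_separation: float, n_bp: int, tolerance: float = 5.0
) -> bool:
    """True if two active sites could plausibly reach insertion sites n_bp apart.

    `tolerance` (in Å) is a hypothetical allowance for flexibility.
    """
    return abs(active_site_separation - axial_spacing(n_bp)) <= tolerance


if __name__ == "__main__":
    print(f"5 bp of B-form DNA spans about {axial_spacing(5):.0f} Å along the axis")
    # Separation of more than 30 Å between active sites in the core dimer:
    print("core dimer compatible:", spacing_compatible(30.0, 5))
```

Under these assumptions, a separation of 30 Å or more between the two active sites of the core dimer is roughly twice the spacing between the insertion sites, which is the mismatch that motivates the proposal that at least a tetramer is required.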
The N-terminal domain of HIV-1 integrase contains a conserved pair of His and Cys residues, a motif similar to the zinc-coordinating residues of zinc fingers. Although this domain does indeed bind zinc (27,28), its structure (Fig. 3B) (29) is totally different from that of zinc fingers. It consists of a bundle of three α-helices. The C-terminal domain (Fig. 3C) (29,30) has an SH3 fold, although there is no known functional relationship with the SH3 domains of other proteins.
Although the core domain of integrase is clearly responsible for catalysis, the functional roles of the other two domains are less clear. The C-terminal domain binds DNA nonspecifically. Because the sites of integration into target DNA are relatively nonspecific, it has been suggested that this domain may interact with target DNA. However, experiments with chimeric integrases (31,32) assign recognition of the target site to the core domain, and cross-linking studies (33–36) suggest that the C-terminal domain interacts with a subterminal region just inside the very ends of the viral DNA. The C-terminal domain of retroviral integrases may therefore play a similar role to that of the site-specific DNA binding domain of transposases, which also recognize a subterminal sequence at the ends of the transposon DNA. The function of the N-terminal domain of integrase is at present unknown.

FIG. 1. DNA cutting and joining steps in retroviral integration.
A, the viral DNA (orange) made by reverse transcription is linear and blunt ended. B, in the first step of the integration process, 3′-end processing, two nucleotides are cleaved from each 3′-end of the viral DNA. C, in the next step, DNA strand transfer, the hydroxyl groups at the 3′-ends of the processed viral DNA attack a pair of phosphodiester bonds in the target DNA (blue). The spacing between the sites of attack on each target DNA strand is fixed and characteristic for each retrovirus. D, the resulting integration intermediate is redrawn to clarify the connections between viral and target DNA. Integrase is responsible for both the 3′-processing and DNA strand transfer reactions that give rise to the integration intermediate. Completion of DNA integration requires removal of the two unpaired nucleotides at the 5′-ends of the viral DNA, filling in the single-strand gaps between host and viral DNA by a DNA polymerase, and finally ligation. These steps are likely to be carried out by cellular enzymes.

Unanswered Questions
Although the structures of all three domains of integrase have been individually determined, their spatial arrangement in the active complex with DNA substrate is unknown. Three retroviral integrase structures of the core together with the C-terminal domain have been recently reported in the absence of bound DNA (37–39). The spatial relationship between the core and C-terminal domains is different in each of these structures, indicating considerable flexibility in the linkage between the two domains. It is likely that binding of viral DNA substrate imposes the proper configuration of domains for the reaction to occur. Although integrase exists as monomers, dimers, and tetramers at high ionic strength, formation of large aggregates under reaction conditions has frustrated attempts to directly determine the organization of the active unit. Efforts to crystallize integrase together with DNA are further challenged by the nonspecific nature of the interaction between integrase and the viral DNA ends.
Many transposases, unlike their retroviral integrase counterparts, bind specifically to the transposon ends and form discrete nucleoprotein complexes that are amenable to direct structural and functional analysis. In the case of Mu transposase, the active unit of transposase is a tetramer. Within the tetramer, only two of the four active sites directly participate in the DNA cleavage and joining reactions. A dimer of transposase carries out these reactions in the case of Tn5 (40) and Tn10. Thus, a dimer would seem to constitute the fundamental unit for this class of reactions, and the requirements for higher order multimers in some systems probably reflect differences in regulatory systems rather than a fundamental mechanism. The recent determination of the crystal structure of the Tn5 transposase dimer in complex with DNA substrate (40) provides a platform for modeling the interactions of DNA with the catalytic domain of integrase. Because the integrase catalytic domain is itself a dimer with an extensive interface, a similar architecture to that of the Tn5 complex would require a pair of dimers. Then, as in the case of Mu transposase, two of the active sites in the resulting tetramer would not directly participate in catalysis.
A puzzling phenomenon is the almost exclusive integration of a single viral DNA substrate into one strand of the DNA target in reactions with purified HIV integrase. In the cell and in vitro with preintegration complexes, integration is coupled so that a pair of viral DNA ends is integrated with a spacing of 5 base pairs separating the sites of insertion on the two strands of target DNA. Although inclusion of additional protein factors in the reaction has been reported to stimulate coupled integration (41,42), single-end integration events still predominate. It appears that a yet to be understood assembly pathway, and possibly additional factors, are required to reconstitute the integration reaction with the full fidelity observed with preintegration complexes in vitro and in vivo. An alternative possibility is that aggregation of purified integrase disfavors assembly of the correct complexes with two DNA ends poised for integration; such aggregation may normally be prevented by interaction with other components of the preintegration complex. Indeed, it has been shown that improving the solubility of the closely related Tn552 transposase greatly enhances the efficiency of double-end versus single-end strand transfer (43). Perhaps the strongest suggestion that additional host proteins may not be required is the finding that disrupted HIV-1 virions, which contain few host proteins, support efficient double-end strand transfer of exogenous viral substrate DNA (44).

Prospects for Developing Integrase Inhibitors as Therapeutic Antiviral Agents
The development of effective inhibitors of HIV replication targeted to reverse transcriptase and protease has demonstrated the potential effectiveness of antiviral therapy for the treatment of AIDS. Drugs targeted to integrase would be a valuable complement to reverse transcriptase and protease inhibitors. However, no drugs have yet been developed that act on integrase. The bottleneck has been the absence of good lead compounds to serve as the starting point for drug development. Although many compounds have been reported to inhibit integrase, most of these lack selectivity and inhibit many other enzymes as well. A concern has also been raised that because many copies of integrase enter the cell with the infecting virus and only two integrations are required, integrase might be an intrinsically difficult enzyme to target. Recent identification of a class of compounds that unambiguously inhibits HIV replication in cell culture by targeting integrase (45) counters this argument and demonstrates the potential of integrase as an antiviral target. The structure of a related inhibitor complexed with the active site of integrase has also recently been determined (46). However, the much higher affinity of these inhibitors for integrase in the presence of viral DNA (47) and their selectivity for the DNA strand transfer reaction suggest that the binding modes in the absence and presence of DNA substrate are not identical. It will likely be necessary to determine the structure of these and other inhibitors in complex with an integrase active site engaged with DNA to understand the interaction in detail and provide a platform for drug design. Nevertheless, the foundation of basic knowledge in understanding the mechanism of retroviral integration promises to bear fruit in contributing to the fight against AIDS.