Retroviral integrase, putting the pieces together.

Retroviral integrase (IN) mediates retroviral DNA integration, a critical step in viral replication that ensures stable expression of proviral genes in the infected cell and perpetuation of the viral genome in all the host cell progeny. The IN protein is both necessary and sufficient for the integration of a linear DNA with viral end sequences into a target DNA in vitro (1, 2). The integration reaction is known to take place in two distinct steps (see Fig. 1). IN specifically recognizes sequences at both ends of newly synthesized viral DNA, most likely as a component of a large subviral preintegration complex. The first step in integration, a processing reaction, can take place in the cytoplasm of infected cells (3). This reaction produces site-specific cuts near the viral DNA 3′-ends, adjacent to a conserved CA dinucleotide, removing (generally) two nucleotides and exposing new 3′-hydroxyl ends. The second step, a joining reaction, is a concerted cleavage-ligation reaction (4), which produces a staggered cut in cellular DNA when the newly exposed 3′-hydroxyls of the viral DNA ends attack the phosphate bonds at the cellular DNA cleavage site. The product is an intermediate in which the 3′-ends of viral DNA are covalently linked to cellular DNA and the 5′-ends of viral DNA are flanked by short gaps (5, 6). Repair of the gaps and completion of integration, which can be accomplished by cellular enzymes, produce a short direct repeat of host target DNA. The length of this repeat is characteristic for each virus. For example, HIV-1 proviruses are flanked by 5-base pair repeats and avian sarcoma virus (ASV) proviruses by 6-base pair repeats. These features are likely to reflect subtle differences in the structure or multimeric organization of the two integrases.
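The bookkeeping of the two steps described above can be illustrated with a minimal string-based sketch. It is a toy model, not a description of the enzymology: DNA is represented by its top strand only, the function names and example sequences are hypothetical, and the unpaired 5′ dinucleotides are assumed to be lost during gap repair, so the viral top strand is trimmed at both ends.

```python
def process_viral_dna(viral_top: str, n_removed: int = 2) -> str:
    """Return the viral top strand as it would appear in the provirus.

    Toy assumption: processing removes `n_removed` nucleotides from each
    3' end, and the matching 5' overhangs are lost during gap repair, so
    the top strand is trimmed by `n_removed` at both ends.
    """
    if n_removed < 1 or len(viral_top) <= 2 * n_removed + 4:
        raise ValueError("unsuitable input for this toy model")
    provirus = viral_top[n_removed:-n_removed]
    # The top-strand 3' end must be CA; the bottom strand's 3' CA reads
    # as 5' TG on the top strand.
    if not (provirus.startswith("TG") and provirus.endswith("CA")):
        raise ValueError("processed ends are not adjacent to a CA dinucleotide")
    return provirus


def integrate(target_top: str, viral_top: str, site: int, stagger: int) -> tuple[str, str]:
    """Join processed viral DNA at a staggered cut and repair the gaps.

    `site` is the 0-based start of the staggered cut and `stagger` its
    length in base pairs. After repair, the target sequence between the
    two cuts flanks the provirus on both sides as a direct repeat.
    """
    provirus = process_viral_dna(viral_top)
    if stagger < 1 or not 0 <= site <= len(target_top) - stagger:
        raise ValueError("cut site does not fit inside the target")
    repeat = target_top[site:site + stagger]
    product = target_top[:site + stagger] + provirus + target_top[site:]
    return product, repeat


# Hypothetical sequences, chosen only so that the processed ends are TG...CA.
target = "GGATCCTTAGCAGTC"
viral = "AATGTAGTCTTACATT"

hiv_like, hiv_repeat = integrate(target, viral, site=4, stagger=5)  # 5-bp stagger
asv_like, asv_repeat = integrate(target, viral, site=4, stagger=6)  # 6-bp stagger
print(hiv_repeat, hiv_like)
print(asv_repeat, asv_like)
```

In this sketch the only difference between the HIV-1-like and ASV-like outcomes is the `stagger` argument, which mirrors the statement that the repeat length is characteristic for each virus.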
Details of the biochemistry of IN and its catalytic mechanism have been uncovered mainly through the use of in vitro, reconstructed systems, which employ purified enzymes and model DNA substrates that consist of short oligodeoxynucleotide duplexes (1, 2, 7, 8). Unlike several site-specific recombinases that utilize a protein-DNA covalent intermediate, the cleavage-ligation reaction is a direct transesterification. The in-line nucleophilic attack by the processed viral 3′-hydroxyls on the phosphate bond of the target DNA occurs with chiral inversion when appropriate substrates are used for detection (4). It has recently been discovered that a similar, but intramolecular, transesterification is catalyzed by another eukaryotic recombinase complex (RAG1 and RAG2) in the early steps of V(D)J recombination of immunoglobulin genes (9). Retroviral integrases range from 270 to 350 amino acids in length. Various lines of evidence indicate that both ASV and HIV-1 IN function as multimers (minimally, dimers) in vitro (10–12). However, the number and arrangement of IN protomers in the active in vivo complex are still unknown. Deduced amino acid sequence alignments, limited proteolysis, site-directed mutagenesis studies, and complementation experiments (reviewed in Refs. 13 and 14) have revealed the presence of three distinct domains that form independent folding units within each monomer. The first two of these are highly conserved among retroviral and retrotransposon integrases (Fig. 2). Here we will review what is known about the structure of each domain and discuss current ideas concerning domain interactions and multimerization, as they relate to function.

N-terminal Domain
The N-terminal domain, comprising approximately the first 50 amino acids, is characterized by a "zinc finger"-like motif (HHCC domain). However, the loop between the histidines and cysteines is longer than that seen in the canonical, DNA-binding zinc fingers of several transcription factors. This motif is highly conserved in the integrases of all retroviruses and eukaryotic retrotransposons. When expressed independently, this N-terminal domain does not bind DNA (15,16), but it may interact with DNA in the context of the intact protein (15,17-19). The isolated domain has been shown to bind Zn2+ and to assume an ordered structure upon metal binding, as expected for a zinc finger (20). Spectroscopic studies reveal that the isolated peptide coordinates metal with a tetrahedral geometry and that mutation of the conserved histidines or cysteines abolishes formation of the metal-induced ordered structure. Recent studies applying atomic absorption spectroscopy to various HIV-1 IN fragments confirm the 1:1 binding of Zn2+ to this HHCC region. Related Zn2+ binding experiments also suggest a role for this region, together with the C terminus, in promoting tetramerization. This agrees with previously reported evidence that the N and C termini of HIV-1 IN are likely to be in close proximity in the native structure (21). Other studies have suggested a role for this domain in determining the aggregation properties of HIV-1 IN (22).
Deletion of the zinc finger motif has little or no effect on the processing and joining activities of ASV IN as measured in vitro using assays that detect single-end cleavage and joining events (23,24). However, the high conservation of this motif and results from genetic experiments suggest that this domain is functionally important. Further biochemical studies that explore the role of this domain in coordinated cleavage and joining events may reveal its participation in interdomain or intersubunit interactions that occur in vivo.

Catalytic Core Domain
The catalytic domain, included in a central region of approximately 150 amino acids, is characterized by the D,D(35)E motif, a constellation of three invariant acidic amino acids, the last two separated by 35 amino acids in integrase proteins. These acidic residues are required for all catalytic functions of IN and have been proposed to bind the essential metal cofactor(s), Mn2+ or Mg2+ (15,25). The recently solved crystal structures of the catalytic domains of HIV-1 IN (26) and ASV IN (27) support this view. These structures reveal that IN proteins are members of a large superfamily of nucleases and polynucleotidyltransferases (reviewed in Refs. 28-30) that includes RNases H from Escherichia coli and HIV-1 (31,32), the E. coli RuvC resolvase (33), and bacteriophage MuA transposase (34). Catalytic and structural similarities to the type II restriction endonucleases have also been noted (35). Fig. 3 (top) shows ribbon models of monomers of the catalytic domains of HIV-1 and ASV. The ASV model includes a bound divalent cation (Mg2+ or Mn2+) observed after soaking crystals in the appropriate solutions (36). The overall topology of the two domains is quite similar (root mean square deviation of 1.4 Å for 107 Cα pairs) and resembles that of other members of the superfamily. The characteristic features are: (a) a prominent mixed β-sheet (five-stranded in the integrases) with three long anti-parallel β-strands (β1, β2, and β3) that are flanked on either side by α-helices and (b) a juxtaposition of the essential acidic residues that are presumed to form the catalytic center.

* This minireview will be reprinted in the 1996 Minireview Compendium, which will be available in December, 1996. Work from this laboratory was supported by National Institutes of Health Grants CA-47486 and CA-06927, a grant for infectious disease research from Bristol-Myers Squibb Foundation, and also by an appropriation from the Commonwealth of Pennsylvania.
Because of the differences in the metal complexes of E. coli and HIV-1 RNase H (31,32,37), it is still uncertain whether one or two metal ions are required for catalysis by the members of this superfamily. In the original proposal of the two-metal mechanism (38), one metal plays a role in promoting the formation of the attacking hydroxide ion, and the second functions to stabilize the pentacoordinate transition state at the phosphate to be cleaved. Observation of a single metal in the ASV IN catalytic core structure does not preclude a two-metal mechanism, because Asp-64 in the first β-strand could contribute to the coordination of two metals, in conjunction with the other acidic residues in the full-length protein, as observed in other members of this superfamily (31,38). The binding of the second metal might also require DNA substrate or other domains of IN. The conservation of active site residue positions within a common folding topology presumably indicates a common catalytic mechanism for the members of this superfamily. Structural and biochemical data for full-length or isolated IN domains complexed to various substrates may be necessary to determine the number of metals and their specific roles in the reaction.
Despite the overall similarity between the two available retroviral IN catalytic core structures, certain differences are apparent. The HIV-1 IN structure contains an α-helix (α-6) not observed in the ASV structure (see also alignment in Fig. 2). The ASV structure has an ordered active site in which all three of the invariant residues of the D,D(35)E motif are resolved (Fig. 3, top), whereas the published HIV-1 model has a disordered segment, which includes the third critical residue of this motif, Glu-152. Recent data from a different crystal form of the HIV-1 IN catalytic core domain reveal more of α-helix 4, including the third catalytic residue Glu-152 (pictured in Fig. 3, top left panel). However, 10 amino acids between β-strand 5 and α-helix 4 remain disordered (dotted line in Fig. 3, top left panel). Flexibility is likely to be a hallmark of this loop in the isolated core domain; flexibility is also reported for the analogous portion of the ASV IN domain (27). Differences in the direction and orientation of the two catalytic Asp residues in the two structures are also observed. Several explanations for the latter difference are possible (36); however, a detailed description of the active site configuration will likely require analysis of complexes with DNA substrates.
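The spacing rule that defines the D,D(35)E motif in this section can be expressed as a simple sequence scan. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the permitted spacing between the first two aspartates (39 to 58 residues) is an illustrative range rather than a value from the text, and the test sequence is an artificial poly-glycine string with acidic residues placed at positions 64, 116, and 152 to mimic HIV-1 IN numbering (position 116 for the second Asp is an added detail not stated above).

```python
def find_dd35e(seq: str, min_gap: int = 39, max_gap: int = 58) -> list[tuple[int, int, int]]:
    """Return 1-based (D, D, E) positions matching a D-x(n)-D-x(35)-E pattern.

    `min_gap` and `max_gap` bound the number of residues between the first
    two aspartates; this range is an assumption made for illustration.
    Exactly 35 residues separate the second Asp from the Glu.
    """
    hits = []
    for i, residue in enumerate(seq):
        if residue != "D":
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + gap + 1   # index of the candidate second Asp
            k = j + 36        # index of the candidate Glu (35 residues between)
            if k >= len(seq):
                break
            if seq[j] == "D" and seq[k] == "E":
                hits.append((i + 1, j + 1, k + 1))
    return hits


# Artificial sequence: glycine filler with D, D, E at positions 64, 116, 152.
toy = ["G"] * 160
toy[63], toy[115], toy[151] = "D", "D", "E"
print(find_dd35e("".join(toy)))
```

A scan of this kind only detects the spacing pattern; it says nothing about whether the residues are juxtaposed in three dimensions, which is what the crystal structures establish.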
Both the HIV-1 and the ASV IN catalytic cores crystallize as dimers (see Fig. 3, middle). The primary contacts between subunits in both structures involve α-helices 1 and 5. Because the same extensive interface is seen in both IN structures, it is likely that core domain dimerization plays an important role in the multimerization required for the processing and joining activities of integrase. As might be expected from the contribution of an additional α-helix mentioned above (α-helix 6), the solvent-accessible surface area buried upon dimerization is more extensive in the HIV-1 IN dimer interface (1300 Å² versus 766 Å² for ASV). Correspondingly, the binding interactions between subunits are more numerous in the HIV-1 interface. These differences provide an explanation for the higher apparent association constant of the HIV-1 core dimer relative to ASV (39,40).
In each dimer, comparison of the distance between the active site residues Asp-64 (Fig. 3, middle) shows a difference of ≈3 Å, a value approximately equal to the length of one nucleotide in B-form DNA. This is consistent with the one-nucleotide difference in the target site duplication generated by these two enzymes during integration. However, as noted by Dyda et al. (26), the overall distance of ≈35 Å is too great to accommodate the 5 base pairs of B-DNA that separate the two HIV-1 IN cleavage sites in the target DNA. This difference could be accommodated if one allows for partial DNA unwinding during the reaction and coordinated but sequential cleavage at the target site. Independent lines of evidence support such a model (41,42). Alternatively, two active sites could be positioned closer in a higher order multimer (26). A similar disparity between active site separation and the distance between two DNA scissile bonds also exists in the co-crystal of γδ resolvase and a model DNA substrate (43). Partial distortion of both protein and DNA, and sequential single-strand cleavage in the target DNA, have been proposed to reconcile the γδ structural data with the known biochemical properties of the reaction.
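The size of the mismatch discussed above can be made concrete with a short calculation. This sketch assumes the textbook axial rise of about 3.4 Å per base pair for B-form DNA (a value not given in the text) and measures only the distance along the helix axis, ignoring the fact that the two scissile bonds lie on opposite strands; the constant and function names are hypothetical.

```python
RISE_PER_BP_ANGSTROM = 3.4        # assumed textbook rise for B-form DNA
ACTIVE_SITE_DISTANCE = 35.0       # approximate Asp-64 to Asp-64 distance from the text


def b_dna_span(n_bp: int) -> float:
    """Axial length, in angstroms, of n_bp base pairs of ideal B-DNA."""
    return n_bp * RISE_PER_BP_ANGSTROM


hiv_span = b_dna_span(5)          # 5-bp stagger for HIV-1 IN
asv_span = b_dna_span(6)          # 6-bp stagger for ASV IN

print(f"HIV-1 stagger spans {hiv_span:.1f} A, ASV stagger spans {asv_span:.1f} A")
print(f"Difference per extra base pair: {asv_span - hiv_span:.1f} A")
print(f"Excess of active-site distance over the HIV-1 stagger: "
      f"{ACTIVE_SITE_DISTANCE - hiv_span:.1f} A")
```

Under these assumptions, one extra base pair accounts for roughly the ≈3 Å difference between the two dimers, while the ≈35 Å separation is about twice the axial length of a 5-base pair stagger, which is the disparity that DNA unwinding, sequential cleavage, or a higher order multimer would have to resolve.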

C-terminal Domain
The amino acid sequence of the C-terminal domain of approximately 80-100 residues is not highly conserved among IN proteins from different families of retroviruses (see Fig. 2, bottom). Moloney murine leukemia virus IN contains a sequence insertion of unknown function in this region, and other differences in the connection of this domain to the catalytic core may exist between lentiviruses and oncoretroviruses (44). Mutation and deletion analyses with both ASV and HIV-1 IN indicate that the C-terminal domain contains nonspecific DNA binding activity (15,16,45-47). The structure of an isolated fragment from this HIV-1 domain (amino acids 220-270) was recently determined by NMR (48,49). It reveals an SH3-like fold (see Fig. 3, bottom) similar to that found in signal transduction proteins (50). The significance of this feature is not yet apparent; the SH3 domain may be a ubiquitous folding unit involved in a variety of binding functions including peptide, protein, and DNA ligands. The NMR data show that the isolated SH3-like domain of IN exists as a dimer in solution (48,49). Biochemical studies have also revealed that multimerization determinants reside in the C-terminal regions of both ASV (39) and HIV-1 IN (40). The binding interactions at the interface in the NMR structure are predominantly hydrophobic and localize to β-strands 2, 3, and 4 (purple strands), although there is a possible hydrogen bond between Gln-252 and Glu-246 of the other subunit (Fig. 3, bottom left panel). Lodi et al. (49) suggest that a DNA-binding site may be formed by the dimeric structure of this domain where the loops between β-1 and β-2 strands form a saddle-like groove that could accommodate a duplex DNA molecule. This groove contains basic residues favorably positioned for contact with DNA (see Fig. 3, bottom right panel). In support of this model, substitution of Lys-264 (to Glu) significantly reduces the DNA binding capacity of an isolated fragment from this domain of HIV-1 IN (47).

Connecting IN Domains in the Full-length Monomer and in the Integration Complex with DNA
Juxtaposition of Domains-The study of retroviral INs has benefited greatly from structural information derived from these isolated domains. However, regions that connect the three domains may also perform important functions, some of which could be specific to different retroviral IN proteins. For example, significant sequence differences are observed between oncoretrovirus and lentivirus proteins in the "hinge region" that connects the catalytic core and the C-terminal, β-barrel domain. The glycine-rich loop between α-5 and α-6 is conserved only in the lentiviruses, and there is no clear homology to α-helix 6 in the oncovirus INs. Instead, there is a proline-rich region that would not favor helix formation (44). Sequences from this hinge region in ASV IN (but not HIV-1 IN) confer nuclear localization on a heterologous reporter protein. This may represent a mechanism for nuclear import that is not common to all retroviral integrases.
The MuA transposase catalytic core structure (34) possesses a downstream domain that may be functionally analogous to the C-terminal domain of IN. It too forms a β-barrel known to include determinants for nonspecific DNA binding. However, analysis of the MuA structure is unlikely to provide insight into specific contacts between the IN core and C-terminal domains. The MuA β-barrel is a 6-stranded structure with a topology different from that of the 5-stranded β-barrel of HIV-1 IN. Moreover, the contacts between these two MuA domains seem to be unique to this protein (34), as they involve long loops (between α-helix 2 and 3, and between catalytic core β-strands 1 and 2) not found in the retroviral integrases. Correspondingly, the MuA β-barrel does not form a dimeric interface as seen in the HIV-1 IN C-terminal β-barrel structure.
Still undetermined is how the three IN domains interact with each other in the multimer-DNA complex that performs the coordinated integration reaction. While it is clear that the isolated catalytic core and C-terminal domains each can dimerize, these two interfaces are not likely to be the only determinants of multimerization. As noted above, there is biochemical evidence for interaction between the N- and C-terminal domains of HIV-1 IN. Furthermore, different IN domains exhibit trans-complementation (11,12). As is the case with the MuA transposase (Ref. 51
Multimerization-Although a tetrameric IN structure (see Fig. 1, middle) seems reasonable (26,42), currently there is no direct evidence that a tetramer is required for coordinated DNA cleavage or joining reactions. Conceivably, sequential action of a dimer could also catalyze both processing and joining steps. However, IN must coordinate four DNA cleavage events in the overall recombination reaction; these include both viral DNA ends and two nearby sites in the target host DNA. It is clear that there must be communication between the active sites that perform the coordinated reactions. In vivo (52) and in coordinated reactions in vitro (42,53), mutation at one viral DNA end will result in inefficient catalysis at the other viral end. The organization of IN protomers within the synaptic complex must determine the position of DNA substrates and the proper coordination of these four cleavages. Despite sharing a similar topology and reaction chemistry, the enzymes of this superfamily differ in substrate specificity, multimeric structure, and the number of cleavages required. For example, RNase H is known to act as a monomer (54), and correspondingly, its function does not require coordination of multiple cleavages. The position of several of the flanking α-helices would preclude a dimerization interface for RNase H similar to that seen in the IN structures (31,32). RuvC is known to function as a dimer and performs two DNA cleavage events when resolving the Holliday junction during homologous recombination (see Ref. 55 for review). In contrast to both IN core domain interfaces, the RuvC dimeric interface involves subunit contacts between an elongated α-helix analogous to IN α-helix 3. These examples indicate that proteins which share a conserved topology and catalytic mechanism may differ in the multimeric organization that is required for their specific functions.

Perspectives
The information on retroviral integrase structure acquired during the last few years has advanced the study of these enzymes significantly. However, it has also highlighted new, important questions concerning the relative orientation of IN domains, IN multimerization, and the positioning of substrate DNAs within the synaptic complex. The current structural data provide models that will aid in the rational design of active site inhibitors of these enzymes that may be used as antiviral drugs. Domain or subunit interactions represent another target that could provide new therapeutic agents. Still largely unexplored are the specific nature and relative importance of protein-protein interactions that involve IN and other viral (56,57) and cellular proteins (58). Their study is sure to advance our understanding of the molecular biology of this important class of viruses and of the cells in which they propagate.