ARCHITECTURE OF A FULL-LENGTH RETROVIRAL INTEGRASE MONOMER AND DIMER, REVEALED BY SMALL ANGLE X-RAY SCATTERING AND CHEMICAL CROSS-LINKING

We determined the size and shape of full-length avian sarcoma virus (ASV) integrase (IN) monomers and dimers in solution using small angle x-ray scattering. The low resolution data obtained establish constraints for the relative arrangements of the three component domains in both forms. Domain organization within the small angle x-ray envelopes was determined by combining available atomic resolution data for individual domains with results from cross-linking coupled with mass spectrometry. The full-length dimer architecture so revealed is unequivocally different from that proposed from x-ray crystallographic analyses of two-domain fragments, in which interactions between the catalytic core domains play a prominent role. Core-core interactions are detected only in cross-linked IN tetramers and are required for concerted integration. The solution dimer is stabilized by C-terminal domain (CTD-CTD) interactions and by interactions of the N-terminal domain in one subunit with the core and CTD in the second subunit. These results suggest a pathway for formation of functional IN-DNA complexes that has not previously been considered and possible strategies for preventing such assembly.


Retroviral integrase (IN) catalyzes the insertion of viral DNA into the DNA of the infected host cell. IN is one of three retroviral-encoded enzymes that are essential for retroviral replication and is therefore an important target for drugs to treat HIV/AIDS. One IN active site-directed inhibitor, Raltegravir, is approved by the FDA for this purpose, and a second, Elvitegravir, is in advanced clinical trials. The availability of such drugs offers hope to HIV-positive individuals, especially those who have developed resistance to therapies that target the other two viral enzymes, reverse transcriptase and protease. Nevertheless, the inevitable development of drug-resistant HIV mutants drives a continuing need for additional strategies to block the activity of this viral enzyme. Unfortunately, progress has been limited by a lack of critical details concerning the molecular structure of the full-length HIV IN protein and its functional multimers.
Much of the difficulty arises from the fact that the HIV protein is relatively insoluble and therefore difficult to study using biophysical methods. As retroviral INs are likely to share architectures for active complexes (1), we have focused our studies on avian sarcoma virus (ASV) IN (2), which is more soluble than HIV IN and, as we show here, suitable for structural analyses.
IN proteins are composed of three distinct structural domains (Fig. 1A and B). The largest (amino acids  in ASV IN) is the central catalytic core domain (CCD or "core"). This domain contains the D,D(35)E motif of acidic residues that coordinate the required divalent metal ions, Mg2+ or Mn2+ (3-6). The isolated HIV IN core domain forms a dimer in solution (7), and the 3-dimensional structure of the core from several retroviral IN proteins has been solved by X-ray crystallography of either the isolated domain or two-domain fragments that include the N-terminal (NTD) or C-terminal (CTD) domains (8-14). The same extensive interface of two core domains (e.g. Fig. 1C) has been observed in every crystal structure analyzed so far and, consequently, it is thought to be physiologically relevant. The isolated, Zn2+-binding NTD (amino acids  and the SH3-like CTD (amino acids 210-286) of HIV IN also form dimers in solution (15-20), but the significance of these interactions is uncertain. For example, the spatial relationships between the CTDs and cores are different in each of the two-domain crystal structures that have been determined, and the crystal structure of the NTD+core two-domain fragment shows a different NTD:NTD interface than that observed in the NMR structure of the NTD alone. One might ask if these interactions reflect alternate physiologically relevant assemblies.
Full-length IN proteins are known to exist as monomers, dimers, and tetramers in solution (21)(22)(23)(24)(25)(26), and complementation experiments indicate that IN functions as a multimer (27)(28)(29)(30). An IN dimer appears to be the most catalytically active form for the endonucleolytic processing of a single-end viral DNA substrate in vitro (31). However, as two processed DNA ends must be joined by IN to host DNA in a concerted fashion in vivo, a tetramer is assumed to be the minimal functional multimer for this step. Indeed, analysis of ASV IN-DNA complexes imaged by atomic force microscopy revealed that assembly of a tetramer is induced upon interaction with a "disintegration" substrate, which represents a viral-host DNA integration intermediate, and that four IN monomers are required for a single catalytic turnover with this substrate (32). Purification and analysis of covalently crosslinked multimers of HIV-1 IN showed that although a dimer could process and join a single viral DNA end substrate, only a tetramer was capable of catalyzing the concerted integration of two viral ends into a target DNA (33). Analyses of in vitro assembled HIV IN synaptic complexes containing viral and target DNA substrates, also indicate that concerted integration is catalyzed by an IN tetramer (34).
Models for IN dimers (Fig. 1C) and tetramers have been derived from consolidation of the crystal structures of two-domain protein fragments and other studies (13, 35-37). However, their validity has been difficult to evaluate, as experimental knowledge has been lacking concerning the disposition of the three domains with respect to each other in the intact full-length monomer or multimers. The crystal structure of full-length IN from the human prototype foamy virus (PFV) was recently solved in complex with viral DNA (38), revealing two different dimer interfaces for IN within the "intasome" structures. Here we report the use of small angle X-ray scattering (SAXS) and biochemical cross-linking analyses to determine the architecture of full-length ASV IN monomers and dimers in the absence of DNA substrates. Our results show that the solution dimer interface is distinct from that previously proposed in models derived from two-domain crystal structures, and more closely resembles that in the DNA-binding "inner" dimer in the PFV intasome (38,39). Analyses of the ASV apo-IN architecture described herein highlight key domain interactions and the specific conformational changes that may occur upon DNA substrate binding.

EXPERIMENTAL PROCEDURES
General methods are provided in the supplemental materials.
Light scattering analysis. Measurements were made with a Protein Solutions DynaPro Temperature Controlled Microsampler. Samples were adjusted to the desired concentration, and particulates were removed by filtration through a 0.2-µm Microcon device followed by brief centrifugation at 14,000 x g and 4 °C. The protein concentration was then determined directly from the absorbance at 280 nm and a calculated molar extinction coefficient, taking the average of 3 readings.
All samples were analyzed under conditions of 10 °C in a buffer of 25 mM BisTris pH 6.1, 500 mM NaCl, 1 mM DTT, 0.1 mM EDTA, 5% glycerol. The apparent molecular mass (MW-I) was calculated from the static light scattering measurements (at least 300 acquisitions per protein sample) using the DynaPro software.
SAXS and ab initio shape modeling methods. X-ray scattering experiments were performed at the Advanced Photon Source at Argonne National Labs, 5ID-D beamline. Data were collected at 10 keV (1.24 Å) with the SAXS detector at a distance of 2.584 m and simultaneous WAXS detector at 291 cm, which produced an accessible q-range of 0.005 to 1.8 Å⁻¹ (q = 4π sin θ/λ, where 2θ is the scattering angle).
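The relations quoted above (photon energy to wavelength, and scattering angle to q) can be illustrated with a minimal sketch. The function names and the rounded constant hc ≈ 12.398 keV·Å are assumptions introduced here for illustration; they are not taken from the beamline software.

```python
import math

# h*c in keV*Angstrom (rounded); an assumed constant for this illustration.
HC_KEV_ANGSTROM = 12.398


def wavelength_from_energy(energy_kev):
    """Return the X-ray wavelength in Angstrom for a photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def momentum_transfer(two_theta_deg, wavelength):
    """Return q = 4*pi*sin(theta)/lambda, where 2*theta is the scattering angle."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength


def scattering_angle_deg(q, wavelength):
    """Invert the relation above: return 2*theta in degrees for a given q."""
    return 2.0 * math.degrees(math.asin(q * wavelength / (4.0 * math.pi)))


wavelength = wavelength_from_energy(10.0)
print(f"wavelength at 10 keV: {wavelength:.2f} Angstrom")
print(f"2-theta at q = 1.8 per Angstrom: {scattering_angle_deg(1.8, wavelength):.1f} degrees")
```

The first printed value reproduces the 1.24 Å quoted for 10 keV; the second shows the scattering angle that corresponds to the upper end of the stated q-range.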
To minimize protein damage, four 10 second exposures were typically taken at 10 °C with sample flowing at 4 μL/sec using a 0.3 x 0.3 mm² collimated X-ray beam. Exactly matched dialysates were sampled under the same conditions to subtract from protein samples, which were tested in the range of 0.8 to 3 mg/ml in the same buffer conditions used for light scattering. Samples were filtered and cleared by centrifugation at 16,000 x g just prior to placement in the sampler. Combining of SAXS and WAXS data and subsequent data reduction were as published previously (40).
Initial Rg estimates were made at APS by a linear fit of a typical Guinier plot in the q range of 0.5 to 1.2/Rg. Subsequent analysis with the Irena software developed at APS (41) used data in the broader q range of 0.01 to 0.4 to determine Rg and I(0), and to calculate a pair distance distribution function (P(r)) and Dmax, either by Fourier transform by the method of Moore or by conventional regularization. Goodness of fit was assessed with the reduced χ² parameter. In all cases, equivalent results were obtained by regularization with the program GNOM (42). A dilution series was performed with the wild type ASV IN protein and, using the ATSAS program PRIMUS (43), these data were extrapolated to infinite dilution.
The results (not included) showed no significant variation in Rg, confirming the absence of any concentration-dependent effects. Accordingly, our data analyses assume a single component ideal behavior under the concentrations tested. Subsequent ab initio shape modeling was performed with both DAMMIN and GASBOR programs (43), with and without P2 symmetry when appropriate. In each case, several qmax cutoff values were sampled in the range of 0.3 to 0.9, with the standard final processing using a qmax of 0.4. These produced dummy atom output files which were then used to generate the final envelopes with the Situs software (44). To test for uniqueness, 10 shape reconstructions were performed with the program GASBOR for wild type ASV IN dimers and monomers (supplemental Fig. S1). The results showed a high degree of stability and convergence in the shape modeling for both monomer (NSDs of 1.0 to 1.19) and dimer (NSDs of 1.0 to 1.34) envelopes, with ranges comparable to those reported in other SAXS analyses (45).
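The reduced χ² used above to assess goodness of fit can be sketched as follows. This is a generic illustration, not the Irena or GNOM implementation; the function name and the numerical values are hypothetical.

```python
def reduced_chi_square(observed, model, sigma, n_params):
    """Return sum(((obs - model) / sigma)**2) divided by the degrees of freedom."""
    if not (len(observed) == len(model) == len(sigma)):
        raise ValueError("observed, model and sigma must have the same length")
    degrees_of_freedom = len(observed) - n_params
    if degrees_of_freedom <= 0:
        raise ValueError("need more data points than fitted parameters")
    chi_square = sum(((o - m) / s) ** 2 for o, m, s in zip(observed, model, sigma))
    return chi_square / degrees_of_freedom


# Hypothetical intensities, model values and uncertainties (arbitrary units).
observed = [10.2, 8.1, 6.3, 4.9, 4.1]
model = [10.0, 8.0, 6.5, 5.0, 4.0]
sigma = [0.2, 0.2, 0.2, 0.2, 0.2]

value = reduced_chi_square(observed, model, sigma, n_params=2)
print(f"reduced chi-square: {value:.3f}")
```

A value near 1 indicates that the residuals are comparable to the stated uncertainties; much larger values suggest a poor fit or underestimated errors.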
Protein cross-linking and in-gel digestion. A 1:1 mixture of unlabeled and isotopically labeled ASV IN proteins (6.5 μM each) was equilibrated overnight (46) and dialyzed in 20 mM HEPES (pH 7.8), 0.5 M NaCl, 2 mM DTT, 10% glycerol. Freshly prepared BS3 (Pierce) homo-bifunctional cross-linker was used at increasing concentrations. After addition of the cross-linker, the reaction was allowed to continue at 37 °C for 5 min, and then quenched by addition of 20 μl of 2 M glycine and left on ice for 30 min. The reactants were precipitated with acetone and resuspended in 20 mM HEPES (pH 7.8), 0.5 M NaCl, 2 mM DTT, 10% glycerol. The products were separated on a denaturing NuPAGE 4-12% Bis-Tris gel using MES running buffer and stained with Coomassie blue. We confirmed that sample recovery was unaffected by acetone precipitation. Furthermore, treatment with a different reagent, EDC/NHS (Pierce), produced a similar distribution of cross-linked products (data not shown). Monomer, dimer, and tetramer bands from a reaction in which the molar ratio of protein to BS3 was 1:20 were excised and destained (50% MeOH, 5% HOAc in water) overnight, after which they were dehydrated completely using 100% acetonitrile. Reduction and alkylation were performed by adding 20 mM dithiothreitol (DTT) and 50 mM iodoacetamide (IAA). After a second dehydration, gel bands were rehydrated at 4 °C for 45 min in trypsin solution (10 ng/µl Promega sequencing grade modified trypsin, 10 mM NH4HCO3, 10% acetonitrile). Proteins were digested overnight at 34 °C.
Mass spectrometry and database searching. The digested samples were acidified with 0.3% formic acid before being injected into a QSTAR LC/MS/MS instrument (Applied Biosystems/MDS Sciex, Foster City, CA). An Agilent nano-HPLC (Agilent, Wilmington, DE) was interfaced with the Q-TOF mass spectrometer. Samples were automatically loaded onto a C-18 trap column (ZORBAX 300SB-C18, 0.3 x 5 mm, 5 µm) and then eluted to a reversed-phase C-18 analytical column (ZORBAX 300SB-C18, 100 µm x 150 mm, 3.5 µm). A typical HPLC gradient for the tryptic mixture of peptides was 5-80% organic solvent over a period of about 85 min, followed by 80-100% organic solvent for the next 15 min and 100-5% in the last 15 min. The 300 nl/min flow from the column elution was sprayed through a coated emitter (FS360-50-5-CE, New Objective Inc., Woburn, CA) into the mass spectrometer with a set voltage of +2.5 kV. The system was equilibrated for 15 min at the end of the gradient. The acquisition method of the QSTAR was set at a 2 s TOFMS "survey" scan followed by three MS/MS scans (3 s, 4 s, and 5 s, respectively). Parent ions with charge state of +2 and +3 or intensity above 15 counts were fragmented. The mass range was 400 to 1000 amu for the "survey" scan and 100 to 2000 amu for the MS/MS scans.
The MS wiff files were processed into MGF files using Mascot Distiller with default parameters. Data were searched with MassMatrix PC suite 1.1.3 program (47), and search parameters were: MS accuracy, 10 ppm; MS/MS accuracy, 0.8 Da (at this level of search stringency, no peptide adducts were identified that are inconsistent with the reaching dimer); enzyme, trypsin; specificity, fully tryptic; allowed number of missed cleavages, four; fixed modifications, carbamidomethylation on cysteine.
Further allowed variable modifications were K+8 for lysine; R+10 for arginine; oxidation of methionine, tryptophan, and histidine; deamidation of asparagine and glutamine. End products of BS3 mono cross-linked adducts with lysine and N-termini were allowed with water or glycine. Results of the cross-linked peptides were also manually validated using the GPMAW program (48).

Light scattering analyses reveal homogeneous ASV IN dimers.
Static light scattering provided a direct measure of the absolute molecular mass (MW-I) of our proteins and protein complexes in solution. The molecular uniformity of these preparations in the concentration range appropriate for SAXS analysis, 1 to 4 mg/ml (32 µM to 128 µM), was also evaluated by use of dynamic light scattering.
As summarized in Table 1, we obtained an apparent molecular mass (MW-I) of 69 kDa for wild type IN, only slightly in excess of the calculated mass of a dimer, 64 kDa. This difference could reflect the presence of a minor amount of higher order multimers in the preparation. However, the values calculated from static light scattering can also differ somewhat from the theoretical due to the dynamic exchange of subunits in multimeric complexes (46). Enzymatic activity assays confirmed that this wild type protein preparation catalyzes single-end cutting and joining of viral DNA as well as concerted integration (Table 1). Among the other ASV IN derivatives prepared and analyzed, several contain an F199K substitution (Fig. 1B and C). Structural alignments show that residue F199 in ASV IN is adjacent to the position of F185 in HIV-1 IN (1) and, as with HIV, its replacement enhances protein solubility, a feature that was required for successful crystallization of the respective two-domain IN fragments of ASV and HIV-1 IN. To examine the effects of this substitution on ASV IN multimerization we analyzed full-length derivatives containing the F199K substitution alone, or in combination with other substitutions. The MW-I values observed, 71 and 72 kDa, were not appreciably different from the value for wild type IN, indicating that these preparations also contained primarily dimers under the conditions tested. These results are noteworthy, as F199 lies at the core-core interface in crystals of the isolated core domain or the core+CTD of ASV IN, and substitution of this large hydrophobic side chain is predicted to reduce the stability of this interface, as illustrated with HIV-1 IN (49). While our data (Table 1) show that the F199K substitution in full-length ASV IN does not compromise either dimerization or single-end cutting and joining of viral DNA, a role in formation of higher order IN complexes (i.e. a tetramer) is likely, as the F199K derivative is unable to catalyze concerted integration.
The importance of the ASV CTD for IN multimerization is illustrated by comparison of the molecular mass of IN fragments in which either the NTD or CTD is absent. The MW-I of the ASV IN fragment that lacks the NTD is 54 kDa, exactly twice the mass calculated from the amino acid sequence of the respective monomer, 27 kDa. In contrast, under comparable conditions the MW-I of ASV IN (1-207), which lacks the CTD, is 28 kDa, a value close to the calculated monomer mass of 23 kDa. These light scattering data are consistent with previously published results from size exclusion chromatography of these same IN fragments (50).
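The reasoning above, in which the apparent mass is compared with the calculated monomer mass, amounts to a simple ratio. The sketch below uses only the masses quoted in the text (the wild type monomer mass of 32 kDa is taken as half of the calculated 64 kDa dimer); the function and dictionary names are hypothetical.

```python
def oligomer_estimate(observed_kda, monomer_kda):
    """Return the observed/monomer mass ratio and the nearest whole subunit count."""
    ratio = observed_kda / monomer_kda
    return ratio, round(ratio)


# (apparent mass from light scattering, calculated monomer mass), both in kDa.
samples = {
    "wild type IN": (69.0, 32.0),
    "IN lacking the NTD": (54.0, 27.0),
    "IN (1-207), lacking the CTD": (28.0, 23.0),
}

for name, (observed, monomer) in samples.items():
    ratio, subunits = oligomer_estimate(observed, monomer)
    print(f"{name}: ratio {ratio:.2f}, nearest whole number of subunits {subunits}")
```

Rounding to the nearest integer is a deliberate simplification; as noted in the text, minor populations of higher order multimers or subunit exchange can shift the apparent mass away from an exact multiple.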

Shapes and lengths of IN proteins in solution determined by small angle X-ray scattering analysis (SAXS).
SAXS analyses provide a rotationally-averaged version of the scattering of a single particle, from which size and shape can be determined. Certain features can be established unambiguously: the radius of gyration (Rg) and the longest dimension of the particle (Dmax). As verification of our methods, we performed SAXS on a preparation of the two-domain fragment lacking the NTD, ASV IN (49-286) F199K, and compared the results with the shape and size determined from the published crystal structure of the same fragment (13). Figure 2 shows the scattering data and the P(r) function for this fragment, from which we determined the Dmax to be 75 Å, close to the maximum of 81 Å calculated from the coordinates of the crystal structure. A low resolution shape of the dimer was derived from the SAXS data (51). Computational methods were also used in reverse, to calculate the expected scattering and P(r) function from the published atomic coordinates of the dimer. As shown in Figure 2, these results were nearly superimposable on the experimental data, and the SAXS-derived envelope was found to accommodate the atomic model of the crystal dimer neatly within its borders (Fig. 2B, right).
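How Dmax, Rg, and a P(r)-like histogram follow from a set of coordinates can be illustrated with a simplified sketch. It treats every point as an equal scatterer, whereas a real calculation weights atoms by their scattering power and accounts for solvent; the four collinear points are hypothetical and do not come from any IN structure.

```python
import math
from itertools import combinations


def pair_distance_histogram(coords, bin_width=5.0):
    """Return (counts per distance bin, largest pairwise distance)."""
    distances = [math.dist(a, b) for a, b in combinations(coords, 2)]
    d_max = max(distances)
    counts = [0] * (int(d_max // bin_width) + 1)
    for distance in distances:
        counts[int(distance // bin_width)] += 1
    return counts, d_max


def radius_of_gyration(coords):
    """Return the root-mean-square distance of the points from their centroid."""
    n_points = len(coords)
    centroid = [sum(point[axis] for point in coords) / n_points for axis in range(3)]
    mean_square = sum(math.dist(point, centroid) ** 2 for point in coords) / n_points
    return math.sqrt(mean_square)


# Hypothetical toy "particle": four points spaced 10 Angstrom apart on a line.
toy_coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (30.0, 0.0, 0.0)]

histogram, d_max = pair_distance_histogram(toy_coords)
rg = radius_of_gyration(toy_coords)
print(f"Dmax = {d_max:.1f} Angstrom, Rg = {rg:.2f} Angstrom")
print("pair-distance counts per 5 Angstrom bin:", histogram)
```

An elongated particle such as this one places pair distances far from the origin of the histogram, which is the qualitative signature used in the text to distinguish extended from compact arrangements.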
SAXS was then applied to the full-length wild type ASV IN protein, which is a homogeneous dimer at the relevant concentrations (Table 1). From the results in Figure 3A, a Dmax of 109.4 Å was established for this dimer. Figure 3B shows a plot of the scattering intensity, I(q), versus q² (a "Guinier plot") for this protein; an Rg of ~32.8 Å was calculated from the slope of a linear fit of these data in the q·Rg < 1.2 region. A similar value (Rg = 33.1 ± 0.6 Å) was obtained from a nonlinear regression fitting (41). The linearity of the data at low angles verifies that the preparation was free of aggregates. A SAXS-derived envelope for the ASV IN dimer is shown in Figure 3C (see also supplemental Fig. S1A).
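The Guinier analysis described above rests on the approximation I(q) ≈ I(0)·exp(−q²Rg²/3), so that ln I(q) plotted against q² is linear with slope −Rg²/3. The sketch below fits synthetic, noise-free data generated with Rg = 32.8 Å; it is not the Irena or APS procedure, and the function names and iteration scheme are assumptions made for illustration.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n_points = len(xs)
    mean_x = sum(xs) / n_points
    mean_y = sum(ys) / n_points
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def guinier_fit(q_values, intensities, q_rg_limit=1.2, max_iter=20):
    """Fit ln I versus q^2, restricting the range to q*Rg < q_rg_limit iteratively."""
    q_max = max(q_values)
    rg = i_zero = None
    for _ in range(max_iter):
        points = [(q * q, math.log(i)) for q, i in zip(q_values, intensities) if q <= q_max]
        slope, intercept = linear_fit(*zip(*points))
        rg = math.sqrt(-3.0 * slope)      # slope = -Rg^2 / 3
        i_zero = math.exp(intercept)      # intercept = ln I(0)
        new_q_max = q_rg_limit / rg
        if abs(new_q_max - q_max) < 1e-9:
            break
        q_max = new_q_max
    return rg, i_zero


# Synthetic data obeying the Guinier law exactly (assumed Rg = 32.8, I(0) = 1).
true_rg = 32.8
q_values = [0.005 + 0.001 * i for i in range(60)]
intensities = [math.exp(-((q * true_rg) ** 2) / 3.0) for q in q_values]

rg_fit, i_zero_fit = guinier_fit(q_values, intensities)
print(f"fitted Rg = {rg_fit:.2f} Angstrom, I(0) = {i_zero_fit:.3f}")
```

With real data the low-q points would carry noise and, if aggregates were present, would curve upward, which is why linearity in this region is taken in the text as evidence of a clean preparation.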
The SAXS parameters obtained for full-length ASV IN and several other IN derivatives are summarized in Table 2. We note that, as with light scattering (Table 1), data obtained with the IN fragment that lacks the CTD (IN 1-207) are as expected for a monomer, confirming that important determinants of dimerization reside in the CTD of ASV IN. Therefore, while core:core interactions can facilitate dimerization of the isolated catalytic core domain under crystallization conditions (9), under our conditions these interactions are not sufficient to allow dimerization of a protein that lacks the CTD. Furthermore, because a full-length derivative with an alanine substitution for residue W259 in the CTD also displays the parameters of a monomer in solution (Table 2), we conclude that this tryptophan residue plays a key role in the dimerization interface of full-length ASV IN in solution.
The SAXS-determined shape of monomeric IN establishes constraints for the relative arrangement of the three domains. In order to determine how the subunits and their respective domains could be arranged within the experimentally determined IN dimer envelope, we performed SAXS analysis on a full-length ASV IN derivative that includes the W259A substitution. This protein contained three additional substitutions (C23S/C125S/F199K) that improve solubility but have no effect on single-end cutting or joining activity (data not shown). The data obtained with this monomer (Fig. 4A), and its predicted elongated shape (Fig. 4D; supplemental Fig. S1B), are consistent with a structure containing the IN core domain (at the base in the figure) and the two smaller terminal domains, one close and one distal to the core. To determine if the distal domain corresponds to the NTD or CTD, we produced a chimeric protein in which thioredoxin (trxA) is fused to the NTD of the W259A derivative (trxA-IN-W259A). The SAXS parameters for this derivative are summarized in Table 2, and the scattering data are shown in Figure 4B. The envelope derived for the chimeric protein is considerably longer than that of the monomer lacking the N-terminal trxA domain, consistent with a distal placement for the NTD (Fig. 4D). Furthermore, the theoretical curve for a structure in which the CTD of this derivative is the distal domain produces parameters and an envelope that are inconsistent with the data in Figure 4B (Fig. 4C). We conclude, therefore, that the distal domain in the ASV IN monomer is the NTD. A provisional model consistent with this conclusion is shown to the right of the trxA-ASV-IN envelope in Figure 4D.
This model is also supported by results from our SAXS analysis of wild type PFV IN, which contains a natural N-terminal extension called NED, and is a monomer under the conditions of analysis; like the chimeric protein, the PFV monomer is longer than the ASV IN monomer (Table 2) and has a shape consistent with an NTD extension (envelope not shown). Figure 4E shows how two monomer envelopes of ASV IN might fit within the dimeric envelope of wild type ASV IN with the approximate positions of each domain noted. We call this arrangement a "reaching dimer."
Strategy for identifying amino acid proximities in the IN monomer and multimers. To identify regions of proximity within a dimer, it is necessary to be able to distinguish the two subunits. To do so, we prepared wild type ASV IN protein that was isotopically labeled with 13C and 15N in lysine and arginine residues (supplemental Fig. S2). The doubly-labeled IN was then equilibrated with an equal amount of unlabeled IN using conditions described by Kessl et al. (46). After equilibration, half of the dimers are expected to be "mixed dimers," containing one labeled and one unlabeled monomer, and the remainder either fully labeled or fully unlabeled. The mixture was then treated with bis(sulphosuccinimidyl) suberate (BS3), a reagent that forms covalent cross-links between primary amines, in lysine side chains and at protein N-termini, that lie within 11.4 Å of each other. Samples were then subjected to electrophoresis in a denaturing polyacrylamide gel to determine the optimal concentration of BS3 (Fig. 5A). Cross-linked monomer, dimer, and tetramer bands were excised from the 1:20 lane, and the proteins eluted for identification of intra- and inter-subunit cross-links, respectively, using trypsin digestion followed by mass spectrometry (see examples in supplemental Fig. S3).
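The expectation that half of the dimers are mixed follows from random pairing of subunits. A minimal sketch, assuming complete and unbiased subunit exchange at equilibrium (the function name is hypothetical):

```python
def dimer_populations(labeled_fraction):
    """Return (labeled-labeled, mixed, unlabeled-unlabeled) dimer fractions
    for random pairing of subunits, given the fraction of labeled monomers."""
    f = labeled_fraction
    return f * f, 2.0 * f * (1.0 - f), (1.0 - f) * (1.0 - f)


both_labeled, mixed, both_unlabeled = dimer_populations(0.5)
print(f"labeled-labeled: {both_labeled:.2f}")
print(f"mixed:           {mixed:.2f}")
print(f"unlabeled:       {both_unlabeled:.2f}")
```

For the 1:1 mixture used here the mixed fraction is at its maximum; any other labeling ratio would reduce the yield of informative mixed dimers.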
Proximities determined from analysis of cross-linked monomeric IN. MS/MS analysis of protein excised from the monomer band, which contained both labeled and unlabeled IN protein, showed extensive intra-protein cross-linking (Fig. 5B, supplemental Table S1). However, no peptides corresponding to chemical cross-linking between the labeled and unlabeled IN proteins were detected in the isolated monomers. Demonstrating the uniform accessibility of side chains by this methodology, 15 of the total 20 surface accessible lysine residues in all three domains of IN, as well as the N-termini, were found to be mono-modified by BS3, with dead ends comprising glycine or water (supplemental Table S1). As summarized in Figure 5B, 5 of the 10 lysine residues in the CTD are within ~11 Å of lysines in the NTD and the core; CTD tail residue K278 was cross-linked to NTD G1, and residues in the core domain, K116 and K191, were cross-linked to CTD K264. In the CTD linker region, residues K211 and K225 were cross-linked to K266 and K272, respectively (Fig. 5B, supplemental Table S1), consistent with the SH3-like fold of the CTD.
A monomer structure of IN that satisfies the observed cross-link constraints would place the NTD close to the C-terminal tail region of the CTD. In addition, the observed cross-links between lysine residues in the CTD and those in the NTD and core domains place the CTD in a position proximal to both. A structure consistent with all of the cross-linking data (Fig. 5B, right) has an extended NTD, which points away from the core domain, and a CTD in the cleft between the NTD-core-linker region. This independently derived arrangement is consistent with the SAXS data summarized in Figure 4.
Identification of inter-subunit proximities in the IN dimer. Protein excised from the cross-linked dimer band was then analyzed. In this sample, inter-subunit proximities in mixed dimers can be identified unambiguously by mass spectrometry owing to the hybrid mass of cross-linked peptides. Results from analysis of such peptides revealed an extensive network of interactions with a total of 21 cross-links between lysine residues in all domains of both subunits (Fig. 5C, supplemental Table S2). For example, NTD residue K6 in the unlabeled IN monomer cross-linked with core domain K116 in the labeled IN, and NTD K21 in the unlabeled subunit formed cross-links with K166 in the core and the CTD K264. In addition, the amino group of the N-terminal glycine in the labeled IN subunit formed cross-links with core domain residues K116 and K164, and CTD residues K264, K266 and K278 near the base of the tail in the unlabeled subunit. Reciprocal cross-links were identified between the core domain residue K164 in the unlabeled subunit and the N-terminal G1 and CTD residue K264 in the labeled subunit. At least 5 lysine residues in the CTD of the unlabeled IN were found to cross-link with 6 residues in the labeled IN; the identified cross-link adduct pairs were: K211:K264,  Table S2). The failure to identify completely reciprocal adducts could be due to incomplete detection, or to minor asymmetry of interactions between the dimer interfaces, perhaps reflecting some flexibility in the subunit domains. With greater than 95% sequence coverage, we favor the latter interpretation.
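The "hybrid mass" argument can be made concrete with nominal label shifts. The sketch below uses the K+8 and R+10 shifts listed in the search parameters; the two peptide sequences are hypothetical and are not taken from ASV IN, and the function names are introduced here only for illustration.

```python
# Nominal mass shifts (Da) per labeled residue, as in the search parameters.
K_SHIFT = 8
R_SHIFT = 10


def label_shift(sequence):
    """Nominal mass added to a peptide when all its K and R residues are heavy."""
    return K_SHIFT * sequence.count("K") + R_SHIFT * sequence.count("R")


def crosslink_label_shifts(peptide_a, peptide_b):
    """Nominal shift of a cross-linked peptide pair for each labeling state."""
    shift_a = label_shift(peptide_a)
    shift_b = label_shift(peptide_b)
    return {
        "light-light": 0,
        "heavy-light": shift_a,
        "light-heavy": shift_b,
        "heavy-heavy": shift_a + shift_b,
    }


# Hypothetical tryptic peptides, each with an internal (cross-linked) lysine.
shifts = crosslink_label_shifts("ALKDER", "TKVSK")
for state, shift in shifts.items():
    print(f"{state}: +{shift} Da")
```

A cross-link within one subunit can only produce the light-light or heavy-heavy masses, so observing either intermediate value identifies the pair as spanning a labeled and an unlabeled subunit.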
The proximity data obtained from our analyses of cross-linked IN monomers and dimers support a dimer model that includes the following notable features: a) In the dimer interface, CTD domains from each monomer come into close enough contact (i.e. ≤11 Å) to form the following cross-links: K264:K211, K264:K264, K264:K266, and others not included in Figure 5C (see supplemental Table S2). b) No cross-links between the two core domains were detected in the dimer. Consequently, the position of this domain in each subunit is sufficiently remote to exclude such interaction. As no cross-links were observed between NTDs in the mixed dimers, a similar constraint applies to this domain. c) The NTD from one subunit is sufficiently close to the core domain and CTD of the other subunit to permit the following cross-link interactions between the subunits: G1:K116, G1:K164, G1:K264, K116:K6, K166:K21, and K264:K21. Additional experiments with the zero-length protein cross-linker EDC confirmed NTD:core proximities (data not shown). The features delineated above are uniformly inconsistent with the core:core dimer model proposed from the two-domain crystal structures (14). The full-length dimer deduced from our results is stabilized by CTD:CTD interactions between both subunits and by interactions of the NTD of one IN subunit with the core domain and the CTD of the second subunit.

Identification of core-core interactions in the IN tetramer. MS/MS analysis of protein from the IN tetramer band (Fig. 5) revealed core:core cross-links in addition to the novel cross-links identified in protein from the dimer band (supplemental Table S3). Cross-links were observed with 5 of the 7 lysine residues in this domain, of which reciprocal adducts of K164:K184 were observed between the labeled and unlabeled subunits (Fig. 6A). The remaining cross-links between these subunits were: K116:K166, K119:K164, K129:K116, K164:K116 and K211:K164. These interactions are consistent with the interface observed in crystals of the isolated ASV IN core domain and those of the core+CTD two-domain fragment (3,13).
As illustrated in Figure 6B, reciprocal interactions in the core:core dimer interface of ASV IN are mediated predominantly by side chains from alpha helices 1 and 5; potential electrostatic interactions between R114 and E200, and H103 and E187 are highlighted.
To investigate the functional importance of these interactions, we made charge-reversing single substitutions, E187K and H103D, and a compensatory double substitution, E187K/H103D. Comparison of the single-end processing activity as a function of time showed no significant differences; each derivative was capable of -2 cleavage (Fig. 6C). In contrast, an assay for concerted integration activity showed that the protein with a single substitution, E187K, is defective in this reaction (Fig. 6D; lanes 5 and 6). However, this function is restored in the derivative with the compensatory substitutions, which exhibits activity similar to that of the wild type protein (Fig. 6D, lanes 7 and 8). We conclude that stability of a core:core interface, which is detected only in the cross-linked tetramers, is required for concerted integration but not single-end processing.
The ASV IN solution dimer derived from data-driven docking. To gain more detailed insight into the architecture of the IN dimer, we employed the HADDOCK 2.0 docking program (52) with distance constraints established by our cross-linking data (Fig. 5C). These data-driven runs were performed on superimposed monomers constructed from the coordinates of the ASV core+CTD crystal structure (1C1A) and HIV core+NTD crystal structure (1K6Y), maintaining a minimum distance of 2.5 Å to a maximum reach of 11 Å between the defined cross-linked lysines across both monomers.
Iterative runs were performed until the Rg values of the docked structures closely approximated that of our experimentally determined SAXS envelope (supplemental Fig. S4). Rigid body fitting of these models within the SAXS envelope was performed by steepest descent local optimization, which converges on an orientation that minimizes the number of atoms lying outside the envelope. The resulting minimized symmetrical arrangement, shown in Figure 7A, is stabilized by face-to-face hydrophobic interaction between W259 from each of the monomers (see also supplemental movie 1). Figure 7B shows the P(r) function derived from our SAXS analysis of the wild type ASV IN dimer and theoretical curves calculated from the core-stabilized dimer model (Fig. 1C) and the reaching dimer in Figure 7A. This comparison shows that the core-stabilized dimer model possesses a significantly shorter Dmax, and is less elongated and more spherical in shape than that deduced from our experimental SAXS data. The theoretical curve for the reaching dimer matches the experimental data more closely, and shows the same Dmax. These results, together with the observation that the IN W259A derivative behaves as a monomer, are consistent with a subunit arrangement in which the CTDs, rather than the core domains, play a critical role in dimerization. Figure 7C shows a close-up view of potential stacking interactions between W259 residues in the CTDs of the reaching dimer.
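The way cross-link data act as distance constraints on a docked model can be sketched as a simple check of inter-subunit distances against the 2.5 to 11 Å window used for docking. This is not the HADDOCK restraint format or scoring function; the coordinates are hypothetical placeholders, and the residue pairs are used only as labels.

```python
import math

MIN_DISTANCE = 2.5   # Angstrom, lower bound used for the docking restraints
MAX_DISTANCE = 11.0  # Angstrom, upper bound (reach of the cross-linker)


def check_restraints(coords_a, coords_b, pairs):
    """For each (residue in subunit A, residue in subunit B) pair, return the
    distance and whether it falls inside the allowed window."""
    results = {}
    for res_a, res_b in pairs:
        distance = math.dist(coords_a[res_a], coords_b[res_b])
        results[(res_a, res_b)] = (distance, MIN_DISTANCE <= distance <= MAX_DISTANCE)
    return results


# Hypothetical side-chain positions (Angstrom) in a candidate docked dimer.
subunit_a = {"K6": (0.0, 0.0, 0.0), "K264": (20.0, 0.0, 0.0)}
subunit_b = {"K116": (6.0, 8.0, 0.0), "K264": (20.0, 0.0, 15.0)}
cross_links = [("K6", "K116"), ("K264", "K264")]

report = check_restraints(subunit_a, subunit_b, cross_links)
for (res_a, res_b), (distance, satisfied) in report.items():
    status = "satisfied" if satisfied else "violated"
    print(f"{res_a}:{res_b} distance {distance:.1f} Angstrom, {status}")
```

In this toy arrangement the first restraint is satisfied and the second is violated, which is the kind of outcome that would prompt a further docking iteration.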
A reaching dimer model for HIV-1 IN. Although sequence identity between ASV and HIV IN proteins is less than 20%, they have very similar domain structures (1). Consequently, we constructed a reaching dimer model for HIV IN, based on the ASV IN dimer, to uncover any conserved features and evaluate the correlation with previous mutagenesis data. A comparison of the two reaching dimers shows that the CTD interfaces of both can be stabilized by face-to-face interactions between aromatic residues: W259 residues as described above for ASV IN, and W243 residues for HIV-1 (Fig. 7C and 7D). As noted in Table 1 Further inspection of the reaching dimer interfaces of ASV IN and HIV-1 IN reveals a network of potential hydrogen bonds between the NTD of one monomer and both of the linkers and the CTD in the second monomer (summarized in supplemental Fig. S5). With ASV IN, we have observed that interruption of the proposed hydrogen bonding between N24 and R53 in the linker region (supplemental Fig. S5A), by replacement of the latter residue with alanine, resulted in loss of single-end joining activity (data not shown). Potential interactions between the CTD from one monomer and the CTD and NTD in the second monomer include buried hydrophobic interactions involving W259 in ASV IN and W243 in HIV IN. In the ASV IN structure, side chains from residues 244-246 can stabilize the dimer interface further through formation of inter-molecular hydrogen bonds between the two tyrosine side chains (supplemental Fig. S5A). We have observed that substitution of alanine for Y246 results in a 50% decrease in single-end joining activity (data not shown). These results are consistent with a role for such hydrogen bonds in the architecture of a functional dimer. The potential hydrogen bond interactions in the proposed reaching dimer interface of HIV IN (supplemental Fig. S5B) include residues that are highly conserved in the HIV genome, with less than 1% variance observed in the genomes of viruses isolated from 488 inhibitor-naïve patients in a recent study (54). Such conservation would be expected from stringent evolutionary pressure for assembly of a functional form of the apoenzyme.

DISCUSSION
Here we describe the use of two complementary approaches to elucidate the architecture of both monomeric and multimeric forms of a full-length retroviral IN (45). Although of relatively low resolution, SAXS analysis of wild type IN and IN derivatives provided valuable insight into the length, shape, and domain organization of full-length monomers and dimers. Protein cross-linking, which tethers all dynamically involved lysines separated by ≤11 Å, coupled with mass spectrometry, provided independent constraints for docking within the SAXS-derived envelopes. After equilibrating an equal mixture of unlabeled and labeled IN proteins, inter-molecular cross-links could be identified unambiguously by the isolation of adducts with hybrid mass. As no hybrid adducts were observed in our analyses of cross-linked monomers isolated from the mixture, we conclude that the native structure was conserved within the cross-linked proteins.
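The logic of the hybrid-mass criterion can be sketched as follows. A cross-link within one subunit joins two peptides that are both unlabeled or both labeled, whereas a cross-link between subunits in a mixed population can also join one unlabeled and one labeled peptide. The sketch below is illustrative only: the function name, the mass tolerance, and all masses (including the linker mass) are hypothetical values, not data from this study.

```python
def classify_adduct(observed_mass, peptide_a, peptide_b, linker_mass, tol=0.01):
    """Classify a cross-linked peptide pair by its labeling pattern.

    peptide_a, peptide_b: (unlabeled_mass, labeled_mass) tuples.
    Returns "unlabeled-unlabeled", "labeled-labeled", "hybrid", or None
    when the observed mass matches none of the expected combinations.
    """
    expected = {
        "unlabeled-unlabeled": [peptide_a[0] + peptide_b[0] + linker_mass],
        "labeled-labeled": [peptide_a[1] + peptide_b[1] + linker_mass],
        # A hybrid adduct pairs one unlabeled with one labeled peptide,
        # which is only possible when the link spans two molecules.
        "hybrid": [
            peptide_a[0] + peptide_b[1] + linker_mass,
            peptide_a[1] + peptide_b[0] + linker_mass,
        ],
    }
    for label, masses in expected.items():
        if any(abs(observed_mass - m) <= tol for m in masses):
            return label
    return None


# Hypothetical masses chosen only to exercise the function.
pep_a = (1000.0, 1012.0)
pep_b = (800.0, 809.0)
linker = 138.0

for mass in (1938.0, 1947.0, 1950.0, 1959.0, 1945.0):
    print(mass, classify_adduct(mass, pep_a, pep_b, linker))
```

Under these assumptions, observing only the unlabeled-unlabeled and labeled-labeled species for a given peptide pair is what would be expected of an intra-molecular link, while the appearance of hybrid species indicates a link between subunits.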
In the IN monomers, the CTD was found to cross-link with the core and the NTD, and the NTD with the CTD "tail" (residues 270-289). A model for the full length IN monomer structure that combines our SAXS and cross-linking data (Fig. 5B) shows the core and NTDs at distal poles and the NTD in close proximity to the extended tail of the centrally located CTD. The solution dimer interface revealed in our studies is noteworthy for the absence of any core:core domain interactions, which had previously been thought to stabilize this multimeric form. Analysis of our cross-linked dimers uncovered a cluster of hybrid adducts formed between two IN monomers.
The unanticipated architecture so revealed shows a reciprocal arrangement in which the CTD from one monomer anchors into the CTD of the second monomer, and the NTD from one monomer interacts with the core and CTD of the second monomer. The absence of any core:core and NTD:NTD cross-links between the subunits implies that these domains are distantly separated in the two subunits. A model consistent with results from the SAXS and cross-linking studies places the two core domains at opposite ends, with the association of subunits stabilized by interactions between opposing NTDs and CTDs, which reach out to each other (Fig. 4E and Fig. 7A).
Cross-links corresponding to the core:core interface observed in crystals of the isolated core and two-domain fragments were detected only in full-length ASV IN tetramers (Fig. 6A). Results from our mutational studies suggest that the stability of this interface is required for concerted integration, but it is not essential for catalysis of single-end processing or joining by IN, which can be accomplished by IN dimers (33). Consequently, we conclude that the primary role of the core:core interface is in assembly of a tetrameric synaptic complex, which can catalyze the concerted joining of two 3' viral DNA ends into a target DNA (33,34). This interpretation is supported by previously published effects of other substitutions in this core:core interface (55).
A detailed structural model of the reaching dimer of ASV IN was obtained by combining the observed chemical cross-linking distance constraints with data-driven docking (Fig. 7A and supplemental Fig. S4). In the iterative docking runs, the protein-protein buried surface area increased from 1100 Å² to 2200 Å², while Rg was reduced from 47 to 34 in the final minimum structure. The increase in buried surface area and decrease in Rg imply a structure that is considerably more compact than the sum of the initial docking monomers. In studies with other proteins, values within the range of 1600 ± 400 Å² have been observed for the buried surfaces in complexes that undergo minimal conformational change during their assembly, whereas values in the range of 2000 to 4400 Å² are typical for complexes that undergo large conformational changes during formation (56).
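The comparison against the two published ranges can be written out as a short sketch. It assumes the common definition of buried surface area as the solvent-accessible surface of the two separated partners minus that of the complex; the text does not state how the values were computed, so this definition, the function names, and the accessible-surface inputs are assumptions for illustration.

```python
# Ranges quoted in the text, in square angstroms.
MINIMAL_CHANGE = (1200.0, 2000.0)  # 1600 +/- 400
LARGE_CHANGE = (2000.0, 4400.0)


def buried_surface_area(sasa_a, sasa_b, sasa_complex):
    """Surface buried on complex formation (assumed definition: the sum of
    the accessible surfaces of the free partners minus that of the complex)."""
    return sasa_a + sasa_b - sasa_complex


def classify_interface(buried):
    """Return the quoted categories whose range contains the given value.

    The two ranges share the boundary at 2000, so a value there matches
    both; values below 1200 or above 4400 match neither.
    """
    labels = []
    if MINIMAL_CHANGE[0] <= buried <= MINIMAL_CHANGE[1]:
        labels.append("minimal conformational change")
    if LARGE_CHANGE[0] <= buried <= LARGE_CHANGE[1]:
        labels.append("large conformational change")
    return labels


# The two buried-surface values reported for the docking runs.
for value in (1100.0, 2200.0):
    print(value, classify_interface(value))
```

By this reading, the initial value of 1100 Å² lies below both ranges, and the final value of 2200 Å² falls within the range associated with large conformational changes.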
As the SAXS envelope of the ASV monomer was only observed with an IN W259A derivative, we cannot be definitive about the orientation of the CTD in this structure. We note that our original rationale for substituting W259 stemmed from a comparison of the IN CTD to other SH3-like domains in proteins that bind DNA, such as the Sso7 chromatin binding protein (57). Alignment, modeling and identification of potential DNA binding residues in the ASV IN CTD suggested a likely role for W259 in binding to DNA substrates. It was only after purification and analysis of this protein that we noticed its deficiency in dimerization, implying a potential dual role for this particular residue in both dimerization and DNA binding.
The interface in the reaching dimer model is dominated by aromatic interactions between a cluster of residues in the CTDs, which represent a unique hot spot for the maintenance of dimer stability. Results from our mutational studies  (Table 1, Fig. 7C and D) (53). Amino acids surrounding such hot spots, classified as O-ring residues, are predicted to protect the hydrophobic residues from solvent, and stabilize their interaction via hydrogen bonds and salt bridges (58). Close analysis of reaching dimers of ASV-IN and modeled HIV-IN reveals that the majority of potential variable O-ring residues at the CTD:CTD interfaces in both structures can form hydrogen bonds or electrostatic interactions. The potential for similar interactions can be observed in a number of other retroviral IN proteins.
Previous studies on the mechanism of inhibition by monoclonal antibodies that are specific to the HIV-IN NTD and CTD have uncovered critical epitope residues in these domains (59,60). The inhibitory activity of mAb17 can be explained by distortion of the NTD helix-turn-helix motif via binding to residues 25-35. The epitope of mAb33 includes CTD residues F223, R224, Y226, K224, I267, and I268, and expression of an mAb33 sFv fragment in host cells blocks HIV-1 infectivity prior to the integration step (61). Our HIV-1 IN model predicts that binding of either one of these antibodies would interfere with the assembly of a functional reaching dimer. Furthermore, alanine substitution of any one of these epitope residues drastically reduces the single-end joining activity of HIV-1 IN (53,60). This result is consistent with the derived reaching dimer interface of HIV-1 IN in which K34 is predicted to hydrogen bond with E246, buried R262 is predicted to interact with the backbones of P30 and V31, and R263 to hydrogen bond with E33. Substitution of the conserved, buried R262 with alanine was also found to abolish catalytic activity (53), suggesting that this residue might represent a second hot spot in the dimer interface, in addition to the buried W243. Taken together, these data indicate that the dimer interfaces of both ASV and HIV-1 IN apoproteins are characterized by fully buried tryptophan hot spots as well as O-ring residues that are partly accessible to solvent. The deleterious effects of alanine substitution for some of the partly accessible residues can be modulated by water molecules. In contrast, substitution of buried residues that are close-packed, optimizing van der Waals interactions, cannot be tolerated without significant structural and functional cost. The buried hydrophobic interactions in our model at the CTD:CTD interface likely shield the hydrophobic patch prior to interaction with DNA.
The organization of the reaching ASV IN dimer that forms in the absence of DNA substrate bears a striking resemblance to the "inner" dimers observed to bind DNA substrates in the recently reported crystal structure of the PFV intasome (38,39). The major difference between the two is in the position of the CTD (Fig. 8 and supplemental movie 2). However, the reaching dimer is in a conformation that is energetically favorable to accommodate association with viral DNA ends, simply by unpairing of the conserved CTD tryptophans (W259 for ASV and W243 for HIV IN) and rotating via the linker region. A DNA-induced conformational change that involves the linker of HIV-1 IN has been reported previously (62). A stable intasome would then be formed by association of the viral DNA strand to be processed with the catalytic triad, and stabilization of the non-transferred DNA strand by hydrophobic interactions between the terminal bases and the now translocated conserved tryptophans. Stabilizing or disrupting alternate multimeric assemblies to modulate enzyme activity in an allosteric manner has been suggested as an alternative to active site inhibition (63). While such an approach has been proposed for HIV-1 integrase (64), these studies targeted only the core:core dimer interface. Disruption of the critical CTD:CTD interface interactions could represent a novel strategy for development of anti-HIV drugs.

1K6Y). The conserved HHCC residues, in ball-and-stick representation, bind a Zn ion (cyan sphere). The three conserved, active site residues in the CCD ("core") are shown in ball-and-stick coordinating the metal co-factors (green spheres) required for catalysis. Two additional residues of relevance to the present studies (F199 and W259) are also shown in ball-and-stick. C. A core:core stabilized ASV IN dimer model based on the HIV-1 IN model of Wang and coworkers (14). One subunit is depicted in muted colors.

B. Experimentally-derived plot of P(r) function for the SAXS data is compared with values calculated from the same crystal structure. Color code is the same as in A. Right: The SAXS envelope shape derived from the experimental data is portrayed as a blue wire mesh, and the atomic resolution coordinates of 1C1A are shown within the SAXS envelope, with one monomer colored red and the other yellow.

and CTD in labeled wild type monomers are joined with dashed lines; solid lines identify cross-links within CTD residues or between CTD and CCD residues. Similar cross-links were observed with unlabeled monomers (not shown). Right: A HADDOCK generated monomer IN structure, using the monomer cross-linking data and the SAXS envelope derived from the W259A IN derivative. C. Map of dimer cross-links between labeled and unlabeled IN subunits. CTD to CTD links are shown with red lines; some included in supplemental Table S2 are omitted here for clarity. NTD to core or NTD to CTD links are denoted with dashed black lines.

Fig. 6. Cross-linking evidence for core-core interactions in tetramers and their functional relevance. A. Summary of core:core cross-link data. Red dashed lines show cross-links that were unique to protein in the tetramer band. B. Reciprocal interactions in the core:core dimer interface in the crystal structure of the isolated ASV domain (PDB code 1VSH) are mediated predominantly by side chains in alpha helices 1 and 5 of this domain; potential electrostatic interactions between R114' and E200, as well as H103' and E187 are highlighted; the prime designation distinguishes subunits. C. Single-end processing assays.

Conformation predicted from the inner dimer of an intasome complex that includes the viral substrate DNA. The subunit structure is modeled from the PFV intasome (PDB code 3L2T, (38)). The change in orientation of the CTD residue W259, shown in ball-and-stick, is highlighted with arrows. Active site residues are also shown in ball-and-stick fashion. A movie that simulates the conformational change between these two states is provided in the SUPPLEMENTAL MATERIALS section.

a As determined by absorbance at 280 nm with concentration determined with a calculated molar extinction coefficient (MEC) at A280.
b Q = 4πsinθ/λ, where 2θ is the scattering angle; recorded data in this range were used for P(r) analysis and subsequent ab initio shape reconstructions.
c As determined using the program IRENA. Comparable results were obtained using the program GNOM, and by Guinier analysis with AutoRg (65).
d Data were collected at APS beamline DND-CAT 5ID-D.
e Data were also collected at a local source.
f Goodness of fit as assessed by reduced chi squared analysis.
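The definition of Q in footnote b and the Guinier analysis mentioned in footnote c can be illustrated with a short sketch. It is not the IRENA, GNOM, or AutoRg implementation; it applies the standard Guinier approximation, ln I(Q) = ln I(0) - (Rg² Q²)/3, to synthetic data by an ordinary least-squares fit. The function names, the wavelength, and the synthetic intensities are assumptions made for illustration.

```python
import math


def momentum_transfer(two_theta_deg, wavelength):
    """Q = 4*pi*sin(theta)/lambda, where 2*theta is the scattering angle."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength


def guinier_fit(q_values, intensities):
    """Least-squares line through ln I versus Q^2.

    In the Guinier approximation the slope equals -Rg^2/3 and the
    intercept equals ln I(0). Returns (Rg, I0).
    """
    x = [q * q for q in q_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("Guinier slope must be negative")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free data generated from an assumed Rg and I(0);
# the Q range is kept where Q*Rg stays below about 1.3.
true_rg, true_i0 = 34.0, 100.0
q = [0.010 + 0.005 * k for k in range(6)]
intensity = [true_i0 * math.exp(-((true_rg * qi) ** 2) / 3.0) for qi in q]

rg, i0 = guinier_fit(q, intensity)
print(f"Rg = {rg:.2f}, I(0) = {i0:.2f}")
```

With experimental data, the fit would be restricted to the low-Q region in which the Guinier approximation holds, and the resulting Rg compared with the value obtained from the P(r) analysis.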