A New Class of Allosteric HIV-1 Integrase Inhibitors Identified by Crystallographic Fragment Screening of the Catalytic Core Domain

Our results indicate that compound 5 is an attractive building block for further development as a second generation ALLINI.

HIV-1 integrase (IN) is the enzyme responsible for the integration of the viral DNA copy of the viral RNA genome into the host chromatin. HIV-1 IN consists of three distinct structural and functional domains: the N-terminal domain, the catalytic core domain (CCD), and the C-terminal domain (1, 2). All three domains contribute to the assembly of the functional stable synaptic complex, where a tetramer of IN is bound to two viral DNA ends (1, 3-7). The cellular chromatin-associated protein lens epithelium-derived growth factor (LEDGF/p75) binds the IN tetramer and facilitates integration of viral DNA into active genes (8-11). The CCD contains several functional determinants of the retroviral enzyme including the DDE catalytic triad, which mediates the catalysis of both 3′-processing and strand transfer reactions. Furthermore, the V-shaped pocket at the CCD-CCD dimer interface provides the principal binding site for the integrase binding domain (IBD) of LEDGF/p75 (12). Therefore, the CCD is an attractive target for the development of new HIV-1 IN inhibitors.
The current FDA-approved inhibitors, raltegravir, elvitegravir, and dolutegravir, bind near the CCD active site in the presence of viral DNA and impair HIV-1 IN strand transfer activity (13-15). Allosteric IN inhibitors (ALLINIs, also known as LEDGINs, NCINIs, INLAIs, or MINIs) bind away from the active site at the distant CCD-CCD dimer interface in the principal LEDGF/p75 binding pocket (16-22). Consequently, ALLINIs impair HIV-1 IN-LEDGF/p75 binding and induce aberrant higher order multimerization of inactive IN in vitro (2, 23). In infected cells, ALLINIs inhibit both early and late steps of HIV-1 replication but are significantly more potent for inducing aberrant IN multimerization during virus particle maturation, likely due to reduced competition with LEDGF/p75 (20, 24-28).
The development of antiviral compounds targeting the IN-LEDGF/p75 binding interface has been fueled by the crystal structure of the IN CCD in complex with the IBD (12). For example, IBD-derived peptides that bind to the CCD-CCD dimer interface have been shown to induce allosteric IN multimerization, thereby inhibiting its catalytic activity in vitro and impairing HIV-1 replication in cell culture (29, 30). Furthermore, in silico screening using the CCD-IBD co-crystal structure was one method that led to the identification of quinoline-based ALLINIs (16). Strikingly, prior studies that used IN 3′-processing reactions for high-throughput screening identified essentially identical quinoline-based compounds with antiviral activities (17). The emergence of fragment-based drug discovery, which entails screening of libraries of small molecule compounds (typically <250 Da) using either biophysical techniques or enzymatic assays, has opened a novel avenue for the identification of new inhibitors that bind at the IN-LEDGF/p75 interface (31). Several new chemical classes of IN-LEDGF/p75 inhibitors, including benzylindoles (32, 33), benzodioxole-4-carboxylic acid (34), and 8-hydroxyquinoline (35), have been identified using in silico methods coupled with fragment-based approaches using surface plasmon resonance or nuclear magnetic resonance (NMR) spectroscopy as primary screening methods. However, further development of these initial fragment hits was hindered by the lack of structural data.
To facilitate structure-based drug design, we have conducted X-ray crystallographic fragment screening, which has led to the identification of new chemical scaffolds that bind to the IN CCD dimer interface at the principal LEDGF/p75 binding site. The optimized derivative impaired recombinant IN activities in vitro and inhibited HIV-1 replication in cell culture.

Results and Discussion
Fragment Screening-Crystallographic fragment screening was facilitated by the availability of high resolution IN CCD crystals, which diffract X-rays to 1.8 Å and can be produced within 3 h of setup using the previously described crystallization condition (36). The majority of crystallization drops produced microcrystals, with only one of 24 yielding a crystal amenable for small molecule soaking. Subsequent optimization using a combination of pre-seeding and reducing the well volume from 500 µl to 50 µl improved crystal production to approximately 3 suitable crystals per drop. High throughput fragment screening of a chemically diverse library of 971 fragments, consisting of cocktails containing 4-8 compounds each, was conducted using a previously described protocol (37).
Surprisingly, fragment binding for only one mixture soak was observed at the screening concentration of 20 mM (in 20% (v/v) DMSO, the solvent for solubilizing the fragments and a cryoprotectant for freezing crystals). Structure refinement revealed electron density for a fragment bound to a non-biologically relevant pocket formed by crystal contacts. Subsequent hit identification through individual soaking of the mixture components proved to be challenging. Although mixture soaking consistently showed positive electron density, individual fragment soaking at 20 mM failed to reveal fragment-specific electron density. When soaked individually at 50 mM, only one fragment, 1 (Table 1), from the selected mixture displayed binding not only to the previously identified crystal contact site but also revealed weak electron density at the CCD-CCD dimer interface at the principal LEDGF/p75 binding site (Fig. 1). The presence of strong electron density for fragment binding to both sites was confirmed by a 100 mM soak of 1 (Fig. 2, left). High concentration screening is not unusual with fragment screening because fragments, due to their small size, tend to have low affinity interactions. In addition, screening with X-ray crystallography often requires higher concentrations due to issues with solubility and occupancy to detect binding (38).
At the LEDGF/p75 binding pocket (Fig. 2, right), on monomer A, the carboxylate of fragment 1 forms a direct hydrogen bond with the backbone of Ala-169, whereas two water molecules bridge its interaction with Gln-168, Glu-170, His-171, and Thr-174, the latter of which is also in close contact with the thiophene ring. Relative to the apo structure, the Met-178 side chain rotates into the cavity to establish hydrophobic interactions with the pyrrole ring of 1. Similarly, on monomer B, the Gln-95, Trp-132, and Leu-102 side chains, rotated by ~170°, ~30°, and ~90°, respectively, establish further hydrophobic interactions with the pyrrole ring. Other hydrophobic interactions mediated via monomer B involve Ala-89, Phe-99, Thr-125, Ala-128, and Ala-129.
Fragment Derivatization-A fragment expansion approach was undertaken to probe structure-activity relationships (SAR) between the lead fragment and the IN CCD. Modification of the pyrrole ring was determined as an ideal starting point for fragment growth based on the superposition of crystal structures for the fragment hit and available ALLINIs (data not shown). With this in mind, a small series of 2- and 3-substituted pyrrole analogs (labeled X and Y in Table 1) were designed and synthesized. The 2-substituted analogs were prepared via direct halogenation (4 and 8) or formylation (2, 3, and 5) and subsequent functionalization of the ester derivative of fragment 1, e.g. methyl 3-(1H-pyrrol-1-yl)thiophene-2-carboxylate (supplemental information). The preparation of 3-substituted analogs proved to be slightly more challenging due to the inherent preference of pyrroles to react with electrophiles at the 2- rather than the 3-position. To overcome this reactivity, the 3-formyl derivative 6 could be synthesized via Paal-Knorr synthesis (39) from commercially available 2,5-dimethoxytetrahydro-3-furancarbaldehyde and subsequent saponification. This approach also facilitated the preparation of amine 7.

TABLE 1 Chemical structures of the compounds and their inhibitory activities in LEDGF/p75-dependent integration assays
Biochemical Characterization-A LEDGF/p75-dependent in vitro integration assay was used for initial screening of the eight compounds (Table 1). Electron-withdrawing substituents, such as aldehyde and carboxylic acid, at the X position of the pyrrole ring slightly improved potency, whereas the substitution with an ethyl at the same position (5) led to near complete inhibition of LEDGF/p75-mediated IN enzymatic activity at the test concentration of 400 µM. Based on these results, our subsequent experiments focused on analyzing 5. To better understand the SAR, we also examined its less potent analogs 1 and 8 in parallel experiments.
To dissect the mode of action of selected compounds in vitro, we examined their ability to inhibit LEDGF/p75-dependent and -independent IN activities as well as to induce aberrant IN multimerization and interfere with IN binding to LEDGF/p75 (Fig. 3, A, B, C, and D). Consistent with the initial results in Table 1, Fig. 3, A and D, show that compound 5 was significantly more potent (IC50 of 72 µM) than 8 (IC50 >600 µM) or 1 (IC50 >800 µM) for inhibiting integration in the LEDGF/p75-dependent assay. The ligand efficiency (LE) for compound 5 was calculated to be 0.38 kcal/mol per non-hydrogen atom (40, 41). This compares favorably to the LE of 0.31 kcal/mol per non-hydrogen atom for direct binding of a highly potent representative ALLINI, BI-D, to IN (42). The inhibition of LEDGF/p75-dependent activity could be due to (i) compound-induced aberrant protein multimerization of IN resulting in inactivation, (ii) the compound competing with IN binding to LEDGF/p75, or (iii) a combination of these activities. Therefore, a FRET-based IN multimerization assay was used next to help elucidate the mechanism of action for these compounds. In Fig. 3, B and C (zoomed-in portion of Fig. 3B), the multimerization assay shows dose-dependent increases in homogeneous time-resolved fluorescence (HTRF) signal with the addition of 5 or 8 but not for parental compound 1. However, the extent of HTRF signal increase for these compounds was markedly smaller than that of BI-D, a relatively potent quinoline-based ALLINI (Fig. 3B). Higher HTRF counts are indicative of higher order multimerization of IN due to increasing numbers of individual subunits, which are labeled with either donor or acceptor fluorophores, gathering together within the inhibitor-promoted complex. Therefore, the results in Fig. 3B suggest that the compounds induced a limited extent of higher order IN multimerization compared with their BI-D counterpart. However, the EC50 value of ~26 µM for 5 in the IN multimerization assay correlated well with the IC50 value of 60 µM in a LEDGF/p75-independent integration assay (Fig. 3D), suggesting that the relatively limited multimerization of IN in the presence of 5 was still sufficient to inhibit IN activity in the absence of LEDGF/p75. Finally, to examine if the derivatives could also compete with IN-LEDGF/p75 binding, direct binding assays were performed. In addition to promoting aberrant multimerization, 5 inhibited IN binding to LEDGF/p75 with an IC50 value of 241 µM (Fig. 3D). Overall these results indicate that 5 inhibits IN through a multimodal mechanism of action in vitro similar to other quinoline-based ALLINIs such as BI-D.
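The ligand efficiency quoted above can be approximated from the reported IC50 as a rough check. The sketch below is illustrative only: it assumes the common approximation LE = -RT ln(IC50) / N, a temperature of 300 K, and a count of 15 non-hydrogen atoms for compound 5 (inferred here from the 221 Da ethyl derivative of fragment 1, not stated in the text). The function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ligand_efficiency(ic50_molar, heavy_atoms, temperature_k=300.0):
    """Approximate LE in kcal/mol per non-hydrogen atom.

    Uses IC50 as a stand-in for the dissociation constant, which is
    an approximation rather than a measured binding free energy.
    """
    delta_g = R_KCAL * temperature_k * math.log(ic50_molar)  # negative for IC50 < 1 M
    return -delta_g / heavy_atoms


# Compound 5: IC50 of 72 uM (from the text); 15 heavy atoms is an assumption.
le_5 = ligand_efficiency(72e-6, 15)
print(f"LE(5) ~ {le_5:.2f} kcal/mol per non-hydrogen atom")
```

Under these assumptions the estimate lands close to the reported 0.38 kcal/mol per non-hydrogen atom; a different temperature or atom count would shift it slightly.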
We next examined the inhibitory activities of 5 with respect to two IN mutants, A128T and H171T, that emerge under the selective genetic pressure of archetypal ALLINI BI-1001 and its more potent analog BI-D, respectively (43, 44). As shown in Fig. 3E, 5 inhibited LEDGF/p75-dependent activities of wild-type, H171T, and A128T INs with very similar IC50 values. These results indicate that these mutant INs, which were markedly resistant to their respective ALLINIs, were as susceptible to 5 as the wild-type enzyme. Collectively, our results indicate that 5 is an attractive building block for further development as a second generation ALLINI.

Antiviral Activities-Compound 5 was subsequently tested for antiviral activity during both early and late phases of HIV-1 replication. To determine the activities of 5 during the early phase, SupT1 T cells treated with log-scale doses of 5 or BI-D were infected with a single-round luciferase reporter construct based on HIV-1 NL4-3 (HIV-Luc), and 2 days later cells were harvested and processed for luciferase assays. As expected (24), the EC50 of BI-D was 1.2 µM. Rather impressively, 5 had an EC50 of 36 µM under these infection conditions (Fig. 4A). To determine activities of 5 during the late phase, HIV-Luc was produced in HEK293T cells that were treated with log-scale compound doses. Subsequent infections of SupT1 cells proceeded in the absence of any additional drug in the T cell cultures. As expected (24), BI-D was significantly more potent under these conditions, yielding an EC50 of 57 nM (Fig. 4B). By contrast, 5 was a few-fold less potent under these conditions (EC50 of 103 µM) than when T cells were treated directly. These data show that, unlike previously characterized ALLINIs, 5 inhibits the early phase of HIV-1 infection more potently than the late phase during HIV-1 particle production. 5 was somewhat less cytotoxic to SupT1 cells than BI-D (520 µM versus 175 µM), yielding a selectivity index of 14.4 during acute HIV-1 infection (Fig. 4C).
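The selectivity index and the early-versus-late potency comparison follow from simple ratios of the reported values. The sketch below is a minimal worked check, assuming that the selectivity index is the cytotoxicity value divided by the early-phase EC50; the function and variable names are hypothetical.

```python
def ratio(numerator_um, denominator_um):
    """Ratio of two concentrations given in the same unit (here uM)."""
    return numerator_um / denominator_um


# Values for compound 5 taken from the text (all in uM).
cytotoxicity_5 = 520.0
ec50_early_5 = 36.0
ec50_late_5 = 103.0

# Values for BI-D taken from the text (57 nM converted to uM).
ec50_early_bid = 1.2
ec50_late_bid = 0.057

selectivity_index_5 = ratio(cytotoxicity_5, ec50_early_5)
late_over_early_5 = ratio(ec50_late_5, ec50_early_5)        # >1: weaker in late phase
early_over_late_bid = ratio(ec50_early_bid, ec50_late_bid)  # >1: stronger in late phase

print(f"Selectivity index of 5: {selectivity_index_5:.1f}")
print(f"5 is {late_over_early_5:.1f}-fold less potent in the late phase")
print(f"BI-D is {early_over_late_bid:.0f}-fold more potent in the late phase")
```

The first ratio reproduces the stated selectivity index of 14.4, and the other two make the opposite phase preferences of 5 and BI-D explicit.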
Structural Characterization-To better understand the SAR of the investigated compounds, crystal-soaking experiments were performed to determine the binding sites of 8 and 5. A 100 mM soak of 8 with apoCCD crystals showed binding to the two sites identified with 1 (crystal contact and LEDGF/p75 binding sites). Surprisingly, the binding mode of the dual ring system of fragment 8 was flipped (Fig. 5A). Compared with 1, the carboxylate of 8 is involved in a tighter hydrogen bond interaction network with CCD monomer A residues via the backbone amides of Glu-170 and His-171 and the side chains of His-171 and Thr-174. The side chain of Glu-170 formed an additional halogen bonding interaction with one of the chlorine atoms on the pyrrole ring of 8. Additionally, the aromatic ring system participated in a hydrophobic interaction network formed by residues Met-178 and Ala-169 on CCD monomer A and residues Ala-89, Gln-95, Tyr-99, Leu-102, Thr-125, Ala-128, and Ala-129 on CCD monomer B.
Docking-The most potent compound, fragment 5, was not amenable to X-ray crystallography. Therefore, we used AutoDock4 (45) to predict the binding mode of this fragment. Docking with the hydrated docking protocol (46) revealed that 5 binds in a manner similar to 8. As shown in Fig. 5B, the docked binding mode of 5 was predicted to maintain the hydrogen bonds on CCD monomer A with the backbone amides of Glu-170 and His-171 and the side chains of His-171 and Thr-174. The hydrated protocol also predicted a water molecule bridging the interaction of the carboxylate group with the backbone of Gln-168 on CCD monomer A. This water matches the density of an experimental water at low contours present when fragments 1 (Fig. 2, right) [...]

To better understand the structural basis for the ability of 5 to inhibit A128T and H171T INs, which confer resistance to certain quinoline-based ALLINIs, binding free energy simulations were performed. The calculated Kd value of 309 µM for 5 binding to the wild-type CCD-CCD (Table 2) was comparable with experimentally determined IC50 values in various in vitro assays (Fig. 3D). Furthermore, the binding affinity of compound 5 to A128T or H171T CCD was reduced only modestly (<2-fold, Table 2) when compared with wild type. The significantly smaller size of 5 avoids the steric and electrostatic repulsion effects seen upon binding of BI-1001 to the A128T mutant (43). The limited effect of the H171T substitution on the binding affinity of 5 could be explained by the absence of the tert-butoxy moiety in this compound. The tert-butoxy group in BI-D hydrogen bonds with Nδ-H of the His-171 side chain, and the substitution of this amino acid by Thr reduces binding affinity by ~65-fold (44).
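To put the affinity changes above on an energy scale, the standard relation ΔG° = RT ln(Kd) can be applied. The sketch below is illustrative and assumes a 1 M standard state and the simulation temperature of 300 K; the function names are hypothetical, and the outputs are simple conversions of the quoted values rather than results from Table 2.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_delta_g(kd_molar, temperature_k=300.0):
    """Standard binding free energy (kcal/mol) from Kd, assuming a 1 M standard state."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def fold_change_to_ddg(fold, temperature_k=300.0):
    """Free energy penalty (kcal/mol) for a given fold loss in affinity."""
    return R_KCAL * temperature_k * math.log(fold)


dg_wt = kd_to_delta_g(309e-6)      # compound 5, wild-type CCD dimer
ddg_2fold = fold_change_to_ddg(2)  # upper bound for A128T/H171T with 5
ddg_65fold = fold_change_to_ddg(65)  # BI-D with H171T

print(f"dG(5, WT)       ~ {dg_wt:.2f} kcal/mol")
print(f"ddG for 2-fold  ~ {ddg_2fold:.2f} kcal/mol")
print(f"ddG for 65-fold ~ {ddg_65fold:.2f} kcal/mol")
```

On this scale, a <2-fold change corresponds to less than about 0.4 kcal/mol, whereas the ~65-fold loss reported for BI-D corresponds to roughly 2.5 kcal/mol.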
The ability of 5 to inhibit LEDGF/p75 binding to IN could be explained by the significant overlap in the binding interactions between 5 and LEDGF/p75 when bound to the CCD-CCD [...] Fig. 5B show that 5 interacts with both subunits of IN, which is consistent with the inhibitor-induced IN multimerization observed in vitro (Fig. 3, B and C). However, the overlay of binding sites of 5 and BI-D with the CCD-CCD dimer (Fig. 5C) reveals that the quinoline-based ALLINI establishes more extensive interactions with both subunits, interacting deeper in the binding pocket, whereas the smaller compound 5 has relatively limited contacts with both IN subunits. This in turn could explain the differential extents of IN multimerization seen with these compounds (Fig. 3B). The robust, aberrant IN multimerization observed with BI-D could allow for the impairment of the proper maturation of virus particles, whereas the limited multimerization induced by 5 may not suffice to significantly alter IN function during virion morphogenesis, thus reducing its potency in the late stage of viral replication.

Conclusion-In summary, we have conducted X-ray crystallography-based fragment screening to identify novel compounds that bind to the CCD of HIV-1 IN. From a library of 971 fragments, X-ray crystallographic screening identified only one compound, 1, which bound to a crystal contact site. Subsequent soaking of the individual mixture components at high concentrations revealed additional binding at the LEDGF/p75 binding pocket. Although 1 did not detectably inhibit HIV-1 IN activity, the availability of the crystal structure allowed the use of fragment expansion to generate chemical derivatives, including the most active compound 5, which inhibited LEDGF/p75-dependent HIV-1 integration with an IC50 of 72 µM in vitro and impaired the acute phase of HIV-1 replication with an EC50 of 36 µM. Furthermore, 5 similarly inhibited the activities of wild-type and mutant INs that confer resistance to quinoline-based ALLINIs. These findings, coupled with the small size of the identified lead compound (molecular mass of 221 Da), argue strongly for the further development of 3-(1H-pyrrol-1-yl)-2-thiophenecarboxylic acid derivatives as a second generation of ALLINIs.

HIV-1 IN CCD (F185K) Expression, Purification, and Crystallization-The HIV-1 IN CCD (residues 50-212) containing the F185K mutation was expressed and purified as described (36). The protein was concentrated to 5 mg/ml and crystallized using the hanging-drop vapor diffusion method with a crystallization buffer consisting of 100 mM sodium cacodylate pH 6.5, 100 mM ammonium sulfate, 10% (w/v) PEG 8000, and 5 mM DTT. Crystallization drops were prepared using an equal volume of protein and seed stock consisting of microseeds of HIV-1 IN CCD diluted in the crystallization buffer. Crystallization trays were prepared on ice at room temperature and then transferred to 4°C for storage.
Crystallographic Fragment Screening, Data Collection, and Refinement-Cocktails of 4-8 compounds were used for the initial screening, with single compound soaks to verify the bound fragment identity and to rule out cooperative binding. Fragments, dissolved in DMSO, were soaked into crystals at concentrations of 20, 50, and 100 mM with a final DMSO concentration of 20% (v/v) for 2 h before flash-freezing in liquid nitrogen. X-ray diffraction data collection was performed at the Cornell High Energy Synchrotron Source (CHESS) F1 beamline and National Synchrotron Light Source (NSLS) beamlines X25 and X29. The diffraction data were indexed, processed, scaled, and merged using HKL2000 (47). Structure refinement was carried out using PHENIX (1.8.2-1309) (48) and COOT (0.7-rev4459) (49) with riding hydrogens present when the X-ray diffraction resolution was ≥1.9 Å (Table 3). Coordinates and structure factors have been deposited in the PDB under the accession codes 5KRS and 5KRT.
In Vitro Activities of Compounds-Previously reported (18, 50-52) HTRF-based assays were used to determine the activities of the compounds in inhibiting LEDGF/p75-dependent and -independent IN activity and IN binding to LEDGF/p75 as well as in inducing aberrant, higher order oligomerization of IN.
HIV-1 Infection and Compound Cytotoxicity Assays-HEK293T cells were propagated in Dulbecco's modified Eagle's medium supplemented to contain 100 IU penicillin, 100 µg/ml streptomycin, and 10% heat inactivated fetal bovine serum, whereas SupT1 cells were grown in similarly supplemented RPMI medium. HIV-Luc was produced by cotransfecting HEK293T cells with pNLX.Luc.R-.ΔAvrII (53) and vesicular stomatitis virus G envelope expression vector pCG-VSV-G (54) at a ratio of 9:1 using PolyJet™ transfection reagent (SignaGen). Virion concentration in the resulting cell supernatants was determined using a commercial p24 ELISA kit (Advanced Biosciences Laboratories). HIV-1 infection, luciferase assays, and the WST-1 cell proliferation assay to determine compound toxicity were conducted essentially as previously described (24, 55).

Chemical Synthesis-Compounds were prepared using standard synthetic techniques under an argon atmosphere. Purifications were carried out using silica flash column chromatography, and spectral data (1H and 13C NMR, mass spectroscopy, and IR) were obtained to confirm compound identity. Complete experimental details and copies of NMR spectra for compounds 1-8 are included in the supplemental information of this manuscript.
Docking-Three-dimensional coordinates of 5 were generated from the SMILES string using OpenBabel (56), whereas receptor coordinates were extracted from the CCD dimer structure with 8 (PDB code 5KRT). Structures were then prepared following the standard preparation protocol (57). Docking was performed using AutoDock4 with the hydrated docking protocol (46), and 100 poses were generated with the default GA search parameters. Clustering analysis (2.0 Å tolerance) resulted in a single cluster, and the lowest energy pose was extracted as the final result.
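The clustering step can be pictured as a greedy, energy-ordered grouping of poses by RMSD. The sketch below is a simplified illustration of that idea, not the AutoDock4 implementation: it ignores ligand symmetry, uses toy two-atom poses, and all function names and data are hypothetical.

```python
import math


def rmsd(pose_a, pose_b):
    """RMSD between two poses given as equal-length lists of (x, y, z).

    No superposition is applied because docked poses share the receptor frame.
    """
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))


def cluster_poses(poses, energies, tolerance=2.0):
    """Greedy clustering: visit poses from lowest to highest energy and add
    each to the first cluster whose seed (its lowest energy member) lies
    within the RMSD tolerance; otherwise start a new cluster."""
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    clusters = []  # each cluster is a list of pose indices, seed first
    for i in order:
        for members in clusters:
            if rmsd(poses[i], poses[members[0]]) <= tolerance:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy example: three two-atom poses with made-up energies (kcal/mol).
toy_poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
]
toy_energies = [-5.0, -6.2, -4.1]

clusters = cluster_poses(toy_poses, toy_energies)
best_pose_index = clusters[0][0]  # seed of the lowest energy cluster
print(clusters, best_pose_index)
```

In this picture, a run in which all 100 poses fall into one cluster means every pose lies within the tolerance of the lowest energy pose, which is then reported as the final result.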
Molecular Dynamics (MD) and Free Energy Simulations-Before running binding free energy calculations, the protein-ligand complexes were subjected to several stages of equilibration-production MD simulations, starting from a docked structure for compound 5, for a total of 15 ns with gradually decreasing harmonic restraints. Then, the absolute binding free energies of 5 with respect to the wild-type, A128T, and H171T CCD dimer were calculated using the double decoupling method (58) in explicit solvent (TIP3P water model (59) plus counter ions) at 300 K. The proteins were modeled by the Amber ff99SB-ILDN force field (60), and the compounds were described by the Amber GAFF parameter set (61). The partial charges of the ligands were obtained using the AM1-bcc method (62). For absolute binding free energy calculations, the MD simulation at each window was performed using GROMACS (63, 64) version 4.6.4 for 15 ns; the last 10 ns were used for binding free energy calculations.