Designing a Soluble Near Full-length HIV-1 gp41 Trimer*

Background: The envelope glycoprotein gp41 is a key component of HIV-1 virus entry into host cells.
Results: Structure-based mutagenesis and biochemical approaches lead to the design of soluble gp41 trimers.
Conclusion: Trimers are in a prehairpin structure, similar to that observed during virus entry.
Significance: The new gp41 recombinants could lead to the development of novel diagnostics, therapeutics, and vaccines.

The HIV-1 envelope spike is a trimer of heterodimers composed of an external glycoprotein gp120 and a transmembrane glycoprotein gp41. gp120 initiates virus entry by binding to host receptors, whereas gp41 mediates fusion between viral and host membranes. Although the basic pathway of HIV-1 entry has been extensively studied, the detailed mechanism is still poorly understood. Design of gp41 recombinants that mimic key intermediates is essential to elucidate the mechanism as well as to develop potent therapeutics and vaccines. Here, using molecular genetics and biochemical approaches, a series of hypotheses was tested to overcome the extreme hydrophobicity of HIV-1 gp41 and design a soluble near full-length gp41 trimer. The two long heptad repeat helices HR1 and HR2 of gp41 ectodomain were mutated to disrupt intramolecular HR1-HR2 interactions but not intermolecular HR1-HR1 interactions. This resulted in reduced aggregation and improved solubility. Attachment of a 27-amino acid foldon at the C terminus and slow refolding channeled gp41 into trimers. The trimers appear to be stabilized in a prehairpin-like structure, as evident from binding of a HR2 peptide to exposed HR1 grooves, lack of binding to hexa-helical bundle-specific NC-1 mAb, and inhibition of virus neutralization by broadly neutralizing antibodies 2F5 and 4E10. Fusion to T4 small outer capsid protein, Soc, allowed display of gp41 trimers on the phage nanoparticle. These approaches for the first time led to the design of a soluble gp41 trimer containing both the fusion peptide and the cytoplasmic domain, providing insights into the mechanism of entry and development of gp41-based HIV-1 vaccines.

Acquired immunodeficiency syndrome (AIDS) caused by the human immunodeficiency virus type 1 (HIV-1) is a major global health epidemic. Although effective chemotherapeutics have been discovered, these inhibit virus replication after infection has already occurred (1,2). A preventative vaccine that can block HIV-1 entry at the site of infection is probably the best strategy to control the epidemic (3-5). Of the four large vaccine efficacy trials conducted in humans so far, only the RV144 trial showed a modest but significant protection (31.2%) from HIV-1 infection (6). Development of an effective HIV-1 vaccine remains one of the biggest challenges, mainly because of the extreme genetic diversity of HIV-1 (7). Coupled with this diversity is the masking of essential epitopes by glycosylation and the extraordinary evolution of the viral envelope to evade host immune responses (8). A major goal of HIV-1 vaccine development, therefore, is to understand the entry mechanism in detail and identify conserved intermediates that could serve as immunogens as well as targets for therapeutics and antibodies (Abs) that can block virus entry (4,9).
HIV-1, a "spherical" enveloped retrovirus, fuses with the plasma membrane of a host cell and delivers the mature core into the cytosol. A key component of entry is the trimeric spike embedded in the lipid bilayer of the viral envelope. It is a trimer of heterodimers, each dimer consisting of an extracellular glycoprotein gp120 and a transmembrane glycoprotein gp41 that are derived from proteolytic cleavage of the precursor protein gp160 (10). HIV-1 entry involves a series of initial interactions between the virus and host cell receptors. The virus is first captured through relatively weak interactions between gp120 and surface molecules, such as α4β7 integrin and DC-SIGN (11-13), which then leads to high affinity interactions with CD4, the primary receptor on CD4+ T cells (14). A conformational change in gp120 exposes the binding site for the chemokine co-receptor, CCR5 or CXCR4 (15). Further conformational changes lead to the opening up of the two long gp41 helices containing the heptad repeat (HR) sequences, HR1 and HR2, and insertion of the N-terminal fusion peptide into the host cell membrane (16,17). A prehairpin intermediate, a three-stranded coiled coil stabilized by intermolecular interactions between HR1 helices, is formed (Fig. 1, A and B). gp120 subunits dissociate, allowing the HR2 helices at the base of the spike to fold back and interact with the HR1 helices. The hexa-helical bundle thus formed brings the host and viral membranes in close proximity, facilitating membrane fusion and release of the mature core into the cytosol (18-20).
Understanding the structure and function of the intermediates is essential to design immunogen mimics that induce broadly neutralizing antibodies (bnAbs) against genetically diverse HIV-1 viruses (4,21,22). In fact, the conserved membrane proximal external region (MPER), which is present at the base of the spike between the HR2 helices and the transmembrane domain (Fig. 1B), consists of epitopes that are recognized by a series of bnAbs, 2F5 and 4E10 being the best characterized among them (23-26). Passive immunization with these bnAbs reduced viremia in HIV-1-infected individuals and nonhuman primates (27-29). The MPER epitopes are well exposed in the prehairpin intermediate (Fig. 1B), the most extended conformation of gp41 ectodomain, making it a prime target for immunogen design (30-32).
Although the crystal structure of the hexa-helical bundle intermediate (see Figs. 1B and 3A), the core of fusion-active gp41, has been determined (33), very little is known about the structure and function of the prehairpin intermediate (31,34,35). Attempts to produce any form of full-length gp41 in a soluble, trimeric state have not been successful because of the unusually high hydrophobicity of gp41 and its extreme propensity to precipitate (36). Only certain truncated or structurally constrained versions of gp41 ectodomain containing only HR1, HR2, and MPER motifs have been produced, but most of these collapse into a hexa-helical bundle conformation and induce either weak or no bnAbs (36-40). Other components of the gp41 molecule, such as the fusion peptide and the cytoplasmic domain, might be necessary to generate a structure that mimics the native prehairpin intermediate, displaying the MPER and other functional motifs in the right conformation (41,42). However, there have been no reports of soluble, structurally defined gp41 oligomers containing the fusion peptide and/or cytoplasmic domain.
Here, we report the design of near full-length soluble gp41 recombinants containing the fusion peptide, the ectodomain, and the cytoplasmic domain. Our design includes introduction of mutations that weaken intramolecular interactions between HR1 and HR2 helices while retaining intermolecular interactions between HR1 helices. Such mutations minimized nonspecific interactions and improved the solubility of gp41. Attachment of foldon, a phage T4 trimerization tag, along with slow refolding led to folding of gp41 protein into trimers and defined oligomers. These gp41 trimers were displayed on bacteriophage T4 capsid nanoparticles by attaching to the small outer capsid protein (Soc), which also forms trimers by binding to the quasi-3-fold axes of the virus capsid (43). These gp41 recombinants potently inhibited HIV-1 virus neutralization by 2F5 and 4E10 mAbs, presumably by competing with the prehairpin structure formed during virus entry. These approaches have led to the design of soluble, near full-length gp41 trimers in a prehairpin-like structure that could be utilized to understand the mechanism of viral entry and to develop HIV-1 vaccines, diagnostics, and therapeutics.

EXPERIMENTAL PROCEDURES
Construction of the Expression Vectors-All the gp41 constructs were generated by splicing-by-overlap extension PCR using wild-type HXB2 gp41 DNA as a template (44). Mutations were introduced using primers containing the desired mutations in the nucleotide sequence. For construction of gp41 fusion recombinants, the DNA fragments corresponding to gp41, Soc, and foldon were amplified by PCR using the respective DNA templates and overlapping primers containing additional amino acids (aa) SASA as a linker between each fragment. The fragments were then stitched together, and the stitched DNA was amplified using the end primers containing unique restriction sites, XhoI or NcoI. The final PCR product was digested with XhoI and NcoI and ligated with the linearized and dephosphorylated pTriEx-4 Neo plasmid vector (Merck KGaA, Darmstadt, Germany). The recombinant DNA was transformed into Escherichia coli XL-10 Gold competent cells (Stratagene, La Jolla, CA), and miniprep plasmid DNA was prepared from individual colonies. The presence of the DNA insert was identified by restriction digestion and/or amplification with insert-specific primers. The accuracy of the cloned DNA was confirmed by DNA sequencing (Davis Sequencing, Inc., Davis, CA). The plasmids were then transformed into E. coli BL21 (DE3) RIPL competent cells (Stratagene) for protein expression.
Expression and Solubility Testing of gp41 Recombinant Proteins-E. coli BL21 (DE3) RIPL cells containing gp41 clones were induced with 1 mM IPTG at 30°C for 3 h. The cells were lysed using bacterial protein extraction reagent (B-PER; Thermo Fisher Scientific Inc., Rockford, IL) and centrifuged at 12,000 × g for 10 min. The soluble supernatant and insoluble pellet fractions were analyzed by SDS-PAGE. The pellets containing the insoluble inclusion bodies were treated with different denaturing reagents, SDS, urea, or guanidine hydrochloride. After centrifugation at 12,000 × g for 10 min, the supernatants and pellets were analyzed by SDS-PAGE.
Purification of Recombinant Proteins-The cells after IPTG induction were harvested by centrifugation at 8200 × g for 15 min at 4°C and lysed using an Aminco French press (Thermo Fisher Scientific Inc.). The inclusion bodies containing the gp41 recombinant protein were separated from the soluble fraction by centrifugation at 34,000 × g for 20 min. The inclusion body pellet from 1 liter of culture was dissolved in 50 ml of 50 mM Tris-HCl (pH 8), 300 mM NaCl, and 20 mM imidazole buffer containing 8 M urea. After incubation at room temperature for 30 min, the sample was centrifuged at 34,000 × g for 20 min to remove cell debris. The supernatant was loaded onto a HisTrap HP column (GE Healthcare) pre-equilibrated with the same buffer. The bound protein was eluted with a 20-500 mM linear imidazole gradient in the same buffer at 4°C. A slow refolding procedure (described below) was performed to refold the purified protein. The protein was further purified by Superdex 200 gel filtration chromatography (HiLoad prep grade; GE Healthcare) in 20 mM Tris-HCl (pH 8) and 100 mM NaCl buffer at 4°C. For the gp41 recombinants expressed as soluble proteins, the supernatant of cell lysate was purified by HisTrap and Superdex 200 gel filtration columns. The purified proteins were stored frozen at -80°C.
Refolding of gp41 Recombinants-After purification by HisTrap chromatography in 8 M urea, the protein was refolded by slow dialysis against incrementally decreasing urea concentrations (6, 4, 2, 1, and 0.5 M or no urea). The dialysis buffer in addition contained 20 mM Tris-HCl (pH 8), 100 mM NaCl, 200 mM L-Arg, and 5 mM DTT. Protein was dialyzed for at least 8 h at 4°C before changing to the next buffer with a lower concentration of urea. At the last step, the protein was dialyzed against either 20 mM Tris-HCl (pH 8) and 100 mM NaCl buffer or PBS (pH 7.4) for 6 h, and the buffer was changed every 2 h.

SDS-Polyacrylamide Gel Electrophoresis (PAGE) and Native PAGE-Twelve percent SDS-polyacrylamide gel (PAG) was used to determine the expression, solubility, and purification quality of gp41 recombinant proteins. The proteins were stained with Coomassie Blue R-250 (Bio-Rad). Native PAGE (4-20% gradient gels; Invitrogen) was used to determine the folding and oligomeric states of the recombinant proteins. The proteins were stained with Bio-Safe Coomassie Stain (Bio-Rad).
Enzyme-linked Immunosorbent Assay (ELISA)-The gp41 recombinant proteins (or the control Soc protein) were coated on the microtiter plates at 0.25 µg/well (coating buffer, 100 mM NaHCO3 (pH 9.6)). Plates were incubated at 4°C overnight and blocked with 5% BSA in PBS at 37°C for 1 h. One hundred microliters of the monoclonal antibodies (mAbs) 2F5, 4E10 (Polymun), or NC-1 (AIDS Reagent Program) was added at 5-fold serial dilutions starting from 10 µg/ml, and the plates were incubated at 37°C for 1 h. The HRP-conjugated anti-human IgG (for 2F5 and 4E10) or anti-mouse IgG (for NC-1) was added, and the samples were incubated further for 1 h at 37°C. The TMB Microwell Peroxidase Substrate (KPL, Inc.) was added, and the reaction was terminated by adding the TMB BlueSTOP Solution (KPL, Inc.). The absorbance at 650 nm was measured using the VersaMax microplate reader.
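The antibody titration used in the ELISA can be written out as a short sketch. The starting concentration (10 µg/ml) and the 5-fold step come from the procedure above; the number of dilution steps is not stated in the text, so the value used here (8) is an assumption for illustration only, as is the function name.

```python
def serial_dilution(start, fold, steps):
    """Return a fold-dilution series, highest concentration first."""
    if fold <= 1 or steps < 1:
        raise ValueError("fold must be > 1 and steps must be >= 1")
    return [start / fold ** i for i in range(steps)]


# 5-fold series starting at 10 µg/ml (from the text);
# 8 steps is an assumed plate layout, not a value from the paper.
elisa_series = serial_dilution(start=10.0, fold=5, steps=8)

for conc in elisa_series:
    print(f"{conc:.5f} µg/ml")
```

The same helper would describe the 3-fold series starting at 50 µg/ml used in the neutralization competition assay, again with an assumed number of steps.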
Peptide Binding Assay-The HR1 (N36) or HR2 (C34) peptides were added to the Soc-gp41M-Fd protein purified by HisTrap chromatography as described above. The protein was refolded by slow dialysis using a 2-kDa cut-off membrane (Amicon). An aliquot of the sample was then analyzed by native PAGE to determine the folding and oligomeric state of the protein. The rest of the sample was further dialyzed overnight against PBS using a 10-kDa cut-off membrane to remove the unbound peptide. The final sample was electrophoresed on an SDS-PAG, which separates the Soc-gp41M-Fd-peptide complex into two bands. The amount of bound peptide was quantified by laser densitometry.
Pseudovirus Neutralization Competition Assay-TZM/bl cells were used to determine HIV-1 neutralization by 2F5 and 4E10 mAbs (45,46). The mAb was titered in 3-fold serial dilutions starting at 50 µg/ml in the growth medium (DMEM with 100 units/ml penicillin, 100 µg/ml streptomycin, 2 mM L-glutamine (Quality Biologics Inc.), and 15% fetal calf serum (Gemini Bio-Products)). On a 96-well flat-bottom black plate, 12.5 µl of the mAb at different dilutions was mixed with 12.5 µl of gp41 recombinant proteins or other control competitors at a concentration of 120 nM for 2F5 neutralization and 200 nM for 4E10 neutralization. The samples were incubated for 30 min at 37°C, and 25 µl of pseudovirus SF162 at a dilution optimized to yield ~150,000 relative luminescence units was added. The pseudovirus SF162 is a neutralization-sensitive B clade virus. It was prepared by transfecting 5 × 10⁶ exponentially dividing HEK 293T cells in 20 ml of growth medium (DMEM) with 8 µg of env expression plasmid and 24 µg of an env-deficient HIV-1 backbone vector (pSG3ΔEnv) using FuGENE as the transfection reagent (Roche Applied Science). The SF162 env plasmid was obtained from the AIDS Reagent Program. Pseudovirus-containing culture supernatants were harvested 3 days after transfection, centrifuged, titered, and stored at -80°C in 1-ml aliquots. The SF162 pseudovirus thus obtained is entry-competent but not replication-competent. Upon entry, it activates the HIV-1 Tat-controlled luciferase reporter gene.
After the addition of SF162 pseudovirus as above, the samples were incubated for an additional 30 min. TZM/bl cells (50 µl; 2 × 10⁵ cells/ml in growth medium containing 60 µg/ml DEAE-dextran) were added to each well. Each plate included wells with cells and pseudovirus (virus control) or cells alone (background control). The assay was also performed by omitting the first incubation of gp41 with 2F5 or 4E10. The plates were incubated for 48 h, and then 100 µl/well reconstituted Brite Lite Plus (PerkinElmer Life Sciences) was added. The relative luminescence units were measured using a Victor 2 luminometer (PerkinElmer Life Sciences). The percent inhibition due to the presence of the mAb was calculated by comparing relative luminescence values from wells containing mAb with those from the virus control wells. IC50 was calculated for each mAb alone and for mAb pre-mixed with gp41 recombinant proteins or other control competitors. Two independent assays were performed, and the results were averaged (45,46).
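The calculation described above can be sketched in a few lines. The text states only that luminescence from mAb-containing wells was compared with the virus control; the background subtraction and the log-linear interpolation used here to locate the IC50 are assumptions chosen for illustration, not the authors' stated method, and all function names are hypothetical.

```python
import math


def percent_neutralization(rlu_sample, rlu_virus, rlu_background=0.0):
    """Percent reduction in luminescence relative to the virus control.

    Subtracting the cells-only background is an assumption; the text
    only says that mAb wells were compared with the virus control.
    """
    signal = rlu_sample - rlu_background
    control = rlu_virus - rlu_background
    return 100.0 * (1.0 - signal / control)


def ic50_by_interpolation(concs, neut):
    """Interpolate the 50% point on a log10 concentration axis.

    Returns None when the curve never crosses 50%.
    """
    points = sorted(zip(concs, neut))
    for (c_lo, n_lo), (c_hi, n_hi) in zip(points, points[1:]):
        crosses = (n_lo - 50.0) * (n_hi - 50.0) <= 0
        if crosses and n_lo != n_hi:
            frac = (50.0 - n_lo) / (n_hi - n_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


def fold_shift(ic50_with_competitor, ic50_alone):
    """Fold increase in IC50 caused by a competitor such as gp41."""
    return ic50_with_competitor / ic50_alone
```

A competitor that binds the mAb raises the amount of mAb needed for 50% neutralization, so a fold shift above 1 indicates that the competitor is titrating out the antibody.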
In Vitro Display of Soc-gp41 Trimers on Phage T4 Capsid-hoc⁻ soc⁻ phage was purified by velocity sucrose gradient centrifugation. About 2 × 10¹⁰ pfu of purified hoc⁻ soc⁻ phage were centrifuged in 1.5-ml LoBind Eppendorf tubes at 18,000 × g, 4°C for 45 min. The pellets were resuspended in 10 µl of PBS buffer. Purified Soc-gp41 fusion proteins were added at the desired concentration, and the reaction mixture (100 µl) was incubated at 4°C for 45 min. Phage was sedimented by centrifugation as described above, and the pellets were washed twice with 1 ml of PBS and resuspended in 10-20 µl of the same buffer. The sample was transferred to a fresh Eppendorf tube and analyzed by SDS-PAGE. The density volumes of bound and unbound proteins were determined by laser densitometry. The copy number of displayed gp41 was calculated in reference to the known copy number of the major capsid protein gp23* (930 copies per phage) (* represents the cleaved form of the major capsid protein gp23) or the tail sheath protein gp18 (138 copies per phage) in the respective lane. The data were plotted as a one-site saturation ligand binding curve and fitted by non-linear regression using the SigmaPlot 10.0 software.
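As a rough sketch of the quantification described above, the copy number can be estimated from band densities relative to an internal reference band, and the binding data can be fitted to a one-site saturation model, B = Bmax·[L]/(Kd + [L]). Several things here are assumptions rather than details from the text: the normalization of band density by molecular mass, all function and parameter names, and the fitting routine itself, which is a simple grid search standing in for the SigmaPlot non-linear regression used by the authors.

```python
def copy_number(band_density, band_mw_kda, ref_density, ref_mw_kda, ref_copies):
    """Copies per phage of a displayed protein, relative to a reference band.

    Assumes stain density is proportional to protein mass, so density/MW
    is proportional to the number of molecules (an assumption).
    """
    molar_ratio = (band_density / band_mw_kda) / (ref_density / ref_mw_kda)
    return molar_ratio * ref_copies


def one_site(conc, bmax, kd):
    """One-site saturation binding model."""
    return bmax * conc / (kd + conc)


def fit_one_site(concs, bound, kd_min=1e-3, kd_max=1e3, n_grid=2000):
    """Least-squares fit of Bmax and Kd.

    For a fixed Kd the best Bmax has a closed form, so only Kd is
    scanned, on a log-spaced grid (a stand-in for non-linear regression).
    """
    best = None
    for i in range(n_grid + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / n_grid)
        f = [c / (kd + c) for c in concs]
        bmax = sum(b * fi for b, fi in zip(bound, f)) / sum(fi * fi for fi in f)
        sse = sum((b - bmax * fi) ** 2 for b, fi in zip(bound, f))
        if best is None or sse < best[0]:
            best = (sse, bmax, kd)
    return best[1], best[2]
```

With gp23* as the reference, `ref_copies` would be 930; with gp18 it would be 138. The molecular masses and the concentration units for Kd must be supplied by the user and are not given in this section.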

RESULTS
gp41 Recombinant Design-The design of gp41 recombinant proteins has been extremely difficult for several reasons. First, gp41 structure is stabilized by interactions with gp120 in the native envelope trimer (47). Separation from gp120 led to exposure of highly hydrophobic regions such as fusion peptide, HR1 and HR2 helices, and MPER (Fig. 1A). Nonspecific high avidity interactions between these regions during heterologous protein expression led to aggregation of nascent polypeptide chains. Second, a series of interacting residues (hydrophobic and charged) of HR1 and HR2 helices (see Fig. 3A) favor combinatorial, rather than unique, interactions among the polypeptide chains (33). Third, gp41 contains four cysteines (Fig. 1A) that can form nonspecific cross-links, especially in an aggregated state where the tightly packed polypeptide chains exclude water. We hypothesized that these problems can be addressed by rational modification of gp41 sequence and structure by (i) introduction of mutations, (ii) attachment of tags, and (iii) controlling folding kinetics (Fig. 1C).
Mutations-Introduction of mutations that disrupt intramolecular HR1-HR2 interactions should disfavor the formation of hexa-helical bundle and stabilize gp41 in a prehairpin intermediate structure, where the chains would be held by intermolecular HR1-HR1 interactions and the MPER epitopes would be better exposed (30,31).
Tags-Attachment of a trimerization tag such as the phage T4 foldon might help nucleate gp41 folding into a trimer (48). Fusion to Soc, which forms a trimer on T4 capsid, would display gp41 trimers on the phage nanoparticle (43,49).
Folding-Controlling the kinetics of gp41 folding by varying the protein concentration, denaturants, and reducing conditions could channel the folding pathway toward trimer assembly.
Deletion of Immunodominant (ID) Region-We hypothesized that deletion of part of the apical loop between HR1 and HR2 helices (Fig. 2A; aa Gln-577-Thr-605) will have important consequences for gp41 recombinant design. (i) This sequence was reported to consist of ID epitopes (50-52). Although strong Ab responses are directed toward this region, these Abs do not neutralize the virus. On the other hand, they might enhance HIV-1 infection through a complement-mediated mechanism (53,54). Deletion of this region, therefore, could improve the immunogenicity of gp41 by diverting the Ab responses to the relatively poorly immunogenic MPER epitopes (32). (ii) Because this sequence consists of two cysteine residues (Cys-598 and Cys-604), their deletion would minimize disulfide cross-linking and insolubilization. (iii) Deletion of 24 of the 46 aa of the loop would favor the tri-helical prehairpin structure rather than the hexa-helical bundle that requires kinking of the intervening loop (Fig. 1B).
Two near full-length recombinant gp41 proteins were constructed, one with the ID sequence (Soc-gp41) and another without it (Soc-gp41ΔID), containing the fusion peptide, the ectodomain, and the cytoplasmic domain but not the 22-aa transmembrane domain (Leu-684 to Val-705) (Fig. 2A) (transmembrane domain was found to be toxic to E. coli; data not shown). Soc fusions with a 4-aa flexible linker (SASA) in between Soc and gp41 were used in these experiments, because the constructs are eventually displayed on T4 phage (see below). Both Soc-gp41 and Soc-gp41ΔID recombinant proteins were overexpressed in E. coli (~20% of total cell protein) (Fig. 2, B and C, lanes 3) and as predicted partitioned into insoluble inclusion bodies (Fig. 2, B and C; compare lanes 4 of soluble fraction with lanes 5 of insoluble fraction). However, they exhibited distinct solubilization behavior (Fig. 2D); Soc-gp41 could not be solubilized either with 8 M urea or 6 M guanidine hydrochloride (Fig. 2D, upper panel, lanes 4 and 6), whereas Soc-gp41ΔID was nearly completely solubilized under the same conditions (Fig. 2D, lower panel, lanes 4 and 6) and could be purified to near homogeneity by HisTrap affinity chromatography (Fig. 2E). Furthermore, the concentration of urea could be reduced to 2 M, and the protein remained in solution. However, precipitation occurred when the urea concentration was further reduced. On the other hand, the Soc-gp41 protein required SDS, a strong ionic detergent, for solubilization. Even with SDS, only partial solubilization was achieved (Fig. 2D, upper panel, lane 2), and SDS was required throughout purification to maintain solubility.
Mutations in HR1 and HR2 Helices-A series of interactions between HR1 and HR2 helices is central to the assembly of a trimeric envelope structure, and these interactions dynamically change during membrane fusion and virus entry (18,33,55) (Figs. 1B and 3A). These include intermolecular interactions between the HR1 helices leading to trimerization and intramolecular interactions by the looping back of HR2 helices into the hydrophobic grooves between two HR1 helices (Figs. 1B and 3A) (33). We hypothesized that destabilization of the intramolecular interactions would reduce nonspecific aggregation and, importantly, would favor the tri-helical prehairpin structure over the hexa-helical bundle, because the combination of a shortened apical loop and the mutations would make the latter energetically unfavorable.
From the crystal structure of gp41 hexa-helical bundle (Fig. 3A) (33), we identified the interactions that if mutated would weaken the HR1-HR2 interactions but not the HR1-HR1 interactions. For instance, mutation of Arg-557 to Glu would change the electrostatic attraction between Arg-557 and Glu-648 to electrostatic repulsion (Fig. 3B), and introduction of Glu at Leu-568 would disrupt the hydrophobic interactions between Leu-568 and Ile-635 and at the same time create electrostatic repulsion with Glu-634 (Fig. 3C). Using these principles, six mutant clones were constructed in the background of Soc-gp41ΔID, and their solubility was compared (Fig. 3D). All the mutants overexpressed gp41, but Mutant 5 (R557E, L565R, L568E, I635E, L645E) gave the best results, expressing the protein in soluble form (about 40% of the expressed protein was in the soluble fraction; lane 3, marked with an arrow). Hence this construct, namely Soc-gp41 mutant (Soc-gp41M), was selected for further design.
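The substitution labels for Mutant 5 follow the usual wild-type residue, position, new residue convention, which can be made explicit with a small sketch. The five labels are taken from the text; the sparse position-to-residue dictionary and the function names are hypothetical conveniences, and no actual gp41 sequence is assumed beyond the wild-type residues encoded in the labels themselves.

```python
import re

# Substitutions in Mutant 5, as listed in the text.
MUTANT_5 = ["R557E", "L565R", "L568E", "I635E", "L645E"]


def parse_mutation(label):
    """Split a label such as 'R557E' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutations(residues, labels):
    """Return a copy of {position: residue} with the substitutions applied.

    Raises ValueError if the wild-type residue in a label does not match
    the residue recorded at that position.
    """
    mutated = dict(residues)
    for label in labels:
        wild_type, position, new = parse_mutation(label)
        if mutated.get(position) != wild_type:
            raise ValueError(f"{label}: expected {wild_type} at {position}")
        mutated[position] = new
    return mutated


# Sparse wild-type map built only from the labels (hypothetical helper data).
wild_type_sites = {pos: wt for wt, pos, _ in map(parse_mutation, MUTANT_5)}
mutant_sites = apply_mutations(wild_type_sites, MUTANT_5)
print(mutant_sites)
```

Checking the wild-type residue before substituting guards against numbering offsets, which are easy to introduce when a construct carries an N-terminal fusion such as Soc.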
Attempts to purify Soc-gp41M protein from cell lysate, however, were not successful, as it did not bind to the HisTrap column, probably because the protein was misfolded and the histidine tag was buried in the structure. On the other hand, the 8 M urea-solubilized protein bound to the column efficiently and could be purified to >95% purity (Fig. 3E). The protein remained soluble upon "fast" dialysis against PBS (one-step transition from 8 M urea to PBS), but the resultant protein behaved as a very high molecular weight species by size exclusion gel filtration chromatography (Fig. 3F, blue curve). Also, it migrated as a smear by native PAGE (Fig. 3G, lane 1), suggesting that the mutant protein, even though soluble, formed heterodisperse aggregates but not defined oligomers.
Slow Refolding-We then hypothesized that the folding kinetics of the extremely hydrophobic gp41 must be controlled to channel the process toward the correct folding and oligomerization pathway. A number of variables including protein concentration, pH, reducing agents, L-arginine, and "slow" dialysis (see "Experimental Procedures") were optimized to control folding kinetics using native PAGE as an assay (L-arginine suppresses protein aggregation and enhances refolding (56)). Misfolded and aggregated protein would not enter the native gel or migrate as a smear, whereas the folded species would show distinct bands.
Data from a large series of experiments showed that slow dialysis against Tris-HCl buffer (pH 8.0-9.0), protein concentration between 0.25 and 1 mg/ml, 5 mM DTT, and 200 mM L-arginine gave the best results. The gel filtration elution profile of the refolded gp41 under these conditions showed a shift from large aggregates (void volume; Fig. 3F, blue curve) to oligomers (Fig. 3F, pink curve). Native PAGE showed that a portion of gp41 folded into defined oligomers as evident by the appearance of a ladder of bands (Fig. 3G, lane 2, indicated by arrows). However, most of Soc-gp41M still remained as soluble aggregates and stayed near the well (see Fig. 3G, lane 2).
Trimerization Using Foldon Tag-Foldon, a 27-aa trimerization domain of T4 fibritin, has been extensively used to trimerize foreign domains and proteins (31,48). We hypothesized that attaching the foldon sequence to gp41 might nucleate trimerization of gp41 at the initial step of the folding pathway. We constructed Soc-gp41M-Fd as well as Soc-gp41ectoM-Fd in which the cytoplasmic domain was deleted (Fig. 4A). Both the proteins were overexpressed and purified. The results showed that foldon, as predicted, dramatically altered the folding and oligomeric states of gp41, producing trimers and higher order oligomers, and the solubility was also further improved. The Soc-gp41M-Fd protein purified from either the soluble fraction (~500 µg/liter culture, Fig. 4B, lane 2) or the insoluble fraction (~20 mg/liter culture, Fig. 4B, lane 3) behaved similarly, producing trimers and defined oligomers (Fig. 4C, lanes 1 and 2). That the lowermost band in the ladder is a trimer was determined by the elution volume (Fig. 4D) of this species in comparison with the elution volumes of a series of known standard proteins used to calibrate the gel filtration column. The next higher oligomer band in the ladder was determined to be a hexamer. Indeed, unlike the Soc-gp41M, which produced mostly aggregates, essentially all the foldon-attached Soc-gp41M-Fd and Soc-gp41ectoM-Fd proteins were recovered as trimers and oligomers (Fig. 4C, lanes 1-3). The gp41 trimers and oligomers could be separated on a size exclusion column (Fig. 4, D and E). Indeed, fractions containing mostly trimers could be purified by this method. The distribution of the oligomers did not, however, change by a second-round gel filtration of trimer fractions, suggesting that the gp41 subunit interactions are of high avidity and not in a dynamic equilibrium. We speculate that the basic gp41 oligomer unit is a trimer. Hexamers (and higher order oligomers) are most likely dimers (or multimers) of trimers formed by (nonspecific) interactions between gp41 trimers. Although both Soc-gp41M-Fd and Soc-gp41ectoM-Fd gave similar oligomerization patterns (Fig. 4, D and E), we found that a greater fraction of the near full-length gp41 oligomerized into trimers than that of the ectodomain construct (Fig. 4C, compare lanes 2 (Soc-gp41M-Fd) and 3 (Soc-gp41ectoM-Fd); compare Fig. 4, D (Soc-gp41M-Fd), and E (Soc-gp41ectoM-Fd)), suggesting that the bulky cytodomain might have stabilized trimers, probably by restricting trimer-trimer interactions.
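The assignment of the lowest band as a trimer from its elution volume relies on a column calibration against standard proteins. A minimal sketch of that kind of calibration follows, assuming the common approximation that log10 of molecular mass is linear in elution volume. The standards, their elution volumes, the unknown's elution volume, and the monomer mass below are all hypothetical illustrative values, not data from this study.

```python
import math


def fit_calibration(standards):
    """Least-squares line log10(MW) = intercept + slope * elution volume.

    standards: list of (elution_volume_ml, mw_kda) pairs.
    """
    xs = [ve for ve, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mw(elution_volume_ml, slope, intercept):
    """Apparent molecular mass (kDa) at a given elution volume."""
    return 10 ** (intercept + slope * elution_volume_ml)


def oligomer_state(apparent_mw_kda, monomer_mw_kda):
    """Nearest whole number of subunits for an apparent mass."""
    return round(apparent_mw_kda / monomer_mw_kda)


# Hypothetical calibration points (elution volume in ml, mass in kDa).
HYPOTHETICAL_STANDARDS = [(9.0, 669.0), (10.5, 440.0), (12.5, 158.0),
                          (14.0, 75.0), (15.0, 44.0)]

slope, intercept = fit_calibration(HYPOTHETICAL_STANDARDS)
apparent = estimate_mw(12.0, slope, intercept)  # hypothetical unknown peak
print(f"apparent mass: {apparent:.0f} kDa")
```

Because gel filtration separates by hydrodynamic size rather than mass, an extended species would be expected to elute earlier than a compact protein of the same mass, so an estimate of this kind is approximate.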
gp41 Trimers Have a Prehairpin-like Structure-For the reasons described above, the gp41M-Fd mutants are predicted to be stabilized in a prehairpin structure. We tested this prediction by two approaches. First, if the gp41M-Fd construct has a prehairpin-like structure, it should not be recognized by the NC-1 mAb, which is raised against, and specific to, the hexa-helical bundle structure (40,57). However, NC-1 mAb also binds to the trimer of HR1 peptide, probably because its structure is similar to that of the HR1 trimer within the hexa-helical bundle (58). Our ELISA data showed that both the Soc-gp41M-Fd and Soc-gp41ectoM-Fd proteins do not bind to NC-1 mAb, whereas the Soc-gp41 and Soc-gp41ΔID proteins that lacked the mutations bound strongly (Fig. 5A). These data indicate that the structure of Soc-gp41M-Fd trimer is distinct from that of the classic hexa-helical bundle, probably prehairpin-like, consistent with the recent evidence that the HR1 helices are less tightly packed in the pre-fusion state (55). However, because the specific epitope sequence recognized by the NC-1 is unknown, the possibility that the mutations in Soc-gp41M-Fd also affected NC-1 mAb binding cannot be ruled out.
Second, in a prehairpin structure, the groove between HR1 helices would be well exposed. Hence, an externally added HR2 peptide should be able to interact with the groove (16,33). To test this hypothesis, a 34-aa HR2 peptide (C34, 4 kDa) was added to Soc-gp41M-Fd, and the unbound peptide was removed by extensive dialysis using a 10-kDa cut-off membrane. If the gp41 trimer is in the prehairpin state, it would capture the C34 peptide and form a gp41-C34 complex. The results demonstrated that the C34 peptide was retained with gp41 (Fig. 5B, lane 1). In fact, the ratio of gp41 to C34 in the complex remained the same whether the molar amount of C34 used was 2 times that of gp41 (Fig. 5B, lane 1) or 20 times that of gp41 (Fig. 5B, lane 2). On the other hand, the addition of a 36-aa HR1 (N36) peptide resulted in the precipitation of gp41, probably due to uncontrolled HR1-HR1 interactions. Furthermore, the folding pattern of gp41 was unaffected by C34 (Fig. 5C, compare lane 1 without C34 to lane 2 with C34), which means that the conformation of gp41 with and without C34 binding was the same. Because C34 binding to HR1 is expected to occur only in the prehairpin conformation, it can be inferred that gp41 folded into the same conformation even in the absence of C34.

Neutralizing MPER Epitopes Are Well Exposed in gp41 Trimers-The bnAbs 2F5 and 4E10 bind to the conserved MPER epitopes of gp41 and block HIV-1 entry, presumably by arresting fusion at the prehairpin stage where the epitopes would be well exposed and the ectodomain is most extended (31,34,35) (see Fig. 1B). Consistent with this hypothesis, these mAbs have the highest affinity to the prehairpin gp41 intermediate (30,31). If the trimeric Soc-gp41M-Fd and Soc-gp41ectoM-Fd have a prehairpin structure, they should bind to 2F5 and 4E10 mAbs at high affinity and inhibit their ability to neutralize HIV-1 infection. ELISA data showed that both constructs bound strongly to 2F5 and 4E10 mAbs (Fig. 6, A and B). To test whether this interaction can titrate out the mAbs and block their ability to neutralize HIV-1, a virus neutralization competition assay was performed. Soc-gp41M-Fd and Soc-gp41ectoM-Fd were added to the TZM/bl pseudovirus neutralization reaction mixture at varying molar ratios of gp41 to mAb, and the mAb concentrations giving half-maximal inhibition (IC50) were determined. The data demonstrated that both constructs potently inhibited virus neutralization (Fig. 6, C and D). A gp41 concentration as low as 120 nM was sufficient to compete with the virus for binding to 2F5 and 4E10, causing a 7-10-fold rise in IC50 values (Fig. 6, C and D). At a 1:1 molar ratio of gp41 to mAb, 45-76% inhibition of virus neutralization was observed. The near full-length gp41 showed slightly higher inhibition than the ectodomain construct. No significant difference was observed whether gp41 was preincubated with the mAb or added directly to the neutralization mixture. Validating these results, the 23-aa MPER linear peptide, but not the scrambled MPER peptide, inhibited 2F5 neutralization (Fig. 6, C and D). Also, the MPER peptide did not affect 4E10 neutralization, consistent with the fact that the 4E10 mAb recognizes a conformational epitope. Neither the gp41 cytoplasmic domain (Soc-cyto) nor Soc controls showed significant inhibition, attesting to the specificity of gp41-2F5/4E10 interactions. These results further support that the trimeric gp41M-Fd constructs are stabilized in a prehairpin structure exposing the MPER neutralization epitopes in a functionally relevant conformation.

FIGURE 5. … Binding to NC-1 mAb was tested as described under "Experimental Procedures." B and C, binding of HR2 peptide to Soc-gp41M-Fd is shown. B, SDS-PAG (12%) shows the HR2 peptide C34 bound to Soc-gp41M-Fd. The C34 peptide was added to Soc-gp41M-Fd at a molar ratio of 2 or 20 times C34 to gp41 molecules, and gp41 was refolded according to the procedure described under "Experimental Procedures." The unbound peptide was removed by extensive dialysis using a 10-kDa cut-off membrane. Lane 3, 0.4 μg of C34 peptide used as size standard. C, native PAG (4-20% gradient) shows the oligomeric state of Soc-gp41M-Fd with or without the addition of C34 peptide (1:20 molar ratio of Soc-gp41M-Fd to C34). The samples were electrophoresed before removing excess C34 by dialysis. Lane 3, 3 μg of C34 peptide was used as size standard. The NC-1 mAb and C34 peptide were provided by the AIDS Research and Reference Reagent Program, Division of AIDS, NIAID, National Institutes of Health.

FIGURE 6. Inhibition of virus neutralization by gp41 trimers. A and B, the Soc-gp41M-Fd and Soc-gp41ectoM-Fd bind to 2F5 (A) and 4E10 (B) mAbs. The microtiter plates were coated with Soc-gp41M-Fd, Soc-gp41ectoM-Fd, or Soc control proteins. Binding to 2F5 and 4E10 mAbs was determined by ELISA as described under "Experimental Procedures." C and D, virus neutralization was determined by the TZM/bl assay (45,46). Serial dilutions of purified 2F5 (C) or 4E10 (D) IgG were added to 96-well plates. gp41 trimers or other control competitors were added to the mAb and incubated for 30 min at 37°C. SF162 virus was added to the plate and incubated for 30 min at 37°C followed by the addition of TZM/bl cells. After incubation for 48 h at 37°C, the cells were lysed, and the concentration of half-maximal inhibition (IC50) was calculated from the luciferase activities determined by luminescence measurements. The sequence of MPER peptide is LELDKWASLWNWFNITNWLWYIK(amide), and that of MPER scrambled peptide is LSINEAFKWLDWWTLNDLWYIWK(amide). Soc-cyto is the fusion of the cytoplasmic domain of gp41 to the C terminus of Soc. The protein was over-expressed and purified from E. coli after 8 M urea denaturation followed by refolding as described under "Experimental Procedures."

Display of gp41 Trimers on the Bacteriophage T4 Nanoparticle-The surface of the T4 capsid is decorated with 870 copies of a small outer capsid protein, Soc (9 kDa) (43,59). Soc is a monomer in solution but trimerizes upon binding to capsid at the quasi-3-fold axes (Fig. 7A). Each Soc molecule binds to two gp23* major capsid protein subunits, clamping adjacent capsomers and reinforcing the capsid structure. Both the C and N termini are exposed on the capsid surface, with the C termini at the quasi-3-fold axes and the N termini at the quasi-2-fold axes (Fig. 7A). We hypothesized that by fusing gp41 to the C terminus of Soc and displaying it on T4, the trimeric gp41 would be stably displayed at the 3-fold axes of the phage capsid. Such particles with arrays of gp41 trimers would allow structure-function studies as well as enhance immunogenicity (64). The gp41 trimers assembled on hoc⁻soc⁻ capsids nearly as efficiently as native Soc (Fig. 7, B-F). Soc-gp41M-Fd binding increased with increasing ratios of Soc-gp41 molecules to capsid binding sites, reaching saturation at a ratio of ~20:1 (Fig. 7B). The apparent binding constant (Kd) calculated from the saturation binding curve (Fig. 7E) was 121 nM, and the maximum copy number of bound gp41 (Bmax) was about 859 per capsid, which is close to the copy number of 870 when all the Soc binding sites are occupied. Similar binding behavior as well as Kd and Bmax values were observed for Soc-gp41ectoM-Fd (Fig. 7, C and F).

FIGURE 7. … (43,59) is shown. The structure of Soc trimer at quasi-3-fold axes is shown. The N and C termini are labeled, and the C terminus of each subunit is shown as a red dot. B, binding of Soc-gp41M-Fd on phage T4 is shown. About 2 × 10^10 hoc⁻soc⁻ phage particles were incubated with increasing ratios of Soc-gp41M-Fd molecules to capsid binding sites (1:1 to 40:1, labeled at the top), and assembly was carried out as described under "Experimental Procedures." Lanes: 1, control hoc⁻soc⁻ phage; 2, 4, 6, 8, 10, and 12, phage displaying the bound fusion protein Soc-gp41M-Fd (B); 3, 5, 7, 9, 11, and 13, unbound protein in the supernatant (U). The position of the major capsid protein gp23* is marked with a black arrow. C, binding of Soc-gp41ectoM-Fd on phage T4 at a Soc-fusion protein to capsid binding sites ratio of 20:1 is shown. The bound Soc-gp41ectoM-Fd protein is indicated with a red arrow (lane 2). D, binding of CPP-Soc-gp41M-Fd on phage T4 at a Soc-fusion protein to capsid binding sites ratio of 20:1 is shown. Note that the 49-kDa CPP-Soc-gp41M-Fd protein migrates to the same position as the 48.7-kDa gp23* (indicated with a red arrow). E, the saturation binding curve of Soc-gp41M-Fd is shown. The density volumes of bound and unbound proteins from SDS-PAG (12%) were determined by laser densitometry and normalized to that of gp23* present in the respective lane. The copy numbers were determined in reference to gp23* (930 copies per capsid). The data were plotted as a one-site saturation ligand binding curve and fitted by non-linear regression using the SigmaPlot 10.0 software, and the calculated binding parameters are shown. Kd, apparent binding constant; Bmax, maximum copy number per phage particle. F, shown are the binding parameters of Soc and Soc-gp41 fusion recombinants. Because the CPP-Soc-gp41M-Fd band overlapped with the gp23* band, gp23* density was subtracted, and the copy number was determined in reference to the tail sheath protein, gp18 (138 copies per phage; marked with a black arrow in panel D, lane 2).
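The one-site saturation model named in the Fig. 7E legend can be illustrated with a minimal Python sketch. It only evaluates the standard one-site equation, bound = Bmax × [L] / (Kd + [L]), using the Kd (121 nM) and Bmax (859 copies per capsid) reported above; it does not reproduce the SigmaPlot regression. The function name and the free-protein concentrations are hypothetical and chosen for illustration only.

```python
def one_site_binding(ligand_nM: float, bmax: float, kd_nM: float) -> float:
    """Copies bound per capsid under a one-site saturation model.

    bound = Bmax * [L] / (Kd + [L])
    """
    return bmax * ligand_nM / (kd_nM + ligand_nM)


KD_NM = 121.0  # apparent binding constant reported in the text
BMAX = 859.0   # maximum copy number per capsid reported in the text

# Illustrative free Soc-gp41M-Fd concentrations (assumed, not from the paper).
for conc_nM in (30.0, 121.0, 500.0, 2000.0):
    bound = one_site_binding(conc_nM, BMAX, KD_NM)
    print(f"{conc_nM:7.1f} nM -> {bound:6.1f} copies ({bound / BMAX:.0%} of Bmax)")
```

By construction, the model gives half of Bmax when the free protein concentration equals Kd, which is the sense in which Kd summarizes the binding curve.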
To further improve the gp41 nanoparticle design, a 13-aa cell penetration peptide (CPP), CPP-Tat (PGRKKRRQRRPPQ), was attached to the N terminus of Soc-gp41. CPPs are 10-30-aa peptides rich in basic aa that facilitate passage of attached cargo molecules across the cell membrane (60). The CPP-Tat, derived from the HIV-1 trans-activator protein, TAT, is one of the most efficient CPPs (60). Our recent experiments show that T4 particles displaying targeting molecules attached to Soc are taken up by cells at high efficiency.³ CPP-Soc-gp41M-Fd could be over-expressed, purified, and bound to T4 capsid efficiently, and its binding parameters were similar to those of the other Soc-gp41 fusions (Fig. 7, D and F). Thus, CPP or another molecule such as the CD40 ligand (61) can be oriented at the quasi-2-fold axes for targeting of the nanoparticle to antigen presenting cells such as the dendritic cells.

DISCUSSION
Although the key interactions between HIV-1 and host cell have been well established, the extraordinary genetic diversity of the viral envelope and masking of essential epitopes by glycosylation have made it difficult to design recombinants that can induce protective immune responses (62,63). However, HIV-1, like many type 1 fusion viruses, undergoes dynamic transitions during entry, exposing some of the vulnerable sites on the cell surface and making them accessible to therapeutics and neutralizing Abs. The prehairpin intermediate is one such target because it is relatively stable, with a half-life on the order of several minutes (19), its ectodomain is most extended, and the conserved neutralization epitopes are most exposed (Fig. 1B) (30,31,62,63). Indeed, Enfuvirtide, a potent 36-aa entry inhibitor approved for clinical use (64), and a series of bnAbs, such as 2F5 and 4E10, arrest virus entry by binding to this intermediate. Design of gp41 recombinants stabilized in a prehairpin structure, therefore, will have important implications for understanding the mechanism as well as for development of effective therapeutics and vaccines.
The extremely hydrophobic gp41 is notoriously prone to aggregation, and attempts to produce soluble gp41 have not been successful (36). Previous studies could only produce short truncated parts of the gp41 ectodomain, most containing only the HR1 and HR2 helices (31,38,40,65). These and other synthetic peptide mimics could not elicit potent bnAbs, leading to the hypothesis that other gp41 structural and functional motifs might be essential to mimic the true prehairpin conformation (see Fig. 1B) (41,42). These might include, in addition to HR1/HR2 helices and MPER, the fusion peptide at the N terminus and the cytoplasmic domain at the C terminus, but none of the gp41 recombinants produced so far included these highly hydrophobic regions.
We hypothesized that three key problems should be addressed to generate a soluble trimeric gp41 stabilized in a prehairpin structure (Fig. 1C). First, the intermolecular interactions between HR1 and HR2 helices that lead to hexa-helical bundle formation as well as nonspecific aggregation should be disrupted to stabilize the molecule in a three-stranded coil. We achieved this by deleting part of the apical loop and the five C-terminal aa of the HR1 helix as well as converting some of the complementary charge-charge and hydrophobic interactions into electrostatic repulsion, leaving intact the MPER epitope residues. These modifications greatly enhanced the solubility of gp41, but only a small fraction of the protein oligomerized into trimers (Fig. 8). Second, attachment of a foldon tag, which has a strong propensity to trimerize, was necessary to trimerize gp41. Presumably, the foldon helped nucleate gp41 folding and assembly into a trimer. Because the tag is present at the C-terminal end, trimerization was probably initiated at this end and propagated through the rest of the molecule, leading to folding of the protein into a three-stranded coiled coil through the strong HR1-HR1 interactions. Third, kinetically slowing down this process at relatively low protein concentration was also necessary; otherwise, nonspecific inter-chain interactions presumably channeled the protein into abortive folding pathways leading to rapid and uncontrolled aggregation.
Although our approaches yielded predicted outcomes (Fig.  8), each approach by itself was insufficient to produce gp41 trimers. For instance, introduction of mutations greatly improved solubility, but the protein chains still coalesced into aggregates because folding was not trimer-directed. Both trimerization tag attachment and slow refolding were necessary to correct this problem. Although hexamers and higher order oligomers were produced in addition to trimers, the core structure of all the oligomers appears to be a trimer, and the higher order oligomers are probably multimers of trimers formed by nonspecific interactions between trimers. This is not unexpected because several hydrophobic patches would be exposed in the gp41 ectodomain, which would otherwise be stabilized by interactions with the gp120 domains in the native spike. These would lead to multimerization of trimers, a commonly observed phenomenon even with the gp140 trimers produced by heterologous expression systems where only short regions of the gp41 ectodomain are exposed.
Evidence indicates that the gp41 trimers have a structure mimicking the prehairpin intermediate, in which the external grooves of the three-stranded HR1 helices are not occupied by HR2 helices (16,33). Consistent with this prediction, a 34-aa HR2 peptide bound to the gp41 trimers, and the oligomerization pattern was identical with or without the peptide. The gp41 trimers did not bind to NC-1 mAb, which is specific to the hexa-helical bundle structure, but bound strongly to bnAbs 2F5 and 4E10, which have the highest affinity to the prehairpin structure (30,31). The trimers potently inhibited 2F5 and 4E10 virus neutralization even at an equimolar ratio of gp41 to mAb and in the presence of excess virus, suggesting that the MPER epitopes are well exposed, as would be expected in a prehairpin intermediate (30,31).
The potential use of gp41 trimer as an immunogen can be further enhanced by linking the recombinants to a robust platform that can induce strong immune responses. The bacteriophage T4 display provides a simple yet powerful strategy to convert soluble antigens into nano-particulate antigens by attaching Soc to one end of the antigen (66,67). We previously showed that such nanoparticles displaying HIV-1 Gag p24 and other antigens induced strong Ab as well as cellular responses (67,68). Attachment of Soc to the N terminus did not interfere with the folding or trimerization of gp41 nor did it affect binding to T4 capsid. Indeed, the Soc binding sites on the capsid were nearly saturated, resulting in the decoration of T4 phage with ~290 trimers of gp41. Because Soc C termini are projected outward at the quasi-3-fold axes (Fig. 7A) (43), the C-terminally attached gp41 trimers would be extending away from the capsid surface (Fig. 8), thereby exposing the MPER epitopes for capture by antigen presenting cells. Furthermore, we show that additional targeting ligands, such as CPP, can be incorporated into the displayed gp41 to enhance the uptake of the T4-gp41 particles and potentially induce robust immune responses.
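The figure of about 290 trimers per capsid follows from simple arithmetic on numbers given in the Results: 870 Soc binding sites per capsid, three Soc-gp41 subunits per trimer, and a fitted Bmax of about 859 copies. The short Python sketch below spells this out; the helper names are hypothetical, and the calculation assumes that every bound copy is part of a complete trimer.

```python
SOC_SITES_PER_CAPSID = 870  # Soc binding sites per T4 capsid (from the text)
BMAX_COPIES = 859           # fitted maximum copies of Soc-gp41M-Fd per capsid (from the text)
SUBUNITS_PER_TRIMER = 3     # one gp41 trimer per quasi-3-fold Soc trimer


def trimer_count(copies: float) -> float:
    """Number of trimers, assuming every bound copy belongs to a full trimer."""
    return copies / SUBUNITS_PER_TRIMER


def occupancy(copies: float, sites: int = SOC_SITES_PER_CAPSID) -> float:
    """Fraction of Soc binding sites occupied."""
    return copies / sites


print(f"Trimers at full occupancy: {trimer_count(SOC_SITES_PER_CAPSID):.0f}")
print(f"Trimers at fitted Bmax:    {trimer_count(BMAX_COPIES):.0f}")
print(f"Site occupancy at Bmax:    {occupancy(BMAX_COPIES):.1%}")
```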
In conclusion, using molecular genetics and biochemical approaches, a series of hypotheses were tested (Fig. 8), leading to the generation of soluble near full-length gp41 trimers containing the fusion peptide, the ectodomain, and the cytoplasmic domain as well as the same arrayed on phage nanoparticles. These, for the first time, allow structure determination of this critical intermediate, screening for novel therapeutics, development of new diagnostics, and design of gp41-based HIV-1 vaccines. For instance, the gp41 trimers could be used for screening peptides that exhibit high affinity to the prehairpin structure and effectively block HIV-1 infection. The soluble near full-length gp41 might be an attractive candidate for detection of gp41 Abs in HIV-infected individuals. The recent RV144 trial showed a correlation between protection against HIV-1 infection and generation of Abs to the gp120 variable loop V2 (6,69). The near full-length gp41 trimers can be used in conjunction with gp120 to further improve the immunogenicity of the vaccine to induce binding and neutralizing Abs as well as cellular responses.

FIGURE 8. Recombinant designs leading to soluble and nanoparticle arrayed gp41 trimers. A flow chart shows a series of approaches to generate soluble as well as phage T4 nanoparticle arrayed gp41 trimers. Schematic diagrams of soluble and displayed trimers are shown at the bottom. The trimers are stabilized in a prehairpin-like structure in which the HR1 helical grooves and MPER epitopes would be well exposed. Shown on the right is an enlarged cut-out of the capsid decorated with gp41 trimers. See "Results" and "Experimental Procedures" for additional details.