Fully hydrophobic HIV gp41 adopts a hemifusion-like conformation in phospholipid bilayers

The HIV envelope glycoprotein mediates virus entry into target cells by fusing the virus lipid envelope with the cell membrane. This process requires large-scale conformational changes of the fusion protein gp41. Current understanding of the mechanisms with which gp41 induces membrane merger is limited by the fact that the hydrophobic N-terminal fusion peptide (FP) and C-terminal transmembrane domain (TMD) of the protein are challenging to characterize structurally in the lipid bilayer. Here we have expressed a gp41 construct that contains both termini, including the FP, the fusion peptide–proximal region (FPPR), the membrane-proximal external region (MPER), and the TMD. These hydrophobic domains are linked together by a shortened water-soluble ectodomain. We reconstituted this “short NC” gp41 into a virus-mimetic lipid membrane and conducted solid-state NMR experiments to probe the membrane-bound conformation and topology of the protein. 13C chemical shifts indicate that the C-terminal MPER-TMD is predominantly α-helical, whereas the N-terminal FP-FPPR exhibits β-sheet character. Water and lipid 1H polarization transfer to the protein revealed that the TMD is well-inserted into the lipid bilayer, whereas the FPPR and MPER are exposed to the membrane surface. Importantly, correlation signals between the FP-FPPR and the MPER are observed, providing evidence that the ectodomain is sufficiently collapsed to bring the N- and C-terminal hydrophobic domains into close proximity. These results support a hemifusion-like model of the short NC gp41 in which the ectodomain forms a partially folded hairpin that places the FPPR and MPER on the opposing surfaces of two lipid membranes.

HIV-1 enters sensitive cells using its envelope glycoprotein complex, gp160. The protein is biosynthesized as a homotrimer and is transported to the cell surface after proteolytic cleavage into two subunits, gp120 and gp41 (1,2). Receptor binding and pH changes trigger a cascade of protein conformational changes that lead to the merger of the virus lipid envelope with the cell membrane (3). The first change involves unfolding of gp41 to expose and insert an N-terminal fusion peptide (FP) into the host cell membrane while the C-terminal TMD remains anchored in the virus envelope. This extended intermediate then bends into a helical hairpin, forming a trimer of hairpins that brings the cell membrane and the viral envelope into proximity (4,5). Subsequently, the membrane-interacting FP and TMD disrupt the two bilayers in ways that are not yet well-understood, which merge the outer leaflet and inner leaflet of the two lipid bilayers progressively, eventually forming a fusion pore.
The trimer of hairpins represents a six-helix bundle (6HB), which is diagnostic of the end point of virus-cell fusion and which has been observed for the ectodomain of a number of viral fusion proteins (4,5). For gp41, the interior of the 6HB is composed of the NHR domain assembled as a parallel coiled coil, whereas the exterior of the 6HB is composed of the CHR, antiparallel to the NHR (6,7). However, to date, no crystal structures of the post-fusion gp41 include the membrane-active TMD and FP domains. Instead, these hydrophobic domains are either removed from the protein construct to enable crystallization, replaced by a different anchor, or retained in the protein but not detected due to disorder. Therefore, it is not known whether the 6HB of the ectodomain continues into the lipid membrane to put the FP and TMD in close proximity. This question is further amplified by the fact that gp41, like some other viral fusion proteins, contains a membrane-proximal external region (MPER) N-terminal to the TMD. This MPER has the propensity to lie on the membrane surface (8,9), but how it packs with the FP and TMD in the post-fusion state is unknown.
Extensive NMR studies of the conformations of the FP and TMD of viral fusion proteins have shown that these two domains are not always α-helical in biologically relevant lipid bilayers. For example, in cholesterol-containing lipid membranes, the gp41 FP assembles into antiparallel β-sheets that are shallowly inserted into the membrane (10–12). This conformation is seen both in the presence and in the absence of the ectodomain (11,13). In comparison, the TMD is stably α-helical in lipid bicelles (14,15) and in lipid bilayers, but the oligomeric state is sensitive to the membrane environment. In cholesterol-containing lipid bilayers, the MPER-TMD forms a trimeric umbrella-like structure in which the helical TMD spans the bilayer, whereas the helical MPER lies on the membrane surface (8). In comparison, the TMD of the parainfluenza virus 5 (PIV5) fusion protein F is conformationally plastic even in phospholipid bilayers. In negative-curvature phosphatidylethanolamine membranes, the two ends of the PIV5 TMD adopt the β-strand conformation, but in lamellar phosphatidylcholine membranes, the entire TMD is α-helical (16). Given the ability of the FP and TMD of viral fusion proteins to adopt β-sheet conformations under certain conditions, it is important to determine the membrane-bound structure of these two domains in the presence of each other and in the presence of the ectodomain.
In addition to the conformation of each hydrophobic domain, the three-dimensional fold of full-length gp41 and how it changes along the fusion pathway are still poorly understood. Due to the difficulty of producing full-length gp41 and other viral fusion proteins with sufficiently high order for crystal structure determination, or in sufficient quantity for NMR studies in phospholipid bilayers, most studies of viral fusion proteins have so far involved truncating the protein and using simple membrane mimetics, such as micelles and bicelles. Because the very function of viral fusion proteins requires protein conformational changes and membrane curvature changes, truncated protein constructs that deviate from the ability of the full-length protein to undergo conformational changes and membrane mimetic solvents that do not fully reproduce the bilayer curvature can confound mechanistic understanding of virus-cell fusion.
Recently, several studies have been reported in which both the ectodomain and the membrane-interacting domains of gp41 were included. A solution NMR study of full-length gp41 bound to DPC micelles showed α-helical structures for the N-terminal FP, FPPR, NHR, and the immunodominant loop. In contrast, no signals were detected for the C-terminal CHR, MPER, and TMD (17), indicating that the C-terminal half of the protein undergoes slow- or intermediate-timescale motion. Analytical ultracentrifugation data of a nearly full-length construct that lacks only the FP showed that the protein is trimeric in DPC micelles, suggesting that the NHR may be the trimerization core of gp41 (18). When both the N-terminal FP-FPPR and the C-terminal MPER-TMD were removed, the remaining NHR-CHR protein was monomeric, indicating that the hydrophobic domains are required for trimerization (19,20). Although these studies approach full-length gp41, the minimum structural requirement for trimerization and the three-dimensional fold of the full-length protein are still inconclusive.
In the present study, we investigate the structure of a gp41 construct that contains both the N- and C-terminal hydrophobic domains. The protein includes the FP-FPPR, the MPER-TMD, and an intervening shortened NHR-CHR segment, designed to simplify spectral assignment and analysis (Table 1).
We incorporated this "short NC" gp41 into a cholesterol-containing virus-mimetic lipid membrane and measured conformation-dependent 13C chemical shifts, membrane insertion depths, and three-dimensional fold. We show that when both hydrophobic termini are present, the FP-FPPR predominantly adopts a β-strand conformation, whereas the MPER-TMD is α-helical. The β-strand FP is partially inserted into the membrane, whereas the α-helical TMD is well-inserted into the bilayer. Importantly, we observed long-range correlations between the MPER and FP-FPPR, providing evidence for close proximity of the two hydrophobic termini. These data suggest a hairpin-like fold of the protein that is associated with two lipid bilayers, thus resembling a hemifusion intermediate.

gp41 short NC has fusion activity and generates membrane curvature
To determine whether the short NC gp41 construct has fusion activity, we conducted lipid mixing assays. Fluorescence spectra (Fig. 1a) of dye-labeled vesicles mixed with unlabeled vesicles show an increase of the fluorescence intensities within 10 min, indicating that the two vesicle populations are well-mixed by the protein. Mixing is detected for both POPE vesicles and VMS(−) vesicles at low pH, but at neutral pH, minimal lipid mixing is detected for the POPE membrane. This result is consistent with the known higher fusion activity of gp41 at acidic pH, which has been attributed to stronger affinity of the protein for the lipids (21–23). Between the two lipid membranes, the POPE sample shows higher fusion than the VMS(−) membrane, suggesting that spontaneous negative curvature of the membrane may promote fusion.
To assess the global conformation of short NC gp41, we measured the CD spectra of the protein in DPC micelles and in lipid vesicles. The DPC-bound protein exhibits a strong α-helical signature at both low and high pH (Fig. 1b). In POPC/POPG membranes, the protein has slightly lower helicity, and the helical content is higher with increasing concentrations of negatively charged POPG lipids.
To investigate whether this short NC gp41 causes membrane curvature and dehydration, we measured static 31P spectra and 2D 1H-31P HETCOR spectra (Fig. 2). The VMS(−) and POPE membranes show a uniaxial powder pattern characteristic of lamellar membranes, and the spectra are unperturbed by the protein (Fig. 2a). In comparison, the protein induced a strong isotropic peak in the inverse-hexagonal-phase DOPE membrane, indicating that gp41 alters the curvature of a membrane that possesses significant negative spontaneous curvature. The protein-induced isotropic peak is indicative of negative Gaussian curvature (16,24), which is the membrane topology necessary for hemifusion and fusion-pore formation. The generation of this isotropic peak in the DOPE membrane is similar to the behavior of the PIV5 fusion protein TMD, which induces lipid cubic phases in DOPE membranes (16). 2D 1H-31P HETCOR spectra of all three membranes show clear water cross-peaks with the lipid headgroups in the presence of gp41 (Fig. 2b), indicating that the protein does not dehydrate the membrane surface.

Domain-specific backbone conformations of gp41
To investigate the backbone conformation of membrane-bound gp41, we measured 2D 13C-13C correlation spectra of uniformly (U)-13C-labeled and 1,3-13C-labeled proteins. The amino acid sequence of short NC gp41 (Table 1) indicates that certain residue types are enriched in one of the domains. For example, the FP contains six of the nine Ala residues in the protein, whereas the FPPR contains three of the four Thr residues. Conversely, the MPER harbors five of six Trp residues in the protein, whereas the TMD contains half of the Ile and Val residues. Therefore, the chemical shifts of these residue types serve as reporters of the backbone conformation of these domains when single-site resolution is challenging to obtain due to the broad linewidths of the protein in the cholesterol-rich lipid membrane.

Figure 2. Effects of gp41 on membrane curvature and hydration. a, static 31P spectra of VMS(−), POPE, and DOPE membranes in the absence (black) and presence (red) of the protein. 3 mg of unlabeled protein was bound to each membrane at a P/L of 1:60. The protein did not perturb the VMS(−) membrane and only exerted a small impact on the POPE membrane curvature. In comparison, the protein caused a strong isotropic peak in the DOPE membrane, indicating the generation of negative Gaussian curvature. b, 2D 1H-31P MAS correlation spectra of three protein-bound membranes. All three membranes show water-31P cross-peaks, indicating that the membrane surface is well-hydrated.

HIV gp41 conformation from SSNMR
We measured 2D 13C-13C correlation spectra using 13C spin diffusion mixing times of 55 and 300 ms for the U-13C-labeled gp41 and 100 ms for the 1,3-13C-labeled gp41 (Fig. 3 and Fig. S1). In all three spectra, characteristic chemical shifts of many amino acid residues are observed that allowed the evaluation of the conformations of the different domains in the protein (Table S1). We use the cross-peak intensities in the 55-ms spectrum (Fig. S1) and the 100-ms spectrum (Fig. 3a) to estimate the relative number of residues in different secondary structures, whereas the 300-ms spectrum is useful for identifying inter-residue cross-peaks (Fig. 3b). In the 1,3-13C-labeled spectrum, a β-sheet Thr Cα-Cγ cross-peak at (59.0, 19.4) ppm and a Cβ-Cγ cross-peak at (69.0, 19.4) ppm are observed. These peaks are well-separated from a weaker α-helical Thr Cα/Cβ-Cγ cross-peak at (65.5, 19.9) ppm. The sheet/helix intensity ratio is about 2.7:1.0. Because three Thr residues exist in the FPPR, whereas one Thr exists in the MPER, this ratio suggests that the FPPR predominantly adopts the β-sheet conformation. Consistently, the 55-ms spectrum of the U-13C-labeled gp41 shows a stronger β-sheet Thr Cβ-Cα cross-peak than α-helical peak (Fig. S1), supporting the assignment of the β-strand conformation for the FPPR. The 55-ms spectrum also displays well-resolved β-sheet and α-helical Ala Cα-Cβ cross-peaks at (48.7, 21.3) ppm and (53.0, 26.0) ppm, respectively (Fig. S1), with the β-sheet signal being 2-fold higher than the α-helical signal. Because six of the nine Ala residues in the protein are located in the FP-FPPR, this intensity distribution again suggests that the FP-FPPR domains adopt a β-strand conformation.
The conformation of the MPER-TMD domain can be assessed through Ile cross-peaks. Ile Cα-Cγ1 and Cα-Cγ2 cross-peaks are observed at α-helical chemical shifts such as (63.5, 15.3) ppm (Fig. 3a and Fig. S1). These peaks are 2-3-fold stronger than the β-sheet Ile peaks at chemical shifts such as (57.6, 15.8) ppm. Six of eight Ile residues lie in the MPER-TMD; therefore, this intensity distribution indicates that the MPER-TMD is predominantly α-helical. In the carbonyl region of the 55-ms 2D spectrum, an α-helical Ala Cβ-C′ cross-peak at (16.0, 178.0) ppm and an Ile Cγ2-C′ cross-peak at (15.3, 175.1) ppm have an intensity ratio of 1:3. The MPER-TMD region contains two Ala and six Ile residues. Thus, this intensity ratio provides further support to the assignment of an α-helical conformation to the MPER-TMD.
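The residue-counting argument used for Thr, Ala, and Ile can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original analysis. It assumes that every residue of a given type in the enriched domain adopts that domain's dominant conformation, that all remaining residues of that type adopt the other conformation, and that each residue contributes equally to the cross-peak intensity. The function name `expected_ratio` and the midpoint of 2.5 used for the "2-3-fold" Ile ratio are hypothetical choices made for this example.

```python
def expected_ratio(n_enriched: int, n_total: int) -> float:
    """Expected intensity ratio between the enriched domain and the rest.

    Assumes equal signal per residue and one conformation per domain.
    """
    n_other = n_total - n_enriched
    if n_other <= 0:
        raise ValueError("need at least one residue outside the enriched domain")
    return n_enriched / n_other


# (residues in enriched domain, total residues, observed ratio quoted in the text)
RESIDUES = {
    "Thr sheet/helix": (3, 4, 2.7),  # three of four Thr lie in the FPPR
    "Ala sheet/helix": (6, 9, 2.0),  # six of nine Ala lie in the FP-FPPR
    "Ile helix/sheet": (6, 8, 2.5),  # six of eight Ile lie in the MPER-TMD (assumed midpoint)
}

for label, (n_enriched, n_total, observed) in RESIDUES.items():
    expected = expected_ratio(n_enriched, n_total)
    print(f"{label}: expected {expected:.1f}, observed ~{observed:.1f}")
```

Under these simplifying assumptions, the expected ratios are 3.0, 2.0, and 3.0, close to the reported values of about 2.7, 2, and 2-3.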
The 300-ms spectrum revealed a number of inter-residue cross-peaks (Fig. 3b). Although these inter-residue 13C-13C cross-peaks cannot be assigned in a sequence-specific manner without three-dimensional 15N-13C correlation spectra, most of the cross-peaks correspond to residue types that exist as sequential pairs in the protein. Thus, we tentatively assign them to sequential contacts. The majority of these inter-residue cross-peaks are observed at α-helical chemical shifts, whereas the β-sheet inter-residue peaks are fewer and weaker. This is consistent with the fact that the distances between two sequential β-strand residues are much longer than those between two helical residues. We observed inter-residue Ala-Val cross-peaks at both α-helical and β-sheet chemical shifts, which is consistent with the fact that sequential Ala-Val pairs exist in both FP and TMD domains. An Ala Cα cross-peak with Gly Cα is also observed at the β-sheet chemical shift of (48.7, 42.6) ppm. The only region of the protein that contains sequential Gly-Ala pairs is the FP-FPPR (Table 1). Therefore, this β-sheet cross-peak provides unambiguous evidence that the β-sheet Ala residues occur in the N-terminal FP-FPPR domains, whereas the helical Ala cross-peaks result from the MPER-TMD domains.
Taken together, the relative intensities of the α-helical and β-sheet cross-peaks and the inter-residue correlations in these 2D spectra support the conclusion that the MPER-TMD is predominantly α-helical, whereas the FP-FPPR mainly adopts a β-strand conformation in the cholesterol-rich membrane.

Membrane-bound topology of gp41 from lipid-protein and water-protein polarization transfer
To investigate the depth of insertion of gp41 into the lipid bilayer, we measured 2D 1H-13C correlation spectra of the 1,3-13C-labeled protein in the VMS(−) membrane. 1H spin diffusion mixing times of 25-225 ms were used to transfer the lipid and water 1H magnetization to the protein. Fig. 4 shows a representative 2D spectrum, measured using 25-ms mixing. Comparison of the lipid chain CH2 cross-section and the water cross-section shows that the 53-ppm Cα peak of β-strand residues is preferentially enhanced in the water cross-section compared with the lipid cross-section, whereas the 56-ppm α-helical Cα signal is stronger in the lipid cross-section than the water cross-section. This difference indicates that the α-helical residues of the protein are more inserted into the membrane than the β-strand residues. The water-to-protein and lipid-to-protein buildup curves quantify this observation (Fig. 4c), and the lipid-to-protein transfer curves show clear differences for different residues. For example, the α-helical Ile Cα shows the fastest lipid buildup intensities, whereas the helical Ser Cβ signal at 61 ppm shows the slowest lipid buildup. Because helical Ile residues are mainly located in the TMD, whereas Ser residues are mostly found in the NHR-CHR domain, these differential lipid-protein magnetization transfer rates indicate that the TMD is well-inserted into the bilayer, whereas the ectodomain is exposed to water. Interestingly, the β-strand Thr peaks at 59 and 69 ppm, and the 53-ppm β-strand signals show slow lipid buildup, indicating that the N-terminal β-strand domains are not deeply inserted into the membrane. Compared with the lipid-to-protein polarization transfer rates, the water buildup curves show less variation among different residues. Nevertheless, they display a trend approximately opposite that of the lipid-to-protein buildup curves; the 56- and 64-ppm α-helical peaks have slower water buildup rates than the 53- and 59-ppm β-sheet peaks, indicating that the α-helical C terminus of gp41 is more inserted into the membrane, whereas the β-sheet rich N terminus is more exposed to water.

Table 1. Amino acid sequence and distribution of gp41 short NC
Figure 3. 2D 13C-13C correlation spectra of VMS(−) membrane-bound gp41 for conformational analysis. a, 100-ms spin diffusion spectrum of 1,3-13C-labeled gp41. Residue-type assignment is indicated together with the secondary structure motif in brackets. H, helix; C, coil; S, sheet. Green dashed rectangles indicate cross-peaks whose intensities are used to derive the conformational propensity of various domains. b, 300-ms spectrum of U-13C-labeled gp41. Cross-peaks are assigned in pink, black, and blue for α-helical, random coil, and β-sheet chemical shifts, respectively. The two spectra were measured at 263 K using 7 mg of 1,3-13C-labeled protein and 4 mg of U-13C-labeled protein.

Figure 4. [...] 13C cross-sections at the lipid CH2 (orange) and water (blue) 1H chemical shifts. The β-strand Cα peak at 52 ppm is much higher in the water cross-section than in the lipid cross-section, indicating that the β-strand segment is preferentially hydrated and exposed to the membrane surface. c, lipid-to-protein and water-to-protein 1H polarization transfer intensities as a function of mixing time. The α-helical peaks from the TMD (red and magenta) show fast lipid polarization transfer, whereas the α-helical Ser enriched in the ectodomain shows slow lipid buildup intensities.

To obtain more site-specific information about the water accessibility of the different domains of the short NC gp41, we measured a water-edited 2D 13C-13C correlation spectrum of the VMS(−)-bound protein (Fig. 5a). The spectrum shows higher water-transferred intensities for the β-strand peaks than the α-helical peaks. For example, the β-sheet Ala peak has higher intensity than the α-helical Ala peak, and the β-sheet Thr and Ser peaks are also better retained than the α-helical Ile peaks. Thus, the water-edited 2D spectrum is consistent with the 2D 1H-13C correlation spectra in indicating that the β-strand FP-FPPR domains are more exposed to water than the α-helical MPER-TMD. Fig. 5b compares the water-edited spectral intensities for residues enriched in different domains of the protein. The highest water accessibility is found for coil residues enriched in the NHR-CHR region, whereas the lowest water-transferred intensities are detected for the α-helical Val, Ile, and Gly residues enriched in the TMD. The helical Asn residues, which are only found in the MPER, show moderately higher hydration than the TMD residues, supporting the conclusion that the MPER is less inserted into the bilayer than the TMD. The β-strand Ser residues enriched in the FPPR show a high water-transferred intensity of 25%, whereas the β-strand Ala, Val, Gly, and Ile residues in the FP have lower water-transferred intensities of 13, 16, 11, and 24%, respectively. Therefore, the FPPR is more exposed to water than the FP, supporting a membrane-surface location of the FPPR.
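The comparison between the FPPR and the FP can be summarized numerically. The following sketch simply averages the water-transferred intensities quoted above for the β-strand residues. The averaging is an illustrative assumption rather than the analysis behind Fig. 5b, and the variable names are hypothetical.

```python
from statistics import mean

# Water-transferred intensities (%) quoted in the text for beta-strand residues.
fppr_ser = 25
fp_residues = {"Ala": 13, "Val": 16, "Gly": 11, "Ile": 24}

# Simple unweighted average over the four FP residue types (an assumption).
fp_mean = mean(fp_residues.values())
print(f"FP mean: {fp_mean:.1f}%  FPPR Ser: {fppr_ser}%")
print(f"FPPR/FP ratio: {fppr_ser / fp_mean:.2f}")

# Per-residue gaps show how much each FP residue type falls below the FPPR value.
for name, value in fp_residues.items():
    print(f"{name}: {fppr_ser - value} percentage points below FPPR Ser")
```

The unweighted FP average is 16%, compared with 25% for the FPPR Ser. The Ile value (24%) is nearly as high as the FPPR value, so in this simple summary the difference rests mainly on Ala, Val, and Gly.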

Long-range correlations between the FPPR and MPER domains
To obtain long-range correlations that indicate the global three-dimensional fold of the short NC gp41, we measured a 500-ms 2D 13C-13C correlation spectrum. We focus on the aromatic-aliphatic region (Fig. 6), which resolves many Trp side-chain 13C signals, such as Cε3 (117 ppm), C2 (122 ppm), and Cδ1 (124 ppm). In addition, characteristic Phe Cγ and Cδ1 chemical shifts are also resolved in this region. With 500-ms mixing, we detected many inter-residue correlations, most of which can be attributed to sequential residues in the protein. Importantly, two clear and uniquely assigned cross-peaks are observed between the β-sheet Ala Cα peak at 49 ppm and the Trp Cδ1 (124 ppm) and C2 (122 ppm) signals. Because β-sheet Ala residues are found only in the FP-FPPR, whereas α-helical Trp residues are only found in the MPER, these correlation peaks indicate that the N-terminal FP-FPPR lies in close proximity to the C-terminal MPER. Therefore, the short NC gp41 adopts a hairpin-like fold that places the N and C termini within 13C spin diffusion reach of about 1 nm.

Membrane-bound gp41 undergoes small-amplitude local motions
To investigate the dynamics of membrane-bound gp41, we measured 13C-1H dipolar couplings at 303 K using the 2D 13C-1H DIPSHIFT experiment. Fig. 7 shows representative 13C-1H dipolar dephasing curves, which correspond to relatively large C-H order parameters of 0.69–0.89 for all labeled sites. These large order parameters indicate that the protein does not undergo large-amplitude motions but only exhibits local motion in this cholesterol-containing membrane. The 61.3-ppm peak of α-helical Ser Cβ exhibits a relatively weak dipolar coupling with an order parameter of 0.60, suggesting larger mobility of the NHR-CHR ectodomain.
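One common way to translate an order parameter into a motional amplitude is the diffusion-in-a-cone model, in which S = cos θ (1 + cos θ)/2 for a bond vector wobbling uniformly within a cone of half-angle θ. This model is not used in the text and is introduced here only as an illustrative assumption; the function names are hypothetical. The sketch inverts the relation for the order parameters reported above.

```python
import math


def cone_order_parameter(theta_deg: float) -> float:
    """Order parameter for uniform wobbling in a cone of half-angle theta."""
    c = math.cos(math.radians(theta_deg))
    return 0.5 * c * (1.0 + c)


def cone_half_angle_deg(s: float) -> float:
    """Invert S = cos(t)(1 + cos(t))/2 for the cone half-angle in degrees."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("order parameter must lie between 0 and 1")
    # Solve c**2 + c - 2*S = 0 for c = cos(theta), taking the positive root.
    c = (-1.0 + math.sqrt(1.0 + 8.0 * s)) / 2.0
    return math.degrees(math.acos(min(1.0, c)))


# Order parameters quoted in the text: upper and lower bounds, and Ser C-beta.
for s in (0.89, 0.69, 0.60):
    print(f"S = {s:.2f} -> cone half-angle ~ {cone_half_angle_deg(s):.0f} deg")
```

Within this assumed model, order parameters of 0.89, 0.69, and 0.60 correspond to cone half-angles of roughly 22°, 39°, and 45°, respectively, which gives a sense of scale for the "small-amplitude" motion described here.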

Discussion
The short NC gp41 construct in this study was designed to probe the post-fusion or hemifusion state of full-length gp41 by retaining the two essential hydrophobic termini while shortening the water-soluble ectodomain. The cholesterol-containing VMS(−) membrane was chosen to mimic the composition of eukaryotic cell membranes as well as HIV-1 virus envelopes. This choice minimizes potential nonnative effects of detergent micelles and lipid bicelles, such as high membrane curvature and low membrane viscosity, which may perturb the protein structure. The results shown here provide new insights into the conformations and three-dimensional fold of gp41 and both validate and revise previous structural conclusions of gp41 obtained from shorter peptides. The measured 13C chemical shifts indicate that the N-terminal FP-FPPR domains have significant β-sheet character, whereas the C-terminal MPER-TMD domains are predominantly α-helical. This result is in good agreement with previous studies of isolated FP and MPER-TMD peptides in lipid bilayers. In particular, it strongly suggests that the β-sheet FP conformation that has been observed for isolated peptides in cholesterol-containing membranes (10) persists in the full-length protein. A previous SSNMR study of a gp41 construct that contains the FP and NHR-CHR but lacks the MPER-TMD found that the FP adopts the β-sheet conformation (11). However, in DPC micelles, both the isolated FP (25) and FP in full-length gp41 (17) are α-helical, and in noncholesterol-containing membranes, the isolated FP is also helical (26). Therefore, the FP conformation is independent of the MPER-TMD but is sensitive to the membrane environment; the helical conformation is promoted by micelles and by the absence of cholesterol, suggesting that high membrane curvature and low membrane viscosity favor helix formation. Compared with the FP, the C-terminal MPER-TMD domains are stably α-helical in a variety of membrane environments (9,15,27), both as an isolated peptide and in the nearly full-length protein. A recent SSNMR study found that an MPER-TMD peptide forms a trimeric helix-turn-helix structure, with the MPER lying on the membrane surfaces and the trimeric TMD helices spanning the membrane (8). These results indicate that the C-terminal helical structure may be important for maintaining the trimeric state of gp41 and for anchoring the protein in the virus envelope during the protein conformational changes.

Figure 5. Water accessibility of gp41 from 2D water-edited 13C-13C correlation spectra. a, full and water-edited 2D spectra of U-13C-labeled gp41, measured at 263 K. The sample contained 4 mg of protein at a P/L of 1:60. The water-edited spectrum was measured using a 1H spin diffusion mixing time of 9 ms. Assignments are shown in pink, gray, and blue for α-helical, random coil, and β-sheet chemical shifts, respectively. b, hydration values for residues in the various domains of gp41. The α-helical MPER-TMD segments show lower hydration than the FP-FPPR segments.

Figure 6. Aromatic-aliphatic region of the 500-ms 2D 13C-13C correlation spectrum of membrane-bound gp41. Two cross-peaks between β-sheet Ala and α-helical Trp (assigned in blue) are observed, indicating that the protein adopts a hairpin-like fold that places the N-terminal FP-FPPR in close proximity with the C-terminal MPER. Other inter-residue cross-peaks (assigned in pink) can be attributed to sequential residue pairs. The spectrum was measured at 263 K using 4 mg of protein bound to the membrane at a P/L of 1:60.
What is the three-dimensional fold of the short NC gp41 and the corresponding membrane morphology? On the basis of three lines of evidence, we propose a partial hairpin fold that puts the FPPR and MPER in close proximity while keeping their neighboring FP and TMD apart. Moreover, we suggest that this partial-hairpin fold is associated with two bilayers, giving a hemifusion-like intermediate (Fig. 8). First, the lipid and water buildup curves of the α-helical, β-sheet, and random coil residues indicate that the N-terminal FP-FPPR domains are shallowly inserted into the membrane, the C-terminal TMD spans the membrane, and the random coil NHR-CHR residues are the most exposed to water (Figs. 4 and 5). The shallow insertion of the FP-FPPR is consistent with many studies of the FP (28), whereas the full insertion of the TMD is consistent with studies of MPER-TMD by solution and solid-state NMR (8,14,15). Second, the observed long-range correlations between α-helical Trp and β-sheet Ala (Fig. 6) place a strong constraint on the proximity of the membrane-surface MPER and FPPR, because the distance upper limit of 13C spin diffusion is only about 1 nm. Third, the lack of long-range correlations between the FP and TMD as well as their different backbone conformations suggest that the FP and TMD are more likely to interact with two opposing bilayers rather than the same bilayer. The alternative model of putting the FPPR and MPER on the same membrane surface would reduce the likelihood of spatial contact because of the trimeric nature of the protein. In comparison, when the FPPR and MPER lie on two different bilayer surfaces that are pulled close together by the ectodomain, the probability for Trp-Ala spatial contact should increase. Therefore, the current data favor a hemifusion model in which the protein bridges two lipid membranes, although they cannot exclude a membrane-reentry model. The hemifusion model implies the presence of local membrane curvature, which is observed in the static 31P spectrum of the DOPE membrane as an isotropic peak diagnostic of negative Gaussian curvature (Fig. 2). The VMS(−) membrane does not exhibit such an isotropic peak; however, this could result from the fact that the curvature may occur only in the POPE fraction of the composite membrane and that the curvature required for the hemifusion protein structure may be moderate.

Figure 8. [...] The trimeric α-helical TMD is bound to one bilayer, whereas the FP is shallowly inserted into the other bilayer in a β-sheet-rich conformation. The α-helical MPER lies on the surface of one bilayer, within ~1 nm of β-sheet FPPR, which lies on the surface of the other bilayer. The water-soluble ectodomain is dynamic and may be significantly disordered. Two trimeric assemblies are depicted, to be consistent with previous data that indicate that the FP associates as antiparallel β-sheets (28).
The structural model of Fig. 8 depicts two trimers at the curved hemifusion region. This hypothesis is based on previous SSNMR data indicating that the β-sheet FP assembles in an antiparallel fashion (28), which requires two parallel trimers to interdigitate with each other. It has been proposed that multiple trimeric subunits need to aggregate to induce sufficient membrane curvature for fusion to occur (29).
This partial hairpin and hemifusion model places the FP and TMD in two different membranes. Evidence both for and against the association of FP with TMD has been reported in the literature. For example, FRET studies of lipid membranes containing both FP and TMD found heteromeric association (30). In contrast, solution NMR studies of DPC-bound gp41 found the FP to have much faster motional rates than the TMD (17,18), suggesting that FP and TMD do not associate with each other. Because isolated peptides cannot fully recapitulate the conformational constraints imposed by the intervening segments of the FPPR, ectodomain, and MPER and studies in DPC micelles may be perturbed by the micelle curvature (31), the presence or absence of heteromeric association in these earlier studies should be evaluated with caution.
Although we have used sparse 1,3-13C labeling and reverse labeling to simplify the spectra of gp41, resonance overlap still remains in the 2D 13C-13C correlation spectra. Despite this congestion, the use of well-resolved peaks such as Thr, Ala, Ile, and Trp nevertheless allows us to obtain domain-specific structural information such as backbone conformation, inter-domain interactions, and water accessibility. The overlapped peaks near (50, 30) ppm in the 2D 13C-13C correlation spectra are assigned to disordered Glu, Gln, Leu, and Lys residues in the NHR-CHR domains. However, because random coil and β-sheet chemical shifts are often similar, we cannot rule out that some β-sheet residues may exist in the NHR near the FPPR segment. Future strategies for obtaining higher-resolution spectra of gp41 include deuteration and amino acid-specific labeling of the protein. To further test the hemifusion structural model, it will be important to obtain additional long-range distance restraints. These may be obtained from paramagnetically tagged protein or fluorinated protein by exploiting paramagnetic relaxation enhancement or 19F distance NMR techniques (32–35), respectively. Finally, mixed labeled protein samples will be important to determine the oligomeric structure of this gp41 hemifusion intermediate.

Expression and purification of gp41
The amino acid sequence and residue distribution of "short NC" gp41 (here designated as gp41) are shown in Table 1. We expressed and purified U-13C, 15N-labeled protein and 1,3-13C-labeled protein (36,37) for this study. U-13C, 15N-gp41 was expressed in Escherichia coli Rosetta pLysS (DE3) cells (Novagen), whereas 1,3-13C-gp41 was expressed in E. coli Lemo21 (DE3) cells (New England Biolabs). Bacteria were grown in 2 liters of lysogeny broth medium at 37°C until A600 reached 0.5. The cells were then centrifuged at 7,000 rpm for 10 min. For the 1,3-13C-labeled protein, the cell pellet was resuspended in 1 liter of M9 minimal medium containing 3 g of 1,3-13C-labeled glycerol and 1 g of 15N-labeled ammonium chloride. 100 mg of unlabeled His, Glu, and Gln were added to the medium to prevent these residues from being isotopically labeled. For U-13C, 15N-labeled gp41, the cell pellet was resuspended in 1 liter of M9 minimal medium containing 3 g of U-13C-labeled glucose and 1 g of 15N-labeled ammonium chloride. 100 mg of unlabeled His, Lys, Met, and Leu were added to reverse-label these residues. Each cell pellet was resuspended in the M9 medium and equilibrated at 37°C for 30 min. Once the A600 increased by ~10%, the temperature was decreased to 25°C, and 0.5 mM isopropyl 1-thio-β-D-galactopyranoside was added to induce protein expression for 18-20 h. After expression, the cells were collected by centrifugation at 7,000 rpm for 10 min at 4°C.
The cell pellet was resuspended in 40 ml of lysis buffer at pH 8.0 (50 mM Tris-HCl, 200 mM NaCl) and sonicated for 30 min in an ice bath. The lysed cells were centrifuged at 11,500 rpm and 4°C for 45 min. The protein was primarily expressed in the inclusion body. The pellets were resuspended in lysis buffer and sonicated one more time for 30 min in an ice bath. The lysed cells were centrifuged at 11,500 rpm and 4°C for 45 min. The inclusion body was washed with lysis buffer to remove any remaining soluble fractions.
The protein was purified using nickel affinity column chromatography. The inclusion body was dissolved in 8 M urea and 1% SDS and stirred overnight at room temperature. After 1 h of centrifugation at 11,500 rpm and 25°C, the supernatant was loaded onto the Ni2+ column (Bio-Rad). The column was placed in a rotator at room temperature for 3 h to allow protein binding. The column was then washed stepwise with 1) 8 M urea and 1% SDS; 2) 4 M urea, 0.5% SDS, and 20 mM imidazole; 3) 2 M urea, 0.2% SDS, and 20 mM imidazole; and 4) 0.2% SDS. The protein was eluted using 250 mM imidazole and 0.2% SDS. All washing and elution solutions were prepared in pH 8.0 lysis buffer. The yield after the column purification was ~2 mg of protein per liter of M9 medium. To remove SDS, we dialyzed the eluent in a dialysis bag with a molecular mass cutoff of 10.0-10.5 kDa against 1 liter of water for 5 days, changing the water every 12 h. The precipitated protein was collected and lyophilized to obtain a dry powder.

Membrane protein sample preparation
The purified gp41 was reconstituted into three different lipid membranes in this work: POPE, DOPE, and a composite membrane termed VMS(−), which consists of POPC/POPE/POPS/cholesterol at molar ratios of 35:15:20:30. The protein/lipid molar ratio (P/L) was 1:60 for most NMR samples. For 2D 13C-13C correlation experiments to investigate the protein conformation, 7 mg of 1,3-13C-gp41 and 4 mg of U-13C-gp41 were reconstituted into the VMS(−) membrane. For static and magic-angle-spinning (MAS) 31P experiments to measure the impact of the protein on membrane curvature and hydration, we reconstituted 3 mg of unlabeled gp41 into the three membranes at a P/L of 1:60.
To reconstitute the protein into lipid membranes, we dissolved phospholipids in chloroform and the protein in hexafluoroisopropyl alcohol. The two solutions were mixed, the solvents were removed with nitrogen gas, and the sample was then lyophilized overnight. The dried protein-lipid film was resuspended in pH 7.5 HEPES buffer (10 mM HEPES-NaOH, 1 mM EDTA, 0.1 mM NaN3) and subjected to seven freeze-thaw cycles between liquid nitrogen and a 35°C water bath to create homogeneous vesicles. The vesicle solutions were spun at 40,000 rpm using a Beckman SW60Ti rotor at 4°C for 6 h to obtain wet membrane pellets. These pellets were allowed to equilibrate in a desiccator to 40% water by mass and were then spun into a MAS rotor through a pipette tip.
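The hydration target in the last step is simple mass arithmetic, sketched below in Python. The sketch assumes that "40% water by mass" refers to the fraction of the total hydrated pellet mass and that the dry mass (protein plus lipid) is known from weighing; the function names and the example masses are hypothetical and are not taken from this work.

```python
def target_wet_mass(dry_mass_mg: float, water_fraction: float = 0.40) -> float:
    """Total pellet mass (mg) at which water is `water_fraction` of the mass.

    Assumption: the water fraction is defined relative to the total
    hydrated mass, so dry mass = (1 - water_fraction) * total mass.
    """
    if not 0.0 <= water_fraction < 1.0:
        raise ValueError("water_fraction must be in [0, 1)")
    if dry_mass_mg <= 0.0:
        raise ValueError("dry_mass_mg must be positive")
    return dry_mass_mg / (1.0 - water_fraction)


def water_to_remove(current_mass_mg: float, dry_mass_mg: float,
                    water_fraction: float = 0.40) -> float:
    """Water (mg) still to be removed in the desiccator.

    A negative value means the pellet is already drier than the target.
    """
    return current_mass_mg - target_wet_mass(dry_mass_mg, water_fraction)


if __name__ == "__main__":
    # Hypothetical example: 30 mg of dry protein plus lipid, 80 mg wet pellet.
    print(target_wet_mass(30.0))        # target total mass in mg
    print(water_to_remove(80.0, 30.0))  # water still to evaporate in mg
```

Under this definition, the pellet would be weighed periodically and drying stopped once its mass reaches the target value.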

Lipid-mixing assays
VMS(−) and POPE membranes were used to measure peptide-induced lipid mixing. Lipids were dissolved in chloroform and dried under nitrogen gas. The dried film was resuspended in 10 mM HEPES buffer (pH 7.5), freeze-thawed 15 times between liquid nitrogen and a 35°C water bath, and then extruded 15-20 times through 100-nm membranes to produce homogeneous large unilamellar vesicles. Fluorescently labeled vesicles containing 2 mol % of the fluorescent lipid NBD-PE (1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-(7-nitro-2-1,3-benzoxadiazol-4-yl)) and 2 mol % of the quenching lipid Rh-PE (1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-(lissamine rhodamine B sulfonyl)) were prepared using the same method. Unlabeled and labeled vesicles were mixed at a 9:1 molar ratio with a total lipid concentration of 150 μM. Purified gp41 was dissolved in 2,2,2-trifluoroethanol or formic acid and added to the lipid vesicle solution to reach a P/L of 1:20. A HORIBA Fluoromax-P fluorimeter was used to measure fluorescence at an excitation wavelength of 465 nm and an emission wavelength of 530 nm. Each measurement was carried out in 2 ml of large unilamellar vesicle solution under continuous stirring with a time increment of 1 s. 20 μl of 10% Triton X-100 was added to the 2-ml solution to measure the maximum fluorescence, F_max. We measured the fluorescence before (F_0) and after (F_f) the addition of the peptide. The percentage of lipid mixing was obtained with the equation, % mixing = ((F_t − F_0)/(F_max − F_0)) × 100, where F_t is the fluorescence intensity at time t.
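The percent-mixing equation above can be applied to a fluorescence time trace as in the following minimal Python sketch. The function names and the numerical intensities are hypothetical illustrations rather than data from this study; the only quantities taken from the text are the formula itself, the 150 μM total lipid concentration, and the P/L of 1:20.

```python
from typing import List, Sequence


def percent_lipid_mixing(f_t: float, f_0: float, f_max: float) -> float:
    """Percent lipid mixing: ((F_t - F_0) / (F_max - F_0)) * 100."""
    if f_max == f_0:
        raise ValueError("F_max must differ from F_0")
    return (f_t - f_0) / (f_max - f_0) * 100.0


def mixing_trace(trace: Sequence[float], f_0: float, f_max: float) -> List[float]:
    """Convert a time series of intensities (one point per time step) to % mixing."""
    return [percent_lipid_mixing(f, f_0, f_max) for f in trace]


def protein_concentration_um(lipid_um: float, lipids_per_protein: float) -> float:
    """Protein concentration (uM) for a molar P/L of 1:lipids_per_protein."""
    return lipid_um / lipids_per_protein


if __name__ == "__main__":
    # Hypothetical intensities in arbitrary units.
    f_0, f_max = 100.0, 300.0
    print(mixing_trace([100.0, 150.0, 200.0], f_0, f_max))
    # Protein concentration implied by 150 uM lipid at P/L = 1:20.
    print(protein_concentration_um(150.0, 20.0))
```

Normalizing to F_max from the detergent-solubilized sample puts traces from different vesicle preparations on a common 0-100% scale.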

CD experiments
We investigated the global conformation of gp41 in DPC and lipid bilayers using CD experiments. For DPC samples, the protein was dissolved in 0.5 ml of 0.5% DPC solution to a final concentration of 4 μM. The POPC/POPG membrane samples were prepared with a P/L molar ratio of 1:60 and a final protein concentration of 25 μM. The protein-containing membrane and protein-free membrane were freeze-thawed seven times and then subjected to bath sonication for 30 min to obtain a sufficiently homogeneous vesicle solution. CD spectra were measured at room temperature on an AVIV 202 spectrometer using a 1-mm path length quartz cuvette. The spectra of the protein-free membrane samples were subtracted from the spectra of the protein-containing samples to obtain the pure-protein spectrum.

Solid-state NMR experiments
Solid-state NMR spectra were measured on Bruker spectrometers at 1H Larmor frequencies of 800, 600, and 400 MHz. 13C chemical shifts were referenced to the adamantane CH2 signal at 38.48 ppm on the tetramethylsilane scale or the 14.0-ppm Met Cε peak of the tripeptide, N-formyl-Met-Leu-Phe-OH (MLF). 31P chemical shifts were referenced to the hydroxyapatite 31P signal at 2.73 ppm on the phosphoric acid scale.
Static 31P spectra of lipid membranes in the absence and presence of gp41 were measured to investigate the effects of the protein on the membrane curvature. MAS 2D 1H-31P heteronuclear correlation (HETCOR) spectra were measured to investigate membrane surface hydration in the presence of gp41 (16, 38-40). These HETCOR experiments were conducted at 298 K under 5-kHz MAS using a 100-ms 1H mixing time.
Conformation-dependent 13C chemical shifts and inter-residue contacts were measured using 2D 13C-13C correlation experiments with combined R2_n^v-driven spin diffusion (41). Mixing times were 55 and 300 ms for the U-13C-labeled protein and 100 ms for the 1,3-13C-labeled protein. The spectra were measured at 263 K under 13.5-kHz MAS.
To quantify protein dynamics, we conducted 2D 13C-1H DIPSHIFT experiments at 303 K under 7-kHz spinning (42,43). A 1H frequency-switched Lee-Goldburg (FSLG) sequence (44) was used to remove the 1H-1H homonuclear couplings during the C-H dipolar evolution period. The time-dependent 13C-1H dipolar oscillation was fit to obtain the dipolar couplings. The measured couplings were divided by the FSLG scaling factor of 0.577 to obtain the true couplings. The order parameter S_CH was calculated as the ratio of the true coupling to the rigid-limit one-bond dipolar coupling strength of 22.7 kHz.
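The conversion from a fitted dipolar coupling to an order parameter described above is a two-step scaling, illustrated by the short Python sketch below. The scaling factor of 0.577 and the rigid-limit coupling of 22.7 kHz are the values stated in the text; the function names and the example input coupling are hypothetical and do not correspond to any measured value in this work.

```python
FSLG_SCALING_FACTOR = 0.577     # FSLG scaling factor stated in the text
RIGID_LIMIT_COUPLING_KHZ = 22.7  # rigid-limit one-bond C-H coupling (kHz)


def true_coupling_khz(measured_khz: float,
                      scaling: float = FSLG_SCALING_FACTOR) -> float:
    """Undo the FSLG scaling of a fitted (apparent) C-H dipolar coupling."""
    if scaling <= 0.0:
        raise ValueError("scaling factor must be positive")
    return measured_khz / scaling


def order_parameter(measured_khz: float,
                    scaling: float = FSLG_SCALING_FACTOR,
                    rigid_limit_khz: float = RIGID_LIMIT_COUPLING_KHZ) -> float:
    """S_CH = (measured coupling / scaling factor) / rigid-limit coupling."""
    return true_coupling_khz(measured_khz, scaling) / rigid_limit_khz


if __name__ == "__main__":
    # Hypothetical fitted coupling of 10.0 kHz (not a value from this study).
    print(round(order_parameter(10.0), 2))
```

An S_CH near 1 corresponds to an immobilized C-H bond, whereas smaller values indicate larger-amplitude motion of that segment.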
To investigate the depth of insertion of gp41 in lipid bilayers, we measured 2D 1H-13C spectra that correlate the water and lipid 1H chemical shifts with the protein 13C chemical shifts (45). The spectra were measured at 303 K in the liquid-crystalline phase of the membrane under 14.5-kHz MAS. A total 1H T2 filter period of 0.90 ms was used to suppress the 1H magnetization of the relatively rigid protein (45,46). A 1H spin diffusion mixing period of 25-225 ms was used to transfer the water and lipid 1H magnetization to the protein. To investigate the residue-specific water accessibility of gp41, we also measured water-edited 2D 13C-13C correlation spectra (47-49). The experiment uses a soft Gaussian 90° pulse centered at the water 1H chemical shift of 5.1 ppm to selectively excite the water magnetization. A 9-ms 1H spin diffusion mixing period was used to transfer the polarization from water to the protein. The spectrum was measured at 263 K under 10.5-kHz MAS. Due to frictional heating, the estimated sample temperature was about 273 K, as verified by the water 1H chemical shift of 5.1 ppm.
Author contributions-M. H. designed the study and coordinated the project. M. L. and C. A. M. purified the protein and conducted the NMR experiments. All authors analyzed the data and wrote the manuscript.