The Transmembrane Domains of Hepatitis C Virus Envelope Glycoproteins E1 and E2 Play a Major Role in Heterodimerization*

Oligomerization of viral envelope proteins is essential to control virus assembly and fusion. The transmembrane domains (TMDs) of hepatitis C virus envelope glycoproteins E1 and E2 have been shown to play multiple roles during the biogenesis of the E1E2 heterodimer. This makes them unique among known transmembrane sequences. In this report, we used alanine scanning insertion mutagenesis in the TMDs of E1 and E2 to examine their role in the assembly of the E1E2 heterodimer. Alanine insertion within the center of the TMDs of E1 or E2 or in the N-terminal part of the TMD of E1 dramatically reduced heterodimerization, demonstrating the essential role played by these domains in the assembly of hepatitis C virus envelope glycoproteins. To better understand the alanine scanning data obtained for the TMD of E1, which contains GXXXG motifs, we used circular dichroism and nuclear magnetic resonance to analyze the three-dimensional structure of the E1-(350–370) peptide encompassing the N-terminal sequence of the TMD of E1 involved in heterodimerization. The alanine scanning results and the three-dimensional molecular model we obtained provide the first framework for a molecular level understanding of the mechanism of hepatitis C virus envelope glycoprotein heterodimerization.

After their synthesis and integration into the membrane, a large number of membrane proteins associate and form homo- or hetero-oligomeric complexes with new functions. To ensure specific assembly, these proteins must present complementary recognition regions to each other. These interacting regions may be located on the ectodomains and/or the transmembrane sequences. The importance of noncovalent interactions between transmembrane α-helices in oligomerization of membrane proteins is becoming increasingly apparent (1,2). However, interactions of transmembrane α-helices have only been well characterized for a restricted number of proteins and mainly in the context of homo-oligomerization. Recently, we have shown that the transmembrane domains (TMDs) of hepatitis C virus (HCV) E1 and E2 envelope glycoproteins are required for heterodimerization (3). Therefore, these proteins constitute an attractive model to study hetero-oligomerization of transmembrane domains. In addition, like other viral envelope proteins, HCV glycoproteins are thought to play a crucial role in viral entry by binding to a receptor present on the host cell and inducing fusion between the viral envelope and a membrane of the host cell (4). To ensure that fusion does not occur during protein synthesis and folding, or during budding of the viral particle, this function has to be tightly controlled. It is well known that oligomerization plays a crucial role in regulating the fusogenic function of such proteins (4). Studying the early steps of oligomerization of HCV envelope glycoproteins is essential to understand how the virus controls its fusion activity.
HCV is the causal agent of hepatitis C, which is a major health problem worldwide. It is a positive-stranded RNA virus and a member of the hepacivirus genus in the family Flaviviridae (5). Its genome encodes a single polyprotein of about 3000 amino acid residues that is co- and post-translationally cleaved to generate at least 10 polypeptides (C, E1, E2, p7, NS2, NS3, NS4A, NS4B, NS5A, and NS5B) (6). The lack of a cell culture system supporting efficient HCV replication and particle assembly has hampered the characterization of the envelope proteins present on the virion. However, indirect evidence, such as viral neutralization by antibodies, supports the idea that HCV envelope glycoproteins are present on the surface of the virion (7). Our current understanding of the assembly of HCV envelope glycoproteins is based on cell culture transient-expression assays with viral or nonviral expression vectors. E1 and E2 are obtained after cleavage of the polyprotein by host signal peptidase(s) (8). They are heavily modified by N-linked glycosylation and are transmembrane proteins with a large N-terminal ectodomain and a C-terminal hydrophobic anchor (9). E1 and E2 glycoproteins have been shown to assemble as a noncovalent heterodimer (10). However, this process is not efficient, and misfolded heterogeneous disulfide-linked aggregates of E1 and E2 are also produced, at least when using heterologous expression systems (9). Coexpression of E1 and E2 has been shown to be necessary to ensure correct folding of E1 (11). The two glycoproteins likely have separate functions in HCV entry. Indeed, E2 might be involved in receptor binding (12,13), whereas E1 has been proposed to be the fusion protein (14).
* This work was supported by the CNRS, Institut Pasteur de Lille, European Regional Development Fund (ERDF), a PRFMMIP grant from the French Ministry of Research, European Union Grant QLK2-1999-00356, and Association pour la Recherche sur le Cancer (ARC) Grant 9736. Support was also provided by a fellowship from ARC and the Agence Nationale de la Recherche sur le Sida (ANRS) (to A. O. D. B.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The fact that the envelope proteins of HCV are translated from a single coding region implies that internal signal peptides must be used. The signal sequences of E1 and E2 are present at the C terminus of the immature form of the capsid protein and in the second half of the TMD of E1, respectively (8). In addition, a hydrophobic sequence present in the second half of the TMD of E2 is the signal sequence for a polypeptide called p7. Recently, we showed that the TMDs of E1 and E2 play a major role in the subcellular localization of the E1·E2 complex (15–17). The TMDs of E1 and E2 have also been suggested to play a major role in the assembly of the heterodimer. Indeed, deletion of the C-terminal hydrophobic sequence of E2 (11,18), or its replacement by the membrane anchor of CD4 or a glycosylphosphatidylinositol moiety, has been shown to abolish the formation of E1·E2 complexes (15).
The multiple functions performed by the TMDs of E1 and E2 glycoproteins are thought to be essential for the formation of the viral envelope, and most likely impose limitations on the amino acid variability of these domains (3). These TMDs are composed of two stretches of hydrophobic residues separated by a short segment containing at least one fully conserved charged residue (Fig. 1). Replacement of these charged residues by alanine has been shown to alter all the functions of the TMDs of E1 and E2 (3), indicating that charged residues present within these domains are crucial for their multifunctionality.
In this report, we studied the role played by the TMDs of E1 and E2 in heterodimerization. We used alanine scanning insertion mutagenesis (19,20) in these domains to examine their role in the assembly of the heterodimer. This technique has been shown to be a powerful method to detect dimerization of transmembrane α-helices. Indeed, insertion of a single amino acid into a transmembrane helix displaces the residues on the N-terminal side of the insertion by 110° relative to those on the C-terminal side of the insertion, disrupting a helix-helix packing interface involving residues on both sides of the insertion. Two distinct segments of the TMD of E1 and one of the TMD of E2 were very sensitive to alanine insertion, demonstrating the essential role played by these domains in the assembly of HCV envelope glycoproteins. Neither ER retention nor signal sequence cleavage was affected by these mutations, indicating that assembly can be dissociated from other functions performed by the TMDs of HCV envelope glycoproteins. To complement the alanine scanning data obtained for the N-terminal part of the TMD of E1, which contains GXXXG motifs, we used circular dichroism (CD) and nuclear magnetic resonance (NMR) to analyze, in the presence of various membrane mimetic solvents (detergents and trifluoroethanol/water mixtures), the three-dimensional structure of an E1 peptide encompassing the sequence involved in heterodimerization (the E1-(350–370) segment). The alanine scanning results and the three-dimensional molecular model we obtained provide the first framework for a molecular level understanding of the mechanism of E1E2 heterodimerization. To our knowledge, this is the first report showing that oligomerization of viral envelope proteins can be controlled by transmembrane sequences.
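The geometric argument behind insertion scanning can be illustrated with a short calculation. The sketch below is only an illustration, not part of the reported methods: it treats the helix as a regular screw and uses the 110° per-residue figure quoted above as an adjustable parameter (an ideal α-helix with 3.6 residues per turn would give 100°). The function name and the residue positions are hypothetical.

```python
def angular_offset(i, j, insert_before=None, deg_per_residue=110.0):
    """Angle in degrees (0-360) between residues i and j around the helix axis.

    If one residue is inserted immediately before position `insert_before`
    and that position lies between i and j, the two residues end up one
    step further apart along the helix.
    """
    separation = j - i
    if insert_before is not None and i < insert_before <= j:
        separation += 1  # the inserted alanine adds one helical step
    return (separation * deg_per_residue) % 360.0


# Hypothetical pair of residues on either side of an insertion site.
before = angular_offset(358, 362)                    # 4 steps: 440 mod 360 = 80
after = angular_offset(358, 362, insert_before=360)  # 5 steps: 550 mod 360 = 190
print(before, after, (after - before) % 360.0)
```

Residues that faced the same side of the helix before the insertion are rotated apart by one helical step afterwards, which is why an interface spanning the insertion site cannot be maintained.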

EXPERIMENTAL PROCEDURES
Cell Culture-The HepG2, CV-1, and 143B (thymidine kinase-deficient) cell lines were obtained from the American Type Culture Collection, Rockville, MD. Cell monolayers were grown in Dulbecco's modified essential medium (Life Technologies) supplemented with 10% fetal bovine serum.
Plasmid Constructs-Plasmids expressing wild type and mutated E1E2 polyproteins (signal sequence of E1, and the sequences of E1 and E2) were constructed by standard methods (21). Briefly, DNA sequences of HCV proteins were polymerase chain reaction amplified and introduced into plasmid pTM1 (22). HCV sequences were amplified with the appropriate oligonucleotides from H strain clones (23,24). Site-directed mutagenesis was performed by enzymatic inverse polymerase chain reaction as described by Stemmer and Morris (25). A plasmid expressing the chimeric protein CD4-E1A358′ was constructed as described (16). This plasmid contains the sequence of the signal peptide of CD4, followed by the sequence of the ectodomain of CD4 in fusion with the sequence encoding the C-terminal 37 amino acids of E1 with an alanine inserted at position 358′.
Generation and Growth of Viruses-Vaccinia virus recombinants were generated by homologous recombination essentially as described (26) and plaque purified twice in 143B cells under bromodeoxyuridine selection (50 μg/ml). Stocks of vTF7-3 (a vaccinia virus recombinant expressing the T7 DNA-dependent RNA polymerase) (27), the wild type vaccinia virus strain Copenhagen, its thermosensitive derivative ts7 (28), and vaccinia virus recombinants expressing HCV proteins or mutated proteins were grown and titrated on CV-1 monolayers.
Metabolic Labeling and Immunoprecipitation-Cells were infected with the appropriate vaccinia virus recombinants and metabolically labeled with ³⁵S-Protein Labeling Mix (3.7 × 10⁶ Bq/ml; NEN Life Science Products) as described previously (29). Cells were lysed with 0.5% Triton X-100 in Tris-buffered saline (TBS) (50 mM Tris-Cl (pH 7.5), 150 mM NaCl). Immunoprecipitations were carried out as described previously (31). For in vivo labeling of glycan moieties, HepG2 cells were infected with the appropriate vaccinia virus recombinants and pulse-labeled for 30 min with [2-³H]mannose (3.7 × 10⁶ Bq/ml; Amersham Pharmacia Biotech) in α-minimum essential medium containing 0.5 mM glucose and 10% dialyzed fetal bovine serum. After 4 h of chase, cells were lysed in TBS, 0.5% Triton X-100, and the lysates were used for immunoprecipitation.
Analysis of Oligosaccharide Material-Immunoprecipitated [2-³H]mannose-labeled proteins were digested overnight at room temperature with 0.2 mg of N-tosyl-L-phenylalanine chloromethyl ketone-treated trypsin in 0.1 M ammonium bicarbonate (pH 7.9). Trypsin-treated proteins were boiled for 10 min to inactivate the trypsin, and the peptides were lyophilized and dissolved in 20 mM sodium phosphate (pH 7.5) containing 50 mM EDTA and 0.2 mg of NaN₃/ml in 50% glycerol. The peptides were incubated overnight at 37 °C in the presence of peptide N-glycanase F (0.5 units; New England Biolabs). Size analysis of the glycan moieties was achieved by high pressure liquid chromatography (HPLC) on an amino-derivatized column, ASAHIPAK NH2P-50 (250 by 4.6 mm) (Asahi, Kawasaki-ku, Japan), with a solvent system of acetonitrile/water from 70:30 (v/v) to 50:50 (v/v) at a flow rate of 1 ml/min over 80 min. Oligomannosides were identified by their retention time as described previously (32). Separation of labeled oligosaccharides was monitored by continuous flow detection of radioactivity with a Flo-One β detector (Packard).
Peptide Synthesis and Sample Preparation-The E1-(350–370) synthetic peptide was purchased from G. Blomberg, School of Medical Sciences, University of Bristol (United Kingdom). The numbering of this synthetic peptide (denoted E1-(350–370)) refers to genotype 1a, and its sequence (GAHWGVLAGIAYFSMVGNWAK-NH2) is identical to that of an infectious HCV cDNA clone recently published (33) (EMBL accession number: AF009606). The synthesis was performed using N-tBoc chemistry and the C terminus was blocked by an amide group. Purity of the peptide was assessed by electrospray mass spectrometry (molecular mass 2234 Da) and appeared to be higher than 90%. Moreover, only a minor spin system due to the impurity of the sample was observed in the NMR spectra. The peptide was rather insoluble in water but readily soluble in the presence of trifluoroethanol (TFE). Micellar solutions were prepared by adding concentrated aqueous solutions of lysophosphatidylcholine (LPC) or SDS to a solution of E1-(350–370) dissolved in 90% TFE. When necessary, additional TFE was added to yield a clear and homogeneous mixture. After freezing and lyophilization to remove any trace of TFE, the samples were solubilized in 10 mM phosphate (pH 7.4). Micellar solutions were routinely checked for background absorbance, which showed that light scattering was insignificant at the SDS and LPC concentrations used. Peptide concentrations were determined by amino acid analysis.
CD Measurements-CD spectra were recorded on a Jobin-Yvon CD6 spectrometer calibrated with ammonium d-10-camphorsulfonate. Routinely, measurements were done at 298 K in 0.1-cm path length quartz cuvettes (Hellma) with peptide concentrations ranging from 30 to 50 μM. Spectra were recorded in the 190–250 nm wavelength range with 0.2-nm increments and 2-s integration time. The baseline-corrected spectra were smoothed by using a third-order least-squares polynomial fit. Assuming that the residue molar ellipticity at 222 nm is exclusively due to α-helix (34), the α-helical content was estimated from a relation in which [θ]H∞ is the maximum mean residue ellipticity for a helix of infinite length (−39,500 for λ = 222 nm), fH is the fraction of helix, i is the number of helical segments (1 in this case), k is a wavelength-dependent constant (2.57 for λ = 222 nm), and n is the number of residues.
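The symbols defined above are consistent with a commonly used chain-length-corrected relation, [θ]222 = [θ]H∞ (fH − ik/n). The sketch below assumes that form, which is not confirmed by the text, and solves it for fH. The function name, the example ellipticity, and the units (deg cm² dmol⁻¹) are illustrative assumptions; only the constants −39,500 and 2.57, the single helical segment, and the 21-residue peptide length come from the text.

```python
def helix_fraction(theta_222, n_residues, n_segments=1,
                   theta_h_inf=-39500.0, k=2.57):
    """Estimate the helical fraction f_H from the mean residue ellipticity at 222 nm.

    Assumed relation (not confirmed by the source):
        theta_222 = theta_h_inf * (f_H - n_segments * k / n_residues)
    """
    return theta_222 / theta_h_inf + n_segments * k / n_residues


# Hypothetical measured value for a 21-residue peptide with one helical segment.
f_h = helix_fraction(theta_222=-14000.0, n_residues=21)
print(f"estimated helical content: {100 * f_h:.0f}%")
```

Under this assumed form, the term ik/n corrects for the reduced ellipticity of short helices relative to an infinitely long one, so for a 21-residue peptide it adds roughly 12 percentage points to the naive ratio [θ]222/[θ]H∞.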
¹H NMR Spectroscopy-Lyophilized peptide was dissolved in an aqueous solution containing 50% TFE-d2 (2,2,2-trifluoroethyl-1,1-d2 alcohol, >99% isotopic enrichment). The final peptide concentration was 4 mM and the pH measured was 5.7 (uncorrected). Sodium 2,2-dimethyl-2-silapentane-5-sulfonate was added as an internal reference. All NMR experiments were recorded at 500 MHz on a Varian Unity-plus spectrometer. Spectra were acquired non-spinning at temperatures of 293 and 303 K. Two-dimensional homonuclear ¹H experiments (DQF-COSY, Clean-TOCSY, and NOESY) were performed according to the conventional pulse sequences. Water suppression was carried out using selective, low power irradiation during the 1.5-s relaxation delay and during the mixing time in NOESY experiments. Routinely, the spectra were recorded with 6000 Hz spectral width in both dimensions and data sets collected as 512 and 2048 points in the t1 and t2 dimensions, respectively, with 32 or 64 scans per increment. Data collection and processing were carried out as detailed previously (36,37). Proton resonances were assigned by the conventional assignment method (38).
NMR-derived Constraints and Structure Calculations-NOE intensities used as input for structure calculations were obtained from the NOESY spectrum recorded with a 300-ms mixing time and checked for spin diffusion on spectra recorded at lower mixing times (50–150 ms). Spectra obtained at 303 K were also used to estimate NOE intensities for cross-peaks unresolved at 293 K. NOEs were partitioned into four categories of intensities that were converted into distances ranging from a common lower limit of 1.8 Å to upper limits of 2.8, 3.9, 5.0, and 6.0 Å, respectively. The cross-peak intensity of the Hδ-Hε protons of Phe362 was used as a distance reference (2.45 Å). Protons without stereospecific assignments were treated as pseudoatoms, and correction factors were added to the upper and lower distance constraints (38). NOE back-calculations were performed from calculated structures (see below) by using the standard procedure of the X-PLOR 3.1 program (39). Three-dimensional structures were generated from NOE distances with X-PLOR 3.1, using the standard force fields and default parameter sets, except for some minor modifications (36) to increase the duration of the molecular dynamics simulations and the number of energy minimization steps. A group of 50 structures was calculated to widely sample the conformational space, and the structures were selected on the basis of low energy and NOE violations. The structures were compared by pairwise RMSD for the backbone atom coordinates (N, Cα, and C′). Local analogies were analyzed by calculating the local RMSD of a tripeptide window slid along the sequence. Statistical analysis, superimposition of structures, three-dimensional graphic displays, and manipulations were achieved by using RASMOL 2.5 software (40). The secondary structure elements and Ramachandran plots were analyzed according to the Kabsch-Sander definition rules, as incorporated in the program PROCHECK-NMR (41).
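The conversion of NOE intensities into distance bounds can be sketched as follows. The text gives the four upper limits, the common lower limit, and the 2.45 Å reference distance, but not the exact calibration rule. The sketch therefore assumes the usual isolated-spin-pair approximation, in which intensity scales as r⁻⁶, followed by assignment to the smallest category whose upper limit covers the estimated distance. Function names and intensities are hypothetical, and pseudoatom corrections are omitted.

```python
UPPER_LIMITS = (2.8, 3.9, 5.0, 6.0)  # upper bounds of the four categories (angstroms)
LOWER_LIMIT = 1.8                    # common lower bound (angstroms)
REF_DISTANCE = 2.45                  # reference proton pair distance (angstroms)


def noe_distance(intensity, ref_intensity, ref_distance=REF_DISTANCE):
    """Estimate a distance from a cross-peak intensity, assuming I ~ r**-6."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def distance_bounds(intensity, ref_intensity):
    """Return (lower, upper) bounds for one cross-peak (assumed binning rule)."""
    r = noe_distance(intensity, ref_intensity)
    for upper in UPPER_LIMITS:
        if r <= upper:
            return (LOWER_LIMIT, upper)
    # Peaks weaker than the last category still get the loosest bound.
    return (LOWER_LIMIT, UPPER_LIMITS[-1])


# Hypothetical intensities in arbitrary units, relative to a reference peak of 1000.
for peak in (1000.0, 200.0, 15.0):
    print(peak, distance_bounds(peak, ref_intensity=1000.0))
```

Binning into broad categories rather than using the estimated distance directly leaves room for spin diffusion and internal motion, both of which make the r⁻⁶ estimate imprecise.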

Identification of TMD Segments Involved in Heterodimerization
To confirm the role played by the TMDs of HCV envelope glycoproteins in heterodimerization, we used alanine scanning insertion mutagenesis, a technique which has been shown to disrupt helix-helix interaction in a membrane environment (19,20). The TMDs of E1 and E2 are present at the C terminus of these proteins (Fig. 1). The N-terminal amino acids of these domains have not been precisely determined, but it has been predicted that they could start at positions 353 and 718 for E1 and E2, respectively (16,42). A series of mutated proteins was obtained by introducing a single alanine residue in the TMD of E1 or E2 (Fig. 1). Since HCV envelope glycoproteins are produced after cleavage of a polyprotein, these mutations were introduced in the context of an E1E2 polyprotein. Mutated E1E2 polyproteins were expressed in HepG2 cells by using the vaccinia virus expression system. The ability of the mutants to form a noncovalent E1E2 complex was analyzed by immunoprecipitation with a conformation-sensitive E2-specific mAb (H53) which has been shown to specifically precipitate the native E1·E2 complex (15,17). Cells expressing mutated E1E2 polyproteins were pulse-labeled with ³⁵S-Protein Labeling Mix for 10 min and chased for 4 h. These conditions have been shown to be appropriate to detect the peak of heterodimer formation (10). To evaluate the level of expression of E1, a control immunoprecipitation with a conformation-insensitive anti-E1 mAb (A4) was performed. E1 expression was found to be constant regardless of the alanine insertion (Fig. 2A). Since mAb H53 is E2-specific, and because E2 can fold independently of E1 (11), the amount of E1 co-precipitated by mAb H53 is a good indicator of the assembly of the noncovalent heterodimer. To evaluate the percentage of heterodimerization, E1/E2 ratios were measured for each mutant and compared with the ratio obtained with wild type proteins. As shown in Fig. 2, a severe disruption of heterodimerization was observed when alanine residues were introduced at positions Ala355′, Ala358′, Ala361′, or Ala369′ of the TMD of E1. Similar results were observed for E2 mutants Ala727′ and Ala730′ (Fig. 3); however, the effects of alanine insertion were less dramatic than for E1 mutants. Indeed, for E1 mutants Ala355′, Ala358′, Ala361′, and Ala369′, heterodimerization was reduced to less than 10% of the wild type level (Fig. 2B), whereas about 35% of E1·E2 complex was still detected for the most disruptive mutations in E2 (Fig. 3; Ala727′, Ala730′). It is noteworthy that the intensity of E2 precipitated by mAb H53 was lower for some of the E1 mutants impaired in heterodimerization (Fig. 2A). It is likely that in the absence of heterodimerization, E1, which does not fold properly in such conditions (see below), interferes with the folding of E2 by forming disulfide-linked aggregated complexes. Since insertion of an alanine close to the predicted N-terminal amino acid of the TMD of E1 impaired E1E2 assembly (Fig. 2, Ala355′), two additional mutations were introduced outside of this domain (positions 346′ and 352′, see Fig. 1). Insertion at position 346′ had only a moderate effect on heterodimerization, whereas E1E2 assembly was reduced to approximately 30% for the Ala352′ mutant (Fig. 2), suggesting that a segment involved in E1E2 interaction could extend outside of the TMD of E1. Alternatively, the N terminus of the TMD of E1 might be located upstream of its predicted position. Interestingly, mutations at positions 364′ and 367′ of E1 were less disruptive than those at surrounding positions (Fig. 2B). These data suggest that two distinct segments involved in heterodimerization are present in E1. It is noteworthy that the second of these segments is located close to the charged residue which has been reported to be important for the multifunctionality of the TMD of E1 (3). Similarly, the segment identified in the TMD of E2 might include the two charged residues involved in the multifunctionality of this domain (Fig. 1). Interestingly, no change in the efficiency of cleavage of the E1E2 polyprotein was observed for E1 mutants (data not shown), indicating that the signal sequence function located in the TMD of E1 is not affected by an alanine insertion in this domain.
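The percentage of heterodimerization described in this section is a ratio of ratios: the E1/E2 signal co-precipitated by mAb H53 for a mutant, normalized to the same ratio for the wild type proteins. A minimal sketch of that normalization is given below. The function name and the band intensities are hypothetical and are not taken from the figures.

```python
def percent_heterodimerization(e1_mutant, e2_mutant, e1_wt, e2_wt):
    """Mutant E1/E2 ratio expressed as a percentage of the wild type E1/E2 ratio."""
    return 100.0 * (e1_mutant / e2_mutant) / (e1_wt / e2_wt)


# Hypothetical band intensities (arbitrary densitometry units).
wild_type = {"E1": 40.0, "E2": 40.0}
mutant = {"E1": 5.0, "E2": 50.0}

pct = percent_heterodimerization(mutant["E1"], mutant["E2"],
                                 wild_type["E1"], wild_type["E2"])
print(f"{pct:.0f}% of wild type")
```

Normalizing E1 to the E2 recovered in the same lane corrects for differences in the amount of E2 precipitated, which matters here because E2 recovery was itself lower for some E1 mutants.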
Assembly of the native E1·E2 complex is not efficient. Indeed, misfolded disulfide-linked aggregates are abundant in cells expressing these proteins (9). We were interested to know whether the mutations introduced in the TMD of E1 or E2 would also affect the formation of these aggregates. For this purpose, we used a mAb that we recently described as specific for E1E2 aggregates (43), and we analyzed the formation of E1E2 aggregates in pulse-chase experiments. Similar levels of aggregates were observed for all the alanine mutants (data not shown), indicating that only noncovalent complex formation is impaired by the insertion of an alanine in segments critical for assembly.
Altogether, our data indicate that segments located close to the charged residue(s) of the TMDs of E1 and E2 are directly involved in heterodimerization. In addition, a second segment playing a role in E1E2 assembly is present near residue 358 of the TMD of E1.

Mutants Impaired in Heterodimerization Retain Their ER Retention Function
Since the TMDs of E1 and E2 are ER retention signals (15,16), we wanted to determine whether this function would be maintained despite defective assembly of some mutants. To answer this question, we determined the subcellular localization of some of the mutants showing the most disruptive effect on assembly (Ala358′ and Ala727′) by analyzing the type of glycans associated with these proteins. We have shown previously that the type of glycans associated with HCV envelope glycoproteins, or chimeras having the TMD of E1 or E2, is a good indicator of their subcellular localization (16,17). To analyze the glycans associated with an E2 mutant, we used a vaccinia virus recombinant expressing E2-Ala727′ alone. Since E1 does not fold properly in the absence of E2 (11), we analyzed the subcellular localization of a chimeric protein, CD4-E1-Ala358′, comprising the ectodomain of CD4 fused to the TMD of E1-Ala358′ and expressed by using a vaccinia vector. Such a strategy has been used previously to demonstrate that the TMD of E1 is an ER retention signal (16).
To evaluate the level of processing of the glycans associated with E2-Ala727′ or CD4-E1-Ala358′, HepG2 cells expressing these proteins were pulse-labeled with [2-³H]mannose, chased for 4 h, and used for immunoprecipitation with an anti-E2 (H53) or anti-CD4 (OKT4) mAb. Glycans associated with these proteins were removed by peptide N-glycanase F treatment and analyzed by HPLC. This analysis demonstrated the presence of three species: Man9-, Man8-, and Man7-GlcNAc2 (Fig. 4). These glycans were similar to those observed for CD4-E1 and E2 (16,17) and typical of ER-retained glycoproteins, indicating that the alanine residues introduced at position 358′ of the TMD of E1 and at position 727′ of E2 do not modify the subcellular localization of CD4-E1 and E2, respectively.

Folding of E1 Is Less Efficient When Heterodimerization Is Impaired
Since we have previously shown that E1 needs to be coexpressed with E2 to fold properly (11,16), we were interested to know whether impairment in heterodimerization would lead to less efficient folding of E1. E1 folding was analyzed for one of the most affected mutants (Ala358′) expressed in the context of an E1E2 polyprotein. Disulfide bond formation in E1 was monitored by SDS-polyacrylamide gel electrophoresis under nonreducing conditions as described previously (31). This method takes advantage of an increase in mobility as a protein acquires a compact conformation stabilized by the formation of intramolecular disulfide bonds. An oxidized form of E1, which appeared slowly, was clearly detected in the context of the wild type E1E2 (Fig. 5), as previously observed (31). In contrast, the intensity of the oxidized form of E1-Ala358′ was lower, and a quantitative analysis showed a 40% reduction of this form. It should be noted that part of the E1 separated under nonreducing conditions formed high molecular weight aggregates as a function of time (data not shown), which explains the lower intensity of the bands observed during the chase (Fig. 5). Together, these data indicate that interaction of the TMDs of HCV envelope glycoproteins is important for assisted folding of E1 by E2.

Structural Characterization of the E1 Region Involved in Heterodimerization
Previous reports have shown that alanine insertions disrupt interactions when introduced near the center of transmembrane helices (19,20). Our data indicate that segments located close to the charged residue(s) present in the TMDs of E1 and E2 are directly involved in heterodimerization, suggesting that, despite their charge, these residues might be located near the center of the membrane spanning sequence (see "Discussion"). However, the presence of a second heterodimerization segment in the N-terminal region of the TMD of E1 is a novel finding which is not readily understood. To better understand the results observed for E1 mutants, the structure of a synthetic peptide identical to the sequence of the N-terminal part of the TMD of E1 (denoted E1-(350–370); see Fig. 1) was studied by CD and NMR. Indeed, CD and NMR spectroscopies can provide conformational information at the residue level on membrane proteins or isolated fragments, especially transmembrane segments incorporated into a membrane environment or organic solvent (see Refs. 44–46 and references therein; for a review, see Ref. 47). For example, Shon et al. (48) have shown that the conformation of bacteriophage Pf1 coat protein, which has a single transmembrane segment, is very similar in detergent micelles and phospholipid bilayers, using solution and solid-state NMR spectroscopies, respectively. For larger membrane proteins, Arseniev and colleagues (e.g. Refs. 49–51) have shown that the structure of isolated transmembrane spans of bacteriorhodopsin in the presence of organic solvents or detergent micelles is quite consistent with the structure of the corresponding regions in the whole protein determined by crystallography (52).
CD Analyses-The secondary structure of E1-(350–370) was readily examined by CD spectroscopy after solubilization in various solvent systems which provide a membrane-like environment. Similar approaches have been used in related studies (see above). Spectra shown in Fig. 6 were obtained in three distinct solvent systems in the presence of 10 mM sodium phosphate (pH 7.4): 50% TFE, 1% lysophosphatidylcholine micelles, and 200 mM SDS micelles. In all cases, the CD spectra of E1-(350–370) exhibited two minima around 208 and 222 nm, and a maximum at 192 nm. This is typical of a peptide in α-helix conformation. Similar results were obtained when using dodecylmaltoside or dodecylphosphocholine as membrane mimetic systems (data not shown). Increasing the TFE concentration above 50% resulted in only minor changes in the amplitude of the spectrum (data not shown), indicating that the maximum α-helical folding is reached in 50% TFE. This is consistent with the suggestion of Jasanoff and Fersht (53) that, for a peptide with high helical propensity, the helicity is complete by 50% TFE. The maximal molar ellipticity per residue observed at 222 nm corresponded to an α-helical content of 48, 44, and 52% in TFE, SDS, and LPC, respectively. These values are similar and clearly indicate that E1-(350–370) is only partly folded into α-helix. Moreover, these results indicate that 50% TFE is an appropriate medium to mimic the membrane-like environment of E1-(350–370) and validate its use to analyze the conformation of this peptide by NMR (for an overview about TFE, see Refs. 54 and 55).
Fig. 5. Folding of E1 is impaired in E1-Ala358′. HepG2 cells were co-infected with vTF7-3 and the appropriate vaccinia virus recombinant at a multiplicity of 5 plaque forming units/cell. At 4 h post-infection, cells were metabolically labeled for 5 min and chased for the indicated times (min). Cell lysates were immunoprecipitated with mAb A4 (anti-E1), and proteins were separated by SDS-polyacrylamide gel electrophoresis (10% acrylamide) under nonreducing conditions. wt E1, wild type; E1red, reduced E1; E1ox, oxidized E1.
NMR Spectroscopy-Deuterated micellar SDS and dodecylphosphocholine are popular membrane mimetic solvents for structure analysis of membrane peptides by liquid NMR (47). Unfortunately, E1-(350–370) appeared to be poorly soluble in SDS or dodecylphosphocholine at the millimolar concentrations required for NMR experiments. Consequently, we studied the three-dimensional structure of E1-(350–370) dissolved in 50% TFE (v/v), which yielded well resolved spectra as illustrated by the extract of the NOESY spectrum (Fig. 7A). The spectra were assigned using the classical method (38): the spin systems were identified with DQF-COSY and TOCSY spectra, and the sequential assignment was performed with the help of the NOESY spectrum obtained at a mixing time of 150 ms. Despite the poor dispersion of the NH and Hα resonances (all NH resonances are in a range of 0.9 ppm, and 17 of 21 Hα resonances are in a range of 0.5 ppm), the sequential assignment of all spin systems was completed (see Fig. 7B). No sign of any other stable peptide conformer was observed in the NMR spectra. The ¹H chemical shifts are available under the BMRB accession number 4699.
Secondary Structure-Fig. 7B shows an overview of the sequential and medium range NOE connectivities and the chemical shift analysis for ¹Hα. Despite the lack of data due to numerous overlapping cross-peaks (indicated by asterisks on Fig. 7B), the analysis of NOEs allows the distinction of several structural regions. The main body of the peptide (Ile359-Lys370 segment) displays most of the characteristics of an α-helix conformation, such as strong dNN(i, i+1) and medium dαN(i, i+1) sequential connectivities, and weak dαN(i, i+2), medium dαN(i, i+3), medium or strong dαβ(i, i+3), and weak dαN(i, i+4) medium range connectivities. The C-terminal part of the peptide (Ala369 and Lys370) remains more flexible with fewer medium range connectivities, which is the sign of a fraying end often encountered in peptide termini. The second structural unit (from Gly354 to Gly358) is characterized by stronger dαN(i, i+2) constraints and the lack of dαN(i, i+4) connectivities, features found in more flexible helical structures or in a 3₁₀ helix. The N-terminal part of the peptide (Gly350-Trp353) is devoid of medium range NOEs and remains unstructured.
The ¹Hα chemical shift difference (Δδ¹Hα) is another NMR-related parameter used to analyze the protein secondary structure, and it is negligibly affected by addition of up to 50% TFE (56). Relative to a random conformation, an increase of helicity results in an upfield shift of ¹Hα resonances and a negative value of the corresponding Δδ¹Hα. In the central and C-terminal part of the peptide (Gly354 to Lys370), the negative Δδ¹Hα is equal to or larger than 0.1 ppm in magnitude. This is in agreement with the criterion used to define an α-helix conformation (57) and confirms the helical conformation of this part of the peptide. In the N-terminal part of the peptide (Gly350-Trp353), Δδ¹Hα fluctuates between positive and negative values, and no secondary structure could be unambiguously established.
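The chemical shift criterion described above can be sketched as a small routine that computes Δδ¹Hα per residue and reports stretches meeting the helix threshold. This is only an illustration: the function names, the minimum run length of four residues, and the numerical values in the example are assumptions and are not data from this study.

```python
def secondary_shifts(observed_ppm, random_coil_ppm):
    """Delta-delta per residue: observed minus random coil 1H-alpha shift."""
    return [obs - rc for obs, rc in zip(observed_ppm, random_coil_ppm)]


def helical_stretches(delta_ppm, threshold=-0.1, min_len=4):
    """Return (start, end) index pairs (inclusive) of consecutive residues
    whose secondary shift is at or below the (negative) threshold.

    min_len is an assumed minimum run length used to ignore isolated values.
    """
    runs, start = [], None
    for i, d in enumerate(delta_ppm):
        if d <= threshold:
            if start is None:
                start = i  # a new candidate helical run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(delta_ppm) - start >= min_len:
        runs.append((start, len(delta_ppm) - 1))
    return runs


# Made-up secondary shifts (ppm) for a 10-residue example peptide.
example_delta = [0.05, -0.03, 0.08, -0.02, -0.15, -0.20, -0.12, -0.30, -0.25, 0.00]
example_runs = helical_stretches(example_delta)
print(example_runs)
```

In this made-up example, the first four residues fluctuate around zero and are not assigned, which mirrors the situation described for the N-terminal part of the peptide.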
Structure Calculation and Analysis-In a first round, only unambiguous NOE constraints were used for structure calculations. An improvement of the constraints set was achieved through NOE back-calculation that allowed the validation of all NOE-derived distance constraints. Finally, a total of 337 interproton distance constraints including 97 sequential and 105 medium range constraints were used for the calculations (Table I). No hydrogen-bond or angle constraints were introduced despite the clear indications of an α-helical folding. From the initial 50 structures calculated with X-PLOR 3.1 (39), a final set of 24 low energy structures was selected on the basis of no NOE violation greater than 0.5 Å. The pairwise comparison of these structures revealed only one structural family. This is illustrated in Fig. 8A, which shows a view of the 24 structures superimposed from Ile359 to Asn367. The overall energy of the 24 structures is largely negative with values of less than −69 kcal mol⁻¹ (−288 kJ mol⁻¹). The stereochemical properties of the backbone dihedral angles provided by the Ramachandran plots showed that all the residues are located in the allowed regions (Table I). The parameters analyzing the deviation from the idealized covalent geometry are homogeneous. In summary, all the 24 structures fully satisfied the experimental data and fit well with the finding of one family of structures. The best representative structure of this family is presented in Fig. 8B.
The α-helix is well defined between residues Ile359 and Asn367 as reflected by a low RMSD of 0.31 Å for the backbone heavy atoms (C′, Cα, and N) and of 1.03 Å when all the heavy atoms are taken into account (Table I). Both the C- and N-terminal parts of the peptide appear to be rather disordered.
However, some helical folding was observed in the C-terminal part of the peptide (Trp368-Lys370 segment) in calculated structures (Fig. 8A). This is in agreement with the presence of some medium range NOE constraints in this region (see Fig. 7). Concerning the N-terminal part, it should be noted that the Gly354-Gly358 segment was mostly observed as helical in each calculated structure (Fig. 8, B and C), but failed to superimpose in all structures. This could be related to the weak intensities of medium range α-helix NOEs in this region (see Fig. 7). This structural instability of the N terminus of the peptide is very likely due to the presence of glycine residues, which are well known to be helix breakers. However, in the context of the full-length E1 glycoprotein and/or the E1E2 heterodimer, it is very likely that the conformation of this segment is stabilized by its interaction with other segments of these proteins. In such a context, a helical folding of the Gly350-Gly358 segment is possible. Fig. 8D shows a theoretical α-helix projection of the Ala349-Ile359 segment assuming that such a conformation might occur in the full-length protein context. This representation highlights that the three glycine residues 350, 354, and 358 are located on the same side of the putative α-helix, suggesting a role for these glycines in E1E2 heterodimerization.
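The geometric reasoning behind the helix projection can be checked with a short calculation. The sketch below assumes an ideal α-helix with 3.6 residues per turn (100° of rotation per residue) and starts the projection at residue 349, as in Fig. 8D; the function names are illustrative and this is not the procedure used to draw the figure.

```python
def wheel_angles(residues, first, degrees_per_residue=100.0):
    """Angular position (degrees) of each residue on a helical wheel that
    starts at residue `first`, assuming an ideal alpha-helix."""
    return {r: ((r - first) * degrees_per_residue) % 360.0 for r in residues}


def arc_span(angles):
    """Smallest arc (degrees) of the wheel that contains all given angles."""
    a = sorted(x % 360.0 for x in angles)
    # Gaps between neighbours, including the wrap-around gap.
    gaps = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    gaps.append(a[0] + 360.0 - a[-1])
    return 360.0 - max(gaps)


glycines = wheel_angles([350, 354, 358], first=349)
span = arc_span(glycines.values())
print(glycines, span)
```

Because residues spaced four apart are offset by 400°, that is 40° on the wheel, three such residues fall within an 80° arc, which is why they lie on one face of the putative helix.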

DISCUSSION
The TMDs of HCV envelope proteins are extreme examples of multifunctionality of membrane spanning sequences. Indeed, besides their role as membrane anchor, they possess a signal sequence function in their C-terminal half, are responsible for ER localization of E1 and E2, and play a role in the assembly of these proteins. In this work, we used alanine scanning insertion mutagenesis to study the oligomerization function of these domains. We demonstrated that the TMDs of E1 and E2 play a direct role in heterodimerization and that E1E2 assembly can be disrupted without affecting the other functions. Moreover, whereas one segment was found to be critical for heterodimerization in the TMD of E2, two structurally different segments were identified for E1. In addition, we determined the three-dimensional structure of the E1-(350–370) peptide in the presence of membrane mimetic solvents by NMR to provide a structural framework for a better analysis of the mechanism of E1E2 heterodimerization.
The TMDs of HCV envelope glycoproteins play a direct role in heterodimerization. Deletion or replacement of the C-terminal hydrophobic sequence of E2, or mutation of the charged residues in the TMDs of E1 or E2, has been shown to dramatically reduce the formation of E1·E2 complexes (3,11,15,18). Although these data indicate that the TMDs are involved in the assembly of HCV envelope glycoproteins, the effect of the mutations of charged residues might be indirect. Indeed, these mutations lead to secretion or transport of the mutated protein out of the ER compartment, and the lack of complex formation could be due to the impossibility for E1 and E2 to find each other at the time of assembly. However, E1 and E2 interact before completion of their folding (9), and this process is supposed to occur in a specific compartment, the ER (58). The impairment in heterodimerization following alanine scanning insertion mutagenesis clearly shows that the TMDs of E1 and E2 are directly involved in assembly. Indeed, this technique has been shown to be very useful to identify critical segments of transmembrane α-helices involved in helix-helix interactions (19,20). In addition, data of alanine scanning insertion mutagenesis (20,59) fit very well with a recent model predicting the stability of transmembrane helix-helix interactions in mutants of glycophorin A (60). Interestingly, with the exception of the impairment in assisted folding of E1 by E2, no other function of the TMDs of HCV envelope glycoproteins seems to be altered. This is clearly different from replacement mutagenesis of the charged residues, which leads to an alteration of all the functions played by these TMDs (3). The TMDs of E1 and E2 are probably not the sole determinants for heterodimerization. 
N-terminal sequences in E2 and also in E1 have been suggested to be important for HCV envelope glycoprotein assembly (61), but deletion mutant analysis has failed to identify any single region which is required for noncovalent interaction (62). Interestingly, assisted folding of the ectodomain of E1 by E2 suggests that regions other than the TMDs might enter into contact. For many viral envelope proteins, the ectodomains have been shown to be involved in oligomerization (63), a feature which is important to regulate the fusogenic function of these proteins (64). Why should there be different regions involved in heterodimerization of HCV envelope glycoproteins? Assembly and folding of HCV envelope glycoproteins seem to be interconnected events. It is likely that an early contact between these proteins, probably initiated by their TMDs, is necessary to bring their ectodomains into contact, which seems to be necessary for the formation of a native complex. This could explain why an alanine insertion mutation which is disruptive for assembly is also disruptive for the assisted folding of E1 by E2.
The data of alanine scanning insertion mutagenesis in the TMD of E2 suggest that this domain forms a single transmembrane segment with the charged residues located in the middle of the membrane spanning sequence. This fits well with the prediction of a single transmembrane helical segment from residue 718 to 742. Indeed, a minimum of 16 leucines are required to form a transmembrane α-helix (65), but stretches of 20–25 hydrophobic amino acids are generally observed in integral membrane proteins. Although the presence of charged residues within the center of this transmembrane helix seems to be surprising a priori, it is very unlikely that segment 718–742 adopts a stable conformation allowing two transmembrane passages. In such an eventuality, each of the two small hydrophobic stretches (718–727 and 731–742) would have to adopt an extended structure. Such an extended structure is very unfavorable in a membrane environment because of the absence of stabilization by hydrogen bonds, while the hydrogen-bond network of an α-helix offers the most stable solution from a thermodynamic standpoint. The formation of transmembrane β-sheets as observed in the porin family of membrane proteins also yields a stable hydrogen-bond network compatible with the membranous environment. However, examination of the nature and position of residues in the TMD of E2 does not provide any sign that such a stable conformation might take place. Concerning the potential problem of the two charged residues within the membrane (Asp and Arg), it is possible that they could form ion pairs as already described in the cases of several membrane proteins (66). Taken together, these data clearly support the hypothesis that the TMD of E2 is a single transmembrane helix centered near amino acid 730.
Concerning the TMD of E1, the data of alanine scanning insertion mutagenesis allowed us to identify two distinct segments involved in heterodimerization with the TMD of E2. First, similarly to what was observed for the mutations in E2, the disruptive effect of Ala369′ inserted very close to Lys370 suggests that this charged residue might also have a central position in the membrane. Second, the N-terminal region of the TMD of E1 appears to be very sensitive to alanine insertions. This is likely due to the presence of GXXXG motifs, which have been well documented to ensure specific homodimerization of transmembrane segments in membrane proteins (see below). These features, together with the thermodynamic reasons set out above for the TMD of E2, strongly suggest that E1 is anchored in the membrane by a single transmembrane α-helix. This is also supported by the fact that in the E1-(350–370) model (see Fig. 8), the Ile359-Asn367 segment was found to form a stable α-helix and the Gly354-Gly358 segment has a clear tendency to adopt a helical fold (see Fig. 8). In addition, taking into account that aromatic residues and positively charged residues are often encountered at the membrane interface, it might be predicted that His352 and Trp353 residues form the N terminus of the TMD of E1. In summary, we thus assume that the 25-amino acid long segment Gly354-Phe379 forms a single transmembrane helix and is the counterpart of the E2 transmembrane helix. It should be underlined that a single membrane spanning topology for the TMDs of E1 and E2 would also fit very well with the current model of ER retention by membrane determinants. A usual feature of such signals is the presence of one or several hydrophilic residues within the hydrophobic TMD (67)(68)(69)(70), and their effect is strongest when they are localized toward the middle of the membrane (67).
To sequentially ensure some of their functions, the TMDs of E1 and E2 might adopt different topologies. Besides their role in heterodimerization, these TMDs also have a signal sequence function in their C-terminal half (3). Since the ectodomains of E1 and E2 are translocated into the lumen of the ER, the N terminus of their TMD should also be oriented toward the luminal side of the ER. The signal sequence function present in the second half of these TMDs suggests that the C terminus of these domains should also be oriented, at least transiently, toward the luminal side of the ER. These domains are therefore likely to form a hairpin structure with the charged residues close to the cytosolic face of the membrane. It is reasonable to think that this hairpin structure might be transient in the environment of the translocon before signal sequence cleavage has occurred. A reorientation of the second stretch of hydrophobic residues leading to a single transmembrane segment would occur immediately after signal sequence cleavage at its C terminus and before membrane integration and heterodimerization.
Assuming that the TMDs of E1 and E2 are both composed of a single transmembrane α-helix, it is reasonable to postulate that these two helices interact together to ensure E1E2 heterodimer formation. However, the putative existence of a single transmembrane α-helix between Gly354 and Phe379 suggests that part of the Gly350-Gly354 segment might lie outside of the membrane, although it is sensitive to alanine insertion. From a structural point of view, this segment appears to be unfolded (see Fig. 8), but this is likely due to the absence of conformational stabilization by tertiary interactions in the isolated E1-(350–370) peptide context used here. In the E1 and/or E1E2 context, it is thus possible that the Gly350-Gly354 segment folds into an α-helix and that the whole E1-(350–379) segment forms a single long α-helix upon E1E2 heterodimerization. Interestingly, the simulation of such a conformation for the Gly350-Gly358 segment clearly shows that the three well conserved glycine residues (Gly350, Gly354, and Gly358) lie on the same side of the putative helix (see helix projection, Fig. 8D) and form two consecutive putative GXXXG motifs (Gly350-Gly354 and Gly354-Gly358). The presence of such a glycine motif has been reported to be essential to ensure the specific helix-helix interaction for several transmembrane proteins such as glycophorin A (71) and phage M13 coat protein (72). 
Moreover, the importance of the GXXXG motif has been recently highlighted by two articles of Engelman and colleagues (73,74): (i) in one experimental study, they selected TMDs exhibiting high affinity homooligomerization from a randomized sequence library based on the right-handed dimerization motif of glycophorin A, and the GXXXG motif was the one most frequently isolated using the TOXCAT system, which measures transmembrane helix-helix association in the Escherichia coli inner membrane (73); (ii) in a theoretical study, they analyzed frequently occurring combinations of residues in a database of putative TMDs, and the main theme observed is patterns of small residues (Gly, Ala, and Ser) at i and i+4 found in association with large aliphatic residues (Ile, Val, and Leu) at neighboring positions (i.e. i±1 and i±2) (74). A very suggestive correlation can be made with our studies: coinciding with the most dramatic effects of Ala insertion, the TMD of E1 displays in its N-terminal part two successive GXXXG motifs, and the second one is GVXXGI. Consequently, it is tempting to speculate that the two glycine motifs of E1 play a major role in E1E2-specific heterodimerization and are located at the interface of the E1E2 dimer.
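The search for small residues at positions i and i+4 can be sketched as an overlapping pattern scan. In the example below, the sequence string is built only from the residues of E1-(350–370) that are named in the text; every other position is filled with the placeholder letter X, which is an assumption made for illustration and not the actual amino acid. The function name and its parameters are likewise hypothetical.

```python
import re


def find_small_xxx_small(seq, first_residue, small="G"):
    """Return (residue_number, matched_segment) for every occurrence of a
    small residue at i and i+4, allowing overlapping matches."""
    # A lookahead lets consecutive motifs share their middle glycine.
    pattern = re.compile("(?=([%s]...[%s]))" % (small, small))
    return [(m.start() + first_residue, m.group(1)) for m in pattern.finditer(seq)]


# Residues 350-370; X marks positions not identified in the text (placeholder).
partial_sequence = "GXHWGVXXGIXXXXXXXNWAK"
motifs = find_small_xxx_small(partial_sequence, first_residue=350)
print(motifs)
```

Passing `small="GAS"` widens the scan to the Gly, Ala, and Ser pattern described in the theoretical study; with the placeholder sequence used here, only the two glycine motifs starting at residues 350 and 354 are detected.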
In conclusion, the structural and functional analyses reported here allowed us to show for the first time that oligomerization of viral envelope proteins can be controlled by transmembrane sequences and to propose a molecular framework for the understanding of the mechanism of E1E2 heterodimerization. Besides the comprehension of the assembly of HCV envelope glycoproteins, the multifunctionality of the TMDs of E1 and E2 provides a very interesting model to study the structure/function relationship of transmembrane sequences.
The functions currently identified for these proteins are taking place during the early events of E1E2 synthesis and assembly. Such a model will also be very helpful to understand the complexity of the dynamic relationship between neosynthesized proteins and the translocation machinery.