Crystal Structure of a “Nonfoldable” Insulin

Protein evolution is constrained by folding efficiency (“foldability”) and the implicit threat of toxic misfolding. A model is provided by proinsulin, whose misfolding is associated with β-cell dysfunction and diabetes mellitus. An insulin analogue containing a subtle core substitution (LeuA16 → Val) is biologically active, and its crystal structure recapitulates that of the wild-type protein. As a seeming paradox, however, ValA16 blocks both insulin chain combination and the in vitro refolding of proinsulin. Disulfide pairing in mammalian cell culture is likewise inefficient, leading to misfolding, endoplasmic reticular stress, and proteasome-mediated degradation. ValA16 destabilizes the native state and so presumably perturbs a partial fold that directs initial disulfide pairing. Substitutions elsewhere in the core similarly destabilize the native state but, unlike ValA16, preserve folding efficiency. We propose that LeuA16 stabilizes nonlocal interactions between nascent α-helices in the A- and B-domains to facilitate initial pairing of CysA20 and CysB19, thus surmounting their wide separation in sequence. Although ValA16 is likely to destabilize this proto-core, its structural effects are mitigated once folding is achieved. Classical studies of insulin chain combination in vitro have illuminated the impact of off-pathway reactions on the efficiency of native disulfide pairing. The capability of a polypeptide sequence to fold within the endoplasmic reticulum may likewise be influenced by kinetic or thermodynamic partitioning among on- and off-pathway disulfide intermediates. The properties of [ValA16]insulin and [ValA16]proinsulin demonstrate that essential contributions of conserved residues to folding may be inapparent once the native state is achieved.

A fundamental problem in biophysics is posed by the efficiency of protein folding (1,2). The capability of folding is an evolved property of biological sequences, i.e. not broadly representative of polypeptides as a class of heteropolymers (3). A funnel-shaped free-energy landscape enables an efficient conformational search leading by multiple trajectories to the native state (4-6). What distinguishes foldable from nonfoldable sequences (7), and how do productive trajectories avoid bottlenecks (8-10)? Despite the centrality of these questions, experimental approaches are limited. Here, we combine biophysical and cellular studies to identify a residue critical to the efficiency of oxidative folding. A model is provided by insulin, a globular protein central to the regulation of vertebrate metabolism (11).
Following the folding of proinsulin in the ER, the biosynthetic pathway of insulin imposes additional structural constraints. After transit through the Golgi apparatus and entry into immature secretory granules (40), a specific set of prohormone convertases proteolytically excises the C-peptide at conserved dibasic sites (BC and CA junctions; Fig. 1, A and B, green), liberating the bioactive hormone (41-43). Insulin thus contains two chains, A (21 residues) and B (30 residues), and is stored as Zn2+-stabilized hexamers within specialized secretory granules (Fig. 1C) (44). The hexamers dissociate upon secretion into the portal circulation, enabling the circulating hormone to function as a Zn2+-free monomer (Fig. 1C, right).
This study investigates a mutant proinsulin defective in folding and yet active and well organized once the native state is achieved. Substitution of LeuA16 by Val (45) reduces the efficiency of disulfide pairing in vitro and leads in mammalian cells to accelerated degradation associated with ER stress (46,47). Despite these perturbations, the crystal structure of the variant (as a mutant insulin) is essentially identical to wild type (48,49). We propose that ValA16 allows formation of off-pathway oxidative intermediates and/or destabilizes on-pathway intermediates (50). Such perturbations are similar to those proposed to underlie a newly recognized syndrome of toxic protein misfolding, permanent neonatal-onset diabetes mellitus (DM) due to mutations in proinsulin (36-39). Although to date clinical mutations (excluding insertion or removal of cysteines) have clustered in the B-domain, our results demonstrate that the A-domain is also required for efficient folding. The aberrant properties of [ValA16]proinsulin thus provide a model for the molecular genetics of a disease of protein misfolding.

MATERIALS AND METHODS
Chemical Synthesis-Insulin was provided by Eli Lilly and Co.; S-sulfonate B-chain derivatives were obtained by oxidative sulfitolysis (51). A- and B-chain analogues were otherwise prepared by solid-phase synthesis (51). Insulin analogues were prepared by chain combination as described previously (45). Predicted molecular masses were confirmed by mass spectrometry.
Recombinant Expression of Proteins-Wild-type and variant proinsulins were expressed in Escherichia coli (52); wild-type and variant mini-proinsulins (53-residue construct) were expressed in folded secreted form in Pichia pastoris (53).
Mammalian Cell Culture-HEK293T cells were cultured at 37°C in high glucose Dulbecco's modified Eagle's medium containing 10% fetal bovine serum and 0.1% penicillin/streptomycin with 5% CO2. CHO-CLA14 cells were maintained in low glucose Dulbecco's modified Eagle's medium plus 400 μg/ml G418 and 200 μg/ml hygromycin at 37°C with 5% CO2 (54). For metabolic labeling, cells were plated into 6-well plates 1 day before transfection. Plasmid DNA (2 μg) was transfected into each well using Lipofectamine (Invitrogen). For studies of the unfolded protein response (UPR), cells were seeded into 24-well plates 1 day before transfection. Plasmid DNAs containing proinsulin (wild type or variant) were co-transfected with plasmids (provided by R. Kaufman) encoding luciferase (driven by BiP promoter) and β-galactosidase (driven by CMV promoter) at a 5:2:1 ratio using Lipofectamine; assays were performed in triplicate.
FIGURE 1. Proinsulin and its biosynthetic pathway. A, pathway of insulin biosynthesis beginning with preproinsulin (top): signal peptide (gray), B-domain (blue), dibasic BC junction (green), C-domain (black), dibasic CA junction (green), and A-domain (red). In the ER the unfolded prohormone undergoes specific disulfide pairing to yield native proinsulin (middle panels). Cleavage of BC and CA junctions (by prohormone convertases PC1 and PC2 and by carboxypeptidase E) leads to mature insulin and the C-peptide (bottom). B, structural model of insulin-like moiety and disordered connecting peptide (dashed line). The A- and B-domains are shown in red and blue, respectively; the disordered connecting domain is shown as a dashed black line. Cystines are labeled in yellow boxes. C, cellular pathway of insulin biosynthesis: nascent proinsulin folds as a monomer in ER (left) wherein zinc-ion concentration is low; in post-Golgi granules proinsulin is processed by cleavage of connecting peptide to yield mature insulin, and zinc-stabilized hexamers begin to assemble. Zinc-insulin crystals are observed in secretory granules. On secretion into the portal circulation (right), hexamers dissociate to yield bioactive insulin monomers. rER, rough endoplasmic reticulum; SRP, signal recognition particle.
Metabolic Labeling and PAGE-At 40 h post-transfection, cells were preincubated in methionine/cysteine-deficient medium for 30 min, metabolically labeled in the same medium containing 35S-labeled Met and Cys for 1 h, washed once with complete medium, and chased for the indicated times. To examine degradation, labeled cells were chased for 4 h with or without 20 μM MG132 or lactacystin. After chase media were collected, cells were lysed in 100 mM NaCl, 1% Triton X-100, 0.2% sodium deoxycholate, 0.1% SDS, 10 mM EDTA, and 25 mM Tris-HCl (pH 7.4). Lysates and chase media were immunoprecipitated with guinea pig anti-insulin antiserum (LINCO Diagnostics, Inc.) and analyzed by Tris-Tricine/urea-SDS-PAGE under reducing and nonreducing conditions (33,55).
UPR Assay and Statistical Analysis-The UPR assay (see supplemental Fig. S1 and supplemental Table S1) was performed as described previously (56). At 40 h post-transfection cells were washed three times with phosphate-buffered saline, lysed in luciferase assay buffer, and transferred to a 96-well plate; luminometry was performed in a Turner Biosystems plate luminometer using the Dual-Light assay system (Applied Biosystems). Statistical analysis employed a randomized block design analysis of variance model after checking the appropriateness of a normality assumption on errors using a Q-Q plot with the Shapiro-Wilk test. Testing (two-tailed) was done at a significance level of α = 0.05. When significant findings were discovered, a nonparametric version of analysis of variance was run to ensure consistency.
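The randomized block design named above can be illustrated with a minimal sketch. It assumes one measurement per treatment-block cell, with constructs as treatments and independent experiments as blocks; the function name and the toy numbers are hypothetical and are not data from this study. Only the F statistic is computed; converting it to a p-value requires the F distribution from a statistics package, and neither the normality check nor the nonparametric follow-up is shown.

```python
def randomized_block_f(data):
    """Return (F, df_treatment, df_error) for a randomized block design.

    data[i][j] is the response for treatment i in block j
    (exactly one observation per cell).
    """
    t = len(data)        # number of treatments (e.g. constructs)
    b = len(data[0])     # number of blocks (e.g. independent experiments)
    if t < 2 or b < 2 or any(len(row) != b for row in data):
        raise ValueError("need a full t x b table with t, b >= 2")

    grand = sum(sum(row) for row in data) / (t * b)
    treat_means = [sum(row) / b for row in data]
    block_means = [sum(data[i][j] for i in range(t)) / t for j in range(b)]

    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_treat - ss_block  # residual after both effects

    df_treat, df_error = t - 1, (t - 1) * (b - 1)
    f_stat = (ss_treat / df_treat) / (ss_error / df_error)
    return f_stat, df_treat, df_error


# Toy table (hypothetical values): 2 treatments x 2 blocks.
f_stat, df_treat, df_error = randomized_block_f([[1.0, 2.0], [3.0, 5.0]])
print(f_stat, df_treat, df_error)
```

Blocking on the experiment removes day-to-day variation from the error term, which is why the block sum of squares is subtracted before the F ratio is formed.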
Spectroscopy-CD spectra, obtained using an Aviv spectropolarimeter equipped with a titration unit for denaturation studies, were measured at a protein concentration of 5 μM at 4°C (57). 1H NMR spectra were obtained at 700 MHz in D2O solution at pH 7.0 and 25°C (21).
Thermodynamic Modeling-CD-detected guanidine denaturation data (222 nm) were fitted by nonlinear least squares to a two-state model (supplemental Table S2) as described previously (58). In brief, CD data, $\theta(x)$, were fitted by a nonlinear least-squares program according to Equation 1,
$$\theta(x) = \frac{\theta_A + \theta_B\, e^{(-\Delta G^0_{H_2O} - m^0 x)/RT}}{1 + e^{(-\Delta G^0_{H_2O} - m^0 x)/RT}} \qquad \text{(Eq. 1)}$$
where $x$ is the concentration of guanidine hydrochloride, and where $\theta_A$ and $\theta_B$ are base-line values in the native and unfolded states. These base lines were approximated by pre- and post-transition lines. Fitting the original CD data and base lines simultaneously circumvents artifacts associated with linear plots of $\Delta G$ as a function of denaturant according to $\Delta G^0(x) = \Delta G^0_{H_2O} + m^0 x$ (for reviews see Refs. 58,59).
Crystallography-Crystals were grown by hanging-drop vapor diffusion with a 1:2.5 ratio of Zn2+ to protein monomer and 3.7:1 ratio of phenol to protein monomer (49). Drops consisted of 1 μl of protein solution (10 mg/ml in 0.02 M HCl) mixed with 1 μl of reservoir solution (0.02 M Tris-HCl, 0.05 M sodium citrate, 5% acetone, 0.03% phenol, and 0.01% zinc acetate (pH 8.0)). Crystals (space group R3) were obtained at room temperature after 2 weeks. Data were collected from single crystals flash-frozen to 100 K. Reflections from 22.58 to 1.8 Å were measured using synchrotron radiation (CHESS, Cornell University). Data were processed with programs DENZO (version 1.9.6) and SCALEPACK (version 1.9.6). The structure was determined by molecular replacement using CNS; the initial model was the wild-type TRf dimer (PDB identifier 1RWE) following removal of water, zinc, and chloride ions. A translation-function search was performed following rotation-function analysis of data between 15.0 and 4.0 Å. Rigid-body refinement using CNS, employing overall anisotropic temperature factors and bulk-solvent correction, yielded values of 0.31 and 0.30 for R and Rfree, respectively, for data between 19.2 and 3.0 Å resolution.
Between refinement cycles, 2Fo − Fc and Fo − Fc maps were calculated using data to 3.0 Å resolution; zinc, chloride ions, and phenol molecules were built using O (60). Water molecules were calculated and checked using DDQ (61). The geometry was monitored with PROCHECK (62); zinc ions and water molecules were built as refinement proceeded. Further refinement employed CNS (63), which implements maximum-likelihood torsion-angle dynamics and conjugate-gradient refinement. Statistical data are provided in supplemental Table S3. Potential packing defects were calculated with Surfnet (64).

RESULTS
The hydrophobic core of insulin is conserved among vertebrates (12); the neighborhood of cystine A20-B19 is structurally invariant (48,65,66). Studies of insulin-related polypeptides have demonstrated the importance of cystine A20-B19 as a first step in disulfide pairing (67-69). LeuA16, inaccessible in the native state (Fig. 2A, red), adjoins TyrA19 and CysA20 (Fig. 2B, blue and gold, respectively) along the inner surface of the A12-A19 α-helix and projects between CysA11 and LeuB15 (gold and gray). Hydrophobic collapse of these side chains in the nascent polypeptide is proposed to orient CysA20 and CysB19 for initial pairing (20,29,57).
ValA16 Impairs Insulin Chain Combination and Refolding of Proinsulin-Insulin chain combination provides a peptide model of proinsulin refolding (70). The reaction leads to native disulfide pairing, demonstrating that information required for proinsulin folding is contained within the A and B sequences (71). Yield is limited by off-pathway products under kinetic control (disulfide-linked cyclic A-chains, cyclic B-chains, B-chain dimers, and B-chain polymers). The robustness of chain combination to amino acid substitutions has enabled synthesis of hundreds of analogues in past decades. As an initial test of the importance of LeuA16 in disulfide pairing, we prepared a variant A-chain containing ValA16. Reactions employed 80 mg of wild-type A-chain and 40 mg of B-chain (each as S-sulfonate derivatives). Whereas this protocol ordinarily yields 8-9 mg of wild-type insulin (following purification by reverse-phase high performance liquid chromatography (HPLC) (51,57,72)), no product was detectable in three attempted reactions. Because the threshold of HPLC detection is <40 μg, the efficiency of chain combination was reduced by >200-fold.
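The lower bound on the loss of efficiency follows from dividing the expected wild-type yield by the HPLC detection threshold. A minimal sketch of that arithmetic, taking the lower end of the 8-9 mg range; the variable and function names are illustrative only.

```python
def min_fold_reduction(expected_yield_ug, detection_limit_ug):
    """Smallest fold reduction consistent with an undetectable product."""
    return expected_yield_ug / detection_limit_ug


expected_yield_ug = 8000.0   # 8 mg, lower end of the usual wild-type yield
detection_limit_ug = 40.0    # product below this amount is not detected by HPLC

fold = min_fold_reduction(expected_yield_ug, detection_limit_ug)
print(f"efficiency reduced by more than {fold:.0f}-fold")
```

Because the product was below the threshold rather than at it, the computed ratio is a floor on the true reduction, not an estimate of it.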
Refolding of proinsulin is more efficient than chain combination (73). Accordingly, we sought to prepare ValA16-human proinsulin (position 81 of the polypeptide) by recombinant expression in E. coli. Bacterial expression gave rise to inclusion bodies containing reduced and unfolded proinsulin; following purification, oxidative refolding ordinarily yields native disulfide pairing (52,74). Expression of wild-type and variant polypeptides within inclusion bodies achieved similar levels; the reduced polypeptides were purified in similar yield. Although refolding of wild-type proinsulin was robust (52), the yield of [ValA16]proinsulin was negligible. The refolding mixture gave rise to a broad and inhomogeneous HPLC elution profile, suggesting the presence of multiple products.

ValA16 Blocks the Cellular Folding of Proinsulin and Induces ER Stress-Cellular folding efficiencies of variant proinsulins were evaluated in CLA14 cells (a subclone of the Chinese hamster ovary cell line) (54) and in human cell line HEK293T. Following transfection, we examined expression, disulfide isomer formation, and secretion of newly synthesized proinsulin as radiolabeled with 35S-labeled amino acids for 1 h and chased for 4 h (Fig. 3, A-C). Denaturing PAGE (Tris-Tricine/urea-SDS-PAGE) in the absence of reduction (Fig. 3A) permitted examination of distinct proinsulin disulfide isomers as formed in the ER (33,35); with reduction, this gel system provides a probe of total protein expression (Fig. 3B). The wild-type construct gave rise to robust expression, primarily of a fast migrating species; previous studies established that the latter is the native species (33). This species is efficiently secreted from transfected cells (Fig. 3, lane C) to medium (lane M). Two minor species are also present as slower migrating isomers with mispaired disulfide bonds; these exhibit lower secretion efficiency (Fig. 3, brackets). In CLA14 cells, substitution of LeuA16 by Val markedly impairs expression (relative to wild-type; Fig. 3, A and B, lanes 3 and 4 and 3′ and 4′) of the variant proinsulin in the ER (lanes 7 and 7′) and blocks its secretion (lanes 8 and 8′).
In the course of screening a variety of cell lines, we observed that HEK293T cells generally exhibit higher expression of human proinsulin (e.g. Fig. 3C, lanes 13 and 14) relative to rodent CLA14 cells (lanes 3 and 4). We further observed that HEK293T cells impose more stringent quality control as defined by the relative amounts of non-native isomers secreted into the medium. Subsequent experiments thus utilized this human cell line. In HEK293T cells, folding of [ValA16]proinsulin in the ER preferentially yields non-native isomers.
FIGURE 2 (legend; beginning missing). ... The solvent-exposed A7-B7 disulfide bridge is shown in gold (top); internal cystines A6-A11 and A20-B19 are not visible. B, corresponding ribbon model in same orientation showing LeuA16 in relation to TyrA19 (blue) and the internal side chains of IleA2 (black), LeuB11 (gray), and LeuB15 (gray). The A and B main chains are shown as gray and black ribbons, respectively. The three disulfide bridges (labeled at left) are shown as gold spheres. Coordinates were obtained from 2-Zn insulin molecule 1 (Protein Data Bank code 4INS).
A probe for UPR activation was provided by up-regulation of the ER resident chaperone immunoglobulin-binding protein (BiP) (76). An assay for induction of the UPR transcriptional program was provided by co-transfection of a proinsulin expression plasmid with a plasmid employing the BiP promoter to drive expression of luciferase (56). Transfection efficiency was normalized by co-transfection of a β-galactosidase construct driven by a viral promoter (cytomegalovirus; CMV). Each experiment was repeated three times with triplicate samples (nine in total), enabling statistical significance to be evaluated within and between data sets. Whereas co-transfection of wild-type proinsulin caused an ~2-fold increase in luciferase expression relative to base line, co-transfection of [ValA16]proinsulin induced an ~4-fold increase (p < 0.02); the extent of UPR activation by the proinsulin variant is similar to that induced by addition of tunicamycin (an inhibitor of protein glycosylation in the secretory pathway) to control cells co-transfected with the empty parent plasmid (see supplemental Fig. S1 and supplemental Table S1).
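The normalization described above (luciferase signal divided by β-galactosidase signal, then expressed relative to base line) can be sketched as follows. The readings are hypothetical placeholders chosen only to reproduce ratios of 2 and 4; they are not measurements from this study, and the function names are illustrative.

```python
from statistics import mean


def normalized_activity(luciferase, beta_gal):
    """Reporter signal corrected for transfection efficiency."""
    return luciferase / beta_gal


def fold_induction(sample_wells, baseline_wells):
    """Mean normalized activity of sample wells relative to baseline wells.

    Each well is a (luciferase, beta_gal) pair of raw readings.
    """
    sample = mean(normalized_activity(l, g) for l, g in sample_wells)
    baseline = mean(normalized_activity(l, g) for l, g in baseline_wells)
    return sample / baseline


# Hypothetical triplicate readings (arbitrary luminometer units).
baseline = [(100.0, 50.0), (110.0, 55.0), (90.0, 45.0)]
wild_type = [(200.0, 50.0), (220.0, 55.0), (180.0, 45.0)]
variant = [(400.0, 50.0), (440.0, 55.0), (360.0, 45.0)]

print(fold_induction(wild_type, baseline))
print(fold_induction(variant, baseline))
```

Dividing by the β-galactosidase signal first means that wells with different transfection efficiencies are compared on the same scale before the fold change is taken.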
FIGURE 3 (legend; beginning and end missing). ... 1′-10′) conditions. CLA14 cells were transfected to express wild-type proinsulin (pro, lanes 3 and 4, 3′ and 4′) or variants ThrA8 → His (lanes 5 and 6, 5′ and 6′), LeuA16 → Val (lanes 7 and 8, 7′ and 8′), or both HisA8,ValA16 (lanes 9 and 10, 9′ and 10′). Lanes 1 and 2, 1′ and 2′ provide an empty-vector control (con). The A8 substitution enhances overall expression and secretion; secretion of non-native isomers is also increased. At 48 h, cells were pulse-labeled with 35S-labeled amino acids for 1 h and chased for 1 h. Chase media (M) were collected, and cells (C) were lysed; each fraction was immunoprecipitated with anti-insulin antiserum. Prior to transfections, 15 mM isopropyl β-D-1-thiogalactopyranoside was added to induce the expression of ER chaperone ATF6 as described previously (54). C, corresponding pulse-chase studies in HEK293T cells as analyzed under nonreduced conditions (33). For ValA16 mutant (lanes 15 and 16, cellular (C) and medium (M)), a higher fraction of nascent polypeptide migrated as misfolded disulfide isomers relative to wild-type proinsulin (lanes 13 and 14); an empty-vector control is provided in lanes 11 and 12. D, effect of proteasome inhibitor lactacystin (lactacys) on intracellular proinsulin expression in transfected HEK293T cells expressing newly synthesized 35 ...
Partial Rescue of Folding by a Second-site Substitution-In the course of screening variant proinsulins, we observed that substitution ThrA8 → His enhanced expression and secretion in CLA14 cells (lanes 5 and 6 in Fig. 3A and counterparts in Fig. 3B). This surface substitution (found in avian insulins) in principle enhances the α-helical propensity of the A1-A8 segment (77,78). We therefore tested whether HisA8 might rescue the folding defect caused by ValA16 (Fig. 3A and counterparts in Fig. 3B, lanes 9 and 10; lanes 9′ and 10′). Expression and secretion are partially restored relative to ValA16 alone but still markedly decreased relative to wild type.
Refolding studies of [HisA8,ValA16]proinsulin, obtained by recombinant expression in E. coli, yielded a distribution of misfolded products similar to those observed in studies of [ValA16]proinsulin (see Footnote 6), likewise precluding isolation of the folded variant.
Despite the slight effects of HisA8 in partial rescue of the ValA16-associated folding defect, we undertook synthetic studies in an effort to obtain a sufficient quantity of a ValA16-containing insulin analogue for analysis of structure and function. The efficiency of wild-type chain combination was enhanced by ~20% by the stabilizing HisA8 substitution. More striking effects were observed in chemical synthesis of [HisA8,ValA16]insulin; efficiency of chain combination rose from essentially unobservable (above) to "only" 8-fold impaired. Although this yield remained anomalously low, it represented an augmentation of >25-fold and, through brute force of mass action, enabled isolation of the analogue in milligram quantities.
Folding Efficiency and Stability Are Uncorrelated-Cellular folding may be impaired by one or more of several molecular mechanisms as follows: a kinetic block en route to an on-pathway intermediate, aberrant stabilization of an off-pathway intermediate, or instability and degradation of the folded state, once reached. The above PAGE assay, by distinguishing between native and non-native disulfide isomers, provided evidence for inefficient disulfide pairing but cannot distinguish between these possibilities. This issue was further pursued by correlating cellular and biophysical studies. Because proinsulin consists of a folded insulin moiety and disordered connecting domain, effects of amino acid substitutions on native-state stability were investigated in corresponding insulin analogues. CD provided a convenient probe for fractional unfolding as a function of chemical denaturation (Fig. 4). Fits to a two-state model (58) in each case were characterized by R-values greater than 0.99.
The stability of [HisA8,ValA16]insulin was observed to be reduced relative to insulin and [HisA8]insulin (Fig. 4A). Its free energy of unfolding (ΔGu), as inferred from a two-state model (58), is 2.1 ± 0.1 kcal/mol (supplemental Table S2). This represents a perturbation (ΔΔGu) of 2.8 ± 0.2 kcal/mol relative to [HisA8]insulin (ΔGu 4.9 kcal/mol), presumably due to the following: (i) substitution of a β-branched residue in an α-helix (79), and (ii) a packing defect in the hydrophobic core (80) with possible transmitted structural changes (81). Native disulfide pairing in the three analogues was verified in each case by x-ray crystallography (below).
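A minimal sketch of the two-state relation behind these values, assuming flat native and unfolded base lines and a linear dependence of the unfolding free energy on denaturant concentration. The m-value and base-line signals used below are hypothetical placeholders, not fitted parameters from supplemental Table S2; the temperature of 4°C and the two ΔGu values are taken from the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def two_state_signal(x, dg_h2o, m, theta_native, theta_unfolded, temp_k=277.15):
    """CD signal at denaturant concentration x (M) for a two-state transition.

    dg_h2o: free energy of unfolding without denaturant (kcal/mol)
    m:      slope of dG versus denaturant (kcal/mol/M), negative for unfolding
    """
    dg = dg_h2o + m * x                          # dG(x) = dG(H2O) + m*x
    k_unfold = math.exp(-dg / (R_KCAL * temp_k))  # unfolded/native ratio
    return (theta_native + theta_unfolded * k_unfold) / (1.0 + k_unfold)


# Values quoted in the text (kcal/mol).
dg_variant = 2.1   # [HisA8,ValA16]insulin
dg_parent = 4.9    # [HisA8]insulin
ddg = round(dg_parent - dg_variant, 1)
print(ddg)

# Hypothetical m-value and base lines, for illustration only.
m_value = -0.7
midpoint = -dg_variant / m_value  # concentration at which dG(x) = 0
print(two_state_signal(midpoint, dg_variant, m_value, -10.0, -2.0))
```

At the transition midpoint the native and unfolded populations are equal, so the modeled signal lies halfway between the two base lines; a lower ΔGu shifts that midpoint to lower denaturant concentration for a given m-value.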
To test whether decreased native-state stability might in itself be responsible for the impaired folding efficiency of [ValA16]insulin and [HisA8,ValA16]proinsulin, we investigated the cellular folding and secretion of two partially folded mutant proinsulins, a two-disulfide analogue containing pairwise substitution of CysA6 and CysA11 by Ser (and so lacking cystine A6-A11; Fig. 3, lanes 29 and 30), and an analogue containing core substitution IleA2 → Gly (lanes 31 and 32). Biophysical studies of these analogues have previously been described in the context of an engineered monomer (DKP-insulin) (82). Native disulfide pairing was verified by two-dimensional NMR spectroscopy (21,83).
The two-disulfide analogue [SerA6,SerA11]DKP-insulin adopts a flexible conformation of marginal stability lacking the A1-A8 α-helix; [GlyA2]insulin also exhibits local unfolding of this α-helix (21,24,57,83). Denaturation studies indicated that the stabilities of these analogues are reduced to an extent similar to or greater than that of [HisA8,ValA16]insulin (Fig. 3B and supplemental Table S2), and yet the corresponding mutant proinsulins were efficiently folded and secreted by HEK293T cells (Fig. 3E). Chain combination yields in their chemical synthesis were likewise robust. The contrast between the folding properties of ValA16-containing variants, on the one hand, and the unstable control analogues, on the other, suggests that LeuA16 contributes to efficiency of disulfide pairing in a way that is disproportionate to its role in the native state. Efficient folding, trafficking, and secretion of [SerA6,SerA11]proinsulin and [GlyA2]proinsulin further suggest that native structural organization (i.e. folding of the A1-A8 α-helix) is not required to pass quality control checkpoints in the ER and secretory pathway.
Footnote 6: Secretion of a corresponding variant single-chain insulin analogue in yeast P. pastoris, ordinarily robust to diverse mutations, was impaired by 40-fold by the paired substitutions HisA8,ValA16 (from 40 to 1 mg/liter).
(Figure legend fragment; beginning missing.) ... Table S2) (58). GuHCl, guanidine-HCl.
Structure-Function Relationships-Despite its impaired folding efficiency, the affinity of [HisA8,ValA16]insulin for the insulin receptor (Kd 0.15 ± 0.03 nM) is almost 3-fold higher than that of wild-type insulin (Kd 0.41 ± 0.06 nM). Because [HisA8]insulin exhibits similarly enhanced activity (77,78), these results indicate that, once folding is achieved, substitution of LeuA16 by Val does not perturb receptor binding. A probe for maintenance of the receptor-binding surface was also provided by dimerization. Comprising the central α-helix and C-terminal β-strand of the B-chain, the dimer interface of insulin contains multiple side chains also involved in receptor binding (48). This surface is buttressed by LeuA16 (45), and so its maintenance or perturbation in A16 analogues provides a read-out of possible transmitted conformational changes. Because the interface contains aromatic side chains (TyrB16, PheB24, PheB25, and TyrB26), dimerization may conveniently be monitored by 1H NMR spectroscopy (82,84). Our studies were simplified by incorporation of substitution HisB10 → Asp to block the independent trimerization surface (82,85) (which would otherwise lead to further self-assembly) (48). Aromatic 1H NMR spectra of [AspB10]insulin and [AspB10,ValA16]insulin exhibit similar concentration-dependent resonance broadening between 50 and 300 μM (Fig. 5, A and B), indicating analogous dimerization properties. The spectrum of a related monomeric analogue retained a native-like pattern of chemical shifts ([ValA16]DKP-insulin; Fig. 5C). Although dispersion is partially attenuated, its two-dimensional NOESY spectrum contains long range inter-residue cross-peaks characteristic of native-like tertiary structure (supplemental Fig. S3).
To investigate the structure of [HisA8,ValA16]insulin in detail, single crystals were grown in the presence of zinc ions under conditions well characterized in wild-type insulin (48,86). The crystals diffracted to a resolution of 1.8 Å with unit-cell dimensions consistent with hexamer form T3Rf3. The variant structure was determined by molecular replacement; statistical information is provided in supplemental Table S3. Wild-type and variant hexamers are essentially identical (Fig. 6, A and B); root mean square deviations are provided in supplemental Tables S4-S7. The side chains of LeuA16 and ValA16 (Fig. 6, A and B, red) occupy similar core positions without perturbation of the A12-A18 α-helix. The ValA16 side chain is well defined in both T- and Rf-protomers without evidence of multiple conformations (Fig. 6, C and D). Surrounding density is likewise unambiguous without packing defects. (In the T-state density is weak for one branch of LeuB15 (Fig. 6C), but similar features have been observed in wild-type hexamers.) Inferred solvent accessibilities of the variant side chains are given in supplemental Table S8 relative to wild type. A comparison between respective inter-residue distances involving ValA16 (variant T- and R-protomers) or LeuA16 (in multiple wild-type structures) is provided in supplemental Table S9.
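Root mean square deviations of the kind tabulated in the supplement compare matched atoms in two structures. A minimal sketch, assuming the two coordinate sets are already superimposed and listed in the same atom order (the superposition step itself is not shown); the coordinates below are toy values, not atoms from either structure, and the function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom displaced by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, shifted))
```

The value depends on which atoms are matched (for example main-chain atoms only, or a single chain), so a comparison is meaningful only when the same atom selection is used for every structure pair.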
Global alignment of respective T- and R-protomers relative to a collection of wild-type protomers indicates overlapping B-chain conformations and similar but not identical A-chain positions (Fig. 7). B-chain-specific alignment reveals a small rigid-body displacement of each A-chain α-helix that is systematic relative to conformational variability among wild-type structures. The structure of the variant A-chain nonetheless closely adjoins wild-type structures. Comparison of the structures of [HisA8,ValA16]insulin and [HisA8]insulin (49) in the same crystal form indicates that the small adjustment of the A-chain is likely to be due to the A16 substitution, apparently in compensation for loss of side-chain volume (Fig. 8). These subtle changes seem unlikely to provide a structural basis for the marked effects of the A16 substitution on folding. The absence of significant perturbations in structure or function suggests that LeuA16 functions transiently in an unobserved folding intermediate and, despite its contribution to global stability, is otherwise dispensable once folding is achieved.

DISCUSSION
This study investigated the folding of a globular domain. An invariant residue in proinsulin (LeuA16; position 81 in intact proinsulin) is required for its folding efficiency but is open to substitution in the mature hormone. Whereas expression of [ValA16]proinsulin in mammalian cells leads to disulfide mispairing and degradation in association with ER stress, comparison of related insulin analogues ([ValA16,HisA8]insulin and [HisA8]insulin) demonstrated retention of native-like structure and function. We discuss these results in relation to the structural origins of disulfide pairing. Our findings have implications for neonatal DM as a prototypical disease of misfolding.
Substitution of LeuA16 by Val reduces the stability of the native state, presumably due in part to decreased α-helical propensity (79,87) and to loss of side-chain volume (80). Yet control studies of destabilizing substitutions elsewhere in the hydrophobic core (positions A2 and A6-A11) revealed unperturbed folding efficiencies, both in a transfected human cell line and as probed by insulin chain combination. These contrasting findings indicate that effects on folding can be distinguished from extent of native-state perturbation. Multiple molecular models may account for these findings. The conformational search leading to native disulfide pairing may be rendered inefficient due to destabilization of on-pathway intermediates or stabilization of off-pathway intermediates. Apparent bottlenecks may reflect thermodynamic or kinetic traps. A mutation may even create barriers not pertinent to the folding mechanism of the wild-type protein. The coarseness of our experimental probes thus prevents unambiguous interpretation of our results on the molecular level.
We suspect that more than one molecular mechanism may underlie the profound inefficiency of folding of [ValA16]proinsulin. A kinetic barrier to formation of an on-pathway intermediate, for example, may lead to increased occupancy of a non-native disulfide isomer as a thermodynamic trap susceptible to degradation. Which is cause and which is effect may in turn be difficult to distinguish. We nonetheless propose, as a general framework for discussion, that ValA16 disproportionately impairs the stability of a species containing the single cystine A20-B19. Prior studies have suggested that this species defines an obligatory on-pathway disulfide intermediate (see Footnote 7). Nascent native-like structure in such an intermediate could exert kinetic control of disulfide pairing, whereas folding efficiency would be unperturbed by substitutions extrinsic to this proto-core. Even as we recognize the limitations of the present data, it may illuminate future studies to consider how they relate to established biochemical features of proinsulin and the mechanism of insulin chain combination.
Disulfide Intermediates in Protein Folding-Oxidative folding may be probed by chemical trapping of populated disulfide intermediates (88). Application to proinsulin and related polypeptides is notable for the transient accumulation of one- and two-disulfide intermediates (67-69). Their partial folding may be represented by a series of trajectories on successive free-energy landscapes (Fig. 9A). We envisage that each landscape governs folding trajectories in the presence of a specific subset of disulfide bridges. Because the polypeptide acquires structure stepwise on successive pairing, the landscapes proceed from shallow to steep. A preferred sequence of disulfide intermediates, as defined by chemical trapping, hence provides a framework for visualizing a progression of multiple folding trajectories on funnel-shaped landscapes. This perspective integrates the classical disulfide-centered paradigm (89) with general physical models of folding (1,5). Despite its elegance, the landscape perspective is difficult to establish experimentally and so in the present context should be regarded as an evocative metaphor. The utility of this metaphor lies in its potential for exploration by molecular dynamics simulations as a foundation for the design and interpretation of experiments (10).
The notion of successive landscapes is based on the observation of nonrandom disulfide intermediates. Although refolding studies of proinsulin are limited by aggregation (90), disulfide pathways of related polypeptides are well characterized (20, 23, 67-69, 91, 92). A structural pathway has been proposed based on equilibrium models (Fig. 9B) (20-22, 24-29, 93). A key role is played by initial formation of cystine A20-B19, which in the native state connects the C-terminal α-helix of the A-domain to the central α-helix of the B-domain. This bridge packs within a cluster of conserved side chains in the hydrophobic core, including LeuA16 (31,32,57). Neighboring side chains include LeuB11, LeuB15, ValB18, PheB24, and TyrA19. In one- and two-disulfide models, LeuA16 plays a prominent structural role in maintenance of a specific partial fold, recapitulating salient features of the native state (29). We envisage that general hydrophobic clustering facilitates alignment of CysB19 and CysA20; once pairing has occurred, the bridge stabilizes a more specific organization of neighboring aliphatic and aromatic side chains. This study has demonstrated that substitution of LeuA16 by Val, although compatible with native structure and function, imposes a block to the folding of proinsulin in mammalian cells.
⁷ The fractional contribution of a given interaction to the stability of a native-like partial fold can be larger or smaller than its fractional contribution to the native state. Whereas IleA2 and LeuA16 each contribute to native-state stability through core packing, for example, folding and chain combination are robust to A2 substitutions (45). Kinetic control via a subset of native-like contacts in a transition state differs from the concept of "pathway mutations" as originally envisioned in studies of a β-helical phage tail spike (46).
This block (recapitulated in chain combination) may arise from perturbations analogous to those that impair native-state stability but in the context of an ensemble of partial folds. Key perturbations might thus arise from the following: (a) a packing defect in the nascent core that misaligns CysA20 and CysB19, and (b) decreased α-helical propensity of the A-domain segment of the unfolded polypeptide due to a β-branched substituent. Because AlaA16 (a helicogenic substitution of reduced volume) likewise blocks folding and secretion of a mini-proinsulin in Saccharomyces cerevisiae (94,95), perturbed hydrophobic collapse is likely to be the predominant mechanism. Mutations elsewhere in the A20-B19-associated subdomain similarly impair chain combination and yeast biosynthesis (24, 94, 96-98). This perspective suggests that abortive cellular folding of [ValA16]proinsulin reflects a primary defect in on-pathway processes and, as a secondary consequence, can markedly augment formation of non-native disulfide isomers (Fig. 9B, central panel). A reverse chain of causality, primary ValA16-related stabilization of a non-native fold leading indirectly to inefficient native folding, is also possible.
Relationship to Insulin Chain Combination-Although chain combination is ordinarily robust to amino acid substitutions (57), position A16 represents an Achilles' heel (45). An analogous block to chain combination was observed on removal of AsnA21 (96,99,100). Remarkably, chain combination was largely rescued on amidation of CysA20. Because in wild-type crystal structures the main-chain amide of AsnA21 participates in a hydrogen bond to the carbonyl oxygen of GlyB23 (supplemental Fig. S10), Katsoyannis and co-workers (96,99,100) proposed that pairing of CysA20 and CysB19 was dependent on formation of this hydrogen bond. These pioneering studies further demonstrated that the yield of chain combination could be enhanced by fluorous modification at the C terminus (ethylamide versus trifluoroethylamide; supplemental Fig. S11). The latter, electronegative modification would be expected to withdraw electrons from the amide donor and hence strengthen the A21-B23 hydrogen bond. These studies did not distinguish between kinetic and thermodynamic effects of the modifications.
We envisage as a working hypothesis that the side chain of LeuA16 and the A21-B23 main-chain hydrogen bond each stabilize a native-like partial fold directing pairing of cystine A20-B19. Whereas LeuA16 presumably participates in nonpolar interactions between nascent α-helices in the A- and B-domains (Fig. 9B), formation of the A21-B23 hydrogen bond apparently requires a specific orientation of the carbonyl group of GlyB23. In the native state this conserved glycine exhibits a positive φ dihedral angle, ordinarily forbidden to L-amino acids (48). Whereas D-Ala is readily accommodated at B23, the efficiency of chain combination is markedly impaired by L-AlaB23 (98) and blocked by L-ValB23.⁸ Similar stereospecific effects of L- and D-substitutions at GlyB20 imply that the productive intermediate exhibits a native-like B20-B23 β-turn.
[Figure legend fragment] ... in T-state protomer in relation to selected residues in A-chain (IleA2 and TyrA19 (red); CysA6 and CysA11 (gold)) and B-chain (PheB1, LeuB11, AlaB14, LeuB15, and ValB18 (green)). ... is retained in a peptide model of a one-disulfide intermediate (29).
Why do insulin-related polypeptides exhibit initial pairing of cystine A20-B19 when these residues are so distant in sequence? We propose that folding begins with nascent α-helix formation within segments B9-B19 and A16-A20; these segments then interact via nonpolar clustering of LeuB11, LeuB15, ValB18, LeuA16, TyrA19, and the two cysteines. Diffusion-collision of α-helices aligns the thiol moieties of A20 and B19 for disulfide pairing. The stability (or meta-stability) of this intermediate would be augmented by the A21-B23 hydrogen bond and native-like packing of PheB24 against one edge of the disulfide bridge.
Implications for Human Genetics-Dominant mutations in the insulin gene have recently been identified as a cause of permanent neonatal-onset DM (36-39). Like ValA16, these mutations are predicted to block folding in the ER and so impair the function of pancreatic β-cells. Although expression of the wild-type insulin allele would in other circumstances be sufficient to maintain homeostasis, studies of a corresponding mouse model (75,101,102) have demonstrated that misfolding of the variant perturbs wild-type biosynthesis (34,103). Impaired β-cell secretion of insulin is associated with ER stress, distorted organelle architecture, and eventual cell death (35,104).
Clinical mutations fall in two classes as follows: (a) substitutions that introduce or remove a Cys and so unbalance disulfide pairing, and (b) substitutions at other sites apparently critical to the structural mechanism of disulfide pairing. The latter class includes GlyB23 → Val at the site of the key B23-A21 hydrogen bond (above). Additional mutational hot spots have been observed at B5, B6, and B8, conserved sites proposed to direct pairing of CysA7 and CysB7 (50, 55, 105, 106). To date non-cysteine-related mutations have been observed only in the B-domain. The present results demonstrate that conserved sites in the A-domain may likewise contribute to folding efficiency; mutations at such sites are thus likely to be observed in neonatal DM. The growing catalogue of clinical mutations in the insulin gene promises to provide important insight into sequence determinants of folding in vivo. It is important to note that the distribution and frequency of clinical mutations may reflect the relative susceptibility of DNA codons to mutagenesis (due to genomic sequence and structure) as well as the consequences of the predicted amino acid substitutions in the variant polypeptides.
FIGURE 9. Energy landscape view of proinsulin folding and disulfide pairing. A, formation of successive disulfide bridges may be viewed as enabling a sequence of folding trajectories on a succession of steeper funnel-shaped free-energy landscapes. B, preferred pathway of disulfide pairing begins with cystine A20-B19 (left), whose pairing is directed by a nascent hydrophobic core formed by the central B-domain α-helix (residues B9-B19), part of the C-terminal B-chain β-strand (B24-B26), and part of the C-terminal A-domain α-helix (A16-A20). Alternative pathways mediate formation of successive disulfide bridges (middle panel) en route to the native state (right panel). The mechanism of disulfide pairing is perturbed by clinical mutations associated with misfolding of proinsulin.
The genetics of neonatal DM and other diseases of misfolding suggest that the evolution of proteins has been constrained not only by structure and function, but also by an underlying risk of toxic misfolding. The native-like crystal structure of a "nonfoldable" insulin analogue demonstrates that the transient function of a conserved side chain in the mechanism of folding may be structurally inapparent once the ground state is reached. Although neonatal DM is uncommon, the diversity of associated mutations in the insulin gene highlights the possible role of wild-type bottlenecks and resultant chronic ER stress in type 2 DM (34). The pathogenesis of progressive β-cell dysfunction in type 2 DM and the metabolic syndrome poses a medical problem of overarching societal importance.
Concluding Remarks-The evolution of proteins faces independent structural constraints at multiple levels of biosynthesis and function. A classical model is provided by insulin; its conserved residues may contribute to one or more essential processes, including folding efficiency in the β-cell, prevention of toxic misfolding, self-assembly within secretory granules, disulfide stability in the bloodstream, and receptor binding. Interlocking sets of constraints may account for the limited sequence diversity among vertebrate insulins (48).
LeuA16 provides an example of an invariant residue that enables effective folding, but once the native state has been reached, ValA16 is equally compatible with canonical structure, assembly, and receptor binding. This subtle substitution impairs ground-state stability and may disproportionately perturb the relative stabilities and kinetic accessibilities of obligatory on- or off-pathway intermediates. The marked compression of structural information within the sequence of insulin suggests that other conserved sites may facilitate folding but weaken receptor binding (105); alternatively, structural features favoring receptor binding may impede disulfide pairing (107,108). We thus envisage that specific residues in insulin play distinct roles at each stage of its conformational "life cycle." The growing clinical database of DM-associated mutations suggests that the sequence of insulin lies at the edge of nonfoldability.
Supplemental Material-Eleven figures provide a histogram of the ER stress assay, NMR spectra, additional depictions of the crystal structure, and a summary of prior chemical studies of insulin chain combination; nine tables give the original ER stress data, thermodynamic modeling of protein denaturation studies, and structural comparisons. Coordinates have been deposited in the RCSB Protein Data Bank (accession number 3GKY).