Deciphering the Hidden Informational Content of Protein Sequences

Protein sequences encode both structure and foldability. Whereas the interrelationship of sequence and structure has been extensively investigated, the origins of folding efficiency are enigmatic. We demonstrate that the folding of proinsulin requires a flexible N-terminal hydrophobic residue that is dispensable for the structure, activity, and stability of the mature hormone. This residue (PheB1 in placental mammals) is variably positioned within crystal structures and exhibits 1H NMR motional narrowing in solution. Despite such flexibility, its deletion impaired insulin chain combination and, in cell culture, led to the formation of non-native disulfide isomers and impaired secretion of the variant proinsulin. Cellular folding and secretion were maintained by hydrophobic substitutions at B1 but markedly perturbed by polar or charged side chains. We propose that, during folding, a hydrophobic side chain at B1 anchors transient long-range interactions by a flexible N-terminal arm (residues B1-B8) to mediate kinetic or thermodynamic partitioning among disulfide intermediates. Evidence for the overall contribution of the arm to folding was obtained by alanine scanning mutagenesis. Together, our findings demonstrate that efficient folding of proinsulin requires N-terminal sequences that are dispensable in the native state. Such arm-dependent folding can be abrogated by mutations associated with β-cell dysfunction and neonatal diabetes mellitus.

The efficiency of protein folding poses a fundamental problem at the intersection of biophysics, cell biology, and medicine (1,2). Because the existence of a unique and accessible ground state is unrepresentative of polypeptides as a class of heteropolymers, foldability is an evolved property of biological sequences (3). Current kinetic models envisage a funnel-shaped free-energy landscape, enabling multiple trajectories to the native state (4-6). What distinguishes foldable from non-foldable sequences (7), and how are bottlenecks avoided (8-10)?
The salience of these questions has been reinforced by recognition of proteotoxicity as a general pathological mechanism underlying diverse diseases (11,12). Here, we describe a cryptic folding element in a protein that is dispensable once the native state has been reached.
A model is provided by insulin, a globular protein central to the regulation of vertebrate metabolism (13). Its impaired biosynthesis causes β-cell dysfunction and permanent neonatal-onset diabetes mellitus (DM) (14-17). The insulin gene encodes a single-chain precursor, preproinsulin (Fig. 1A, top) (18). A signal peptide (gray bar) is cleaved on translocation into the endoplasmic reticulum (ER) to yield proinsulin. The precursor contains successive sequence motifs defining the B, C, and A domains (blue, black, and red, respectively, in Fig. 1A) (19). Whereas the translocated polypeptide is reduced and unfolded, oxidative folding in the ER yields a well organized A-B (insulin-like) core and a disordered C-domain (dashed black segment in Fig. 1B) (20-26). Folding is coupled to disulfide pairing (A6-A11, A7-B7, and A20-B19; gold in Fig. 1, A and B). Proinsulin isomers are formed at low concentrations in β-cells (27), and their accumulation may be linked to β-cell dysfunction (28,29).
Insulin is obtained from proinsulin by proteolytic processing. After transit through the Golgi apparatus and entry into immature secretory granules (30), specific prohormone convertases excise the C-peptide at conserved dibasic sites (BC and CA junctions; green in Fig. 1, A and B), liberating the bioactive hormone (31-33). Insulin thus contains two chains, designated A (21 residues) and B (30 residues), and is stored as Zn2+-stabilized hexamers within specialized secretory granules (34). The hexamers dissociate on secretion; the circulating hormone functions as a Zn2+-free monomer. Because the structure of insulin is likely to change on binding to the insulin receptor (IR) (35), determinants of foldability may be distinct from (or even at odds with) determinants of activity (36-38).
Three families of insulin hexamers related by concerted conformational changes (designated T6, T3Rf3, and R6) have been defined by x-ray crystallography (39). T and R protomers differ in the secondary structure of the N-terminal segment of the B-chain (residues B1-B8), either extended (T, green in Fig. 2A) or α-helical (R and Rf, blue) (see footnote 6). Alignment of crystallographic protomers highlights multiple possible N-terminal conformations (green and blue in Fig. 2B). The T-state-specific arm, delimited by a β-turn (residues B7-B10), packs variably against a shallow non-polar protein crevice (Fig. 2C). The solution structure of an isolated insulin monomer resembles the T-state (Fig. 2D) (40-42). Flexible packing of Phe B1 within this crevice has also been observed in the T-like structure of proinsulin in solution (26). The extended and α-helical states of the B1-B8 arm, linked by the classical TR transition, provide a striking example of a native chameleon sequence in a globular protein (43).
Structure-function relationships in insulin have been inferred from its pattern of sequence conservation and divergence (39). The classical receptor-binding surface (spanning residues Ile A2, Val A3, Val B12, Phe B24, and Phe B25; Refs. 39, 44, and 45) is invariant (35). Substitutions at this surface markedly impair receptor binding (46-54). Also conserved are side chains integral to the hydrophobic core (Leu A16, Tyr A19, Leu B11, and Leu B15 (39)). Altering such core residues hinders disulfide pairing during chemical synthesis (51,55,56) and impairs biosynthetic expression (38,57). Whereas such selected results are readily rationalized, the extent of conservation among vertebrate insulin sequences exceeds the apparent requirements of structure and function. A seeming paradox is posed, for example, by the conservation of residues B1-B5 (sequence FVNQH) despite their dispensability for receptor binding (58) and marked structural variability (59).
Does the N-terminal arm of proinsulin have a hidden biological function? To address this question, the present study has brought together assays of protein folding and trafficking in mammalian cell culture with in vitro studies of protein structure, stability, and activity. Evidence has been obtained that mutations in the arm modulate folding efficiency, with possible clinical implications for the genetics of β-cell dysfunction. Strikingly, deletion of Phe B1 blocks cellular folding of proinsulin, whereas des-Phe B1-insulin retains native-like properties. Three lines of investigation (studies of B1 substitutions, Ala scanning of the arm, and construction of proinsulin/IGF-I chimeras) together demonstrate that foldability requires a flexible N-terminal hydrophobic anchor.
The N-terminal arm of proinsulin provides an example of a cryptic folding element, highlighting foldability as an implicit constraint underlying biological selection of polypeptide sequences. The multistep pathway of insulin biosynthesis, from nascent folding to assembly and secretion, evidently imposes evolutionary constraints unrelated to the structure and function of the mature hormone. Arm-dependent folding may be abrogated by mutations in proinsulin associated with β-cell dysfunction and permanent neonatal-onset diabetes mellitus (60). The arm and its N-terminal hydrophobic anchor thus exemplify the selective advantage of a flexible chameleon sequence in the conformational life cycle of a globular protein.

MATERIALS AND METHODS
Chemical Synthesis-Insulin, KP-insulin (containing substitutions Pro B28 → Lys and Lys B29 → Pro, which prevent dimerization (41,61,62)), proinsulin, and KP-proinsulin were provided by Eli Lilly and Co. (Indianapolis, IN); S-sulfonate B-chain derivatives were obtained by oxidative sulfitolysis (51). A- and B-chain analogs were otherwise prepared by solid-phase synthesis (63). Insulin analogs were prepared by chain combination (63) and purified as described (51). Predicted molecular masses were confirmed by matrix-assisted laser desorption-ionization time-of-flight mass spectrometry (MS). Insulin analogs were monocomponent by reverse-phase high-performance liquid chromatography (RP-HPLC). Synthetic yields were calculated based on the mass of product (determined by optical density at 280 nm) relative to control syntheses of wild-type insulin.
6 The T and R families of insulin protomers also exhibit differences in orientation of the N-terminal A-chain α-helix associated with a change in conformation of the A7-B7 disulfide bridge.
… (26). The flexible C-domain contains a nascent helix (see Fig. 10).
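The relative-yield calculation described above reduces to a ratio of absorbance-derived masses. The sketch below is a minimal illustration, not the authors' procedure: the helper name and the example absorbances are hypothetical, and it assumes that analog and control share one extinction coefficient at 280 nm (reasonable for des-Phe B1 analogs, because Phe contributes negligibly at 280 nm compared with Tyr and cystine) and are read at the same path length.

```python
def relative_yield(a280_analog, volume_analog_ml, a280_control, volume_control_ml):
    """Yield of an analog synthesis relative to a wild-type control synthesis.

    Hypothetical helper. Product mass is taken as proportional to A280 x volume,
    which assumes the same extinction coefficient and path length for both products.
    """
    if a280_control <= 0 or volume_control_ml <= 0:
        raise ValueError("control absorbance and volume must be positive")
    return (a280_analog * volume_analog_ml) / (a280_control * volume_control_ml)


# Hypothetical readings: equal pooled volumes, analog absorbance one third of control,
# i.e. the kind of 3-fold reduction reported for des-B1 chain combination.
print(round(relative_yield(0.40, 1.0, 1.20, 1.0), 3))
```

If the extinction coefficients did differ (for example, an analog that adds or removes a Tyr), each A280 would first be divided by its own coefficient before taking the ratio.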
Receptor Binding Assays-Dissociation constants for binding of insulin analogs to IR were determined in competitive radioligand binding assays with [125I-Tyr A14]human insulin (64). The assay employed the B isoform of IR (IR-B). Assays were performed with isolated IR-B bearing a C-terminal FLAG tag using a microtiter plate antibody capture technique as described (65). The plates (Nunc Maxisorb) were incubated overnight at 4°C with FLAG M2 IgG (100 µl/well of 40 µg/ml in phosphate-buffered saline). In all assays the percentage of tracer bound in the absence of competing ligand was <15% to avoid ligand-depletion artifacts. Dissociation constants of analogs were obtained by non-linear regression analysis (66) employing a model describing competitive binding of two different ligands to a receptor. Control studies of cellular extracts in the absence of prior IR-B transfection demonstrated negligible background binding to endogenous cellular proteins.
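To make the competitive-binding logic concrete, the sketch below uses the simplest equilibrium model for a tracer and an unlabeled competitor sharing one class of sites. It is an assumption-laden illustration: the cited regression (66) may use a more complete two-ligand model, all function names are hypothetical, and the shortcut of treating free concentrations as total concentrations is only defensible because tracer binding was kept below 15%. The Cheng-Prusoff relation then links the measured IC50 to the analog's dissociation constant.

```python
def fraction_tracer_bound(tracer_nm, kd_tracer_nm, competitor_nm, ki_competitor_nm):
    """Equilibrium receptor occupancy by the radiolabeled tracer.

    Simplified competitive model: one class of sites, both ligands at equilibrium,
    and free concentrations approximated by total concentrations (no depletion).
    """
    tracer_term = tracer_nm / kd_tracer_nm
    competitor_term = competitor_nm / ki_competitor_nm
    return tracer_term / (1.0 + tracer_term + competitor_term)


def ic50_from_ki(ki_competitor_nm, tracer_nm, kd_tracer_nm):
    """Competitor concentration that halves tracer binding under the same model."""
    return ki_competitor_nm * (1.0 + tracer_nm / kd_tracer_nm)


def ki_from_ic50(ic50_nm, tracer_nm, kd_tracer_nm):
    """Cheng-Prusoff inversion of ic50_from_ki."""
    return ic50_nm / (1.0 + tracer_nm / kd_tracer_nm)


# Hypothetical assay values (nM): tracer at 0.05, tracer Kd 0.1, analog Ki 0.07.
tracer, kd_tracer, ki_analog = 0.05, 0.1, 0.07
ic50 = ic50_from_ki(ki_analog, tracer, kd_tracer)  # about 0.105 nM
b0 = fraction_tracer_bound(tracer, kd_tracer, 0.0, ki_analog)
b_half = fraction_tracer_bound(tracer, kd_tracer, ic50, ki_analog)
print(ic50, b_half / b0)
```

In practice the whole displacement curve, not a single IC50 point, would be fitted; the closed-form pieces above simply show why a ligand-depletion-free regime keeps the model this simple.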
Metabolic Labeling and PAGE-At 40 h post-transfection, cells were preincubated in methionine/cysteine-deficient medium for 30 min, metabolically labeled in the same medium containing 35S-labeled Met and Cys for 1 h, washed once with complete medium, and chased for the indicated times. After the chase, media were collected and cells were lysed in 100 mM NaCl, 1% Triton X-100, 0.2% sodium deoxycholate, 0.1% SDS, 10 mM EDTA, and 25 mM Tris-HCl (pH 7.4). Lysates and chase media were immunoprecipitated with guinea pig anti-insulin antiserum (LINCO Diagnostics) and analyzed by Tris-Tricine-urea-SDS-PAGE under reducing and non-reducing conditions (27,38,67).
Spectroscopy-Circular dichroism (CD) spectra, obtained using an Aviv spectropolarimeter equipped with an automated titration unit, were measured at protein concentrations of 25 µM (far-ultraviolet spectra) or 5 µM (denaturation studies) in 50 mM potassium phosphate (pH 7.4) at 4°C (56). 1H NMR spectra were obtained at 700 MHz in H2O or D2O solutions containing 10 mM deuterioacetic acid (pH 3.0, direct meter reading) at 25°C. Spatial relationships within insulin or proinsulin analogs were probed by nuclear Overhauser effects (NOEs). The following two-dimensional NMR spectra were acquired: double quantum-filtered correlation spectroscopy (DQF-COSY), NOE spectroscopy (NOESY), and total correlation spectroscopy (TOCSY). Resonance assignment of KP-insulin (complete) and KP-proinsulin (partial) was obtained by standard methods (supplemental materials) (68). Presumptive resonance assignment of des-Phe B1-insulin (supplemental Table S1) was obtained by analogy to published assignments (69).
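Thermodynamic Modeling-CD-detected guanidine denaturation data (as monitored at 222 nm) were fitted by nonlinear least squares to a two-state model as described (70). In brief, CD data θ(x) were fitted by a nonlinear least-squares program according to Equation 1,

$$\theta(x) = \frac{A + B\,e^{-(\Delta G_u - m x)/RT}}{1 + e^{-(\Delta G_u - m x)/RT}} \qquad (1)$$

where x is the concentration of guanidine hydrochloride, A and B are baseline values in the native and unfolded states, ΔGu is the free energy of unfolding extrapolated to zero denaturant, m is the dependence of the unfolding free energy on denaturant concentration, R is the gas constant, and T is the absolute temperature. These baselines were approximated by pre- and post-transition lines. Fitting the original CD data and baselines simultaneously circumvents artifacts associated with linear plots of ΔG as a function of denaturant (70,71).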
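A minimal sketch of the two-state model of Equation 1 follows, assuming constant native and unfolded baselines (the analysis described above instead treats the baselines as sloping pre- and post-transition lines) and substituting an exhaustive grid search for a proper nonlinear least-squares optimizer. The function names, the grid ranges, and the synthetic data are hypothetical; the temperature default of 277.15 K matches the 4°C CD conditions.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def two_state_signal(x, dg_u, m, a_native, b_unfolded, temp_k=277.15):
    """CD signal at denaturant concentration x (M) for a two-state N <-> U transition.

    dg_u: unfolding free energy extrapolated to zero denaturant (kcal/mol)
    m: slope of the unfolding free energy with denaturant (kcal/mol/M)
    a_native, b_unfolded: native and unfolded baselines (constants in this sketch)
    """
    k_u = math.exp(-(dg_u - m * x) / (R_KCAL * temp_k))  # equilibrium constant [U]/[N]
    return (a_native + b_unfolded * k_u) / (1.0 + k_u)


def fit_two_state_grid(xs, ys, a_native, b_unfolded, temp_k=277.15):
    """Least-squares estimate of (dg_u, m) by exhaustive grid search.

    A crude stand-in for a nonlinear least-squares optimizer: dg_u is scanned
    from 2.00 to 6.00 kcal/mol and m from 1.00 to 3.00 kcal/mol/M in 0.05 steps.
    """
    best = None
    for i in range(81):
        dg_u = 2.0 + 0.05 * i
        for j in range(41):
            m = 1.0 + 0.05 * j
            sse = sum((y - two_state_signal(x, dg_u, m, a_native, b_unfolded, temp_k)) ** 2
                      for x, y in zip(xs, ys))
            if best is None or sse < best[0]:
                best = (sse, dg_u, m)
    return best[1], best[2]


# Synthetic, noise-free data (hypothetical parameters) from 0 to 6 M guanidine HCl.
xs = [0.2 * k for k in range(31)]
ys = [two_state_signal(x, 4.0, 2.0, -10.0, -1.0) for x in xs]
dg_fit, m_fit = fit_two_state_grid(xs, ys, -10.0, -1.0)
print(dg_fit, m_fit, dg_fit / m_fit)  # last value: transition midpoint Cmid (M)
```

At the midpoint x = ΔGu/m the equilibrium constant equals 1 and the signal lies halfway between the two baselines, which is a quick sanity check on any implementation of Equation 1.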

RESULTS
The Arm of Insulin and Proinsulin Exhibits T-like Conformations-1H NMR studies of engineered insulin monomers have defined a spectroscopic signature of a T-like conformation based on inter-residue NOEs (41,42). As illustrated in the spectrum of KP-insulin (supplemental Fig. S1), these include long-range contacts between the side chains of Phe B1 and Leu A13 and between His B5 and Ile A10 within a shallow inter-chain crevice; additional NOEs between Leu B6 and Leu B11 (not shown) reflect formation of an intervening β-turn at the base of the arm (residues B7-B10). These and related diagnostic NOEs, which are inconsistent with an R-like α-helical conformation, are retained in the spectra of proinsulin analogs (supplemental Figs. S1 and S2). In spectra of engineered insulin and proinsulin analogs, NOEs are not observed between the aromatic rings of Phe B1 and Tyr A14, which exhibit a broad range of distances among crystallographic T-state protomers of insulin (supplemental Fig. S3) (72). Similarly, no NMR evidence has been observed for stable maintenance of a hydrogen bond between the B1 carbonyl oxygen and the side-chain NH2 of Gln B4, as observed in a minority of crystal structures (39).
Despite observation of long-range NOEs, the side chains of Phe B1 and Val B2 in monomeric analogs of insulin and proinsulin exhibit near random-coil 1H NMR chemical shifts and motional narrowing. These trends hold in spectra acquired under a variety of conditions (in aqueous solution at neutral pH, in 10 mM deuterioacetic acid at pH 3.0, and in 20% deuterioacetic acid at pH 1.9). An example is provided by the ortho and meta 1H NMR secondary chemical shifts of Phe B1 in KP-insulin at neutral pH (0.11 and 0.12 ppm, respectively; defined as the difference between observed chemical shifts and tabulated random-coil values): these values are markedly smaller than the corresponding secondary shifts of an analogous Phe ring elsewhere in insulin that stably packs against a nonpolar surface (Phe B24, 0.61 and 0.43 ppm).
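The comparison above can be made quantitative with simple arithmetic on the quoted values. The sketch treats the four quoted secondary shifts as magnitudes, and the random-coil reference value in the usage line is hypothetical, chosen only to show the sign convention of the definition given in the text (observed minus random coil).

```python
# Magnitudes (ppm) of the ring 1H secondary shifts quoted in the text for KP-insulin at neutral pH.
PHE_B1 = {"ortho": 0.11, "meta": 0.12}
PHE_B24 = {"ortho": 0.61, "meta": 0.43}


def secondary_shift(observed_ppm, random_coil_ppm):
    """Secondary shift as defined in the text: observed minus tabulated random-coil value."""
    return observed_ppm - random_coil_ppm


def fold_difference(reference, probe):
    """Ratio of secondary-shift magnitudes, position by position."""
    return {pos: abs(reference[pos]) / abs(probe[pos]) for pos in probe}


# Hypothetical example of the sign convention: a ring proton shifted upfield of its
# (illustrative) random-coil value gives a negative secondary shift.
print(round(secondary_shift(6.60, 7.20), 2))

ratios = fold_difference(PHE_B24, PHE_B1)
print({pos: round(r, 1) for pos, r in ratios.items()})
```

By this measure the stably packed Phe B24 ring is displaced from random-coil values roughly 5.5-fold (ortho) and 3.6-fold (meta) more than Phe B1, consistent with a loosely packed B1 side chain.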
Additional evidence for the conformational variability of the N-terminal arm has been obtained by heteronuclear NMR spectroscopy. Whereas the main-chain 13C NMR chemical shifts of residues B9-B19 in DKP-proinsulin and DKP-insulin (26,73) exhibit canonical helical values (74), the 13C chemical shifts of residues B1-B3 conform to a pattern associated with segmental flexibility (see "Discussion" and supplemental Table S2) (75). Furthermore, although the amide resonances of Phe B1 and Val B2 are not observable due to rapid solvent exchange, those of Asn B3, Gln B4, and His B5 exhibit a gradient of attenuated {1H}-15N heteronuclear NOEs characteristic of progressive N-terminal flexibility (26). These findings indicate that N-terminal arm-related NOEs reflect flexible contacts rather than fixed spatial relationships within a well organized substructure.

Deletion of Phe B1 Impairs Chain Combination but Is Well Tolerated in the Mature Hormone-Classical studies of the total chemical synthesis of insulin have demonstrated that the isolated A- and B-chains contain sufficient information to specify native disulfide pairing (76). Robust to diverse amino acid substitutions (56), chain combination has enabled the preparation of >100 insulin analogs by academic and pharmaceutical laboratories (77). Although yield is limited by side reactions (disulfide-bridged cyclic chains, B-chain dimers and polymers), formation of insulin disulfide isomers is ordinarily negligible (78). Surprisingly, combination of the wild-type A-chain with the des-B1 B-chain is perturbed: its yield is reduced 3-fold relative to wild-type chain combination. Although the predominant low molecular weight product is des-B1-insulin (elution time 45.6 min in the HPLC chromatogram shown in Fig. 3A), a major contaminant is a disulfide isomer whose elution is delayed by 6 min (arrow in Fig. 3A); the additional elution peaks represent the expected side products. Despite such reduced yield, des-B1-insulin may readily be isolated (Fig. 3B), enabling its characterization.
FIGURE 3. A, reverse-phase HPLC chromatogram documenting 3-fold reduced yield of des-Phe B1 analog following chain combination due to side products, including non-native disulfide isomer (arrow at 51.6 min). B, final purification yielded a single symmetrical peak of appropriate molecular mass and native receptor-binding affinity. Monocomponent reverse-phase HPLC profile was obtained under gradient elution conditions employing either acetonitrile or methanol as co-solvents.

Binding of des-B1-insulin to the isolated insulin receptor (isoform B; dotted line in Fig. 4A) is essentially indistinguishable from that of wild-type insulin (solid line in Fig. 4A). Curve-fitting yields respective estimates of dissociation constants of 0.068 ± 0.009 nM (analog) and 0.073 ± 0.010 nM (wild-type). Far-UV CD spectra of des-B1- and wild-type insulin are likewise similar (Fig. 4B). CD-detected guanidine denaturation studies at 4°C suggest that the two proteins exhibit similar thermodynamic stabilities (Fig. 4C). Application of a two-state model (supplemental Table S3) yields similar estimates of free energies of unfolding (ΔGu) of 4.1 ± 0.1 kcal/mol (des-B1-insulin) versus 4.0 ± 0.1 kcal/mol (wild-type). The analog nonetheless exhibits a small left shift (ΔTmid about 4°C) in its thermal unfolding transition (i.e. toward lower temperatures in the broad temperature range 30-70°C) as monitored by mean residue ellipticity at 222 nm (Fig. 4D). 1H NMR studies of human insulin and des-Phe B1-insulin were undertaken as dimers in 10 mM deuterioacetic acid (pH 3.0) at 25°C (79). Resonance assignments of des-B1-insulin are provided in supplemental Table S1. The aliphatic spectrum of wild-type insulin (Fig. 5B) provides a fingerprint of the T-state-specific packing of Phe B1 against the A-chain (Fig. 5A). The spectrum of des-Phe B1-insulin (Fig. 5C) exhibits a similar overall envelope of resonances but with selected changes in chemical shift due to the absence of the Phe B1 magnetic ring current (Val B2, Ile A10, and Leu A13; red lines between panels B and C). These trends in 1H NMR chemical shifts were well resolved in corresponding two-dimensional TOCSY spectra (Fig. 6): whereas the majority of spin systems in insulin are unperturbed by the removal of Phe B1 (black labels), selective changes in chemical shift are prominent within the N-terminal arm of the B-chain and within or adjoining its T-state-specific docking site (red labels). Spatial relationships within this docking site (visualized in Fig. 7, A and B) were probed by comparison of two-dimensional NOESY spectra (Fig. 7, C and D). The region shown contains contacts between aliphatic protons (ω1; horizontal axis) and aromatic protons (ω2; vertical axis). Whereas B1-related cross-peaks in the wild-type spectrum (Fig. 7C) are absent as expected in the variant spectrum (Fig. 7D), T-state-specific NOEs from the imidazole ring of His B5 are retained within an inter-chain crevice (wild-type cross-peaks l-q versus variant cross-peaks r-u). Constraints between the aromatic ring of Tyr A19 and neighboring methyl groups (Ile A2, Leu B11, and Leu B15; not labeled in the figure), diagnostic of helix-helix packing in the core of insulin (41,42), are essentially identical in the two proteins.
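How "indistinguishable" are the receptor affinities and stabilities of des-B1-insulin and wild-type insulin? The sketch below propagates the reported uncertainties to first order, under the assumption (not stated in the text) that the quoted errors are independent, approximately Gaussian standard errors; the helper names are hypothetical and the calculation is an illustration rather than the authors' analysis.

```python
import math

# Values reported above: dissociation constants (nM) and unfolding free energies (kcal/mol).
KD_ANALOG, KD_ANALOG_ERR = 0.068, 0.009
KD_WT, KD_WT_ERR = 0.073, 0.010
DG_ANALOG, DG_ANALOG_ERR = 4.1, 0.1
DG_WT, DG_WT_ERR = 4.0, 0.1


def ratio_with_error(num, num_err, den, den_err):
    """num/den with first-order propagation of independent relative errors."""
    ratio = num / den
    return ratio, ratio * math.hypot(num_err / num, den_err / den)


def difference_with_error(a, a_err, b, b_err):
    """a - b with independent errors added in quadrature."""
    return a - b, math.hypot(a_err, b_err)


# Relative affinity of the analog (values above 1 mean tighter binding than wild type).
rel_affinity, rel_affinity_err = ratio_with_error(KD_WT, KD_WT_ERR, KD_ANALOG, KD_ANALOG_ERR)
# ddG_u = dG_u(wild type) - dG_u(analog); positive values would indicate destabilization.
ddg, ddg_err = difference_with_error(DG_WT, DG_WT_ERR, DG_ANALOG, DG_ANALOG_ERR)
print(f"relative affinity {rel_affinity:.2f} +/- {rel_affinity_err:.2f}")
print(f"ddG_u {ddg:+.2f} +/- {ddg_err:.2f} kcal/mol")
```

Both the affinity ratio (about 1.07 ± 0.20) and ΔΔGu (about -0.10 ± 0.14 kcal/mol) fall within one propagated error of "no change", which is the quantitative content of the statements above.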
Deletion of Phe B1 Perturbs Cellular Folding-Transient transfection of human cells with a plasmid expressing proinsulin provides a model for studying its folding within the endoplasmic reticulum, subcellular trafficking, and secretion (Fig. 8A). Following transfection of 293T cells, we thus examined the relative expression of wild-type or variant proinsulins.
Following pulse labeling of newly synthesized proteins with 35S-amino acids, labeled wild-type or variant proinsulins were immunoprecipitated with polyclonal anti-insulin antiserum and subjected to nonreducing Tris-Tricine-urea-SDS-PAGE, which allows examination of distinct proinsulin disulfide isomers that form within the ER. The absence of endogenous proinsulin in these cells makes detection of the transfected proteins straightforward. As previously demonstrated (27,38,67), resolution of discrete proinsulin bands in this gel system provided an assay for the extent of native folding, competing disulfide-isomer formation, and efficiency of secretion. Fig. 8 (panels B-F) provides a survey of proinsulin variants (radiolabeled for 1 h and chased for 1 h). The portions of the autoradiograms shown contain bands corresponding to monomeric proinsulin or proinsulin disulfide isomers (designated the "low molecular weight" region below, in contrast to high molecular weight complexes). In Fig. 9 the folding of des-B1-proinsulin was further analyzed (panels A and B), and the analysis was extended to Ala substitutions at positions B2, B3, and B4 (panel C). Control Western blots showed that the polyclonal anti-insulin antiserum employed in these studies recognizes wild-type insulin and des-B1-insulin with similar effectiveness (Fig. 9D).
In accordance with prior studies (27), transfection of the wild-type proinsulin construct gave rise to robust expression, primarily of a fast-migrating species containing native disulfide bonds (Fig. 8B, lanes 3 and 4), relative to an empty vector control (lanes 1 and 2). The most rapidly migrating species (Fig. 8B, arrow) is efficiently secreted from transfected cells (lane 3, "C") to the medium (lane 4, "M"), typically reaching >95% secretion by 4 h of chase (not shown). In addition, less rapidly migrating disulfide isomers, which generally represent a minor fraction of proinsulin, exhibit a lower percentage of secretion (bracket in Fig. 8B). Deletion of Phe B1 led to multiple intracellular bands with attenuated intensity (Fig. 8B, lane 7), indicating disproportionate disulfide mispairing. Secretion of des-B1-proinsulin was not detectable (Fig. 8B, lane 8).
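Secretion efficiency and the share of native isomer, as discussed above, are ratios of band intensities between lanes or within a lane. The text does not describe a densitometry protocol, so the sketch below is only an assumed workflow: the helper names and intensity values are hypothetical, and it presumes background-corrected intensities from the same exposure, within the linear range of the film or phosphorimager. It also ignores intracellular degradation, which would make the cell-plus-medium total smaller than the amount originally labeled.

```python
def secretion_efficiency(cell_intensity, medium_intensity):
    """Fraction of a labeled species recovered in the chase medium (M) versus cells (C)."""
    total = cell_intensity + medium_intensity
    if total <= 0:
        raise ValueError("no signal in either fraction")
    return medium_intensity / total


def native_fraction(band_intensities, native_band="native"):
    """Fraction of low-molecular-weight proinsulin signal in the native disulfide isomer."""
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("no signal in the low-molecular-weight region")
    return band_intensities[native_band] / total


# Hypothetical background-corrected densitometry values (arbitrary units).
print(secretion_efficiency(cell_intensity=4.0, medium_intensity=96.0))
print(native_fraction({"native": 80.0, "isomer_1": 15.0, "isomer_2": 5.0}))
```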
Perturbed cellular folding of des-B1-proinsulin is in accordance with perturbed chain combination of des-B1-insulin (above). Because band intensities were attenuated after the 1-h chase, the studies were repeated with chase times of 0 and 2 h (Fig. 9, A and B, respectively). Furthermore, because impaired folding could be associated with formation of high molecular weight complexes (due to aberrant formation of mispaired intermolecular disulfide bridges), PAGE analysis was undertaken with and without reduction by dithiothreitol. On reduction a single band was observed, presumably reflecting total expression of the variant proinsulin. Without chase, initial overall expression of des-B1-proinsulin was substantial (Fig. 9A, lane 6) albeit less than that of wild-type proinsulin (lane 5). The distribution of disulfide isomers favored non-native species (Fig. 9A, lane 3). Following an extended chase period of 2 h, a substantial intracellular accumulation of des-B1-proinsulin was observed (reduced band in lane 14 in Fig. 9B) without secretion (lane 15). In the absence of reduction, any bands corresponding to low molecular weight species were faint (Fig. 9B, lane 10), implying that the des-B1-proinsulin polypeptides are sequestered in aberrant high molecular weight complexes. Under these conditions wild-type proinsulin predominantly undergoes native folding and secretion (Fig. 9B, lanes 8-9 and 12-13).
Control studies suggest that the impaired foldability of des-B1-proinsulin in 293T cells is unlikely to reflect thermodynamic instability of the protein once folded. Prior biophysical studies have established that two-disulfide insulin and proinsulin analogs lacking cystine A6-A11 form partial folds of low stability in vitro (ΔΔGu > 2 kcal/mol) (41,80). In transfected 293T cells, removal of the A6-A11 disulfide bridge by pairwise mutation of Cys to Ser did not block expression or secretion (Fig. 8C, lanes 13 and 14). The substitutions likewise caused little detectable change in band mobility (81) despite the presumed loss of structural organization. Folding and secretion are also robust to the destabilizing mutation Ile A2 → Gly (Fig. 8C, lanes 15 and 16); this substitution in the hydrophobic core was found to impair the stability (ΔΔGu) of an insulin analog by at least 1.6 kcal/mol (82). The impaired foldability of des-B1-proinsulin thus stands in marked contrast to the native-like in vitro stability of des-B1-insulin.
Cellular Folding of Proinsulin Requires a Hydrophobic Residue at B1-To further probe the nature of the B1 folding determinant, we next examined multiple substitutions (Fig. 8, B-E). Substitution of Phe B1 by Asp, for example, resulted in robust expression with a marked decrease in the fraction of Asp B1-proinsulin molecules achieving the native disulfide-bonded form (Fig. 8B, lane 5); secretion of the variant proinsulin was … 19-20, 27-28, and 31-34). In each case a nonpolar B1 side chain enabled the variant proinsulin to pass quality-control checkpoints within the cells (C) en route to the medium (M). The stringency of the requirement for a nonpolar side chain at position B1 is surprising given the marked structural variability of Phe B1 and its partial solvent exposure among wild-type T-state crystal structures.
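The qualitative rule stated above (nonpolar side chains at B1 permit folding and secretion; Gly and polar or charged side chains do not) can be checked against a standard hydropathy scale. The sketch below is a post hoc illustration, not the authors' analysis: it uses the Kyte-Doolittle index as one of several possible scales, a hypothetical threshold of zero, and the outcomes summarized in the text and Fig. 8 legend. Pro B1 is left out because its defect was attributed to perturbed signal-peptide cleavage.

```python
# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982) for the B1 residues discussed.
KYTE_DOOLITTLE = {
    "Phe": 2.8, "Val": 4.2, "Leu": 3.8, "Met": 1.9, "Ala": 1.8,
    "Gly": -0.4, "Ser": -0.8, "Asn": -3.5, "Gln": -3.5, "Asp": -3.5, "Lys": -3.9,
}

# Qualitative outcomes summarized in the text and Fig. 8 legend:
# True = native folding and secretion retained, False = impaired.
OBSERVED_FOLDABLE = {
    "Phe": True, "Val": True, "Leu": True, "Met": True, "Ala": True,
    "Gly": False, "Ser": False, "Asn": False, "Gln": False, "Asp": False, "Lys": False,
}


def predicted_foldable(residue, threshold=0.0):
    """Hypothetical rule: a B1 side chain with positive hydropathy supports folding."""
    return KYTE_DOOLITTLE[residue] > threshold


mismatches = [r for r, ok in OBSERVED_FOLDABLE.items() if predicted_foldable(r) != ok]
print(mismatches)
```

With eleven residues and a binary outcome, the clean separation (an empty mismatch list) is a consistency check on the stated rule, not a predictive model; Gly, with a near-zero index, shows how sensitive such a split is to the choice of threshold.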
Role of the N-terminal Arm-In light of the unexpected contribution of Phe B1 to folding efficiency, the overall role of the N-terminal arm of proinsulin was probed by Ala scanning mutagenesis. An extended 2-h chase period was employed as a screen for secretion. An Ala substitution at position B4 imposed a severe block to secretion (Fig. 9C, lane 25, non-reducing conditions), whereas Ala B2 and Ala B3 variants exhibited mild and moderate impairments (lanes 21 and 19, respectively).
The corresponding intracellular samples (Fig. 9C, lanes 18 and 20) indicate preferential formation of a disulfide isomer. Previous studies demonstrated that Ala B5 likewise impairs folding and secretion (67).
These biological results motivated chemical synthesis of [des-B1, Gly B2, Pro B3, Gln B4]insulin. As in the synthesis of des-B1-insulin, the variant arm impaired the yield of chain combination (also by 3-fold). Although a predominant product was formed, MS analysis of side products resolved by reverse-phase HPLC demonstrated formation of two competing disulfide isomers, presumably containing the aberrant pairings (A6-B7, A7-A11, and A20-B19) and (A6-A7, A11-B7, and A20-B19), designated swap and swap2 in Ref. 90. Evidence that the predominant product contained native disulfide bridges was provided by its non-negligible biological activity (IR-B dissociation constant 0.136 ± 0.019 nM; only 3-fold lower than wild-type insulin in the same assay).
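Six cysteines (A6, A7, A11, A20, B7, and B19) can be paired into three disulfides in only 5 × 3 × 1 = 15 ways. Enumerating them shows that the two presumptive isomers named above, together with the native pairing, are exactly the three arrangements that conserve the A20-B19 bridge. The sketch below is illustrative: it considers only fully oxidized monomeric species (no free thiols or intermolecular bonds), and the swap and swap2 assignments are the presumptive ones given in the text.

```python
CYSTEINES = ["A6", "A7", "A11", "A20", "B7", "B19"]


def bond(c1, c2):
    """A disulfide as an unordered pair of cysteine labels."""
    return frozenset((c1, c2))


NATIVE = frozenset({bond("A6", "A11"), bond("A7", "B7"), bond("A20", "B19")})
SWAP = frozenset({bond("A6", "B7"), bond("A7", "A11"), bond("A20", "B19")})
SWAP2 = frozenset({bond("A6", "A7"), bond("A11", "B7"), bond("A20", "B19")})


def pairings(cys):
    """Yield every complete pairing of an even-length list of cysteines."""
    if not cys:
        yield frozenset()
        return
    first, rest = cys[0], cys[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield tail | {bond(first, partner)}


isomers = list(pairings(CYSTEINES))
with_a20_b19 = [iso for iso in isomers if bond("A20", "B19") in iso]
# Pairings with no A-B bridge cannot yield a two-chain product in chain combination.
no_interchain = [iso for iso in isomers
                 if all(len({c[0] for c in pair}) == 1 for pair in iso)]
print(len(isomers), len(with_a20_b19), len(no_interchain))
```

The count of pairings without any A-B bridge (three: B7-B19 combined with each of the three intra-A-chain arrangements) is relevant to two-chain synthesis, where such products would not link the chains; for single-chain proinsulin all 15 are formally possible.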

DISCUSSION
The recent (and continuing) exponential increase in the size of the database of protein structures poses the challenge of functional annotation. Whereas residues involved in substrate binding or catalysis may be readily recognizable, determinants of folding efficiency may not be apparent in the native state (3,7). Such residues may nonetheless impose key evolutionary constraints (38) and emerge as sites of mutation associated with human genetic diseases (60).
The present study has focused on the N-terminal arm of proinsulin (FVNQH; residues B1-B5). In classical studies of insulin (39) a seeming paradox was posed by the conservation of these residues despite their dispensability for receptor binding (58) and marked structural variability (59). Could the arm have a hidden biological function? An interdisciplinary set of studies was thus undertaken to investigate the contribution of the arm to the efficiency of protein folding. Although such a contribution seemed unlikely given that the arm is only partially ordered in an engineered proinsulin monomer (26), we posited that a transient role in folding might be hidden once the native state is reached. Such "hidden" contributions of specific residues to folding efficiency have been extensively explored in a trimeric β-helix (91). Insulin has been extensively studied since its isolation in 1922 and landmark clinical application (92). It may seem surprising, therefore, that a new function of a sequence motif has only now been recognized. Yet the role of the N-terminal arm of the B-chain has long been enigmatic. Indeed, the arm undergoes a fundamental change in secondary structure, from extended (T) to α-helical (R), as part of a long-range allosteric reorganization of insulin hexamers, designated the TR transition (93). Whereas the conformation of an insulin monomer in solution resembles the T-state (40), adoption of R-like features on receptor binding has been proposed (39,59,94). An intriguing hypothesis envisions that induced fit of the arm represents a switch between folding-competent and active conformations (36). This model thus illustrates the potential biological utility of a chameleon sequence in a globular protein (43).
8 The imino-acid substitution Pro B1, although non-polar, was observed to impair biosynthesis (Fig. 8D, lanes 17 and 18) in association with perturbed cleavage of the signal peptide (M. Liu and P. Arvan, unpublished results).
Proinsulin contains an insulin-like moiety (the A and B domains) and a flexible connecting segment (the C domain). A recent heteronuclear NMR study of an engineered proinsulin monomer has provided evidence that its N-terminal arm (residues B1-B8) exhibits partial disorder, as indicated by a trend toward N-terminal attenuation of amide-related {1H}-15N heteronuclear NOEs (26). Such studies employed substitutions in the classical dimer interface (Pro B28 → Lys and Lys B29 → Pro) and trimer interface (His B10 → Asp) to enable NMR characterization at neutral pH (26,41). Although amide signals at B1 and B2 were not observable, complete 13C NMR resonance assignment was obtained. Trends in main-chain 13C secondary chemical shifts (Fig. 10A) provide an estimate of disorder as inferred from a random-coil index (Fig. 10B) and predicted residue-specific order parameters on the picosecond-nanosecond time scale (Fig. 10C) (75). In Fig. 10, chemical-shift index values and predicted dynamic parameters of the B, C, and A domains are shown in blue, black, and red, respectively, in relation to helical segments (spirals at bottom of panel A). Whereas the A domain exhibits consistently low values of the random-coil index (Fig. 10B) and high values of the order parameter (Fig. 10C), the B domain is remarkable for "ramps" from B1-B8 and B25-B30. These trends predict that the B domain should exhibit a progressive increase in disorder on the picosecond-nanosecond time scale toward the N terminus and the BC junction, respectively. In the future it would be of interest to obtain quantitative estimates of such disorder by measurement of 15N and 13C NMR relaxation times and their interpretation in relation to spectral density functions.
Determinants of Foldability-The N-terminal arm of the B-chain consists of residues proximal to the α-helical domain of the hormone (residues B5-B8) and residues distal to this domain (B1-B4). Evidence for the biological importance of the proximal portion of the arm has been provided by clinical observations that mutations at B5, B6, or B8 can cause permanent neonatal-onset diabetes mellitus, presumably due to toxic misfolding of the mutant proinsulin in the ER of pancreatic β-cells (14-17). Studies of insulin chain combination had earlier shown that these residues were critical for disulfide pairing even when receptor binding was not significantly impaired (36,37,67,95).

FIGURE 8. Biosynthesis, folding, and secretability of proinsulin variants in cell culture. A, left, nascent proinsulin folds as a monomer in the ER, wherein zinc-ion concentration is low; in the Golgi apparatus, zinc-stabilized proinsulin hexamers assemble and are processed by cleavage of the connecting peptide to yield mature insulin. Zinc-insulin crystals are observed in secretory granules. Right, insulin hexamers dissociate in the bloodstream to yield active monomers. B-F, analysis of folding and secretability by Tris-Tricine-urea-SDS-PAGE under nonreducing conditions (27). B, HEK293T cells were transfected with an empty vector (control; lanes 1 and 2) or expression plasmids encoding wild-type proinsulin (lanes 3 and 4), variant PheB1→Asp (lanes 5 and 6), or des-PheB1-proinsulin (lanes 7 and 8). At 48 h, cells were pulse-labeled with 35S-amino acids for 1 h and chased for 1 h. Chase media (lanes marked M) were collected, and cells (lanes marked C) were lysed; each fraction was immunoprecipitated with anti-insulin antiserum. C, control pulse-chase studies of unstable proinsulin analogs [SerA6,SerA11]-proinsulin (lanes 13 and 14) and GlyA2-proinsulin (lanes 15 and 16) as previously described (38). Empty vector and wild-type controls are shown in lanes 9-10 and 11-12, respectively. D and E, corresponding pulse-chase studies of diverse B1 substitutions demonstrate impaired folding with Gly or polar residues (Asn, Gln, Lys, and Ser), whereas foldability is retained on non-polar amino acid substitution (Ala, Leu, Met, and Val; asterisks). The imino acid substitution ProB1 also impairs biosynthesis. F, corresponding pulse-chase studies of "arm swap" analogs of proinsulin in which residues B1-B5 were replaced by IGF-I residues 1-4 (hybrid 1; lanes 41 and 42) or residues B1-B4 were replaced by IGF-I residues 1-3 (hybrid 2; lanes 43 and 44). Because IGF-I lacks a residue at canonical position B1, the hybrids are also des-B1 analogs. Empty vector and wild-type controls are shown in lanes 37-38 and 39-40, respectively.
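Secretability in the pulse-chase panels is read by comparing the medium (M) and cell (C) lanes of each construct. The caption does not state how, or whether, lanes were quantified; the sketch below assumes background-subtracted densitometry intensities for the proinsulin band and expresses secretion as the fraction recovered in the medium, normalized to wild type. The function names and the intensities in the usage lines are hypothetical.

```python
"""Hypothetical quantification of a pulse-chase M/C lane pair."""


def fraction_secreted(medium: float, cells: float) -> float:
    """Fraction of labeled protein recovered in chase medium.

    medium, cells: background-subtracted band intensities (arbitrary units)
    from the M and C lanes of one construct.
    """
    total = medium + cells
    if total <= 0:
        raise ValueError("no labeled signal in either lane")
    return medium / total


def secretion_relative_to_wild_type(variant: tuple, wild_type: tuple) -> float:
    """Variant secreted fraction divided by the wild-type secreted fraction."""
    return fraction_secreted(*variant) / fraction_secreted(*wild_type)


if __name__ == "__main__":
    # Placeholder intensities as (medium, cells); not data from Fig. 8.
    wild_type = (60.0, 40.0)
    variant = (15.0, 85.0)
    print(round(secretion_relative_to_wild_type(variant, wild_type), 2))
```

Because the gels are run under nonreducing conditions, non-native disulfide isomers would likely migrate as separate bands; a fuller analysis would integrate those bands separately rather than pooling them with native proinsulin.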
The pertinence of the T-state to folding and to the efficiency of disulfide pairing is supported by stereospecific effects of substitutions of GlyB8. Stabilization (or destabilization) of the T-state-specific β-turn (residues B7-B10) by D (or L) amino acid substitutions at B8, respectively, is associated with reciprocal effects on foldability and activity. On the one hand, because GlyB8 exhibits a positive dihedral angle in the T-state, D-substitutions at B8 are compatible with native-like structure (36,37). Chiral enforcement of a positive angle enhances the efficiency of disulfide pairing in chain combination but markedly impairs the activity of the insulin analog once formed (36,37). On the other hand, L-amino acid substitutions, although consistent with the negative B8 angle of GlyB8 in the R-state, were observed to impair chain combination as well as yeast expression of single-chain insulin analogs (57,96). Once formed, such analogs can nonetheless exhibit high activity (37). These reciprocal effects suggest that GlyB8 functions as an "ambidextrous switch" between folding-competent (T-like) and active (perhaps R-like) conformations.
We envisage that, within the partial folds of protein-folding intermediates, nascent T-like local structure in the proximal arm enhances the efficiency and fidelity of disulfide pairing. In particular, the conformation of residues B5-B8 (including inter-domain hydrogen bonding by the imidazole ring of HisB5 (37)) would profoundly alter the position of CysB7, its orientation with respect to CysA6 and CysA7, and in turn the trajectory of the distal arm once a disulfide bond was established. In the T-state, residues B1-B4 pack loosely within an inter-chain crevice adjoining CysA6 and CysA7. The present study has shown that deletion of PheB1 blocks cellular folding of proinsulin, whereas the two-chain model and mature product, des-PheB1-insulin, retains native-like properties. Ala scanning of the distal arm further demonstrated that each side chain influences the efficiency of folding and secretion, although to varying extents. Because PheB1 (and indeed, residues B1-B4 (58)) may be deleted without loss of activity, the distal arm provides an example of a folding element that is dispensable once the native state has been reached.
We imagine that the pattern of non-polar and polar side chains in the distal arm contributes to the efficiency and fidelity of disulfide pairing (see footnote 9). General clustering of the N-terminal hydrophobic side chains PheB1 and ValB2 with A-chain side chains could, for example, enhance the probability of collisions between the thiol (or thiolate) groups of CysB7 and CysA7. Productive alignment may then be enhanced by the proximal arm as a positive angle at B8 enables specific long-range interactions by the side chains of HisB5 and LeuB6. Because the coarseness of our experimental probes prevents unambiguous atomic-scale interpretation, however, multiple molecular models may account for our findings. In the absence of the wild-type arm, the nascent conformational search of the polypeptide leading to native disulfide pairing may be rendered inefficient by either destabilization of on-pathway intermediates or stabilization of off-pathway intermediates. Furthermore, apparent bottlenecks may reflect thermodynamic or kinetic traps. Arm mutations may even create barriers not pertinent to the folding mechanism of the wild-type protein.
The N-terminal arm of insulin is not conserved among the otherwise homologous growth factors IGF-I and IGF-II (97). The latter proteins lack a B1 residue and contain divergent side chains at positions B2-B5. In particular, IGF-specific residue ThrB5 is incompatible with efficient insulin chain combination (64) and is associated in vitro with formation of a competing IGF-I disulfide isomer on redox-coupled refolding (64,98). Such divergent folding properties reflect co-evolution of specific IGF-binding proteins (64,88,89). The incomplete folding information of IGF-I and IGF-II, by contrast, highlights the autonomous foldability of proinsulin. We thus speculate that co-evolution of IGF-binding proteins has enabled IGF-I and IGF-II to explore regions of sequence space forbidden to insulin due to constraints of autonomous foldability.

Footnote 9. Although the present set of substitutions tested at B1 highlights the importance of its non-polar character, ArgB1 occurs as a rare variant in non-mammalian insulin sequences. An example is provided by the divergent insulin of the hagfish (103), a member of a primitive lineage of marine craniates that lack a vertebral column. We speculate that the aliphatic portion of ArgB1 may pack against an analogous arm-related groove, whereas its charged guanidinium moiety projects into solvent. Analogous packing of the aliphatic portion of the LysB1 side chain may underlie the incomplete block to folding and secretion imposed by LysB1 relative to AspB1 in the present studies.
Concluding Remarks-Insulin is one of the most studied globular proteins and yet among the least well understood. Understanding its conformational lifecycle will require extensive future studies, from deciphering the molecular mechanisms by which proinsulin folds to determining structures of hormone-receptor complexes. Recent advances in human genetics have highlighted the direct connection between the biophysical chemistry of proinsulin and the pathogenesis of β-cell dysfunction in monogenic forms of DM (14,60).
The present study was designed to test the hypothesis that the N-terminal arm of proinsulin functions, despite its conformational variability in the mature hormone, as a cryptic folding element. Experimental design has integrated assays of protein folding and trafficking in mammalian cell culture with in vitro studies of protein structure, stability, and activity. Evidence has been provided that mutations in the arm can cause a broad range of effects on folding efficiency, with possible clinical implications. Although no mutations associated with neonatal DM have been found to date in the distal arm (14,60), it is possible that such mutations would induce less marked ER stress and β-cell dysfunction than those associated with neonatal-onset DM (which cluster in the proximal arm or within the α-helical domain) and so would present later in life (17). Evidence for mutation-specific ages of DM onset associated with the extent of biosynthetic impairment has recently been provided by comparison of mutations at arm position B6 (99). Whereas a neonatal mutation (LeuB6→Pro) is predicted to distort both packing of the arm and its main chain conformation, a less perturbing substitution (LeuB6→Met) is associated with onset of DM in adolescence and adulthood (17-36 years of age) (99) (see footnote 10). Distal arm mutations might thus present as a form of maturity-onset diabetes of the young or as polymorphisms conferring susceptibility to adult-onset β-cell dysfunction in the context of obesity (28,100).

FIGURE 10. ... 13Cα and 13Cβ secondary chemical shifts. Data pertain to an engineered proinsulin monomer at neutral pH as described (26). Helical segments are indicated by spirals at the bottom; the red arrow indicates the segmental attenuation of CSI values in the A1-A8 segment, presumed to reflect motions on time scales longer than nanoseconds. The horizontal arrow indicates the B24-B27 β-strand. B and C, CSI values observed in proinsulin may be interpreted in relation to a random-coil index and predicted model-free order parameters (75). B, random-coil index values calculated from main chain secondary chemical shifts. C, predicted main chain order parameters inferred from random-coil index values by the method of Berjanskii and Wishart (75). These parameters pertain to putative subnanosecond fluctuations. Whereas the A domain exhibits near-uniform high order parameters, the B domain exhibits increasing disorder ("ramps") toward its N terminus (residues B1-B8) and the BC junction (residues B25-B30).
The genetics of neonatal DM and other diseases of misfolding suggest that protein evolution has been constrained not only by structure and function, but also by folding efficiency and, in the breach, the associated risk of toxic misfolding. The present studies of des-B1-insulin and arm analogs of proinsulin have demonstrated that efficient folding may require the transient function of a conserved folding element and that this essential role may be structurally unapparent once the ground state is reached. The contributions of PheB1 and ValB2 to cellular foldability are particularly striking in light of their high thermal B-factors in crystals (39) and 1H NMR motional narrowing in solution (40-42). Whereas these and other non-polar side chains (including LeuB6, IleA10, and LeuA13) may contribute to hydrophobic collapse near cysteines (at B7, A6, A7, and A11), the efficiency of disulfide pairing is likely to require specific structural features of HisB5 and GlyB8 as visualized in the classical T-state of insulin (39). It is also possible that the aromatic ring of TyrA14 contributes to folding efficiency despite its variable positioning among crystal structures and flexibility in solution (72).
The crystallographic TR transition in zinc insulin hexamers has long provided a model for the transmission of conformational change in proteins (39). Although chameleon sequences analogous to the N-terminal arm of proinsulin are uncommon in the overall crystallographic data base, structures of native states, once reached, may mask the extent of conformational plasticity among protein-folding intermediates. We anticipate that functional annotation of protein structures will in general require multidisciplinary efforts to decipher the folding information hidden in their sequences.