The Folding Nucleus of the Insulin Superfamily

Oxidative folding of insulin-like growth factor I (IGF-I) and single-chain insulin analogs proceeds via one- and two-disulfide intermediates. A predominant one-disulfide intermediate in each case contains the canonical A20–B19 disulfide bridge (cystines 18–61 in IGF-I and 19–85 in human proinsulin). Here, we describe a disulfide-linked peptide model of this on-pathway intermediate. One peptide fragment (19 amino acids) spans IGF-I residues 7–25 (canonical positions B8-B26 in the insulin superfamily); the other (18 amino acids) spans IGF-I residues 53–70 (positions A12–A21 and D1–D8). Containing only half of the IGF-I sequence, the disulfide-linked polypeptide (designated IGF-p) is not well ordered. Nascent helical elements corresponding to native α-helices are nonetheless observed at 4 °C. Furthermore, 13C-edited nuclear Overhauser effects establish transient formation of a native-like partial core; no non-native nuclear Overhauser effects are observed. Together, these observations suggest that early events in the folding of insulin-related polypeptides are nucleated by a native-like molten subdomain containing CysA20 and CysB19. We propose that nascent interactions within this subdomain orient the A20 and B19 thiolates for disulfide bond formation and stabilize the one-disulfide intermediate once formed. Substitutions in the corresponding region of insulin are associated with inefficient chain combination and impaired biosynthetic expression. The intrinsic conformational propensities of a flexible disulfide-linked peptide thus define a folding nucleus, foreshadowing the structure of the native state.

Although not well ordered, IGF-p exhibits an unexpected richness of nascent structure. Its conformational propensities, foreshadowing the structure of the native state, suggest a model for an IGF-I folding nucleus. This model is likely to generalize to other members of the insulin-related superfamily, including proinsulin.
Our studies are motivated by general principles of protein folding and their application to the insulin-related superfamily. The native state of a globular protein may be viewed as the coalescence of discrete subdomains (16). Folding trajectories exhibit transient formation and stabilization of native-like elements of secondary structure (17).³ Coalescence of resulting microdomains is consistent with classical diffusion-collision and framework mechanisms (18). Although the existence of funnel-like free-energy landscapes suggests the importance of parallel events in folding (19), preferred trajectories may exist (20), effectively defining predominant classes of structural intermediates. Associated transition states and intermediates have been probed by respective analyses of residue-specific Φ values (21) and hydrogen-deuterium exchange (22). Structural insights have been obtained from equilibrium models of protein-folding intermediates (23,24), including peptide fragments (25,26).
is played in each case by formation of the canonical A20–B19 disulfide bridge (cystine 18–61 in IGF-I). In the native states of insulin and IGF-I (Fig. 1) this bridge connects the C-terminal α-helix of the A-domain to the central α-helix of the B-domain, packing within a cluster of conserved aliphatic and aromatic side chains in the hydrophobic core (Fig. 1; Refs. 31–33). In the IGF-I pathway near neutral pH this is the only one-disulfide species to accumulate (34).⁴ Cystine A20–B19 in the insulin-related superfamily is thus proposed to stabilize a specific and polarized folding nucleus (35,36). The structural role of the A20–B19 disulfide bridge in IGF-I intermediates has been investigated through construction of IGF-I analogs containing pairwise substitution of the other cysteines by Ala or Ser (15). Such analogs exhibit partial folds with attenuated but non-negligible α-helix content. As successive disulfide bridges are introduced in these equilibrium models, 1H NMR spectra exhibit a progressive increase in chemical shift dispersion, suggesting stepwise stabilization of structure (15). Although resonance assignments were not obtained in these studies, comparison to the assigned spectrum of native IGF-I (37,38) suggests formation of a native-like subdomain near cystine A20–B19 (IGF-I residues 18–61; Ref. 15). Additional evidence for the critical role of this cystine has been obtained through structural analysis of insulin analogs lacking either cystine A6–A11 or A7–B7 (36,39,40). Because information required for the folding of proinsulin is contained within the A and B chains (41,42), these analogs may be regarded as peptide models of corresponding two-disulfide proinsulin intermediates. Although of low thermodynamic stability and partially unfolded, their predominant conformations contain a native-like cluster of non-polar side chains surrounding the A20–B19 disulfide bridge.
Structures of analogous one-disulfide insulin analogs have not been described. In this article we describe the synthesis and characterization of a one-disulfide-linked peptide model of the populated (18–61) IGF-I intermediate (14,29). The pattern of polar and non-polar residues in the polypeptide (IGF-p; Table 1) is conserved among insulin, IGF-II, relaxin, and relaxin-related proteins (cysteines are boxed, and the critical residues Cys18 and Cys61 are marked by asterisks; top line of Fig. 2, A and B). An analogous one-disulfide PIP analog containing cystine A20–B19 has been described by Feng and co-workers (43). Although the reduced polypeptide could efficiently form the A20–B19 disulfide bridge in vitro, its folding and secretion in Saccharomyces cerevisiae were severely impaired. The PIP analog was found to exhibit marked attenuation of CD-defined helix content⁵ and non-cooperative reduction in a glutathione redox buffer (43). These findings provide evidence of major perturbations but do not resolve possible regions of native-like or non-native structure.
The present NMR study provides a residue-specific view of a putative folding nucleus conserved within the insulin-related superfamily. The results demonstrate native-like conformational preferences within an A20–B19-related microdomain. Use of a peptide model facilitates complete 1H NMR resonance assignment, which can otherwise be confounded in partially folded proteins by limited dispersion. IGF-p exhibits native-like conformational propensities within an ensemble of disordered and partially folded states. Strikingly, although most chemical shifts are close to random coil values, non-random inter-residue NOEs are observed. These define native-like elements of nascent secondary structure and provide evidence for the transient clustering of non-polar side chains. Comparison of the 1H NMR spectrum of IGF-p to that of a one-disulfide analog of intact IGF-I (15) suggests that the predominant conformational features of the physiological folding intermediate are specified by a subset of A- and B-domain sequences.
The conformational propensities of IGF-p foreshadow the native structure of IGF-I. Although these features represent only transient interactions within an otherwise unfolded ensemble, they may be central to the mechanism of oxidative folding. We propose that within the insulin superfamily an (A20–B19)-related microdomain orients these thiolates for initial disulfide bond formation, stabilizes the one-disulfide intermediate once formed, and provides a platform for non-random formation of subsequent disulfide bridges. Implications for the foldability of insulin and related proteins are discussed.

MATERIALS AND METHODS
Peptide Synthesis-The protocol for solid-phase synthesis is as described (44). The unlabeled linear sequences (peptides P_B and P_AD; Fig. 2, A and B) were assembled by Fmoc/t-butyl-based methodology (45) using an ABI-431A instrument (Applied Biosystems, Foster City, CA). The manufacturer's single-coupling protocol, which employs preformed N-hydroxybenzotriazole esters, was used without modification. Starting with commercially available Wang resins (46), each synthesis was carried out on a 0.25-mmol scale using Fmoc-protected amino acids with standard side chain protecting groups. After completion of syntheses, resins were subjected to acidolytic deprotection and cleavage using a mixture of trifluoroacetic acid, thioanisole, β-mercaptoethanol, water, and phenol in a ratio of 80:5:5:5:5, respectively. After stirring for 4 h at room temperature, the suspension was filtered, the filtrate was concentrated using a rotary evaporator, and the crude material was precipitated by addition of diethyl ether.
Isotopic Labeling of Peptides-Labeled peptides were assembled manually by t-Boc/benzyl methodology to enable real-time monitoring of coupling efficiencies by the ninhydrin reaction (47). Starting with either 4-4-methylbenzylhydrylamine resin (48), the sequences were assembled by a manual dicyclohexylcarbodiimide/N-hydroxybenzotriazole-mediated first coupling followed by a routine second coupling. At labeled residues the first coupling involved activation of the labeled material with dicyclohexylcarbodiimide/N-hydroxybenzotriazole followed by an extended coupling period of several hours. To conserve the labeled material, an additional coupling at that position was carried out using the unlabeled form of the same amino acid. Following removal of the N-terminal t-Boc group, peptide resins were cleaved under high HF conditions for 60 min at 0°C using 5% p-thiocresol and 5% m-cresol as scavengers. Sites of specific isotopic labeling are summarized in Table 1.
Peptide Purification-Reverse-phase high-performance liquid chromatography (RP-HPLC) was used to purify both Fmoc- and t-Boc-derived materials. Crude peptides were applied to a Vydac C-18 column and eluted with a linear acetonitrile gradient in 0.1% trifluoroacetic acid. Fractions exhibiting the best analytical profile were pooled and lyophilized. Yields of purified peptides were in the 10–20% range, based on initial resin substitution levels.
Disulfide Pairing-To obtain IGF-p, chain combination reactions were performed by mixing equimolar amounts of the individual chains at a concentration of 1.0–1.5 mg/ml in distilled water/acetonitrile (2:1) at pH 9.5 for 48–72 h. HPLC analysis of reaction mixtures revealed three peaks (each ~30% of the total), corresponding to the two homodimers as well as the desired heterodimer. To optimize the yield of labeled material following preparative RP-HPLC isolation of the heterodimer, labeled homodimers were reduced in the presence of dithiothreitol, re-purified, and used in another chain combination. Recycling labeled material resulted in an eventual yield of 10–20 mg of each heterodimer. Each heterodimer was subjected to amino acid analysis and mass spectrometry; results coincided closely with expected values. As a control for disulfide-dependent structural features, spectroscopic studies were in part repeated following reduction of the inter-chain disulfide bridge by dithiothreitol; perdeuterated dithiothreitol was purchased from Cambridge Isotopes, Inc. (Woburn, MA).

⁵ Deconvolution of CD spectra suggests that native PIP exhibits a helix content of 48%, consistent with crystal structures of related analogs (83); the one-disulfide PIP analog exhibits a helix content of about 9% (43), similar to that observed on removal of cystine A7–B7 (84).

Gel Permeation Chromatography-To assess its oligomeric status, IGF-p was fractionated on a Superdex Peptide HR 10/30 column (30 cm × 10 mm; Amersham Biosciences) using a Waters 515 HPLC system. The column was equilibrated in 5% acetic acid (pH 2.5) in 50 mM KCl and run at a flow rate of 0.5 ml/min. IGF-p and control samples (25 μl) were introduced through a Waters 717-plus autoinjector. A range of molecular weight standards was employed to calibrate the column: melittin (2.9 kDa), a 34-residue fragment of PTH-rp (a monomeric and disordered peptide of similar length to IGF-p), the dimer of HNF-p1 (the dimerization domain of hepatocyte nuclear factor-1α, 7.7 kDa), human proinsulin (86 residues), and cytochrome c (12.5 kDa). IGF-p was loaded at a polypeptide concentration of 200 μM; elution was monitored at 280 nm. Samples and running buffer were cooled to 4°C prior to loading; at elution the temperature was about 15°C.
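The column calibration described above can be illustrated with a minimal sketch, assuming the common practice of fitting log10(molecular mass) against elution volume by least squares. The masses of the three standards are those quoted in the text; the elution volumes are hypothetical placeholders (they are not reported here), and the function names are introduced only for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Masses (kDa) as quoted in the text for three of the standards.
STANDARDS_KDA = {"melittin": 2.9, "HNF-p1 dimer": 7.7, "cytochrome c": 12.5}

# HYPOTHETICAL elution volumes (ml); placeholders, not measured values.
HYPOTHETICAL_VOLUMES_ML = {"melittin": 14.0, "HNF-p1 dimer": 11.5, "cytochrome c": 10.2}

def calibrate(masses_kda, volumes_ml):
    """Fit log10(mass) as a linear function of elution volume."""
    names = sorted(masses_kda)
    xs = [volumes_ml[name] for name in names]
    ys = [math.log10(masses_kda[name]) for name in names]
    return fit_line(xs, ys)

def apparent_mass_kda(volume_ml, slope, intercept):
    """Apparent mass read off the calibration line at a given elution volume."""
    return 10 ** (slope * volume_ml + intercept)

SLOPE, INTERCEPT = calibrate(STANDARDS_KDA, HYPOTHETICAL_VOLUMES_ML)
```

With real elution volumes, a monomeric IGF-p would be expected to elute near the disordered 34-residue PTH-rp control rather than near the 7.7-kDa dimer standard; disordered peptides can deviate from a calibration built on compact proteins, which is presumably why a disordered control of similar length was included.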
Circular Dichroism-Far ultraviolet CD spectra were acquired using an Aviv spectropolarimeter equipped with thermistor control. Samples were made 30–50 μM in dilute HCl (pH 2); spectra were acquired at 4 and 25°C in quartz cuvettes with 1-mm path length. Spectra were also obtained after addition of 20% ethanol and 20% trifluoroethanol (TFE), chosen as helicogenic organic cosolvents to mimic possible effects of 20% deuteroacetic acid in NMR studies. Spectra were also obtained in phosphate-buffered saline (pH 7.4). Helix contents were estimated by Selcon-3 (49).
NMR Studies of IGF-p-Spectra were obtained at 600 and 700 MHz in 20% deuteroacetic acid at 4–25°C. These conditions, previously employed in NMR studies of IGF-I analogs (15) and insulin (33,35,50), enhance solubility relative to neutral pH. Resonance assignment was based on homonuclear two-dimensional NOESY (mixing times 80 and 250 ms), total correlation spectroscopy (TOCSY; mixing time 55 ms), and double-quantum filtered correlated spectroscopy spectra. In key cases resonance overlap was resolved based on selective 2H and 13C labeling; 1H-13C HSQC and 13C-edited NOESY spectra of labeled samples (Table 1) were obtained at 700 MHz. Control spectra were obtained in the absence of acetic acid at pH 2 (0.01 N HCl) and on progressive dilution in 20% acetic acid to a polypeptide concentration of 50 μM. Amide proton exchange was monitored in a D2O solution containing 20% deuteroacetic acid (51).
Natural Abundance 13C NMR Studies-1H-13C HSQC spectra of unlabeled IGF-p were obtained at a polypeptide concentration of 1.5 mM at 700 MHz with a cryogenic probe. The acquisition time was 18 h per spectrum. Spectra were obtained in (i) 20% deuteroacetic acid at 7, 15, 25, and 35°C; (ii) 100% D2O at pH 2 and 7°C; and (iii) 20% deutero-TFE at pH 2 and 7°C. Due to low digital resolution, line broadening is manifest as attenuated signal intensity resulting from transverse relaxation during the pulse sequence. 1H-1H TOCSY experiments were obtained following each HSQC experiment to enable partial resonance assignment.
NMR Studies of IGF-(18–61)-1H NMR spectra of the corresponding one-disulfide analog of IGF-I were obtained in 10 and 20% deuteroacetic acid as described (15). Presumptive selected resonance assignments, proposed by Narhi et al. (15) by analogy to the assigned spectrum of native IGF-I, were verified by explicit sequential assignment. Assignments were obtained in the regions of IGF-(18–61) spanned by IGF-p but were incomplete elsewhere due to degeneracy. Spectra were obtained at 600, 700, and 750 MHz; the latter were acquired at Varian Instruments Inc. (Palo Alto, CA).

[Figure legend fragment] D, corresponding CD spectra of native IGF-I. Because the apparent helix content of IGF-I in 80% TFE is consistent with crystal structures, its successive increase in helix content on addition of TFE may in large part reflect progressive stabilization of native α-helices in otherwise molten structure at pH 2.

TABLE 1 Sequence of IGF-p and sites of isotopic labeling
Sites of 13C labeling are shown in bold; sites of 2H labeling are underlined. Sample 1 indicates unlabeled IGF-p. Peptide P_B comprises residues 7–25 in intact IGF-I (italics above sequence) and P_AD comprises residues 53–70.

RESULTS
IGF-p and three labeled analogs were prepared by chemical synthesis (Table 1). Residue numbers refer to the corresponding positions in native IGF-I; disulfide bridges and structural elements are also designated by canonical insulin nomenclature (e.g. the central α-helix of the B-domain spans residues 8–18 in IGF-I, corresponding to canonical helix B9–B19 in insulin). Peptide P_B (19 amino acids; IGF-I residues 7–25) thus spans canonical residues B8–B26. In the structure of native IGF-I (37,38,52,53) this region forms a U-shaped helix-turn-strand super-secondary structure. Peptide P_AD (18 amino acids) spans IGF-I residues 53–70 (positions A12–A21 and D1–D8). In native IGF-I this region comprises the C-terminal A-chain helix (residues 53–60) and the less well organized D-domain extension (52,53). Gel-permeation chromatography under conditions similar to those employed for NMR studies indicates that IGF-p is monomeric at a polypeptide concentration of 200 μM. 1H NMR spectra are unaffected by polypeptide concentrations in the 50 μM to 1.5 mM range.
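Because the text moves freely between IGF-I residue numbers and canonical insulin positions, the correspondence for the two IGF-p segments can be written out explicitly. The following is a minimal sketch; the function name is hypothetical, and the offsets are inferred from the ranges quoted above (7–25 to B8–B26; 53–70 to A12–A21 followed by D1–D8), so the mapping is only asserted for residues within IGF-p.

```python
def igf1_to_canonical(residue: int) -> str:
    """Map an IGF-I residue number within IGF-p to its canonical position.

    Offsets are inferred from the segment ranges quoted in the text and are
    valid only inside the two IGF-p segments.
    """
    if 7 <= residue <= 25:       # peptide P_B: IGF-I 7-25 -> B8-B26
        return f"B{residue + 1}"
    if 53 <= residue <= 62:      # first ten residues of P_AD: 53-62 -> A12-A21
        return f"A{residue - 41}"
    if 63 <= residue <= 70:      # last eight residues of P_AD: 63-70 -> D1-D8
        return f"D{residue - 62}"
    raise ValueError(f"IGF-I residue {residue} is not part of IGF-p")

# The two cysteines of the 18-61 bridge map to the canonical A20-B19 pair.
bridge = (igf1_to_canonical(61), igf1_to_canonical(18))
```

The same mapping places Leu14, Phe23, and Tyr60 at canonical positions B15, B24, and A19, respectively.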
CD Studies Suggest Disorder with Latent Helical Propensity-Far UV CD spectra of IGF-p, obtained at 4°C at either pH 2 or 7.4 (solid lines in Fig. 3, A and B, respectively), suggest that the polypeptide contains little organized structure at either pH. The estimated mean helix content is at most 8–10% (corresponding to one helical turn). The spectrum of native IGF-I under these conditions (dotted lines in Fig. 3, A and B) by contrast exhibits a distinct helix-related maximum near 195 nm, minimum at 208 nm, and shoulder at 222 nm. These features correspond to an apparent helix content of 16–20%, less than the actual extent of α-helix in crystal structures of IGF-I (25 of 70 residues, or 36%; Refs. 52 and 53). This discrepancy, which presumably reflects the flexibility of IGF-I in solution (37,38), highlights the limitations of CD deconvolution algorithms trained on the basis of well ordered globular proteins (49). The CD spectrum of the one-disulfide analog IGF-(18–61) has features intermediate between the spectra of IGF-p and native IGF-I (dashed lines in Fig. 3, A and B).
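The percentages quoted above follow from simple residue counting, as the short sketch below illustrates. The value of 3.6 residues per helical turn is the textbook α-helix geometry rather than a number taken from this study, and the function name is hypothetical; the chain lengths (70 residues for IGF-I; 19 + 18 = 37 for IGF-p) are those given in the text.

```python
RESIDUES_PER_TURN = 3.6   # textbook alpha-helix geometry (assumption, not from the text)
IGF_I_LENGTH = 70         # residues in native IGF-I
IGF_P_LENGTH = 19 + 18    # P_B plus P_AD

def helix_percent(n_helical: float, n_total: int) -> float:
    """Percentage of residues in helical conformation."""
    return 100.0 * n_helical / n_total

# Crystallographic helix content of IGF-I: 25 helical residues of 70.
crystal_igf_i = helix_percent(25, IGF_I_LENGTH)

# Helix content equivalent to a single helical turn in the 37-residue IGF-p.
one_turn_igf_p = helix_percent(RESIDUES_PER_TURN, IGF_P_LENGTH)
```

Under these assumptions a single turn in IGF-p amounts to slightly under 10% helix, consistent with the statement that the estimated 8–10% corresponds to about one helical turn.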
Helical CD features of both IGF-p and IGF-I are markedly enhanced on addition of TFE (Fig. 3, C and D, respectively). In 20% TFE (v/v) the spectrum of IGF-p exhibits a distinct minimum at 208 nm and shoulder at 222 nm (dashed line in Fig. 3C). These features, similar to those of IGF-(18–61) in the absence of TFE (dashed lines in Fig. 3, A and B), correspond to an estimated helix content of about 13%. Helical CD features are further accentuated in 40, 60, and 80% TFE; these spectra, nearly identical (Fig. 3C), correspond to helix contents of 25–27%. The CD spectrum of IGF-I also exhibits accentuated helical features in the presence of TFE (Fig. 3D). Unlike in studies of IGF-p, successive addition of TFE in the 40–80% range is associated with a progressive increase in the magnitude of helix-related features at 195, 208, and 222 nm. In 80% TFE the CD spectrum of IGF-I is consistent with its crystal structure. Comparison of the TFE dependence of the CD spectra of IGF-p and IGF-I suggests that the peptide model may contain a latent helical propensity analogous to that of the native protein.
NMR Studies Demonstrate Native-like Conformational Propensities-In light of the low resolution of CD spectroscopy, NMR studies were undertaken to provide a residue-specific probe of possible conformational propensities. One-dimensional 1H NMR spectra of IGF-p are shown in Fig. 4 (A and B, 4 and 25°C, respectively) in relation to spectra of IGF-(18–61) (C) and native IGF-I (D). The one-disulfide polypeptides (panels A–C) exhibit a marked reduction in chemical shift dispersion relative to native IGF-I (panel D), especially evident in the methyl region (0–1 ppm). The majority of 1H NMR resonances in IGF-p are near random coil values, consistent with the absence of significant CD-detectable structure in the absence of TFE. As expected, the spectrum of IGF-p (37 residues) contains less than half of the overall spectral amplitude of the corresponding IGF-(18–61) (70 residues). Line widths at 4°C (Fig. 4) and 40°C (not shown) are in each case as expected for monomeric polypeptides of the corresponding molecular masses. Resonance assignments of IGF-p and IGF-(18–61) are provided as supplemental materials (Tables S1 and S2, respectively). Resonance assignments of native IGF-I are given in Table S3.
Despite the overall paucity of chemical shift dispersion, aromatic chemical shifts in IGF-p and IGF-
The NOESY spectrum of IGF-p at 4°C (Fig. 5) contains an unexpected richness of inter-residue cross-peaks, suggesting transient formation of non-random local and non-local contacts in the ensemble. Key NOEs are observed with similar intensities (relative to diagonal resonances) at polypeptide concentrations of 1 mM and 50 μM (supplemental materials Fig. S1), suggesting that these contacts are intrinsic to the ensemble of monomeric conformations and not a consequence of weak dimerization. To explore such conformational preferences, a mixing time (250 ms) was chosen to allow spin diffusion at low temperature (such a mixing time is suitable for a 37-residue polypeptide). The results are in striking contrast to the low CD-detectable helix content of IGF-p in the absence of TFE (Fig. 3). Although resolution is incomplete, NMR evidence of nascent helical structure is observed in each component peptide. These features (strings of successive (i, i+3) dαN and dNN NOEs spanning residues 8–13 in P_B and residues 53–60 in P_AD; Figs. 5B and 6) are consistent with the location of helices in the native protein. A summary of nascent helix-related contacts is given in Fig. 6.
Despite these helical NOE signatures, rates of amide proton exchange in IGF-p are consistent with sequence-dependent rates predicted in a random coil at this pH and temperature. The absence of amide protection in D2O is in accord with the negligible CD-detectable helical structure. Explicit demonstration of a conformational equilibrium between an unfolded ensemble of states and transient native-like structures is provided by natural-abundance 1H-13C HSQC spectra (Fig. 7 and supplemental materials Fig. S2). The positions of cross-peaks in the Hα-Cα region, relative to residue-specific random coil values, reflect the influence of local secondary structure on chemical shifts. Although the 1H-13C HSQC "fingerprint" of IGF-I at 35°C (which has not been completely assigned) exhibits random coil shifts (supplemental materials Fig. S2, A), a trend is observed at lower temperatures wherein cross-peaks of residues within regions of nascent structure are broad and hence attenuated, whereas cross-peaks outside of these segments are intense. This trend is more pronounced in 20% deuteroacetic acid at 7°C (Fig. 7) than in dilute HCl at the same pH and temperature (supplemental materials Fig. S2, B). Examples are provided by the broad Hα-Cα cross-peaks of Val11, Ala13, Val17, Cys18, Gly19, and Gly22 in P_B and by Cys61 in P_AD. In fact, only a minority of resonances at 7°C (in the presence or absence of deuteroacetic acid) exhibit intensities similar to those at 35°C, as would be expected of a random coil. These anomalous features provide evidence for conformational exchange within a disordered ensemble. Addition of 20% deutero-TFE leads to further broadening of these conformationally sensitive resonances and a shift in the Hα-Cα cross-peak of Val11 toward a helical fingerprint position (supplemental materials Fig. S2, D).
Traces through representative cross-peaks in the natural abundance 1H-13C HSQC spectra, assigned to regions of disorder or nascent structure, are provided in the supplemental materials (Fig. S2, E and F, respectively). The greater sensitivity of 1H-13C HSQC spectroscopy (relative to homonuclear TOCSY and NOESY spectroscopy) to conformational broadening is due to a combination of the breadth of the 13C chemical shift scale (Δ between substates) and the marked dependence of HSQC cross-peak amplitude on 13Cα T2 relaxation times.
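The dependence of exchange broadening on the shift difference between substates can be made concrete with the standard two-site expression for the fast-exchange limit, R_ex = p_A p_B Δω²/k_ex. This formula is a textbook result and is not taken from this study; the sketch below is illustrative only, and the populations, shift differences, and exchange rate are hypothetical values chosen to show the quadratic scaling. The 13C Larmor frequency of about 176 MHz corresponds to a 700-MHz (1H) spectrometer.

```python
import math

def delta_omega_rad_s(delta_ppm: float, larmor_mhz: float) -> float:
    """Shift difference between substates in rad/s (ppm x MHz gives Hz)."""
    return 2.0 * math.pi * delta_ppm * larmor_mhz

def fast_exchange_rex(p_minor: float, delta_omega: float, k_ex: float) -> float:
    """Two-site fast-exchange contribution to R2 (s^-1): pA * pB * dw^2 / kex."""
    return p_minor * (1.0 - p_minor) * delta_omega ** 2 / k_ex

# HYPOTHETICAL parameters, for illustration only.
P_NATIVE_LIKE = 0.1   # population of the transient native-like substate
K_EX = 2.0e4          # exchange rate (s^-1), assumed fast relative to the shift difference

# Assumed shift differences: 0.3 ppm for 1H-alpha, 2.0 ppm for 13C-alpha.
rex_1h = fast_exchange_rex(P_NATIVE_LIKE, delta_omega_rad_s(0.3, 700.0), K_EX)
rex_13c = fast_exchange_rex(P_NATIVE_LIKE, delta_omega_rad_s(2.0, 176.0), K_EX)
```

Because R_ex scales with the square of the shift difference in frequency units, the larger ppm range of 13Cα can outweigh its lower Larmor frequency, so that rex_13c exceeds rex_1h for these assumed values; the resulting shorter T2 then attenuates the HSQC cross-peak.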
To investigate the dependence of the nascent structure on maintenance of the 18–61 disulfide bridge, IGF-p was reduced with 25 mM deuterated dithiothreitol in 25 mM ammonium bicarbonate (pH 8). Following lyophilization, the reduced sample was reconstituted in 20% deuteroacetic acid. Its natural abundance 1H-13C HSQC spectrum at 7°C contained random coil P_AD resonances without evidence of conformational broadening; the NOESY spectrum likewise contained few non-local contacts. P_B cross-peaks were not observed in either spectrum, presumably due to aggregation or fibrillation of the reduced peptide.

13C-Edited NMR Studies Demonstrate Nascent Tertiary Structure-Comparison of the NOESY spectrum of IGF-p with spectra of native IGF-I (37,38) and native insulin (50) indicates that P_B contains a nascent native-like supersecondary structure: NOEs following the helix (Arg21 HN/Gly22 HN, Gly22 HN/Asp20 Hα,β, and Phe23 Hδ,ε/Leu14 δ1,2-CH3) provide evidence of a β-turn. Although the nascent helix in P_AD also corresponds to an α-helix in native IGF-I and insulin, some (i, i+3) NOEs in P_AD are anomalous (supplemental materials Table S5). Such contacts, not characteristic of canonical α-helices, presumably reflect the flexibility of the nascent helix. The disulfide bridge in IGF-p gives rise to characteristic inter-peptide NOEs between Cys18 and Cys61 (supplemental materials Table S5). The features are independent of the polypeptide concentration in the range 50 μM to 1 mM at 4–7°C but are not observed at 25°C.

SEPTEMBER 22, 2006 • VOLUME 281 • NUMBER 38

An Insulin-related Folding Intermediate
The NOESY spectrum of IGF-p contains two key long-range NOEs, one between the side chains of Phe23 and Leu14 (Fig. 5D and supplemental materials Table S5), and the other between the side chains of Tyr60 and Leu14 (and/or Tyr60 and Leu10). The first NOE provides evidence for a turn in P_B leading to a native-like supersecondary structure (see above); the second provides evidence for a native-like inter-chain helix-helix interaction. Because these and other diagnostic resonances are only partially resolved, site-specific 13C and 2H labels were introduced (Table 1). 1H-13C HSQC spectra verify sites of 13C labeling (supplemental materials Fig. S4). 13C-Edited NOESY spectra (Fig. 8) confirm assignments of cross-peaks that would otherwise be ambiguous in the spectrum of the unlabeled polypeptide (Fig. 5). An example is provided by the overlapping methyl resonances of Leu57 and Leu10. In one of the labeled samples the three leucine residues in P_AD are perdeuterated, making unambiguous the assignment of the remaining cross-peak between Tyr60 and Leu14 (Fig. 8A). This contact corresponds to canonical packing of TyrA19 against LeuB15 in the core of insulin (13). A long-range NOE between Phe23 and Leu14 was likewise resolved by isotope editing (Fig. 8B); this contact is ambiguous in the unlabeled spectrum (Fig. 5) due to overlap between the aromatic resonances of Phe23 and Tyr24/Phe25. A potential non-native NOE between Tyr24 and Leu14 was excluded by its absence in an appropriately labeled sample (Fig. 8C). The latter spectrum also identified a medium-range Tyr24/Arg21 contact (Fig. 8C). Other regions of this spectrum (supplemental materials Fig. S5) contain native-like helix-related (i, i+3) NOEs in P_AD and a long-range NOE (Ala62–Val17), whose proximity is constrained by the adjoining 18–61 disulfide bridge.
Relationship of IGF-p to Corresponding One-disulfide Intermediate-IGF-(18–61) at 40°C yields a pattern of inter-residue NOEs similar to that of IGF-p at 4°C (supplemental materials Fig. S3 and Table S5). Such similarity suggests that cooling IGF-p enhances intrinsic conformational preferences that in IGF-(18–61) are stabilized at higher temperature by flanking regions of the 70-residue polypeptide. Sequential assignment of the A- and B-domains of IGF-(18–61) demonstrates corresponding helical segments and long-range NOEs (supplemental materials Fig. S3 and Table S3), including Phe23–Leu14 and Tyr60–Leu14. Significantly, the methyl chemical shifts of Leu14 are upfield of those in IGF-p (and downfield of those in native IGF-I), suggesting that the nascent core of the longer polypeptide is more ordered. Of particular interest, the non-random chemical shifts of the two aromatic rings in the nascent microdomain (Phe23 and Tyr60; canonical positions B24 and A19, respectively) are essentially identical in the two samples (Fig. 5D and supplemental materials Table S2).

DISCUSSION
How the native state of a globular protein is encoded by its sequence defines a major unsolved problem (54). Although folding may be assisted by chaperones and enzymes such as protein-disulfide isomerase (55), structures ordinarily represent the ground state of a multidimensional energy landscape (56). Globular proteins form cooperatively folded structures, stabilized primarily by the hydrophobic effect and multiple weak interactions simultaneously permitted in the native state (entropic cooperativity; Ref. 20). The conformational search leading to the ground state is of broad biomedical importance as folding defects underlie diverse human diseases (57,58). This search is envisaged to occur in a funnel-like landscape (59) wherein the native state is reached from an initially large distribution of conformers (60). The efficiency of this search may be enhanced by preferred trajectories, one limit of which is the population of transient intermediates. Off-pathway intermediates may also accumulate, leading to kinetic barriers (61).
These concepts have been extensively investigated in studies of oxidative protein folding. Of interest in relation to disulfide pairing in the endoplasmic reticulum (62), oxidative folding is central to the biosynthesis of transmembrane and secretory proteins (63). In vitro the relative reactivities of thiol groups provide site-specific kinetic probes, enabling reaction intermediates to be trapped (64). The time course of formation and disappearance of free cysteines and specific pairing arrangements thus provides a chemical map of the oxidative folding pathway (65,66). A paradigm has been provided by bovine pancreatic trypsin inhibitor (23,24,67). Like IGF-I and insulin, bovine pancreatic trypsin inhibitor is a small globular protein containing three disulfide bridges, and so its folding pathway provides a context for comparison. The subject of vigorous debate (36), refolding is characterized by a preferred sequence of one- and two-disulfide species. Both native and non-native pairings may be observed whose relative ratios are sensitive to experimental conditions (68). Unlike IGF-I intermediates (15,35), bovine pancreatic trypsin inhibitor intermediates that are well populated at neutral pH exhibit native-like structures (23,24,67). Large kinetic barriers among such folded intermediates lead to a preferred final step: formation of an external disulfide bridge between solvent-exposed loops.

[Figure legend fragment] Whereas N- and C-terminal residues in each chain exhibit motional narrowing, evidence of a conformational equilibrium is provided by selective broadening of cross-peaks in regions of nascent structure. Assignments of selected Cα-Hα cross-peaks are as indicated (Gly7, Val11, Ala13, Val17, Gly19, and Ser69); Cβ-Hβ cross-peaks of Ser69 and two prolines (Pro63 and Pro66) are also indicated. The spectrum was observed in 20% deuteroacetic acid at 7°C; additional natural abundance spectra are provided as supplemental materials (Fig. S2).
Kinetic barriers are thus central to the logic of the overall disulfide pathway (27,69). Given that a populated one-disulfide species exhibits a native-like fold (67), the kinetics of disulfide formation and rearrangement are unrelated to the logic of the initial conformational search from the unfolded state ensemble.
Refolding of IGF-I-The insulin-related superfamily provides a complementary system for studies of oxidative folding. Profound differences from bovine pancreatic trypsin inhibitor are observed. IGF-I and related proteins (including proinsulin and insulin) are significantly less stable (35,68); kinetic barriers among populated two-disulfide species are also lower (35,39). Furthermore, and most significant, the conformational search is fundamentally intertwined with formation of disulfide bridges: analogs of IGF-I and insulin exhibit stepwise stabilization of native structural elements with successive disulfide pairing (29,32,36). Oxidative folding of IGF-I and related proteins may thus be viewed as occurring on a sequence of successive energy landscapes whose topography is constrained by disulfide pairing (35).
Evidence for non-random initial folding trajectories is provided by the trapping of a unique one-disulfide intermediate containing the A20-B19 disulfide bridge (cystine-(18-61) in IGF-I; Refs. 15 and 43). Although several subsequent two-disulfide species have been characterized, this species appears to be an obligatory on-pathway intermediate (Footnote 6). The initial conformational search leading to formation of this disulfide bridge is thus of central importance. Here, we have constructed and characterized a peptide model of this key intermediate. Although the peptide is highly flexible, analysis of inter-molecular NOEs at low temperature indicates transient formation of a native-like microdomain surrounding the A20-B19 disulfide bridge. We thus view IGF-p as adopting an ensemble of disordered structures in equilibrium with native-like conformational substates (Fig. 9). We imagine that transient formation of a native-like subdomain provides an "internal template" constraining subsequent folding trajectories and possible disulfide pairings.
[Figure legend, partial] Table 1: sample 2 (spectrum A), sample 3 (B), and sample 4 (C). In each case NOEs between aromatic and aliphatic resonances are shown in the upper box, and intra-residue aromatic NOEs in the lower box.

The richness of the NOESY spectrum of IGF-p is remarkable in light of its uninformative CD spectrum. This apparent contradiction presumably reflects the differing physical mechanisms of these probes: whereas CD detects the mean handedness of the ensemble of main chain conformations, NOEs can arise through the transient proximity of protons in a flexible polypeptide. Because the CD spectrum excludes formation of well ordered α-helices, we interpret non-random NOE patterns as evidence of nascent conformational preferences rather than fixed structure. Despite such NOEs, chemical shift dispersion is limited. We presume that conformational fluctuations lead to averaging of ring-current shifts and other sources of chemical shift anisotropy that could otherwise lead to non-random secondary shifts (70) (Footnote 7). The low apparent helix content implied by the CD spectrum of IGF-p is also in accord with the absence of protected amide resonances in D2O within NOE-defined nascent helical segments. Evidence for conformational fluctuations in a related analog IGF-(18-61) has previously been obtained from tyrosine fluorescence studies (15).
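The contrast between the two probes can be illustrated numerically. The sketch below is not part of the original analysis: it assumes the simplest textbook approximation, in which NOE intensity scales with the ensemble average of r^-6 (spin diffusion and the timescale of internal motion are ignored), and it uses a hypothetical two-state ensemble whose populations and distances are chosen only for illustration. Under those assumptions, a proton pair that is close in a small fraction of conformers yields an apparent distance far shorter than the arithmetic mean.

```python
def effective_distance(populations, distances):
    """Return the r^-6-averaged effective distance (same units as `distances`).

    Assumes NOE intensity is proportional to the population-weighted
    mean of r^-6, a common simplification for flexible molecules.
    """
    if abs(sum(populations) - 1.0) > 1e-9:
        raise ValueError("populations must sum to 1")
    mean_inv_r6 = sum(p * r ** -6 for p, r in zip(populations, distances))
    return mean_inv_r6 ** (-1.0 / 6.0)


# Hypothetical ensemble (illustrative values, not measured data):
# 10% of conformers hold two protons 3 angstroms apart (native-like contact),
# 90% hold them 10 angstroms apart (disordered).
populations = [0.1, 0.9]
distances = [3.0, 10.0]

r_eff = effective_distance(populations, distances)
mean_r = sum(p * r for p, r in zip(populations, distances))

print(f"r^-6-averaged distance: {r_eff:.1f} angstroms")
print(f"arithmetic mean distance: {mean_r:.1f} angstroms")
```

In this toy ensemble the minor native-like substate dominates the r^-6 average, which is consistent with observing native-like NOEs in a peptide whose population-averaged CD signal shows little helix.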
Although the NOESY spectrum of IGF-p contains far fewer cross-peaks than the spectrum of native IGF-I, no non-native NOEs are observed, i.e. contacts not in accord with the crystal structure of IGF-I. Comparison of IGF-p and IGF-(18-61) (15) suggests that structural relationships in the peptide are retained and extended in the physiological intermediate, presumably due to interactions involving segments of IGF-I not represented in the peptide model. By analogy to two-disulfide insulin analogs (35,36), we speculate that the non-polar side chain of Ile43 contributes to desolvation of a hydrophobic mini-core even in the absence of A-domain helix 42-49. The conformational preferences of IGF-p nevertheless demonstrate that essential folding information is intrinsic to a subset of A- and B-domain residues. These results significantly extend previous NMR studies of IGF-I disulfide analogs (15,35).
An unusual feature of IGF-I refolding is its bifurcation to yield two products (14,29). Designated native IGF-I and IGF-swap (14), the two products share the 18-61 disulfide bridge but differ by interchange of Cys47 and Cys48; respective pairing schemes are (18-61, 6-48, and 47-52) and (18-61, 6-47, and 48-52). These isomers exhibit nearly equal thermodynamic stabilities. Proinsulin and single chain insulin analogs by contrast refold to form a unique native state; alternative pairing schemes are metastable, and if present, rearrange to the native state. That the number of products is precisely two demonstrates that a non-random folding pathway is encoded but must be "ambiguous" following formation of cystine-(18-61). Studies of chimeric PIP analogs indicate that the respective B-domains of proinsulin and IGF-I are responsible for determining the relative stability of the swapped isomer (72). Specification of native IGF-I disulfide pairing in vivo is attributed to specific IGF-I-binding proteins, present in equimolar proportions. Such partner proteins bind native IGF-I but not IGF-swap, thus favoring formation of the native isomer (73).
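The combinatorial background to this argument can be made explicit. The following is a minimal sketch, not drawn from the original study: it enumerates every way of pairing the six cysteines of IGF-I into three disulfides and then counts how many pairings retain cystine-(18-61). The function name `pairings` is introduced here for illustration only.

```python
# Cysteine positions in IGF-I, as used in the pairing schemes in the text.
CYSTEINES = (6, 18, 47, 48, 52, 61)


def pairings(residues):
    """Yield every complete pairing of `residues` as a frozenset of (i, j) tuples.

    `residues` must be sorted, so each tuple has i < j.
    """
    if not residues:
        yield frozenset()
        return
    first, rest = residues[0], residues[1:]
    for index, partner in enumerate(rest):
        remaining = rest[:index] + rest[index + 1:]
        for sub in pairings(remaining):
            yield sub | {(first, partner)}


all_isomers = list(pairings(CYSTEINES))
with_18_61 = [p for p in all_isomers if (18, 61) in p]

# Pairing schemes of the two observed products, as given in the text.
native = frozenset({(6, 48), (18, 61), (47, 52)})
swap = frozenset({(6, 47), (18, 61), (48, 52)})

print(len(all_isomers), "possible three-disulfide isomers")
print(len(with_18_61), "isomers retain cystine-(18-61)")
for isomer in with_18_61:
    label = "native" if isomer == native else "swap" if isomer == swap else "other"
    print(sorted(isomer), label)
```

Of the 15 possible three-disulfide isomers, only three contain the 18-61 bridge; native IGF-I and IGF-swap are two of these, and the third would pair the adjacent residues Cys47 and Cys48. The enumeration says nothing about stability or kinetics; it only shows how strongly the observed outcome is restricted relative to random pairing.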
Application to Insulin and Proinsulin. Insulin chain combination provides a peptide model for the folding of proinsulin (31). Although the mechanism of disulfide pairing is not well characterized, evidence for the kinetic importance of a homologous A20-B19-related microdomain is provided by the contrasting effects of N- and C-terminal A chain substitutions. Whereas diverse substitutions in the N-terminal segment of the A chain are well tolerated (39), substitutions in the C-terminal α-helix can profoundly impair efficiency of chain combination (74). Because the A20-B19-related microdomain is proposed to function as a specific folding nucleus, destabilization of this substructure would also destabilize the transition state. An example of blocked chain combination is provided by mutations of LeuA16 (Leu57 in IGF-I). This side chain packs against the A20-B19 disulfide bridge and is proposed to assist in orienting CysA20 and CysB19 for proper disulfide pairing. Off-nucleus contacts involving the N-terminal segment of the A chain are optional in the proposed polarized transition state and so may be bypassed. Unstable N-terminal analogs are thus readily prepared. Substitutions in B-domain residues involved in the A20-B19-related microdomain are also associated with impaired chain combination (75) and decreased efficiency of PIP expression in yeast (43).
The robustness of insulin chain combination to drastic changes in the sequence and structure of the N-terminal A chain segment demonstrates asymmetric encoding of folding information. A polarized kinetic model rationalizes why it is possible to achieve efficient biosynthetic expression and folding of PIP variants containing pairwise substitution of either "off nucleus" cystine (A6-A11 or A7-B7) but not of cystine A20-B19 (76). Likewise, syntheses of two-disulfide insulin analogs lacking either cystine A6-A11 or A7-B7 were accomplished with high yield (36,39). Efficient pairing of such analogs seems remarkable in light of their marked instabilities. In transfected mammalian cells a variant proinsulin analog lacking cysteines A6 and A11 exhibits an efficiency of folding and secretion similar to that of wild-type proinsulin (39). Expression and secretion are impaired on pairwise substitution of either A7-B7 or A20-B19, presumably due to misfolding and/or degradation.
Populated one- and two-disulfide intermediates in the folding of proinsulin have not been characterized. This is due in part to a technical barrier. Whereas the refolding of reduced IGF-I can be investigated near neutral pH (14,29), studies of proinsulin are restricted to basic conditions (pH 9.5-10.5) due to pH-dependent aggregation of the unfolded polypeptide. Under these conditions the protein rapidly oxidizes to form an assortment of three-disulfide isomers, which then rearrange to the native state (31). The relevance of this phenomenology to in vitro folding at neutral pH or physiological folding in the ER (pH 6) is unclear. Because aggregation of reduced proinsulin at neutral pH can be circumvented by the addition of protein-disulfide isomerase [...] refolding pathway of proinsulin in the presence of this ER foldase. IGF-p is related to a one-disulfide analog of a mini-proinsulin (43). As in IGF-(18-61) (15), this PIP analog was engineered by pairwise substitution of cystine A6-A11 by Ala and A7-B7 by Ser. Its efficiency of folding and secretion in S. cerevisiae is reduced by 10²- to 10³-fold relative to the wild-type PIP, suggesting failure of quality control in the ER. In vitro the one-disulfide variant exhibits a marked attenuation of CD-detected helix content and non-cooperative reduction properties. Although efficient in vitro formation of the A20-B19 disulfide bridge was observed and ascribed to a kinetically preferred pathway (43), interpretation of these results remains unclear as no control studies were described.
In particular, because the variant polypeptide contained only two cysteines, assessment of efficiency and consideration of kinetic guidance await future comparison of the rates of oxidative pairing of (a) the same polypeptide in denaturant, (b) a variant polypeptide containing non-conservative substitutions in the A20-B19-related microdomain, and (c) intrinsically unfolded model polypeptides engineered to contain two cysteines with the same spacing as in the PIP analog.
Concluding Remarks. A major challenge is posed by the problem of proinsulin folding in β-cells. The subtlety of this problem is highlighted by the findings of Arvan and colleagues (78) that substitutions well tolerated in vitro can be associated with disulfide mispairing in cell culture. Furthermore, mini-proinsulin analogs, although highly efficient in refolding assays in vitro (79), can quantitatively misfold in mammalian cell culture to yield a metastable disulfide isomer. This isomer passes ER quality control and is efficiently secreted (79). These observations strongly suggest that folding of proinsulin in vivo is under kinetic control. By implication, determinants of structure and stability in vitro may or may not correlate with determinants of kinetic guidance in vivo.
The conformational repertoire of IGF-p foreshadows the native state of IGF-I. An analogous bias toward native-like topologies within an unfolded state ensemble has been proposed by Shortle and co-workers (80) to accelerate folding and provide a barrier against misfolding. Although this proposal is controversial (81,82), our results establish that in the presence of a disulfide bridge a flexible peptide model of a protein-folding intermediate can exhibit nascent native-like NOEs even in the absence of stable secondary structure. Accordingly, we envisage that the A20-B19-related microdomain of proinsulin functions in the β-cell as a specific folding nucleus to accelerate the initial conformational search leading to formation of canonical cystine A20-B19 (cystine-(18-61) in IGF-I). In the future this hypothesis can be addressed through mutagenesis of residues in proinsulin predicted to be either engaged in or peripheral to the A20-B19-related microdomain. The existence of a polarized transition state would imply that destabilization of this microdomain, but not of flanking structural elements, would block folding, induce ER stress, and activate the unfolded protein response. The relationship between the folding of proinsulin and ER stress is of central interest in relation to β-cell exhaustion in Type II diabetes mellitus (83,84).