Solution Structure of Proinsulin

The folding of proinsulin, the single-chain precursor of insulin, ensures native disulfide pairing in pancreatic β-cells. Mutations that impair folding cause neonatal diabetes mellitus. Although the classical structure of insulin is well established, proinsulin is refractory to crystallization. Here, we employ heteronuclear NMR spectroscopy to characterize a monomeric analogue. Proinsulin contains a native-like insulin moiety (A- and B-domains); the tethered connecting (C) domain (as probed by {1H}-15N nuclear Overhauser enhancements) is progressively less ordered. Although the BC junction is flexible, residues near the CA junction exhibit α-helical-like features. Relative to canonical α-helices, however, segmental 13Cα/β chemical shifts are attenuated, suggesting that this junction and contiguous A-chain residues are molten. We propose that flexibility at each C-domain junction facilitates prohormone processing. Studies of protease SPC3 (PC1/3) suggest that C-domain sequences contribute to cleavage site selection. The structure of proinsulin provides a foundation for studies of insulin biosynthesis and its impairment in monogenic forms of diabetes mellitus.

Insulin plays a central role in metabolic regulation. The hormone derives from the single-chain precursors preproinsulin and proinsulin in β-cells (Fig. 1). The nascent polypeptide contains a signal peptide, which is cleaved on translocation into the endoplasmic reticulum. Folding is coupled to specific pairing of three disulfide bridges (Fig. 1, gold). On trafficking from the Golgi apparatus to glucose-regulated secretory granules (1), the connecting (C) domain is excised by specific proteases (2), liberating the C-peptide (Fig. 1, bottom). The mature hormone thus contains discrete A- and B-domains (3).
Here, we report the structure of proinsulin. Although insulin itself has been characterized at atomic resolution (4), proinsulin is refractory to crystallization, and its NMR study has been limited by self-association (5). To circumvent these obstacles, we undertook heteronuclear NMR analysis (6) of an engineered monomer (DKP-proinsulin) (7). A structural ensemble was obtained by distance geometry (DG) and simulated annealing (SA); associated subnanosecond disorder was probed by measurement of {1H}-15N nuclear Overhauser enhancements (hetNOEs) (8). Proinsulin contains a well organized insulin moiety whose folding contrasts with the flexibility of the C-domain. To assess the biological significance of C-domain sequences, we investigated the effects of C-domain deletions and modifications on junctional cleavage by convertase SPC3 (PC1/PC3; Ref. 9). The structure of proinsulin provides a foundation for studies of insulin biosynthesis with potential application to diabetes-associated mutations in the insulin gene (10,11).

MATERIALS AND METHODS
Bacterial Expression-DKP-human proinsulin, containing the substitutions HisB10 → Asp, ProB28 → Lys, and LysB29 → Pro, was expressed in Escherichia coli and purified as described (7). The protein was labeled by fermentation in M9 minimal medium containing [13C]glucose and [15N]ammonium chloride as sole carbon and nitrogen sources. 13C,15N-labeled C-peptide was obtained from DKP-proinsulin by tryptic digestion (12). Receptor-binding activity and stability were characterized as described (13).

RESULTS
DKP-proinsulin is 8- to 10-fold more active than wild-type proinsulin (supplemental Table S1). Its thermodynamic stability is also greater than that of wild-type proinsulin (ΔΔG_u = 0.9 ± 0.2 kcal/mol; supplemental Table S2). Enhanced receptor binding and stability are presumably the result of the AspB10 substitution (21).
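To put the stability difference in perspective, a free-energy difference can be converted into a ratio of unfolding equilibrium constants. The sketch below assumes two-state unfolding (ΔG_u = −RT ln K_u, with K_u = [U]/[N]) and a temperature of 25 °C; neither the temperature nor the two-state model is stated in the text, and the function name is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def stability_ratio(ddg_u_kcal: float, temp_k: float = 298.15) -> float:
    """Fold-reduction in K_u = [U]/[N] implied by a stabilization ddG_u.

    Assumes two-state unfolding, with dG_u = -RT ln K_u, so a variant that is
    more stable by ddG_u has K_u(wild type) / K_u(variant) = exp(ddG_u / RT).
    The 25 degC default temperature is an assumption, not a value from the text.
    """
    return math.exp(ddg_u_kcal / (R_KCAL * temp_k))


# Central value and the reported +/- 0.2 kcal/mol bounds.
for ddg in (0.7, 0.9, 1.1):
    print(f"ddG_u = {ddg:.1f} kcal/mol -> K_u ratio ~ {stability_ratio(ddg):.1f}")
```

Under these assumptions, 0.9 kcal/mol corresponds to roughly a 4.6-fold lower unfolding equilibrium constant, and the reported uncertainty spans roughly 3- to 6-fold.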
Insulin Moiety-The 1H-15N heteronuclear single-quantum coherence spectrum of DKP-proinsulin resembles that of DKP-insulin with the addition of C-domain cross-peaks (supplemental Fig. S4). Further insight was provided by analysis of secondary 1Hα, 13Cα, and 13Cβ chemical shifts (differences between observed and random-coil values), which are sensitive to secondary structure (22). Trends in such shifts indicate that DKP-proinsulin and DKP-insulin exhibit similar structural elements within the A- and B-domains (Fig. 2A, blue and red; supplemental Table S3). These include the N- and C-terminal A-domain α-helices (residues A1-A8 and A12-A20), the central B-domain α-helix (B9-B18), and the B-domain β-strand (B24-B27). In accord with canonical features of an α-helix, the C-terminal A-domain α-helix and the central B-domain α-helix each exhibit large positive secondary 13Cα shifts and negative secondary 13Cβ shifts (Fig. 2A). Helical secondary structure was corroborated by characteristic NOEs (HN(i, i+1) and HN(i, i+3) contacts; Ref. 23). The N-terminal A-domain α-helix, by contrast, exhibits attenuated 13C secondary shifts (Fig. 2A) (24) despite retention of helix-related NOEs. Residue-specific secondary structure propensity (SSP) scores of DKP-insulin and DKP-proinsulin are illustrated by color-coded ribbon models (Fig. 2B).
C-domain-The BC junction and residues C1-C26 exhibit near random-coil chemical shifts (Fig. 2A; supplemental Table S4) and motional narrowing (supplemental Fig. S5). The absence of medium- and long-range contacts (despite retention of sequential and intraresidue NOEs) suggests disorder. Near the CA junction, however, residues C27-C31 (sequence ALEGS) exhibit helix-related NOEs, and their 13Cα/β secondary shifts and SSP scores are similar to those of the A1-A8 α-helix (Fig. 2, A and B; supplemental Tables S3 and S4). No segmental substructure was observed in control NMR studies of the isolated C-peptide. … Table S5).
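The secondary-shift reasoning above (helices give positive Δδ13Cα and negative Δδ13Cβ; strands the reverse; a "molten" helix gives attenuated values) can be sketched as a per-residue rule on the combined difference Δδ13Cα − Δδ13Cβ. This is a minimal illustration, not the analysis used in the study: the 1.5 ppm cutoff, the class and function names, and all shift values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class CShifts:
    ca: float  # 13C-alpha chemical shift, ppm
    cb: float  # 13C-beta chemical shift, ppm


def secondary(observed: CShifts, random_coil: CShifts) -> CShifts:
    """Secondary shifts: observed minus random-coil value for the same residue type."""
    return CShifts(observed.ca - random_coil.ca, observed.cb - random_coil.cb)


def classify(delta: CShifts, cutoff: float = 1.5) -> str:
    """Crude per-residue call from the combined difference dCa - dCb.

    Helices give dCa > 0 and dCb < 0 (combined value positive); strands give
    the opposite. The 1.5 ppm cutoff is an illustrative assumption.
    """
    combined = delta.ca - delta.cb
    if combined > cutoff:
        return "helix-like"
    if combined < -cutoff:
        return "strand-like"
    return "coil or attenuated"


# Hypothetical shifts for one residue type (placeholder numbers, not data).
rc = CShifts(ca=55.1, cb=42.3)
examples = {
    "canonical helix": CShifts(ca=57.9, cb=41.2),
    "molten helix": CShifts(ca=55.9, cb=42.0),
    "strand": CShifts(ca=53.0, cb=45.0),
}
for label, obs in examples.items():
    print(label, classify(secondary(obs, rc)))
```

Combining the two carbons cancels systematic referencing offsets that shift Cα and Cβ in the same direction, which is one reason such combined differences are commonly used.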
Residues B3, B8, B21, B27-B30, and A9 are less well ordered than the remainder of the insulin moiety. The V-shaped pattern of C-domain hetNOEs (Fig. 2C, black) is a signature of a flexible loop tethered at each end. Residues immediately adjoining the BC and CA junctions (including the dibasic cleavage sites) exhibit hetNOE values lower than those of neighboring residues in the insulin moiety, in accord with their motional narrowing and absence of non-local interproton contacts.
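The hetNOE analysis described above reduces to a ratio of peak intensities recorded with and without 1H saturation, followed by inspection of the per-residue profile. The sketch below shows that ratio with first-order error propagation and a simple test for the V-shaped profile of a loop tethered at both ends. The 0.6 flexibility cutoff, the noise tolerance, the function names, and the residue values are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def het_noe(i_sat: float, i_ref: float, sd_sat: float, sd_ref: float) -> Tuple[float, float]:
    """{1H}-15N hetNOE = I_sat / I_ref, with first-order error propagation.

    i_sat and i_ref are peak intensities with and without 1H saturation; both
    must be nonzero for the relative-error formula used here.
    """
    ratio = i_sat / i_ref
    err = abs(ratio) * math.sqrt((sd_sat / i_sat) ** 2 + (sd_ref / i_ref) ** 2)
    return ratio, err


def flexible(noes: Dict[str, float], cutoff: float = 0.6) -> List[str]:
    """Residues whose hetNOE falls below an (assumed) flexibility cutoff."""
    return [res for res, value in noes.items() if value < cutoff]


def is_v_shaped(values: List[float], tol: float = 0.05) -> bool:
    """True if values fall to an interior minimum and then rise again.

    tol allows small upticks or dips from measurement noise.
    """
    k = values.index(min(values))
    if k == 0 or k == len(values) - 1:
        return False
    falling = all(b <= a + tol for a, b in zip(values[:k], values[1:k + 1]))
    rising = all(b >= a - tol for a, b in zip(values[k:], values[k + 1:]))
    return falling and rising


# Hypothetical hetNOE profile across a tethered loop (not measured values).
loop = {"B28": 0.75, "C3": 0.45, "C10": 0.05, "C15": -0.20, "C20": -0.10,
        "C25": 0.15, "C33": 0.40, "A1": 0.72}
print(flexible(loop))
print(is_v_shaped(list(loop.values())))
```

Negative hetNOE values, as in the middle of this hypothetical loop, arise for residues whose N-H vectors move faster than overall tumbling; the error formula assumes independent noise in the two spectra.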
Structure-A DG/SA ensemble (Fig. 3A) was calculated based on 939 distance restraints and 97 dihedral angle restraints (supplemental Table S6). The number of restraints per residue in the insulin moiety (A1-A21 and B1-B28) and the nascent C-domain α-helix (C27-C31) was 14. There were no distance violations >0.3 Å and no dihedral angle violations >5°. r.m.s.d. values within the insulin moiety are 0.24 Å (main-chain atoms) and 0.89 Å (all heavy atoms); the C-domain α-helix exhibits segmental precisions of 0.47 Å (main-chain atoms) and 0.91 Å (all heavy atoms) (supplemental Table S6). The A- and B-domains closely resemble the cognate chains of crystallographic T-state protomers (supplemental Fig. S6 and supplemental Table S7) (4). The structure of the C-domain is not well defined. The position of its nascent α-helix appears uncorrelated with that of the insulin moiety (Fig. 3B). Flexible tethering of the C-domain helix to the insulin moiety is shown in schematic form in Fig. 3C (dashed lines); a representative ribbon model is shown in Fig. 3D.
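The violation criteria quoted above (no distance violations >0.3 Å, no dihedral violations >5°) can be checked per model with a few lines of code. This sketch assumes simple single-atom upper-bound distance restraints and dihedral restraints given as a target ± half-width; real NOE restraints often use pseudoatoms or r⁻⁶ sums, which are not modeled here. Atom labels and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def distance_violations(coords: Dict[str, Coord],
                        restraints: List[Tuple[str, str, float]],
                        tol: float = 0.3) -> List[Tuple[str, str, float]]:
    """Return (atom1, atom2, excess) for upper-bound restraints exceeded by > tol (Å).

    Simplification: single atoms only; real NOE restraints often involve
    pseudoatoms or r^-6 sums over ambiguous protons.
    """
    out = []
    for a, b, upper in restraints:
        excess = math.dist(coords[a], coords[b]) - upper
        if excess > tol:
            out.append((a, b, excess))
    return out


def angle_diff(a: float, b: float) -> float:
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def dihedral_violation(angle: float, target: float, half_width: float) -> float:
    """Degrees by which an angle lies outside target +/- half_width (0 if inside)."""
    return max(0.0, abs(angle_diff(angle, target)) - half_width)


# Hypothetical atoms and restraints, for illustration only.
coords = {"x1": (0.0, 0.0, 0.0), "x2": (0.0, 0.0, 5.5), "x3": (0.0, 0.0, 5.2)}
restraints = [("x1", "x2", 5.0), ("x1", "x3", 5.0)]
print(distance_violations(coords, restraints))
print(dihedral_violation(-170.0, 170.0, 10.0))
```

The angular wrap-around matters: −170° and 170° differ by 20°, not 340°, so a naive subtraction would report spurious dihedral violations.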
Figure 1 legend (fragment): In the endoplasmic reticulum (ER), the unfolded prohormone undergoes specific disulfide pairing to yield native proinsulin (middle). Cleavage of the BC and CA junctions by prohormone convertases SPC3 and SPC2 (PC1/3 and PC2) and carboxypeptidase E leads to mature insulin and the C-peptide in secretory granules (bottom). SRP and SRP-R designate the signal recognition particle and its receptor.
SPC3 Cleavage-To establish a model of prohormone processing, we investigated sequence determinants of cleavage by SPC3. Variant C-domains were introduced within DKP-proinsulin and tested as substrates (supplemental Table S8). Each variant retained native dibasic sites (Arg31-Arg32 … cleaves the BC junction with higher efficiency than the CA junction (9). An HPLC/mass spectrometry assay enabled independent assessment of BC and CA cleavage. Major determinants of substrate recognition by subtilisin/Kex-related proteases are mediated by residues N-terminal to the scissile bond (9). Surprisingly, however, deletion of residues C-terminal to the BC junction (residues C3-C6; sequence EAED) completely blocked BC cleavage, whereas CA cleavage was impaired by less than 2-fold. The deletion shifted uncharged residues (LQVG) to positions C3-C6. Impaired BC cleavage is unlikely to be due to foreshortening of the C-domain, as a more extensive deletion (ΔC8-C32; lacking residues 38-62) impaired cleavage at either junction by <2-fold, in accord with past studies (25). Deletions ΔC3 and ΔC3-C4 (leaving adjoining sequences AEDL and EDLQ) impaired BC cleavage by ~4-fold. By contrast, activity was regained with the successive deletion ΔC3-C5 (adjoining sequence DLQV). These results suggested that the P2′ residue (the second residue of each adjoining sequence; Ala in native EAED) contributes to SPC3 site selection. Indeed, a single acidic substitution at this position (AlaC4 → Glu; position 34) completely blocked BC cleavage. Ala substitutions at positions P1′, P3′, or P4′ (positions 33, 35, or 36) were, by contrast, well tolerated (supplemental Fig. S7A). AlaC4 may play a specific structural role in the SPC3-proinsulin complex that enhances substrate binding, as its substitution by Ser (in the context of the glicentin-derived sequence HAED; supplemental Table S8) likewise impairs BC cleavage (supplemental Fig. S7B). These findings suggest that the structural mechanism of prohormone processing has constrained C-domain sequences independently of the C-domain's role in nascent protein folding.
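The deletion logic above is easiest to follow with explicit position bookkeeping: C1 corresponds to residue 31, and the BC scissile bond lies after C2 (Arg32), so the four residues that follow it occupy P1′-P4′. The sketch below uses the standard human C-domain sequence (the DKP substitutions lie in the B-domain), which matches the fragments quoted in the text (EAED, LQVG, ALEGS); the helper names are hypothetical.

```python
# Human proinsulin C-domain, positions C1-C35 (residues 31-65): the BC dibasic
# site (RR), the 31-residue C-peptide, and the CA dibasic site (KR).
C_DOMAIN = "RR" + "EAEDLQVGQVELGGGPGAGSLQPLALEGSLQ" + "KR"
B_CHAIN_LENGTH = 30


def absolute_position(c_index: int) -> int:
    """Map C-domain position Cn to proinsulin residue number (C1 -> 31)."""
    return c_index + B_CHAIN_LENGTH


def delete(seq: str, first: int, last: int) -> str:
    """Delete C-domain positions first..last (1-based, inclusive)."""
    return seq[:first - 1] + seq[last:]


def substitute(seq: str, c_index: int, residue: str) -> str:
    """Replace the residue at C-domain position c_index (1-based)."""
    return seq[:c_index - 1] + residue + seq[c_index:]


def prime_sites(seq: str, n: int = 4) -> str:
    """Residues P1'..Pn' C-terminal to the BC scissile bond (after C2, Arg32)."""
    return seq[2:2 + n]


variants = {
    "native": C_DOMAIN,
    "dC3": delete(C_DOMAIN, 3, 3),
    "dC3-C4": delete(C_DOMAIN, 3, 4),
    "dC3-C5": delete(C_DOMAIN, 3, 5),
    "dC3-C6": delete(C_DOMAIN, 3, 6),
    "A34E": substitute(C_DOMAIN, 4, "E"),
}
for name, seq in variants.items():
    sites = prime_sites(seq)
    print(f"{name:8s} P1'-P4' = {sites}  P2' = {sites[1]}")
```

Printed side by side, the variants make the pattern explicit: the two deletions that place an acidic residue at P2′ (Glu in ΔC3, Asp in ΔC3-C4) are the ones reported to impair BC cleavage, whereas ΔC3-C5 places Leu there.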

DISCUSSION
The structure of an engineered proinsulin monomer, obtained by NMR methods, exhibits similarities to and differences from insulin. Alignment with a representative crystallographic insulin protomer (supplemental Fig. S6A) yielded average r.m.s.d. values of 1.04 Å (aligned atoms) and 1.78 Å (all heavy atoms). Although side-chain packing in the core is similar (supplemental Fig. S6B), salient differences occur in the B-domain β-turn (residues B20-B23), whose conformation may be influenced by lattice packing. Solution structures of DKP-proinsulin and DKP-insulin (26) are also similar. Following main-chain alignment (A1-A20 and B5-B28), r.m.s.d. values between mean structures are 1.06 Å (main chain) and 2.22 Å (all heavy atoms). Pronounced differences occur only in the A1-A8 segment (main-chain r.m.s.d. 1.41 Å) and its A1-A4 subsegment (main-chain r.m.s.d. 1.87 Å). This difference presumably reflects tethering of the C-domain.
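The comparison above aligns structures on one atom subset (main chain of A1-A20 and B5-B28) and then reports r.m.s.d. over a different subset (A1-A8 or A1-A4). A minimal sketch of that two-step procedure, using the Kabsch least-squares superposition on synthetic coordinates, is shown below; the study's actual software and atom selections are not specified here, and the function names are hypothetical.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray):
    """Rotation r and translation t minimizing |mobile @ r.T + t - target| (N x 3 arrays)."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mc).T @ (target - tc)                  # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0   # exclude reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = tc - mc @ r.T
    return r, t


def rmsd_after_fit(mobile, target, fit_idx, eval_idx) -> float:
    """Superimpose on atoms fit_idx, then report r.m.s.d. over atoms eval_idx."""
    r, t = kabsch(mobile[fit_idx], target[fit_idx])
    moved = mobile @ r.T + t
    diff = moved[eval_idx] - target[eval_idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Synthetic coordinates standing in for two structures (no real data).
rng = np.random.default_rng(0)
target = rng.normal(scale=5.0, size=(20, 3))
theta = np.deg2rad(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta), np.cos(theta), 0.0],
                 [0.0, 0.0, 1.0]])
mobile = target @ rot.T + np.array([1.0, -2.0, 3.0])
mobile[:5] += 1.0                 # displace a hypothetical 5-atom subsegment
core = np.arange(5, 20)           # alignment subset (analogous to a main-chain core)
segment = np.arange(5)            # evaluation subset (analogous to A1-A4)

rmsd_core = rmsd_after_fit(mobile, target, core, core)
rmsd_segment = rmsd_after_fit(mobile, target, core, segment)
print(f"core {rmsd_core:.3f} A, segment {rmsd_segment:.3f} A")
```

Because the displaced atoms are excluded from the fit, they do not distort the superposition; their full displacement (here √3 ≈ 1.73 Å per atom) appears in the segmental r.m.s.d., which is the effect a local difference such as that at A1-A4 produces.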
Dynamics of Proinsulin-{1H}-15N hetNOEs provide a sensitive probe of subnanosecond fluctuations. Such studies revealed a V-shaped pattern in the C-domain, indicating progressively greater flexibility with distance from its tethered ends. Although the isolated C-peptide is a random coil (supplemental Table S4), the intact C-domain contains a short α-helix within its C-terminal subsegment. The position of this helix is likely to be poorly correlated with that of the insulin moiety. Sites of flexibility within the insulin moiety detected by {1H}-15N hetNOEs are consistent with patterns of crystallographic B-factors (4). 13C NMR resonance assignment enabled calculation of residue-specific SSP scores (18). This score estimates at each residue the percentage of conformers in a folded state (α-helix or β-strand). Limiting SSP scores of 1 or −1 imply a fully formed α-helix or β-strand, respectively. In DKP-proinsulin, the mean segment-specific SSP scores are 18% (N-terminal A-domain segment), 77% (C-terminal A-domain α-helix), 100% (central B-domain α-helix), and 27% (nascent C-domain α-helix). The similar pattern in DKP-insulin (Fig. 2B) (12) implies that the C-domain does not "tighten" the secondary structure of the insulin moiety. Segmental SSP scores of the A1-A8 α-helix and the C-domain α-helix imply conformational fluctuations (leading to averaging of chemical shifts; Ref. 24) despite maintenance of helix-related NOEs. Because A1-A8 hetNOE values are unremarkable, the time scale of such fluctuations must be longer than the rotational correlation time of the protein, in accord with previous observations of conformational broadening on the millisecond time scale (26). Such motions may be further characterized by rotating-frame (R1ρ) spin relaxation spectroscopy (27).
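To make the idea of a bounded propensity score concrete, the sketch below maps combined secondary shifts (Δδ13Cα − Δδ13Cβ) onto the range −1 to 1, smooths them over a short window, and reports a segment mean as a percentage. It is a simplified stand-in, not the published SSP method (ref. 18), which uses its own weighting, nuclei, and normalization; the reference constants (3.5 and 3.7 ppm), the window size, the function names, and the example shifts are all assumptions.

```python
from typing import List, Sequence


def raw_scores(d_ca: Sequence[float], d_cb: Sequence[float],
               helix_ref: float = 3.5, strand_ref: float = 3.7) -> List[float]:
    """Per-residue score in [-1, 1] from the combined secondary shift dCa - dCb.

    Positive values are scaled by helix_ref and negative values by strand_ref,
    the combined shifts assumed here for a fully formed helix or strand
    (placeholder constants, not the calibration of the published SSP method).
    """
    scores = []
    for a, b in zip(d_ca, d_cb):
        combined = a - b
        ref = helix_ref if combined >= 0 else strand_ref
        scores.append(max(-1.0, min(1.0, combined / ref)))
    return scores


def smooth(values: Sequence[float], window: int = 5) -> List[float]:
    """Centered running mean; the window shrinks at the chain ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        seg = values[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def segment_percent(scores: Sequence[float], start: int, stop: int) -> float:
    """Mean score over scores[start:stop], expressed as a percentage."""
    seg = scores[start:stop]
    return 100.0 * sum(seg) / len(seg)


# Hypothetical secondary shifts (ppm): a fully formed helix and an attenuated one.
full = smooth(raw_scores([3.1] * 8, [-0.4] * 8))
molten = smooth(raw_scores([0.5] * 8, [-0.2] * 8))
print(f"{segment_percent(full, 0, 8):.0f}% vs {segment_percent(molten, 0, 8):.0f}%")
```

In this toy example, a segment that retains helix-like NOEs but shows shifts averaged toward random coil scores about 20%, of the same order as the attenuated A1-A8 and C-domain segments, while a fully formed helix scores 100%.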
Biological Implications-Conversion of proinsulin to insulin occurs in the course of trafficking through the trans-Golgi network (1). C-peptide excision requires cleavage at Arg31-Arg32 (positions C1 and C2; the BC junction) and Lys64-Arg65 (positions C34 and C35; the CA junction). These cleavages are effected by prohormone convertases SPC3 (PC1/PC3) and SPC2 (PC2), members of a family of calcium-dependent endoproteases that process a wide variety of precursor proteins (9). The active sites of such enzymes accommodate extended and flexible peptide substrates, as observed at the BC junction. Because excision of the C-domain occurs after folding of proinsulin, however, formation of enzyme-substrate complexes may require a conformational change at the CA junction to expose an extended peptide conformation (28). The attenuated SSP profile of the A1-A8 segment, evidence of a molten conformation, implies that kinetic and thermodynamic barriers to such distortion may be low. The dibasic residues at the CA junction are themselves flexible. The nascent C-domain α-helix would not in itself be expected to interfere with formation of an enzyme-substrate complex.
The molten character of the A1-A8 α-helix may reflect the anomalous presence of three β-branched residues (sequence GIVEQCCT; β-branched residues IleA2, ValA3, and ThrA8). IleA2, ordinarily buried in the core, must be exposed for both prohormone processing and binding of the mature hormone to the insulin receptor (29,30). Flexibility of the N-terminal A-domain segment may also facilitate its disulfide-coupled folding in the β-cell by reducing kinetic barriers to pairing of cysteines A7-B7 and A6-A11. Oxidative folding of proinsulin proceeds through a defined series of intermediates (31). Pairwise substitution of cysteines A6-A11 by Ala or Ser leads to segmental unfolding of the A1-A11 segment (32), and more marked flexibility occurs on pairwise substitution of cysteines A7-B7 (33). Flexibility of the A1-A8 segment in protein-folding intermediates may facilitate transient deprotonation of thiols to thiolates, their alignment for disulfide pairing, and interactions with protein disulfide isomerase (34). In the future, probes of protein dynamics (such as {1H}-15N hetNOEs and 13C CSI and SSP scores) may provide insight into the biophysical properties of populated disulfide intermediates (32) and perturbations caused by diabetes-associated mutations (10,11).
Proteotoxicity is a general mechanism of disease, encompassing processes that range from aberrant intracellular aggregation of folding intermediates to formation of extracellular fibrils. Such pathological processes are exemplified by the toxic misfolding of insulin and proinsulin (31). Owing to the presence of the C-domain, proinsulin is markedly less susceptible to fibrillation than is insulin (35). We suggest that, despite its flexibility, the C-domain sequence has evolved to minimize its propensity to fold or misfold. The extreme resistance of the isolated C-peptide to fibrillation (35), a process otherwise regarded as a generic property of polypeptide chains, highlights the special nature of such a nonfolding sequence.
Concluding Remarks-The present study has demonstrated the utility of heteronuclear NMR spectroscopy in combination with protein engineering to investigate a protein refractory to crystallization. DKP-proinsulin provides a tractable model for both structural analysis and studies of prohormone processing.
Figure legend (fragment): Asterisk indicates two α-helices that exhibit conformational fluctuations, as inferred from attenuation of 13C CSI and SSP scores (Fig. 2, A and B).
Although the organized insulin moiety foreshadows the structure of the mature hormone, its flexible tethering by the C-domain facilitates nascent folding. The further characterization of variant proinsulins may enable molecular rules governing dibasic target site selection to be deciphered. The solution structure of DKP-proinsulin thus provides a foundation for analysis of prohormone processing and comparative studies of non-foldable variants associated with neonatal diabetes mellitus.