Structural Analysis of Proinsulin Hexamer Assembly by Hydroxyl Radical Footprinting and Computational Modeling*

Background: Proinsulin, an intermediate in insulin biosynthesis, is refractory to crystallization. Results: Synchrotron-based hydroxyl radical footprints of proinsulin were obtained in relation to classical structures of insulin. Conclusion: Molecular models based on footprinting provide evidence for native self-assembly and an ensemble of C-domain orientations. Significance: Footprinting promises to enable analysis of toxic aggregation of clinical proinsulin variants in neonatal diabetes mellitus. Mutations in the insulin gene can impair proinsulin folding and cause diabetes mellitus. Although crystal structures of insulin dimers and hexamers are well established, proinsulin is refractory to crystallization. Although an NMR structure of an engineered proinsulin monomer has been reported, structures of the wild-type monomer and hexamer remain undetermined. We have utilized hydroxyl radical footprinting and molecular modeling to characterize these structures. Differences between the footprints of insulin and proinsulin, defining a “shadow” of the connecting (C) domain, were employed to refine the model. Our results demonstrate that in its monomeric form, (i) proinsulin contains a native-like insulin moiety and (ii) the C-domain footprint resides within an adjoining segment (residues B23–B29) that is accessible to modification in insulin but not proinsulin. Corresponding oxidation rates were observed within core insulin moieties of insulin and proinsulin hexamers, suggesting that the proinsulin hexamer retains an A/B structure similar to that of insulin. Further similarities in rates of oxidation between the respective C-domains of proinsulin monomers and hexamers suggest that this loop in each case flexibly projects from an outer surface. Although dimerization or hexamer assembly would not be impaired, an ensemble of predicted C-domain positions would block hexamer-hexamer stacking as visualized in classical crystal lattices. 
We anticipate that protein footprinting in combination with modeling, as illustrated here, will enable comparative studies of diabetes-associated mutant proinsulins and their aberrant modes of aggregation.

Insulin, central to the hormonal control of metabolism, has long provided a model for the development of biophysical techniques (1). Renewed interest in the folding of insulin and its biosynthetic precursor (proinsulin) has been stimulated by the discovery of clinical mutations associated with permanent neonatal-onset diabetes mellitus (DM)² (2-4). Although such patients are heterozygous, the remaining wild-type insulin allele fails to enable metabolic homeostasis. A variety of evidence suggests that such DM-associated mutations impair the nascent folding of the variant proinsulin and block in trans the biosynthesis of the wild-type hormone. Disulfide-coupled protein misfolding disrupts trafficking from the endoplasmic reticulum, leading to β-cell dysfunction and eventual cell death (5). This syndrome (which accounts for almost half of the cases of DM presenting within the first year of life) has motivated structural studies of proinsulin. Although an extensive crystallographic database of insulin structures has been obtained over the past four decades (1), proinsulin has proven refractory to crystallization.³ We therefore sought to establish an alternative approach to the biophysical characterization of proinsulin. The present study thus employed mass spectrometry (6) in combination with synchrotron-based hydroxyl radical modification (7,8).
Insulin is a globular protein containing two chains, designated A (21 residues) and B (30 residues). The hormone is the product of a single-chain precursor, proinsulin (86 residues), in which a connecting domain (C; 35 residues) links the C terminus of the B-domain to the N terminus of the A-domain. Excision of the C-domain by specific proteases (prohormone convertases) occurs in post-Golgi vesicles; the mature hormone is stored within secretory granules of pancreatic β-cells as zinc-stabilized hexamers (9). A recent NMR structure of an engineered proinsulin monomer (designated DKP-proinsulin (10))⁴ provided evidence that the precursor consists of a folded insulin-like moiety with a disordered C-domain projecting from one surface. Although the range of positions accessible to the C-domain was not well defined in the NMR structure (10),⁵ the results are in accordance with an analogy between the C-domain and the hairs on the head of Medusa (the head being formed by the A/B-domains (11)). We hypothesized that the ensemble of C-domain conformations is not truly random but instead populates an envelope of accessible conformations along a well organized insulin surface.
To test this hypothesis, we sought to define the "shadow" of the connecting domain as projected on the surface of the insulin moiety by MS-based hydroxyl radical footprinting (7,8,12). In this approach, hydroxyl radicals, generated by radiolysis of water, oxidize solvent-accessible and reactive amino acid side chains on the surface of proteins (8,12). Changes in the accessibility of multiple side chains (as probed by increases or decreases in their respective rates of oxidation as a function of ligand binding, protein assembly, or conformational change) provide powerful tools for the analysis of protein structures. We thus anticipated that sites of rapid modification on the surface of insulin, if by contrast protected in proinsulin, would define a C-domain footprint. An advantage of such an approach would be its broad applicability irrespective of oligomeric state. Unlike x-ray crystallography or conventional NMR spectroscopy, MS-based footprinting would in principle extend to non-native modes of protein aggregation characteristic of diverse diseases of protein misfolding.
In this study, we present the MS hydroxyl radical footprinting of human insulin (HI) and proinsulin (HPI) under two conditions: monomer and hexamer. Our analysis utilized classical crystal structures as a foundation for analysis of the additional C-domain; our results enabled a model of the HPI hexamer to be refined. In combination with the recent NMR study of DKP-proinsulin (10), a coherent dynamic model of wild-type HPI has been obtained as a monomer and hexamer. We anticipate that these studies will provide a foundation for comparative studies of disease-associated mutant proinsulins (13,14).

EXPERIMENTAL PROCEDURES
Protein Preparation-Solutions of HI and HPI were prepared as T-like zinc-free monomers as described (15). Solutions of HI and HPI hexamers were prepared as T₆ zinc-stabilized hexamers according to a published protocol (1). Footprinting was undertaken at protein concentrations of 5 μM in 10 mM sodium cacodylate (pH 7.4).
Radiolysis-Each sample (volume 5 μl) was exposed to the X-28C x-ray white beam at the National Synchrotron Light Source, Brookhaven National Laboratory for 0, 8, 15, and 20 ms at ambient temperature. Exposure times were controlled by using an electronic shutter (Vincent Associates, Rochester, NY). Experiments were performed at ring energy 2.8 GeV with beam currents ranging between 195 and 225 mA. X-ray beam parameters were optimized using an Alexa Fluor 488 fluorophore assay as described (16). To quench Met oxidation unrelated to primary hydroxyl radical attack, a buffer consisting of 10 mM Met-NH₂·HCl (pH 7.0) was added immediately after irradiation.
Proteolysis and MS Analysis-After exposure, proteins were reduced, alkylated, and subjected to proteolysis by modified trypsin (Promega) at an enzyme-to-protein ratio of 1:20 (w/w) at 37 °C overnight. The digestion reaction was terminated by freezing. The digests (~1 pmol) were loaded onto a PepMap reverse-phase trapping column (300 μm × 5 mm, C18) to preconcentrate the peptides and wash away excess salts using a nano-HPLC UltiMate-3000 (Dionex) column-switching technique; reverse-phase separation was then performed on a C18 PepMap column (75 μm × 15 cm). Buffer A (100% water and 0.1% formic acid) and buffer B (20% water, 80% acetonitrile, and 0.1% formic acid) were employed in a linear gradient. Proteolytic peptides eluting from the column with an acetonitrile gradient (2% per min) were directed to an LTQ-FT mass spectrometer (Thermo Fisher Scientific) equipped with a nanospray ion source at a needle voltage of 2.4 kV. MS and tandem MS (MS/MS) spectra were acquired in data-dependent experiments in the positive-ion mode with the following acquisition cycle: a full scan recorded in the Fourier transform analyzer at resolution (R) 100,000 followed by MS/MS of the eight most intense peptide ions in the linear trap quadrupole (LTQ) analyzer. Dose-response curves were obtained by plotting the fraction unmodified for each peptide as a function of exposure time. Detected ion currents were utilized to determine the extent of oxidation by separate quantitation of the unmodified proteolytic peptides and their radiolytic products. Sites of oxidation were determined from MS/MS data. MS/MS spectra of the peptide mixtures were searched against the human database for modifications (oxidation) of the tryptic peptides from HI using the Mascot search engine (Matrix Science Co.).
Interpretation of MS/MS mass spectra of modified peptides was manually verified and correlated with hypothetical MS/MS spectra predicted for proteolysis products.
Calculation of Modification Rates-Integrated areas for each unmodified and modified peptide ion were calculated from selected ion current chromatograms. All peak areas obtained for modified species with single or multiple modifications in the same peptide were added into the sum total modified. The extent of modification was then expressed as the fraction unmodified, i.e. the ratio of the integrated peak area under the ion signals for the unmodified peptide to the sum of those for the unmodified peptide and its radiolytic products (sum total modified). Background modification seen for some peptides in the unexposed sample was subtracted from the totals. The fraction of unmodified peptide was fit to the equation $Y = Y_0 e^{-kt}$ using Origin 6.0 (MicroCal Software, Inc., Northampton, MA), where $Y$ and $Y_0$ are the fractions of unmodified peptide at time $t$ and 0 s, respectively, and $k$ is a first-order rate constant. Dose-response curves were generated by plotting the fraction unmodified for each peptide as a function of x-ray exposure time.
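The rate calculation described above can be illustrated with a minimal sketch. It assumes NumPy and replaces the nonlinear fit performed in Origin with a linear regression on the logarithm of the fraction unmodified; the function names, peak areas, and rate constant below are hypothetical and are not taken from the study. Only the exposure times (0, 8, 15, and 20 ms) follow the protocol.

```python
import numpy as np


def fraction_unmodified(unmodified_area, modified_areas, background=0.0):
    """Fraction of a peptide left unmodified, from integrated peak areas.

    `background` is the fraction modified seen in the unexposed sample
    (hypothetical parameter name); it is subtracted before conversion.
    """
    total_modified = float(np.sum(modified_areas))
    frac_modified = total_modified / (unmodified_area + total_modified)
    frac_modified = max(frac_modified - background, 0.0)
    return 1.0 - frac_modified


def fit_first_order(times_s, frac_unmod):
    """Fit Y = Y0 * exp(-k t) by least squares on ln(Y); returns (k, Y0)."""
    t = np.asarray(times_s, dtype=float)
    y = np.asarray(frac_unmod, dtype=float)
    slope, intercept = np.polyfit(t, np.log(y), 1)
    return -slope, float(np.exp(intercept))


# Exposure times from the protocol, converted to seconds.
times_s = np.array([0.0, 0.008, 0.015, 0.020])

# Synthetic, noise-free dose-response data (hypothetical k = 12 s^-1).
true_k = 12.0
frac = np.exp(-true_k * times_s)

k, y0 = fit_first_order(times_s, frac)
print(f"k = {k:.2f} s^-1, Y0 = {y0:.3f}")
```

A log-linear fit weights the data points differently from a direct nonlinear fit, so rate constants obtained this way may differ slightly from those reported when the data are noisy.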
Modeling of HPI-Cartesian coordinates of both insulin and DKP-proinsulin were retrieved from the Protein Data Bank (17). The proposed hexameric configuration of HPI was constructed based on the crystal structure of the T₆ HI hexamer (1.5 Å resolution; Protein Data Bank (PDB) entry 4INS) (1). The solution structure of DKP-proinsulin has been solved by NMR spectroscopy (PDB entry 2KQP) (10). Mathematical averaging of the coordinates of the 20 NMR-derived models of proinsulin was performed; the average geometry was relaxed by energy minimization. Sites of mutation in DKP-proinsulin (residues B10, B28, and B29)⁶ were reverted to wild type using MODELLER (18). In this procedure, the positions were sequentially substituted with concomitant local conformational optimization of the mutant side chain. Optimization was achieved using a combination of energy minimization employing a conjugate gradient algorithm and simulated annealing with molecular dynamics. Rotation and translation matrices were computed to align this average structure of HPI to each chain of the HI hexamer. LSQMAN (19) software was used to perform brute force alignment of the HPI protomer with each of the six chains of the HI hexamer; for these calculations, the minimum length of fragment to match was 30 with a step size of 15.
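The averaging and superposition steps can be sketched as follows. This is an illustration only: it uses the Kabsch algorithm in NumPy rather than the brute force fragment matching of LSQMAN, and it operates on synthetic coordinates rather than on PDB entries 2KQP or 4INS. All function and variable names are hypothetical.

```python
import numpy as np


def kabsch(mobile, target):
    """Rotation R and translation t such that mobile @ R.T + t best fits target."""
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    p = mobile - mobile_center
    q = target - target_center
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_center - rotation @ mobile_center
    return rotation, translation


def rmsd(a, b):
    """Root mean square deviation between two matched coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


rng = np.random.default_rng(0)

# Synthetic ensemble standing in for 20 NMR models (30 atoms each).
reference = rng.normal(size=(30, 3))
models = reference + rng.normal(scale=0.05, size=(20, 30, 3))
average = models.mean(axis=0)  # coordinate averaging over the ensemble

# Synthetic "hexamer chain": the average structure rotated 40 degrees
# about z and shifted, so the correct answer is known in advance.
theta = np.deg2rad(40.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
shift = np.array([5.0, -3.0, 2.0])
chain = average @ rot_z.T + shift

rotation, translation = kabsch(average, chain)
aligned = average @ rotation.T + translation
print(f"RMSD after superposition: {rmsd(aligned, chain):.2e} Å")
```

In practice the superposition would be repeated for each of the six chains of the hexamer, using only matched atoms of the shared A- and B-domains.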
Side-chain Solvent Accessibility-To verify the correlation of side-chain reactivity and solvent accessibility (SA) for modified residues in the HI monomer and hexamer, SA surface areas of side chains were calculated in Å² using the VADAR computer program (PENCE, University of Alberta, Edmonton, Canada). The crystal structure of the porcine insulin hexamer in the T₆ form (PDB entry 4INS) and the NMR structure of DKP-proinsulin (PDB entry 2KQP) were utilized for this analysis. Models of HI monomers and dimers were extracted from protomers in the crystallographic hexamer. Similarly, the SA of the modeled HPI hexamer was also calculated.

RESULTS AND DISCUSSION
Radiolytic Footprinting Strategy-Samples of HI and HPI as T-like monomers (15,20) and T₆ Zn²⁺-stabilized hexamers (1) were exposed to a focused x-ray beam and rapidly quenched (see above). On the time scale of irradiation, the primary products were oxidative modifications of side chains of reactive amino acids without protein cross-linking or cleavage (21). Following digestion with trypsin, the four peptide products of HI and six peptide products of HPI (covering 99% of their respective sequences) were analyzed by LC-MS. Tandem MS analysis was employed to confirm the sequence identities and define specific sites of oxidative modification.
The extent of side-chain modifications was evaluated as a function of exposure time to yield first-order rate constants (7,12). Such rate constants typically correlate with measures of SA of amino acid side chains within a protein structure (8,12). The pattern of modification (i.e. increased accessibility of some protein regions and protection of others) is thus relevant to potential structures and conformational rearrangements.
Consistency of Footprinting and Crystallography-We first analyzed HI by hydroxyl radical footprinting to demonstrate its consistency with available crystallographic data; a model hexamer was provided by PDB entry 4INS. Four tryptic peptides and their oxidative products, as derived from the HI monomer and hexamer, were analyzed quantitatively by MS-coupled nano-HPLC; 16 independent MS experiments (two sets for each sample) were performed. The four HI-derived peptides from the hexamer each exhibited significant decreases in oxidation rates relative to the monomer (Table 1). The largest changes from monomer to hexamer were observed within peptide B11-B22: in particular, residues TyrB16, LeuB17, and ArgB22 exhibited a 4.8-fold decrease in oxidation rate. These observations suggest that the latter reactive side chains are protected in the HI hexamer relative to the HI monomer (8,12,22,23) in accordance with the burial of this surface in the dimer interface (5-14-fold). Residue PheB1 (within peptide B1-B10) experienced the next largest decrease in oxidation rate on hexamer assembly (3.5-fold), whereas the rate constants for the side chains of HisB5, CysB7, and HisB10 within the same peptide dropped by 2.0-fold. The crystal structure of the HI hexamer revealed a large decrease in SA for residue PheB1 (27-fold), whereas HisB10 showed a 2.4-fold decrease. The side chains of PheB24, PheB25, and TyrB26 within segment B23-B29 each showed decreases in oxidation by 1.8-fold in the HI hexamer; the ProB28 side chain within the same peptide exhibited a drop in modification rate by 2.4-fold. This segment participates in dimerization; the SAs of TyrB26 and PheB24 were reduced by 9-fold and to 0 Å², respectively, and the ProB28 SA dropped by 5-fold in the HI hexamer relative to the HI monomer.
Footprinting analysis of the monomeric and hexameric forms of HI further showed a modest decrease in modification rates (1.6-fold) for residues CysA7, ThrA8, IleA10, LeuA13, TyrA14, and TyrA19 in the A1-A21 segment. Although the SAs of the other oxidized side chains within peptide A1-A21 exhibited no changes upon hexamer formation, the SA of the LeuA13 side chain (as calculated from the crystal structure) decreased from 44 Å² (monomer) to 0.7 Å² (hexamer). In striking accord, residue LeuA13 of the A-chain was susceptible to oxidation only in the monomeric form; its modification was not detected in the hexamer. In the crystal structure of the insulin hexamer, only LeuA13 became buried within the A1-A21 region.
Control footprints of HI as a zinc-free monomer and T 6 zinc hexamer are thus consistent with crystallographic data. Such footprinting reliably probes the changes in side-chain accessibility on the formation of interfaces. To the extent that discrepancies were observed in the quantitative extent of change on hexamer assembly (i.e. wherein changes in rate are attenuated relative to calculated decreases in surface area), such effects may reflect conformational fluctuations in solution that are dampened in a crystal lattice.
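The attenuation noted above can be made explicit with a short tabulation. The sketch below uses only the fold decreases on HI hexamer assembly quoted in the text for four residues; the ratio it computes (rate fold decrease divided by SA fold decrease) is an illustrative measure introduced here and is not a quantity defined in the study.

```python
# Fold decreases on HI hexamer assembly, as quoted in the text:
# residue -> (fold decrease in oxidation rate, fold decrease in calculated SA)
fold_decreases = {
    "PheB1": (3.5, 27.0),
    "HisB10": (2.0, 2.4),
    "TyrB26": (1.8, 9.0),
    "ProB28": (2.4, 5.0),
}


def attenuation(rate_fold, sa_fold):
    """Illustrative ratio; values below 1 mean that the change in rate is
    smaller than the change in surface area calculated from the crystal."""
    return rate_fold / sa_fold


ratios = {res: attenuation(*folds) for res, folds in fold_decreases.items()}

for residue, ratio in ratios.items():
    rate_fold, sa_fold = fold_decreases[residue]
    print(f"{residue}: rate {rate_fold}-fold, SA {sa_fold}-fold, ratio {ratio:.2f}")
```

For each of these residues the ratio falls below 1, in keeping with the suggestion that conformational fluctuations in solution are dampened in the crystal lattice.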
Footprint of C-domain-Oxidized residues in monomeric HPI are depicted in Fig. 1. Rates of oxidation were calculated for all HPI peptides as shown in Table 1. SA values calculated from the average of 20 NMR models of DKP-proinsulin and from a computational model of the HPI hexamer are also shown in Table 1. A computational model for an HPI monomer was also constructed using the average of 20 NMR structures of DKP-proinsulin (see "Experimental Procedures"). Side chains of PheB1, HisB5, CysB7, and HisB10 within HPI peptide B1-B10 exhibited insignificant decreases in oxidation rates (22%) relative to the corresponding insulin-derived peptide. Peptide B11-B22 of HPI similarly exhibited no changes in oxidation (within the experimental error) at sites TyrB16, LeuB17, and ArgB22. A similar correspondence of oxidation rates between HI and HPI monomers was observed at residues CysA7, ThrA8, IleA10, LeuA13, TyrA14, and TyrA19. These footprinting data were consistent with the SA values derived from the T₆ crystal structure of HI, the solution structure of DKP-proinsulin, and a modeled structure of wild-type HPI.

⁶ Residues 1-30 in HPI correspond to B-chain residues B1-B30 in HI; similarly, residues 66-86 correspond to A-chain residues A1-A21. Residues 31-65 (C-domain) may also be designated C1-C35.
Although the core insulin moieties of HI and HPI exhibit similar footprints as zinc-free monomers, significant changes in the oxidation rates were observed within peptide B23-B29 (i.e. near the C terminus of the B-chain in HI and near the BC junction of HPI). In particular, rates of oxidation of the side chains of PheB24, PheB25, and TyrB26 were attenuated by ~1.5-fold relative to HI; oxidation of ProB28 was decreased by 2-fold. Because of the correlation between SA and reaction rates (8,12,22,23), these findings suggest that reactive side chains in peptide B23-B29 are more buried (i.e. experience the greatest decrease in SA) in HPI than in HI. These overall results are consistent with the NMR structure of DKP-proinsulin as an engineered monomer (10). In the latter, as in our model wild-type structure, the presence of the C-domain is associated with a decrease in the SA of the side chain of TyrB26 (by 2.3- and 2.7-fold, respectively).

TABLE 1. Peptide-specific rate constants in hydroxyl radical footprints
Rate constants for the oxidation of peptides from HI and HPI are shown as monomers and hexamers. Column 1 indicates the position and sequence of observed peptides; column 2 indicates observed oxidized residues. Columns 3 and 6 indicate SA (Å²) of the oxidized residues, as determined by crystallographic data for the HI monomer and hexamer, respectively. Columns 7 and 10 indicate SA (Å²) of the oxidized residues, as determined from NMR data for DKP-proinsulin and from computational modeling of the HPI hexamer, respectively. Columns 4, 5, 8, and 9 represent respective rate constants for the oxidation of HI monomer, HI hexamer, wild-type HPI monomer, and HPI hexamer. Amino acid residues are represented with single-letter codes.

DECEMBER 23, 2011 • VOLUME 286 • NUMBER 51

Structural Biology of Proinsulin Hexamer Assembly
Local differences at sites of substitution in DKP-proinsulin might occur relative to the structure of wild-type HPI in the monomeric state. Indeed, whereas the SA of ProB29 in the NMR structure was increased by 2.5-fold relative to a crystallographic protomer of HI, the observed oxidation rate for ProB28 in wild-type HPI was decreased by 2-fold relative to its oxidation rate in the HI monomer. These footprinting data thus suggest that the orientation of ProB29 in DKP-proinsulin differs from that of ProB28 in the wild-type prohormone. The decreased B28 SA value calculated from our model structure was consistent with the protein footprinting observations. These observations suggest the utility of MS-based footprinting in assessing engineered models and, in particular, in testing the appropriateness of modeling at sites of mutational differences. Because Pro can alter main-chain trajectories, its positioning relative to the BC junction (i.e. at position B28 or B29) might be associated with changes in C-domain trajectories.
Evidence for the flexibility of the C-domain in the HPI monomer is provided by the large oxidation rate of peptides C2-C34 and C3-C34 (exceeded only by peptide A1-A21). Oxidation of 26% of the amino acid side chains in the C-domain (LeuC7, ValC9, GluC13, LeuC14, ProC18, LeuC23, LeuC26, LeuC28, and LeuC32) was detected; these probes are evenly distributed in the C-domain sequence. Our footprinting results thus suggest that C-domain residues exhibit a uniformly high solvent exposure. The slight difference in the oxidation rates between peptides C2-C34 and C3-C34 presumably resulted from minor differences in the efficiency of ionization.
Footprinting studies of HPI and HI as zinc-free monomers thus support a structural model in which the insulin moiety of HPI is similar to that of HI. The surface of peptides B23-B29 is accessible to hydroxyl radical modification in HI but protected in HPI; this difference defines the footprint of the flexible C-domain on the B-domain of HPI. No such shadow was detected on the A-domain surface.
Assembly of HPI Hexamer-A model for a T₆-like zinc-stabilized HPI hexamer was constructed based on the classical T₆ insulin hexamer using NMR constraints obtained in studies of DKP-insulin (see "Experimental Procedures") as depicted in Fig. 2. MS-based footprinting of the HPI hexamer was undertaken in relation to both the HPI monomer and the HI hexamer; rates of oxidation are shown in Table 1. We observed that respective oxidation rates decreased for the side chain of PheB1 by 1.9-fold and for the side chains of HisB5, CysB7, and HisB10 by 1.5-fold within the B1-B10 segment. These data indicate that PheB1 exhibited the largest change in SA within this peptide. These findings are consistent with the SA values of these side chains (as calculated from the crystal structure of the HI hexamer and our modeled structure of the HPI hexamer), which also showed the largest drop in SA for PheB1 in this segment (Table 1). In the HI hexamer, however, respective modification rates were decreased for the side chain of B1 by 3.5-fold and for the side chains of B5, B7, and B10 by 2-fold. We thus observed greater relative N-terminal protection upon hexamer assembly of HI than of HPI.
FIGURE 1. The modeled structure of wild-type proinsulin depicting the amino acid residues undergoing modification in protein footprinting experiments is shown. The A-chain is in magenta, and the C-chain is in green. The three peptide fragments in the B-chain are rendered in different shades of blue.

FIGURE 2. Molecular modeling of the proinsulin hexamer. The stereo view (wall-eyed representation) of the HPI hexamer is displayed with individual chains shown in different colors. The model was reconstructed based on the crystal coordinates of the insulin hexamer. The model shows that the C-domain in each protomer is fully exposed, in accord with experimentally observed radiolytic modifications of various amino acid residues in this region.

Peptide B11-B22 of the HPI hexamer exhibited decreased oxidation by 4.3-fold relative to the HPI monomer. The side chains of TyrB16, LeuB17, and ArgB22 (corresponding to the central α-helix and succeeding β-turn of the B-domain) were identified as sites of oxidation. The SA of TyrB16 and LeuB17 showed a similar decrease upon hexamer assembly of both HI and HPI in accordance with our footprinting data. In either HPI or HI,