Prediction of amyloid fibril-forming proteins.

In Alzheimer's disease and spongiform encephalopathies, proteins transform from their native states into fibrils. We find that several amyloid-forming proteins harbor an α-helix in a polypeptide segment that should form a β-strand according to secondary structure predictions. In 1324 nonredundant protein structures, 37 β-strands with ≥7 residues were predicted in segments where the experimentally determined structures show helices. These discordances include the prion protein (helix 2, positions 179-191), the Alzheimer amyloid β-peptide (Aβ, positions 16-23), and lung surfactant protein C (SP-C, positions 12-27). In addition, human coagulation factor XIII (positions 258-266), triacylglycerol lipase from Candida antarctica (positions 256-266), and D-alanyl-D-alanine transpeptidase from Streptomyces R61 (positions 92-106) contain a discordant helix. These proteins have not been reported to form fibrils, but in this study they were found to form fibrils in buffered saline at pH 7.4. When the valines in the discordant helical part of SP-C are replaced with leucines, an α-helix is both found experimentally and predicted by secondary structure prediction. This analogue does not form fibrils under conditions where SP-C forms abundant fibrils. Likewise, when Aβ residues 14-23 are removed or changed to a nondiscordant sequence, fibrils are no longer formed. We propose that α-helix/β-strand-discordant stretches are associated with amyloid fibril formation.

Amyloid fibrils can be formed from different proteins and are associated with severe diseases such as the neurodegenerative Alzheimer's disease and the prion diseases Creutzfeldt-Jakob disease in humans, scrapie in sheep, and bovine spongiform encephalopathy, as well as other organ-specific and systemic amyloidoses (1). The ~20 proteins that are associated with amyloid diseases have no obvious common properties in amino acid sequence, three-dimensional structure, or function (2). Despite these differences in native structures, the amyloid fibrils are very similar irrespective of the protein from which they originate (3). Amyloid fibrils are built up from β-strands perpendicular and β-sheets parallel to the fiber axis (4). Thus, amyloid-forming proteins that mainly contain α-helical structures in their native states must undergo α-helix → β-strand conversions before or during fibril formation. Tinctorial and spectroscopic studies indicate that the conversion of the cellular form of the prion protein (PrPC) to its fibrillar scrapie counterpart (PrPSc) is accompanied by a reduction in α-helix content and an increase in β-sheet structure (5). Amyloid β-peptide (Aβ) fibril formation associated with Alzheimer's disease also involves α-helix to β-strand conversion (6). Amyloid diseases mostly occur without known precipitating factors (7), and no explanation has yet been found for the occurrence of de novo α → β conversion in amyloid-forming proteins. This conversion may occur in partially denaturing cellular environments (8), which, however, does not explain why only certain proteins form amyloid. Destabilizing point mutations can cause fibril formation of an otherwise stable protein (9), but point mutations related to inherited forms of human prion diseases do not induce PrPSc in vitro and are not generally destabilizing (10). The Aβ-(1-42)-peptide is highly fibrillogenic, whereas peptides lacking residues 14-23 are not (11).
Proteins that give rise to amyloid may harbor polypeptide segments that make them more prone to undergo α → β conversions than nonamyloidogenic proteins. A key question is whether specific helices of these proteins are predisposed to undergo α → β conversions.
Lung surfactant protein C (SP-C) is a 35-residue lipopeptide derived from a larger precursor by proteolysis. SP-C isolated from natural sources is composed of an α-helix covering positions 9-34 (12). Monomeric α-helical SP-C is thermodynamically unstable, and the peptide irreversibly forms aggregates with β-sheet structure (13). SP-C assembles into amyloid fibrils upon incubation in solution, and fibrils composed of SP-C were isolated from a patient with pulmonary alveolar proteinosis (14). The α-helix of mature SP-C contains a very long continuous stretch of valine residues, which is unusual since valines are well known to be overrepresented in β-strands and underrepresented in helices. Intriguingly, synthetic SP-C peptides are inefficient in helix formation but form insoluble aggregates (15). In contrast, peptides in which the polyvaline segment of SP-C was replaced with a helical part of bacteriorhodopsin or a polyleucine stretch, both with high statistical helical propensities, readily form helices (15,16). These features suggest that α-helices for which β-strands are predicted may be prone to undergo α-helix → β-strand transition and amyloid formation (14). Here we tested this hypothesis by searching for discrepancies between experimentally determined α-helices and predicted extended (β-strand) structures in 1324 nonredundant entries in the Protein Data Bank. This revealed a striking correlation between α/β discordance and ability to form amyloid fibrils.

EXPERIMENTAL PROCEDURES
Protein Data Set-The May 1999 list of PDB_SELECT (17) was initially used to obtain a nonhomologous set of proteins from the Brookhaven data base (18). This list consists of 1106 sequences with less than 25% residue identity in pairwise comparisons. During the project a new version of PDB_SELECT was released (November 1999 set), and for completeness, proteins that were nonoverlapping with the May list were added to the data set. Using FASTA (19), proteins in the November 1999 set that are not homologous to any of the proteins in the May set (expect value >0.1) were selected. This resulted in an addition of 218 proteins, and the final data set thus consisted of 1324 nonhomologous proteins.
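The selection step described above can be sketched as follows. This is a minimal illustration, assuming the best FASTA expect value of each November-only protein against the May set has already been computed; the function name, the identifiers, and the toy expect values are hypothetical and are not taken from the original analysis.

```python
def select_nonhomologous(candidates, best_evalue, threshold=0.1):
    """Keep candidates whose best expect value against the reference set
    exceeds the threshold, i.e. that have no significant hit.

    candidates  -- protein identifiers that are absent from the reference set
    best_evalue -- dict mapping identifier to its lowest expect value against
                   the reference set; a missing key means no hit was reported
    """
    return [pid for pid in candidates
            if best_evalue.get(pid, float("inf")) > threshold]


# Toy data (hypothetical identifiers and expect values).
may_set = ["p1", "p2", "p3"]
november_only = ["p4", "p5", "p6"]
best_evalue = {"p4": 1e-20, "p5": 3.2}  # p6 produced no hit at all

added = select_nonhomologous(november_only, best_evalue)
final_set = may_set + added
print(added, len(final_set))
```

In the study itself the same kind of filter added 218 proteins to the 1106 of the May list, giving 1324 in total.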
Experimentally Determined Secondary Structures-The secondary structure elements of the selected proteins were extracted from the Protein Data Bank files by Defined Secondary Structure of Proteins (20), as implemented in ICM (version 2.7, Molsoft LLC, San Diego, CA) (21). This method defines eight secondary structure classes from hydrogen bond patterns: α-helix (H), 3₁₀-helix (G), π-helix (I), extended strand (E), isolated β bridge (B), turn (T), bend (S), and coil (_). Here three classes of secondary structures were used as follows: helix (H, G, and I), strand (E), and loop (B, T, S, and _), since these three classes are employed by the method used for secondary structure prediction.
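The reduction from eight classes to three can be written as a simple lookup. The sketch below follows the grouping stated above; the single-letter symbol "L" for loop and the function name are choices made here for illustration, not notation from the original work.

```python
# Eight-class to three-class mapping as described in the text:
# helix (H, G, I) -> H, strand (E) -> E, loop (B, T, S, _) -> L.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",
    "E": "E",
    "B": "L", "T": "L", "S": "L", "_": "L",
}


def reduce_to_three_states(assignment):
    """Convert a per-residue eight-class string into a three-class string."""
    return "".join(EIGHT_TO_THREE[code] for code in assignment)


example = "__HHHHGGGTTEEEEB_SIII"  # toy assignment, not a real protein
reduced = reduce_to_three_states(example)
print(reduced)
```

An unknown class code raises a KeyError, which makes unexpected input visible instead of silently treating it as loop.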
Predicted Secondary Structures-Secondary structures were predicted using PHD (Profile network from HeiDelberg) (22), a system of neural networks with an overall accuracy of about 72% (23). In 1978, Chou and Fasman (24) calculated the amino acid distributions in helices, Pα, and β-strands, Pβ, based on a set of 29 proteins. The P values are defined as the frequency of each amino acid residue in α-helices (or β-strands) divided by the average frequency of all residues in α-helices (or β-strands). We recalculated these values using stretches of at least five consecutive residues of helix (H) or β-strand (E) in the Protein Data Bank May data set described above. For this purpose transmembrane proteins were removed, resulting in a set of 1091 proteins. The resulting Pα and Pβ values are given in Table I.
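A propensity calculation of this kind can be sketched as below. It assumes one reading of the definition: the "frequency" of a residue type is the fraction of its occurrences that fall in the given state, and the "average frequency" is the fraction of all residues in that state. Whether residues in runs shorter than five still count toward the totals is also an assumption made here. The function name and toy data are hypothetical.

```python
import re
from collections import Counter


def propensities(sequences, structures, state, min_run=5):
    """Chou-Fasman-style propensity of each residue type for `state`.

    sequences  -- amino acid sequences (one-letter codes)
    structures -- matching three-class strings, e.g. 'H', 'E', 'L'
    state      -- the class of interest, 'H' for helix or 'E' for strand
    min_run    -- only runs of at least this many consecutive residues count
    """
    total = Counter()     # occurrences of each residue type overall
    in_state = Counter()  # occurrences inside qualifying runs of `state`
    run_pattern = re.compile(f"{re.escape(state)}{{{min_run},}}")
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        total.update(seq)
        for match in run_pattern.finditer(ss):
            in_state.update(seq[match.start():match.end()])
    n_state = sum(in_state.values())
    if n_state == 0:
        return {}
    average = n_state / sum(total.values())
    return {aa: (in_state[aa] / total[aa]) / average for aa in total}


# Toy data: alanines in a helix, valines in a strand, plus short runs
# that are too short to count.
seqs = ["AAAAAVVVVV", "AVAV"]
sss = ["HHHHHEEEEE", "HHEE"]
p_alpha = propensities(seqs, sss, "H")
p_beta = propensities(seqs, sss, "E")
print(p_alpha, p_beta)
```

Values above 1 indicate residues that are overrepresented in the state, values below 1 residues that are underrepresented.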
Protein Analyses and Electron Microscopy-SP-C was purified from porcine lungs (12) and the poly(Val → Leu)-substituted SP-C analogue (SP-C(Leu)) was synthesized as described (16). D-Alanyl-D-alanine transpeptidase from Streptomyces R61 was a kind gift from Drs. Frère and Joris, University of Liège, Belgium, and triacylglycerol lipase from Candida antarctica and human coagulation factor XIII were purchased from Sigma. For fibrillation studies the latter three proteins were dissolved in phosphate-buffered saline, pH 7.4, at concentrations of 10-100 μM. SP-C and SP-C(Leu) were dissolved at 100 or 250 μM in chloroform/methanol/0.1 M HCl, 32:64:5 (by volume), a solvent mixture in which SP-C retains a structure very similar to that in phospholipid bilayers (25). The protein solutions were incubated at 37°C for 3 days; thereafter the solutions were centrifuged at 20,000 × g for 20 min. SP-C and SP-C(Leu) contents in the supernatants at different time points after solubilization were determined by amino acid analysis of triplicate samples. For analyses of fibrils, the pellets were suspended in a small volume of water by low energy sonication for 5 s. Aliquots of 8 μl were placed on electron microscopy grids covered by a carbon-stabilized Formvar film. Excess fluid was withdrawn after 30 s, and after air-drying the grids were negatively stained with 2% uranyl acetate in water. The stained grids were examined and photographed in a Philips CM120TWIN electron microscope operated at 80 kV.
α-Helix/β-Sheet Discordance
[...] predicted secondary structures for the 17 proteins with ≥9-residue discordant segments. The 8-residue discordant segment of Aβ is also included in Fig. 2. Moreover, the discordant segments of human and mouse PrP (26,27) are shown, in addition to the discordant segment of Syrian hamster PrP (28) found by the initial search. The proteins that contain α/β-discordant segments represent a wide selection of structures (ranging from single helical peptides to large globular proteins with complex α/β architectures), localizations (nuclear, cytosolic, integral and peripheral membrane proteins, as well as extracellular proteins), and species of origin (ranging from virus to human). The set encompasses three previously known amyloidogenic proteins, i.e. the prion protein (13- or 15-residue discordant segment, depending on species, corresponding to helix 2), Aβ (8-residue segment), and SP-C (16-residue segment) (Figs. 1 and 2). No consensus pattern in the primary structures of the discordant segments could be detected. Neither a recently proposed consensus sequence for amyloid-forming proteins (29) nor a binary pattern of hydrophobic and hydrophilic residues found in fibrillating peptides by a combinatorial approach (30) could be observed.
Among the proteins with ≥7-residue discordant segments, five are integral membrane proteins or parts thereof (bacteriorhodopsin, cytochrome c oxidase, SP-C, Aβ, and parathyroid hormone receptor). The driving forces for α-helix formation in a membrane environment differ from those in aqueous solution (31), and the secondary structure prediction methods used here are based mainly on soluble proteins. However, Aβ, which is derived from a trans- and juxtamembrane region of its precursor protein, and SP-C, which is a transmembrane peptide, form amyloid in vivo. The proteins found cover a broad range of functions and many are enzymes (59% compared with 47% in the starting data set) or other ligand-binding proteins (Figs. 1 and 2). In several cases the discordant helices harbor active site or ligand-interacting residues. For example, the metalloproteases astacin (code 1iab) and adamalysin II (code 3aig) and methane monooxygenase (code 1mty) harbor zinc- or iron-binding residues in their respective helix (32-34); the helix of heme-binding protein A (code 1b2v) contains several residues important for heme binding (35); in the Arf exchange factor ARNO (code 1pbv) the discordant segment is involved in Arf binding (36); the helix of the light-driven ion pump bacteriorhodopsin (code 1bct) binds the photosensitive retinal (37); and the active-site serine of Streptomyces R61 transpeptidase is located in the discordant helix (38). Glockshuber et al. (39) observed that PrP shows similarities to signal peptidases and that His-177 then is a putative active-site residue. His-177 is located in the N-terminal part of PrP helix 2, now found to be discordant (Fig. 2).
α/β-Discordant Segments Predispose to Amyloid Fibril Formation-Three of the proteins harboring long discordant helices, i.e. transpeptidase from Streptomyces R61, triacylglycerol lipase from C. antarctica, and human coagulation factor XIII, with 15-, 11-, and 9-residue discordant segments, respectively, were available to us for fibrillation studies. All three proteins were found to form amyloid fibrils in buffer at physiological pH (Fig. 3). Thus, 6 of the 37 proteins with ≥7-residue-long α/β-discordant segments and 4 of the 10 with segments of ≥11 residues have been analyzed, and all form amyloid fibrils.
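The notion of a discordant segment of a given minimum length can be made concrete with a short scan over aligned per-residue strings. This is only a sketch under simplifying assumptions: it treats a segment as a run of consecutive positions where the observed class is helix ("H") and the predicted class is strand ("E"), and it ignores prediction reliability and any requirement that both prediction methods agree. The function name and the toy strings are hypothetical.

```python
def discordant_segments(observed, predicted, min_len=7):
    """Return 1-based inclusive (start, end) positions of runs where the
    observed structure is helix ('H') but strand ('E') is predicted,
    keeping only runs of at least `min_len` residues."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted lengths differ")
    segments = []
    run_start = None  # 0-based index where the current run began
    for i, (obs, pred) in enumerate(zip(observed, predicted)):
        if obs == "H" and pred == "E":
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                segments.append((run_start + 1, i))
            run_start = None
    # A run may extend to the end of the chain.
    if run_start is not None and len(observed) - run_start >= min_len:
        segments.append((run_start + 1, len(observed)))
    return segments


# Toy example: a 10-residue helix of which 8 residues are predicted as strand.
observed = "LL" + "H" * 10 + "LL" + "EEEE"
predicted = "LLL" + "E" * 8 + "H" + "LL" + "EEEE"
hits_7 = discordant_segments(observed, predicted, min_len=7)
hits_9 = discordant_segments(observed, predicted, min_len=9)
print(hits_7, hits_9)
```

Raising `min_len` from 7 to 9 or 11 corresponds to the stricter length cutoffs used when the longer discordant segments are discussed.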
The correlation between α/β discordance and fibril formation suggests that there may be a causal connection, and experiments with two of the discordant proteins support this. First, replacement of all valine residues in the discordant segment of SP-C with leucine yields a peptide, SP-C(Leu), with helical conformation as judged by circular dichroism and infrared spectroscopy (16). In contrast to the discordant native SP-C, SP-C(Leu) is predicted to form an α-helical structure, and the time-dependent aggregation of SP-C and SP-C(Leu) shows striking differences (Fig. 4). SP-C starts to precipitate during the first hours of incubation and shows extensive aggregation after 5-10 days, but SP-C(Leu) shows no signs of precipitation during the same period. Consistently, SP-C(Leu) forms no fibrils or only occasional fibrils after 3 days of incubation at 250 μM concentration (data not shown). SP-C, on the other hand, forms abundant fibrils at 100 μM concentration after a few hours (14). Second, a synthetic analogue of Aβ-(1-42) that lacks residues 14-23 and thus is devoid of the α/β-discordant stretch (cf. Fig. 2) does not form detectable amyloid fibrils under conditions where Aβ-(1-42) readily forms fibrils (11). Moreover, Aβ-(1-28) with alanine substitutions at positions 16, 17, and 20 does not form fibrils, whereas Aβ-(1-28) forms fibrils that are similar to those formed by Aβ-(1-42) (40). Intriguingly, these substitutions reverse the discordance of Aβ, giving instead a predicted helix between residues 15 and 21 (Fig. 5).

DISCUSSION
We compared experimentally determined nonredundant protein structures available from the Protein Data Bank with secondary structures predicted for the same proteins using PHD and a Chou-Fasman approach. In this way, 37 of the 1324 proteins (2.8%) were found to contain a ≥7-residue α-helix that is predicted to form a β-strand.
These proteins include PrP associated with spongiform encephalopathies, Aβ involved in Alzheimer's disease, and lung SP-C that forms amyloid in pulmonary alveolar proteinosis (Figs. 1 and 2). Lysozyme contains a 5-residue-long discordant helix (positions 30-34; data not shown), and wild-type lysozyme does not form amyloid. However, two naturally occurring human lysozyme variants, with point mutation Ile-56 → Thr or Asp-67 → His, are both unstable and amyloidogenic (9). It remains to be investigated whether the discordant helix of lysozyme contributes to the amyloidogenic nature of the mutants. The remaining proteins known to be associated with amyloid diseases (1,2) are either all β proteins or lack experimentally determined three-dimensional structures. Three proteins now found to be α/β-discordant, i.e. human coagulation factor XIII, Streptomyces R61 transpeptidase, and C. antarctica triacylglycerol lipase, were investigated and found to form amyloid fibrils upon incubation in aqueous solution at neutral pH (Fig. 3). We also show that fibril formation and peptide aggregation are practically abolished by converting the α/β-discordant stretch of SP-C to a helix composed of residues that statistically favor helix formation (Fig. 4). Likewise, fibril formation is abolished when the discordant stretch of the Aβ peptide is removed or rendered nondiscordant by replacement of three residues (11, 40) (Fig. 5). It is clear that there are a number of ways that amyloid fibrils could be formed, and the data presented herein strongly suggest that long stretches of α-helix/β-strand discordance in proteins predict amyloid fibril formation. It appears reasonable that proteins with discordant helices can form intermediates where the helix has unfolded.
The high β-strand propensities of these regions indicate that they are less likely to refold into a helical conformation than are regions for which helices are predicted. β-Strands in such species could form β-sheets via protein oligomerization, leading to fibril formation. This has been observed for SP-C; once unfolding of the SP-C helix occurs, refolding is not observed by NMR (13) or mass spectrometry, but the peptide instead forms β-sheet aggregates and amyloid fibrils (13,14). Proteins that are not known to form fibrils in vivo can form fibrils under in vitro conditions that favor destabilization, such as extremes of pH and the presence of organic cosolvents (41,42). From these studies it has been suggested that amyloid formation is not primarily dependent on the amino acid sequence but could be a generic trait of destabilized polypeptides. The findings presented herein indicate that fibril formation is more common than previously anticipated under physiological conditions as well (Fig. 3). On the other hand, SP-C polyvaline → polyleucine substitution and K16A/L17A/F20A mutations of Aβ-(1-28) practically abolish fibril formation, showing that at least in these cases the fibrillation process is sequence-dependent (40). The clear-cut difference in aggregation between SP-C and its polyleucine analogue (Fig. 4) argues against side chain hydrophobicity as a general determinant of fibril formation. Further investigations of the fibril formation of α/β-discordant proteins, and of site-directed mutants where the discordant helices have been modulated, may shed more light on general principles that underlie amyloid fibril formation under physiological conditions. Helix 2 of PrP is discordant, whereas helices 1 and 3 of PrP were predicted to be helical both with PHD and Chou-Fasman-based methods (data not shown).
Strikingly, in a previous prediction of PrP secondary structure, contradictory results were obtained for the helix 2 region, suggesting that alternative conformations of PrP can coexist (43). In the discordant segment of PrP helix 2, the Cys is connected to helix 3 via a disulfide bond, and the Asn is glycosylated. The consequences of these modifications for the α/β-discordance are not known at present. No posttranslational chemical modifications responsible for the conversion of PrPC to PrPSc have been found (44). In many cases protein aggregation involves structured folding intermediates (45).
FIG. 4. Val → Leu substitutions in SP-C abolish α/β-discordance and reduce amyloid formation. A, amino acid sequence and predicted secondary structure by PHD and according to Chou-Fasman for a polyleucine analogue of SP-C. As in Fig. 2, the PHD prediction including reliability indices are given in the middle row and the Chou-Fasman data in the top row, but in this case an α-helix is predicted by both methods, symbolized by a blue cylinder for the PHD prediction. See Fig. 2 for the sequence of native SP-C and its predicted secondary structure. The localization of the α-helix of SP-C(Leu) is inferred from the NMR data of the native peptide (12) and CD and Fourier transform infrared spectroscopic analyses of the analogue (16). B, relative amounts, as determined by amino acid analysis, of SP-C (filled circles) and SP-C(Leu) (open triangles) that remain in solution after centrifugation at 20,000 × g for 20 min at different time points after solubilization. Peptide concentration at start of incubation is 250 μM for SP-C(Leu) and 100 μM for SP-C. In the case of SP-C, fibrils are readily detected in the 20,000 × g pellets already after a few hours of incubation (14), whereas for SP-C(Leu) none or very few fibrils were found even after 30 days of incubation.
PrP can exist in multiple conformations, suggesting that PrPC may be intrinsically flexible and prone to structural transitions (46). Out of eight point mutations, corresponding to polymorphic sites associated with inherited forms of prion disease, T183A localized in the middle of helix 2 resulted in extensive aggregation at a concentration 160-fold lower than that at which the other mutants were soluble when introduced in recombinant mouse PrP-(121-231) (10). Likewise, the D178N mutation (in helix 2) in recombinant human PrP-(90-231) gave rise to aggregation, whereas P102L and E200K mutants were soluble (47).
Amyloid is believed to be associated with, and even responsible for, the pathological changes seen during the course of the corresponding diseases (1,3,48). This implies that obstruction of amyloid formation may prevent the occurrence and/or progression of amyloidoses. Attempts have been made to identify compounds that can abrogate fibril formation by interfering with peptide-peptide contacts in fibrils or by inhibiting proteolytic processing that leads to amyloidogenic peptides (49). Our results suggest that α/β-discordant helices are involved in the fibrillation process for some proteins and that stabilization of these segments in helical conformation could prevent amyloid formation. Such an approach appears worthwhile to explore, as the α/β-discordant segments represent small and specific regions, and their localizations in the disease-associated proteins Aβ and PrP are now identified.