Amyotrophic Lateral Sclerosis-associated Copper/Zinc Superoxide Dismutase Mutations Preferentially Reduce the Repulsive Charge of the Proteins

We provide bioinformatical evidence that protein charge plays a key role in the disease mechanism of amyotrophic lateral sclerosis (ALS). Analysis of 100 ALS-associated mutations in copper/zinc superoxide dismutase (SOD1) shows that these are site-selective with a preference to decrease the proteins' net repulsive charge. For each SOD1 monomer this charge is normally -6. Because biomolecules as a rule maintain net negative charge to assure solubility in the cellular interior, the result lends support to the hypothesis of protein aggregation as an initiating event in the ALS pathogenesis. The strength of the preferential reduction of repulsive charge is higher in SOD1-associated ALS than in other inherited protein disorders.

The motor-neuron disorder amyotrophic lateral sclerosis (ALS) has been associated with more than 100 missense mutations in the radical scavenger copper/zinc superoxide dismutase (SOD1) (1). In analogy with several other neurodegenerative disorders (2), ALS seems associated with protein misfolding and aggregation (3-8). The principal support for such a disease mechanism is that the neural damage is accompanied by the accumulation of intracellular SOD1 deposits (9,10). Even so, it is not the mature protein inclusions that seem to pose the main problem but rather some yet unidentified precursor accumulating in connection with the macroscopic aggregation process (11-16). Consistent with such a scenario, transgenic mice suffer neuron stress long before any SOD1 inclusions can be detected (10,17,18). At the molecular level, studies of how the individual ALS-associated mutations affect the SOD1 molecule have further indicated protein stability as an important factor in the disease mechanism (19-25); the more severe the destabilization caused by the mutation, the faster the disease progression (20). In this study, we shed further light on the disease-provoking properties of the SOD1 molecule by examining bioinformatically the chemical nature of 100 ALS-associated missense mutations. The results reveal an additional disease factor: the mutations have an overall tendency to decrease the net negative charge of the proteins. The negative charge repulsion that generally assures macromolecular solubility in vivo is diminished. This tendency is stronger for ALS mutations than in other hereditary protein disorders and cannot be discerned in the interspecies variability of the SOD1 sequence. In the few contrasting cases where the SOD1 mutations produce an increased repulsive charge, the side chain substitution tends also to be structurally obstructive and to occur in strictly conserved positions.
That is, the opposed effect of increased repulsive charge is not sufficiently strong to compensate for the accompanying penalties in protein stability. The observation shows that the decreased net negative charge of SOD1 is a critical contributory factor to the ability of the mutation to cause neural damage. Taken together with the impact of decreased protein stability, the data also provide genetic support for the idea that abnormal associative processes involving SOD1 molecules play a key role in the ALS mechanism.

MATERIALS AND METHODS
Mutation Data-In this study we analyze missense mutations in SOD1 and eight other disease-associated genes. For each gene we count only mutations resulting in a unique amino acid substitution, i.e. multiple mutations at a site that result in the same amino acid are counted only once. This is done because we are interested in the effect of the mutations at the protein level. Table 1 shows an overview of the genes studied and the resources used to download the mutations. TTR was selected because the majority of the mutations are amyloidogenic (26). The other seven genes are included because they have been studied before (27) and represent a diverse set of phenotypes. For G6PD we include only type I mutations because Miller and Kumar (27) found these to behave statistically differently from the less severe types II, III, and IV.
Human Genome Data-We downloaded all coding sequences labeled as "protein_coding" from ENSEMBL v36 (December 2005) (28). Mitochondrial sequences were excluded because they have a different genetic code. Coding sequences from a total of 22201 genes were analyzed.
Charge-In this study we are interested in how disease mutations affect the total charge of the wild type protein. We consider Arg and Lys as positively charged and Asp and Glu as negative. All of the other amino acids are neutral in our calculations including His, which is usually not protonated at pH 7 under physiological conditions.
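The charge convention above can be written down as a short Python sketch. The function names are illustrative and not taken from the paper, and only side chains are counted, as in the text (chain termini and bound metal ions are ignored here).

```python
# Side-chain charges as assumed in the text:
# Arg/Lys = +1, Asp/Glu = -1, every other residue (including His) = 0.
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def residue_charge(aa: str) -> int:
    """Charge assigned to a single amino acid (one-letter code)."""
    return CHARGE.get(aa.upper(), 0)


def net_charge(sequence: str) -> int:
    """Sum of side-chain charges over a protein sequence."""
    return sum(residue_charge(aa) for aa in sequence)


def charge_change(wild_type: str, mutant: str) -> int:
    """Charge difference between the new and the original amino acid."""
    return residue_charge(mutant) - residue_charge(wild_type)


print(charge_change("E", "K"))  # Glu -> Lys
print(charge_change("D", "N"))  # Asp -> Asn
print(net_charge("DEKRH"))      # invented five-residue peptide
```

Under this convention a substitution can shift the charge by at most two units, as for Glu to Lys.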
Test for Charge Selectivity-To quantify these statements, we made χ² tests of how well the charge distributions of the disease mutations are described by the charge distributions of the models. To this end we classify each mutation into one of three categories: +, -, or 0, depending on whether the mutation increases, decreases, or leaves unchanged the charge of the wild type protein. Because a mutation must belong to one of the categories, the number of degrees of freedom is two. We use each model to calculate the expected number of mutations in each category and compare it with the observed values. χ² tests are then performed to accept or reject the null hypothesis (the model) (see Table 2).
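The three-way classification described above can be sketched as follows. The label parser and the helper names are assumptions made for illustration; the example labels are SOD1 mutations named elsewhere in this article, chosen only to exercise each category.

```python
import re
from collections import Counter

# Arg/Lys = +1, Asp/Glu = -1, all other residues (including His) = 0.
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def classify(mutation: str) -> str:
    """Return '+', '-' or '0' for a label such as 'E100K'
    (wild type residue, position, mutant residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {mutation!r}")
    wild_type, _position, mutant = match.groups()
    delta = CHARGE.get(mutant, 0) - CHARGE.get(wild_type, 0)
    if delta > 0:
        return "+"
    if delta < 0:
        return "-"
    return "0"


def tally(mutations):
    """Count how many mutations fall into each of the three categories."""
    counts = Counter(classify(m) for m in mutations)
    return {category: counts.get(category, 0) for category in ("+", "-", "0")}


example = ["E100K", "D101N", "H43R", "V7E", "R115G", "A4V"]
print(tally(example))
```

The observed tallies are then compared with the tallies expected under each model.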

RESULTS AND DISCUSSION
Codon Bias: Establishing the Base Line for Genetic Variation-Upon examination of how ALS-associated missense mutations affect the charge of SOD1, it is necessary first to account for the intrinsic bias of the genetic code. The three nucleotides of each codon allow at most nine unique missense mutations, representing only a subset of the 20 amino acids. Consequently, when subjected to mutations, different amino acids will have different propensities for altering the charge. This propensity is determined by the exact codons used in the human genome (supplemental Table S1). For simplicity, we define c_i as the charge difference between the new and original amino acid, and c as the average of c_i over the allowed set of unique missense mutations (M), i.e. c = (1/|M|) Σ_{i ∈ M} c_i (Eq. 1). For human genes there is instead a small propensity for mutations to increase the charge (c = -0.036) (supplemental Table S1). On top of this bias, there is also a biological preference for certain types of mutations caused by variable susceptibility of the different codons to become chemically modified in vivo (29) as will be discussed below. Accordingly, mutations can modulate the charge in two different ways: by selection of the sites for mutation and by selection of the mutations themselves. For example, increased positive charge can result either from random mutations in sites with a negative charge or from mutations that selectively favor positive side chains.
Charge Bias of ALS-associated SOD1 Mutations: Comparison with Other Disease Genes-The 100 ALS-associated mutations analyzed in this study have been obtained from the public data base alsod1.iop.kcl.ac.uk/ and are outlined in Fig. 1. According to Equation 1, the average charge difference, c, induced by these mutations is +0.30. This value is an order of magnitude larger than what is expected for random mutations (see above).
Because the net charge of wild type SOD1 is -6, this means that there is a preference for the ALS mutations to reduce the net negative charge of the proteins. This subset of ALS mutations thus contributes to decreasing the Coulombic repulsion between the individual SOD1 molecules. To compare the ALS-associated SOD1 mutations with disease-associated mutations in other proteins, we use two different sets of genes. The first set includes eight genes (Table 1). Seven of these have previously been used by Miller and Kumar (27). As an additional disease gene, we include TTR, which is coupled to familial amyloid polyneuropathy, familial amyloid cardiomyopathy, and central nervous system amyloidoses (26), diseases that resemble SOD1-associated ALS in several respects. In all cases, the genes have a large number of recorded mutations (Table 1). Interestingly, the charge bias of SOD1 is substantially higher than observed for the other genes (Table 2). Second to SOD1 is TTR with c = 0.16. It is also apparent that c exhibits a wide variation among the nine genes, ranging from 0.30 for SOD1 to -0.1 for L1CAM and G6PD. However, only for SOD1 and TTR do the disease mutations result in a decrease of the net negative charge (Table 2). The similarity between SOD1 and TTR is possibly coupled to the disease mechanism of these proteins; the TTR-associated diseases (familial amyloid polyneuropathy, familial amyloid cardiomyopathy, and central nervous system amyloidoses) (26) and ALS (30) are all dominantly inherited gain-of-toxic function disorders accompanied by protein aggregation. Second, we compared the ALS-associated SOD1 mutations with all 920 disease-associated genes listed in the SWISSPROT data base in April 2006 (31). For proteins with positive net charge, we multiplied c by -1 to monitor directly how the mutations affect their Coulombic interactions.
The results show that among the 76 genes with more than 50 known disease mutations SOD1 has the second largest c, and among the 159 genes with more than 25 mutations SOD1 has the fourth largest c (Fig. 2). Comparison with genes with fewer mutations yields an overall similar picture but with higher statistical errors. On this basis, we conclude that the preference of ALS mutations to decrease the net negative charge of individual protein molecules is, on the whole, atypical for disease genes. An explanation for this mismatch could be that ALS is a protein misfolding/aggregation disorder that represents only a small subset of the disease-associated genes in the data set. The mechanism of such disorders is structural and triggered by the gain of new cytotoxic properties rather than the loss of protein function.

Models of Mutational Variability: Establishing the Origin of the Charge Bias-To examine the origin of the charge bias, the mutational spectrum of each disease was compared with three different models of amino acid substitutions: (a) The "interspecies" model describes the evolutionary amino acid variation between different species. This model is obtained by aligning the human amino acid sequence to a set of sequences from other vertebrates in SWISSPROT (31). The number of sequences used for each gene is listed in Table 1, and the sequences were aligned with the T-Coffee software using default parameters (32). (b) The "random" model consists of all amino acid substitutions that can result from single nucleotide substitutions in the wild type sequence. Comparison with this model shows to what extent the spectrum of observed disease mutations is caused by random mutations in all sites of the wild type gene. (c) The "selective site" model is the same as the random model but counts only substitutions occurring in disease-harboring sites. The three models are compared with the disease mutations of SOD1 and the eight other genes in Table 2.
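A minimal sketch of how the "random" and "selective site" models could be enumerated is given below. It assumes the standard genetic code, counts each distinct amino acid substitution once per site, and skips synonymous and stop-codon changes; the function names and the six-nucleotide toy sequence are invented for illustration, and the original analysis may differ in detail.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Arg/Lys = +1, Asp/Glu = -1, all other residues (including His) = 0.
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def missense_neighbours(codon):
    """Original residue and the distinct residues reachable by one
    nucleotide change (synonymous and stop changes are excluded)."""
    original = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            aa = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if aa != original and aa != "*":
                reachable.add(aa)
    return original, sorted(reachable)


def model_mean_c(cds, sites=None):
    """Average charge change over all unique missense substitutions.
    sites=None gives the random model; a set of 0-based codon indices
    restricts the enumeration to those sites (selective site model)."""
    deltas = []
    for index in range(len(cds) // 3):
        if sites is not None and index not in sites:
            continue
        original, mutants = missense_neighbours(cds[3 * index:3 * index + 3])
        deltas += [CHARGE.get(m, 0) - CHARGE.get(original, 0) for m in mutants]
    if not deltas:
        raise ValueError("no missense substitutions to average over")
    return sum(deltas) / len(deltas)


toy = "GAAGGT"  # invented sequence encoding Glu-Gly
print(missense_neighbours("GAA"))
print(model_mean_c(toy))             # random model
print(model_mean_c(toy, sites={0}))  # selective site model, Glu site only
```

Restricting the enumeration to the Glu codon raises the average, which illustrates how site selection alone can bias c.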
Interspecies Variability Is Approximately Charge Neutral-From the data in Table 2 and Fig. 3, it is apparent that the variation in amino acid composition across different species yields only small values of c. With the exception of L1CAM and G6PD, c is within one standard deviation of zero for all genes. Miller and Kumar (27) showed that for L1CAM, G6PD, PAX6, CFTR, PAH, and RS1, the disease-associated mutations are generally more extreme than those observed in the interspecies variability. Therefore it is not surprising that the interspecies model poorly reproduces c for the disease mutations. It is nevertheless interesting to note that, although decreased net repulsive charge seems to be a disease factor in ALS, the SOD1 net charge varies quite substantially across the different vertebrates, ranging from -2 in red deer to -6 in humans. An explanation for this tolerance to accommodate different net charge could be that the life span of other vertebrates is simply too short to develop ALS (the mean age of onset of ALS associated with SOD1 gene mutations is 46-47 years) or that the non-charge-related propensity of SOD1 to undergo pathogenic misfolding is different for different species.

The SOD1 Charge Bias Is Not Due to Unusual Codon Composition-In the random model, where the result depends only on the sequence composition of the different genes, the values of c turn out relatively small and show little variation (Table 2 and Fig. 3). Accordingly, it can be concluded that the overall codon composition shows little difference between the various disease-associated genes, at least with respect to their intrinsic susceptibility to biasing the charge upon random mutation. This, in turn, means that the atypical c observed for ALS-associated SOD1 mutations is not caused by an unusual codon composition.
ALS Mutations Are Site-selective by Not Occurring in Positions with Positive Charge-Notably, the variation of the c values becomes much wider when we restrict the random mutations to disease-harboring sites (Table 2). The result shows that the actual types of residues targeted for mutation differ between the different diseases. We also note that the selective site model better reproduces the disease mutations than the random model, i.e. the correlation with the observed values of c improves (Fig. 3). The improvement is particularly clear for SOD1, revealing that the ALS mutations are site-selective. To establish the nature of this site selectivity, we tested the hypothesis that the ALS mutations occur with equal probability in sites with negative, positive, and neutral charge. From Table 3 we see that there are 82 mutations in neutral sites, 17 in negative sites, and only one in a positively charged site, compared with expected values of 73.8, 15.5, and 10.7, respectively (Fig. 4). The mismatch shows that the ALS-associated SOD1 mutations are not randomly distributed but site-selective by being underrepresented in positions with positively charged side chains. With the exception of RS1, such pronounced selectivity among charged amino acid positions is not observed for mutations in the other disease-associated proteins in Table 3 (supplemental Table S2). For TTR with a slightly lower c value of 0.16, the deviation from a random distribution is less clear. For this protein 80 mutations are found in neutral sites, 15 in negative sites, and 5 in positively charged sites compared with expected values of 74.8, 15.3, and 10.2, respectively. Because disease-causing mutations, on the whole, are most frequently found in conserved sites (27), we also tested whether the selection of sites could be based on the degree of conservation rather than on the charge itself.
For SOD1, 46.4% of the amino acid positions are strictly conserved in vertebrates, and these positions harbor 62% of the ALS mutations (Fig. 1). Even so, SOD1 displays no clear difference in the degree of conservation between negatively and positively charged sites (supplemental Table S2), reinforcing the idea that for this gene the sites for disease-provoking mutations are influenced by charge. In notable contrast, the disease-associated TTR mutations indicate no preference for conserved sites.

[Figure caption] … Table 2 plotted against the results from interspecies variation, random mutations, and the selective site model. The SOD1 data are shown in red, and the dashed line is the function y = x. The interspecies variation and random mutations account poorly for the disease mutations, whereas the values of c are reasonably well described by the selective site model, i.e. the model that assumes random mutations within disease-harboring sites only. The agreement with the selective site model improves for SOD1 if the negative sites and the neutral disease sites are treated separately. The corresponding value for ALS sites with positive charge cannot be derived because this data set includes only one mutation.
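The site-selectivity comparison for SOD1 and TTR (observed versus expected numbers of mutations in neutral, negative, and positive sites) can be retraced with a short sketch. It assumes a plain Pearson χ² statistic with two degrees of freedom, as described under "Materials and Methods"; the function name is illustrative, and the p values it prints need not coincide exactly with those tabulated in the article.

```python
import math


def chi_square_df2(observed, expected):
    """Pearson chi-square statistic and p value for three categories
    (two degrees of freedom)."""
    if len(observed) != 3 or len(expected) != 3:
        raise ValueError("exactly three categories are required")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With two degrees of freedom the chi-square survival function
    # reduces to exp(-x / 2), so no statistics library is needed.
    return statistic, math.exp(-statistic / 2)


# Counts in neutral, negative, and positive sites as quoted in the text.
sod1 = chi_square_df2([82, 17, 1], [73.8, 15.5, 10.7])
ttr = chi_square_df2([80, 15, 5], [74.8, 15.3, 10.2])

print("SOD1: chi2 = %.2f, p = %.4f" % sod1)
print("TTR:  chi2 = %.2f, p = %.4f" % ttr)
```

With these inputs nearly all of the SOD1 statistic comes from the positively charged sites, where one mutation is observed against 10.7 expected.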

Indications of a Selective Preference for Mutation to Arginine-To examine further the degree of randomness of the ALS-provoking mutations, the neutral and negative sites were also analyzed separately (Fig. 3). Compared with the full set of mutations, the selective site model better reproduces c for the mutations in negative sites, whereas it is poorer for the neutral sites (Table 3). Thus, within the negative sites, the identity of the ALS mutations approximately matches a random distribution. Yet, it is interesting to note that all the 17 mutations in negative sites make the overall repulsive charge smaller, although the selective site model allows charge-preserving mutations. The c value for negative sites (c = 1.2) is accordingly higher than that predicted by the selective site model (c = 0.95), indicating that mutations within this group are to some degree also charge-selective (Table 3). This deviation from a random pattern is emphasized by an even weaker correlation with the selective site model for the neutral sites. Upon restricting the analysis to neutral sites, the p value drops to 0.02; only 5 of the 82 mutations in neutral sites decrease the charge, whereas 16 increase it. Moreover, the selective site model predicts no charge bias for these sites (c = 0.01), although the observed bias is c = 0.13. Mutations to positively charged side chains are overrepresented. Most significantly, 15 of the 82 ALS mutations in neutral sites are to Arg compared with an expected value of 7 from the selective site model (supplemental Table S3). In apparent contrast, there is only one mutation to Lys, whereas 1.68 are expected. Although the detailed mechanism of this selectivity is not yet established, it is apparent that the identity of the targeted side chain plays a key role; of the 15 mutations to Arg, 5 are from Gly and 4 from His (Fig. 1). Apparently there is a biological preference for certain mutations to occur that goes beyond the simple assumptions used in the present analysis.
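As a consistency check on the numbers quoted above, the per-class values of c can be recombined into the overall average. The sketch below makes one assumption not stated explicitly in the text: the single mutation in a positively charged site is taken to be R115G (Arg to Gly, a charge change of -1), the only Arg-site mutation named in this article.

```python
# Counts quoted in the text for ALS-associated SOD1 mutations.
neutral_total, neutral_up, neutral_down = 82, 16, 5

# A substitution at a neutral site can change the charge only by +1, 0 or -1,
# so the class average follows directly from the two counts.
c_neutral = (neutral_up - neutral_down) / neutral_total

negative_total, c_negative = 17, 1.2  # reported average for negative sites
positive_total, c_positive = 1, -1.0  # assumption: the single mutation is R115G

total = neutral_total + negative_total + positive_total
c_overall = (neutral_total * c_neutral
             + negative_total * c_negative
             + positive_total * c_positive) / total

print(f"neutral sites: c = {c_neutral:.2f}")
print(f"all {total} mutations: c = {c_overall:.2f}")
```

Under that assumption the weighted average reproduces the reported overall value of +0.30 to two decimals.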
Collectively, this evidence suggests that the ALS mutations in SOD1 are not only selective with respect to the charge of the targeted residue but also to other features that bias the outcome toward positive values of c.

ALS Mutations with Increased Repulsive … (20), either as direct effects on the monomer or dimer structures or as a result of impaired ability to coordinate the metal ions. The same conclusions can be drawn from thermal unfolding studies (33,34) and crystallographic data (35). However, these SOD1 stability losses are not always observed for ALS mutations that decrease the net negative charge of the proteins. Representative examples of such mutations are E100K, D101N, D124V, D125H, and N139K (33), underpinning the idea that ALS can be triggered either by decreased protein stability or by decreased repulsive forces between individual molecules (20). In this perspective it is interesting that there are also some ALS-associated mutations that actually increase the net repulsive charge of the SOD1 molecule, i.e. V7E, G41D, N86D, G93D, R115G, and G141D (Fig. 4). However, these opposing mutations seem at the same time severely destabilizing by being sterically obstructive, and with the sole exception of G141D, they all occur in strictly conserved sequence positions. V7E displays a thermal midpoint that is lowered by 6.6 degrees (33), matching the melting characteristics of the mutant L144F that is destabilized by ~2 kcal/mol (20). Similarly, the apo monomer of G41D is just partly folded in physiological buffer (20), N86D is a type 2 mutation with severely weakened dimer interface (20), and G93D displays a thermal stability significantly lower than that of V7E (33,34). The only mutations for which there are yet no stability data are R115G and G141D, even though it can be noted that the former is directly interfering with the dimer interface (36), and the latter involves a substitution that falls outside the allowed regions of the Ramachandran plot (37).
Thus, the ALS mutations with increased repulsive charge can be seen as the exceptions that confirm the rule. Protein disease seems to rely on the combined action of stability and charge, and the apparent benefit of increased repulsive charge is in these cases insufficient to compensate for the accompanying losses in protein stability. In addition, it is reasonable to assume that the gain of cytotoxicity can depend on more specific sequence features like those modulating the intrinsic aggregation propensity of the coil (38-40) or the aggregation gatekeepers preventing the folded structures from assembling erroneously with one another (41,42). For example, the relatively high fraction of ALS mutations occurring in Gly positions could partly be explained by the intrinsic ability of the glycine residues to inhibit aggregation, a feature that appears to be evolutionarily conserved in other systems (43). With the present set of ALS-associated SOD1 mutations, however, we have not been able to explicitly identify such sequence factors. The major reason is that there are as yet no disease mutations that cannot simply be accounted for by decreased SOD1 stability or decreased repulsive charge.
Implications for the ALS Mechanism-Because the ALS-associated SOD1 mutations with only few exceptions have full penetrance (44) and inevitably lead to death, the chemical signatures of these mutations report directly on the factors triggering neurodegeneration at a molecular level. On this basis we conclude that mutations that decrease the protein's net repulsive charge not only affect the disease progression, as has been suggested earlier (20), but actually contribute to triggering neural damage. Molecular features that are solely modulating would not have full disease penetrance. From the cases where the ALS mutations are charge neutral or even increase the charge repulsion, we can also conclude that a second ALS-promoting factor is decreased protein stability (20). In other words, ALS can be triggered either by SOD1 mutations that alter the protein charge or by SOD1 mutations that decrease the protein stability. Representative examples of ALS mutations that seem to affect mainly the repulsive charge are E100K, D101N, and N139K (33). Correspondingly, ALS mutations that seem to selectively decrease the SOD1 stability are A4V, L84V, and L106V (20). In addition, there are ALS mutations that reduce both charge and stability, e.g. H43R, D90A, and E100G (20), as well as mutations where the disease factors partly, but not sufficiently, compensate one another, e.g. V7E (33), G41D (20), and G93D (33,34). Altogether this relation to decreased charge and stability constitutes the hallmark of an underlying aggregation process. Notably, aggregation refers here not only to the assembly of identical copies of SOD1 molecules but includes also abnormal interactions between SOD1 and other negatively charged biomolecules, e.g. membranes, DNA/RNA, and other proteins. Moreover, the hereditary penetrance of these factors suggests that this aggregation process is not only a secondary disease event but is actually involved in triggering neurodegeneration in ALS.