Three-dimensional structure of myelin basic protein. II. Molecular modeling and considerations of predicted structures in multiple sclerosis.

A computational model of myelin basic protein (MBP) has been constructed based on the premise of a phylogenetically conserved beta-sheet backbone and on electron microscopical three-dimensional reconstructions. Many residues subject to post-translational modification (phosphorylation, methylation, or conversion of arginines to citrullines) were located in loop regions and thus accessible to modifying enzymes. The triproline segment (residues 99-101) is fully exposed on the back surface of the protein in a long crossover connection between two parallel beta-strands. The proximity of this region to the underlying beta-sheet suggests that post-translational modifications here might have potential synergistic effects on the entire structure. Post-translational modifications that lead to a reduced surface charge could result first in a weakened attachment to the myelin membrane rather than in a gross conformational change of the protein itself. Such mechanisms could be operative in demyelinating diseases such as multiple sclerosis.

Myelin basic protein (MBP) is one of the most important proteins of the myelin sheath (1-6). Its significance is demonstrated in the shiverer mutant of mouse, which has only a small amount of structurally unstable myelin because the gene for MBP is mostly deleted (7,8). This trait is recessive and inherited in a Mendelian manner, indicating that MBP is coded for by a single gene. In mammals, the gene consists of seven exons, and differential splicing of the primary MBP mRNA leads to different isoforms of MBP, i.e. forms of differing molecular weights (9-11). Alternative splicing of mRNA transcripts is a common mechanism for generating protein diversity. The MBP gene is thus similar to the genes for SV40 T and t antigens, fibrinogen, lens α-A crystallin, and troponin T, in all of which primary transcripts with identical termini are alternatively spliced to yield different mature mRNAs (10). The shark MBP gene has also been cloned and has revealed a similar exon structure, indicating that this protein arose early in vertebrate evolution (11).
In mammals, the 18.5- and 14-kDa isoforms of MBP are the most common, although the relative proportions vary during development and among species. Henceforth, unless otherwise specified, we shall be using "MBP" to refer to the 18.5-kDa form. Each isoform of MBP can exist as one of many possible charge isomers, due to various post-translational modifications (3,4). These charge isomers are denoted C1, C2, C3, C4, C5, C6, and C8, according to their elution profile on a cation exchange column at pH 10.5 (12). Component C1 is the least modified and most basic component, while successive components differ sequentially by the loss of a positive charge. Component C8 is an isoform of MBP that does not bind to the resin, and it is the most modified, containing several citrullinyl residues (13). The post-translational modifications include phosphorylation, ADP-ribosylation, and conversion of arginines to citrulline (3,4,13). The latter change is relevant to multiple sclerosis; often four or five arginines are so converted (14). In a recent case of fulminating multiple sclerosis known as Marburg's Disease, a young (26-year-old) woman presented with the disease and died within 6 weeks. In the MBP extracted from the autopsied brain, 18 of 19 arginine residues had been citrullinated (15).
The sequences of most forms of MBP from numerous species are known, with the human and bovine forms having been sequenced first (16,17). We shall henceforth be referring to the human sequence. The single Trp116 serves as a focus for immunological properties of MBP and is proximal to a triproline (Pro99-Pro100-Pro101) segment (2). Residues around this triproline segment are often modified. Myelin basic protein has a number of sequence and other similarities with other protein families (18,19). Properties such as charge density, post-translational modification by addition of fatty acids, and overall hydrophobicity are similar in MBP and proteolipid protein from the central nervous system and in the pulmonary surfactant proteins SP-B and SP-C (20). Other sequence similarities with viral proteins have stoked conjectures on viral involvement in multiple sclerosis (21-24).
Because of the reluctance of MBP to form crystals (25), its detailed tertiary structure is not known. The main structural models of this protein that exist are from the 1980s and represent abstract syntheses of biochemical data and secondary structure prediction algorithms (Refs. 26-29; see also Refs. 30,31). The structure of a small subsegment (five residues) has recently been solved by nuclear magnetic resonance (32). The most sophisticated structural models of the whole protein remain those of Stoner (27) and Martenson (28), which were based on extensive biochemical and secondary structural data. These structures were represented at the time schematically, with parts thereof as plastic Corey-Pauling-Koltun space-filling models. Our own recent work has comprised electron microscopical investigations of the tertiary structure of bovine MBP, which is almost identical to human MBP (see accompanying paper (33)). In the process of recreating computational representations of both the Stoner and Martenson models for comparison with our experimental results, we designed a revised structure, incorporating our new electron microscopical reconstructions as tertiary structural constraints. This new model of human MBP is presented here.

MATERIALS AND METHODS
The main tool used in this aspect of our studies was the INSIGHT II molecular modeling software package (Biosym Corp., Parsippany, NJ) running on an IBM RISC/6000 Powerstation 3AT (International Business Machines, Markham, Ontario, Canada). This commercial program contains utilities for building polymers of nucleic acids and proteins, multiple sequence alignment and homology modeling, structure refinement by rotamer rotation, energy function minimization, and graphic display of specified sites. There are no crystallographic or NMR structures of any proteins with significant overall sequence similarity to MBP in the Brookhaven Protein Data Bank (PDB). We thus could not construct MBP by homology modeling but used instead a piecemeal strategy. We relied strongly on Stoner's (27,29) and Martenson's (26,28) definitions of the residues forming a β-sheet secondary structure.
For Stoner's model (27), we required a flat antiparallel β-sheet (Fig. 1), which we derived initially from the structure of excitation energy transfer bacteriochlorophyll A protein (PDB accession code 3bcl). The INSIGHT program was used to make the putative α-helical regions directly using specified values of (φ, ψ). These α-helices were placed approximately where Stoner indicated them to lie. Loops then joined all the secondary structure elements, and the entire structure was refined by rotamer selection and energy function minimization. Rotamer selection means choosing the best of a series of known possible amino acid side chain conformations to minimize steric overlap with any other atoms in the structure. The energy function comprises electrostatic (including hydrogen bonding) and van der Waals forces among the atoms of the model. Physically implausible models (e.g. with overlapping atoms) yield divergent results at this point. This energy is not the free energy of folding of the protein but serves only as an internal guide to refining the structure.
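The role of the energy function as an internal plausibility check can be illustrated with a minimal sketch. The code below is not the INSIGHT II force field; it assumes a generic Coulomb term plus a 12-6 Lennard-Jones term, and the names `pair_energy` and `total_energy`, together with the `epsilon` and `sigma` values, are hypothetical choices made only for illustration. It shows why a model with overlapping atoms yields a divergent energy.

```python
import math

# Conversion constant for Coulomb energy in kcal/mol when charges are in
# elementary units and distances are in angstroms.
COULOMB_K = 332.06


def pair_energy(r, q1, q2, epsilon=0.1, sigma=3.4, dielectric=1.0):
    """Coulomb plus 12-6 Lennard-Jones energy (kcal/mol) for one atom pair.

    epsilon (kcal/mol) and sigma (angstroms) are illustrative values only,
    not parameters taken from any particular force field.
    """
    coulomb = COULOMB_K * q1 * q2 / (dielectric * r)
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    return coulomb + lennard_jones


def total_energy(atoms):
    """Sum pair energies over all unique pairs.

    atoms is a list of (x, y, z, charge) tuples with coordinates in angstroms.
    """
    energy = 0.0
    for i in range(len(atoms)):
        xi, yi, zi, qi = atoms[i]
        for j in range(i + 1, len(atoms)):
            xj, yj, zj, qj = atoms[j]
            r = math.dist((xi, yi, zi), (xj, yj, zj))
            energy += pair_energy(r, qi, qj)
    return energy


# Two neutral atoms at a comfortable separation versus nearly overlapping.
relaxed = total_energy([(0.0, 0.0, 0.0, 0.0), (3.8, 0.0, 0.0, 0.0)])
clashing = total_energy([(0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)])
```

In this sketch the relaxed pair sits near the shallow van der Waals minimum, whereas the clashing pair returns an energy many orders of magnitude larger, which is the kind of divergence that flags a physically implausible model during refinement.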
For Martenson's model (28), we required two layered β-sheets. The relevant domain of the bacteriochlorophyll A protein was again the starting point, although this time two bacteriochlorophyll A sheets were placed over one another and then spliced together to form the twist as indicated by Martenson in a schematic diagram (see Fig. 1). Loops again joined the β-strands. There were no α-helices defined here. Finally, our model was based on the previous work by Stoner and Martenson as well as on our new electron microscopical data (33). For this model, we first used the flat antiparallel β-sheet from the structure of bacteriochlorophyll A protein and later, for the results presented here, from the structure of severin (PDB accession code 1svr). Where experimental data on the structure were available, such as the NMR structure of a tetradecapeptide repeat (32), they were incorporated into our model. Amino acids 98-102 in human MBP (Thr-Pro-Pro-Pro-Ser) showed a 100% identity with a pentapeptide segment in endo-β-N-acetylglucosaminidase F1 (PDB accession code 2ebn), and the latter coordinates were used directly to model the former region.
To begin, the relevant regions of the human MBP sequence were modeled as β-strands using calculated (φ, ψ) angles and overlaid onto the β-sheet of severin, and the β-strand coordinates of the known structure were used to form the nascent human MBP model. The 2ebn pentapeptide comprising the triproline segment was moved to fit into the right-handed crossover region between adjacent parallel β-strands β3 and β4. The (φ, ψ) angles of the tetradecapeptide solved recently by NMR were used to model this segment, move it into a correct position with respect to the β-sheet, and assign coordinates. Finally, these segments were joined by loops using the electron microscopical reconstructions to constrain the loops' positions. As before, structure refinement comprised rotamer definition and energy function minimization.
Post-translational modifications were constructed interactively using the utilities of the INSIGHT II program. To perform a computational phosphorylation, for example, a phosphate group was selected from an internal library and then bonded onto the rest of the molecule using a built-in function. The bond type (e.g. single) is chosen by the user, as well as the two hydrogen atoms to be removed by the bonding reaction. A computational citrullinization was performed in essentially the same way after deleting one of the terminal NH2 moieties of the guanidino group of arginine and replacing it with an oxygen atom.
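The bookkeeping behind a computational citrullinization can be sketched as an edit to the elemental composition of the residue. The code below does not reproduce the INSIGHT II procedure; the names `deiminate`, `residue_mass`, and `net_charge_change` are hypothetical, neutral residue formulas are used, and it is assumed that each arginine carries one positive charge near neutral pH while citrulline is uncharged.

```python
from collections import Counter

# In-chain (residue) composition of neutral arginine: C6H12N4O.
ARG_RESIDUE = Counter({"C": 6, "H": 12, "N": 4, "O": 1})

# Monoisotopic atomic masses in daltons.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
}


def deiminate(residue):
    """Return the composition after replacing one terminal imino nitrogen
    (with its hydrogen) of the guanidino group by an oxygen atom."""
    product = Counter(residue)
    product["N"] -= 1
    product["H"] -= 1
    product["O"] += 1
    return product


def residue_mass(composition):
    """Monoisotopic mass of a composition given as element counts."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in composition.items())


def net_charge_change(n_deiminated):
    """Charge change, assuming each converted arginine loses one positive
    charge (an assumption valid near neutral pH)."""
    return -n_deiminated


CIT_RESIDUE = deiminate(ARG_RESIDUE)  # C6H11N3O2
mass_shift = residue_mass(CIT_RESIDUE) - residue_mass(ARG_RESIDUE)
```

Under these assumptions each conversion changes the residue mass by just under 1 Da and removes one positive charge, so a molecule in which 18 arginines are converted would carry 18 fewer positive charges than the unmodified form.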

RESULTS AND DISCUSSION
Although the sequences of MBP from many species are known, there is no accurate way to calculate a tertiary structure from them alone. If the atomic structure of a protein with a significantly similar amino acid sequence were known, then one could align the MBP sequence with this homologous sequence and assign atomic coordinates to the amino acids of MBP based on the known structure. Refinement of the predicted structure then involves rotating amino acid side chains to prevent steric overlap, filling in gaps in the structure by loops, and finally modifying bond lengths and atomic coordinates to minimize an energy function as defined above. Unfortunately, this homology modeling approach per se is not viable here because no atomic structures are known of proteins similar enough to MBP. Nonetheless, we have created here three quantitative models of MBP starting from different premises and essentially constructed interactively residue by residue.
Previous Structural Models of Human MBP-In the 1980s, three-dimensional models of MBP were proposed independently by Stoner (27,29) and by Martenson (26,28). Stoner (27) proposed a structural model for MBP comprising a five-β-sheet backbone, whose regions were predicted using Chou-Fasman and other current secondary structure prediction algorithms and which were arranged in a Greek key formation with two small α-helical segments (Fig. 1a). Martenson (28) invoked hydrophobic packing considerations and developed several more complex MBP models based on orthogonally packed β-sheets. Immune studies and the properties of splice variants were used to formulate the relative arrangements of the β-strands, a series of rules that guide any modeling of MBP. Strand β3 had to interact with strand β2 and with amino acids further downstream (possibly on strand β4 or on strand β5). Strand β5 had to be on the exterior of the sheet, because a splice variant does not have this segment but still must have a similar tertiary structure. By this reasoning, it is unlikely that strand β5 is on the inside of the sheet. The β-sheet is mostly antiparallel because this arrangement is more stable than a parallel one. The folded-over sheets were the result of β-bulges, but no α-helices were included here. One of Martenson's six models is presented in Fig. 1b. Neither Stoner's nor Martenson's model ever existed entirely in a numerical form, i.e. as a file of atomic coordinates that could be visualized using molecular graphics programs. Moreover, these models were not unique in that many potential arrangements of the β-sheets and connections between them were possible. As part of our electron microscopical investigations of the structure of MBP (33), we required a manipulatable form of each structure for comparison with our results.
To begin with and partly as an academic exercise (in retrospect), we used a new secondary structure prediction method of Rost and Sander (34,35) based on sequence alignment and neural network prediction and available to us via an automatic mail server (36). The results (not shown) generally confirmed the Stoner and Martenson positioning of the β-sheets. However, many considerations other than computational secondary structure prediction (37) support the idea of the β-sheet backbone, including experimental circular dichroism data (38-40), and these arguments are presented well in the original papers (26-29). Our final decision was to remain conservative and retain residues 14-21 as β1 (β-strand 1), 37-45 as β2, 86-92 as β3, 109-116 as β4, and 149-157 as β5. We scanned the Brookhaven Protein Data Bank, a database of known protein structures derived by crystallography or NMR, for one comprising a flat, antiparallel β-sheet akin to that proposed by Stoner (27). The two best candidates found were excitation energy transfer bacteriochlorophyll A protein and severin. For both the Stoner (27) and Martenson (28) models, the putative β-sheet regions of human MBP were threaded into the bacteriochlorophyll A β-sheet. The putative α-helical regions for the Stoner model were generated de novo. The intervening segments in both models were allowed to form coils. In Fig. 2, we present space-filling representations of our creations of these two models. Notably, in both of these structures, a number of the more interesting sites of post-translational modification, such as Ser7, Thr98, and the various arginines, are all exposed on the surface.
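The strand assignments retained above fix the lengths of the connecting segments, which can be tabulated with a short sketch. The residue ranges are those stated in the text; the dictionary `STRANDS` and the function `loop_lengths` are hypothetical names introduced only for this illustration.

```python
# Strand boundaries (first residue, last residue) as retained in the text.
STRANDS = {
    "beta1": (14, 21),
    "beta2": (37, 45),
    "beta3": (86, 92),
    "beta4": (109, 116),
    "beta5": (149, 157),
}


def loop_lengths(strands):
    """Number of residues strictly between each pair of consecutive strands."""
    ordered = sorted(strands.items(), key=lambda item: item[1][0])
    loops = {}
    for (name_a, (_, end_a)), (name_b, (start_b, _)) in zip(ordered, ordered[1:]):
        loops[f"{name_a}-{name_b}"] = start_b - end_a - 1
    return loops


LOOPS = loop_lengths(STRANDS)

# Connecting segments ranked from longest to shortest.
RANKED = sorted(LOOPS, key=LOOPS.get, reverse=True)
```

With these boundaries the two longest connections are those between β2 and β3 and between β4 and β5, which are the loops whose placement matters most when the model is fitted to the overall dimensions of the protein.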
A New Structural Model of Human MBP-The Stoner and Martenson models of MBP represent thoughtful syntheses of biochemical data available on this protein. Both structures are plausible within the limitations of a rarefied computational representation. However, the shapes of these models cannot easily be reconciled with our derivation of the appearance of MBP from electron microscopical data (33). Our three-dimensional reconstruction has an outer circumference of approximately 15 nm and a thickness of 2.5 nm (roughly the difference between the outer and inner radii). The Stoner and Martenson models both have the two longest loop regions (between β-sheets β2 and β3 and sheets β4 and β5) on the same side of the molecule. As a result, the maximum length of these models is about 10 nm. Also, the Martenson model is too thick to fit into the experimentally determined volume. As a result, we constructed a new model to fit our electron microscopical reconstruction.
To begin our new model, we retained the idea of a β-sheet backbone envisaged by Stoner and Martenson but did not retain their strand order. The β-sheet coordinates were derived from the structure of severin (Fig. 3c) (41). We further reasoned on the basis of total length (circumference) considerations that the two long loops must be on opposite sides of the β-sheet. The β-sheet was placed in the center of the electron microscopical volume to allow the long loops to fit into it. Martenson's rules (based on immunological and other data and described above) are still consistent with approximately 60 arrangements of the β-sheets. Fortunately, the dimensions of the EM reconstruction constrained the number of potential arrangements of the β-sheets to two (Fig. 3, a and b). The first topology has two crossovers and an uncommon antiparallel β-sheet arrangement. We chose the second topology with only one right-handed (more common than left-handed in biological systems) crossover connection. The first topology with two crossovers would have made the molecule too thick, notwithstanding that one crossover would have been too short to traverse the needed distance. The β-sheet was oriented so that the right-handed crossover connection was on the outer surface. Surface "bumps" on the exterior fit the crossover better, and modifiable amino acids became accessible. Amino acids 5-11 were linked using (φ, ψ) angles derived from a recent NMR study (32).
In Fig. 4, we show the correspondence between our model and the electron microscopical reconstruction of bovine MBP/C1 in low salt buffer (33). This structure was an open "C" shape. A second reconstruction in higher salt buffer yielded a more compact form of the protein, but a form that could be seen to represent a closing in of the "C". The human MBP model could be fit into this new volume simply by cutting bonds to the loop regions, reorienting them, and resplicing. In Fig. 5, a and b, more detailed space-filling representations show the general shape and especially the modifiable sites of both of these new human MBP models. As noted above, the backbone of the structure is a β-sheet modeled after that of severin. Both severin and MBP are actin- and lipid-binding proteins (41-44). The triproline segment is fully exposed on the back surface of the protein, in a crossover loop between adjacent parallel strands. There is a positive congruence in that this region in the bacterial endo-β-N-acetylglucosaminidase F1 (2ebn) is also located in a crossover between parallel β-strands. In our human MBP model, many sites of post-translational modification are clustered around the triproline segment. It is tempting to speculate that the clustering of modifiable amino acids at such a position connecting two β-strands has structural importance.
We can suggest with somewhat more certainty that the reduced surface charge density upon citrullinization of arginines (as occurs in multiple sclerosis) will reduce the interaction with lipids in the myelin membrane, accounting for a certain amount of destabilization.
Given the limited resolution of the first electron microscopical reconstructions (33), the correspondence that could be achieved between the experimental data and our model is remarkable. Although we do not wish to encourage the practice of formulating atomic models of proteins based solely on such electron microscopical structures, this strategy appears to have been fruitful here for human MBP. This is a small protein for which no direct structural information has hitherto been available, yet for which a wealth of biochemical data could be exploited to initiate and guide the model building process. The envelope of the three-dimensional reconstruction within which we had to fit the atomic model was a valuable constraint, which reduced the number of possible topologies of β-strand and loop arrangement to only two. The extended "C" shape of the protein is credible given the peripheral membrane association of this protein, and its conformational flexibility in different conditions is also likely (38-40). The recent literature has examples of other proteins that have been described as "holy" (sic, meaning "holey", i.e. with a distinct nonglobular fold) (45-48). There is also a family of pleckstrin homology domains (49) identified in various membrane-associated and signal transduction proteins that resembles our MBP model and might have a significant relationship.
Conclusions and Future Directions-Multiple sclerosis is a human demyelinating disease. An autoimmune response to one or more of the protein components of myelin is thought to be part of the pathogenesis. It has been postulated that MBP is the agent of autoimmunity and that post-translational modifications of MBP play a key role in the demyelinating process at the molecular level. Knowledge of the tertiary structures of the MBP isoforms and charge isomers is essential to understanding the organization of the myelin membrane and the mechanisms of development of autoimmunity in multiple sclerosis. However, MBP has not been crystallized, and may never be crystallized, for high resolution x-ray diffractometry.
A three-dimensional model of MBP structure can now only be formulated with the assistance of structure prediction algorithms and on the basis of extensive biochemical and biophysical data that are available. In the accompanying paper (33), we have described our structural studies of bovine MBP charge isomer C1 associated with lipid monolayers. By electron microscopical angular reconstitution, we found bovine MBP/C1 to possess an overall "C" shape. In this paper, we used molecular modeling software to create quantitative atomic coordinate models of human MBP and to localize certain post-translational modifications relevant to multiple sclerosis. The three-dimensional electron microscopical reconstruction served as an envelope within which an atomic model comprising five β-sheets and a large proportion of irregular coil could be uniquely packaged. In this model, the most important modifiable amino acids are accessible to enzymes such as kinases. To this extent, the model is plausible.
The model for human MBP that we present here is the first of its kind for this very important protein. It was inspired by model-building exercises of the last decade, and we anticipate that details will change as new data are incorporated into it. Its value shall lie in its utility as a workbench for incorporating experimental evidence and for formulating experimentally testable hypotheses. One idea suggested here is that deimination of argininyl residues, with the accompanying loss of positive charge, does not destabilize MBP's tertiary structure per se but rather its interactions with negatively charged lipids.

FIG. 5. Ribbon and van der Waals representations of our model of human MBP viewed from different angles. a, a model to fit the three-dimensional electron microscopical reconstruction (an open "C" shape) from low salt buffer. b, a model to fit the three-dimensional electron microscopical reconstruction (a more compact form) from high salt buffer. Arginines subject to citrullinization are presented in red. The triproline segment forms part of the crossover connection, part of which is accentuated in light blue. Serines that are colored green are subject to phosphorylation. A threonine residue (Thr98) that is phosphorylated by a mitogen-activated protein kinase is in purple. Hydrogen atoms are colored white, and other atoms are colored yellow.