Identification of Phosphorylation Sites on Neurofilament Proteins by Nanoelectrospray Mass Spectrometry

Neurofilament (NF) proteins are intermediate filaments found in the neuronal cytoskeleton. Phosphorylation of these proteins is considered an important factor in the assembly of filaments and in the determination of filament caliber and stability. Mammalian neurofilaments are composed of three polypeptide subunits, NF-L, NF-M, and NF-H, all of which are phosphorylated. Here we used techniques for the mass spectrometric sequencing of proteins from polyacrylamide gels to analyze in vivo phosphorylation sites on NF-M and NF-L. Neurofilaments were isolated from rat brain and enzymatically digested in gel. The resulting peptides were analyzed and sequence data obtained by nanoelectrospray mass spectrometry. Four phosphorylation sites were found in the C-terminal domain of NF-M: serines 603, 608, 666, and 766. Two of these are found in lysine-serine-proline (KSP) motifs and two in the variant motifs glutamic acid-serine-proline (ESP) and valine-serine-proline (VSP). Serine 55 in NF-L was not found to be phosphorylated, which is consistent with the proposed role of phosphorylation and dephosphorylation of this site in early neurofilament assembly. The techniques used enable sequence data and characterization of posttranslational modifications to be obtained for each individual subunit directly from polyacrylamide gels.

Neurofilament (NF) proteins, a class of neuronal intermediate filament proteins, form the principal component of the axonal cytoskeleton. They are composed of three polypeptide subunits, NF-L (61 kDa), NF-M (95 kDa), and NF-H (115 kDa) (1-3), with apparent molecular masses of 68, 150, and 200 kDa on SDS-polyacrylamide gel electrophoresis, respectively. They share several structural features with all intermediate filament proteins: a highly conserved α-helical rod domain, essential for filament formation (4, 5), a short N-terminal head that may be involved in the regulation of filament assembly (6), and a C-terminal domain of variable length (7-9).
Neurofilament proteins (NFPs), especially NF-M and NF-H, are known to be highly phosphorylated in vivo (10-12). Most of the phosphate is associated with serine residues in KSP motifs within the C-terminal domains of the larger subunits (10, 11, 13-15). The phosphorylated tails of these proteins are thought to protrude as side arms from the assembled filament, enabling cross-linking between filaments and interactions with other cytoskeletal proteins. These interactions have been shown to be regulated by phosphorylation (16-19). Phosphorylation of the tail domains is also thought to increase spacing between filaments, hence regulating axonal caliber (20-24), and may in addition play a role in the control of axonal transport (24).
Neurofilaments are also phosphorylated on their N-terminal head domains (25, 26). This region of NF-L is known to be important in filament assembly (6, 27, 28), and phosphorylation of intermediate filaments at their head domains is a proven mechanism for the control of assembly (29). Serine 55 on NF-L has been found to be phosphorylated in vivo, yet it displays rapid turnover soon after NF-L synthesis (30). These findings have led to the suggestion that this site may be involved to some extent in the regulation of filament assembly and architecture (12, 30-32).
Abnormal phosphorylation of neurofilaments is associated with some neurodegenerative diseases such as motor neuron disease (amyotrophic lateral sclerosis) and the Lewy bodies characteristic of Parkinson's disease and Lewy body dementia (33)(34)(35). Accumulations of highly phosphorylated neurofilaments, normally found only in axons, are observed in perikarya and proximal axons in these conditions. This suggests that aberrant phosphorylation may play a role in the pathology of these diseases.
To understand the function and importance of neurofilaments as components of the neuronal cytoskeleton and their involvement in neurodegenerative disease, it is essential to comprehend the mechanisms by which these proteins are phosphorylated. Determination of the endogenous phosphorylation sites is therefore important as this information will lead to identification of the kinases involved.
Previously, the analysis of sites of NFP phosphorylation has relied largely on conventional Edman sequencing following lengthy chromatographic separations or two-dimensional phosphopeptide mapping of proteins radiolabeled with phosphate. Comparison of two-dimensional peptide maps of NF-L from optic axons of mice intravitreally injected with ³²P with those from in vitro labeled NF-L has been used to characterize serine 55 as an in vivo phosphorylation site on NF-L (30). This involved HPLC purification and subsequent Edman degradation of a phosphopeptide common to both samples followed by in vitro phosphorylation and sequencing of a corresponding synthetic peptide to locate the exact site of phosphorylation. Serine 473 has also been shown to be an in vivo phosphorylation site on NF-L (36) by Edman sequencing of HPLC-purified peptides from rat spinal cord corresponding to phosphopeptides labeled by in vitro phosphorylation. Six sites on the C-terminal domain of NF-M (37) have been identified by comparison of NF-M peptides from in vitro ³²P-labeled cytoskeletal preparations with those from metabolically labeled dorsal root ganglia. Peptides from unlabeled NF-M eluting at the same retention time as the labeled phosphopeptides were then sequenced by Edman degradation, and the phosphorylation sites were determined. Elhanany et al. (38) used a combination of microsequencing and mass spectrometry to identify nine endogenous phosphorylated KSP sites in the C-terminal region of rat NF-H.
Recently, a technique for the sequencing of proteins directly from polyacrylamide gels using electrospray (39) in combination with tandem mass spectrometry (40) has been developed (41). This allows sequence data and characterization of posttranslational modifications, using parent ion scans (42, 43), to be obtained from total digest mixtures using mass spectrometric techniques exclusively. Proteins can be uniquely identified with the use of peptide tags and data base searching (44).
We have used this technology in conjunction with small scale immobilized metal affinity chromatography (IMAC) (45, 46) to identify phosphopeptides in digest mixtures of NFPs and then to characterize the sites of phosphorylation. Although more than 80% of the NF-L sequence was covered, no endogenous phosphorylation sites were found, suggesting dephosphorylation of the N-terminal head domain during filament assembly. Four endogenous phosphorylation sites were found within the C-terminal domain of NF-M, including one that has not been previously reported.

EXPERIMENTAL PROCEDURES
Materials-Except where otherwise noted, all chemicals used were purchased from Sigma and were of the highest quality available. Hydroxylapatite (HTP) was from Bio-Rad Laboratories Ltd. (Hertfordshire, UK). Acetic acid and formic acid (AnalaR grade) were from BDH Laboratory Supplies (Merck Ltd., Leicestershire, UK). The proteolytic enzymes trypsin (bovine, sequencing grade), endoproteinase Asp-N (from Pseudomonas fragi, sequencing grade), and endoproteinase Glu-C (from Staphylococcus aureus strain V8, sequencing grade) were obtained from Boehringer Mannheim UK Ltd. (East Sussex, UK). For mass spectrometric analysis and gel spot preparation, HPLC grade methanol and acetonitrile (Rathburn Chemicals, Scotland) were used.
Immobilized Metal Affinity Chromatography-Empty miniature Protein Chemistry Systems (PCS) desalting columns (Hewlett-Packard, Cheshire, UK) were packed with chelating Sepharose high performance slurry (70 µl in 20% (v/v) ethanol, Pharmacia) and washed with water (2 ml) followed by 0.1 M acetic acid, pH 3.1 (solution A, 500 µl). A 0.1 M FeCl₃ solution (50 µl in solution A) was applied, followed by washing with solution A (500 µl) to remove excess iron. The dried peptide mixture from the gel digest was then loaded (dissolved in 50 µl of solution A), and the column was washed with solution A (500 µl). The phosphopeptides were eluted with 0.1 M Tris-HCl, pH 8.5 (300 µl), and the eluate was dried before being analyzed by nanoelectrospray mass spectrometry.
Nanoelectrospray (nanoES) Mass Spectrometry (MS)-Needles for nanoelectrospray mass spectrometry were made with a micropipette puller (Sutter Instrument Co., Novato, CA) from borosilicate glass capillaries (Clark Electromedical Instruments, Pangbourne, Reading, UK) as described by Wilm and Mann (49). They were gold-coated in a vapor desorption instrument. Dried protein digests were dissolved in 5% (v/v) formic acid and desalted on a miniature PCS column self-packed with ~20 µl of POROS R2 sorbent (PerSeptive Biosystems, Framingham, MA) as described (41). The sample was not eluted directly into the spraying needle but was dried and then taken up in 10 µl of spraying solution (50% (v/v) methanol, 1% (v/v) formic acid in water for positive ion mode or 50% (v/v) methanol, 5% (v/v) ammonia in water for negative ion mode), and 1 µl was inserted into the needle. Electrospray mass spectra were acquired on an API III triple quadrupole machine (Perkin-Elmer Sciex, Ontario, Canada) equipped with a nanoES ion source developed by Wilm and Mann (49, 50). Q1 scans were performed with a 0.1-Da mass step. For operation in the MS/MS mode, Q1 was set to transmit a mass window of 2 Da for both parent and product ion scans, and spectra were accumulated with 0.2-Da mass steps. Dwell time was 1 ms for all scans except for parent ion scans, where it was 3 ms. Resolution was set so that fragment masses could be assigned to better than 1 Da. Collision energy was individually tuned for each peptide for optimum MS/MS spectra. A new needle was used for each experiment. Spectra interpretation was performed using BioMultiView (Sciex) software.

RESULTS
Analysis of Tryptic Digests-NFPs purified from rat brain were run on SDS-polyacrylamide gel electrophoresis, and NF-M and NF-L were tryptically digested in gel. After desalting on POROS, analysis of the total digest mixtures of NF-M and NF-L was performed by nanoES mass spectrometry. Fragmentation of peptide ions by collision-induced dissociation (CID) tandem MS resulted in partial sequences being obtained that, in combination with the mass, were sufficient to characterize unambiguously the peptides from the known protein sequences (1,2,44).
Phosphopeptides were isolated from the digest mixture using a small scale IMAC technique, which takes advantage of the affinity of phosphopeptides for immobilized Fe³⁺ ions (45). Following small scale desalting, the samples were analyzed by nanoES MS/MS. Phosphopeptides were identified within the digest mixtures using scans for the parents of m/z 79 in negative ion mode (42, 43) and scans for the neutral loss of 49, [M − H₃PO₄ + 2H]²⁺, in positive ion mode (53). The parent ion scan shows ions that fragment to produce an ion at m/z 79 (the phospho group, PO₃⁻). The neutral loss scan shows the masses of precursor ions that lose the phosphate group as a neutral fragment.
The Q1 spectra of IMAC-purified samples showed mainly ions corresponding to phosphopeptides, in contrast to the total digest spectra (Fig. 1, A and B). A few non-phosphopeptides containing histidine residues were also seen, since histidine is thought to have some affinity for the packing used. The use of the IMAC column reduced the problem of electrospray ion suppression by reducing the complexity of the peptide mixture. This reduction increased the phosphorylated peptide ion signal strength and thus improved the quality of the product ion spectra.
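The two scan values quoted above follow from simple mass arithmetic, sketched below as an illustration rather than as part of the reported method. The atomic masses are standard average values supplied here as assumptions (they are not taken from the text), and `neutral_loss_mz` is a hypothetical helper name.

```python
# Average atomic masses (Da); standard reference values, assumed here,
# not figures quoted in the paper.
H, O, P = 1.00794, 15.9994, 30.97376

PO3_ION = P + 3 * O          # PO3- fragment monitored in the parent ion scan
H3PO4 = 3 * H + P + 4 * O    # neutral phosphoric acid lost on fragmentation


def neutral_loss_mz(neutral_mass: float, charge: int) -> float:
    """m/z shift observed when a precursor of this charge loses a neutral."""
    return neutral_mass / charge


if __name__ == "__main__":
    print(f"PO3- marker ion: m/z {PO3_ION:.1f}")
    for z in (1, 2, 3):
        shift = neutral_loss_mz(H3PO4, z)
        print(f"loss of H3PO4 from a {z}+ ion: {shift:.1f} m/z units")
```

A neutral loss of 49 therefore selects doubly charged precursors; phosphopeptides present only in other charge states would need a different offset.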
Identification of Serines 603, 608, 666, and 766 as Phosphorylation Sites on NF-M-Four phosphorylation sites, which must have been generated in vivo, were found within the C-terminal region of NF-M. Three of these, two in KSP motifs and one in the variant ESP motif, have been reported previously by Xu et al. (37) on the basis of metabolically labeled dorsal root ganglia. The fourth site has not been reported before in NF-M and lies within a VSP motif. Fig. 1 shows the identification of phosphopeptides in the NF-M tryptic digest. The total digest mixture was simplified by the use of IMAC (Fig. 1, A and B), and the resulting Q1 scan revealed several potential phosphopeptides (Fig. 1B). The parents of m/z 79 scan in negative ion mode was then used to further narrow the field of candidate phosphopeptides (Fig. 1C) to five. The neutral loss of 49 scan in positive ion mode (Fig. 1D) revealed an additional phosphopeptide and confirmed the presence of those identified in the parent ion scan.
The peptides and sites of phosphorylation were then identified by CID tandem mass spectrometry in positive ion mode, resulting in product ion spectra (Fig. 2). Sequence tags were constructed from the resulting fragment ion masses, and hence peptides were identified from the published sequence (Table I). Fragmentation of the 3+ ion of P1 (m/z 769.6) in positive ion mode gave the partial sequence ⁶⁰³pSPVP from doubly charged Y″ ions (Fig. 2A), which, from the known sequence, is sufficient to characterize the peptide and sites of phosphorylation as residues 601-620 with both serines phosphorylated (AK⁶⁰³pSPVPK⁶⁰⁸pSPVEEVKPKPEAK, Mr 2305.3). Fragments corresponding to the loss of ⁶⁰⁸pS were also seen as singly charged b ions.
MS/MS of the 3+ ion (m/z 742.1) of P2 resulted in doubly charged Y″ ions, giving the partial sequence SPVPK⁶⁰⁸pS, which confirmed the identity of the peptide as residues 601-620 with serine 608 phosphorylated (Mr 2225.4). This indicates that serine 603 is heterogeneously phosphorylated within the NF-M molecule.
The product ion spectrum of the 3+ ion of P3 (m/z 676.0) displayed doubly charged Y″ ion fragments corresponding to the sequence VPK and confirming the identity of the peptide as residues 603-620 with serine 608 phosphorylated (Mr 2026.2). This is the expected site for phosphorylation on this peptide since trypsin has cleaved at lysine 602, which it may not have done had serine 603 been phosphorylated.
Fragmentation of the 2+ ion of P4 (m/z 798.2) gave a full series of Y″ ions (Fig. 2B), thereby characterizing the peptide as residues 757-771 and confirming that serine 766 is phosphorylated (GVVTNGLDV⁷⁶⁶pSPAEEK, Mr 1594.6). The unphosphorylated form of this peptide was also characterized, again indicating heterogeneity of phosphorylation. The signal-to-noise ratio of the CID spectrum of P5 was not sufficient to produce any useful sequence information.
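To illustrate how a Y″ ion series localizes a phosphate, the sketch below computes singly charged Y″ ion m/z values for the P4 sequence given above. It is an illustration under stated assumptions, not the analysis software used in this work: the average residue masses, the proton mass, and the HPO₃ increment are standard reference values supplied here, and `y_ions` is a hypothetical helper.

```python
# Average residue masses (Da); standard reference values, assumed here.
RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "L": 113.1594, "N": 114.1038, "D": 115.0886,
    "K": 128.1741, "E": 129.1155,
}
WATER, PROTON, HPO3 = 18.0153, 1.00728, 79.9799


def y_ions(sequence: str, phospho_index: int) -> list[float]:
    """Singly charged Y'' ion m/z values, smallest fragment (y1) first.

    phospho_index is the 0-based position of the phosphorylated residue.
    """
    masses = [RESIDUE[aa] for aa in sequence]
    masses[phospho_index] += HPO3
    ions, running = [], WATER + PROTON
    # Build the series from the C terminus, adding one residue per ion.
    for mass in reversed(masses):
        running += mass
        ions.append(running)
    return ions


if __name__ == "__main__":
    series = y_ions("GVVTNGLDVSPAEEK", phospho_index=9)  # Ser-766
    for n, mz in enumerate(series, start=1):
        print(f"y{n:<2d} {mz:8.2f}")
    print(f"y6 - y5 = {series[5] - series[4]:.2f} (Ser + HPO3)")
```

The spacing between consecutive ions equals one residue mass, so the step that spans serine 766 is about 167 Da rather than the 87 Da of an unmodified serine, which is what pins the phosphate to that residue.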
CID MS/MS of the ion at m/z 546.4 corresponding to the 2+ ion of P6 gave a series of b ions (sequence VK) that enabled characterization of the peptide as KAE⁶⁶⁶pSPVKEK (residues 663-671, Mr 1095.1) with serine 666 carrying a phosphate.
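The Mr values quoted for the tryptic phosphopeptides can be checked from their sequences. The sketch below is a minimal illustration, assuming standard average residue masses and an increment of 79.98 Da per phosphate (HPO₃); these constants and the helper `average_mr` are not taken from the paper. Sequences for P2 and P3 are derived from the residue ranges stated above.

```python
# Average residue masses (Da); standard reference values, assumed here.
RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "L": 113.1594, "N": 114.1038, "D": 115.0886,
    "K": 128.1741, "E": 129.1155,
}
WATER, HPO3 = 18.0153, 79.9799


def average_mr(sequence: str, n_phospho: int = 0) -> float:
    """Average molecular mass of a peptide carrying n_phospho phosphates."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER + n_phospho * HPO3


# (sequence, number of phosphates, Mr as reported in the text)
REPORTED = {
    "P1": ("AKSPVPKSPVEEVKPKPEAK", 2, 2305.3),
    "P2": ("AKSPVPKSPVEEVKPKPEAK", 1, 2225.4),
    "P3": ("SPVPKSPVEEVKPKPEAK", 1, 2026.2),
    "P4": ("GVVTNGLDVSPAEEK", 1, 1594.6),
    "P6": ("KAESPVKEK", 1, 1095.1),
}

if __name__ == "__main__":
    for name, (seq, n, reported) in REPORTED.items():
        calc = average_mr(seq, n)
        print(f"{name}: calculated {calc:.1f}, reported {reported:.1f}")
```

Under these assumptions each calculated value agrees with the reported Mr to within about 0.2 Da.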
Analysis of Endoproteinase Glu-C and Asp-N Digests-NF-M and NF-L were digested in gel with endoproteinases Glu-C and Asp-N. Desalted samples were analyzed by nanoES MS/MS, and the peptides were identified, thus giving greater sequence coverage for each protein. Total sequence coverage gained for NF-L was 81% and for NF-M, 64% (Fig. 3).
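Sequence coverage of the kind quoted above is the fraction of residues that fall within at least one identified peptide. The sketch below shows one way to compute it from inclusive residue ranges; the function names are hypothetical, and the protein length must be supplied by the reader because it is not stated in this section.

```python
def covered_residues(ranges: list[tuple[int, int]]) -> int:
    """Count residues covered by at least one peptide (inclusive ranges)."""
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    return len(covered)


def percent_coverage(ranges: list[tuple[int, int]], protein_length: int) -> float:
    """Percentage of the protein sequence covered by the given peptides."""
    return 100.0 * covered_residues(ranges) / protein_length


if __name__ == "__main__":
    # Residue ranges of the NF-M phosphopeptides named in the text.
    phosphopeptides = [(601, 620), (603, 620), (757, 771), (663, 671), (600, 612)]
    print(covered_residues(phosphopeptides), "residues covered")
    # Toy example with a made-up 10-residue protein.
    print(percent_coverage([(1, 4), (3, 6)], protein_length=10), "% coverage")
```

Overlapping peptides from different proteases are counted once, which is why combining tryptic, Glu-C, and Asp-N digests raises coverage only where the new peptides reach previously unseen residues.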
Endoproteinase Glu-C-digested samples were subjected to  Table I). The signal-to-noise ratio of the CID spectra of these peptides was not sufficient to produce any useful sequence information. Fragmentation in positive ion mode of the ion at m/z 738.4 seen in the neutral loss scan (not shown) gave sequence data corresponding to residues 600-612, with serine 608 phosphorylated (P7, Table I).
Analysis of NF-L-Mass spectrometric sequencing covered 81% of the NF-L sequence; however, no phosphopeptides were found within the NF-L molecule. IMAC of the digest mixtures also failed to reveal any phosphopeptides. Upon phosphorylation with cAMP-dependent protein kinase, phosphorylated residues were identified within the NF-L molecule using this technique, which suggests that the absence of endogenous sites was not due to a technical problem. These findings seem consistent with the notion that NF-L is mainly phosphorylated at its head domain and that these phosphates are removed before assembly into filaments. The glutamic acid-rich tail region of NF-L containing 74 residues, including serine 473, which has been found to be phosphorylated in vivo (36), could not be located despite the use of several different proteases. This may be due to incomplete digestion of this region or to suppression of ionization of these peptides in the total digest mixture caused by the glutamic acid residues.

DISCUSSION
Phosphorylation of neurofilament proteins is considered an important factor in the assembly of filaments and in the determination of filament caliber and stability, and it may play a role in the pathology of several neurodegenerative diseases. Until now, conventional approaches have been employed in the analysis of phosphorylation of NFPs. These methods have involved metabolic labeling, either in vivo or of cultured neurons, and two-dimensional phosphopeptide mapping or lengthy chromatographic separations after various enzymatic digestions. Phosphopeptides have then been sequenced and sites located by the use of conventional protein sequencing. These methods nearly always require the purification to homogeneity of often limited quantities of the neurofilament subunits and separation of the digest mixtures and are limited by the sensitivity of Edman degradation.
Here we use techniques developed by Mann and co-workers (41) for the mass spectrometric sequencing of proteins directly from polyacrylamide gels, hence alleviating the necessity for subunit separation and enabling sequencing of peptides from a total digest mixture at lower sample amounts. The nature of the fragments generated by tandem MS of peptides allows unique characterization of the peptide, both from an unknown protein with the use of peptide tags and data base searching (44) and, as in this case, from a known protein sequence.
Phosphopeptides were identified within the digest mixtures by the use of parent ion (42, 43) and neutral loss (53) scans. Once distinguished from the peptide mixture, the phosphopeptides could be fragmented during the same experiment, yielding sequence data and hence uniquely characterizing the peptide and site of phosphorylation. The extremely low flow rate of the nanospray technique (49) requires just 1 µl of sample per 30 min of analysis and therefore enables the acquisition of a large quantity of sequence data with low sample consumption.
A small scale IMAC technique was developed using minidesalting columns, supplied empty (Hewlett-Packard), which were self-packed with a small amount of chelating Sepharose high performance slurry (Pharmacia). This enabled small scale separation of phosphopeptides from the digest mixture due to their affinity for Fe³⁺ ions (45), resulting in improved sensitivity for the phosphopeptide ions and a higher quality of sequence data. This is because, in a mixture of peptides, the more strongly ionizing components can often suppress the signal from those that ionize more weakly. Therefore, reducing the complexity of the mixture by IMAC also reduces the likelihood of suppression (46).
Using these methods, we have characterized four phosphorylation sites within the C-terminal domain of NF-M and covered 64% of the sequence by mass spectrometric sequencing (Fig. 3). Three of the sites have been reported previously: two in KSP motifs and one in the variant motif ESP. In addition, we have characterized a novel site found within a VSP motif that may also be phosphorylated by a proline-directed kinase.
We have also been able to gain some insight into the heterogeneity of phosphorylation of some of these residues. Serine 603 has been found in both its phosphorylated and non-phosphorylated forms, as has serine 766. There is evidence to suggest that the phosphorylation of NFPs is heterogeneous (54, 55) and that the state of phosphorylation changes as the proteins are transported down the axon (56). Our results are in agreement with these findings and identify particular sites displaying this heterogeneity, although we cannot unambiguously rule out loss of phosphate during neurofilament manipulation.
We have covered 81% of the NF-L sequence (Fig. 3) by mass spectrometric sequencing but found no sites of phosphorylation. The fact that no sites were found in the head region of NF-L seems to confirm the hypothesis that these sites undergo phosphorylation and dephosphorylation during filament assembly (12, 30, 31), since we started with assembled neurofilaments for the preparation. When we phosphorylated NF-L with cAMP-dependent protein kinase, which is known to phosphorylate the head domain of NF-L (30, 57), we were able to identify sites of phosphorylation by mass spectrometric sequencing, which suggests that the lack of endogenous sites found is a true result rather than a practical limitation.
Some regions of the sequence of both proteins, namely those that are rich in glutamic acid residues, remain uncovered. This may be caused either by incomplete digestion or by poor ionization of these peptides due to the presence of these residues.
In addition to phosphorylation, neurofilaments have been shown to be posttranslationally modified by O-linked N-acetylglucosamine (51, 52). The parent ion scan approach may also be used in the identification of glycopeptides in a digest mixture (42), as N-acetylhexosamines give a characteristic oxonium ion at m/z 204. We have used this scan for parents of m/z 204 in the analysis of NF-M and NF-L but found no evidence of glycopeptides. This may be due to the relatively low amounts of these species within neurofilaments (51, 52) compared with those modified by phosphorylation (11).
The sensitivity of this technique, coupled with the ease and speed of sample preparation and analysis, should enable NFPs and other proteins to be analyzed for posttranslational modifications from a variety of sources. These may include cells stimulated to activate signal transduction cascades and tissue from diseases such as motor neuron disease.