Determination of the Complete Amino Acid Sequence for the Coat Protein of Brome Mosaic Virus by Time-of-Flight Mass Spectrometry

Time-of-flight mass spectrometry (TOFMS) has been applied to determine the complete coat protein amino acid sequences of a number of distinct brome mosaic virus (BMV) isolates. Ionization was carried out by both electrospray ionization and matrix-assisted laser desorption/ionization (MALDI). After determining overall coat protein masses, the proteins were digested with trypsin or Lys-C proteinases, and the digestion products were analyzed in a MALDI QqTOF mass spectrometer. The N terminus of the coat protein was found to be acetylated in each BMV isolate analyzed. In one isolate (BMV-Valverde), the amino acid sequence was identical to that predicted from the cDNA sequence of the “type” isolate, but deviations from the predicted amino acid sequence were observed for all the other isolates analyzed. When isolates were propagated in different host taxa, modified coat protein sequences were observed in some cases, along with the original sequence. Sequencing by TOFMS may therefore provide a basis for monitoring the effects of host passaging on a virus at the molecular level. Such TOFMS-based analyses assess the complete profiles of coat protein sequences actually present in infected tissues. They are therefore not subject to the selection biases inherent in deducing such sequences from reverse-transcribed viral RNA and cloning the resulting cDNA.

Among the parameters characterizing a virus are the mass and the amino acid sequence of the viral coat protein. Molecular masses have traditionally been determined by gel electrophoresis, a method whose accuracy and resolution cannot distinguish proteins that differ in mass by less than ~1%. The amino acid sequence provides more definitive information, but most sequences currently in the literature have been deduced from nucleic acid sequences of specific cDNA clones. Thus they fail to take into account changes that are not defined by the nucleic acid, such as post-translational modifications. They also fail to provide a complete profile of the viral coat protein population actually present in infected tissue.
With the development of electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI), as well as new types of mass analyzers, mass spectrometry (MS) now affords a rapid and efficient approach for obtaining more detailed information about virus coat protein sequences in infected tissue.
The simplest type of MS observation determines the mass of the coat protein subunit. This can be carried out with an accuracy of a few daltons (in a few tens of kilodaltons), which is usually sufficient to distinguish coat proteins of different viruses or different isolates of the same virus species (15). However, much more information can be deduced by digesting the protein with specific endopeptidases and analyzing the digestion products by MS (i.e. peptide mapping). These analyses can often define the parts of the protein sequences that agree with those predicted from nucleic acid data and thus identify the regions where the amino acid sequence differs from prediction.
Still more detailed information can be obtained from tandem mass spectrometry (MS/MS), in which a given proteolytic fragment ion is selected by one mass analyzer and is forced to collide with the molecules of a target gas. The resulting daughter ions are then characterized by a second mass analyzer. This increases the analytical power of MS and allows the rapid determination of amino acid sequences from multiple protein specimens, even if they are closely related.
We have applied these approaches to investigate a number of plant viruses. Such viruses are of considerable economic importance, but in addition, the use of plant viruses as model systems for basic investigations has several advantages. (a) Propagation and maintenance are easier than for human and animal viruses, allowing faster experiment cycles. This is particularly important in experiments designed to investigate the link between altered virulence achieved by serial passage through alternative hosts and specific changes in viral coat protein amino acid sequence (see the example below). (b) Compared with many human and animal viruses, plant viruses pose little risk to humans. (c) There is little difficulty in meeting requirements for health and care of hosts to be infected, and experiments can accommodate high rates of host destruction if necessary. "Cruelty to plants" has not yet captured the attention of placard-waving demonstrators. (d) Most plant viruses are considerably less complex than most human, animal, and ...
We have determined the m/z ratio (~27,000) for intact virions of brome mosaic virus (BMV) (15,16), measured the coat protein masses of more than 20 plant virus isolates by time-of-flight mass spectrometry (TOFMS) (17), and subsequently determined the complete amino acid sequences of several groups of ssRNA plant viruses (18-22). These measurements have allowed us to identify variations among individual isolates. Analyses of such variations may point to evolutionary relationships, whether the variations are caused by post-translational modifications, deletions, point mutations, or other mechanisms.
* The work at the University of Manitoba was supported by grants from the Natural Sciences and Engineering Research Council of Canada and Grant GM59240 from the National Institutes of Health.
As an illustration of how these methods can be applied to closely related virus isolates, we describe here TOFMS measurements on a group of geographic isolates of brome mosaic virus (BMV). This virus is the type member of the Bromoviridae (23) and was one of the first multipartite-genome ssRNA plant viruses to have its genome completely sequenced (24-26). It is found in the Great Plains of North America and has also been reported from Europe and western Asia (27). In nature, BMV predominantly infects grasses and cereals and may cause sporadic crop losses (28). It is unusual in its ability to also infect an array of experimental broadleaf hosts (27).

EXPERIMENTAL PROCEDURES
Virus Isolates and Sample Preparation-We first examined the BMV isolates listed in Table I. BMV-P and BMV-N were obtained from the Cereal Research Center collection of virus isolates (28); isolates investigated subsequently, such as BMV-P* and -N*, were derived from these in 1999 by repropagation in alternate hosts. The other isolates listed (BMV1 to BMV4) were supplied to the Manitoba TOFMS laboratory as coded samples from the collection of Kansas State University; their subsequent identification (after the analyses were carried out) is listed in the next column. Care was taken to propagate and purify the BMV isolates separately and to eliminate the possibilities of cross-contamination. Virus purification was carried out according to published methods (28,29).
After the removal of low molecular weight impurities (<5000 Da) by centrifugal filtration, the purified coat protein was lyophilized to determine its amount. A 1% (w/w, enzyme/protein) trypsin or endoproteinase Lys-C solution was added to the protein solution (1 g/l) in 25 mM ammonium bicarbonate, followed by overnight enzymatic digestion at 37 °C. The reaction was terminated by freezing, and the products were stored at −20 °C for subsequent MALDI and ESI analyses.
Mass Spectrometry-Measurements of the undigested protein masses were performed on MALDI/TOF (32) and ESI/TOF (33) mass spectrometers. Some of the initial measurements of the proteolytic fragments produced by digestion with trypsin or Lys-C proteinases were carried out on the same instruments. However, mass determinations of proteolytic fragments and MS/MS measurements on the daughter ions in the more difficult cases were made on the Manitoba/Sciex prototype tandem quadrupole/TOF mass spectrometer (QqTOF) (34) coupled to a MALDI source (35,36), shown in Fig. 1. The MALDI matrix consisted of 2,5-dihydroxybenzoic acid, and argon or nitrogen was used as the collision gas; energies of 50-180 eV were applied. For comparison, monoisotopic mass values of the peptide fragments from either enzymatic digestions or collision-induced dissociation (CID) were calculated using the computation program ProMac (Sciex, Concord, ON).
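The kind of calculation mentioned here (predicting monoisotopic fragment masses for an enzymatic digest) can be sketched as follows. This is only an illustration, not ProMac's implementation: the residue masses are standard monoisotopic values, the function names `digest` and `mh_plus` are hypothetical, missed cleavages and the proline exception for trypsin are ignored, and the test segment is residues 20-41 of BMV-P as assembled from the fragment sequences quoted in the Results (RNRR, RTAR, and TARVQPVIVEPLAAGQGK).

```python
# Standard monoisotopic residue masses in Da (assumed reference values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the residue sum of a free peptide
PROTON = 1.00728   # charge carrier for a singly protonated ion


def digest(seq, cleave_after):
    """Cut after every residue in `cleave_after` (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in cleave_after:
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


# Residues 20-41 of BMV-P, pieced together from fragments quoted in the text.
segment = "RNRRTARVQPVIVEPLAAGQGK"

for enzyme, sites in (("trypsin", "KR"), ("Lys-C", "K")):
    for pep in digest(segment, sites):
        print(f"{enzyme:8s} {pep:24s} {mh_plus(pep):10.3f}")
```

In practice fragments with missed cleavages (such as 24-41 itself) are also observed, so a fuller version would enumerate those as well.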
In addition, some ESI measurements, including parent ion scans (37,38), were made on a Micromass Quattro-LC ESI triple quadrupole instrument. Here the parent ion scans were carried out for arginine, i.e. the third quadrupole was set to detect m/z 175 (diagnostic of Arg), while the first quadrupole was scanned, thus yielding an m/z spectrum of all parent ions containing arginine. In addition, such scans often give better signal/background ratios than ordinary MS spectra.

TABLE I footnotes.
a Measured MALDI masses were obtained from the measured m/z values of the MH+ ions, and measured ESI masses were obtained from the deconvoluted ESI mass spectra.
b Calculated molecular masses were obtained from the measured sequences taking account of acetylation at the N-terminus and the other modifications observed (see text).

Preliminary Characterization of the BMV Isolates
Typical MALDI and ESI spectra of the intact protein subunits are shown in Figs. 2 and 3. The measured average masses for the coat proteins of the isolates (Table I) all differed from the value of 20,253 Da calculated from the published coat protein sequence of the type isolate (24-26) (derived from the cloned nucleic acid sequence, assuming deletion of the N-terminal methionine), and the masses of most isolates differed from each other. Clearly a number of modifications in the coat proteins had occurred. To define these modifications, we carried out a series of MS and MS/MS measurements after proteolytic digestion of the proteins, as described above. These digests provided progressively more detailed information that finally made it possible to characterize the differences among the BMV isolates.
In the initial protein sequencing experiments, ESI/TOF MS was used to analyze tryptic digests of BMV-P. Peptide mapping and MS/MS measurements on the tryptic fragments led to confirmation of the type sequence (derived from the published cDNA sequence of BMV (24-26)) for 180 of 189 amino acids (17). The undefined residues were contained within two short regions near the N terminus (residues 2-8 and 20-26), and we deduced from the measurements that these stretches included at least two differences from the type sequence.
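For readers unfamiliar with how a protein mass is recovered from an ESI spectrum, the underlying arithmetic can be sketched as below. This is a simplified illustration, not the deconvolution algorithm used by the instrument software: it assumes protonated ions only, uses two simulated adjacent charge-state peaks, and takes 20,265 Da (the BMV-P value quoted later in the text) as the test mass. The function names are hypothetical.

```python
PROTON = 1.00728  # mass of a proton in Da


def mz(mass, z):
    """m/z of a molecule of neutral mass `mass` carrying z protons."""
    return (mass + z * PROTON) / z


def charge_from_adjacent(p_low_z, p_high_z):
    """Charge z of the peak at p_low_z, given the neighbouring peak at
    p_high_z that carries z + 1 charges (so p_low_z > p_high_z).

    From p_low_z = M/z + H and p_high_z = M/(z+1) + H it follows that
    z = (p_high_z - H) / (p_low_z - p_high_z).
    """
    return round((p_high_z - PROTON) / (p_low_z - p_high_z))


def neutral_mass(peak, z):
    """Neutral mass implied by a peak of known charge."""
    return z * (peak - PROTON)


# Simulate two neighbouring charge states of a 20,265 Da protein.
p15, p16 = mz(20265.0, 15), mz(20265.0, 16)
z = charge_from_adjacent(p15, p16)
print(z, round(neutral_mass(p15, z), 3))
```

A real deconvolution combines many charge states and weights them by intensity; the two-peak version only shows why a series of multiply charged ions determines the mass.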
To resolve this uncertainty, further digestions of the BMV-P samples (with Lys-C proteinase as well as trypsin) were carried out, and the masses of the peptide fragments from each digest set were measured by MALDI as well as ESI mass spectrometry. MS spectra from BMV-P (Lys-C digest) are shown in Figs. 4 and 5, and the observed fragments are listed in Table II.
Consistent with the initial analyses, no ions corresponding to the predicted masses of the (2-8) and (20-26) fragments were found. However, a prominent singly charged ion could be observed at m/z = 679.4 in the ESI spectrum, although not in its MALDI counterpart, and we realized that its mass is 42 Da greater than the predicted mass of the fragment encompassing residues 2-8. This is consistent with acetylation at the N terminus, a common post-translational modification. The acetylation was confirmed by the MS/MS spectrum of this parent ion shown in Fig. 6, which contains a complete set of acetylated b ions together with a complete set of y ions, of which only y7 (i.e. the parent ion [M+H]+) is acetylated. Moreover, the diagnostic ion at m/z = 679.4 is present in the ESI spectra from the digests of all samples tested, consistent with N-terminal acetylation of the BMV coat protein (and deletion of the N-terminal methionine) for all the BMV isolates. This observation proved to be the key that allowed us to determine the complete coat protein sequences of all the isolates examined.
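The 42 Da shift can be checked from elemental composition. The sketch below assumes standard monoisotopic atomic masses and the usual net change for acetylation (a hydrogen replaced by COCH3, i.e. a gain of C2H2O); the variable and function names are illustrative only.

```python
# Monoisotopic atomic masses in Da (assumed reference values).
ATOM = {"C": 12.0, "H": 1.007825, "O": 15.994915}


def formula_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(ATOM[element] * count for element, count in formula.items())


# Acetylation replaces one H by COCH3: net gain of C2H2O.
ACETYL_SHIFT = formula_mass({"C": 2, "H": 2, "O": 1})

observed_mh = 679.4  # ESI ion reported in the text
implied_unmodified_mh = observed_mh - ACETYL_SHIFT

print(f"acetyl shift:               {ACETYL_SHIFT:.4f} Da")
print(f"implied unmodified [M+H]+:  {implied_unmodified_mh:.1f}")
```

Because b ions contain the N terminus and y ions (other than the full-length one) do not, a shift confined to the b series is what localizes the modification to the N-terminal residue.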

Determination of the Coat Protein Sequences of the Isolates
BMV-V-This isolate is the simplest case. The N terminus of its coat protein is acetylated, but its mass and the masses of its peptide digest fragments are otherwise identical with those deduced from the published type sequence (24-26), consistent with the absence of any other modification. MS/MS measurements confirmed this result, as well as the absence of zero-sum mutations.
FIG. 2. The measurement was performed on the MALDI/TOF mass spectrometer (32) using myoglobin (16,952 Da) as an internal standard and sinapinic acid as matrix.

FIG. 3. ESI/TOF m/z spectrum of the BMV-P coat protein. The protein solution was prepared in methanol/water solution (v/v, 1:1) containing 5% acetic acid at pH 2.5. The m/z spectrum shown was acquired on the ESI/TOF mass spectrometer (33), and a deconvoluted mass spectrum is shown in the inset.

BMV-P-Prominent ions at m/z = 445 are found in both ESI and MALDI spectra (Figs. 4 and 5) and in the ESI parent ion spectrum (Fig. 6), consistent with the predicted mass of residues 20-22, and thus reducing the unknown region to residues 23-26. The only alteration in that region consistent with the overall mass is a substitution of Arg for Trp at residue 23, and this interpretation is supported by the observation of ions at m/z = 347.0 (sequence 24-26, TAR), 503.2 (sequence 23-26, RTAR) and 601.4 (sequence 20-23, RNRR) (see also BMV-T measurements below). The modification is consistent with a single base change of the nucleic acid sequence (UGG to CGG). Together with acetylation at the N terminus, it then accounts fully for the observed mass differences (Tables I and II) between BMV-P and the published type sequence (24-26).
It is interesting to note that Wang et al. (39,40) have reported a BMV isolate from Nebraska whose sequence is identical with that of BMV-P, indicating that the same mutation has arisen at different geographic locations.
BMV-T-The ions mentioned above are also observed in the MS spectra of the BMV-T isolate. In addition, they appear in the BMV-T parent ion scan for m/z = 175, an Arg marker (Fig. 7). However, the most convincing evidence for the Trp → Arg modification was obtained when fragments from a Lys-C digestion of BMV-T (pv47) were analyzed in the MALDI-QqTOF instrument (Table III), enabling high mass accuracy to be obtained for both parents and daughters. The ion observed at m/z = 3685.106 has a mass 29.973 Da smaller than the value deduced from the published nucleotide sequence for fragment 9-41, consistent with the modification Trp → Arg (calculated mass difference Δm = 29.978 Da). MS/MS measurements confirmed that Trp had indeed mutated to Arg at residue 23 in BMV-T, as well as in BMV-P.
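The calculated mass differences quoted for the substitutions in this paper can be reproduced from residue masses. The following is a minimal sketch assuming standard monoisotopic residue masses; `delta` and `candidates` are hypothetical helper names, and the 10 mDa tolerance is chosen only for illustration.

```python
# Standard monoisotopic residue masses in Da (assumed reference values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def delta(old, new):
    """Mass change (Da) when residue `old` is replaced by residue `new`."""
    return MONO[new] - MONO[old]


def candidates(observed_shift, tol):
    """All single substitutions whose mass change lies within tol of the shift."""
    return [
        (old, new)
        for old in MONO
        for new in MONO
        if old != new and abs(delta(old, new) - observed_shift) <= tol
    ]


for old, new in (("W", "R"), ("V", "I"), ("A", "V"), ("A", "T")):
    print(f"{old} -> {new}: {delta(old, new):+.3f} Da")

# Substitutions compatible with the -29.973 Da shift seen for fragment 9-41.
print(candidates(-29.973, 0.01))
```

With these tabulated masses, Glu → Val falls within the same 10 mDa window as Trp → Arg, which illustrates why the assignment to residue 23 rests on the MS/MS data rather than on the parent mass alone.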
Another anomalous fragment ion was observed at m/z = 2886.503 (Table III). This is 14.024 Da larger than the mass of the deduced fragment encompassing residues 166-189, consistent with the Val → Ile/Leu substitution at residue 167 discussed below.
BMV-W-The overall mass of BMV-W is smaller than that of BMV-P by about 100 Da (Table I), and a prominent doubly charged ion appears in the parent ion spectrum (Fig. 7) of BMV-W at m/z = 868.2, corresponding to a mass of 99 Da less than the calculated mass of fragment 24-41. The mass of this ion was determined accurately by a MALDI QqTOF measurement of the tryptic digest, which yielded the spectrum shown in Fig. 8. In this spectrum, the ion corresponding to fragment 24-41 (expected mass 1834.055 Da) is replaced by another peak at m/z = 1734.981, with a monoisotopic mass 99.074 Da smaller than predicted. Because a valine residue has a mass of 99.068 Da, this suggests that one of the three valines in the segment (TARVQPVIVEPLAAGQGK) has been deleted. MS/MS measurements on the m/z = 1735 ion, shown in Fig. 9a, indicated that valine residues 30 and 32 were both present, but at first inspection a deletion of Val-27 appeared to give good agreement between the MS/MS data and calculations, as shown in Table V. Indeed there would be almost perfect agreement if the mass were measured to only one or two decimal places (typical accuracies for some other methods of MS measurement). However, closer examination showed that all the b-ion mass differences (observed − calculated) are negative, with an average Δ = −11 mDa, whereas the average Δ for the y ions is only −0.1 mDa (note that the latter figure is much less sensitive to residue assignment errors near the N terminus). We then realized that another possibility lay in replacing Arg with Gly at residue 26, which gives an almost identical calculated mass change (Δm = 99.080 Da). This stimulated a more careful search of the MS/MS spectrum of Fig. 9a, resulting in the discovery of the small peaks shown in Fig. 9b at m/z = 230.118 and 1505.853 (initially unassigned), corresponding to b3 and y15 in the breakup of the peptide with the R → G substitution. Furthermore, this substitution yields the daughter ion masses shown in Table VI, where the b-ion mass differences now alternate between positive and negative, with an average Δ = 0.4 mDa, a considerable improvement. Thus we conclude that residue 26 in BMV-W has undergone an R → G mutation, corresponding to a single genetic code change of AGG to GGG in the nucleic acid sequence.
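The two competing hypotheses for fragment 24-41 can be compared numerically. The sketch below assumes standard monoisotopic residue masses and singly protonated b and y ions; the function names are illustrative, and the calculation is not a reproduction of Tables V and VI.

```python
# Monoisotopic residue masses (Da) for the residues in fragment 24-41.
MONO = {
    "T": 101.04768, "A": 71.03711, "R": 156.10111, "G": 57.02146,
    "V": 99.06841, "Q": 128.05858, "P": 97.05276, "I": 113.08406,
    "E": 129.04259, "L": 113.08406, "K": 128.09496,
}
WATER, PROTON = 18.01056, 1.00728


def mh_plus(pep):
    """Monoisotopic [M+H]+ of the intact peptide."""
    return sum(MONO[aa] for aa in pep) + WATER + PROTON


def b_ions(pep):
    """b1..bn: N-terminal fragments, residue sum plus one proton."""
    total, out = PROTON, []
    for aa in pep:
        total += MONO[aa]
        out.append(total)
    return out


def y_ions(pep):
    """y1..yn: C-terminal fragments, residue sum plus water plus one proton."""
    total, out = WATER + PROTON, []
    for aa in reversed(pep):
        total += MONO[aa]
        out.append(total)
    return out


val_deleted = "TARQPVIVEPLAAGQGK"    # hypothesis 1: Val-27 deleted
arg_to_gly = "TAGVQPVIVEPLAAGQGK"    # hypothesis 2: Arg-26 -> Gly

for name, pep in (("Val-27 deleted", val_deleted), ("Arg-26 -> Gly", arg_to_gly)):
    print(f"{name:15s} [M+H]+ = {mh_plus(pep):.3f}  "
          f"b3 = {b_ions(pep)[2]:.3f}  y15 = {y_ions(pep)[14]:.3f}")
```

The two parent masses differ by only about 11 mDa, which matches the size of the systematic b-ion offset reported for the deletion hypothesis, whereas b3 differs by roughly 99 Da between the two candidates; only the Arg → Gly peptide predicts ions near the observed 230.118 and 1505.853.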

Effect of Change of Propagation Host on BMV Coat Protein Sequences
A change in virulence of a virus by passaging through a different host is a well established phenomenon (41). It is interesting to see if such passaging is accompanied by a change in the coat protein sequence, a possible explanation of the change in virulence.
In this case, the BMV-P isolate had been propagated in Little Club wheat in 1989 (28), after which the purified lyophilized virus was used for the ESI mass measurement listed in Table I. To provide material for additional analyses, BMV-P was propagated in 1999 from the same source used in 1989 (preserved in the Cereal Research Center collection), except that BW155 wheat was used, as Little Club was no longer readily available. This repropagated virus is denoted as BMV-P*.
Two distinct masses appeared in the deconvoluted ESI spectrum of BMV-P* (Fig. 10). The minor peak (at 20,265 Da) corresponded to the one previously observed for BMV-P, but the predominant peak had a mass indistinguishable from that previously determined for BMV-T (20,279 Da), suggesting a mutation of Val to Ile or Leu at position 167, as deduced above for BMV-T. This was confirmed by mass mapping and by MS/MS measurements (not shown).
A further change in the coat protein sequence of BMV-P was observed when the propagation host was changed from BW155 wheat (BMV-P*) to AC Assiniboia oat (yielding BMV-P*2). In addition to peaks corresponding to sequences identical to BMV-P (20,265 Da) and BMV-T (20,279 Da), BMV-P*2 had a new peak at 20,315 Da (Table VII). Analysis by tandem QqTOF mass spectrometry showed that this new sequence had a mutation of Leu to Phe at position 35, again accounted for by a single nucleic acid base change (CUC to UUC). When the BW155 wheat propagation host (BMV-P*) was changed instead to Seneca maize (generating BMV-P*3), the same three peaks were observed, but there was a higher proportion of the 20,315 Da peak. After BMV-P*3 was in turn inoculated on TR241 barley and propagated (generating BMV-P*4), the purified virus no longer contained the protein of 20,315 Da, but had a main peak at 20,279 Da and a new small peak at 20,295 Da. The sequence of the 20,295 Da peak was identical to that of BMV-V.
A similar phenomenon was observed when BMV-N, like BMV-P, was propagated in 1999 in BW155 (rather than Little Club) wheat, yielding BMV-N*. Two protein peaks were observed in the deconvoluted ESI spectrum of BMV-N* (Table VII). The larger peak (20,194 Da) corresponded to the mass previously measured for BMV-N, but there was a new peak (at 20,225 Da) whose mass did not correspond to that of any of the previously analyzed BMV isolates. The MS spectrum of a BMV-N* Lys-C digest was similar to that for BMV-W with two exceptions. First, the monoisotopic ion corresponding to residues 9-15 in BMV-W (and BMV-P, see Table II) appeared at m/z = 946.542 instead of the calculated value of 918.506, and MS/MS measurements (not shown) indicated that this was caused by a substitution of Ala → Val at residue 12 (corresponding to a single base change GCG to GUG). Second, the ion at m/z = 1904.984 corresponding to residues 112-130 in BMV-W (and BMV-T, see Table III) was present, but there was an additional peak at m/z = 1934.993. MS/MS measurements on this ion (not shown) indicated that it also corresponded to residues 112-130, but with an Ala → Thr substitution at position 122 (corresponding to the single base change GCA to ACA). Thus the 30.009 Da mass difference observed in the 112-130 doublet (calculated separation 30.011 Da) is accounted for, as well as the difference in the overall masses between BMV-N and BMV-N*.
BMV-N appeared less responsive than BMV-P to further changes in propagation host. After the initial set of changes induced by the use of BW155 rather than Little Club wheat (Table VII), which generated BMV-N*, no further changes were observed when the propagation host was subsequently shifted from BW155 wheat to AC Assiniboia oat, then from BW155 wheat to Seneca maize, and finally re-propagated in Seneca maize.

DISCUSSION
Mass Spectrometric Methods-The measurements reported here were carried out over a period of several years (15-20), during which techniques and available instrumentation improved greatly. In particular, the mass measurement accuracy for both proteolytically generated fragments and daughter ions improved by about two orders of magnitude from the original observations in 1996 to the recent measurements with the QqTOF instrument. This led to a marked increase in the level of confidence in the sequence assignments, notably in distinguishing between Val deletion and Arg → Gly substitution in BMV-W, as described above. In that case, the ability to obtain a mass accuracy better than ~10 mDa for the daughter ions was clearly important.
In addition, it is useful to note that the peak that defined the N-terminal acetylation for all the BMV samples was observed only in the ESI spectrum, not in the MALDI spectrum, emphasizing the advantage of having both modes of ionization available (42,43).
Significance of Polymorphism in BMV Coat Protein (CP) Sequences-Coat protein molecular weights inferred from the sequence analysis of cDNA clones, while accurate, do not show the complete profile of the viral coat protein population actually present in infected plant tissue. Comparisons of the CP sequences of a set of distinct BMV isolates not only show consistent discrete polymorphisms, but also identify a specific host-attenuation effect on the profile of the CP population of the BMV-N and BMV-P isolates. If CP sequences of defined BMV isolates remained invariant, we would expect the type isolate BMV-T (pv47) CP sequence to be identical with that reported in the literature (27,29). Instead, the BMV-T (pv47) sequence differs from the reported one by a Trp → Arg substitution at position 23 and a Val → Ile/Leu substitution at position 167, as indicated above. To ensure that this was not because of a mix-up of the isolates, the virus purifications and TOFMS analyses were repeated using double-blind coding (BMV-1), and the same results were obtained. The CP sequence of the BMV-V isolate (30), however, was identical with that previously reported for BMV-T (pv47) (29). The fact that BMV-P and the Nebraska isolate reported by Wang et al. (39,40) have identical CP sequences emphasizes that similar mutations may occur independently at different geographic sites. Fig. 11 shows the modifications observed in the isolates of Table I. These probably arise because replication of viral RNA occurs at a high error rate (10⁻⁴ per cycle), much higher than that of DNA replication. There is thus a numerically large pool of viruses with minor mutations produced with each round of viral replication; minor mutations are those that do not affect critical functions of viral replication, packaging, or transport. A particular minor mutant may easily gain predominance after hundreds of replication cycles, even though it is only favored slightly in a given host.

FIG. 10. Change in the mass spectrum of BMV-P produced by a change in propagation host. Proteins were dissolved in methanol/water solution (v/v, 1:1) containing 5% acetic acid, and m/z measurements were carried out on the ESI/TOF mass spectrometer (33). The deconvoluted mass spectra are shown. A, BMV-P*, i.e. after the host change to BW155 wheat; B, BMV-P before the host change, i.e. propagated in Little Club wheat.
The inability to distinguish whether Val is substituted by Ile or Leu at residue 167 in BMV-T is a chronic difficulty in MS measurements with low energy CID. Because Ile and Leu have the same mass, MS can only distinguish between them by methods that are sensitive to the detailed structure of the side chains, such as high energy CID. Nevertheless, in this case we do have some additional information available, the initial nucleic acid sequence (GUC). Substitution of Ile requires a codon change from GUC to AUC (i.e. G → A, a purine → purine transition), whereas substitution of Leu requires a change from GUC to CUC (i.e. G → C, a purine → pyrimidine transversion). In general, transitions are significantly more probable than transversions (44), consistent with examinations of viral coat proteins; 20 of 28 nucleic acid substitutions that we have observed in these compounds have been transitions. We conclude that the substituent at residue 167 in BMV-T is probably Ile.
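The transition/transversion argument can be made explicit with a small classifier. This is an illustrative sketch only: the function name `classify` is hypothetical, and the codon pairs are the single-base changes quoted in this paper.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def classify(old_codon, new_codon):
    """Classify a single-base RNA codon change as a transition or transversion."""
    diffs = [(a, b) for a, b in zip(old_codon, new_codon) if a != b]
    if len(old_codon) != len(new_codon) or len(diffs) != 1:
        raise ValueError("expected codons differing at exactly one position")
    pair = set(diffs[0])
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Codon changes quoted in the text (residue assignments as given there).
changes = {
    "Trp23 -> Arg": ("UGG", "CGG"),
    "Arg26 -> Gly": ("AGG", "GGG"),
    "Leu35 -> Phe": ("CUC", "UUC"),
    "Ala12 -> Val": ("GCG", "GUG"),
    "Ala122 -> Thr": ("GCA", "ACA"),
    "Val167 -> Ile (candidate)": ("GUC", "AUC"),
    "Val167 -> Leu (candidate)": ("GUC", "CUC"),
}

for label, (old, new) in changes.items():
    print(f"{label:28s} {old} -> {new}: {classify(old, new)}")
```

Of the changes listed, only the hypothetical Val → Leu route requires a transversion, which is the basis for preferring Ile at residue 167.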

Role of Mass Spectrometry in Plant Virus Research-Analysis by MALDI or ESI mass spectrometry of the viral coat protein is an approach that can distinguish virus isolates or closely related virus strains rapidly and definitively. Moreover, this method reveals directly any post-translational modifications, which of course could not be deduced from nucleic acid sequencing. Here, for example, the N-terminal residue in all the BMV isolates studied was modified by acetylation, a common phenomenon in ssRNA plant virus coat proteins, reported elsewhere for members of the tobamo-, potex-, and poty-virus groups. Structural information of this kind may contribute to understanding how the virus functions, and what factors account for variations in virulence among different isolates.
Moreover, reverse transcription of viral RNA and amplification of the reverse-transcribed cDNA will only yield the protein that is coded by the particular nucleic acid sequence selected. Mass spectrometry, by contrast, gives a reasonably accurate picture of the distribution of sequences in a mixed population of virus coat proteins, because the relatively small changes in sequence observed here are unlikely to produce much change in MS sensitivity. The data shown in Table VII and in Figs. 10 and 11 show that it is now possible to measure by TOFMS the change in profile of the population of coat proteins of a given virus isolate when the propagation host is changed. The changes we observe in the coat protein sequence provide one possible explanation for the change in virulence produced by passaging through a different host.

TABLE VII footnotes.
a Calculated molecular mass was obtained from the measured sequence taking account of the observed acetylation at the N-terminus, as also shown in Table I.
b The intensity was determined by integrating the area under the peaks.