Characterization of the Agent of “High Plains Disease”

The “32-kDa” protein specifically associated with high plains disease was characterized by time-of-flight mass spectrometry, after the agent had been isolated in pure culture by “vascular puncture inoculation,” a novel mechanical means of transmission. Two isolates from different geographic locations each consisted of a mixture of subpopulations that were highly homologous to an amino acid sequence derived from a nucleotide sequence (U60141) deposited in GenBank™ by the Nebraska group as “the probable N-protein of high plains virus.” However, the U60141 sequence was found to be incomplete; de novo sequencing of peptides produced by proteolytic digestions of the 32-kDa band from an SDS-PAGE separation showed that an additional 18 amino acid residues were present at the N terminus. BLAST (basic local alignment search tool) examination of the sequence showed no significant homology with any protein in the databases, indicating that the infectious agent of high plains disease is likely a member of a hitherto unclassified virus group.

although its status as a plant virus has not been confirmed by the International Committee on the Taxonomy of Viruses.
The disease is characterized by the development of chlorotic spots that parallel the leaf veins. As the disease progresses, the chlorotic spots merge to give a generalized chlorosis. However, symptoms vary greatly with different host genotypes, environments, plant growth stage at time of initial infection, and combination with other infecting viruses. HPV is transmitted by the same wheat curl mite (Aceria tosichella Keifer) vector as wheat streak mosaic virus (3, 6, 7). The similarities in (a) transmission by eriophyid mite vectors, (b) induction of spot mosaic symptoms, and (c) formation of 150-200 nm double membrane particles in the cytoplasm of affected cells suggest that HPV is a member of a group of virus-like plant pathogens that cause wheat spot mosaic, fig mosaic, thistle mosaic, rose rosette, and redbud yellow ringspot diseases (5).
No one has yet succeeded in isolating and characterizing infectious HPV particles. Until recently, there was no other way to propagate the disease agent except to use live wheat curl mites. Using wheat curl mites for experimental transmission and propagation approximates what happens to plants grown in the field but makes it very difficult to obtain HPV infections free of wheat streak mosaic virus, as the vector usually harbors both this virus and HPV (8).
We demonstrate here that the "32-kDa" protein specifically associated with HPV can be characterized by time-of-flight mass spectrometry (TOFMS), following HPV isolation in pure culture by "vascular puncture inoculation" (9, 10, 7), a novel mechanical means of transmission.

EXPERIMENTAL PROCEDURES
HPV Isolates-The HPV isolates to be characterized were obtained from symptomatic maize (Zea mays L.) collected in Kansas in 1996 (KS96) and Idaho in 1997 (ID97). Serological and infectivity assays were used to confirm the absence of other viruses (particularly wheat streak mosaic virus) in each HPV isolate. The identity of HPV was indicated by infectivity assay and SDS-PAGE. Confirmation was obtained by ELISA and Western blotting, using antiserum developed and provided by S. G. Jensen (United States Department of Agriculture/Agricultural Research Service, Lincoln, NE). Use of this antiserum permitted direct comparison of serological results from these investigations with those of previous studies of HPV (3, 4, 6-8). To ensure purity, the HPV isolates were maintained in culture in "Spirit" corn by vascular puncture inoculation of kernels (7, 9, 10).
Generation of HPV Protein for Analysis by TOFMS-HPV-specific protein for mass spectrometric (MS) analysis was obtained after electrophoretic separation (SDS-PAGE), staining with Coomassie Blue R-250, and excision of the ~32-kDa protein band. The excised HPV-specific protein band from each gel was digested with various endoproteinases as described by She et al. (11), using sequencing grade trypsin, Lys-C, Glu-C, and Asp-N purchased from Roche Diagnostic Corp. Enzymatic digestion was performed on the gel-separated proteins using 10 ng of trypsin, or 50 ng of Lys-C or Glu-C, in 25 mM ammonium bicarbonate, or else 50 ng of Asp-N in 10 mM Tris-HCl (pH 7.6). Peptide extractions with sonication usually yielded sufficient material for analyses covering most of the protein sequence.
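The value of using several endoproteinases can be illustrated with a minimal Python sketch that applies idealized, textbook cleavage rules to a made-up sequence. The sequence, the function name `digest`, and the rule table are assumptions introduced for illustration only; they do not reproduce the in-gel procedure, and real digests also contain missed and nonspecific cleavages.

```python
import re

# Idealized cleavage rules, written as zero-width regex positions.
# Real digests also show missed and nonspecific cleavages.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after Lys/Arg, but not before Pro
    "Lys-C": r"(?<=K)",            # after Lys
    "Glu-C": r"(?<=E)",            # after Glu (simplified to Glu only)
    "Asp-N": r"(?=D)",             # before Asp
}


def digest(sequence, enzyme):
    """Return the peptides from a complete, idealized digest."""
    pattern = re.compile(CLEAVAGE_RULES[enzyme])
    cuts = [m.start() for m in pattern.finditer(sequence)]
    # Drop cut positions at the very ends, which would give empty peptides.
    cuts = [c for c in cuts if 0 < c < len(sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical test sequence, NOT taken from the HPV protein.
EXAMPLE = "MADKLSERPKDGEFRLL"
for enzyme in CLEAVAGE_RULES:
    print(f"{enzyme:8s}", digest(EXAMPLE, enzyme))
```

Because each enzyme cuts at different residues, the resulting peptide sets overlap one another, which is what allows fragments from one digest to bridge the gaps left by another.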
Derivatization-To facilitate de novo peptide sequencing, we enhanced the production of y ions from some of the selected peptides by derivatizing the in-gel tryptic digest with sulfonic acid. The peptides were dehydrated, then reacted with chlorosulfonylacetyl chloride, as described by Keough et al. (12). Improved yield of the sulfonated product was achieved by repeating the reaction three times.
Mass Spectrometry and Characterization of Peptide Amino Acid Sequences-After digestion or chemical modification, the samples were analyzed by matrix-assisted laser desorption ionization (MALDI) on a QqTOF mass spectrometer (13) as described previously (14,15), using 2,5-dihydroxybenzoic acid as matrix. MS/MS daughter ion spectra were obtained after collision-induced dissociation (CID) in the QqTOF collision cell. Protein or peptide sequences were then determined by manual interpretation of the MS and MS/MS measurements and identified by data base searching with the computer programs MS-Tag, BLAST, or FASTA.

Confirmation of the Identity of HPV-specific Protein
Protein extracts from infected tissue (propagated by vascular puncture inoculation) were shown by ELISA to react to an HPV antiserum (3). When examined by SDS-PAGE, the extracts yielded a ~32-kDa band (Fig. 1), which was not present in control extracts from uninfected tissue, and which reacted in Western blots to the specific antiserum.
[Figure legend fragment, peak labels as in Table II: peak 1, ribosomal proteins; peak 2, β-glucosidase aggregating factor precursor; peak 3, putative high plains virus protein U60141; peak 4, unknown protein (not found in data base).]

Determination of the Protein Amino Acid (aa) Sequence
Deducing aa Sequences from the Initial Tryptic Digests- Fig. 2 shows products from the tryptic digest of a gel slice containing the 32-kDa protein band extracted from isolate ID97. The spectrum is fairly complex, making analysis difficult. In addition, we believed initially that we were faced with a completely de novo sequencing problem, although this turned out to be only partially true. Thus it was clearly important to simplify the problem by choosing appropriate sequencing strategies.
Analysis of the prominent 2448/2476 Da "doublet" shown in Fig. 2 illustrates the difficulty. MS/MS measurements yielded similar daughter ion spectra in the lower m/z range, from which we could extract putative y1 to y16 ions, indicating that the two parents were closely related. However, we were unable to find y fragments between y16 and the parent ion, or their b ion counterparts, so the most interesting parts of the peptides remained undefined.
Here we found sulfonation at the N terminus (12) especially valuable. Fig. 3 shows the doublet before and after sulfonation. Although sulfonation has somewhat worsened the signal/noise ratio, the defining peaks are still well above background, and unit mass resolution is still obtained. Fig. 4 shows the corresponding MS/MS spectra. Thanks to sulfonation, these spectra can be interpreted directly as complete series of y ions up to y22 (as labeled in Fig. 4 and listed in Table I), permitting straightforward deduction of the corresponding aa sequences. Both peaks indeed correspond to the same peptide, apart from the two indicated single nucleotide mutations.
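The way a complete y ion series translates into a sequence can be sketched in a few lines of Python: each difference between consecutive y ions is matched against the monoisotopic residue masses. This is only an illustration of the principle, not the manual interpretation used here; the peptide "SLDQFK", the function names, and the 10 mDa tolerance are hypothetical, and "L" stands for Leu or Ile, which have identical masses.

```python
# Monoisotopic residue masses in Da ("L" stands for Leu or Ile).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "M": 131.04049, "H": 137.05891, "F": 147.06841, "R": 156.10111,
    "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O
PROTON = 1.00728   # charge carrier for singly protonated ions


def y_ions(peptide):
    """Singly protonated y ion m/z values y1..yn for an unmodified peptide."""
    total = WATER + PROTON
    series = []
    for aa in reversed(peptide):  # y ions grow from the C terminus
        total += RESIDUE_MASS[aa]
        series.append(total)
    return series


def read_ladder(y_mz, tol=0.010):
    """Deduce the sequence from consecutive y ion mass differences."""
    ladder = [WATER + PROTON] + sorted(y_mz)
    residues = []
    for low, high in zip(ladder, ladder[1:]):
        diff = high - low
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tol]
        residues.append(hits[0] if len(hits) == 1 else "?")
    # Residues were read from the C terminus, so reverse to N-to-C order.
    return "".join(reversed(residues))


# Hypothetical peptide, used only to generate a synthetic y series.
print(read_ladder(y_ions("SLDQFK")))
```

With a 10 mDa tolerance the sketch separates Gln from Lys, which differ by about 36 mDa; a missing y ion would produce a "?" at that position, which is the situation the sulfonation step was intended to avoid.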
Similar analyses of a number of the peaks in Fig. 2 (with or without sulfonation) yielded the other peptide sequences exhibited in Table II; all were confirmed by MS/MS measurements. Searches of protein sequence databases revealed that some of these peptides corresponded to fragments of known ribosomal proteins, and some were homologous to β-glucosidase aggregating factor precursor, as indicated in the table. These proteins all have calculated masses in the range 27-32 kDa (Table II), so it is not surprising to find them in the rather large gel slice examined. However, other peaks were more directly related to our investigation. We had hoped only that some of the observed peptides might serve to characterize HPV and that perhaps there would be some homology between these putative HPV peptides and the recorded sequences of other viruses. Instead, we were pleasantly surprised to find that most of the peptides corresponded almost exactly to portions of a specific amino acid sequence listed as AAB03575 and designated as "the probable N-protein of high plains virus," of which we had not previously been aware. AAB03575 had been deduced from the cDNA sequence (U60141) of a reverse-transcribed RNA component of an infectious isolate prepared by the Nebraska group and deposited in GenBank™ but not otherwise published. Thus our mass spectrometric measurements served to confirm the previously tentative identification of U60141 with high plains disease (see discussion below).
Unexpectedly, however, the tryptic peptide fragment at m/z = 3522.653 that included the N-terminal sequence deduced from U60141 had an additional single aspartic acid (Asp) residue at its N terminus (Fig. 5), showing that the data base-listed sequence was not complete. Moreover, several additional peaks had been observed and their corresponding peptide fragments sequenced de novo (Table II, part 4). While these sequences all corresponded to the same protein, its peptides had no closely matching counterparts in any protein data base. Was this part of a separate protein, or was its sequence part of the "complete" HPV N-protein sequence? None of the observed tryptic fragments overlapped the U60141-deduced aa sequence (Table II), leaving this question open.
Additional Proteolytic Digests-To decide between the two possibilities, we generated new sets of fragments using different endoproteinases (Asp-N and Glu-C). Fortunately this succeeded in generating peptides that also started from the N terminus of the "unknown" protein but were considerably larger. Sequencing these peptides by MS and MS/MS measurements gave overlaps of as many as 13 residues with the sequence predicted by U60141; see Table III. Thus an additional 18 residues were found to be part of the actual HPV aa sequence, so the sequence predicted by the nucleic acid data (U60141) is indeed incomplete (see discussion below).
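The overlap argument can be sketched as follows, assuming that a newly sequenced peptide extends a known sequence at its N terminus when a suffix of the peptide matches a prefix of the known sequence. The function name `merge_by_overlap`, the minimum overlap of 5 residues, and both sequences are hypothetical and are not the HPV or U60141 sequences.

```python
def merge_by_overlap(extension, known, min_overlap=5):
    """Join `extension` to the front of `known` via the longest
    suffix/prefix match. Returns (merged_sequence, overlap_length),
    or (None, 0) when no overlap of at least `min_overlap` exists."""
    longest = min(len(extension), len(known))
    for k in range(longest, min_overlap - 1, -1):
        if extension[-k:] == known[:k]:
            return extension[:-k] + known, k
    return None, 0


# Hypothetical sequences for illustration only.
new_peptide = "MSTLKAGDEVR"     # e.g. a larger Asp-N or Glu-C fragment
known_start = "AGDEVRLLNPK"     # start of a previously deduced sequence

merged, overlap = merge_by_overlap(new_peptide, known_start)
print(merged, overlap)
print("extra N-terminal residues:", len(merged) - len(known_start))
```

A longer overlap makes a chance match less likely, which is why overlaps of as many as 13 residues are persuasive; a peptide with no sufficient overlap leaves the question open, as was the case for the tryptic fragments.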
Correction of the N-terminal Residue Assignment-One discrepancy remained. The assignment of N-terminal residues for the "new" peptides shown in Tables I and III was based on a nominal interpretation of the MS data that allowed an error tolerance up to ~50 mDa, but closer examination reveals inconsistencies in this interpretation. Note that all values of Δm for the new N-terminal peptides are negative, with an average value of about −32 mDa (Table III), whereas Δm for the collection of peptides listed in the upper part of Table II has a root mean square value of ~7.8 mDa and an average value below 0.8 mDa (similar to the values obtained in previous measurements on the same instrument (14, 15)). Such a discrepancy might easily be overlooked if less accurate MS techniques had been used, but it is clearly unacceptable for measurements with our QqTOF instrument.
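The distinction drawn here between random scatter and a systematic offset can be made concrete with a short Python sketch comparing the mean and the root mean square (RMS) of a set of mass errors. Both lists of Δm values below are invented for illustration and are not the entries of Tables II or III; only the qualitative pattern is intended to mirror the text.

```python
from math import sqrt


def mean_and_rms(errors_mda):
    """Return (mean, RMS) of a list of mass errors in mDa."""
    n = len(errors_mda)
    mean = sum(errors_mda) / n
    rms = sqrt(sum(e * e for e in errors_mda) / n)
    return mean, rms


# Hypothetical values: random scatter around zero ...
scatter = [6.0, -9.0, 3.0, -7.0, 10.0, -4.0]
# ... versus a consistent negative offset.
offset = [-30.0, -35.0, -29.0, -34.0, -32.0]

for name, values in (("scatter", scatter), ("offset", offset)):
    mean, rms = mean_and_rms(values)
    print(f"{name}: mean = {mean:.1f} mDa, RMS = {rms:.1f} mDa")
```

For random errors the mean is much smaller than the RMS, whereas for a systematic error the magnitude of the mean approaches the RMS, which signals a wrong assignment rather than measurement noise.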
MS/MS measurements on the new peptides yielded additional information. Fig. 6 shows the simplest example: the daughter ion spectrum from breakup of the 607-Da parent ion. Table IVa lists the difference between observed and calculated masses for the daughter ions on the basis of the nominal sequence assignment. All b ion masses differ from the calculated values by more than 30 mDa. By contrast, the masses of the y1 to y4 daughter ions all differ from calculation by 5 mDa or less, whereas the mass of y5 differs by 37 mDa. Results of MS/MS measurements on the other new peptides were consistent with this pattern (see Supplemental Figs. S1-S5), unambiguously pointing to an error in the assignment of the first residue.
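The reasoning that localizes the error to the first residue can be checked with a small Python sketch: every b ion contains the N-terminal residue, whereas among the y ions only the full-length one does. The sketch assumes the nominal sequence LLSFK for the 607-Da parent (its calculated singly protonated mass is about 607.38 Da) and applies an illustrative error of −37 mDa to the first residue; the function names are hypothetical.

```python
# Monoisotopic residue masses in Da ("L" stands for Leu or Ile).
RES = {"L": 113.08406, "S": 87.03203, "F": 147.06841, "K": 128.09496}
WATER, PROTON = 18.01056, 1.00728


def fragment_ions(residue_masses):
    """Return (b, y) lists of singly protonated fragment masses."""
    n = len(residue_masses)
    b = [sum(residue_masses[:i]) + PROTON for i in range(1, n + 1)]
    y = [sum(residue_masses[n - i:]) + WATER + PROTON for i in range(1, n + 1)]
    return b, y


def shifts_from_first_residue_error(peptide, error_da):
    """Shift (in mDa) of each fragment when residue 1 is off by error_da."""
    nominal = [RES[aa] for aa in peptide]
    altered = [nominal[0] + error_da] + nominal[1:]
    b0, y0 = fragment_ions(nominal)
    b1, y1 = fragment_ions(altered)
    shifts = {}
    for i in range(len(peptide)):
        shifts[f"b{i + 1}"] = (b1[i] - b0[i]) * 1000.0
        shifts[f"y{i + 1}"] = (y1[i] - y0[i]) * 1000.0
    return shifts


# Assumed nominal sequence and illustrative error on the first residue.
for ion, shift in shifts_from_first_residue_error("LLSFK", -0.037).items():
    print(f"{ion}: {shift:+.1f} mDa")
```

In this model all b ions and y5 shift by the full error while y1 to y4 are unaffected, the same pattern as that reported for Table IVa.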
We then realized that the mass (113.048 Da) of acetylated alanine (@A) is almost identical to that of leucine/isoleucine (113.084 Da), yielding a plausible alternative assignment for the N-terminal residue. Table IVb shows the masses calculated assuming @A to be at the N terminus. These all agree with the observed masses of the b and y ions within 5 mDa, and similar improvements were obtained in the Δm values of Tables I and III, convincing evidence that acetylated alanine rather than leucine/isoleucine constitutes the actual N terminus of the HPV protein. (We note that hydroxylation of proline yields a compound (hydroxyproline) with the same elemental composition as @A, and thus the same mass, but this is an unusual modification that would not be expected to occur at the N terminus.)
Amino Acid Variations in the KS96 and ID97 HPV Isolates-The sequence deduced from U60141 consists of 270 aa residues, whereas the sequences determined for both KS96 and ID97 contain, as noted above, an extra 18 residues at the N terminus. An additional three isolates from different geographic locations (not described here) were also found to contain the same extra residues, so it appears that HPV normally contains 288 aa residues.
The overall results are shown in Table V. For both isolates, more than 85% of the residues were determined. The isolates each consist of a mixture of subpopulations, one of which contains an aa sequence corresponding directly to the one derived from U60141 (Tables I and V). In all cases, the C-terminal 270 aa residues of the HPV isolates are highly homologous to the U60141 sequence, and from Table V it can be seen that a consensus sequence accounting for about 88% of the observed residues can be assembled.

DISCUSSION
Previous attempts to characterize the HPV agent have been hindered by the need to transmit the infectious agent with the wheat curl mite vector, which naturally transmits wheat streak mosaic virus along with HPV. Vascular puncture inoculation eliminates the problem, making it possible to establish pure cultures of HPV-infected plants and thus ensuring that protein unique to infected plants is indeed disease-specific.
Our investigation demonstrates the power of combining this method of sample preparation with protein characterization by mass spectrometry. In the present case, the analysis was aided by the unanticipated discovery that our MS-determined sequence had extensive homology with an aa sequence that had been deduced from a partial cDNA of double-stranded RNA extracted from ribonucleoproteins of diseased plants (U60141). Nevertheless, our MS analysis revealed that the U60141 sequence was incomplete, and indeed this might have been suspected from the absence of an open reading frame in U60141, likely arising from the inherent difficulties in generating full-length cDNA clones from double-stranded RNA.
Although the MS-based approach to analyzing HPV-specific protein avoids these difficulties entirely, determining the complete protein sequence nonetheless required a thorough MS analysis. Data base searching followed simply by protein identification would of course have yielded no information about the additional aa residues at the N terminus, illustrating the limitations of the commonly used "high-throughput" measurements. Moreover, it was essential in this case to use an array of different proteinases to obtain digest fragments that would define the additional residues. This emphasizes the need to adapt one's strategy to observations as they are made. Another feature of the measurement was the ability to distinguish between leucine and acetylated alanine (Δm = 36 mDa) as the N-terminal residue, which could only be accomplished with high mass accuracy MS. As an obvious consequence of this observation, we predict that a complete cDNA transcript of RNA from the HPV-specific ribonucleoprotein will contain codons for Ala-Leu/Ile-Ser . . . rather than Leu/Ile-Leu/Ile-Ser . . . corresponding to the N terminus of the protein.
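The 36 mDa difference quoted above follows directly from the elemental compositions of the residues, as the following Python sketch shows. The residue formulas (leucine C6H11NO; acetylated alanine and hydroxyproline both C5H7NO2) and the monoisotopic atomic masses are standard values; the function and variable names are introduced here for illustration.

```python
# Monoisotopic atomic masses in Da.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def mono_mass(formula):
    """Monoisotopic mass of an elemental composition given as a dict."""
    return sum(MONO[element] * count for element, count in formula.items())


# Residue compositions (amino acid minus H2O).
LEU_OR_ILE = {"C": 6, "H": 11, "N": 1, "O": 1}
ACETYL_ALA = {"C": 5, "H": 7, "N": 1, "O": 2}      # Ala residue + acetyl (C2H2O)
HYDROXYPRO = {"C": 5, "H": 7, "N": 1, "O": 2}      # Pro residue + O

leu = mono_mass(LEU_OR_ILE)
ac_ala = mono_mass(ACETYL_ALA)
delta_mda = (leu - ac_ala) * 1000.0

print(f"Leu/Ile residue:    {leu:.3f} Da")
print(f"acetyl-Ala residue: {ac_ala:.3f} Da")
print(f"difference:         {delta_mda:.1f} mDa")
print("hydroxyproline isobaric with acetyl-Ala:",
      mono_mass(HYDROXYPRO) == ac_ala)
```

The two residues share the nominal mass 113 Da and differ only through the exchange of CH4 for O, so the distinction requires mass accuracy well below the roughly 36 mDa gap; hydroxyproline, having the same composition as acetylated alanine, cannot be separated from it by mass alone.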
Several naturally occurring sequence variants were readily characterized in these analyses as components of the overall population of HPV-specific protein molecules. This is a critical advantage of mass spectrometry over sequencing of cDNA derived from viral nucleic acid, as the latter necessarily analyzes only a clonal subpopulation. Examination of the sequence by BLAST (16, 17) failed to find any significant homologies (except for U60141) in the protein or translated nucleic acid sequence data bases. Nevertheless, as mentioned above, biological evidence suggests that HPV is related to a group of unusual virus-like plant pathogens that cause wheat spot mosaic, fig mosaic, thistle mosaic, rose rosette, and redbud yellow ringspot diseases (5), none of which have yet been sequenced. On the basis of the present results, we believe that a good method of investigating such a relationship at the molecular level is to extract protein fractions from plants affected by these diseases and subject them to MS sequence analysis; this task is now much easier with a "type" sequence (HPV) in hand. Such evidence will be important in addressing the outstanding questions of the taxonomic affiliations of this group of unusual plant pathogens.