Differential Extraction and Protein Sequencing Reveals Major Differences in Patterns of Primary Cell Wall Proteins from Plants*

The proteins of the primary cell walls of suspension cultured cells of five plant species, Arabidopsis, carrot, French bean, tomato, and tobacco, have been compared. The approach that has been adopted is differential extraction followed by SDS-polyacrylamide gel electrophoresis (PAGE), rather than two-dimensional gel analysis, to facilitate protein sequencing. Whole cells were washed sequentially with the following aqueous solutions: CaCl2, CDTA (cyclohexane diaminotetraacetic acid), DTT (dithiothreitol), NaCl, and borate. SDS-PAGE analysis showed consistent differences between species. From the 233 proteins that were selected for sequencing, 63% gave N-terminal data. This analysis shows that (i) patterns of proteins revealed by SDS-PAGE are strikingly different for all five species, (ii) a large number of these proteins cannot be identified by data base searches, indicating that a significant proportion of wall proteins have not been previously described, (iii) the major proteins that can be identified belong to very different classes of proteins, (iv) the majority of proteins found in the extracellular growth media are absent from their respective cell wall extracts, and (v) the results of the extraction process are indicative of higher order structure. It appears that aspects of speciation reside in the complement of extracellular wall proteins. The data represent a protein resource for cell wall studies complementary to EST (expressed sequence tag) and DNA sequencing strategies.

The plant cell wall is a dynamic system generally considered to be composed of more than 90% carbohydrate polymers. Proteins, phenolics, and possibly lipids make up the remainder of the wall (1-3). To date, most research interest has been in the carbohydrate components because of considerations of their structural role and commercial interest. This has led to a number of models for the integration, interpolymeric association, and assembly of the wall (3,4). By comparison, our knowledge of the complexity of protein in the plant cell wall is in a less advanced state. Much of the understanding of the range of structural wall proteins has come from cDNA and genomic cloning exercises and has led to the identification of glycine-, cysteine-, proline-, and hydroxyproline-rich subsets of wall proteins. In addition, many extracellular enzymes have been identified that are required for the restructuring and modification of this dynamic extracellular matrix, which underpin its role in defense, detoxification, signaling, cell-cell recognition, cell expansion, cell adhesion, cell separation, translocation, differentiation, and morphogenesis (2,5,6). However, there is a lack of direct studies on the proteins themselves, and the true range of extracellular proteins and their species differences remains to be elucidated. The present work describes the systematic extraction and sequencing of the major primary wall proteins from five species representing four families of plants.
Since whole plant tissue is complicated by the presence of different tissue types including lignified secondary walls, we have chosen to use suspension cultures as a source of experimental material due to their relative uniformity. There are a large number of studies which show that molecular probes for proteins and enzymes derived from tissue cultured cells locate in a predictable way in the intact plant, and thus tissue cultures can be a reliable guide to a substantial number of phenomena in the intact plant. To cover a diverse range of species, we have used suspension cultures of Arabidopsis, carrot, French bean, tomato, and tobacco. The reasons for choosing these species relate to their academic and commercial interest. Arabidopsis has not been as extensively used in tissue culture but is currently the target of an extensive EST sequencing exercise, and an international program aimed at the sequencing of its entire genome is under way (7,8). Carrot, a member of the Umbelliferae, has been an important species in modeling embryogenesis and elongation growth (9). Tissue cultures of the leguminous species, French bean, have been extensively used as a model system for cell wall biosynthesis and modifications during responses to pathogens (2). The two solanaceous species tobacco and tomato allow a comparison of the extent of conservation of wall proteins within one family. Tomato is important commercially and was the subject of the first successful attempts to modify the expression of wall proteins by transformation (10). Tobacco is an important model plant for attempts to modify cellulose extractability by modifying lignification (11).
The aim is to generate a protein resource for the plant cell wall analogous to current efforts of EST and genomic sequencing, so allowing for the future identification of potential biochemical function arising from these exercises. Homology searches of the derived amino acid sequence data from the present study should provide a firm indication of the number of components of the cell wall to which function is still to be ascribed.

MATERIALS AND METHODS
Plant Material-The derivation and maintenance of cultures of Arabidopsis (12), tobacco (13), tomato (14), and French bean (15) have been described previously. The carrot cultures (previously unpublished) were grown in Murashige and Skoog (16) basal salts supplemented with 3% sucrose and 2 mg/ml 2,4-D. Suspension cultures were grown in 100-ml batches in 250-ml Erlenmeyer flasks under a 16-h photoperiod (Arabidopsis, tomato, and carrot) or in the dark (French bean and tobacco) at 24°C while rotary shaken at 130 rpm and subcultured every 7-10 days.
Extraction and Preparation of Cell Wall Proteins-Cells growing 4-5 days after subculture were harvested by filtration on Miracloth. The cells were washed three times with dH2O (3 ml/g fresh weight). All subsequent manipulations were conducted at 4°C. The cells were stirred in three volumes of 0.2 M CaCl2 for 30 min, collected by filtration on Miracloth, and washed three more times with dH2O as before. Subsequent extractions, each with three volumes, were carried out sequentially on the same cells for 30 min each, with 50 mM CDTA in 50 mM sodium acetate, pH 6.5, followed by 2 mM DTT, 1 M NaCl, and finally 0.2 M borate, pH 7.5. The borate extraction was conducted at room temperature. Between extractions cells were washed on the filter three times, each with three volumes of dH2O. Extracts and the culture media were refiltered through GF/A paper before being dialyzed against a 10-fold excess of dH2O with three changes. The samples were lyophilized and reconstituted in SDS-PAGE loading buffer (17) prior to analysis. The lyophilized culture filtrates were reconstituted at 30 g/l, whereas all the other extracts were reconstituted at 15 g/l in gel loading buffer.
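The sequential extraction described above can be laid out as a simple schedule. The following is a minimal sketch rather than part of the published method: the names `ExtractionStep`, `PROTOCOL`, and `plan` are hypothetical, and it assumes that "three volumes" means 3 ml of solution per gram fresh weight, as stated for the initial water washes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionStep:
    reagent: str
    concentration: str
    minutes: int
    temperature: str


# Order and conditions as given in the text.
PROTOCOL = [
    ExtractionStep("CaCl2", "0.2 M", 30, "4 C"),
    ExtractionStep("CDTA", "50 mM in 50 mM sodium acetate, pH 6.5", 30, "4 C"),
    ExtractionStep("DTT", "2 mM", 30, "4 C"),
    ExtractionStep("NaCl", "1 M", 30, "4 C"),
    ExtractionStep("borate", "0.2 M, pH 7.5", 30, "room temperature"),
]

# Assumption: "three volumes" is read as 3 ml per g fresh weight.
VOLUMES_PER_GRAM = 3.0
WASHES_PER_ROUND = 3


def plan(fresh_weight_g):
    """Return (label, volume_ml) pairs for every water wash and extraction."""
    vol = VOLUMES_PER_GRAM * fresh_weight_g
    washes = [("dH2O wash", vol)] * WASHES_PER_ROUND
    schedule = list(washes)  # initial water washes after harvest
    for i, step in enumerate(PROTOCOL):
        schedule.append((f"{step.reagent} extraction", vol))
        if i < len(PROTOCOL) - 1:  # washes only between extractions
            schedule.extend(washes)
    return schedule


if __name__ == "__main__":
    for label, volume in plan(10.0):
        print(f"{label}: {volume:.0f} ml")
```

For 10 g of cells this lists three initial water washes, five extractions, and three water washes between each pair of consecutive extractions, every one at 30 ml.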
Protein Analysis-SDS-PAGE was carried out on 10% gels, using Bio-Rad Mini-PROTEAN II apparatus, and gels were stained with Coomassie Brilliant Blue (17). Since protein recoveries varied between the different extracts within each species and also between species, the amount of extract loaded per lane was optimized by SDS-PAGE: typically between 10 and 20 µl of reconstituted extract was used per lane. For sequencing purposes, this process was also necessary due to the complexity and uneven abundance of the components within each extract, and more than one loading was used to maximize the acquisition of sequence data for the minor protein bands. For N-terminal amino acid sequence determination, proteins were transferred onto Problot membrane (Applied Biosystems, Inc. (ABI), Foster City, CA) and visualized by Coomassie Blue staining as directed in the Problot manual. Amino acid sequence analysis was performed in an ABI model 477 Sequencer. Sequence similarities were determined using the electronic mail "BLAST" series of programs (18) against the non-redundant protein, non-redundant DNA, and non-redundant EST data bases maintained by the National Center for Biotechnology Information.

Analytical Strategy
Although there have been a number of systematic protein sequencing projects in plants, usually based on two-dimensional isoelectric focusing/SDS-PAGE systems of whole plant extracts (19-22), these are limited in the amount of information that was obtained due to mass limitations of the resolved material. Whereas the resolving power of two-dimensional gels is much higher than that of SDS-PAGE gels, the loading of each band per gel is higher for the latter, generating sufficient mass for routine N-terminal analysis; the main problem is resolution. Accordingly we have chosen to reduce the complexity of the initial mixture of proteins by differentially extracting the cell wall from intact suspension cultured cells successively with CaCl2, CDTA, DTT, NaCl, and borate. The rationale for the initial wash was based on the successful use of CaCl2 to extract wall proteins (23). CDTA would be expected to remove any proteins associated with the pectin fraction since calcium would promote such binding. DTT was chosen to reduce any protein-protein interactions based upon cysteine disulfide bonds. NaCl was used to extract any strongly ionically bound proteins. Finally, borate was used to disrupt any interactions due to glycoprotein side chains and other saccharides in the wall. At all stages, intervening water washes were used. For all species, secreted proteins were also characterized in the culture filtrate. Microscopic examination of the cells after these extractions showed the cells to be plasmolyzed, demonstrating that the plasma membrane of the cells remained intact, indicating that none of the extracted components were cytosolic.

Extraction and Patterns of Cell Wall Proteins
SDS-PAGE analysis reveals that the subsets of wall proteins obtained by successively washing whole cells are distinct for each reagent employed and for each species. Sequential extracts are shown for Arabidopsis (Fig. 1A), carrot (Fig. 1B), French bean (Fig. 1C), tomato (Fig. 1D), and tobacco (Fig. 1E). It can be seen for each species that the proteins found in the culture filtrate exhibit a strikingly different profile from that of their respective wall fractions. Similarly, there are distinct subsets of proteins extracted by CaCl2, CDTA, DTT, NaCl, and borate (Fig. 1, A-E). Individual extractions, hereafter referred to as non-sequential extractions, carried out with the same series of reagents on the Arabidopsis and tomato cells yielded different subsets of proteins from those acquired through sequential extraction (Fig. 2, A and B). Since there is clearly a difference between the pattern of proteins extracted using sequential and non-sequential extraction, both approaches were used to maximize sequence data for these two species. It is obvious that within the complex three-dimensional matrix that represents the cell wall there must be some restriction on the dynamic state of the wall or else one would not observe differential extraction.

Systematic Sequencing
A summary of the proteins from each species for which N-terminal sequencing was attempted is shown in Table I. The N-terminal sequences that were obtained, along with the amino acid yield in the first sequencing cycle and the corresponding protein molecular weights, are listed in Table II.
Arabidopsis Cell Wall Proteins-From the 86 proteins selected for sequencing from the Arabidopsis extracts (Fig. 1A), 47 sequences were obtained, two of which yielded double sequences: band F in the CaCl2 extract and band F in the non-sequential NaCl extract. In the case of band F from the CaCl2 extract, it was possible to decipher the double sequence into separate sequences. This was performed by a process of subtraction since both the sequences within the CaCl2 band F also appear as single sequences at other molecular weights within the same CaCl2 extract, i.e. CaCl2 band E and CaCl2 band O. Discounting sequences that were present at either more than one molecular weight or were within other fractions left 31 unique sequences, which were found within the Arabidopsis wall extracts. The presumptive glycoprotein sequence starting "KVPV" is present in several extracts at more than one molecular weight (culture filtrate, band E; CaCl2, bands E and F; non-sequential CDTA, bands A and B) and may represent differential glycosylation. Evidence for microheterogeneity in primary sequence can be seen in bands F and G of the Arabidopsis culture filtrate proteins, which contained identical sequences except for the tyrosine or proline at position two. Bands K and L of the culture filtrate were found to contain identical sequences beginning with "ASSS." The sequence beginning "NPNY" was found in the CaCl2 extract, bands F, D, and O; the sequential CDTA extract, band D; and the non-sequential NaCl extract, band E. The sequence beginning with "IPCR" was found in the CaCl2 extract, band R and the non-sequential CDTA extract, band K. The sequence beginning "RIPG" was found within the CaCl2 extract, band Q and the non-sequential NaCl extract, band K. The sequence beginning with "ARKF" was seen in the sequential DTT and NaCl extracts, bands B and H, respectively, and also within bands F and G of the non-sequential DTT extract. Bands I and J from the non-sequential CDTA extract were also found to contain the same sequence beginning with "KDLX."
Carrot Cell Wall Proteins-All of the 10 carrot sequences were unique to carrot and were different from each other.
French Bean Cell Wall Proteins-From the 26 French bean proteins that were processed for sequencing (Fig. 1C), 21 sequences were obtained, of which six shared the same N-terminal four amino acids: "NYDK." These sequences (culture filtrate, bands A and B; CaCl2, bands C, D, L, and O) were identical apart from the CaCl2 extract's band O, where the sequence diverged substantially after the fourth amino acid. All of the other bean sequences were only seen in this species and were not represented at multiple molecular weights. Not counting sequences that appeared more than once left a total of 17 unique sequences in the bean extracts.
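The counting of unique sequences described above can be illustrated with a short sketch. It is not the procedure used in the study: it assumes that two N-terminal reads count as the same protein when the shorter read agrees with the start of the longer one, and that an unassigned residue "X" matches any residue. The function names are hypothetical, and the residues following "NYDK" in the example reads are invented placeholders, not data from Table II.

```python
def compatible(a, b):
    """True if the two reads agree over their shared length ('X' matches anything)."""
    return all(x == y or "X" in (x, y) for x, y in zip(a, b))


def unique_reads(reads):
    """Greedy grouping: longer reads become representatives first,
    and a read is dropped if it is compatible with an existing representative."""
    representatives = []
    for seq in sorted(reads, key=len, reverse=True):
        if not any(compatible(seq, rep) for rep in representatives):
            representatives.append(seq)
    return representatives


if __name__ == "__main__":
    # Hypothetical reads: only the "NYDK" prefix comes from the text.
    reads = ["NYDKAGT", "NYDKAGT", "NYDK", "NYDKLLS", "GSTV"]
    print(unique_reads(reads))
```

With these placeholder reads the two identical long reads and the short "NYDK" read collapse into one entry, whereas the read that diverges after the fourth residue is kept as a separate sequence, which mirrors the treatment of the CaCl2 band O above. Because short reads can be compatible with more than one longer read, the grouping is order dependent and should be taken only as an approximation.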
Tomato Cell Wall Proteins-Seventy-eight proteins were selected for sequencing from the tomato extracts (Fig. 1D), of which 46 yielded sequence information. Four of these proteins contained double sequences: culture filtrate band G, CaCl2 bands D and O, and the non-sequential NaCl band L, all of which were deciphered into two individual sequences. This was possible by subtraction, in a similar way to deciphering the Arabidopsis double sequences. From the 46 sequences obtained, 30 were unique. The shortfall is due to a total of nine sequences, which were found at more than one position, or extract. Bands containing the sequence beginning "AKDF" were seen in the culture filtrate as bands G, H, and Ia (band I could be resolved into four more components by increasing the polyacrylamide concentration from 10 to 14% in the electrophoresis separating gel). The sequence beginning "KSTD" was also seen in band Ib of the culture filtrate and as the second component of band G of the same extract except that the N-terminal residue Lys was absent in the latter case. The sequence beginning "EQXG" was seen a total of eight times: culture filtrate, band E; CaCl2, bands J, K, and L; the sequential CDTA extract, bands E and F; the sequential NaCl extract, band E; and the non-sequential DTT extract, band E. However, the CaCl2 band J sequence was only the same up to the sixth residue, after which it diverged compared with culture filtrate band E. The sequence beginning "ANAK" appeared twice: once in the non-sequential NaCl extract, band G, and as one of the two short sequences that belong to CaCl2 band D. Bands containing the sequence beginning with "VAGK" were CaCl2, bands G and H; non-sequential NaCl extract, bands I and J; and band B of the non-sequential borate extract. These sequences were identical except residue 11, which was Ala in band G of the CaCl2 extract and Leu in band J of the non-sequential NaCl extract. Sequences beginning with either "SNPN" or "TNPN" appeared four times: twice beginning with TNPN, as band Q in the CaCl2 extract and as band G in the sequential CDTA extract, and twice beginning with SNPN, as band F in the sequential NaCl extract and as band H in the non-sequential borate extract. Apart from this heterogeneity at the N terminus, all four sequences were identical. The sequence beginning "ALVE" appeared as band O in the non-sequential NaCl extract and as one of the two short sequences that were identified from band O of the CaCl2 extract. Incidentally, the second of the short sequences from band O of the tomato CaCl2 extract (beginning "ADRE") was also noted within the Arabidopsis CaCl2 extract, band T. Bands containing the sequence beginning with "EVLY" were seen in the CaCl2 extract, band F and in the non-sequential NaCl extract, band H. Finally, the sequence beginning with "ELQL" was seen in the non-sequential NaCl extract within bands L, M, and the non-sequential borate band D.
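The subtraction used to resolve double sequences can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name `subtract_known` is hypothetical, each sequencing cycle is represented as a string holding the one or two residues observed, and the per-cycle data in the example are constructed from the four-residue prefixes "ALVE" and "ADRE" reported for band O of the tomato CaCl2 extract rather than taken from the raw sequencer output.

```python
def subtract_known(cycles, known):
    """Recover the second sequence of a two-protein band.

    cycles: residues observed at each sequencing cycle, e.g. ["A", "LD", ...];
            a single letter means both proteins share that residue.
    known:  the component already seen as a single sequence in another band.
    """
    if len(known) < len(cycles):
        raise ValueError("known sequence is shorter than the mixed read")
    other = []
    for cycle_no, (observed, residue) in enumerate(zip(cycles, known), start=1):
        if len(observed) not in (1, 2):
            raise ValueError(f"cycle {cycle_no}: expected one or two residues")
        if residue not in observed:
            raise ValueError(f"cycle {cycle_no}: {residue} was not observed")
        rest = observed.replace(residue, "", 1)
        # An empty remainder means both proteins carry the same residue here.
        other.append(rest if rest else residue)
    return "".join(other)


if __name__ == "__main__":
    mixed_band_o = ["A", "LD", "VR", "E"]  # illustrative cycle data
    print(subtract_known(mixed_band_o, "ALVE"))
```

Given the single sequence "ALVE" from the non-sequential NaCl band O, removing its residue at each cycle leaves "ADRE" as the second component. The approach only works when one component has been read cleanly elsewhere, which is why the double sequence in the Arabidopsis non-sequential NaCl band F is not described as resolved.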
Tobacco Cell Wall Proteins-Twenty-seven of the protein bands from the tobacco extracts were selected for sequencing (Fig. 1E). Only two of the sequences were found to be either in more than one of the tobacco extracts or at more than one molecular weight within the same extract. The first begins with "GEQP" and was in the CaCl2 extract as band C and band I. The sequence from band C was 18 amino acids long and that of band I, 17 amino acids long; both were identical, except in the case of band C's sequence, which had an extra Asn present at amino acid position 13. The second tobacco sequence to appear in two different extracts began with "EQCQ" and was found within the CaCl2 extract, band M and the CDTA extract, band E. All the other tobacco sequences are listed as they appear in Table II. Discounting any sequences that appeared more than once in these extracts left a total of 20 unique tobacco sequences.
Overall Series of Sequencing Cell Wall Proteins-On average 63% of the proteins from all of the plant species proved to be sequenceable, but these were only from the bands selected for analysis that were generated in this study. Generally the culture filtrate proteins yielded a high success rate in terms of the numbers of proteins that were sequenced: 92%, 11 out of 12, for Arabidopsis; 67%, 8 out of 12, for tomato; 100% for bean and tobacco: two and three proteins, respectively. In contrast, the two proteins selected from the carrot culture filtrate did not sequence. Those proteins extracted with CaCl2 also gave a high success rate when sequenced: 60%, 12 out of 20, for Arabidopsis; 100%, 8 out of 8, for carrot; 94%, 16 out of 17, for French bean; 71%, 12 out of 17, for tomato; and 100%, 17 out of 17, for tobacco. In those extracts that were made sequentially after the initial CaCl2 wash, the number of proteins that were successfully sequenced from those selected dropped off significantly. For example, in the sequential CDTA extracts 40%, two out of five, for Arabidopsis; 43%, three out of seven, for tomato; 40%, two out of five, for carrot; 60%, three out of five, for bean; and 33%, two out of six, for tobacco were sequenceable. In the subsequent sequential extracts where protein bands were targeted for sequencing, considerably fewer proteins were amenable to this type of analysis. It was because of this continuing decline in successfully obtaining sequences that paralleled each step in the sequential extraction that none of the sequential borate proteins were selected for sequencing.
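The success rates quoted above follow directly from the counts of sequenced and selected bands, as the short tabulation below shows. The structure and the names `COUNTS`, `percent`, and `pooled` are illustrative; the pooled per-extract figures it prints are simple arithmetic on the quoted counts and are not values reported in the text.

```python
# (sequenced, selected) band counts quoted in the text
COUNTS = {
    "culture filtrate": {
        "Arabidopsis": (11, 12), "carrot": (0, 2), "French bean": (2, 2),
        "tomato": (8, 12), "tobacco": (3, 3),
    },
    "CaCl2": {
        "Arabidopsis": (12, 20), "carrot": (8, 8), "French bean": (16, 17),
        "tomato": (12, 17), "tobacco": (17, 17),
    },
    "sequential CDTA": {
        "Arabidopsis": (2, 5), "carrot": (2, 5), "French bean": (3, 5),
        "tomato": (3, 7), "tobacco": (2, 6),
    },
}


def percent(sequenced, selected):
    """Success rate rounded to the nearest whole percent."""
    return round(100 * sequenced / selected)


def pooled(extract):
    """Success rate for one extract with all five species combined."""
    pairs = list(COUNTS[extract].values())
    return percent(sum(s for s, _ in pairs), sum(n for _, n in pairs))


if __name__ == "__main__":
    for extract, by_species in COUNTS.items():
        for species, (sequenced, selected) in by_species.items():
            print(f"{extract:17s} {species:12s} {percent(sequenced, selected):3d}%")
        print(f"{extract:17s} {'pooled':12s} {pooled(extract):3d}%")
```

Pooling the species gives roughly 77% for the culture filtrates, 82% for the CaCl2 extracts, and 43% for the sequential CDTA extracts, which makes the decline after the first wash easy to see.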

Homology Searches
The identities given for each sequence similarity listed in Table II can be defined as the percentage of amino acid matches for that query sequence against a sequence found in a particular data base. It should be noted that this figure does not take into account any conservative substitution. Throughout this study there have been numerous examples of a single sequence that was either present at more than one molecular weight within the same extract, or was present in more than one extract. It was therefore general practice to terminate sequence runs after the first four or five sequencing reaction cycles if this was observed. However, it is clear that microheterogeneity is occasionally seen between almost identical sequences, especially when longer sequences were obtained.
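The identity figure defined above can be written out as a small calculation. This sketch assumes the query and the matching data base segment have already been aligned without gaps and are of equal length; the function name is hypothetical, and the two example sequences are invented for illustration (only the "NPNY" prefix appears in the text).

```python
def percent_identity(query, subject):
    """Percentage of positions at which query and subject carry the same residue.

    Conservative substitutions (for example Tyr to Phe) count as mismatches,
    in line with the definition used for Table II.
    """
    if not query or len(query) != len(subject):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


if __name__ == "__main__":
    # Hypothetical aligned pair differing by one conservative substitution.
    print(f"{percent_identity('NPNYAL', 'NPNFAL'):.1f}%")
```

Because N-terminal reads are short, a single mismatch changes the figure considerably: one difference in six residues already lowers the identity to about 83%, which is worth bearing in mind when interpreting matches below 65%.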
In the case of Arabidopsis, the classic model plant, which is a member of the Brassicaceae (a family that also includes plants such as oilseed rape and cauliflower), the proteins that could be identified were a subtilisin-like protease, β-1,3-glucanase, two xyloglucan endotransglycosidases, an extracellular ribonuclease, cellulase, a carrot-like glycoprotein, an Arabidopsis hypothetical protein, and an expansin. Those that showed lower sequence similarities, with less than 65% identities, were a cytochrome P-450, triose-phosphate isomerase, and a mouse T-cell receptor. Nineteen proteins did not show any similarity to any member of the data bases searched.
The member of the Umbelliferae, carrot, was included because it has been used to mimic elongation growth and embryogenesis. Only one of the 10 sequences obtained from the carrot wall fractions bore any similarity to data base sequences and, unlikely as it may seem, it corresponded to an unknown genomic DNA sequence from human.
The legume, French bean, has been used to model both differentiation and elicitor-induced pathogen stress (2). Identified wall proteins include extensin and a hybrid proline-rich, cysteine-rich chitin-binding protein (2). The bean culture was the only one in this study that led to the identification of hydroxyproline-rich glycoproteins. There were also two proteins that had a degree of similarity to two bacterial proteins, one of which was an iron hydrogenase. Thirteen of the French bean proteins were unlike any members present within the data bases searched.
For the two model solanaceous species, tobacco and tomato, there were several proteins, common to both species, that could be identified in similarity searches and based on these results may therefore have similar functions. For example, both species appeared to have forms of osmotin, chitinase, and pathogen-related proteins: tobacco P7 curled leaf protein in the case of tomato extracts and tobacco protein P10 in the case of the tobacco extracts. The tomato extracts were also found to contain proteins that bore similarity to other known extracellular proteins, which were an extracellular ribonuclease, β-1,3-glucanase, subtilisin-like protease, β-exoglucanase, and a peroxidase. One of the other tomato proteins that was found to have an analogue within the data bases was a human IgG heavy chain fragment. Altogether, 18 of the tomato proteins could not be identified by data base searches. Whereas the tomato fractions only contained a single form of chitinase, similar to one from French bean, the tobacco extracts were found to have two forms, one resembling a French bean chitinase and the other a tobacco chitinase. The tobacco extracts also contained a protein that bore similarity to an integral stone protein associated with the human urinary tract. Altogether, 15 of the tobacco proteins could not be identified by data base searches.

DISCUSSION
The proteins of the plant extracellular matrix comprise a subset that contains both structural proteins and enzymes. Many of these have been identified at the protein level, but a large number have also been characterized from gene sequences on the basis of a repeat motif or a secretory leader sequence. In comparison to present knowledge of the proteins of some metabolic pathways such as the Calvin cycle, shikimic acid, and lignin biosynthetic pathways, the components of which have been completely cloned, the proteins and their cognate genes of the extracellular matrix are poorly characterized. This is probably due in part to the relative inaccessibility of wall proteins in the plant. Additionally, differentiation of the cells, which in part resides in profound changes in wall structure, and the extent to which individual proteins are immobilized within these structures increase complexity. One approach to increase knowledge of the range of wall proteins is to use tissue-cultured cells, which can be grown in bulk to allow characterization of proteins of walls that resemble primary walls. These cultures can also be stimulated to mimic developmental processes such as elongation growth and embryogenesis in carrot cells or xylogenesis and secondary wall formation as in Zinnia or French bean (24). A wide range of cells have also been used to mimic aspects of pathogen stress using fungal elicitors. A distinct advantage of cell suspensions is that they can be subjected to washing regimes that elute non-covalently bound proteins without disrupting the cells. Our studies demonstrate, by comparing the plant species, that (i) patterns of proteins revealed by SDS-PAGE are strikingly different, (ii) a large number of these proteins that were sequenced cannot be identified by data base searches, and (iii) the major proteins that can be identified belong to very different classes of proteins.
It appears that aspects of speciation reside in the complement of extracellular and cell wall proteins. Although it is difficult to discriminate between an extracellular protein and a cell wall protein in planta, this is not the case with suspension-cultured cells, as exemplified by the fact that most of the protein sequences found within the culture filtrates were absent from the subsequent salt-eluted extracts. Moreover, the culture filtrate profile for each individual species was different from any of its respective wall extracts (Fig. 1, A-E).
The use of tissue cultures to model structures and processes in the whole plant has been validated by a large number of studies. These show that purification of proteins and cloning of cDNAs that originate through this route give rise to antibodies and cDNAs that can be used on the whole plant and locate to the predicted tissues and cells. Examples include structural proteins such as glycine-rich proteins (25), hydroxyproline-rich glycoproteins (5), proline-rich proteins (26), and enzymes such as peroxidases (23), chitinases, and laccases (27). Analyses of the sequences generated in this study validate the methodology, since many are known wall proteins and the cells remain intact. Only two potential cytoplasmic proteins have been revealed, a triose-phosphate isomerase and a P-450, both of which showed a relatively low degree of identity. Indeed, the types of extracellular proteins we have come across and identified through homology searches encompass the carbohydrate-modifying enzymes such as xyloglucan endotransglycosidases, cellulases, and glucanases; other examples of wall proteins are the expansin, peroxidase, protease, ribonuclease, chitinases, extensin, and proline-rich protein.
There are a number of reasons why certain known wall proteins may be absent from this study. The most conspicuous example is perhaps members of the hydroxyproline-rich glycoprotein family, only two of which were detected in the bean extracts. Their absence could be because they are present as minor components of the wall and consequently not detected here, or that they are only expressed in response to a particular environmental or developmental response that is not present within the culture regimes (28). The proteins could be N-terminally blocked, or it may also be that they are not readily extracted by these reagents, or that the more heavily glycosylated types are not stained with the Coomassie dye. It would seem that many of the enzyme activities associated with the wall are only known through function so may not yet be present within the data bases and are still waiting to be characterized by purification and subsequent sequencing.
For certain sequences within each species, there also existed a certain degree of heterogeneity in terms of the occasional amino acid substitution and also their appearance at different molecular weights. The former may be explained by the proteins originating from different genes and the latter by posttranslational mechanisms such as glycosylation.
Cloning of any of the protein sequences reported here will undoubtedly be accelerated by the ever increasing numbers of ESTs that are being characterized. One surprising outcome of this research is that the sequenced Arabidopsis proteins did not have a larger proportion of "hits" in the EST data base. This could be because the majority of ESTs do not represent full-length cDNAs. Greater exploitation of the EST data base would thus require extensive internal amino acid sequence data to be generated if this were the case. It is envisaged that, within the foreseeable future, most if not all of the genes in several plants will be available in public access data bases. However, it has also been estimated that the function of at least 70% of the Arabidopsis genes alone, at present, cannot be identified by sequence homology to genes or proteins that have already been designated a function (7,8). Therefore, having prior knowledge relating to a gene, such as its expression pattern or the eventual location of the expressed protein, may help in elucidating function. This information should therefore complement new data from systematic DNA sequencing exercises in that it gives a direct localization of the protein identified in this way.