Crystal Structure of Polygalacturonase from Erwinia carotovora ssp. carotovora *

The crystal structure of the 40-kDa endo-polygalacturonase from Erwinia carotovora ssp. carotovora was solved by multiple isomorphous replacement and refined at 1.9 Å to a conventional crystallographic R-factor of 0.198 and Rfree of 0.239. This is the first structure of a polygalacturonase and comprises a 10-turn right-handed parallel β-helix domain with two loop regions forming a “tunnel-like” substrate-binding cleft. Sequence conservation indicates that the active site of polygalacturonase is between these two loop regions, and comparison of the structure of polygalacturonase with that of rhamnogalacturonase A from Aspergillus aculeatus enables two conserved aspartates, presumed to be catalytic residues, to be identified. An adjacent histidine, in accord with biochemical results, is also seen. A similarity in overall electrostatic properties of the substrate-binding clefts of polygalacturonase and pectate lyase, which bind and cleave the same substrate, polygalacturonic acid, is also revealed.

Polygalacturonases are classified as family 28 glycosyl hydrolases (1); they hydrolyze the α-1,4 glycosidic bond in polygalacturonic acid or in the homogalacturonan regions of pectin (2-4). Plant polygalacturonases are important in fruit ripening and senescence, whereas the microbial enzymes are involved in pathogen attack. Erwinia carotovora ssp. carotovora, which produces the endo-polygalacturonase PehA, is a major causal agent of soft rot in cultivated plants (5). The amino acid sequence of PehA has 20% identity with that of rhamnogalacturonase A from Aspergillus aculeatus (RGase A), whose structure has recently been solved (6). RGase A cleaves the α-1,2 glycosidic bond between alternating rhamnose and galactose in the rhamnogalacturonan regions of pectin. Glycosidases either retain or invert the configuration of the anomeric carbon of the cleaved bond (7-9), and polygalacturonases and rhamnogalacturonases have been shown to be inverting enzymes (10,11). The separation of the two acid residues involved in catalysis is on average 9.0-10.0 Å in inverting enzymes, a spacing argued to allow both substrate and a water molecule to fit between the general acid and the general base (12,13). The architectures and mechanism of RGase A, PehA, and all family 28 glycosidases are expected to be conserved, whereas specificity has diverged. Comparison of the structures of PehA and RGase A is therefore expected to aid the identification of the catalytic amino acids and reveal the molecular details responsible for their differences in specificity. Several experiments suggest the presence of a histidine close to the active center of PehA (14,15), but there is no histidine in the active site of RGase A. Phage P22 tailspike protein with endorhamnosidase activity may also be related to these enzymes (16), and active site residues have been proposed for tailspike protein based on structural studies (17,18).
There is also functional similarity between PehA and pectate lyase, which cleaves the same substrate but by a calcium-dependent β-elimination reaction. This functional similarity might be expected to result in similar size, shape, and surface charge of the substrate-binding clefts; differences might reflect their detailed specificity for degree or pattern of methylation or for branches in the pectin molecule. The structures of three pectate lyases (19-21) and two pectin lyases (22,23) have been determined. No structure of polygalacturonase has been reported to date; however, three successful crystallizations have been reported (24-26). We now report the crystal structure of endo-polygalacturonase from E. carotovora ssp. carotovora refined at 1.9 Å resolution, suggest the identity of the catalytic amino acids, and reveal similarities in the binding clefts of polygalacturonase and pectate lyase.

MATERIALS AND METHODS
Enzyme Expression, Purification, and Crystallization-The endo-polygalacturonase gene from E. carotovora ssp. carotovora (PehA) was expressed in Bacillus subtilis (27). The 26-amino acid signal sequence is cleaved, and the mature polygalacturonase produced (PehA) comprises 376 amino acids (28). Purification and crystallization of PehA have been described in detail elsewhere (26). Single crystals were grown by the hanging drop method using 18% polyethylene glycol 6000, 0.2 M magnesium acetate, 0.1 M sodium cacodylate at pH 6.5 as the reservoir solution, and the protein drop comprised 4 μl of protein at 5 mg/ml plus 2 μl of reservoir. Crystals were harvested into mother liquor of the same composition as the reservoir except that the polyethylene glycol 6000 concentration was increased to 20%. The crystals are monoclinic, space group C2, with a = 81.3 Å, b = 53.0 Å, c = 103.1 Å, and β = 112.6°. There is a single molecule in the asymmetric unit.
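As a rough consistency check on the cell reported above, the packing density (Matthews coefficient) can be estimated from the unit cell volume. The sketch below is illustrative only and rests on assumptions not stated in this section: the 40-kDa mass is taken from the abstract, four asymmetric units per cell follows from space group C2, and the solvent fraction uses the common 1 - 1.23/V_M approximation. The function names are hypothetical.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (cubic angstroms) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(cell_volume, mass_da, n_asu, mol_per_asu=1):
    """V_M in cubic angstroms per dalton: cell volume divided by total protein mass in the cell."""
    return cell_volume / (mass_da * n_asu * mol_per_asu)


def solvent_fraction(v_m):
    """Approximate solvent fraction, assuming a typical protein density (1.23 / V_M rule)."""
    return 1.0 - 1.23 / v_m


# Cell parameters quoted in the text; mass (40 kDa) is taken from the abstract.
volume = monoclinic_cell_volume(81.3, 53.0, 103.1, 112.6)
# Space group C2 has four asymmetric units per cell; one molecule in each.
v_m = matthews_coefficient(volume, mass_da=40_000.0, n_asu=4, mol_per_asu=1)
solvent = solvent_fraction(v_m)

print(f"V = {volume:.0f} A^3, V_M = {v_m:.2f} A^3/Da, solvent ~ {solvent:.0%}")
```

Under these assumptions V_M comes out near 2.6 Å^3/Da, a value in the range usually seen for protein crystals and consistent with one molecule in the asymmetric unit.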
Data Collection-Because the PehA crystals were susceptible to radiation damage, all data were collected using crystals cooled to 100 K in the nitrogen stream of an Oxford Cryosystems Cryocooler. The cryoprotectant was mother liquor supplemented with 20% glycerol and 10% 2-propanol. Native data were collected using the EMBL X31 beam line at the DORIS storage ring, DESY (λ = 1.000 Å), equipped with a MAR300 image plate. Data from the chloroplatinite derivative prepared at pH 6.5 were collected using beam line DW21b at LURE (Orsay, France) (λ = 1.016 Å) equipped with a MAR300. Lutetium chloride data were also collected at LURE but using DW32 (λ = 0.970 Å) with a MAR345. Data from the chloroplatinite derivative prepared at pH 7.5 and from the platinum orange (chloro(2,2′:6′,2″-terpyridine)platinum(II) chloride) derivative were collected using a MacScience (Siemens) rotating anode generator with double mirrors and nickel filter (λ = 1.5418 Å) equipped with a DIP1030 image plate. The five data sets were reduced using DENZO and SCALEPACK (footnote 1) and the CCP4 program suite (29).
* This work was supported by the Biotechnology and Biological Sciences Research Council of the United Kingdom. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Structure Solution-Three platinum derivatives were prepared by soaking native crystals under the following conditions: 25 mM chloroplatinite for 16 h at pH 6.5, 25 mM chloroplatinite for 1.5 h at pH 7.5 using 0.1 M HEPES buffer in place of cacodylate, and 3 mM platinum orange for 65 h at pH 6.5. The fourth derivative was prepared by soaking a native crystal in 20 mM lutetium chloride for 9 h at pH 6.5. The cryo-solutions also contained the heavy atoms at the same concentration as in the soaks. All subsequent calculations were made using the CCP4 suite unless otherwise stated. Difference Pattersons were successfully interpreted for the three platinum derivatives. The chloroplatinite derivative at pH 6.5 had two sites with a consistent cross-vector. The chloroplatinite soak at pH 7.5 was made with the intention of increasing the occupancy of these sites, but the result was binding to a different site. The platinum orange site is again different. The two lutetium sites were found using cross-phased difference Fouriers. MLPHARE (30) was used for the refinement of the heavy atom sites and for the calculation of phases. The phases were subsequently improved using the program SOLOMON (31).
Model Building and Refinement-Model building using "O" (32) running on a Silicon Graphics Indigo 2 Extreme was facilitated by the availability of the coordinates of RGase A. A complete model of the protein was built using iterative rounds of crystallographic refinement with XPLOR (33,34) and ARP (35) with intervening manual revisions of the structure. Five percent of the diffraction data were used for cross-validation purposes (36). ARP was not always successful in reducing the R-factor and Rfree values but nevertheless gave the map with the clearest indication of how to rebuild. The final refinement used the XPLOR solvent correction terms input to REFMAC (37). Atomic positions and isotropic B-factors were refined. The model stereochemistry was evaluated using the programs PROCHECK (38) and WHATCHECK (39).
Superimposition of Protein Structures-Superimposition of the structures of RGase A and of phage P22 tailspike protein on PehA was achieved using "O" (32).
Footnote 1: Z. Otwinowski, unpublished programs.
TABLE I (footnote). The various crystallographic parameters are defined as follows: $R_{\mathrm{merge}}(I) = \sum |I_i - \langle I \rangle| / \sum I_i$, where $I_i$ is the intensity of the $i$-th observation, $\langle I \rangle$ is the mean intensity of the reflection, and the summation extends over all data; $R_{\mathrm{iso}} = \sum |F_{PH} - F_P| / \sum |F_P|$, the mean relative isomorphous difference between the native protein ($F_P$) and the derivative ($F_{PH}$) data; $R_{\mathrm{Cullis}} = \sum \bigl| |F_{PH} + F_P| - F_H \bigr| / \sum |F_{PH} - F_P|$, where $F_H$ is the calculated heavy atom structure factor contribution; phasing power $= \langle F_H \rangle / \langle E \rangle$, where $E$ is the root mean square lack of closure; $R = \sum |F_{\mathrm{obs}} - F_{\mathrm{calc}}| / \sum |F_{\mathrm{obs}}|$, where $F_{\mathrm{obs}}$ and $F_{\mathrm{calc}}$ represent respectively the observed and calculated structure factors.
TABLE I (fragment; stereochemistry rows only). PROCHECK (38) Ramachandran plot:
Residues in the most favoured regions (%): 86.7
Residues in additionally allowed regions (%): 13.0
Non-glycine residues in disallowed regions (%): 0.0
Overall G-factor: +0.10
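The residuals defined in the Table I footnote, and the 5% cross-validation set used for Rfree, can be sketched as follows. This is a minimal illustration, not the procedure implemented in SCALEPACK, XPLOR, or REFMAC: the function names are hypothetical, the amplitudes and intensities are made-up toy numbers, and scaling between observed and calculated data is assumed to have been applied already.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional R = sum|F_obs - F_calc| / sum|F_obs| over paired amplitudes."""
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_merge(observations):
    """R_merge = sum|I_i - <I>| / sum I_i.

    `observations` maps each unique reflection to its list of measured intensities.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def split_work_free(n_reflections, free_fraction=0.05, seed=0):
    """Randomly set aside a fraction of reflection indices for cross-validation."""
    rng = random.Random(seed)
    n_free = round(n_reflections * free_fraction)
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


# Toy data (invented for illustration, not from the PehA data sets).
toy_f_obs = [100.0, 80.0, 60.0, 40.0]
toy_f_calc = [90.0, 85.0, 66.0, 39.0]
toy_intensities = {(1, 0, 0): [100.0, 110.0], (0, 2, 0): [50.0, 50.0, 56.0]}

work_set, free_set = split_work_free(1000)
print(r_factor(toy_f_obs, toy_f_calc), r_merge(toy_intensities), len(free_set))
```

Rfree is then the same `r_factor` expression evaluated only over the reflections in the free set, which are excluded from refinement.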

RESULTS AND DISCUSSION
The results of the data collection, phasing, refinement, and the stereochemistry of the final model are presented in Table I. The final model comprises all 376 amino acids of mature PehA plus 295 water molecules. The stereochemistry as judged by PROCHECK is good, with an overall G-factor of +0.1, compared with a typical value of −0.3 for refinement at this resolution. Additionally, the water molecules have good hydrogen bonding patterns. The overall fold of PehA is related to that of RGase A but with different loop structures and extensions (Fig. 1, A and B). However, the structure of PehA could not be solved by molecular replacement using RGase A as the search model. The structure comprises 10 complete coils of right-handed parallel β-helix architecture plus an α-helix in an additional coil at the N-terminal end and an incomplete coil at the C-terminal end of the protein. Surprisingly, there is no polypeptide covering the C-terminal end of the parallel β-helix and no C-terminal extension. The last strand of parallel β-sheet two (PB2) is important in stabilizing the PehA structure, which is necessary for secretion via the Out pathway (40). The B-factors, a measure of flexibility, are lowest for the central turns and increase toward the N-terminal and C-terminal ends of the parallel β-helix; the increase is most dramatic toward the uncapped C-terminal end. Superimposed on this general trend in B-factors are spikes correlating with the long loops protruding from the parallel β-helix domain (Fig. 2). In contrast to the PehA structure, RGase A has an additional α-helix in the N-terminal extension and additional polypeptide chain at the C-terminal end of the parallel β-helix forming an antiparallel β-strand at the C-terminal end of parallel β-sheet 1 (PB1) followed by an extension that partially covers parallel β-sheet 3 (PB3). The loop structures are also different. Although the pectate and pectin lyases have three parallel β-sheets (Fig. 1C), PB1, PB2, and PB3, PehA and RGase A have an additional parallel β-sheet, PB1a (18). PehA has two disulfides, one between Cys 15 just prior to the first β-strand of PB2 and Cys 35 after the C-terminal end of the first α-helix. The second disulfide, Cys 89-Cys 99, is in the long loop preceding strand two of PB1. There are two cis-peptides, Pro 116 and Ser 257. The first is a perturbation of the usually regular αL-conformation turn between PB2 and PB3; the second is in the loop preceding β-strand seven of PB1.
Alignment of 36 polygalacturonase sequences reveals four conserved regions: Asn 201-Thr 202-Asp 203, Gly 222-Asp 223-Asp 224, Gly 250-His 251-Gly 252, and Arg 280-Ile 281-Lys 282. These sequences cluster on or before β-strands 5, 6, 7, and 8 of PB1. Asn-Thr-Asp forms strand 5, and Arg-Ile-Lys forms strand 8 of PB1. Gly-Asp-Asp forms the turn before β-strand 6, and Gly-His-Gly forms the turn before β-strand 7 of PB1. This clustering is a clear indication of functional conservation and is strongly suggestive that the region on the surface of β-strands 5-8 of PB1 and the adjacent loops form the catalytic site (Fig. 1D). The location of this site in a pronounced cleft further strengthens this proposal.
The pronounced substrate-binding cleft is formed by two long loops that precede strands two and three of PB1 and three loops that follow strands seven, eight, and nine. Ser 257, in cis-conformation in the loop preceding the seventh β-strand of PB1, may be important in maintaining the geometry of the substrate-binding cleft. The cleft is at an angle to the axis of the parallel β-helix as a result of the long loops toward the N-terminal end preceding PB1 and long loops toward the C-terminal end of the parallel β-helix occurring after PB1. This cleft is more "tunnel-like" than that of the pectate lyases, pectin lyases, or RGase A and is characterized by the presence of conserved aspartates and lysines at its center. Arginines and lysines also line the sides of the substrate-binding cleft, making the overall electrostatic potential in the substrate-binding cleft positive (Fig. 3). In this respect PehA is similar to B. subtilis pectate lyase with calcium bound and different from Aspergillus niger pectin lyase and RGase A.
RGase A may be superimposed on PehA with a root mean square displacement of 1.7 Å for 280 equivalent α-carbon atoms. Comparison of the superimposed PehA and RGase A structures reveals that in the active site region a number of aspartates and lysines are conserved (Fig. 4). Aspartates 202 (177) and 223 (197) are conserved between PehA and RGase A and are therefore proposed to be the catalytic aspartates. The numbers in parentheses refer to RGase A. Asp 205 is substituted by histidine in some polygalacturonases, and Asp 224 is substituted by a glutamate in RGase A (Glu 198) (Fig. 4).

FIG. 3 (caption fragment). The active site is between two lysine-rich loop regions. The positive potential is similar to that seen in BsPel with calcium bound (22) and dissimilar from that of pectin lyase (22) or RGase A (6).
FIG. 4. The active site clefts of E. carotovora polygalacturonase (A) and A. aculeatus rhamnogalacturonase A (B). This figure was produced using QUANTA (43) after superimposition of the two structures using O (32).
Three carboxylates have been proposed to be involved in the activity of P22 tailspike protein (17,18); Asp 395 and Glu 359 are proposed to be general bases, and Asp 392 is proposed to be the general acid. Steinbacher et al. also argue that tailspike is an inverting glycosidase. It has previously been argued that to accommodate a water molecule and substrate between the catalytic carboxylates the distance between the carboxylates must be 9.0-10.0 Å (12,13). However, the proposed general bases Glu 359 and Asp 395 are 8.2 and 5.8 Å, respectively, from the proposed general acid, Asp 392. Protonation must occur in the plane defined by the nucleophilic water, the anomeric carbon, and the glycosidic oxygen. It may, however, be possible for protonation of the glycosidic oxygen and nucleophilic attack at the anomeric carbon to be from the same side of the bond in α-linked polysaccharides rather than opposite sides, with a resulting shorter separation of carboxylates. If so, this would help to explain the short spacing between the conserved carboxylates in PehA and RGase A.
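The comparison made above, between the reported tailspike carboxylate separations and the 9.0-10.0 Å spacing argued for inverting enzymes, can be written out as a small check. This is only a sketch: the helper names are hypothetical, the two distances are the values quoted in the text, and the text does not state which atoms the separations were measured between, so the generic distance helper is an assumption about how such values would be obtained from coordinates.

```python
import math

# Separation argued for inverting enzymes in the text (12,13), in angstroms.
INVERTING_RANGE = (9.0, 10.0)


def carboxylate_distance(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) positions, in angstroms."""
    return math.dist(atom_a, atom_b)


def fits_inverting_spacing(distance, spacing_range=INVERTING_RANGE):
    """True if the separation falls inside the quoted inverting-enzyme range."""
    low, high = spacing_range
    return low <= distance <= high


# Separations from the proposed general acid Asp 392, as quoted in the text.
reported = {
    ("Glu 359", "Asp 392"): 8.2,
    ("Asp 395", "Asp 392"): 5.8,
}

results = {pair: fits_inverting_spacing(d) for pair, d in reported.items()}
for (base, acid), ok in results.items():
    print(f"{base} to {acid}: {'within' if ok else 'shorter than'} the 9.0-10.0 A range")
```

Both quoted separations fall below the range, which is the discrepancy the same-side protonation argument is meant to address.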
The α-carbon positions of Asp 392 (proposed general acid) and Asp 395 of tailspike protein can be superimposed on Asp 202 and Asp 205 of PehA, and the overall structural superposition then gives a root mean square displacement of 2.0 Å for 178 equivalent α-carbons. The side chain of Asp 202 superimposes rather well, but that of Asp 205, which is in any case not conserved in all polygalacturonases, does not. It is clear that the active site geometry and distribution of charge in tailspike protein are different from those of PehA and RGase A.
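The superposition statistics quoted for PehA, RGase A, and tailspike protein are root mean square displacements over paired α-carbons. A minimal sketch of that quantity is given below; it assumes the two coordinate lists have already been superimposed and paired (the fitting itself was done in "O" and is not reproduced here), and the function name and toy coordinates are invented for illustration.

```python
import math


def rms_displacement(coords_a, coords_b):
    """Root mean square displacement between paired, already superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy alpha-carbon positions (made up; not taken from either structure).
toy_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]

print(f"{rms_displacement(toy_a, toy_b):.2f} A over {len(toy_a)} pairs")
```

The number of equivalent pairs matters as much as the displacement itself, which is why the text reports both (1.7 Å for 280 atoms versus 2.0 Å for 178).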
The structural results therefore suggest that aspartates 202 and 223 are directly involved in the mechanism of PehA and that the carboxylate at position 224 is probably also involved. One of the two conserved water molecules bound to these carboxylates may be the nucleophilic water. A more detailed assignment of the role of these carboxylates and water molecules awaits determination of the structure of a nonproductive enzyme-substrate complex.