Crystal Structure of Ser-22/Ile-25 Form Crambin Confirms Solvent, Side Chain Substate Correlations*

It is not agreed that correlated positions of disordered protein side chains (substate correlations) can be deduced from diffraction data. The pure Ser-22/Ile-25 (SI form) crambin crystal structure confirms correlations deduced for the natural, mixed sequence form of crambin crystals. Physical separation of the mixed form into pure SI form and Pro-22/Leu-25 (PL form) crambin and the PL form crystal structure determination (Yamano, A., and Teeter, M. M. (1994) J. Biol. Chem. 269, 13956–13965) support the proposed (Teeter, M. M., Roe, S. M., and Heo, N. H. (1993) J. Mol. Biol. 230, 292–311) correlation model. Electron density of mixed form crambin crystals shows four possible pairs of side chain conformations for heterogeneous residue 22 and nearby Tyr-29 (2 × 2 = 4, two conformations for each of two side chains). One combination can be eliminated because of short van der Waals’ contacts. However, only two alternates have been postulated to exist in mixed form crambin: Pro-22/Tyr-29A and Ser-22/Tyr-29B. In crystals of the PL form, Pro-22 and Tyr-29A are found to be in direct van der Waals’ contact (Yamano, A., and Teeter, M. M. (1994)

Motion correlated over 5–8 Å (liquid-like movement) has been shown by the non-Bragg technique of x-ray diffuse scattering to be important in insulin and lysozyme crystals (1, 2). State-of-the-art molecular dynamics methods cannot model such correlations (3), perhaps because of inadequate sampling of conformational substates (4). Multiple substates of nearly equal energy are also proposed for myoglobin based on spectroscopic evidence (5-7), but spectroscopy is not well suited to elucidating the nature of these substates. Neither is NMR, unless extremely tight distance restraints are used (8).
Diffraction from a crystal is averaged over many unit cells and over the time spent on data collection. It is generally believed that this averaging precludes extracting dynamic information, such as occurrence of multiple substate correlations from an x-ray structure. However, nonrandom correlations will contribute to Bragg reflections. Given diffraction data beyond 1.4 Å (9), the correlations can be modeled as substate disorder and provide insight into protein dynamics. If one could physically separate the substates and study each separately, one could prove such correlations exist and derive the rules for the correlation.
Crambin presents an excellent system for such an experiment. Crambin from the natural source contains two sequence isomers in a 3:2 ratio (10, 11), the so-called mixed form of crambin. The major isomer has Pro and Leu at positions 22 and 25, respectively (the PL form); the minor isomer has Ser and Ile at the same positions (the SI form). In the mixed form crystal structure, side chain electron densities for heterogeneous residues are superimposed (Pro and Ser at residue 22 and Leu and Ile at residue 25). The Tyr-29 side chain from a 2₁ screw axis-related molecule has close contacts with the Pro or Ser residue and adopts two conformations. A proposed correlation of the Tyr-29 conformation with the identity of the amino acid at residue 22 (12) has been supported by the PL form structure (13). Now the second, or SI, form of crambin has been purified by fast protein liquid chromatography and crystallized. Its structure establishes the side chain correlations definitively, together with the associated solvent interactions.
In this paper, first the proposed mixed form protein networks are extended to water disorder using stereochemical "rules," such as van der Waals' contacts and hydrogen bonding. Second, these postulated networks are compared with the crystal structures of the physically separated pure forms of crambin: the PL form structure and the newly determined SI form structure. These results establish that the x-ray structure of the mixed form of crambin can elucidate substate spatial correlations between side chains and solvent, as proven by the pure form structures.
Alternative conformations at disordered residues and water molecules are designated by attaching A and B to the residue number. Such disordered conformations are often correlated with neighboring residue disorder through space and may represent conformational substates of the protein. For example, Ser-22A, Tyr-29A, and Wat-132A represent one disordered substate correlated through space with the alternates Ser-22B, Tyr-29B, and Wat-132B.
Crambin was purified to a single sequence form (13), and crystals of the SI form were grown by vapor diffusion techniques (14). Conditions were similar to those previously used (15) but with an initial reservoir concentration of 50% ethanol. In contrast to other forms, seeding by methods such as the streak seeding technique (16) was essential to nucleate crystal growth. Here a submicroscopic mixed form crystal served as the seed crystal, and small crystals appeared along the streak
* This work was supported by National Science Foundation Grants DMB 89-04337 and MCB-9219857 (to M. M. T.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The atomic coordinates (code 1abl) and structure factors (code 1ablsf) have been deposited in the Protein Data Bank, Brookhaven National Laboratory, Upton, NY.
Diffraction data were collected to 0.89 Å resolution on a Rigaku AFC5 four-circle diffractometer on a Rigaku RU-200 rotating anode generator. The crystal was flash cooled (17) to 150 K with a Molecular Structure Corporation rigid tube low temperature device. Refinement consisted of PROLSQ restrained least squares (18) alternating with interactive rebuilding using the program FRODO (19) on an Evans & Sutherland PS390. The initial model, which was the mixed form structure at 130 K without side chains for residues 22 and 25 but including hydrogens, was first refined with isotropic temperature factors against 1.5 Å data. Hydrogens were refined, because it is difficult to fix or ride them in PROLSQ. The resolution was then extended to 0.89 Å in three steps (1.2, 1.0, and 0.89 Å). Three-parameter anisotropic temperature factors (20) were introduced after convergence of the isotropic refinement. Ninety-five cycles of PROLSQ refinement brought the standard R-factor down to 14.7% (with R_err (Σσ(F_o)/ΣF_o) of 9.5%). The final model has 495 heavy atoms (349 protein atoms, 140 water sites, and 2 ethanol sites) and 429 hydrogen atoms, for a total of 824 atoms. Table I summarizes refinement statistics for the SI form structure, and Table II summarizes the agreement with stereochemical restraints. Errors are estimated from a Luzzati plot (21) to be about 0.08 Å for the SI form, 0.06 Å for the PL form, and 0.06 Å for the mixed form crambin (the true error from full matrix refinement of the mixed form is 0.022 Å) (22).
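As an illustration of the agreement statistic quoted above, the following is a minimal sketch of the conventional crystallographic R-factor, R = Σ||F_o| − |F_c|| / Σ|F_o|. It is not the PROLSQ implementation: it assumes the calculated amplitudes are already on the same scale as the observed ones, and the function name and the amplitudes in the example are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum(||Fo| - |Fc||) / sum(|Fo|).

    Assumes f_calc has already been scaled to f_obs (an assumption of
    this sketch; refinement programs determine the scale themselves).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return numerator / denominator


# Hypothetical structure factor amplitudes, for illustration only.
example_f_obs = [100.0, 50.0, 25.0, 25.0]
example_f_calc = [90.0, 55.0, 30.0, 20.0]
example_r = r_factor(example_f_obs, example_f_calc)  # 25 / 200 = 0.125
```

An R-factor of 14.7% corresponds to a return value of 0.147 from a function of this kind.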
In the SI form, seven residues (15.2%) have multiple conformations. This is less than the eight residues in the PL form (17.4%) and considerably less than the mixed form (28.3%), where sequence heterogeneity plays a major role.
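The percentages above can be checked with a short sketch. It assumes a chain length of 46 residues for crambin, which is not stated in this section but is consistent with the quoted figures; the residue count for the mixed form is inferred from the 28.3% value rather than taken from the text.

```python
# Assumed chain length of crambin (not stated in this section).
N_RESIDUES = 46


def percent_disordered(n_disordered, n_total=N_RESIDUES):
    """Percentage of residues modeled with multiple conformations."""
    return 100.0 * n_disordered / n_total


def implied_count(percent, n_total=N_RESIDUES):
    """Residue count implied by a quoted percentage (nearest integer)."""
    return round(percent / 100.0 * n_total)


si_percent = round(percent_disordered(7), 1)   # SI form, seven residues
pl_percent = round(percent_disordered(8), 1)   # PL form, eight residues
mixed_count = implied_count(28.3)              # inferred, not quoted
```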
The overall structure of the SI form of crambin (Fig. 1) is very similar to that of the PL form (at 150 K (13)) and the mixed form (at 293 K (10) and at 130 K (12)). The largest structural differences might be expected at the turn from residues 19–22, because Ser-22 is more flexible than Pro. However, the rms deviation is only 0.056 Å between the SI and PL forms.
Based on this analysis, one would predict for the pure form structures that waters associated with the missing form would have altered occupancies. The key sites would be the weak density sites Wat-132A/B. They should be considerably stronger in the SI form but absent from the PL form. Indeed, in the mixed form the occupancy sum and the average B value (⟨B⟩) for these waters are 0.4 and 11, whereas for the SI structure the occupancy sum is 0.8 and ⟨B⟩ is 3.1.
Fig. 3 shows the electron density and atomic model of the pure PL form structure in the same region that is shown in Fig. 2. The electron density is consistent with the elimination of the Ser and Ile side chains. The density at residue 22 matches Pro. The phenol ring of residue 29 takes the A conformation, and Tyr-29A O makes allowed van der Waals' contacts with Pro-22 Cδ and Cγ. Water sites Wat-132A and Wat-132B are absent in this structure. The water-protein conformations perfectly match the green network in Fig. 2. The rms deviation from the mixed form structure is 0.065 Å over Tyr-29A, Pro-22, Wat-47, Wat-82, and Wat-182A.
Fig. 4 shows the electron density and the atomic model for the SI form structure. Residue 22 electron density is interpreted as a Ser with disordered Oγ, and no Pro is present. Tyr-29 takes the Tyr-29B conformation, and waters 132A/132B are enhanced as predicted. This structure is nearly identical to the red network in Fig. 2, except for Wat-182B. The rms deviation between this and the mixed form structure is 0.267 Å for Tyr-29B, Ser-22A/B, Wat-47, Wat-132A/B, and Wat-182B. An additional water site (Wat-182C) could be modeled in Fig. 2 (elongated density on 182A), because an additional water site is visible from the Ser/Ile structure (Wat-182A).
From the above comparisons, one can conclude that interpenetrating disorder networks can be separated by optimizing van der Waals' contacts and hydrogen bonds. Because networks 2′ and 3′ account for all the electron density in the mixed form crystal, these are the only networks needed to account for the mixed form disorder.
a DINC is the change in the minimum van der Waals' contact distance.
b The weight for the structure factors in refinement (the "target" of |F_o| − |F_c|) was modeled by the function wt = (1/σ)²
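The rms deviations quoted for the PL and SI networks compare matched atoms and water sites between two structures. A minimal sketch of such a calculation follows. It assumes both coordinate sets are already in a common frame (plausible for isomorphous crystals, but the text does not say whether a superposition was applied), and the coordinates in the example are hypothetical.

```python
import math


def rms_deviation(coords_a, coords_b):
    """Root-mean-square deviation between matched (x, y, z) positions in Å.

    No superposition is performed; both sets are assumed to share one
    coordinate frame (an assumption of this sketch).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical positions for two matched sites, each displaced by 0.1 Å.
mixed_sites = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
pure_sites = [(1.1, 2.0, 3.0), (4.0, 5.1, 6.0)]
example_rmsd = rms_deviation(mixed_sites, pure_sites)  # about 0.1 Å
```

Because every matched site contributes equally, a single poorly matching site such as Wat-182B can dominate the value for a small set.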
Why is network 1′ not present in nature? Stereochemical requirements alone cannot exclude this possibility, because it neither violates van der Waals' contact limits nor has inappropriate hydrogen bonds. However, if the phenol ring of residue 29 took the Tyr-29A conformation and the side chain of residue 22 were Ser, there would be a large vacancy around Wat-132A, Wat-132B, and Pro-22 Cδ. The potential empty space is eliminated by the spatial correlation among side chains and water molecules. In other words, space or vacuum is not allowed at a protein surface, probably because it is energetically unfavorable.
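The elimination logic for the residue 22/Tyr-29 pairs can be sketched as a small enumeration. The two rule sets below are a simplification for illustration: the vacancy rule follows the argument above for Ser-22 with Tyr-29A, while assigning the short van der Waals' contact to the Pro-22/Tyr-29B pair is an inference from the two observed alternates, not a statement quoted from the text. All names are hypothetical.

```python
from itertools import product

RESIDUE_22 = ("Pro-22", "Ser-22")
TYR_29 = ("Tyr-29A", "Tyr-29B")

# Pair assumed to be excluded by short van der Waals' contacts (inferred).
STERIC_CLASH = {("Pro-22", "Tyr-29B")}
# Pair that would leave an unfilled cavity at the protein surface.
LEAVES_VACANCY = {("Ser-22", "Tyr-29A")}


def allowed_pairs():
    """Enumerate the 2 x 2 = 4 pairs and drop those ruled out above."""
    candidates = list(product(RESIDUE_22, TYR_29))
    return [
        pair
        for pair in candidates
        if pair not in STERIC_CLASH and pair not in LEAVES_VACANCY
    ]


surviving = allowed_pairs()
```

Under these assumptions only the Pro-22/Tyr-29A and Ser-22/Tyr-29B pairs survive, matching the two alternates observed in the pure form structures.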
Further, in the SI form structure, this space filling can be seen from the alternate water conformations identified in electron density maps. Wat-182A is shifted downward (Fig. 4) to fill the empty space created by the absence of the Tyr-29A conformation. Another water site (182B) alternates with this site 2.31 Å away and hydrogen bonds to Wat-132. Wat-132A/B disorder appears for similar reasons. Both alternate pairs fill the available space and optimize packing and hydrogen bonding.
From these disordered water molecules, the importance of solvent for protein flexibility is evident. The full rationalization of the correlations derived from Fig. 2 must involve solvent-mediated interactions.
Proposed disorder networks in the mixed form crambin are extended to solvent and confirmed by the pure Pro-22/Leu-25 (13) and Ser-22/Ile-25 forms of crambin. Here the two disordered forms resulting from sequence differences were physically separated by fast protein liquid chromatography, and each was crystallized. The spatial correlations implied from the mixed form structure were proven by examining both protein and water from the two pure form structures. Water was critical for this confirmation. Derived rules for correlations provide insight into the structure and dynamics of proteins in general. In this paper, we have proven that correlated conformations obey simple stereochemical rules and have alternates that fill space. The same logic used here should apply to assigning multiple conformational substates where sequence differences are not involved (13, 24). These results demonstrate that dynamic correlation does occur and can be deduced from an x-ray structure at 1 Å resolution using fundamental principles. Such elucidation is important for understanding the mechanisms of such important proteins as lysozyme and myoglobin.