Proteins at Work

C-terminal Src kinase (Csk) phosphorylates and down-regulates the Src family tyrosine kinases (SFKs). Crystallographic studies of Csk found an unusual arrangement of the SH2 and SH3 regulatory domains about the kinase core, forming a compact structure. However, recent structural studies of mutant Csk in the presence of an inhibitor indicate that the enzyme accesses an expanded structure. To investigate whether wt-Csk may also access open conformations we applied small angle x-ray scattering (SAXS). We find wt-Csk frequently occupies an extended conformation where the regulatory domains are removed from the kinase core. In addition, all-atom structure-based simulations indicate Csk occupies two free energy basins. These basins correspond to ensembles of distinct global conformations of Csk: a compact structure and an extended structure. The transitions between these structures are entropically driven and accessible via thermal fluctuations that break local interactions. We further characterized the ensemble by generating theoretical scattering curves for mixed populations of conformations from both basins and compared the predicted scattering curves to the experimental profile. This population-combination analysis is more consistent with the experimental data than any rigid model. It suggests that Csk adopts a broad ensemble of conformations in solution, populating extended conformations not observed in the crystal structure that may play an important role in the regulation of Csk. The methodology developed here is broadly applicable to biological macromolecules and will provide useful information about what ensembles of conformations are consistent with the experimental data as well as the ubiquitous dynamic reversible assembly processes inherent in biology.

The Src family tyrosine kinases (SFKs) are modular signaling enzymes involved in the control of cellular growth and differentiation (1). The members of this family contain three important structural domains: a C-terminal tyrosine kinase domain, composed of a small and a large lobe, preceded by the noncatalytic regulatory SH2 and SH3 domains (2). SFKs also contain a unique region and an N-terminal myristic acid for membrane association. The activity of c-Src, the prototype for the SFKs, is up-regulated by phosphorylation of Tyr-416 (in the activation loop of the kinase domain) and dephosphorylation of Tyr-527 (in the C-terminal tail) (Fig. 1A) (3,4). While phosphorylation of the activation loop is autocatalytic, phosphorylation of the C-terminal tail is inhibitory and requires Csk (5). Like the SFKs, Csk contains a C-terminal tyrosine kinase domain and N-terminal SH2 and SH3 domains (Fig. 1A) (6). Unlike SFKs, Csk lacks an inhibitory C-terminal tail, is not regulated through phosphorylation, and does not possess an N-terminal sequence for membrane localization. Instead, Csk is constitutively active, and increased activity is coupled to its association with membrane adaptor proteins, for example Csk-binding protein (Cbp) (7). Cbp localizes Csk to the membrane, where phosphorylation of SFKs by Csk occurs, and up-regulates Csk activity through engagement of the Csk SH2 domain (8). How the latter occurs is not well understood. Hydrogen-deuterium exchange studies coupled with mass spectrometric analyses indicate cross-talk between the SH2 and kinase domains within Csk. This cross-talk is activated by binding to Cbp and/or reduction of a disulfide bond distal to the active site within the kinase domain (9-11). In addition to this atypical mode of regulation, Csk also differs from many tyrosine kinases in its substrate specificity. Broad substrate specificity is common for tyrosine kinases, whereas Csk displays high specificity for SFKs (12,13).
Although c-Src and Csk share considerable sequence homology in the kinase, SH2, and SH3 domains, they differ substantially in their tertiary structures. The inactive form of c-Src (Tyr-527 phosphorylated, Tyr-416 unphosphorylated) is compact, with the SH2 and SH3 domains interacting with the kinase domain (Fig. 1B) (14-16). In c-Src, dephosphorylation of Tyr-527 results in dissociation of the C-terminal tail from the SH2 domain (Fig. 1B), which leads to the activation of c-Src (17). Thus, activation of c-Src involves reorientation of the regulatory domains and the loss of most interdomain contacts. For Csk, the regulatory domains adopt a different conformation relative to the kinase domain, in which the SH2 and SH3 domains make contacts with the small lobe of the kinase domain (Fig. 1B). This configuration is maintained by two linker segments that connect the three domains.
Crystallographic analyses of Csk and c-Src suggest that tyrosine kinases can position regulatory domains about a central kinase core in multiple configurations (14-18). These configurations are important for controlling catalytic activity of the kinase domain. For Csk, interactions between the SH2 domain and small lobe of the kinase domain, which are facilitated by the SH2-kinase linker, are necessary for efficient catalysis (10,19). When the regulatory domains are removed, activity decreases by about two orders of magnitude (20,21). In contrast, the SH2 and SH3 domains play a repressive role in controlling the tyrosine kinase activity of c-Src through interaction with the C-terminal tail and the SH2-kinase linker (15). These studies indicate that, for tyrosine kinases, the configuration of the regulatory domains relative to the kinase domain is of functional importance. Whereas the crystallographic data may lead one to believe that domain movement in c-Src is highly cooperative and operates in a switch-like manner, recent studies demonstrate that the regulatory mechanism is more complex. For example, whereas SAXS studies reveal that activation of c-Src is coupled to an increase in the radius of gyration, modeling studies indicate that an open form akin to that in the crystal structure represents only a minor fraction (15%) of the overall population of molecules (22). Therefore, the data are consistent with the possibility that c-Src may adopt a broad ensemble of solution conformations upon activation, where the catalytically active open conformation is only one of these forms. In the case of Csk, the different arrangement of domains raises the question of whether this unique form is the major Csk conformation or whether other species are populated under solution conditions.
Here, we employed SAXS measurements to describe the solution structure of Csk. Interestingly, we find that Csk samples an extended conformation relative to the crystal structure. Rigid body modeling indicates that the SH2 and SH3 domains adopt positions where they are not tightly associated with the kinase core. Recent studies have shown the viability of molecular dynamics simulations to provide a structural interpretation of SAXS data (23-25). In this study, molecular dynamics simulations that employ an all-atom structure-based model (26) are applied in two ways to characterize the ensemble of Csk conformations present in solution: (a) comparing the structures obtained from the simulation with the experimental SAXS data and (b) reconstructing the distribution of states in solution using a system constrained mainly by the entropy. These studies, taken together with solution studies of Src (22), suggest that divergent configurations resulting from global motions of multi-domain protein kinases may be common and that solution approaches are necessary to identify and characterize these conformations.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-The full-length Csk protein was expressed in Escherichia coli strain BL21(DE3) (27) and purified by Ni²⁺ affinity chromatography (28). The purified full-length enzyme was dialyzed against 50 mM Tris-HCl (pH 8.0), 150 mM NaCl, 5 mM DTT with 15% (v/v) glycerol and then concentrated to 107 µM and stored at −80°C.
Small Angle X-ray Scattering Measurements-SAXS data were collected at 25°C at beamline 4-2 of the Stanford Synchrotron Radiation Laboratory. Scattering was independent of protein concentration (0.5-2.5 mg/ml), indicating that interparticle ordering and aggregation are negligible. Data shown were obtained with 1 mg/ml protein (data not shown for other concentrations).
Predicted Scattering for Crystal Structures-Theoretical scattering curves were generated from the available structure of Csk (1K9A) (18) with the CRYSOL software package (29). Default parameters were used with the following exceptions: maximum order of harmonics = 50, order of Fibonacci grid = 18, number of points = 100, and maximum s-value = 0.18. Here, we measure the goodness of fit of each structure by calculating χ², which is defined in Equation 1, where N_q is the number of data points in the scattering curve, I(q) is the SAXS intensity, and σ(q) is the experimental error of I_exp(q).
Small Angle Scattering Data Analysis-The data were converted from a TIFF image to I(q) versus q using SasTool. q describes the angular dependence of the scattering profile and can be expressed as q = 4π sin(θ)/λ, where θ is half the scattering angle and λ is the wavelength of the scattered x-rays. The radius of gyration, R_g, was determined by Guinier analysis. The goodness of the linear fit of the data in the low q range indicated no significant nonspecific aggregation of the protein.
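The Guinier analysis mentioned above can be illustrated with a minimal sketch. It relies on the standard Guinier approximation, ln I(q) ≈ ln I(0) − R_g²q²/3, which holds only at low q; the function name `guinier_fit` and the synthetic profile are illustrative assumptions and are not taken from the software used in this study.

```python
import math

def guinier_fit(q, intensity):
    """Estimate (R_g, I(0)) by a least-squares line through ln I versus q^2.

    Assumes the supplied points already lie in the low-q Guinier regime;
    selecting that range is left to the caller.
    """
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("Guinier slope must be negative to define R_g")
    intercept = mean_y - slope * mean_x
    # slope = -R_g^2 / 3 in the Guinier approximation
    return math.sqrt(-3.0 * slope), math.exp(intercept)

# Synthetic, noise-free low-q profile (hypothetical values, not measured data)
q_values = [0.008 + 0.002 * k for k in range(12)]
profile = [100.0 * math.exp(-(38.3 * qi) ** 2 / 3.0) for qi in q_values]
rg_estimate, i0_estimate = guinier_fit(q_values, profile)
```

A clearly nonlinear ln I versus q² plot at low q is the usual sign of aggregation, which is why the linearity of this fit is used as a quality check.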
Rigid Body Modeling-Rigid body modeling was performed using SASREF (30). SASREF employs a simulated annealing protocol to generate mechanically allowed, alternative (from the PDB structure) conformations of multi-domain proteins followed by generation of predictive scattering profiles using CRYSOL. The scattering profiles generated from rotation and translation of individual domains are compared with the experimental data until a best fit is determined by χ² analysis. Csk was separated into 4 rigid bodies: the SH3 domain (residues 4-71), the SH2 domain (residues 82-174), the small lobe of the kinase domain (residues 191-269), and the large lobe of the kinase domain (residues 270-450). The structures associated with the SH3-SH2 (residues 72-81) and the SH2-kinase (residues 175-190) linkers were replaced with the maximum finite lengths accessible for these tethers to allow unrestricted movements and rotations of the structural domains. The length of each tether was 2.5 Å times the number of residues in the linker. The small lobe and large lobe of the kinase domain were separated by 5-10 Å, allowing for movement of the lobes.
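As a worked example of the tether rule above, the short sketch below computes the maximum tether lengths from the stated linker residue ranges. It assumes the residue ranges are inclusive, so residues 72-81 count as 10 residues; the variable names are illustrative only.

```python
ANGSTROM_PER_RESIDUE = 2.5  # per-residue extension stated in the text

# Linker residue ranges as given in the text (assumed inclusive)
LINKERS = {
    "SH3-SH2": (72, 81),
    "SH2-kinase": (175, 190),
}

def tether_length(first_residue, last_residue, per_residue=ANGSTROM_PER_RESIDUE):
    """Maximum tether length in Angstroms for an inclusive residue range."""
    n_residues = last_residue - first_residue + 1
    return n_residues * per_residue

tether_lengths = {name: tether_length(*span) for name, span in LINKERS.items()}
```

Under this assumption the SH3-SH2 tether is 25 Å (10 residues) and the SH2-kinase tether is 40 Å (16 residues).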
Analysis of the Structural Ensemble via Molecular Dynamics Simulations-We employed molecular dynamics simulations to generate an ensemble of candidate structures (i.e. structures that may explain the scattering profile). From this ensemble, theoretical scattering profiles were generated, and these profiles were compared with the experimentally measured data.
We used an all-atom structure-based forcefield for the individual domains but allowed only steric interactions among them. These models are based on the concepts of energy landscape theory (31,32), have a low computational cost, and are in good agreement with experiments (33-36). The functional form of the all-atom structure-based forcefield is given in Equation 2, where r_0, θ_0, χ_0, and φ_0 are given the values found in the crystal structure (PDB code: 1K9A) (18) and σ_NC = 2.5 Å. ε_r = 100/Å², ε_θ = 20/rad², ε_χ = 10/rad², and ε_NC = 0.01. Each native atom-atom contact interacts via a Lennard-Jones 12-6 interaction, where the energetic minimum corresponds to the native distance σ_ij (as defined by the crystal structure). A contact is defined as any atom pair that (a) is separated by less than 6 Å, (b) is separated by at least 4 residues in sequence, and (c) has no atom between them (i.e. the "Shadow Algorithm") (37). ε_NC and σ_NC define the excluded volume of each atom. Explicit representation of the atoms ensures that non-physical states, such as those with overlapping atoms or unrealistic bond angles, are disallowed. Contact and dihedral interactions were weighted as previously described (26). To facilitate rapid sampling of all possible domain configurations, we removed all stabilizing trans-domain interactions and gave no configurational bias to the non-rigid dihedral angles (i.e. dihedrals not restrained by orbital hybridization) in the linkers.
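For orientation, the block below sketches the general functional form commonly published for all-atom structure-based models, written with the symbols defined above. It is an assumption about the shape of the potential rather than a reproduction of the equation as printed: the dihedral and contact weights ε_BB, ε_SC, and ε_C, the dihedral function F_D, and the r⁻¹² form of the excluded-volume term are not specified in this text. The contact term follows directly from the statement that the 12-6 minimum sits at σ_ij.

```latex
V = \sum_{\text{bonds}} \epsilon_r \,(r - r_0)^2
  + \sum_{\text{angles}} \epsilon_\theta \,(\theta - \theta_0)^2
  + \sum_{\text{impropers/planar}} \epsilon_\chi \,(\chi - \chi_0)^2
  + \sum_{\text{backbone}} \epsilon_{BB}\, F_D(\phi)
  + \sum_{\text{side chains}} \epsilon_{SC}\, F_D(\phi)
  + \sum_{\text{contacts}} \epsilon_C
      \left[ \left(\frac{\sigma_{ij}}{r}\right)^{12}
           - 2\left(\frac{\sigma_{ij}}{r}\right)^{6} \right]
  + \sum_{\text{non-contacts}} \epsilon_{NC}
      \left(\frac{\sigma_{NC}}{r}\right)^{12}

F_D(\phi) = \left[1 - \cos(\phi - \phi_0)\right]
          + \tfrac{1}{2}\left[1 - \cos\bigl(3(\phi - \phi_0)\bigr)\right]
```

In the simulations described here, the contact sum would run only over atom pairs within the same domain, because all stabilizing trans-domain interactions were removed.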
The forcefield files for Gromacs (38) were generated by an online resource (37). A timestep of 0.0005 time units was used, and the simulation was coupled to a temperature bath via Langevin dynamics. The total simulation time was 20,000 time units, which corresponds to ~100 µs (39). Similar to previous studies (22,23,39,40), we compare the structures in the simulated ensemble to the SAXS data by generating theoretical SAXS profiles for each structure. Here, we measure the goodness of fit of each structure by calculating χ².
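The goodness-of-fit calculation can be sketched as follows. Because the defining equation is not reproduced in this text, the sketch uses a common reduced form, the sum of squared error-weighted residuals divided by N_q − 1, with an optional least-squares scale factor applied to the calculated profile. Both the normalization and the scaling step are assumptions, and the function names are hypothetical.

```python
def scale_factor(i_exp, i_calc, sigma):
    """Least-squares scale c that minimizes sum(((I_exp - c * I_calc) / sigma)^2)."""
    numerator = sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    denominator = sum(m * m / s ** 2 for m, s in zip(i_calc, sigma))
    return numerator / denominator

def chi2(i_exp, i_calc, sigma, fit_scale=True):
    """Reduced chi-squared between experimental and calculated intensities.

    The N_q - 1 normalization is an assumed convention, not taken from the text.
    """
    c = scale_factor(i_exp, i_calc, sigma) if fit_scale else 1.0
    total = sum(((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    return total / (len(i_exp) - 1)
```

With `fit_scale=True`, a calculated profile that differs from the experimental one only by a constant factor scores zero, so the value reflects differences in curve shape rather than absolute intensity.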

RESULTS
Solution Scattering of Csk-SAXS analysis provides information on the global size and shape of a protein in solution (40). The profile represents the average scattering from all conformational states weighted by the number of molecules in each of those conformational states. Here we employed SAXS to provide information on the solution size and shape of Csk and to define an ensemble of conformations for Csk in solution. Data were analyzed using the q range 0.0078 Å⁻¹ to 0.1809 Å⁻¹. The SAXS curve (I(q) versus q) for wild-type Csk is shown in Fig. 2.
Predicted Solution Scattering of the Csk Crystal Structure-To compare the experimentally determined solution scattering to the Csk crystal structure, predictive solution scattering data were generated from the Csk crystal structure using CRYSOL. The theoretical scattering profile of the crystal structure differs extensively from the experimental SAXS data (χ² = 9.9), especially in the low q region, which is dominated by the overall shape of the molecule (Fig. 2A, inset). The experimental R_g value of 38.3 ± 0.1 Å, determined from Guinier analysis, is considerably larger than the calculated value of 32.6 Å for the crystal structure. The calculated R_g value is from CRYSOL, which accounts for the solvation shell. This difference in R_g values indicates that Csk adopts a larger structure in solution than expected from crystallographic analysis.
The higher intensity of the experimental data at low q indicates either the presence of oligomers or an extended monomer structure. Dimerization via the SH3 domains has been proposed based on size exclusion chromatography experiments (41). The crystal structure of a potential Csk dimer with association through an SH3-SH3 interface is presented in Fig. 2C. We generated a theoretical scattering curve for this structure, and the data are compared with the experimental scattering in Fig. 2A. While the theoretical curve for the dimer has a lower systematic deviation (χ² = 6.8), it fits poorly to the experimental data. Additionally, the R_g of 39.7 Å for the dimer indicates that it is larger than the scattering protein in solution. Therefore, we investigated the possibility of conformational changes in the monomer.
Rigid Body Refinement of the Crystal Structure-To explore the possibility of an extended monomer we used rigid-body motions in SASREF coupled with CRYSOL analysis. For rigid body modeling, the domains of Csk were separated into the SH3 domain, the SH2 domain, the small lobe of the kinase domain, and the large lobe of the kinase domain. The SH3-SH2 linker (Lys-72 to Pro-81) and SH2-kinase linker (Gly-175 to Leu-190) were removed to facilitate movement of the domains about one another. The theoretical scattering curve for the best-fit model (χ² = 6.0) is presented in Fig. 2A along with the associated structure (Fig. 2D). The resulting structure is expanded in comparison to the crystal structure, and the SH2 and SH3 domains are no longer in contact with the kinase domain. The rigid body model fits very well at q ≤ 0.1 Å⁻¹, indicating that an extended monomer likely represents the overall topology of Csk in solution. However, the high deviation from the predicted curves at high q (>0.1 Å⁻¹) suggests the possible presence of distinct subpopulations of Csk with different conformational properties on the native landscape (42). These populations may not be accessible by rigid body analysis because fluctuations within the independent domains themselves, such as local unfolding, may be occurring.
Flexible Body Refinement via Molecular Dynamics Simulations-Whereas rigid body modeling is commonly used to determine possible tertiary conformations that agree with SAXS data, proteins are not rigid bodies. Rather, protein domains are often connected through relatively flexible linker regions, which may possess a wide range of possible configurations. In addition, local breathing motions or unfolding within the domain may occur. An additional limitation of rigid body modeling is that the iterative process, in which each step of structural refinement is driven by the fit to the experimental data, may result in trapping in local minima during simulation. To account for these possibilities, we used all-atom structure-based simulations to generate an ensemble (Fig. 3) of structures that spans many more degrees of freedom than is allowed in rigid body refinement (see "Experimental Procedures"). In addition, these simulations avoid trapping as they are generated independently of the experimental data.
Conventional molecular dynamics simulations (i.e. explicit solvent and explicit electrostatics) for proteins the size of Csk (~450 residues) are limited to timescales on the order of nanoseconds, or possibly a microsecond. Since large-scale rearrangements of tertiary structure occur on much longer timescales, it is not possible to exhaustively search the available phase space using these forcefields, and the generated ensemble would therefore be heavily biased by the initial configuration. All-atom structure-based simulations circumvent this limitation (26). In addition to structure-based simulations being grounded in the energy landscape theory of protein folding and function (43-45), the simplicity of the forcefield allows for much longer timescales to be simulated with minimal computational cost. The potential of structure-based forcefields for modeling purposes has recently been demonstrated through successes in quaternary structure prediction (44) and atomic modeling of cryo-EM reconstructions (46).
We performed all-atom structure-based simulations allowing freedom of movement of the protein. Native attractive interactions are included only for individual domains, and domain-domain interactions arise only from the excluded volume. The results from this system provide the most entropically favorable domain configurations. The resulting ensemble of structures (40,000) was compared with the experimental scattering data. A plot of the free energy as a function of R_g and χ² for the generated ensemble is presented in Fig. 3A. Two free energy minima are evident in this plot, with basin 1 centered at an R_g of ~33 Å and basin 2 centered at an R_g of ~37 Å.
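One common way to obtain such a free energy surface from a simulated ensemble is to histogram the structures and apply F = −kT ln P. The sketch below assumes that approach; the text does not state how the plot was built, and the bin widths, the function name, and the choice to report F relative to the most populated bin are all illustrative.

```python
import math
from collections import Counter

def free_energy_surface(rg_values, chi2_values, rg_bin=1.0, chi2_bin=1.0, kT=1.0):
    """Return {(rg_bin_index, chi2_bin_index): F} with F = -kT * ln(P / P_max).

    Bins with no structures are simply absent from the result.
    """
    counts = Counter(
        (math.floor(rg / rg_bin), math.floor(c / chi2_bin))
        for rg, c in zip(rg_values, chi2_values)
    )
    max_count = max(counts.values())
    return {key: -kT * math.log(n / max_count) for key, n in counts.items()}
```

Basins then appear as local minima of the returned values, and a bin holding half as many structures as the most populated bin lies kT·ln 2 above it.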
We first analyzed the resulting structures by calculating the overall distances between each domain of Csk (SH3, SH2, small lobe) and the large lobe. A plot of the distance of the individual domains to the center of the large lobe in the structures generated over time is given in Fig. 3B. The small lobe undergoes small changes in relative position in the resulting structures, whereas the SH2 and SH3 domains sample distances as large as 10 nm from the large lobe. These data indicate that the system is highly dynamic and that the regulatory domains are only transiently associated with the catalytic kinase core, as opposed to the tight association of these domains with the kinase core as seen in x-ray crystallography (18).
Representative structures of each basin appear in Fig. 4A. In basin 1 the small and large lobe of the kinase domain appear compact. The SH2 domain is proximal to the small lobe but not in intimate contact as in the crystal structure. The SH2-SH3 linker is unfolded and the SH3 domain is extended away from the rest of the molecule. In basin 2 the small lobe and large lobe appear in an open conformation and the SH2 and SH3 domains are both extended away from the kinase core. Overall the structures occupying basin 1 are more compact than those residing in basin 2. Notably, in basin 2 the SH2-kinase linker appears more extended overall. This linker is important in domain communication and catalytic activity (10).
Simulated Structures Suggest an Ensemble of Extended Structures in Solution-We further examined the fit of the basin structures to the experimental data by randomly selecting 20 representative structures from each basin and analyzing the fitted scattering profiles (Table 1 and Fig. 4B). Structures from basin 1 deviate from the experimental scattering at low q but fit well at very high q. In basin 2 the scattering curve fits very well in the low q range, then deviates after q = 0.06 Å⁻¹. The structures in the basins give a better description of the solution behavior of Csk (Fig. 4A) than that represented by the available crystal structure (Fig. 2B) and demonstrate, for the first time, that Csk adopts an alternative, extended structure unlike that previously reported. However, individual conformations within either basin cannot fully recapitulate the observed experimental scattering curve. The simulations indicate that conformations from both basins can be populated (Fig. 3), and enzymological evidence indicates multiple conformations in solution with distinct activities (47). Thus, given that the experimental SAXS profile represents the weighted average over all conformations sampled in solution, we examined the possibility that combining conformations from these two basins may yield a better description of the experimental scattering curve. We randomly paired selected structures from basin 1 and basin 2. We then calculated theoretical SAXS profiles for linear combinations of these structures observed in the simulation according to Equation 4, where I_i(q) is the intensity profile and w_i is the weight of structure i. We determined I_combined(q) for all combinations of w_i = 0.1n, where n is an integer between 1 and 10. χ² was then calculated between each I_combined(q) and the experimental profile. The lowest χ² along with the corresponding population for each pair appear in Table 1.
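The population-combination scan can be sketched as follows, assuming a two-state mixture I_combined(q) = w_1 I_1(q) + w_2 I_2(q) with w_1 + w_2 = 1 and w_1 stepped in increments of 0.1. The sketch includes w_1 = 0 so that the pure basin 2 profile is also scored, uses a reduced χ² with an assumed N_q − 1 normalization, and assumes all profiles are already on a common intensity scale; the function names are hypothetical.

```python
def reduced_chi2(i_exp, i_model, sigma):
    """Sum of squared error-weighted residuals divided by N_q - 1 (assumed convention)."""
    total = sum(((e - m) / s) ** 2 for e, m, s in zip(i_exp, i_model, sigma))
    return total / (len(i_exp) - 1)

def best_two_state_mix(i_exp, sigma, i_basin1, i_basin2):
    """Scan w1 = 0.0, 0.1, ..., 1.0 and return (w1, chi2) for the best mixture."""
    best = None
    for n in range(11):
        w1 = n / 10
        mixed = [w1 * a + (1.0 - w1) * b for a, b in zip(i_basin1, i_basin2)]
        score = reduced_chi2(i_exp, mixed, sigma)
        if best is None or score < best[1]:
            best = (w1, score)
    return best
```

Repeating this scan for each randomly chosen pair of basin structures gives, per pair, a best-fit population and χ² of the kind reported in Table 1.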
In each case the combination of basin structures improved upon the fit of the individual structures throughout the entire curve. Therefore, our population-combination analysis finds not only that Csk is extended in solution but also that there are multiple, interconverting conformations in solution. All combinations of the random structures from the basin populations yield comparable scattering curves (χ² values 2.9-4.1), which are superior to the crystal structure (χ² = 9.9) and the rigid body model (χ² = 6.0). Representative scattering curves from one pair of structures appear in Fig. 4B. These results indicate the need for a mixed population of conformations to describe the experimental data. Taken together, these results indicate that Csk is a highly dynamic ensemble, composed mostly of elongated conformations. It is likely that the ensemble average observed in SAXS experiments represents not only these conformations but also other transiently populated stable and activated states.

SAXS Analysis Reveals an Elongated Structure in Solution-The SAXS data indicate an extended monomer structure of Csk compared with the compact one observed in the crystal structure (18). Both the rigid body refinement and molecular dynamics simulation yield a rearrangement of domains relative to the crystal structure, where the domains extend away from one another. For the related tyrosine kinase Btk, recent SAXS studies show that the SH3, SH2, and kinase domains also adopt an extended structure akin to that for Csk (48). Whereas inactive Src is globular, scattering studies have shown that the form lacking C-tail phosphorylation samples an open conformation where SH2 and SH3 domain contacts with the kinase domain appear broken (22). Together, these studies indicate that multi-domain protein kinases are highly flexible in solution and can adopt unique forms that are likely important for catalytic regulation.
Csk Is Comprised of an Ensemble of Structures in Solution-Solution scattering profiles of proteins represent the average scattering from all conformational states weighted by the number of molecules in each of those conformational states. Rigid body modeling of SAXS data provides only one structure representing an average conformation in solution. This method of modeling does not fully describe the structural information provided by solution scattering experiments. Additionally, rigid body modeling is driven by the experimental data. Several recent studies have accounted for this, developing new methods to model the ensemble of structures presented in experimental solution scattering (22-25,49). Whereas these methods can improve the fit for independent domain rotations and translations, they do not account for local unfolding events that may contribute to conformational change. Therefore, we applied all-atom structure-based simulations to investigate the structural ensemble of Csk in solution using no knowledge of the experimental scattering data. Over the time course of the simulation, Csk adopts many conformations, with the vast majority significantly more extended than the crystal structure.
The ensemble of conformations presented here gives further insight into the arrangement of the regulatory domains in Csk. During the simulation, the SH2 and SH3 domains undergo large fluctuations in distance from the kinase domain. Given the parameterization used in the simulation, we are able to rapidly sample all possible tertiary arrangements. For this reason, we are confident that our ensemble is not biased toward any particular arrangement. Therefore, we believe we have found the best possible tertiary arrangements that explain the data.
SAXS Provides Insight into the Regulation of Proteins-These various conformations of the apo form of Csk may play a role in its regulation. Unlike SFKs that possess N-terminal fatty acylation for membrane localization, Csk lacks this posttranslational modification and is largely cytosolic. Upon signaling, Csk is recruited to the plasma membrane by Cbp and various other adaptor proteins, which bind the SH2 domain (7). Other adaptors include Caveolin-1, a major protein constituent in caveolae, and CagA, a virulence component of Helicobacter pylori, the bacterium involved in ulcers (50,51). In addition to localization, Cbp also increases the activity of Csk, suggesting that interactions of the adaptor protein with the SH2 domain increase catalytic processes in the distal kinase domain (11). We have been able to probe this long-range phenomenon using hydrogen-deuterium exchange kinetics. Upon addition of a phosphotyrosine peptide based on the interaction surface of Cbp, we observed changes in deuterium incorporation rates in the SH2-kinase linker and in the glycine-rich loop. Interestingly, we showed that mutation of a critical phenylalanine in the SH2-kinase linker substantially decreases catalytic activity, mimicking a form of the enzyme lacking the full SH2 domain (10). These findings indicate that interactions between the SH2 and kinase domains are essential for catalytic efficiency.
It is well established that interaction with binding partners can shift the population of conformational states (52-55). Our observation of extended structures where the SH2 domain does not make contacts with the kinase domain suggests that physiological effectors such as Cbp could activate the enzyme by increasing the population of species with a closed conformation and interacting SH2 and kinase domains. In addition to SH2 binding partners, Csk interacts with a protein phosphatase (PEP) through its SH3 domain and with its substrate Src via the kinase domain (56-58). It is possible that such interactions may further drive a more compact ensemble with greater catalytic efficiency. How these proteins modify the structural envelope and generate long-range effects in the kinase core is vitally important for a complete picture of how Csk controls SFK function and cell growth and differentiation.
The results presented here, along with other recent SAXS studies, support the view that proteins are likely composed of a dynamic ensemble of structures. Whereas crystallographic analyses provide a detailed molecular description of structure, they may capture only a few conformations, usually under non-catalytic conditions. SAXS studies examine proteins under solution conditions without nonspecific interaction between proteins, providing a description of the functional ensemble of conformations of the protein under catalytically functional conditions. By combining the well-defined domain structures from crystal structures with solution scattering profiles and employing molecular dynamics and theoretical analyses, we are able to produce a more detailed and descriptive model of a protein in solution.
Although the SAXS data do not provide sufficient resolution to conclude definitively that multiple basins are needed to describe the data, they strongly suggest this possibility. The methodology we developed is broadly applicable to all biological macromolecules and should provide useful information not only on conformation but also on the ubiquitous dynamic reversible assembly processes inherent in the workings of biomolecular machines.