Molecular dynamics simulations reveal the conformational dynamics of Arabidopsis thaliana BRI1 and BAK1 receptor-like kinases

The structural motifs responsible for activation and regulation of eukaryotic protein kinases in animals have been studied extensively in recent years, and a coherent picture of their activation mechanisms has begun to emerge. In contrast, non-animal eukaryotic protein kinases are not as well understood from a structural perspective, representing a large knowledge gap. To this end, we investigated the conformational dynamics of two key Arabidopsis thaliana receptor-like kinases, brassinosteroid-insensitive 1 (BRI1) and BRI1-associated kinase 1 (BAK1), through extensive molecular dynamics simulations of their fully phosphorylated kinase domains. Molecular dynamics simulations calculate the motion of each atom in a protein based on classical approximations of interatomic forces, giving researchers insight into protein function at unparalleled spatial and temporal resolutions. We found that in an otherwise “active” BAK1 the αC helix is highly disordered, a hallmark of deactivation, whereas the BRI1 αC helix is moderately disordered and displays swinging behavior similar to numerous animal kinases. An analysis of all known sequences in the A. thaliana kinome found that αC helix disorder may be a common feature of plant kinases.

Eukaryotic protein kinases (ePKs) 2 act as dynamic switches, the units of cellular computation, optimized not for turnover of phosphorylated product but rather for precise tuning of activation through post-translational modification and proteinprotein interaction (1)(2)(3)(4). This ability of kinases to respond to multiple inputs allows for adaptable programmed responses integrating numerous signals (4 -7). Because of their central role in processing external signals, specifically in relation to cell growth and proliferation and therefore cancer, animal ePKs have long been the target of efforts to understand the structural basis of their activation and regulation (for thorough reviews, see Refs. 1 and 8).
Several features have been suggested to be necessary for ePK activity (Fig. 1), including proximal positioning and proper folding of the ␣C helix, formation of a salt bridge between the conserved glutamate on the ␣C helix and the conserved lysine on the ␤4 sheet (Lys-Glu salt bridge), formation of the regulatory and catalytic hydrophobic spines, outward positioning of the conserved phenylalanine in the DFG motif away from the active site, and unfolding of the activation loop (1,8). Careful positioning of these regulatory components is necessary for an ePK to perform catalysis, meaning that any conformation with a regulatory structure out of place is inactive. Despite the resultantly enormous multiplicity of possible inactive states, several distinct conformations recur throughout the ePKs. For example, the human cytoplasmic tyrosine kinase Src takes on an inactive conformation where the unphosphorylated activation loop folds and obscures the active site and the ␣C helix swings away from the active site in concert with breakage of the Lys-Glu salt bridge (9, 10) (references to a "Src-like" state throughout the text exclusively denotes distal swinging of the ␣C helix).
Because the focus of research on ePKs has been on clades directly relevant to medicine (i.e. the vertebrates), it remains unclear how similar the activation processes of non-animal ePKs are to those in animals. Given the highly conserved nature of ePK structures across the eukaryotes (1), some insight into non-animal ePK activation can be gleaned from the expansive structural information available from animal systems; however, the possibility of unique activation mechanisms not found in animal ePKs remains.
Brassinosteroid-insensitive 1 (BRI1) and BRI1-associated kinase 1 (BAK1) are members of the leucine-rich repeat receptor-like kinases (LRR-RLKs), a group of structurally similar membrane receptors within the monophyletic RLK clade, which is homologous to and shares functional characteristics with the animal Pelle kinases (11). Although LRR-RLKs are intrinsic membrane proteins, each with an extracellular domain, a single transmembrane helix, a juxtamembrane domain, and an intracellular dual-specificity kinase domain, we have performed simulations of each kinase domain in isolation. These two proteins play a critical role in control of plant growth and development along with other processes (12)(13)(14), associat-ing with and reciprocally phosphorylating one another upon binding of a brassinosteroid hormone to the BRI1 extracellular domain (15)(16)(17)(18)(19). Activation of BRI1 and BAK1 triggers a downstream cascade of phosphorylation and dephosphorylation events causing alteration of gene expression and ultimately plant growth (13,20,21).
Here, we studied the conformational dynamics of the Arabidopsis thaliana BRI1 and BAK1 through atomistic molecular dynamics (MD) simulation. Biomolecular MD simulation uses the initial coordinates of all atoms from a crystal structure as an input, assigns an initial velocity to each atom, and numerically integrates the classical equations of motion to yield a trajectory through time. Interactions between atoms are represented in approximate form with parameters chosen according to experiments and quantum mechanical calculations. Although structural biology techniques are constantly improving in their ability to resolve protein structures, MD simulation provides unparalleled detail into the conformational dynamics of biomolecules (22,23). With the power of this technique, we sought to characterize the conformational dynamics governing the activity of BRI1 and BAK1, specifically in terms of known ePK activation features.

Results
To identify the structural features important for BRI1 and BAK1 activation, we performed extensive molecular dynamics simulations of their isolated kinase domains. Despite the fact that the simulated BRI1 kinase domain was fully phosphorylated, a necessary condition for complete BRI1 activation, we observed a high degree of ␣C helix disorder and breaking of the conserved Lys-Glu salt bridge (Fig. 2a). Still, the active-like state is the most stable region (Fig. 2a, region I), although there does not appear to be a free energy barrier to unfolding of the ␣C helix with the Lys-Glu salt bridge formed (Fig. 2a, region II) until there is already a significant degree of disorder. It is unclear how this unfolding behavior concurrent with a formed Lys-Glu salt bridge affects catalytic activity. Disorder in the BRI1 ␣C helix is supported by CD spectroscopy experiments of the isolated BRI1 ␣C helix peptide, showing an intermediate degree of helicity as compared with the disordered human epidermal growth factor receptor (EGFR) and helical hSrc ␣C helix peptides (supplemental Fig. S1 and Tables S1 and S2).
The Lys-Glu salt bridge can be broken, whereas the ␣C helix remains ordered (Fig. 2a, unlabeled region above region I) likely due to interaction between Lys 911 and ATP, whereas further distance between Lys 911 and Glu 927 coincides with greater fluctuation in the same distance. The region characterized by a large distance between Lys 911 and Glu 927 (Fig. 2a, regions III and IV) covers a range of ␣C helix disorder similar to the region with a formed Lys-Glu salt bridge (Fig. 2a, regions I and II).
The distance between Lys 911 and Glu 927 is measured between the terminal nitrogen atom in lysine and the ␦-carbon in glutamate, meaning that changes in this distance while the ␣C helix remains ordered could be caused by conformational changes in the side chains, swinging of the ␣C helix outward, or some combination of the two. Region III in Fig. 2a represents breaking of the Lys-Glu salt bridge coupled with distal movement of the ␣C helix (supplemental Fig. S2). This suggests the presence of a Src-like inactive state in BRI1 where the ␣C helix remains folded but swings outward from its active-like position, breaking the Lys-Glu salt bridge while preventing rapid reformation by conformational changes in the side chains alone. The Src-like inactive state represents the second most stable region of the free energy landscape, possibly meaning that transitions to this state are the main mechanism of BRI1 deactivation when only considering its internal conformational dynamics. Glu 927 interacts extensively with Arg 1032 on the activation loop when the Glu 927 -Lys 911 interaction is broken (supplemental Fig. S3) in a manner similar to human Src kinase (24).
Similarly to BRI1, BAK1 has an active-like state (Fig. 3a, region I), a region with ␣C helix disorder but retaining the Lys-Glu salt bridge (Fig. 3a, region II), and a state with a folded and active-like positioned ␣C helix and a broken Lys-Glu salt bridge due to interaction between Lys 317 and ATP (Fig. 3a, region III). However, BAK1 does not take on a Src-like inactive conformation as there is no distal swinging of the folded ␣C helix (supplemental Fig. S4) and minimal interaction between Glu 334 and Lys 439 , the lysine residue in the position on the activation loop analogous to BRI1 Arg 1032 (supplemental Fig. S5). Furthermore, conformations with a marginally broken Lys-Glu salt bridge are far less stable in BAK1 as compared with BRI1 ( Fig. 3, a and b, region III). The most striking feature of the BAK1 free energy landscape is the large, well populated area where the ␣C helix is highly disordered (Fig. 3a, region IV), allowing for a large range of distances between Lys 317 and Glu 334 . Breaking of the Lys-Glu salt bridge allows for a greater extent of ␣C helix disorder in BAK1 than in BRI1.
It is unclear whether the disorder in the BAK1 and BRI1 ␣C helices is primarily due to interaction with neighboring residues or intrinsic disorder of the ␣C helix itself. In the former case, a stable ␣C helix sequence of another kinase substituted for the native sequence would likely also display disorder. Because of issues with solubility for the BAK1 ␣C helix peptide, instead of performing CD spectroscopy we substituted the otherwise stable hSrc ␣C helix for the native BAK1 ␣C helix sequence in silico. We found that the hSrc ␣C helix remains stable relative to the native BAK1 helix, eliminating a large portion of the unfolded region (Fig. 4a, region IV) although still allowing for notable deviation from a purely helical form (Fig.  4a, regions II and IV). All together, the data suggest the presence of intrinsic disorder, although we cannot rule out a role for structural context from these results alone.
In addition to the stabilized ␣C helix, the conserved Lys-Glu salt bridge is also stabilized where the ␣C helix remains folded in the main region of breakage (Fig. 4a, region III). As with native BAK1, there does not appear to be a Src-like inactive state (supplemental Fig. S6).
Although the extent of unfolding of the Src ␣C helix is lesser than that of the native BAK1 ␣C helix, there is little barrier to partial unfolding when the Lys-Glu salt bridge is formed. Regions I and II of Fig. 4a appear to constitute a kinetically indistinct state where the BAK1-Src ␣C helix can partially unfold and refold while constrained by the Lys-Glu salt bridge, whereas breaking of the Lys-Glu salt bridge occurs primarily by way of region I into region III over a free energy barrier of ϳ2-3 kcal⅐mol Ϫ1 . This transition is likely due to a rotation of the lysine side chain toward the negatively charged ATP phosphate groups as is seen in native BRI1 and BAK1. Interestingly, there is a small, low free energy region near 5-Å ␣C helix r.m.s.d. and 20-Å Lys-Glu salt bridge distance (Fig. 4a, above region IV). However, the low free energy of this region is likely due to sampling error (see supplemental material for further discussion). Additionally, we calculated the mean solvent-accessible surface area of each residue in the ␣C helix of BRI1, BAK1, and BAK1-Src for comparison with future potential hydrogen-deuterium exchange experiments (supplemental Fig. S7).
To investigate the universality of ␣C helix disorder in the A. thaliana kinome, we obtained a previously identified set of 930 A. thaliana ePK sequences (25) and used the PONDR-FIT web server (26) to predict disorder propensities of residues in putative ␣C helix regions and mapped the predicted disorder propensities onto a previously published A. thaliana kinome phylogenetic tree (Fig. 5a) (25). It is evident from this tree that ␣C helix disorder is predicted to be widespread across the A. thaliana kinome, suggesting that transitions between ordered and disordered states in this region could be a common regulatory mechanism. We repeated the same analysis for the Oryza sativa (supplemental Figs. S8 and S9) and human kinomes (supplemental Figs. S10 and S11), finding similar levels of disorder in putative ␣C helix regions.
As a check for consistency with our simulations, we found that the hSrc sequence is predicted to have low disorder across the ␣C helix region, whereas hEGFR and BRI1 are predicted to display disorder qualitatively similar to the average A. thaliana ePK (Fig. 5b). In contrast, BAK1 displays disorder near the upper boundary of the interquartile range for all examined residues (Fig. 5b) and is clearly on the upper tail of the distribution of average ␣C helix residue disorder (Fig. 5c). Several ePK families are predicted to have a majority of members with high ␣C helix disorder, including the CRR, NEK, and SnRK1 families (supplemental Fig. S12).

Discussion
It is no surprise that BRI1 and BAK1 would be heavily regulated given their great importance in plant growth and development (14,(27)(28)(29)(30)(31). Intrinsic disorder in BAK1 and, to a lesser extent, BRI1 ␣C helices may be one such mechanism of regulation. Given that fully phosphorylated, ATP-bound BRI1 and BAK1 were able to spontaneously transition from active-like to inactive conformations characterized by broken Lys-Glu salt bridges and varying degrees of ␣C helix unfolding (Fig. 6), the presence of intrinsic ␣C disorder could provide insight into how BRI1-BAK1 association leads to their activation. It is possible that ␣C helix stabilization due to physical interaction between BRI1 and BAK1 is necessary for activation in addition to phosphorylation in a fashion similar to the metazoan EGFR, a receptor tyrosine kinase implicated in multiple cancers (9,32,33).
In human EGFR, disorder in the ␣C helix is believed to be suppressed by ligand-induced homodimerization and forma- tion of an asymmetric dimer in the kinase domains with one kinase acting as an activator for the other (9,33,34). By comparison with hEGFR, our discovery of highly populated states in large-timescale MD simulations of the BRI1 and BAK1 kinase domains characterized by disorder in the ␣C helices suggests that dimerization of BRI1 and BAK1 might promote catalytically competent conformations of both proteins via stabilization of their respective ␣C helices as has been suggested previously based on the similarity of inhibitor-binding interfaces in hEGFR and BRI1 (32). However, as we have simulated BRI1 and BAK1 isolated from all potentially interacting proteins and relieved of their juxtamembrane and C-terminal tail domains, there are other mechanisms through which BRI1 and BAK1 could suppress ␣C helix disorder, and pointed experiments are needed to provide more definitive evidence for this hypothesis.
To the best of our knowledge, EGFR is the only other ePK that displays ␣C helix disorder in previous MD studies (33). This could be more due to the small fraction of ePKs that have been simulated than general stability of the ␣C helix given the high levels of disorder we have predicted in three different kinomes. However, there is ␣C helix disorder in a crystal structure of human AKT1 caused by inhibitor binding (35).
Additionally, distal swinging of the ␣C helix away from the active site is a well established mechanism of deactivation in numerous animal ePKs where interactions with regulatory domains or other kinases causes proximal relocation of the ␣C helix and subsequent activation (1,9,34,36). BRI1 is able to transition to a similar Src-like inactive state in our simulations, suggesting that BRI1 may fit within the paradigms of animal ePK allosteric control. Distal swinging of the ␣C helix is commonly observed in MD simulations of mammalian protein kinases, including human Lyn and CDK2 (37), EGFR (38), and Src (10), and through analysis of available crystal structures (36,37). In contrast, BAK1 does not have a well defined inactive state, which complicates inference of an activation pathway without considering interactions with BRI1 or other LRR-RLKs.
It is also worth noting that the conditions of our simulations preclude a flip of the DFG motif phenylalanine into the ATPbinding pocket as ATP is already bound, and we saw no Src-like folding of the activation loop into a helix (supplemental Fig.  S13), presumably due to the phosphorylation state of the simulated proteins. However, these regulatory mechanisms may also play a role in BRI1 and BAK1 activation, and future simu-  lations of the unphosphorylated apo kinase domains could address this question.
Intrinsically disordered regions (IDRs) are thought to confer versatility to proteins in terms of their interaction partners, and there is evidence that IDRs are enriched in protein-protein interaction network hub proteins (39,40), including hubs in kinase interaction networks (41). BAK1, which has considerable ␣C helix disorder in our simulations, is known to have an important role not just in brassinosteroid signaling but also in innate immunity, cell death control, and light response, associating with several other LRR-RLKs (42,43). It is possible that BAK1 is able to form active dimers with multiple LRR-RLKs because of intrinsic disorder in its ␣C helix. Although it is not known how BAK1 interacts with other LRR-RLKs, disorder in the BAK1 ␣C helix and similarities between a protein-binding site inhibiting dimerization of hEGFR and the corresponding region of BRI1 (32) could point to EGFR-like complex formation. In that case, the ␣C helix of BAK1 could serve as a general interaction surface with BRI1 and other LRR-RLKs.
Given that ␣C helix disorder is expected throughout a large fraction of the A. thaliana kinome (Fig. 5), it seems possible that this regulatory mechanism is present in multiple other plant ePKs. Both the O. sativa (supplemental Figs. S8 and S9) and human kinomes (supplemental Figs. S10 and S11) display profiles of ␣C helix disorder similar to the A. thaliana kinome. For the human kinome, this result is consistent with a previous analysis finding that 83% of human ePKs contain at least one IDR somewhere in their kinase domain (41). High levels of ␣C helix disorder in the kinomes of two evolutionarily distant groups of organisms could indicate that stabilization of the ␣C helix is a general strategy for ePK activation across the eukaryotes. However, more evidence is needed to substantiate the universality of ␣C helix disorder in ePKs.
The activation pathways of BRI1 and BAK1 are still unknown as there are no inactive, unphosphorylated structures available for either receptor (supplemental Fig. S14). Future studies will need to address how phosphorylation contributes to BRI1 and BAK1 activation as a necessary but apparently insufficient step.

Simulation details
All simulations were performed using the AMBER 14 molecular dynamics package (44) and the CHARMM36 force field (45-47) on the Blue Waters petascale computing facility. All simulation systems were set up using the VMD (48) plug-in Psfgen 1.6 and converted to AMBER format using the CHAMBER software tool (49). Starting coordinates for BRI1 and BAK1 kinase domains were taken from available crystal structures (BRI1 Protein Data Bank code 4OA2 (50); BAK1 Protein Data Bank codes 3TL8 (chains A, D, G, and H) (51), 3UIM (29), and 3ULZ) (see supplemental Fig. S15 for exact sequences and supplemental Table S3 for phosphorylation states). All segments missing from the BRI1 and BAK1 crystal structures were modeled using the SWISS-MODEL web server (53). To create the chimeric BAK1 kinase, the ␣C helix of hSrc kinase (Protein Data Bank code 1Y57 (54)) was grafted into the 3ULZ and 3UIM structures in place of the native ␣C helix (see supplemental Fig. S15 for exact sequence).
Starting structures were solvated in water boxes with dimensions of ϳ90 ϫ 70 ϫ 63 Å with TIP3P model molecules (55). Sodium and chloride ions were added to neutralize the charge of all systems and bring salt concentration to ϳ150 mM. An ATP molecule with two magnesium ions bound, taken from previous simulations, was inserted into the binding pocket for all structures in place of the modified adenosine phosphate

Conformational dynamics of Arabidopsis receptor-like kinases
molecules used in crystallization, aligned with the adenosine ring of the corresponding ATP analogue. All systems were subjected to 10,000 steps of energy minimization and equilibrated for 8 -10 ns in constant temperature and pressure conditions at 300 K and 1 atm, maintained using Langevin dynamics and a Berendsen barostat. Simulations were performed using a 2-fs time step, periodic boundary conditions, particle mesh Ewald electrostatics (56), and constraints of hydrogen-containing bonds using the SHAKE algorithm (57,58). Equilibrated structures were then equilibrated for another 10 ns to obtain average dihedral angle potential energies for calculation of accelerated MD (AMD) (59) parameters (supplemental Table S4) according to Wereszczynski and McCammon (60).
For BAK1,10 replicates of each of the six structures were simulated using AMD for a total time given in supplemental Table S5. BRI1 AMD simulations were initiated from a single crystal structure, and two rounds of adaptive AMD sampling were performed, first with a single trajectory and for the second and third rounds with 25 trajectories run in parallel for a total simulation time given in supplemental Table S5. Finally, 25 AMD simulations were run in parallel serially for a total time indicated in supplemental Table S5. Similarly, BAK1-Src AMD simulations were initiated from each of the two structures indicated in supplemental Table S4, run for one round of adaptive AMD with 25 parallel trajectories, and finally run with 25 trajectories in parallel serially for total times indicated in supplemental Table S5. Ideally, the AMD simulation strategy for all three systems would be identical, and it is possible that some of the differences observed between the free energy landscapes of each system could be due in part to differences in sampling schemes. The final round of AMD sampling for each system was clustered in the space of the distance between the most distal side chain nitrogen in the lysine and carbon in glutamate within the conserved Lys-Glu salt bridge and ␣C helix r.m.s.d. from the crystal structures (or initial structure for the BAK1-Src chimera) into 100 states, and the nearest neighbors of the cluster centroids were chosen as starting structures for unbiased simulation. An aggregate of 30 -40 s of unbiased MD simulation time was performed for each system (supplemental Table S5).

Markov state model construction
The entire Markov state model (MSM) construction process was performed using the MSMBuilder 3 Python package for all systems (61). All trajectories were subsampled so that the time difference between consecutive frames in a trajectory was 200 ps. We chose the r.m.s.d. of the atoms contained in the ␣C helix and separately the N-terminal lobe excluding the ␣C helix with respect to the corresponding reference structure (BRI1, Protein Data Bank code 4OA2 (50); BAK1, Protein Data Bank code 3TL8, chain A (51); initial structure of BAK1-Src, Protein Data Bank code 1Y57 (54) for the ␣C helix and Protein Data Bank code 3UIM (29) for the remainder of the kinase domain) as features for construction of MSMs for our systems of interest. Each of the two metrics for BRI1 and BAK1 trajectories was normalized by subtracting the mean and dividing by the standard deviation where both statistics were calculated over all trajectories from the appropriate protein. The ranges of valid lag times for the BRI1, BAK1, and BAK1-Src models were both chosen to be 50 -150 ns based on convergence of implied timescales across models built with a range of cluster numbers (supplemental Figs. S16 -S18). We then used the Osprey variational cross-validation package to select lag times of 50 ns for all three models and cluster numbers of 278, 319, and 386 for the BRI1, BAK1, and BAK1-Src models, respectively, which maximized the model-generalized matrix Rayleigh quotient crossvalidation scores using shuffle-split cross-validation (62). Using these MSM parameters, we estimated reversible maximum likelihood transition probability matrices for all models. Error analysis was conducted by constructing Bayesian MSMs (63) with the same parameters as the corresponding maximum likelihood MSMs, utilizing Metropolis Markov chain Monte Carlo to sample the prior distribution of transition matrices (supplemental Figs. S19 -S21).
Free energy plots were constructed by first creating a normalized two-dimension histogram for each of N states using the structures in each state projected onto the distance between the glutamate ␦-carbon atom and the lysine side chain nitrogen in the conserved Lys-Glu salt bridge and the helical content of the ␣C helix as defined in the NAMD 2.11 manual (64) as detailed in the supplemental Methods. Each state histogram (h i (x, y) for state i) for the given model was then weighed by the equilibrium probability ( i ) of the corresponding state calculated from the transition probability matrix and summed together according to the following equation. where R is the gas constant and T is the temperature. For comparison, non-MSM-weighted free energy plots are shown in supplemental Figs. S22-S24.

Bioinformatics analysis
The list of the 940 A. thaliana kinases identified by Zulawski et al. (25) was used to create a multiple sequence alignment (MSA) using Clustal Omega (65) after the kinase sequences were retrieved from the Arabidopsis Information Resource (66). Using the Biopython package (52), the position of the conserved glutamate in the ␣C helix (BAK1, 334; BRI1, 927) in the MSA was identified, and every kinase without a gap at that position was selected for further analysis (a total of 930 of the original 940 sequences). Sequences putatively containing the ␣C helix of each kinase were produced by taking a subsequence consisting of the conserved glutamate position and 20 residues in both directions for a total of 41 residues. All 930 subsequences were submitted to the PONDR-FIT consensus disorder prediction web server (26), and the average disorder propensity over the 15-residue segment corresponding to the BAK1 ␣C helix for each sequence was calculated and mapped onto the A thaliana kinome phylogenetic tree from Zulawski et al. (25)